JP6236807B2

JP6236807B2 - Singing voice evaluation device and singing voice evaluation system

Info

Publication number: JP6236807B2
Application number: JP2013046103A
Authority: JP
Inventors: 紀行畑
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-03-08
Filing date: 2013-03-08
Publication date: 2017-11-29
Anticipated expiration: 2033-03-08
Also published as: JP2014174293A

Description

The present invention relates to a singing voice evaluation device and a singing voice evaluation system for evaluating singing voices.

Some conventional karaoke apparatuses have a function for scoring a singer's performance. Karaoke scoring generally compares the singing voice with a guide melody that serves as a model, and converts the similarity between the two into a score.

However, such machine scoring simply awards a higher score the more closely the singing resembles the guide melody. It does not provide a sensory (perceptual) evaluation of the singing itself. For example, when a singer adds an arrangement, the score may be low because the singing does not resemble the guide melody, even if listeners find the arrangement pleasing.

To address this, Patent Document 1, for example, proposes a karaoke apparatus that reflects subjective human evaluation. Several listeners rate the singing, and the higher of the listeners' average score and the machine score is used as the scoring result.

Patent Document 1: JP-A-8-115091

However, the apparatus of Patent Document 1 has to wait for several listeners to finish their evaluations, so it cannot present the result on the spot.

Accordingly, an object of the present invention is to provide a singing voice evaluation device that can present a human evaluation of singing on the spot.

The singing voice evaluation device of the present invention comprises three elements. Storage means stores singing voices in advance, each associated with a human evaluation of that singing voice, as prior singing voice data. Singing voice input means inputs a singing voice. Scoring means scores the current singing voice input through the singing voice input means. The scoring means compares the current singing voice with the prior singing voice data and extracts prior singing voice data similar to the current singing voice. It then includes the human evaluation from the extracted prior singing voice data in the scoring result that it outputs.

In this way, the singing voice evaluation device of the present invention stores singing voices that are associated in advance with human evaluations. It then extracts a stored singing voice similar to the current singing voice, so that the human evaluation attached to that similar singing voice can be reflected in the scoring result. For example, suppose the prior singing voice data includes a performance in which the singer changed the pitch by adding an arrangement, and listeners liked it and gave it a high human evaluation. If the current singer changes the pitch with a similar arrangement, the associated human evaluation is reflected in the scoring result, so the human evaluation can be presented on the spot.

Preferably, a plurality of items of similar prior singing voice data are extracted rather than just one. The human evaluation of each extracted item is then weighted according to its similarity and reflected in the scoring result.

Preferably, the scoring result reflects machine scoring results as well as human evaluations. It is also preferable that the prior singing voice data include machine scoring results, and that the machine scoring result contained in the prior singing voice data be reflected in the scoring result.

The scoring result may be output only for the song as a whole, or for each predetermined section (for example, the A melody (verse), the B melody (pre-chorus), and the chorus).

The present invention can also be embodied so that the singing takes place on the singer's terminal (a karaoke apparatus or an information processing device owned by the user). In that case, the singer's terminal downloads the prior singing voice data from the storage means of a server and performs the scoring process described above, acting as the scoring means.

According to the present invention, a human evaluation of singing can be presented on the spot.

FIG. 1 is a block diagram showing the configuration of a karaoke system.
FIG. 2 is a block diagram showing the configuration of a karaoke apparatus.
FIG. 3 is a diagram showing the structure of music data.
FIG. 4 is a diagram explaining the concept of machine scoring.
FIG. 5 is a block diagram showing the configurations of the server 1 and the evaluator terminal 4.
FIG. 6(A) is a diagram showing prior singing voice data, FIG. 6(B) is a diagram showing list data of the prior singing voice data, and FIG. 6(C) is a diagram showing the concept of human-evaluation-based scoring.
FIG. 7 is a diagram showing the concept of similarity calculation.
FIG. 8 is a diagram showing application example 1 of human-evaluation-based scoring.
FIG. 9 is a diagram showing application example 2 of human-evaluation-based scoring.
FIG. 10 is a flowchart showing the human-evaluation-based scoring process.

FIG. 1 shows the configuration of a karaoke system that includes the singing voice evaluation device of the present invention. The karaoke system consists of a server 1, a plurality of karaoke stores 3, and a plurality of evaluator terminals 4, all connected via a network 2 such as the Internet.

Each karaoke store 3 is provided with a relay device 5, such as a router, that is connected to the network 2. The store also has a plurality of karaoke apparatuses 7, which connect to the network 2 via the relay device 5. The relay device 5 is installed in a location such as the store's management room. The karaoke apparatuses 7 are installed one per private room (karaoke box), and each karaoke apparatus 7 has its own remote controller 9.

The evaluator terminal 4 is an information processing terminal such as a PC or a smartphone. The user of an evaluator terminal 4 evaluates the prior singing voice data stored in the server 1, which consists of recordings of singers at the karaoke apparatuses 7. In this embodiment, whenever someone sings at a karaoke apparatus 7, that singer's singing voice is sent to the server 1 and becomes prior singing voice data awaiting human evaluation. The user of the evaluator terminal 4 listens to the singing voice in this prior singing voice data and gives it a score. This human evaluation is registered in the prior singing voice data and accumulated in the server 1. Later, another singer may sing the same song and that singing voice may be sent to the server 1. The server 1 then compares the received current singing voice with the prior singing voice data and extracts similar prior singing voice data. It reflects the human evaluation of the extracted data in the scoring result for the current singing voice.

FIG. 2 is a block diagram showing the configuration of the karaoke apparatus 7. The karaoke apparatus 7 consists of a CPU 11, which controls the operation of the entire apparatus, and various components connected to the CPU 11. These components are a RAM 12, an HDD 13, a network interface (I/F) 14, an LCD (touch panel) 15, an A/D converter 17, a sound source 18, a mixer (effector) 19, a decoder 22 for MPEG or similar formats, a display processing unit 23, an operation unit 25, and a transmission/reception unit 26.

The HDD 13 stores the operating program for the CPU 11. The RAM 12 serves as work memory and is divided into several areas: one into which the CPU 11's operating program is loaded for execution, one into which music data is read to play karaoke songs, and one for temporarily storing data such as the reservation list and scoring results. The HDD 13 also stores the music data for playing karaoke songs, together with video data for displaying background video on the monitor 24. The video data includes both moving and still images. The music data and video data are periodically distributed from the server 1 (or another distribution center) and updated.

The CPU 11 is a control unit that performs overall control of the karaoke apparatus 7. It functionally includes a sequencer and performs karaoke playback. The CPU 11 also performs audio signal generation, video signal generation, machine scoring, and human-evaluation-based scoring.

The touch panel 15 and the operation unit 25 are provided on the front of the karaoke apparatus 7. The CPU 11 realizes a GUI by displaying images on the touch panel 15 in response to operation information entered on it. The remote controller 9 provides a similar GUI. The CPU 11 performs various operations based on operation information entered through the touch panel 15, the operation unit 25, or the remote controller 9 (via the transmission/reception unit 26). For example, when the user gives an instruction to start human-evaluation-based scoring using the touch panel 15, the operation unit 25, or the remote controller 9, the CPU 11 starts the human-evaluation-based scoring process. This process is described in detail later.

Next, the configuration for karaoke playback is described. As noted above, the CPU 11 functionally includes a sequencer. The CPU 11 looks up the song number of a reserved song in the reservation list in the RAM 12, reads the corresponding music data from the HDD 13, and the sequencer performs karaoke playback.

As shown in FIG. 3, for example, the music data consists of the following:
- a header containing the song number and similar information;
- a musical tone track containing MIDI data for the performance;
- a guide melody track containing MIDI data for the guide melody;
- a lyrics track containing MIDI data for the lyrics;
- a chorus track containing the backing chorus playback timing and the audio data to be played.

The music data format is not limited to this example.

For each musical tone, the musical tone track records the type of instrument that produces it, along with its timing, pitch (key), intensity, length, localization (pan), sound effects, and so on. The sequencer controls the sound source 18 based on the data in the musical tone track and the guide melody track to generate the musical tones of the karaoke song.

The sequencer also plays back the backing chorus audio data at the timing specified by the chorus track. This audio is compressed audio, such as MP3, that accompanies the music data. In addition, the sequencer uses the lyrics track to compose the character pattern of the lyrics in sync with the song's progress. It converts this character pattern into a video signal and inputs it to the display processing unit 23.

Through the sequencer's processing, the CPU 11 inputs data (note event data) to the sound source 18, which generates a musical tone signal (digital audio signal) from it. The generated tone signal is input to the mixer 19.

The mixer 19 applies sound effects such as echo to three signals and mixes them: the tone signal generated by the sound source 18, the chorus sound, and the singer's voice from the microphone 16 (input via the A/D converter 17).

The mixed digital audio signal is input to a sound system (SS) 20. The sound system 20 incorporates a D/A converter and a power amplifier. It converts the input digital signal into an analog signal, amplifies it, and outputs the sound from the speaker 21. The CPU 11 controls the effects the mixer 19 applies to each audio signal and the mixing balance.

While the sequencer generates musical tones and lyric captions, the CPU 11 reads the video data stored in the HDD 13 and plays back the background video and similar images in sync. Moving-image video data is encoded in MPEG format.

The CPU 11 inputs the background video data it has read to the decoder 22. The decoder 22 converts the input MPEG or similar data into a video signal and inputs it to the display processing unit 23. Besides the background video signal, the display processing unit 23 also receives images such as the character pattern of the lyric captions. The display processing unit 23 superimposes the lyric captions on the background video signal by OSD (on-screen display) and outputs the result to the monitor 24. The monitor 24 displays the video signal input from the display processing unit 23.

Karaoke playback is carried out as described above. Next, the machine scoring process is described. Machine scoring compares the singer's singing voice with the guide melody track, which serves as the reference singing voice. Pitch, timing, volume, and so on are compared for each note event in the guide melody track.

Specifically, the CPU 11 temporarily stores the input singing voice (digital audio signal) in the RAM 12 and extracts its pitch. The CPU 11 then compares the extracted pitch values, the timing of pitch changes, the level of the singing voice, and so on with the guide melody track, and converts the result into a score.
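The patent does not say how pitch is extracted. As a minimal sketch, assuming a plain frame-wise autocorrelation estimator (one common technique, not necessarily the one used here), pitch extraction could look like the following. The function names, the 80 to 1000 Hz search range, and the energy threshold are all illustrative assumptions.

```python
import math

def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0, min_energy=1e-3):
    """Estimate the fundamental frequency of one audio frame by autocorrelation.

    Assumption: frames whose mean energy is below min_energy are treated as
    unvoiced and return None.
    """
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    if energy < min_energy:
        return None
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return None if best_lag is None else sample_rate / best_lag

def hz_to_midi(freq_hz):
    """Convert Hz to a fractional MIDI note number (A4 = 440 Hz = note 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)
```

Applying this to successive frames produces the pitch contour that the later comparisons work on. Practical systems usually use more robust estimators, such as YIN, which also handle octave errors.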

For example, a high score is awarded if the pitch of the singing voice matches the pitch of the guide melody track (stays within a tolerance) for a predetermined time or longer. The timing of pitch changes is also taken into account. In addition, bonus points are awarded for vocal techniques such as vibrato, inflection, and shakuri (scooping, i.e. sliding smoothly up into a note from a lower pitch).

For example, as shown in FIG. 4, the pitch of the singing voice in the section of note A matched the pitch of the guide melody track (stayed within the tolerance) for the predetermined time or longer. That section therefore receives 70 points. In the section of note B, by contrast, the pitch of the singing voice differs greatly from that of the guide melody track, so the section receives a low score of 20 points. In the section of note C, vibrato was detected on top of the base score of 70 points, so the section receives a total of 90 points.
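The per-note rule in the FIG. 4 example can be sketched as below. The base scores of 70 and 20 and the 20-point vibrato bonus reproduce the figure's numbers. The half-semitone tolerance, the five-frame minimum run, and the `vibrato` flag are hypothetical. The flag is assumed to come from a separate detector that is not shown.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NoteSegment:
    target_midi: float       # guide-melody pitch for this note event (MIDI number)
    sung_midi: List[float]   # sung pitch per analysis frame within the note's span
    vibrato: bool = False    # hypothetical: supplied by a separate vibrato detector

def longest_in_tolerance_run(note, tolerance):
    """Longest run of consecutive frames within `tolerance` semitones of the target."""
    best = run = 0
    for p in note.sung_midi:
        run = run + 1 if abs(p - note.target_midi) <= tolerance else 0
        best = max(best, run)
    return best

def note_score(note, tolerance=0.5, min_frames=5,
               hit_score=70, miss_score=20, vibrato_bonus=20):
    """Base score depends on whether the pitch was held in tolerance long enough
    (assumed thresholds); detected vibrato adds a bonus, capped at 100."""
    held = longest_in_tolerance_run(note, tolerance) >= min_frames
    score = hit_score if held else miss_score
    if note.vibrato:
        score += vibrato_bonus
    return min(score, 100)
```

With notes shaped like A, B, and C in FIG. 4, this sketch returns 70, 20, and 90 respectively.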

These per-note scores are aggregated for each predetermined section, such as the A melody, B melody, and chorus, or for each elapsed predetermined time. The aggregated scores are temporarily stored in the apparatus's own RAM 12. In human-evaluation-based scoring, the aggregated scores and the singing voice (digital audio signal) are sent to the server 1 via the network I/F 14. When the song finishes, the machine scoring result for the whole song and the singing voice (digital audio signal) are also sent to the server 1. Machine scoring may be performed by the server 1 instead of by each karaoke apparatus 7, in which case only the singing voice is sent to the server 1. Converting the comparison into points, as described above, is also not essential to machine scoring. For example, information such as the amount of pitch deviation and timing deviation from the guide melody may instead be sent to the server 1 and accumulated there.
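The patent says only that note scores are "aggregated" per section. The sketch below assumes that means taking the mean per section and over the whole song. The section labels are illustrative.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_section(note_scores):
    """note_scores: list of (section_label, score) pairs in song order.

    Assumption: aggregation is the arithmetic mean.
    Returns ({section_label: mean score}, mean score over the whole song).
    """
    by_section = defaultdict(list)
    for section, score in note_scores:
        by_section[section].append(score)
    per_section = {section: mean(scores) for section, scores in by_section.items()}
    overall = mean(score for _, score in note_scores)
    return per_section, overall
```

A sum or a duration-weighted mean would fit the text equally well. The mean simply keeps section scores on the same 0 to 100 scale as note scores.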

Next, the human-evaluation-based scoring process is described. This process is performed mainly by the server 1. FIG. 5(A) is a block diagram showing the configuration of the server 1.

The server 1 is an information processing device with a CPU 51, a RAM 52, an HDD 53, and a network I/F 54. The CPU 51 loads the server 1's operating program from the HDD 53 into the RAM 52 and performs a scoring result accumulation process and a singing voice comparison process.

Prior singing voice data is stored in the HDD 53 of the server 1, which corresponds to the storage means of the present invention. FIG. 6(A) shows the structure of the prior singing voice data. Each karaoke apparatus 7 at which human-evaluation-based scoring has been started sends its singing voice (digital audio signal) and machine scoring results to the server 1. The server 1 uses the received singing voice to perform the singing voice comparison process and the scoring result accumulation process. The scoring result accumulation process is described first, and the singing voice comparison process afterwards.

The CPU 51 of the server 1 associates the received singing voice with its machine scoring result, attaches a predetermined header (data name, song number, and so on), and stores them in the HDD 53. Other information, such as the singer's name, may also be received and stored with the data. The scores given by users of the evaluator terminals 4 are stored in the prior singing voice data as the average human score. When the data is first stored in the HDD 53, however, there is no average human score yet. In this way, prior singing voice data such as that shown in FIG. 6(A) is stored in the HDD 53.
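A record along the lines of FIG. 6(A) might be modeled as follows. All field names are hypothetical. The point being illustrated is that the average human score starts out empty and is filled in only after evaluators rate the recording.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PriorSingingRecord:
    # Hypothetical field names modeled on FIG. 6(A).
    data_name: str                           # header: data name
    song_number: int                         # header: song number
    audio: List[float]                       # recorded singing voice (PCM samples)
    machine_score: float                     # machine scoring result from the karaoke apparatus
    singer_name: Optional[str] = None        # optional extra information
    avg_human_score: Optional[float] = None  # absent until evaluators have rated it
    rating_count: int = 0
    section_machine_scores: Dict[str, float] = field(default_factory=dict)

    def is_rated(self) -> bool:
        return self.avg_human_score is not None
```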

FIG. 5(B) is a block diagram showing the configuration of the evaluator terminal 4. The evaluator terminal 4 is an information processing device with a CPU 71, a RAM 72, a ROM 73, a network I/F 74, a display processing unit 75, a monitor 76, an operation unit 77, a sound system (SS) 78, and a speaker 79.

The CPU 71 loads the operating program stored in the ROM 73 into the RAM 72 and performs a singing voice playback process and a human evaluation process.

The user of the evaluator terminal 4 uses the operation unit 77 to start human evaluation. When the CPU 71 receives this instruction through the operation unit 77, it forwards the instruction to the server 1 via the network I/F 74. In response, the server 1 sends list data of the prior singing voice data to the evaluator terminal 4.

As shown in FIG. 6(B), for each item of prior singing voice data the list data contains entries such as the data name, the song title for its song number, the singer's name, the average human score, and the machine scoring result. The CPU 71 of the evaluator terminal 4 receives the list data from the server 1 and generates corresponding video data. It outputs this video data to the display processing unit 75 for display on the monitor 76. Using the operation unit 77, the user then selects the singing voice to play from the list.

When a singing voice is selected, the CPU 71 sends a playback request for it to the server 1. The CPU 51 of the server 1 reads the prior singing voice data corresponding to the request and sends it to the evaluator terminal 4.

The CPU 71 of the evaluator terminal 4 plays back the singing voice (digital audio signal) in the received prior singing voice data and outputs it to the sound system 78. The sound system 78 converts the input singing voice (digital signal) into an analog signal, amplifies it, and outputs the sound from the speaker 79. This completes the singing voice playback process.

The user listens to the singing voice being played and enters a score through the operation unit 77. The score may cover the song as a whole or be given separately for each predetermined section, such as the A melody and B melody. The user's score (the human evaluation result) is sent to the server 1.

The CPU 51 of the server 1 averages the received score with the scores accumulated previously and stores the result in the prior singing voice data as the average human score. The average human score may be stored for the song as a whole or for each predetermined section, such as the A melody and B melody. This completes the scoring result accumulation process. A singing voice that has been rated many times may also be treated as popular, and its average human score raised accordingly. Alternatively, a weighted average may be computed with a different weight for each rating user. For example, scores from users who have rated many times may be given greater weight.
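The averaging step and the optional per-evaluator weighting can be sketched as below. Weighting each evaluator by the number of ratings they have submitted follows the example in the text. The function names and data shapes are assumptions, and the popularity boost is not modeled.

```python
def update_average(current_avg, count, new_score):
    """Fold one new rating into a running mean; current_avg is None before the first rating."""
    if current_avg is None or count == 0:
        return float(new_score), 1
    return (current_avg * count + new_score) / (count + 1), count + 1

def weighted_human_score(ratings, ratings_given):
    """ratings: list of (evaluator_id, score) pairs for one recording.
    ratings_given: evaluator_id -> total ratings that evaluator has submitted.
    Assumption: that count is the weight, and unknown evaluators get weight 1."""
    num = den = 0.0
    for evaluator, score in ratings:
        w = ratings_given.get(evaluator, 1)
        num += w * score
        den += w
    return num / den if den else None
```

The running-mean form lets the server update a record without keeping every past rating. The weighted form does need the individual ratings.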

Next, the singing voice comparison process is described. This process compares the singing voice sent from a karaoke apparatus 7 with the prior singing voice data and calculates their similarity. The CPU 51 of the server 1 temporarily stores the singing voice (digital audio signal) received from the karaoke apparatus 7 in the RAM 52 and extracts its pitch. The CPU 51 then reads the prior singing voice data for the same song and compares the received singing voice with the singing voice (digital audio signal) contained in that data. The CPU 51 may read all prior singing voice data for the same song, or only a limited number of the most recent items (for example, ten).
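Choosing which prior recordings to compare against is a simple filter. The sketch below keeps only recordings of the same song and, optionally, the N most recent, with ten as the default to match the text's example. The dictionary keys `song_number` and `recorded_at` are hypothetical.

```python
from typing import Dict, List, Optional

def select_candidates(records: List[Dict], song_number: int,
                      limit: Optional[int] = 10) -> List[Dict]:
    """Return prior recordings of the same song, newest first.

    Hypothetical keys: "song_number" and "recorded_at" (any sortable timestamp).
    limit=None keeps every recording of the song.
    """
    same_song = [r for r in records if r["song_number"] == song_number]
    same_song.sort(key=lambda r: r["recorded_at"], reverse=True)
    return same_song if limit is None else same_song[:limit]
```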

The similarity is calculated in the same way as machine scoring. The pitch values extracted from the received singing voice, the timing of pitch changes, the level of the singing voice, and so on are compared with the pitch, timing, level, and so on of the singing voice in the prior singing voice data, and the result is converted into points. Unlike the guide melody, however, a singing voice has no note event data. So wherever roughly the same pitch continues for a while in the received (or prior) singing voice, that stretch is treated as one note event for the comparison.
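Because the recordings have no note events, the pitch contour has to be split into pseudo-notes first. The sketch below assumes the contour is given as one MIDI note number per frame, with None for unvoiced frames. The half-semitone deviation limit and the three-frame minimum are illustrative choices, not values from the patent.

```python
from typing import List, Optional, Tuple

def segment_pseudo_notes(pitches: List[Optional[float]], frame_sec: float,
                         max_dev: float = 0.5, min_frames: int = 3
                         ) -> List[Tuple[float, float, float]]:
    """Split a frame-wise pitch contour into stretches of roughly constant pitch.

    A segment continues while each frame stays within max_dev semitones of the
    segment's first frame; an unvoiced frame (None) ends it. Segments shorter
    than min_frames are dropped (assumed thresholds).
    Returns (onset_sec, offset_sec, mean_pitch) tuples.
    """
    segments = []
    start: Optional[int] = None
    ref: Optional[float] = None
    for i, p in enumerate(pitches + [None]):  # trailing None closes the last segment
        if p is not None and ref is not None and abs(p - ref) <= max_dev:
            continue  # still inside the current pseudo-note
        if start is not None and i - start >= min_frames:
            seg = pitches[start:i]
            segments.append((start * frame_sec, i * frame_sec, sum(seg) / len(seg)))
        # Start a new pseudo-note at a voiced frame, or reset on silence.
        start, ref = (None, None) if p is None else (i, p)
    return segments
```

Each resulting segment can then be scored against the overlapping segment of the other recording, much as a sung note is scored against a guide-melody note event.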

FIGS. 7(A) and 7(B) illustrate the concept of similarity calculation. The current singing voice (the received singing voice) shown in these figures is the same as the one shown in FIG. 4. FIG. 7(A) compares it with prior singing voice data in which the singer followed the pitch of the guide melody faithfully.

In this example, the current singer added an arrangement in the section of note B that changed the pitch. Compared with the prior singing voice data that followed the guide melody faithfully, the section of note B therefore yields a low similarity (20%).

FIG. 7(B), on the other hand, compares the current singing voice with prior singing voice data in which the singer also changed the pitch in the section of note B by adding an arrangement. Because both the current and the prior singing voice change the pitch in the section of note B, a high similarity (70%) is calculated.
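The sketch below is a deliberately simplified stand-in for the similarity measure, not the patent's method. The patent scores pseudo-notes with the machine-scoring method and does not describe how the two recordings are time-aligned. Here, two contours that are assumed to be already aligned frame by frame are compared, and the result is the percentage of jointly voiced frames whose pitches agree within a tolerance. Even so, it shows the qualitative behavior of FIG. 7: an arranged note scores low against a faithful recording and high against one with the same arrangement.

```python
from typing import List, Optional

def contour_similarity(current: List[Optional[float]], prior: List[Optional[float]],
                       tolerance: float = 0.5) -> float:
    """Percentage (0-100) of jointly voiced frames whose pitches differ by at
    most `tolerance` semitones. Assumption: contours are already time-aligned."""
    pairs = [(c, p) for c, p in zip(current, prior) if c is not None and p is not None]
    if not pairs:
        return 0.0
    matches = sum(1 for c, p in pairs if abs(c - p) <= tolerance)
    return 100.0 * matches / len(pairs)
```

For example, a current contour that raises its second half by three semitones scores 50.0 against an unaltered recording, and 100.0 against a recording with the same raised pitch.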

These similarities are aggregated for each predetermined section, such as the A melody, B melody, and chorus, or for each elapsed predetermined time. Alternatively, an average similarity over the whole song (the overall similarity) is calculated.

The CPU 51 of the server 1 extracts prior singing voice data with a high similarity, either per section or overall. It then reflects the average human score attached to the extracted data in the scoring result of the current singing voice. For example, as shown in FIG. 6(C), prior singing voice data C has the highest similarity, an average human score of 80 points, and a machine score of 60 points. The average of these two scores (70 points) is output as the scoring result of the current singing voice.

Alternatively, the average human score (80 points) assigned to prior singing voice data C may be output directly as the scoring result. Another option is to average that score (80 points) with the machine score of the current singing voice (for example, 65 points) and output the result (72.5 points).
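The three ways of using the single most similar prior singing voice data can be sketched in Python as follows. The `PriorSinging` record, its field names, and the `mode` strings are hypothetical. The 80/60/65-point figures come from the example above, while the similarity values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriorSinging:
    """One stored prior singing voice record (field names are illustrative only)."""
    name: str
    similarity: float       # similarity to the current singing voice, 0..1
    avg_human_score: float  # average of the listeners' scores
    machine_score: float    # machine score stored with the recording


def best_match(priors: list[PriorSinging]) -> PriorSinging:
    """Return the prior singing voice data with the highest similarity."""
    return max(priors, key=lambda p: p.similarity)


def score_from_best_match(priors: list[PriorSinging], current_machine_score: float,
                          mode: str = "prior_machine") -> float:
    best = best_match(priors)
    if mode == "human_only":        # use the average human score as-is
        return best.avg_human_score
    if mode == "prior_machine":     # average with the prior data's own machine score
        return (best.avg_human_score + best.machine_score) / 2
    if mode == "current_machine":   # average with the current singing's machine score
        return (best.avg_human_score + current_machine_score) / 2
    raise ValueError(f"unknown mode: {mode}")
```

As an example, take `priors = [PriorSinging("C", 0.9, 80, 60), PriorSinging("X", 0.4, 50, 90)]`, where the record "X" is a hypothetical weaker match. With a current machine score of 65, the three modes give 70.0, 80 and 72.5 respectively.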

For example, suppose that, as in FIG. 7(B), prior singing voice data sung with a pitch-changing arrangement received a high average human score. The human evaluation of that arrangement is then reflected even if the machine score is low, and the resulting score is higher than the machine score. Conversely, a performance may follow the guide melody faithfully but fail to impress listeners. That performance receives a score lower than its machine score, because the human evaluation is reflected even though the machine score is high.

Thus, the karaoke system of this embodiment may store a performance that listeners judged good (or poor) and that therefore received a high (or low) human evaluation. A singer who later sings in a similar way (for example, changing the pitch with the same arrangement) has that associated high (or low) human evaluation reflected in the scoring result. A human evaluation can therefore be presented on the spot.

The example of FIG. 6(C) uses the human evaluation of the single most similar prior singing voice data. It is preferable, however, to extract a plurality of prior singing voice data, weight each one's human evaluation according to its similarity, and reflect the weighted evaluations in the scoring result.

FIG. 8 shows application example 1 of human-evaluation-based scoring, in which a plurality (four) of prior singing voice data with high similarity are extracted.

In this example, the scores are combined by weighted addition according to the similarity between the current singing voice and each prior singing voice data. The highest weight (contribution rate) goes to the most similar prior singing voice data, so its score is reflected most strongly. The weight decreases as the similarity decreases, so the scoring result varies with the similarity.
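The text requires only that a higher similarity yields a higher contribution rate; it does not give a formula. The Python sketch below therefore assumes a hypothetical rule: each rate is the similarity divided by the sum of the similarities, so the rates sum to 1 and preserve the similarity ordering. The function name is also hypothetical.

```python
def contribution_rates(similarities: dict[str, float]) -> dict[str, float]:
    """Map similarities to contribution rates that sum to 1.

    Hypothetical proportional rule: the most similar prior singing voice data
    gets the largest rate, and the rate falls as the similarity falls.
    """
    total = sum(similarities.values())
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    return {name: s / total for name, s in similarities.items()}
```

A rank-based or softmax weighting would satisfy the same qualitative requirement. The proportional rule is used here only because it is the simplest.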

In this example, the composite score of each prior singing voice data is the average of its average human score and its machine score. Multiplying the composite score by the contribution rate gives that data's correction point. For example, prior singing voice data C has an average human score of 80 points and a machine score of 60 points, so its composite score is 70 points. Since its contribution rate is 40%, it contributes 28 correction points. Likewise, prior singing voice data B contributes 23.55 points, A contributes 16.2 points, and D contributes 7.35 points. The sum of these correction points (75.1 points) is output as the scoring result.
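The arithmetic of application example 1 is sketched in Python below. Only the inputs for prior singing voice data C (80, 60, 40%) are stated in the text. Data A, B and D appear only as finished correction points, so their inputs are not reproduced. The function names are hypothetical.

```python
def composite_score(avg_human: float, prior_machine: float, use_machine: bool = True) -> float:
    """Composite score of one prior singing voice data (application example 1).

    With use_machine=False the machine score is ignored and the average human
    score alone is weighted.
    """
    return (avg_human + prior_machine) / 2 if use_machine else avg_human


def weighted_total(pairs: list[tuple[float, float]]) -> float:
    """Sum of (composite score x contribution rate) over all extracted prior data."""
    return sum(composite * rate for composite, rate in pairs)
```

For data C, `composite_score(80, 60)` is 70.0 and `weighted_total([(70.0, 0.4)])` gives its 28-point correction point. Adding the stated correction points 28 + 23.55 + 16.2 + 7.35 gives the 75.1-point result.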

In this example as well, the machine scores may be ignored. In that case, the average human score of each extracted prior singing voice data is multiplied by its contribution rate and the products are summed. Reflecting the machine scores, however, adds objective measures such as pitch accuracy to the listeners' subjective evaluation, which allows more accurate scoring.

FIG. 9 shows application example 2 of human-evaluation-based scoring. It modifies application example 1 so that the machine score of the current singing voice is also reflected.

Here, each prior singing voice data's average human score is corrected by the machine score of the current singing voice to give the composite score:

composite score = average human score × (machine score of the current singing / machine score of the prior singing voice data)

The composite score is then multiplied by the contribution rate to obtain the correction point.

For example, for prior singing voice data C, the average human score (80 points) is multiplied by the ratio of the current singing machine score (65 points) to C's machine score (60 points). This gives a composite score of 80 × (65/60) = 86.67 points. Since the contribution rate is 40%, C contributes 34.67 correction points. Likewise, B contributes 18.3 points, A contributes 13.32 points, and D contributes 5.75 points. The sum of these correction points (72.04 points) is output as the scoring result.
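A minimal Python sketch of the composite score in application example 2 follows. It checks the figures for prior singing voice data C; as in application example 1, the inputs for A, B and D are not given in the text. The function name is hypothetical.

```python
def scaled_composite(avg_human: float, current_machine: float, prior_machine: float) -> float:
    """Average human score scaled by (current machine score / prior machine score)."""
    if prior_machine <= 0:
        raise ValueError("prior machine score must be positive")
    return avg_human * (current_machine / prior_machine)
```

`scaled_composite(80, 65, 60)` is about 86.67. Multiplying it by the 40% contribution rate gives about 34.67 correction points. Adding 34.67 + 18.3 + 13.32 + 5.75 gives the 72.04-point result.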

In this way, the human evaluation remains the main basis of the score, and fine adjustment using the machine score allows more accurate scoring.

Next, the operation of the human-evaluation-based scoring process is described with reference to the flowchart of FIG. 10. The singer instructs the start of the process with the touch panel 15, the operation unit 25, or the remote controller 9. When the karaoke performance then starts, the operation shown in FIG. 10 is performed.

First, the CPU 11 of the karaoke apparatus 7 starts the karaoke performance and starts machine scoring of the singing voice (s11). Next, the CPU 11 determines whether a predetermined section (for example, the A-melody) has ended (s12). When it has, the CPU 11 totals the machine scoring results for that section. It then transmits the section's machine scoring result and singing voice to the server 1 (s13).

Upon receiving the scoring result and the singing voice (s14), the CPU 51 of the server 1 performs the singing voice comparison process. It extracts prior singing voice data with high similarity in that section (s15). It then reflects the average human score assigned to the extracted data in the scoring result of the current singing voice and outputs it (s16). The scoring result is transmitted to the karaoke apparatus 7 (s17).

Similar prior singing voice data may be missing, or only prior singing voice data with low similarity (for example, 50% or less) may exist. In that case the server may instead transmit information indicating "awaiting evaluation" to the karaoke apparatus 7. The karaoke apparatus 7 then displays a message such as "Awaiting evaluation" on the monitor 24 and shows only the machine scoring result.
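The server-side decision in s15 to s17, including the "awaiting evaluation" fallback, might look like the Python sketch below. The 50% threshold is the example value from the text. Several details are assumptions: the reply dictionary, its keys, and returning the best match's human score as-is, which is one of the options described earlier.

```python
def server_response(priors: list[tuple[float, float]],
                    current_machine_score: float,
                    threshold: float = 0.5) -> dict:
    """Build the server's reply for one section (sketch of s15 to s17).

    priors: (similarity, average human score) pairs for stored prior singing voice data.
    Priors with similarity <= threshold are treated as not similar.
    """
    usable = [p for p in priors if p[0] > threshold]
    if not usable:
        # No sufficiently similar prior data: the karaoke apparatus should show
        # "awaiting evaluation" together with the machine score only.
        return {"status": "awaiting_evaluation", "machine_score": current_machine_score}
    _, human_score = max(usable, key=lambda p: p[0])
    return {"status": "scored", "score": human_score, "machine_score": current_machine_score}
```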

The singing voice and the machine scoring result transmitted from the karaoke apparatus 7 to the server 1 are accumulated in the HDD 53 of the server 1 as prior singing voice data. They then become singing voices subject to human evaluation by users.

The CPU 11 of the karaoke apparatus 7 receives the scoring result from the server 1 (s18) and displays it on the monitor 24 (s19). Preferably, the display shows both the scoring result for each section and the overall scoring result averaged over the sections sung so far.

Finally, the CPU 11 determines whether the performance of the song has ended (s20) and repeats the processing from s12 onward until it has. When the performance ends, the scoring result for the entire song may be displayed.
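The per-section loop of s12 to s20 can be sketched in Python as follows. The `score_section` callable is a hypothetical stand-in for the round trip to the server 1 (s13 to s18). The running overall score is the plain average of the section scores so far, matching the preferred display described in s19.

```python
from typing import Callable, Iterable


def run_song(sections: Iterable[tuple[str, float]],
             score_section: Callable[[str, float], float]) -> list[tuple[str, float, float]]:
    """Score each finished section and keep a running overall average.

    sections: (section name, machine score for that section) in singing order.
    Returns (section name, section score, overall score so far) for each section.
    """
    results = []
    total = 0.0
    for count, (name, machine_score) in enumerate(sections, start=1):
        score = score_section(name, machine_score)  # stands in for s13 to s18
        total += score
        results.append((name, score, total / count))  # values shown in s19
    return results
```

In practice, `score_section` would send the section's machine score and audio to the server and return the human-evaluation-based score it replies with.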

In this embodiment, the singer sings with the karaoke apparatus 7, and the singing voice is transmitted to the server 1 for human-evaluation-based scoring. Alternatively, a user may sing with an information processing apparatus of their own, such as a PC or smartphone. The server 1 or that information processing apparatus may then implement the singing voice evaluation device of the present invention. In this case, the user's PC or smartphone transmits the singing voice to the server 1, which carries out the human-evaluation-based scoring process.

In this case, the user instructs the start of the human-evaluation-based scoring process on their own information processing apparatus, which then performs the karaoke performance. The user's singing voice is transmitted to the server 1 (the CPU 51 of the server 1 thereby implements the singing voice input means of the present invention). The CPU 51 of the server 1 then performs the human-evaluation-based scoring process (the processing from s15 onward in the flowchart shown in FIG. 5), thereby implementing the scoring means of the present invention. In this way, the user's own information processing apparatus (or the karaoke apparatus 7) transmits the singing voice to the server 1, and the server 1 implements the singing voice evaluation device of the present invention. Alternatively, the user's information processing apparatus (or the karaoke apparatus 7) may download the prior singing voice data from the server 1 and perform the singing voice comparison process itself.

In this embodiment, the singing voice (a digital audio signal) is transmitted to the server 1 and accumulated as prior singing voice data. Alternatively, video data containing the singing voice (for example, footage of a singer dancing while singing) may be transmitted to the server 1 and accumulated as prior singing voice data.

DESCRIPTION OF SYMBOLS
1 ... Server
2 ... Network
3 ... Karaoke store
4 ... Evaluator terminal
5 ... Relay device
7 ... Karaoke apparatus
9 ... Remote controller
11 ... CPU
12 ... RAM
13 ... HDD
14 ... Network I/F
15 ... Touch panel
16 ... Microphone
17 ... A/D converter
18 ... Sound source
19 ... Mixer
20 ... Sound system
21 ... Speaker
22 ... Decoder
23 ... Display processing unit
24 ... Monitor
25 ... Operation unit
26 ... Transmission/reception unit

Claims

A singing voice evaluation device comprising:
storage means for storing in advance, as prior singing voice data, singing voices together with human evaluations of those singing voices;
singing voice input means for inputting a singing voice; and
scoring means for scoring the current singing voice input by the singing voice input means;
wherein the scoring means compares the current singing voice with the prior singing voice data, extracts prior singing voice data similar to the current singing voice, and outputs a scoring result that includes the human evaluation assigned to the extracted prior singing voice data,
the prior singing voice data includes a machine scoring result, and
the scoring means performs current-singing machine scoring, which compares the current singing voice with a reference singing voice, and outputs a scoring result that includes both the result of the current-singing machine scoring and the machine scoring result contained in the prior singing voice data.

The singing voice evaluation device according to claim 1, wherein the scoring means extracts a plurality of prior singing voice data similar to the current singing voice,
weights each human evaluation according to the similarity of the corresponding extracted prior singing voice data, and outputs a scoring result that includes the weighted human evaluations.

The singing voice evaluation device according to claim 1 or claim 2, wherein the scoring means scores the current singing voice for each predetermined section and outputs the scoring result.

A singing voice evaluation system comprising a server and a singer's terminal, wherein
the storage means according to any one of claims 1 to 3 is provided in either the singer's terminal or the server, the scoring means according to any one of claims 1 to 3 is provided in either the singer's terminal or the server, and the singing voice input means according to any one of claims 1 to 3 is provided in the singer's terminal.