JP6944390B2

JP6944390B2 - Karaoke equipment

Info

Publication number: JP6944390B2
Application number: JP2018014549A
Authority: JP
Inventors: 勇太岡田
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2018-01-31
Filing date: 2018-01-31
Publication date: 2021-10-06
Anticipated expiration: 2038-01-31
Also published as: JP2019132978A

Description

The present invention relates to a karaoke device.

A karaoke device lets users enjoy karaoke singing of a wide variety of songs. These include foreign-language songs that are sung in languages such as English or Chinese, so their lyric telops are displayed in the foreign language. For such songs, ruby in the user's native language (for example, kana) is displayed together with the foreign-language lyric telop, so that even users who are not accustomed to the foreign pronunciation can enjoy singing them. On the other hand, some users whose native language is that foreign language find that the displayed ruby makes karaoke singing harder.

For this reason, a karaoke device allows ruby to be shown or hidden via a remote control device or the like. In addition, Patent Document 1 discloses a technique for displaying or erasing the ruby of a lyric telop according to how closely the user's pronunciation matches.

Japanese Unexamined Patent Publication No. 9-244667

Incidentally, when pronouncing a given foreign language, users who share a native language tend to find the same words easy to pronounce and the same words difficult to pronounce. Because a lyric telop contains multiple words, easy and difficult words are likely to be mixed within it.

In such a case, turning the ruby display on and off word by word via the remote control device while singing karaoke is cumbersome. The technique disclosed in Patent Document 1 switches the ruby display in the next singing section according to the result of judging the degree of pronunciation matching in a given singing section. Consequently, if the words in a given singing section were easy to pronounce, ruby is not displayed in the next singing section. If that next section contains a word that is difficult to pronounce, however, the user cannot refer to the ruby, and karaoke singing becomes difficult.

An object of the present invention is to provide a karaoke device capable of switching the ruby display mode for each word according to the user's native language when a foreign-language song is sung in karaoke.

The main invention for achieving the above object is a karaoke device having: an evaluation acquisition unit that evaluates a user's singing voice signal based on reference information for evaluating pronunciation when a foreign-language song is sung in karaoke, and acquires pronunciation evaluation data indicating a pronunciation evaluation result for each word included in the lyrics of the song; a statistical calculation unit that calculates, based on the pronunciation evaluation data of a plurality of users who share a native language, pronunciation evaluation statistical data indicating a statistical pronunciation evaluation result for each word; and a display control unit that, when the user sings a foreign-language song in karaoke, displays a foreign-language lyric telop based on lyric telop data and, for each word included in the lyric telop, displays native-language ruby based on ruby data in a display mode according to the pronunciation evaluation statistical data corresponding to the user's native language.
Other features of the present invention will become clear from the description and drawings below.

According to the present invention, when a foreign-language song is sung in karaoke, the ruby display mode can be switched for each word according to the user's native language.

FIG. 1 is a diagram showing the configuration of the karaoke device according to the embodiment.
FIG. 2 is a diagram showing an example of the pronunciation evaluation data according to the embodiment.
FIG. 3 is a diagram showing an example of the pronunciation evaluation statistical data according to the embodiment.
FIG. 4 is a diagram showing the relationship between evaluation results and ruby display modes according to the embodiment.
FIG. 5 is a diagram showing a display example of a lyric telop and ruby according to the embodiment.
FIG. 6A is a flowchart showing processing of the karaoke device according to the embodiment.
FIG. 6B is a flowchart showing processing of the karaoke device according to the embodiment.

<Embodiment>
The karaoke device 1 according to the present embodiment will be described with reference to FIGS. 1 to 6B.

== Karaoke device ==
The karaoke device 1 is a device for performing the karaoke accompaniment of a song selected by the user and for the user's karaoke singing. As shown in FIG. 1, the karaoke device 1 includes a karaoke main body 10, a speaker 20, a display device 30, a microphone 40, and a remote control device 50.

The speaker 20 emits sound based on a sound emission signal from the karaoke main body 10. The display device 30 displays video and images on its screen based on signals from the karaoke main body 10. The microphone 40 converts the user's singing voice (the voice input to the microphone 40) into an analog singing voice signal and inputs it to the karaoke main body 10.

(Karaoke main body hardware)
As shown in FIG. 1, the karaoke main body 10 includes a control unit 11, a communication unit 12, a storage unit 13, a sound processing unit 14, a display processing unit 15, and an operation unit 16. Each component is connected to a bus B via an interface (not shown).

The karaoke main body 10 performs various kinds of control related to karaoke singing, such as controlling the karaoke performance of the selected song, controlling the display of lyrics and background images, and processing the singing voice signal input through the microphone 40.

The control unit 11 includes a CPU and a memory (neither shown). The CPU realizes various control functions by executing an operation program stored in the memory. The memory is a storage device that stores the programs executed by the CPU and temporarily stores various information while a program runs.

The communication unit 12 provides an interface for connecting the karaoke main body 10 to a communication line via a router (not shown).

The storage unit 13 is a large-capacity storage device that stores various data, for example a hard disk drive. The storage unit 13 stores a plurality of song data items used by the karaoke device 1 for karaoke performance.

Each song data item is given identification information (a song ID) that identifies the individual song. Song data includes accompaniment data, reference data, background image data, lyrics data, and attribute information. The accompaniment data is MIDI-format data from which the karaoke performance sound is generated. The reference data is used as the standard when scoring the user's karaoke singing, and includes pitch data, note length data, timing data, and the like. The background image data corresponds to the background image displayed on the display device 30 or the like during the karaoke performance. The lyrics data relates to the lyrics (lyric telop) displayed on the display device 30 or the like. The attribute information is information about the song, such as the song title, singer name, lyricist and composer names, and genre. The attribute information in the present embodiment also includes language information indicating the language of the lyrics. For example, song data with Japanese lyrics includes the language information "Japanese" as attribute information.

Some songs have lyrics in a foreign language. A foreign language is any language other than the user's native language. For example, for a user whose native language is Japanese, a song that must be sung in English or Chinese (a song whose lyrics are in English or Chinese) is a foreign-language song. Likewise, for a user whose native language is English, a song that must be sung in Japanese (a song whose lyrics are in Japanese) is a foreign-language song.

In the present embodiment, the song data of a foreign-language song includes pronunciation reference data and ruby data in addition to the accompaniment data and so on. The lyrics data of a foreign-language song is data for displaying the lyric telop in the foreign language. For example, the lyrics data of an English song consists of a plurality of English words.

The pronunciation reference data is data for evaluating pronunciation when a foreign-language song is sung in karaoke. The pronunciation reference data is an example of the "reference information".

Specifically, the pronunciation reference data indicates the correct pronunciation of each word included in the lyrics of the song. For example, it is data converted from speech pronounced by a person whose native language is the foreign language in question (hereinafter sometimes referred to as a "native speaker").

The reference data described above is used as the standard for an overall singing evaluation covering pitch, rhythm, and so on. The pronunciation reference data, by contrast, is used as the standard for evaluating how closely the pronunciation of a word uttered during karaoke singing approximates a native speaker's pronunciation.

The ruby data is character data for attaching native-language ruby to the lyric telop. For example, ruby data is used to attach Japanese (native-language) ruby to an English (foreign-language) lyric telop.
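The song data layout described above can be pictured as a record type. The following is a minimal Python sketch; the class names, field names, field types, and the sample values (`SongData`, `SongAttributes`, `pronunciation_reference`, `ruby`, the song "X") are assumptions made for illustration and are not structures defined in this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SongAttributes:
    title: str
    singer: str
    writers: str
    genre: str
    language: str  # language of the lyrics, e.g. "Japanese" or "English"


@dataclass
class SongData:
    song_id: str                 # identification information (song ID)
    accompaniment: bytes         # MIDI-format accompaniment data
    reference: Dict[str, list]   # pitch, note length and timing data for scoring
    background_image: bytes
    lyrics: List[str]            # words shown as the lyric telop
    attributes: SongAttributes
    # Assumption: only foreign-language songs carry the two fields below,
    # so they are modelled as optional.
    pronunciation_reference: Optional[Dict[str, bytes]] = None  # word -> reference
    ruby: Optional[Dict[str, str]] = None                       # word -> ruby text


# Hypothetical English song with Japanese ruby for one word.
song_x = SongData(
    song_id="X",
    accompaniment=b"",
    reference={"pitch": [], "length": [], "timing": []},
    background_image=b"",
    lyrics=["love"],
    attributes=SongAttributes("X", "singer", "writers", "pop", "English"),
    pronunciation_reference={"love": b""},
    ruby={"love": "ラブ"},
)
```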

Under the control of the control unit 11, the sound processing unit 14 controls the performance of the song and processes the singing voice signal input through the microphone 40. The sound processing unit 14 includes, for example, a MIDI sound source, a mixer, and an amplifier (none shown). The control unit 11 sequentially reads the accompaniment data of the reserved song based on a tempo clock signal and inputs it to the MIDI sound source. The MIDI sound source generates a musical tone signal based on the accompaniment data. The mixer mixes the musical tone signal and the singing voice signal output from the microphone 40 at an appropriate ratio and outputs the result to the amplifier. The amplifier amplifies the mixed signal from the mixer and outputs it to the speaker 20 as a sound emission signal. As a result, the speaker 20 emits the karaoke performance sound based on the sound emission signal together with the singing voice from the microphone 40.

Under the control of the control unit 11, the display processing unit 15 performs processing related to the various displays on the display device 30. For example, the display processing unit 15 causes the display device 30 to display video in which lyrics and various icons are superimposed on the background image during a karaoke performance.

The operation unit 16 consists of panel switches, a remote control receiving circuit, and the like. In response to the user's operation of the panel switches of the karaoke device 1 or of the remote control device 50, it outputs operation signals such as a song selection signal or a performance stop signal to the control unit 11. The control unit 11 detects the operation signal from the operation unit 16 and executes the corresponding processing.

The remote control device 50 is a device for performing various operations on the karaoke main body 10. Using the remote control device 50, the user can, for example, select (reserve) the karaoke song they wish to sing.

(Karaoke main body software)
As shown in FIG. 1, the karaoke main body 10 includes an evaluation acquisition unit 100, a statistical calculation unit 200, and a display control unit 300. These units are realized by the CPU executing a program stored in the memory.

[Evaluation acquisition unit]
The evaluation acquisition unit 100 evaluates the user's singing voice signal based on the reference information for evaluating pronunciation when a foreign-language song is sung in karaoke, and acquires pronunciation evaluation data indicating the pronunciation evaluation result for each word included in the lyrics of the song.

The evaluation result is obtained by evaluating how closely the user's pronunciation of a word approximated a native speaker's pronunciation. The evaluation result can be set, for example, as a pronunciation level on a five-step scale from 1 to 5, where a larger number indicates pronunciation closer to that of a native speaker.

As a specific example, suppose that user A, whose native language is Japanese, sings the English (foreign-language) song X in karaoke. In this case, the evaluation acquisition unit 100 reads the pronunciation reference data of song X from the storage unit 13, compares it with the singing voice signal obtained from user A's karaoke singing, and evaluates the pronunciation word by word. For example, when the feature pattern of a word obtained by analyzing the singing voice signal is close to the pronunciation reference data for that word, the evaluation acquisition unit 100 judges that the word was pronounced close to a native speaker's pronunciation, and sets an evaluation result with a high pronunciation level for that word (the value "5" in the example above). By setting a pronunciation level for every word included in song X, the evaluation acquisition unit 100 acquires the pronunciation evaluation data for user A, and stores it in the storage unit 13. Similarly, when other users sing foreign-language karaoke, the evaluation acquisition unit 100 stores the pronunciation evaluation data acquired for each user in the storage unit 13. FIG. 2 shows an example of the pronunciation evaluation data stored in the storage unit 13 for users A to C (all of whom have Japanese as their native language). Here, a pronunciation level from "1" to "5" is set as the evaluation result for each word.
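The per-word evaluation can be illustrated as follows. This is only a sketch under stated assumptions: the description does not say how closeness to the pronunciation reference data is measured, so the code assumes a hypothetical similarity score between 0.0 and 1.0 is already available for each word, and the equal-width bucketing into levels 1 to 5 is likewise an assumption. The function names are illustrative.

```python
from typing import Dict


def pronunciation_level(similarity: float) -> int:
    """Map a hypothetical similarity score in [0.0, 1.0] to a level from 1 to 5.

    Assumption: five equal-width buckets; 1.0 is folded into the top level.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0.0 and 1.0")
    return min(5, int(similarity * 5) + 1)


def evaluate_song(similarities: Dict[str, float]) -> Dict[str, int]:
    """Build pronunciation evaluation data (word -> level) for one user."""
    return {word: pronunciation_level(s) for word, s in similarities.items()}
```

For instance, `evaluate_song({"love": 1.0, "of": 0.5})` assigns level 5 to "love" and level 3 to "of" under these assumed buckets.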

To acquire a user's pronunciation evaluation data, the evaluation acquisition unit 100 needs to identify that user's native language. It can do so by referring to user information registered in advance for each user. Also, the display control unit 300 may cause the remote control device 50 to display a native-language selection screen when a song is reserved; the user selects their own native language on that screen, and the evaluation acquisition unit 100 identifies the user's native language from that input. Alternatively, the evaluation acquisition unit 100 may identify the user's native language from the display language currently used on the screen of the remote control device 50.

After identifying the user's native language, the evaluation acquisition unit 100 determines whether pronunciation evaluation data needs to be acquired according to whether the language of the reserved song matches the identified native language.
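The native-language identification and the acquisition decision can be sketched as below. The function names and the order in which the three sources are tried are assumptions; the description lists the sources as alternatives without ranking them. The sketch also assumes that evaluation is needed exactly when the song language differs from the native language, which is how a foreign-language song is characterized earlier.

```python
from typing import Optional


def identify_native_language(
    registered: Optional[str],
    selected_on_remote: Optional[str],
    remote_display_language: Optional[str],
) -> Optional[str]:
    """Return the first available source of the user's native language.

    Assumed priority: registered user information, then the selection made
    on the remote control screen, then the remote control display language.
    """
    for candidate in (registered, selected_on_remote, remote_display_language):
        if candidate:
            return candidate
    return None


def needs_pronunciation_evaluation(song_language: str, native_language: str) -> bool:
    """Assumption: evaluate only when the song is foreign to the user."""
    return song_language != native_language
```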

Pronunciation evaluation data is acquired separately for each foreign language. For example, when user A sings both an English song and a Chinese song in karaoke, the storage unit 13 stores user A's pronunciation evaluation data for English and for Chinese separately. The acquired pronunciation evaluation data may also be stored on a server (not shown) together with the singing history.

[Statistical calculation unit]
The statistical calculation unit 200 calculates pronunciation evaluation statistical data, which indicates a statistical pronunciation evaluation result for each word, based on the pronunciation evaluation data of a plurality of users who share a native language.

Specifically, the statistical calculation unit 200 reads the pronunciation evaluation data of users who share a native language from the storage unit 13, and obtains a statistic of the evaluation results for each word included in that data. The statistic can be any common statistical value, such as the mean, standard score (deviation value), variance, or standard deviation.

For example, suppose that users A to E have each sung the English song X in karaoke. The native language of users A to C is Japanese, and the native language of users D and E is Chinese. The storage unit 13 stores the pronunciation evaluation data of each user acquired from their karaoke singing of song X.

In this case, the statistical calculation unit 200 reads the English pronunciation evaluation data of users A to C from the storage unit 13 and evaluates it word by word. For example, the evaluation results for the English word "love" shown in FIG. 2 are "4" for user A, "3" for user B, and "5" for user C. The statistical calculation unit 200 calculates their mean, "4", as the statistical evaluation result for the English word "love".

By performing the same processing for every word included in song X, the statistical calculation unit 200 calculates the pronunciation evaluation statistical data of users A to C (data for users who share a native language), and stores it in the storage unit 13. FIG. 3 shows the pronunciation evaluation statistical data based on the pronunciation evaluation data of users A to C.
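The per-word averaging can be sketched in a few lines of Python. The function name and the dictionary layout are assumptions for illustration; the mean is used because it is the statistic in the "love" example, and only the values for "love" given in the text are reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def compute_statistics(evaluations: List[Dict[str, int]]) -> Dict[str, dict]:
    """Average each word's level over users who share a native language.

    `evaluations` holds one word -> level mapping per user. The result maps
    each word to its mean level and the number of results that went into it.
    """
    levels = defaultdict(list)
    for user_evaluation in evaluations:
        for word, level in user_evaluation.items():
            levels[word].append(level)
    return {
        word: {"mean": mean(values), "count": len(values)}
        for word, values in levels.items()
    }


# Levels for "love" from users A, B and C, as given in the text.
japanese_stats = compute_statistics([{"love": 4}, {"love": 3}, {"love": 5}])
```

Here `japanese_stats["love"]` has a mean of 4 over 3 results, matching the example.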

Information identifying the native language is associated with the pronunciation evaluation statistical data. In addition, each word in the pronunciation evaluation statistical data is associated with a sample count (the number of evaluation results used in the statistic). In the pronunciation evaluation statistical data shown in FIG. 3, "Japanese" is associated as the native language, and each word ("love", "midnight", "of", "promises", "weak") is associated with the sample count "3".

For words in songs that are sung often, such as a foreign-language hit of a particular period, the sample count can grow rapidly. As the sample count grows, the pronunciation evaluation statistical data changes less and less. It is therefore preferable for the statistical calculation unit 200 to calculate the pronunciation evaluation statistical data from a predetermined number of pronunciation evaluation data items. Furthermore, as the sample count grows, the pronunciation skill of users as a whole may improve. It is therefore preferable for the statistical calculation unit 200 to calculate the pronunciation evaluation statistical data from relatively recently acquired pronunciation evaluation data (for example, the most recent 100 evaluations).
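One way to restrict the statistic to recent evaluations is a fixed-size window per word. The sketch below is an assumption about how that could be done, using a bounded `collections.deque`; the class name is hypothetical and the default window of 100 simply mirrors the example figure above.

```python
from collections import deque


class RecentWordStatistic:
    """Keeps only the most recent `window` levels for a single word."""

    def __init__(self, window: int = 100) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._levels = deque(maxlen=window)

    def add(self, level: int) -> None:
        self._levels.append(level)

    @property
    def count(self) -> int:
        return len(self._levels)

    def mean(self) -> float:
        if not self._levels:
            raise ValueError("no evaluation results recorded yet")
        return sum(self._levels) / len(self._levels)
```

With `window=3`, adding the levels 1, 5, 5, 5 leaves only the last three, so the mean is 5.0 and the old low score no longer influences the result.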

In the example above, the pronunciation evaluation statistical data was calculated from pronunciation evaluation data for the words in the lyrics of a single song X, but this is not a limitation. For example, suppose user A sings the English song X, user B sings the English song Y, and user C sings the English song Z, and that the lyrics of all three songs contain the English word "love". In this case, the statistical calculation unit 200 can calculate the statistical evaluation result from the pronunciation levels of the word "love" in the respective songs. Moreover, when the songs differ, it is likely that some words appear in only one of them. In that case, the statistical calculation unit 200 may adopt the pronunciation level of such a word as the evaluation result for users sharing the native language when obtaining the pronunciation evaluation statistical data. Alternatively, it may exclude the evaluation result of such a word when obtaining the pronunciation evaluation statistical data.

It is preferable for the statistical calculation unit 200 to recalculate the pronunciation evaluation statistical data already stored in the storage unit 13 each time a user sharing the native language sings foreign-language karaoke (that is, each time pronunciation evaluation data is acquired). On each such occasion, pronunciation evaluation results for new words can also be added to the pronunciation evaluation statistical data.

Specifically, when a word already included in the pronunciation evaluation statistical data is sung again, the statistical calculation unit 200 updates the pronunciation evaluation statistical data by recalculating the statistical evaluation result with the new pronunciation evaluation result for that word included. When a new word not yet included in the pronunciation evaluation statistical data is sung, it adds the pronunciation evaluation result for that new word to the pronunciation evaluation statistical data.

For example, suppose the pronunciation evaluation statistical data shown in FIG. 3 is stored in the storage unit 13. If user F, whose native language is Japanese, then sings the English song X in karaoke, the evaluation acquisition unit 100 evaluates user F's singing voice signal and acquires pronunciation evaluation data.

The statistical calculation unit 200 reads the pronunciation evaluation statistical data shown in FIG. 3 from the storage unit 13 and recalculates the statistical evaluation results with user F's pronunciation evaluation data included. For example, suppose user F's evaluation result for the English word "love" is "3". In this case, the statistical calculation unit 200 obtains a new statistical evaluation result from the mean "4" (sample count 3) of the evaluation results of users A to C for "love" stored in the storage unit 13 and user F's evaluation result "3". In this example, it calculates ((statistical evaluation result "4" × sample count "3") + user F's evaluation result "3") / new sample count "4" = "3.75" as the new statistical evaluation result. The new sample count "4" is then associated with the English word "love".

The statistical calculation unit 200 updates the pronunciation evaluation statistical data by recalculating the statistical evaluation result and updating the sample count for every word included in song X.

Now suppose instead that user F sings the English song Y in karaoke. If song Y contains the English word "death", which is not included in song X, the evaluation acquisition unit 100 evaluates the pronunciation of the new word "death" based on the singing voice signal obtained from user F's karaoke singing and the pronunciation reference data of song Y, and acquires the evaluation result as a pronunciation level value. The statistical calculation unit 200 then adds the acquired evaluation result to the pronunciation evaluation statistical data for users whose native language is Japanese.
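Both update cases, recalculating an existing word and adding a new one, can be sketched with a running mean. The function name and dictionary layout are assumptions for illustration; the "love" figures follow the worked example above, while the level used for "death" is a made-up value, since the text does not give one.

```python
from typing import Dict


def merge_evaluation(stats: Dict[str, dict], user_evaluation: Dict[str, int]) -> Dict[str, dict]:
    """Fold one user's word -> level results into the stored statistics."""
    for word, level in user_evaluation.items():
        entry = stats.get(word)
        if entry is None:
            # New word: its first result becomes the statistic, sample count 1.
            stats[word] = {"mean": float(level), "count": 1}
        else:
            # Existing word: (old mean x old count + new level) / new count.
            count = entry["count"]
            entry["mean"] = (entry["mean"] * count + level) / (count + 1)
            entry["count"] = count + 1
    return stats


stats = {"love": {"mean": 4.0, "count": 3}}  # users A to C
merge_evaluation(stats, {"love": 3})         # user F sings song X
merge_evaluation(stats, {"death": 2})        # user F sings song Y (level 2 is hypothetical)
```

After these calls, "love" has a mean of 3.75 over 4 results, and "death" has been added with a sample count of 1.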

［表示制御部］ [Display control unit]
表示制御部３００は、表示処理部１５を制御し、カラオケ装置１おける各種表示制御を行う。 The display control unit 300 controls the display processing unit 15 and performs various display controls on the karaoke device 1.

本実施形態において、表示制御部３００は、利用者が外国語の楽曲をカラオケ歌唱する際、歌詞テロップデータに基づいて外国語の歌詞テロップを表示させ、且つ当該歌詞テロップに含まれる単語毎に、当該利用者の母国語に対応する発音評価統計データに応じた表示態様でルビデータに基づく母国語のルビを表示する。 In the present embodiment, when a user sings a foreign-language song in karaoke, the display control unit 300 displays the foreign-language lyrics telop based on the lyrics telop data and, for each word included in the lyrics telop, displays native-language ruby based on the ruby data in a display mode according to the pronunciation evaluation statistical data corresponding to the user's native language.

たとえば、日本語を母国語とする利用者Ａが英語の楽曲Ｚをカラオケ歌唱するとする。また、日本語を母国語とする利用者の発音評価統計データとして図３に示すデータが記憶部１３に記憶されているとする。 For example, suppose user A whose mother tongue is Japanese sings English song Z in karaoke. Further, it is assumed that the data shown in FIG. 3 is stored in the storage unit 13 as pronunciation evaluation statistical data of a user whose mother tongue is Japanese.

表示制御部３００は、記憶部１３から楽曲Ｚの歌詞テロップデータを読み出し、楽曲Ｚのカラオケ演奏に合わせて、表示装置３０に英語の歌詞テロップを表示させる。また、表示制御部３００は、記憶部１３から楽曲Ｚのルビデータを読み出し、歌詞テロップの表示と合わせてルビを表示させる。 The display control unit 300 reads the lyrics telop data of the music Z from the storage unit 13, and causes the display device 30 to display the English lyrics telop in accordance with the karaoke performance of the music Z. Further, the display control unit 300 reads the ruby data of the music Z from the storage unit 13 and displays the ruby together with the display of the lyrics telop.

この際、表示制御部３００は、歌詞テロップに含まれる単語が、図３に示す発音評価統計データに含まれているかどうかを確認する。歌詞テロップに含まれる単語が発音評価統計データに含まれている場合、表示制御部３００は、当該単語の評価結果に応じた表示態様でルビを表示させる。 At this time, the display control unit 300 confirms whether or not the word included in the lyrics telop is included in the pronunciation evaluation statistical data shown in FIG. When the word included in the lyrics telop is included in the pronunciation evaluation statistical data, the display control unit 300 displays ruby in a display mode according to the evaluation result of the word.

評価結果とルビの表示態様の関係は予め設定されている。図４は、評価結果としての発音レベルの値とルビの表示態様の関係を規定したテーブルデータである。図４においては、発音レベルの値が高くなればなるほど（ネイティブに近似した発音をすればするほど）、ルビの表示サイズが小さくなる（発音レベルが最大値の場合、ルビを表示させない）よう設定されている。このようなテーブルデータは、たとえば記憶部１３に記憶されている。なお、テーブルデータを用いる代わりに、所定の変換式に基づいて関数的にルビの表示サイズを決定してもよい。 The relationship between the evaluation result and the display mode of ruby is preset. FIG. 4 is table data that defines the relationship between the pronunciation level value as the evaluation result and the display mode of ruby. In FIG. 4, the higher the pronunciation level value (the more natively approximated pronunciation is), the smaller the ruby display size (when the pronunciation level is the maximum value, the ruby is not displayed) is set. Has been done. Such table data is stored in, for example, the storage unit 13. Instead of using the table data, the display size of ruby may be functionally determined based on a predetermined conversion formula.

ここで、楽曲Ｚの歌詞テロップに単語「ｐｒｏｍｉｓｅｓ」が含まれているとする。この場合、表示制御部３００は、図３に示す発音評価統計データから単語「ｐｒｏｍｉｓｅｓ」の発音レベルの値「１．７」を特定し、図４に示すテーブルデータを参照して発音レベルの値「１．７」に対応する表示態様を決定する。そして、表示制御部３００は、単語「ｐｒｏｍｉｓｅｓ」のルビ「プロミスィズ」を通常サイズの１．５倍のサイズで表示させる（図５参照）。なお、図５の例では、単語「Ｄｏ」は発音レベルの値が「５」であるため表示されず、単語「ｂｅｌｉｅｖｅ」及び「ｆａｌｓｅ」は発音レベルの値が「３」であるため通常サイズ（１．０倍）で表示され、単語「ｎｏｔ」は発音レベルの値が「４」であるため、通常サイズよりも少し小さいサイズ（０．８倍）で表示されている。 Here, suppose the lyrics telop of song Z contains the word "promises". In this case, the display control unit 300 identifies the pronunciation level value "1.7" of the word "promises" from the pronunciation evaluation statistical data shown in FIG. 3, and determines the display mode corresponding to the value "1.7" by referring to the table data shown in FIG. 4. The display control unit 300 then displays the ruby "プロミスィズ" for the word "promises" at 1.5 times the normal size (see FIG. 5). In the example of FIG. 5, the ruby for the word "Do" is not displayed because its pronunciation level value is "5"; the ruby for the words "believe" and "false" is displayed at normal size (1.0 times) because their value is "3"; and the ruby for the word "not" is displayed slightly smaller than normal (0.8 times) because its value is "4".

一方、表示制御部３００は、発音評価統計データに含まれていない新たな単語を歌詞テロップとして表示する場合、所定の表示態様でルビデータに基づくルビを表示する。 On the other hand, when displaying a new word not included in the pronunciation evaluation statistical data as part of the lyrics telop, the display control unit 300 displays ruby based on the ruby data in a predetermined display mode.

所定の表示態様は、予めルビデータにおいて設定されている。たとえば、ルビの表示サイズを変更する場合、所定の表示態様として通常サイズ（１．０倍）が設定される。 The predetermined display mode is set in advance in the ruby data. For example, when changing the display size of ruby, a normal size (1.0 times) is set as a predetermined display mode.
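
The lookup described in this part can be sketched as follows. This is an illustrative sketch only: the table layout, the threshold boundaries, and the names `SIZE_TABLE`, `DEFAULT_SCALE`, and `ruby_scale` are assumptions. The text gives only the results for levels 5, 4, 3, and 1.7 and the 1.0x default for words without statistics; the actual contents of FIG. 4 are not reproduced here.

```python
from typing import Optional

# (minimum level, scale) rows, highest level first. A scale of None means
# the ruby is not displayed. The cut-off points are assumed; the text only
# states: 5 -> hidden, 4 -> 0.8x, 3 -> 1.0x, 1.7 -> 1.5x.
SIZE_TABLE: list[tuple[float, Optional[float]]] = [
    (5.0, None),
    (4.0, 0.8),
    (3.0, 1.0),
    (0.0, 1.5),
]

# Predetermined display mode for words that have no statistics yet.
DEFAULT_SCALE = 1.0


def ruby_scale(word: str, levels: dict[str, float]) -> Optional[float]:
    """Return the ruby size multiplier for a word, or None to hide the ruby."""
    level = levels.get(word)
    if level is None:
        return DEFAULT_SCALE  # word not in the statistical data
    for min_level, scale in SIZE_TABLE:
        if level >= min_level:
            return scale
    return DEFAULT_SCALE  # level below every row (not expected)


# Pronunciation levels quoted in the text for the FIG. 5 example.
levels = {"Do": 5.0, "not": 4.0, "believe": 3.0, "false": 3.0, "promises": 1.7}
for w in ["Do", "not", "believe", "false", "promises", "death"]:
    print(w, ruby_scale(w, levels))
```

Keeping "hidden" (`None`) distinct from the 1.0x default matters here: a word with no statistics still gets ruby, while a word with the maximum level gets none.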

なお、表示態様の変更は、ルビの表示サイズの変更に限られない。たとえば、ルビの表示色や輝度を変更することでもよいし、ルビの表示時間や表示タイミングを変更する（発音レベルの値が低い単語は早めに表示する等）ことでもよい。或いは、ルビの字体を変更したり、ルビに下線を付したりすることでもよい。評価結果（発音レベル）とルビの表示態様（表示色、輝度、表示時間、表示タイミング、字体、下線の有無など）との関係は、前述したようにテーブルデータとして予め設定され、記憶部１３に記憶されてもよいし、所定の変換式に基づいて関数的に表示態様を決定してもよい。また、これらを組み合わせることで表示態様の変更を行ってもよい。 The change in the display mode is not limited to the change in the display size of ruby. For example, the display color and brightness of ruby may be changed, or the display time and timing of ruby may be changed (words with a low pronunciation level value are displayed earlier, etc.). Alternatively, the ruby font may be changed or the ruby may be underlined. The relationship between the evaluation result (pronunciation level) and the ruby display mode (display color, brightness, display time, display timing, font, presence / absence of underline, etc.) is preset as table data as described above, and is stored in the storage unit 13. It may be stored, or the display mode may be determined functionally based on a predetermined conversion formula. Moreover, you may change the display mode by combining these.

＝＝カラオケ装置１の動作について＝＝ == Operation of the karaoke device 1 ==
次に、図６Ａ及び図６Ｂを参照して本実施形態におけるカラオケ装置１の動作の具体例について述べる。 Next, a specific example of the operation of the karaoke device 1 in the present embodiment will be described with reference to FIGS. 6A and 6B.

［発音評価統計データの取得］ [Acquisition of pronunciation evaluation statistical data]
図６Ａは、発音評価統計データを取得する際のカラオケ装置１の動作例を示すフローチャートである。 FIG. 6A is a flowchart showing an operation example of the karaoke device 1 when acquiring pronunciation evaluation statistical data.

日本語を母国語とする利用者が英語の楽曲Ｘのカラオケ歌唱を行った場合、カラオケ装置１は、日本語を母国語とする利用者が英語の楽曲を歌唱した場合の発音評価統計データを、記憶部１３から読み出し（発音評価統計データの読み出し。ステップ１０）、マイク４０を介して得られた音声に基づいて歌唱音声信号を取得する（歌唱音声信号の取得。ステップ１１）。 When a user whose native language is Japanese sings the English song X in karaoke, the karaoke device 1 reads from the storage unit 13 the pronunciation evaluation statistical data for users whose native language is Japanese singing English songs (reading of pronunciation evaluation statistical data; step 10), and acquires a singing voice signal based on the voice obtained through the microphone 40 (acquisition of singing voice signal; step 11).

評価取得部１００は、記憶部１３から楽曲Ｘの発音リファレンスデータを読み出し、ステップ１１で得られた歌唱音声信号と比較することで、楽曲Ｘの歌詞に含まれる単語毎に発音の評価を行い、発音評価データを取得する（発音評価データの取得。ステップ１２）。 The evaluation acquisition unit 100 reads the pronunciation reference data of the song X from the storage unit 13 and compares it with the singing voice signal obtained in step 11 to evaluate the pronunciation for each word included in the lyrics of the song X. Acquire pronunciation evaluation data (acquisition of pronunciation evaluation data. Step 12).

ステップ１０で読み出された発音評価統計データに含まれている単語が再度カラオケ歌唱された場合（ステップ１３でＹの場合）、統計算出部２００は、当該単語に対する発音の評価結果を含めた統計的な評価結果を改めて算出することで発音評価統計データを変更する（発音評価統計データの変更。ステップ１４）。統計算出部２００は、ステップ１４で算出した発音評価統計データを記憶部１３に記憶させる（発音評価統計データの記憶。ステップ１５）。一方、発音評価統計データに含まれていない新たな単語がカラオケ歌唱された場合（ステップ１３でＮの場合）、統計算出部２００は、当該新たな単語に対する発音の評価結果を発音評価統計データに追加して記憶する（評価結果を発音評価統計データに追加して記憶。ステップ１６）。統計算出部２００は、楽曲Ｘに含まれる全ての単語について統計的な評価結果を算出するまで（ステップ１７でＹの場合まで）、ステップ１１〜ステップ１６の処理を繰り返し行う。 When a word included in the pronunciation evaluation statistical data read in step 10 is sung again in karaoke (Y in step 13), the statistical calculation unit 200 changes the pronunciation evaluation statistical data by recalculating the statistical evaluation result so that it includes the pronunciation evaluation result for that word (change of pronunciation evaluation statistical data; step 14). The statistical calculation unit 200 stores the pronunciation evaluation statistical data calculated in step 14 in the storage unit 13 (storage of pronunciation evaluation statistical data; step 15). On the other hand, when a new word not included in the pronunciation evaluation statistical data is sung in karaoke (N in step 13), the statistical calculation unit 200 adds the pronunciation evaluation result for the new word to the pronunciation evaluation statistical data and stores it (addition of the evaluation result to the pronunciation evaluation statistical data and storage; step 16). The statistical calculation unit 200 repeats the processing of steps 11 to 16 until statistical evaluation results have been calculated for all the words included in song X (until Y in step 17).

［ルビの表示］ [Display of ruby]
図６Ｂは、ルビを表示させる際のカラオケ装置１の動作例を示すフローチャートである。この例では、図６Ａに示した処理により、日本語を母国語とする利用者の英語に対する発音評価統計データが既に取得され、記憶部１３に記憶されているとする。 FIG. 6B is a flowchart showing an operation example of the karaoke device 1 when displaying ruby. In this example, it is assumed that the pronunciation evaluation statistical data for English of users whose native language is Japanese has already been acquired by the process shown in FIG. 6A and stored in the storage unit 13.

ここで、日本語を母国語とする利用者が英語の楽曲Ｙのカラオケ歌唱を行う場合、表示制御部３００は、記憶部１３から楽曲Ｙの歌詞テロップデータを読み出し、カラオケ演奏に合わせて表示装置３０に歌詞テロップを表示させる（歌詞テロップの表示。ステップ２０）。なお、利用者の母国語は、たとえば、利用者がカラオケ装置１にログインした際、利用者について予め登録された利用者情報に基づいて特定される。 Here, when a user whose native language is Japanese sings the English song Y in karaoke, the display control unit 300 reads the lyrics telop data of song Y from the storage unit 13 and displays the lyrics telop on the display device 30 in time with the karaoke performance (display of lyrics telop; step 20). The user's native language is identified, for example, from user information registered in advance for the user when the user logs in to the karaoke device 1.

この際、表示制御部３００は、表示される歌詞テロップに含まれる単語が記憶部１３に記憶されている発音評価統計データに含まれているかどうかを確認する。表示される歌詞テロップに含まれる単語が発音評価統計データに含まれている場合（ステップ２１でＹの場合）、表示制御部３００は、歌詞テロップの表示に合わせて、発音評価統計データに応じた表示態様でルビを表示させる（発音評価統計データに応じた表示態様でルビを表示。ステップ２２）。 At this time, the display control unit 300 confirms whether or not the word included in the displayed lyrics telop is included in the pronunciation evaluation statistical data stored in the storage unit 13. When the word included in the displayed lyrics telop is included in the pronunciation evaluation statistical data (in the case of Y in step 21), the display control unit 300 responds to the pronunciation evaluation statistical data according to the display of the lyrics telop. The ruby is displayed in the display mode (the ruby is displayed in the display mode according to the pronunciation evaluation statistical data. Step 22).

一方、表示される歌詞テロップに含まれる単語が発音評価統計データに含まれていない場合（ステップ２１でＮの場合）、表示制御部３００は、歌詞テロップの表示に合わせて、予め設定された所定の表示態様でルビを表示させる（所定の表示態様でルビを表示。ステップ２３）。 On the other hand, when a word included in the displayed lyrics telop is not included in the pronunciation evaluation statistical data (N in step 21), the display control unit 300 displays the ruby in the preset predetermined display mode in time with the display of the lyrics telop (display of ruby in the predetermined display mode; step 23).

表示制御部３００は、楽曲Ｙの歌詞テロップ及びルビを全て表示するまで（ステップ２４でＹの場合）、ステップ２０〜ステップ２３の処理を繰り返し行う。 The display control unit 300 repeats the processes of steps 20 to 23 until all the lyrics telop and ruby of the music Y are displayed (in the case of Y in step 24).

以上から明らかなように、本実施形態に係るカラオケ装置１は、外国語の楽曲をカラオケ歌唱した際の発音を評価するための発音リファレンスデータに基づいて、利用者の歌唱音声信号を評価し、楽曲の歌詞に含まれる単語毎の発音の評価結果を示す発音評価データを取得する評価取得部１００と、母国語を同じくする複数の利用者の発音評価データに基づいて、単語毎の発音の統計的な評価結果を示す発音評価統計データを算出する統計算出部２００と、利用者が外国語の楽曲をカラオケ歌唱する際、歌詞テロップデータに基づいて外国語の歌詞テロップを表示させ、且つ当該歌詞テロップに含まれる単語毎に、当該利用者の母国語に対応する発音評価統計データに応じた表示態様でルビデータに基づく母国語のルビを表示する表示制御部３００と、を有するカラオケ装置である。 As is clear from the above, the karaoke device 1 according to the present embodiment includes: an evaluation acquisition unit 100 that evaluates a user's singing voice signal based on pronunciation reference data for evaluating pronunciation when a foreign-language song is sung in karaoke, and acquires pronunciation evaluation data indicating the evaluation result of the pronunciation of each word included in the lyrics of the song; a statistical calculation unit 200 that calculates, based on the pronunciation evaluation data of a plurality of users who share the same native language, pronunciation evaluation statistical data indicating a statistical evaluation result of the pronunciation of each word; and a display control unit 300 that, when a user sings a foreign-language song in karaoke, displays the foreign-language lyrics telop based on the lyrics telop data and, for each word included in the lyrics telop, displays native-language ruby based on the ruby data in a display mode according to the pronunciation evaluation statistical data corresponding to the user's native language.

このようなカラオケ装置１によれば、外国語の楽曲をカラオケ歌唱する際、利用者の母国語に応じて単語毎にルビの表示態様の切り替えが可能となる。具体的に、カラオケ装置１は、母国語を同じくする利用者の発音評価データに基づく統計的な評価結果（たとえば、発音レベルの平均値）に応じて、歌詞テロップに含まれる単語毎にルビの表示態様を切り替えることができる。従って、たとえば、発音が容易な単語についてはルビを表示させないことにより、ルビの表示によりカラオケ歌唱が妨げられるといった状況が生じない。また、発音が困難な単語についてはルビを大きく表示させる等により、ルビが見やすくなり、また母国語を同じくする利用者が共通して発音が困難な単語であることを、現在カラオケ歌唱を行っている利用者自身が容易に認識できるため、注意してカラオケ歌唱を行うことができる。更に、予め算出された発音評価統計データに応じてルビの表示態様を変えるため、発音が困難な単語にも関わらずルビが表示されないといった状況を回避することができる。 According to such a karaoke device 1, when a foreign-language song is sung in karaoke, the ruby display mode can be switched for each word according to the user's native language. Specifically, the karaoke device 1 can switch the ruby display mode for each word included in the lyrics telop according to a statistical evaluation result (for example, the average pronunciation level) based on the pronunciation evaluation data of users who share the same native language. Therefore, for example, by not displaying ruby for words that are easy to pronounce, the ruby display does not get in the way of karaoke singing. In addition, displaying ruby in a larger size for words that are difficult to pronounce makes the ruby easier to see, and lets the user who is currently singing readily recognize that the word is one that users sharing the same native language commonly find difficult to pronounce, so the user can sing with care. Furthermore, because the ruby display mode is changed according to pronunciation evaluation statistical data calculated in advance, it is possible to avoid a situation in which ruby is not displayed even though a word is difficult to pronounce.

また、統計算出部２００は、発音評価統計データに含まれている単語が再度カラオケ歌唱された場合には、当該単語に対する発音の評価結果を含めた統計的な評価結果を改めて算出することで発音評価統計データを変更し、発音評価統計データに含まれていない新たな単語がカラオケ歌唱された場合には、当該新たな単語に対する発音の評価結果を発音評価統計データに追加する。このように、既に評価済みの単語の統計的な評価結果を更新することにより、母国語を同じくする利用者の外国語の習熟度を反映してルビの表示態様を変えることができる。また、新たな単語に対する発音の評価結果を発音評価統計データの一部として追加することにより、表示態様の切り替えが可能なルビを増やすことができる。 Further, when a word included in the pronunciation evaluation statistical data is sung again in karaoke, the statistical calculation unit 200 changes the pronunciation evaluation statistical data by recalculating the statistical evaluation result so that it includes the pronunciation evaluation result for that word; when a new word not included in the pronunciation evaluation statistical data is sung in karaoke, it adds the pronunciation evaluation result for the new word to the pronunciation evaluation statistical data. By updating the statistical evaluation results of already-evaluated words in this way, the ruby display mode can be changed to reflect the foreign-language proficiency of users who share the same native language. In addition, by adding the pronunciation evaluation results for new words to the pronunciation evaluation statistical data, the number of ruby whose display mode can be switched can be increased.

また、表示制御部３００は、発音評価統計データに含まれていない新たな単語を歌詞テロップとして表示する場合、所定の表示態様でルビデータに基づくルビを表示する。このような構成によれば、発音評価データに含まれていない新たな単語がある楽曲をカラオケ歌唱する場合であっても、利用者が参照しやすい適当な表示態様でルビを表示することができる。 Further, when displaying a new word not included in the pronunciation evaluation statistical data as part of the lyrics telop, the display control unit 300 displays ruby based on the ruby data in a predetermined display mode. With this configuration, even when a song containing a new word not included in the pronunciation evaluation data is sung in karaoke, ruby can be displayed in a suitable display mode that is easy for the user to refer to.

＜その他＞ <Others>
なお、上記実施形態では、歌唱音声信号と発音リファレンスデータとを比較することにより、歌唱評価データを取得する例について述べたが、これに限られない。母国語及び外国語の発音の特徴を分類したパターンデータを含むデータベースを利用することにより、歌唱評価データを取得することができる。この場合、楽曲データは発音リファレンスデータを含む必要が無い。このようなデータベースは、「基本情報」の一例である。 In the above embodiment, an example of acquiring singing evaluation data by comparing a singing voice signal and pronunciation reference data has been described, but the present invention is not limited to this. Singing evaluation data can be obtained by using a database containing pattern data that classifies the pronunciation characteristics of the native language and the foreign language. In this case, the music data does not need to include the pronunciation reference data. Such a database is an example of "basic information".

具体的に、カラオケ装置１（記憶部１３）は、日本人が発音した英語の発音の特徴パターンデータ、及びネイティブの発音の特徴パターンデータからなるデータベースを記憶しておく。ここで、利用者Ａが外国語の楽曲Ｘのカラオケ歌唱を行った場合、評価取得部１００は、歌唱音声信号を解析し、単語毎の特徴パターンを抽出する。評価取得部１００は、抽出された特徴パターンをデータベースと比較し、日本語の発音との近似度及びネイティブの発音との近似度に応じて発音レベルの値を設定する。評価取得部１００は、楽曲Ｘに含まれる全ての単語について発音レベルの値を設定することで、利用者Ａの発音評価データを取得する（抽出された特徴パターンとデータベースとの比較処理について、詳細は特開２００１−２８２０９６号公報を参照）。 Specifically, the karaoke device 1 (storage unit 13) stores a database composed of English pronunciation feature pattern data pronounced by Japanese and native pronunciation feature pattern data. Here, when the user A sings a karaoke song of the music X in a foreign language, the evaluation acquisition unit 100 analyzes the singing audio signal and extracts a feature pattern for each word. The evaluation acquisition unit 100 compares the extracted feature patterns with the database, and sets the pronunciation level value according to the degree of approximation with the Japanese pronunciation and the degree of approximation with the native pronunciation. The evaluation acquisition unit 100 acquires the pronunciation evaluation data of the user A by setting the pronunciation level values for all the words included in the music X (details regarding the comparison process between the extracted feature pattern and the database). See Japanese Patent Application Laid-Open No. 2001-282096).
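
One way to picture the database-based evaluation is the sketch below. It is purely illustrative and is not the comparison method of the cited publication: it assumes that each feature pattern is a numeric vector, that closeness is measured by cosine similarity, and that the level is the native-like share of the two similarities scaled to 1 through 5. The names `cosine` and `pronunciation_level` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pronunciation_level(
    features: list[float],
    native_pattern: list[float],
    accented_pattern: list[float],
    max_level: int = 5,
) -> int:
    """Map closeness to the native vs. accented pattern onto 1..max_level."""
    # Negative similarities are treated as "no resemblance" (assumption).
    native_sim = max(cosine(features, native_pattern), 0.0)
    accented_sim = max(cosine(features, accented_pattern), 0.0)
    total = native_sim + accented_sim
    if total == 0.0:
        return 1  # resembles neither pattern: lowest level (assumption)
    share = native_sim / total  # 1.0 = native-like, 0.0 = accented
    return 1 + round(share * (max_level - 1))


native = [1.0, 0.0]    # toy pattern for native pronunciation
accented = [0.0, 1.0]  # toy pattern for Japanese-accented pronunciation
print(pronunciation_level([1.0, 0.0], native, accented))  # 5
print(pronunciation_level([0.0, 1.0], native, accented))  # 1
print(pronunciation_level([1.0, 1.0], native, accented))  # 3
```

Any monotonic mapping from the two degrees of approximation to a level would fit the description equally well; the ratio used here is just one simple choice.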

また、同じ母国語を使用する利用者であっても、年齢、性別、外国語のスキル等、様々な違いによって同じ単語であっても発音の得意・不得意がある。そこで、統計算出部２００は、それらの違いを考慮して、発音評価統計データを算出することでもよい。 In addition, even users who use the same mother tongue have strengths and weaknesses in pronunciation of the same word due to various differences such as age, gender, and foreign language skills. Therefore, the statistical calculation unit 200 may calculate the pronunciation evaluation statistical data in consideration of the difference between them.

具体的に、統計算出部２００は、母国語を同じくする複数の利用者であって、且つ少なくとも年齢及び／または性別を含む識別情報を同じくする利用者毎に発音評価統計データを算出する。 Specifically, the statistical calculation unit 200 calculates pronunciation evaluation statistical data for each of a plurality of users who have the same native language and who have the same identification information including at least age and / or gender.

識別情報は、母国語を同じくする利用者を区別するための情報であり、少なくとも年齢及び／または性別を含む。その他の識別情報としては、留学歴、海外在住歴、外国語に関する試験結果や資格（たとえば、ＴＯＥＩＣ（登録商標）の得点、実用英語技能検定の級数）、利用者の住所または歌唱地、或いは外国語の楽曲の歌唱履歴等を用いることができる。これらの識別情報は、たとえば、各利用者の利用者情報に含まれていてもよいし、各利用者の発音評価データと関連付けて記憶部１３に記憶されていてもよい。 The identification information is information for distinguishing among users who share the same native language, and includes at least age and/or gender. Other identification information that can be used includes study-abroad history, overseas residence history, foreign-language test results and qualifications (for example, a TOEIC (registered trademark) score or a grade of the Practical English Proficiency Test), the user's address or singing location, and the user's history of singing foreign-language songs. This identification information may, for example, be included in each user's user information, or may be stored in the storage unit 13 in association with each user's pronunciation evaluation data.

また、表示制御部３００は、利用者の母国語及び識別情報に対応する発音評価統計データに応じた表示態様でルビデータに基づく母国語のルビを表示する。 In addition, the display control unit 300 displays ruby in the native language based on the ruby data in a display mode corresponding to the pronunciation evaluation statistical data corresponding to the user's native language and identification information.

たとえば、日本語を母国語とする２０代・男性が英語の楽曲Ｘのカラオケ歌唱を行うとする。この場合、統計算出部２００は、日本語を母国語とする複数の利用者の発音評価データを記憶部１３から読み出す。そして、統計算出部２００は、読み出した発音評価データに関連付けられた識別情報に基づいて、２０代且つ男性の発音評価データのみを抽出する。統計算出部２００は、抽出した発音評価データに基づいて、発音評価統計データを算出する。表示制御部３００は、算出した当該発音評価統計データに応じた表示態様でルビデータに基づく日本語のルビを表示させる。 For example, suppose a man in his twenties whose mother tongue is Japanese sings karaoke of English song X. In this case, the statistical calculation unit 200 reads out the pronunciation evaluation data of a plurality of users whose mother tongue is Japanese from the storage unit 13. Then, the statistical calculation unit 200 extracts only the pronunciation evaluation data of men in their twenties based on the identification information associated with the read pronunciation evaluation data. The statistical calculation unit 200 calculates the pronunciation evaluation statistical data based on the extracted pronunciation evaluation data. The display control unit 300 displays Japanese ruby based on the ruby data in a display mode according to the calculated pronunciation evaluation statistical data.
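
The extraction and recalculation in this example can be sketched as follows. The record layout and the names `Evaluation` and `stats_for_group` are hypothetical, and the statistic is assumed to be a per-word arithmetic mean; the text does not fix how age is bucketed, so the `"20s"` label is only illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    native_language: str
    age_group: str  # e.g. "20s" (bucketing is an assumption)
    gender: str
    word: str
    level: float    # pronunciation level for this word


def stats_for_group(
    evaluations: list[Evaluation],
    native_language: str,
    age_group: str,
    gender: str,
) -> dict[str, float]:
    """Average level per word over users matching language, age, and gender."""
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0])
    wanted = (native_language, age_group, gender)
    for ev in evaluations:
        if (ev.native_language, ev.age_group, ev.gender) != wanted:
            continue  # keep only evaluations with the same identification info
        acc = sums[ev.word]
        acc[0] += ev.level  # running total
        acc[1] += 1         # running count
    return {word: total / n for word, (total, n) in sums.items()}


data = [
    Evaluation("ja", "20s", "male", "love", 4.0),
    Evaluation("ja", "20s", "male", "love", 2.0),
    Evaluation("ja", "40s", "female", "love", 5.0),
    Evaluation("en", "20s", "male", "love", 5.0),
]
print(stats_for_group(data, "ja", "20s", "male"))  # {'love': 3.0}
```

The resulting dictionary plays the role of the pronunciation evaluation statistical data that the display control unit 300 consults for this user.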

なお、カラオケ歌唱を行う利用者の年齢、性別は、予め登録された利用者情報に基づいて特定してもよいし、カラオケ歌唱を行う前に予め所定の識別情報を入力することでもよい。或いは、公知の顔認証技術を利用して、カラオケ歌唱を行う利用者の年齢、性別を特定することでもよい。 The age and gender of the user who sings karaoke may be specified based on the user information registered in advance, or predetermined identification information may be input in advance before singing karaoke. Alternatively, the age and gender of the user who sings karaoke may be specified by using a known face recognition technique.

このように、母国語を同じくする複数の利用者の中から識別情報を同じくする利用者毎に発音評価統計データを算出することにより、カラオケ歌唱を行う利用者により適した表示態様でルビの表示が可能となる。 In this way, by calculating the pronunciation evaluation statistical data for each user who has the same identification information from among a plurality of users who have the same native language, the ruby is displayed in a display mode more suitable for the user who sings karaoke. Is possible.

上記実施形態は、例として提示したものであり、発明の範囲を限定するものではない。上記の構成は、適宜組み合わせて実施することが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。上記実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 The above embodiment is presented as an example and does not limit the scope of the invention. The above configurations can be implemented in appropriate combinations, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The above-described embodiments and modifications thereof are included in the scope and gist of the invention, as well as in the scope of the invention described in the claims and the equivalent scope thereof.

１ カラオケ装置 1 Karaoke device
１００ 評価取得部 100 Evaluation acquisition unit
２００ 統計算出部 200 Statistical calculation unit
３００ 表示制御部 300 Display control unit

Claims

A karaoke device comprising:
an evaluation acquisition unit that evaluates a user's singing voice signal based on reference information for evaluating pronunciation when a foreign-language song is sung in karaoke, and acquires pronunciation evaluation data indicating the evaluation result of the pronunciation of each word included in the lyrics of the song;
a statistical calculation unit that calculates, based on the pronunciation evaluation data of a plurality of users who share the same native language, pronunciation evaluation statistical data indicating a statistical evaluation result of the pronunciation of each word; and
a display control unit that, when a user sings a foreign-language song in karaoke, displays the foreign-language lyrics telop based on lyrics telop data and, for each word included in the lyrics telop, displays native-language ruby based on ruby data in a display mode according to the pronunciation evaluation statistical data corresponding to the user's native language.

The karaoke device according to claim 1, wherein, when a word included in the pronunciation evaluation statistical data is sung again in karaoke, the statistical calculation unit changes the pronunciation evaluation statistical data by recalculating the statistical evaluation result so that it includes the pronunciation evaluation result for that word, and, when a new word not included in the pronunciation evaluation statistical data is sung in karaoke, adds the pronunciation evaluation result for the new word to the pronunciation evaluation statistical data.

The karaoke device according to claim 1 or 2, wherein, when displaying a new word not included in the pronunciation evaluation statistical data as part of the lyrics telop, the display control unit displays ruby based on the ruby data in a predetermined display mode.

The karaoke device according to any one of claims 1 to 3, wherein the display control unit displays ruby in the native language based on the ruby data in a size corresponding to the pronunciation evaluation statistical data.

The karaoke device according to any one of claims 1 to 4, wherein the statistical calculation unit calculates the pronunciation evaluation statistical data for each group of users who share the same native language and the same identification information including at least age and/or gender, and
the display control unit displays native-language ruby based on the ruby data in a display mode according to the pronunciation evaluation statistical data corresponding to the user's native language and identification information.