JP2020034750A

JP2020034750A - Karaoke device

Info

Publication number: JP2020034750A
Application number: JP2018161624A
Authority: JP
Inventors: 政之鎌田; Masayuki Kamata
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2020-03-05
Anticipated expiration: 2038-08-30
Also published as: JP7082549B2

Abstract

To provide a karaoke device capable of correcting a scoring result of karaoke singing by a user without being noticed by the user.

SOLUTION: There is provided a karaoke device functioning as: extraction means for extracting voiceprint data of respective users from collected conversation voices; decision means for deciding ranks of the respective voiceprint data based upon the collected conversation voices and storing information on the ranks in association with the respective voiceprint data; and correction means for identifying, when one user sings at karaoke, voiceprint data matching data for voiceprint authentication extracted from a singing voice of the one user, and then correcting, based upon information on the rank associated with the identified voiceprint data, a scoring result of the karaoke singing by the one user.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a karaoke apparatus.

A karaoke apparatus has a scoring function that evaluates a user's karaoke singing and presents a scoring result. The singing is evaluated by comparing singing voice data, extracted from the singing voice input through a microphone, with reference data indicating the main melody of the song being played. With the scoring function, users can compete with one another on their scoring results or check the results of their singing practice.

The scoring function is also sometimes used to liven up occasions such as business entertainment or workplace drinking parties. Conventional scoring, however, is objective and mechanical, so a low scoring result is presented whenever the singing is poor, even if the singer is a client being entertained or a superior at the company. This could sour the atmosphere of the occasion.

In view of this, Patent Literature 1 discloses a karaoke apparatus with a function for inflating the scoring result through an operation that the singer does not notice.

Patent Literature 1: JP-A-2002-108368

However, the karaoke apparatus of Patent Literature 1 requires an inflated-scoring command to be entered every time a user sings, so the user may notice that the score is being inflated.

An object of the present invention is to provide a karaoke apparatus that can correct the scoring result of karaoke singing without the user who sang noticing.

The main invention for achieving the above object is a karaoke apparatus having a scoring function that evaluates a user's karaoke singing and presents a scoring result. The apparatus comprises a sound collection unit capable of collecting the conversation voices of all users within a predetermined range, and a control unit that controls the karaoke apparatus. The control unit functions as: extraction means for extracting voiceprint data of each user from the collected conversation voices; determining means for determining an order of the voiceprint data based on the collected conversation voices and storing information on the order in association with each item of voiceprint data; and correction means for, when a certain user sings karaoke, identifying the voiceprint data that matches voiceprint authentication data extracted from that user's singing voice, and correcting the scoring result of that user's karaoke singing based on the order information associated with the identified voiceprint data.
Other features of the present invention will become apparent from the description and drawings below.

According to the present invention, the scoring result of karaoke singing can be corrected without the user who sang noticing.

FIG. 1 is a diagram showing the karaoke apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example hardware configuration of the karaoke body according to the first embodiment.
FIG. 3 is a diagram showing an example of the order according to the first embodiment.
FIG. 4 is a flowchart showing the processing of the karaoke apparatus according to the first embodiment.
FIG. 5 is a diagram showing an example hardware configuration of the karaoke body according to the second embodiment.
FIG. 6 is a diagram showing an example of the table data stored in the first storage means according to the second embodiment.
FIG. 7 is a diagram showing an example of the order according to the second embodiment.
FIG. 8 is a flowchart showing the processing of the karaoke apparatus according to the second embodiment.
FIG. 9 is a diagram showing a karaoke room in which the karaoke apparatus according to the third embodiment is installed.
FIG. 10 is a diagram showing an example hardware configuration of the karaoke body according to the third embodiment.
FIG. 11 is a diagram showing an example of the table data stored in the first storage means according to the third embodiment.
FIG. 12 is a diagram showing an example of the order according to the third embodiment.
FIG. 13 is a flowchart showing the processing of the karaoke apparatus according to the third embodiment.

<First embodiment>
The karaoke apparatus according to the first embodiment will be described with reference to FIGS. 1 to 4.

== Karaoke apparatus ==
The karaoke apparatus 1 is an apparatus for karaoke performance and for users to sing karaoke. The karaoke apparatus 1 is installed, for example, in each room (karaoke room) of a karaoke store. The karaoke apparatus 1 has a scoring function that evaluates a user's karaoke singing and presents a scoring result (details are described later).

As shown in FIG. 1, the karaoke apparatus 1 includes a karaoke body 10, a speaker 20, a display device 30, a microphone 40, and a remote control device 50.

The karaoke body 10 performs the various controls related to karaoke performance and karaoke singing, such as controlling playback of the selected karaoke song, controlling the display of lyrics and background video, and processing the audio signal input through the microphone 40. The speaker 20 emits sound based on a sound emission signal from the karaoke body 10. The display device 30 displays video and images on its screen based on a signal from the karaoke body 10. The microphone 40 converts the user's singing voice into an analog audio signal and inputs it to the karaoke body 10. The remote control device 50 is a device for performing various operations on the karaoke body 10.

The microphone 40 according to the present embodiment is also used as a "sound collection unit" capable of collecting the conversation voices of all users within a predetermined range. The predetermined range is the range within which the sound collection unit can pick up conversation voices. For example, the inside of the karaoke room in which the karaoke apparatus 1 is installed is one example of "within a predetermined range". The microphone 40 collects the conversation voices exchanged in the karaoke room before karaoke singing starts.

The microphone 40 outputs the collected conversation voices (conversation voice data) to the karaoke body 10. The sound collection unit may be provided separately from the microphone 40, and a plurality of sound collection units may be provided.

== Karaoke body 10 ==
As shown in FIG. 2, the karaoke body 10 according to the present embodiment includes a storage unit 10a, a communication unit 10b, an input unit 10c, and a control unit 10d. Each component is connected to a bus B via an interface (not shown).

[Storage unit, communication unit, input unit]
The storage unit 10a is a large-capacity storage device that stores various data, such as the song data used for karaoke performance. The communication unit 10b provides an interface for communication between the karaoke body 10 and the other components of the karaoke apparatus 1. The input unit 10c is the component through which the user enters various instructions. Instructions can also be entered by selecting icons displayed on the display screen of the display device 30 or the remote control device 50; in that case, the display device 30 or the remote control device 50 functions as the input unit 10c.

[Control unit]
The control unit 10d performs the various controls in the karaoke apparatus 1. The control unit 10d includes a CPU and a memory (neither shown). The CPU realizes various functions by executing a program stored in the memory. In the present embodiment, when the CPU executes the program stored in the memory, the control unit 10d functions as extraction means 100, determining means 200, scoring means 300, correction means 400, and presentation means 500.

(Extraction means)
The extraction means 100 extracts the voiceprint data of each user from the collected conversation voices.

Known techniques can be used to extract voiceprint data from conversation voices. When there are multiple users, known techniques can also be used to separate each user's voice from the conversation voices collected by a single sound collection unit (for example, the Mitsubishi Electric Corporation website, "http://www.mitsubishielectric.co.jp/news/2017/0524-e.html").

As a specific example, suppose that General Manager A and Employee B (no title) of X Corporation are entertaining General Manager C and Section Manager D of their client, Y Corporation, and that the group enters a karaoke room for an after-party. In this case, General Manager A, Employee B, General Manager C, and Section Manager D are the users of the karaoke apparatus 1.

The microphone 40 collects conversation voices from the moment the users enter the karaoke room and outputs them to the extraction means 100. The extraction means 100 extracts each user's voiceprint data from the conversation voices. Because there are four users in this example, four items of voiceprint data (voiceprint data V1 to V4) are normally extracted. Note that the karaoke apparatus 1 cannot tell which user each extracted item of voiceprint data belongs to.

(Determining means)
The determining means 200 determines the order of the voiceprint data based on the collected conversation voices, and stores information on the order in association with each item of voiceprint data.

In the present embodiment, the determining means 200 determines the order based on honorific information obtained by processing the collected conversation voices. Honorific information is information corresponding to the honorific language (keigo) that each user used in conversation. For example, it corresponds to words belonging to the three categories of honorifics (sonkeigo, respectful language; kenjogo, humble language; teineigo, polite language) or the five categories (sonkeigo; kenjogo; teichogo, courteous language; teineigo; bikago, beautified language), whose use in conversation reflects who stands above whom. The "order" in the present embodiment stems from job title and standing. For example, a higher-ranked user is one who holds a more senior title or who is on the receiving side of the entertainment (the guest).

Known methods can be used to obtain the honorific information and to determine the order.

As a specific example, the determining means 200 generates text data by applying speech recognition to one conversation utterance (see, for example, JP-A-2014-026603). The determining means 200 then detects honorific expressions in the transcribed utterance and calculates, as a score, the proportion of honorific characters among all characters.

For example, if the transcribed utterance is 「いつもたいへんおせわになっております」 ("Thank you, as always, for your kind support"), the total is 18 characters, of which the honorific expressions 「お」 and 「おります」 account for 5. The determining means 200 therefore calculates a score of (5/18) × 100 = 27.78 (points). If instead the transcribed utterance is 「いやいや、こちらこそどうも」 ("No, no, the pleasure is mine"), it contains 0 honorific characters, and the determining means 200 calculates a score of 0 (points) for that utterance.

The determining means 200 calculates a score for each item of voiceprint data. When there are multiple utterances for one item of voiceprint data, the determining means 200 calculates the score from the total number of characters in all of those utterances and the total number of honorific characters.
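
The character-ratio score described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `honorific_score` is hypothetical, and the sketch assumes that speech recognition and honorific detection have already produced the transcribed utterances and the list of detected honorific expressions for one voiceprint.

```python
def honorific_score(utterances, honorifics):
    """Share of honorific characters among all characters, in points (0-100).

    utterances: transcribed utterances attributed to ONE voiceprint.
    honorifics: honorific expressions already detected in those utterances
                (detection itself is outside the scope of this sketch).
    """
    total_chars = sum(len(u) for u in utterances)
    if total_chars == 0:
        return 0.0  # assumption: no speech means a score of 0
    honorific_chars = sum(len(h) for h in honorifics)
    return honorific_chars / total_chars * 100


# The two utterances from the text: 5 honorific characters out of 18, and none.
polite = honorific_score(["いつもたいへんおせわになっております"], ["お", "おります"])
casual = honorific_score(["いやいや、こちらこそどうも"], [])
print(round(polite, 2), casual)
```

Passing several utterances for the same voiceprint pools the character counts before dividing, which matches the description of handling multiple utterances.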

The determining means 200 orders the voiceprint data from the lowest score upward, that is, starting with the speaker who uses honorifics least often.

FIG. 3 shows the score calculated for each item of voiceprint data in the present embodiment and the resulting order. In FIG. 3, the score is 16 (points) for voiceprint data V1, 38 (points) for voiceprint data V2, 10 (points) for voiceprint data V3, and 18 (points) for voiceprint data V4. The determining means 200 has therefore ordered the voiceprint data from the lowest score upward: voiceprint data V3, voiceprint data V1, voiceprint data V4, voiceprint data V2.

The score calculation is not limited to the above example. For example, the determining means 200 may calculate, as the score, the ratio of the number of honorific expressions to the total number of characters. If the transcribed utterance is 「いつもたいへんおせわになっております」, the determining means 200 calculates a score of (2/18) × 100 = 11.11 (points) from the number of honorific expressions (two: 「お」 and 「おります」). The determining means 200 may also detect honorifics by category and calculate the score by referring to a table that weights each category (for example, sonkeigo: 1.5, kenjogo: 1.3, teineigo: 1.0).
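
The two alternative calculations can be sketched in the same way. The count-based variant follows the arithmetic given in the text. The weighted variant is only one possible reading: the text gives the weights but not the formula, so multiplying each expression's character count by its category weight is an assumption, as are the function names and the category tags used in the example call.

```python
def count_based_score(text, honorifics):
    """Number of honorific expressions per character of text, in points."""
    return len(honorifics) / len(text) * 100 if text else 0.0


# Weights quoted in the text; the dictionary keys are romanized category names.
CATEGORY_WEIGHTS = {"sonkeigo": 1.5, "kenjogo": 1.3, "teineigo": 1.0}


def weighted_score(text, tagged_honorifics, weights=CATEGORY_WEIGHTS):
    """Assumed formula: weight each honorific's character count by category.

    tagged_honorifics: list of (expression, category) pairs.
    """
    if not text:
        return 0.0
    weighted_chars = sum(len(expr) * weights[cat] for expr, cat in tagged_honorifics)
    return weighted_chars / len(text) * 100


utterance = "いつもたいへんおせわになっております"
by_count = count_based_score(utterance, ["お", "おります"])
# The category tags below are illustrative only, not a linguistic claim.
by_weight = weighted_score(utterance, [("お", "teineigo"), ("おります", "kenjogo")])
print(round(by_count, 2), round(by_weight, 2))
```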

The determining means 200 stores the determined order information in the storage unit 10a in association with each item of voiceprint data. In the example of FIG. 3, the determining means 200 associates rank "1st" with voiceprint data V3, rank "2nd" with voiceprint data V1, rank "3rd" with voiceprint data V4, and rank "4th" with voiceprint data V2, and stores them in the storage unit 10a.
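
The ordering step for the FIG. 3 values can be sketched as a sort by ascending score. The function name `rank_voiceprints` is hypothetical, and the text does not say how ties are broken; this sketch simply keeps the input order for equal scores.

```python
def rank_voiceprints(scores):
    """Map each voiceprint ID to its rank; the lowest score is ranked 1st."""
    ordered = sorted(scores, key=scores.get)  # ascending score, stable for ties
    return {voiceprint: rank for rank, voiceprint in enumerate(ordered, start=1)}


# Scores from FIG. 3.
ranks = rank_voiceprints({"V1": 16, "V2": 38, "V3": 10, "V4": 18})
print(ranks)  # {'V3': 1, 'V1': 2, 'V4': 3, 'V2': 4}
```

The returned dictionary plays the role of the rank information stored together with each item of voiceprint data.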

(Scoring means)
The scoring means 300 evaluates a user's karaoke singing and obtains a scoring result. Specifically, the scoring means 300 scores the singing voice data extracted from the user's singing voice based on at least one of pitch, volume, and singing technique. The scoring result can be obtained, for example, as a concrete number (a value out of a maximum of 100 points).

Known techniques can be used to evaluate karaoke singing. For example, the scoring means 300 can obtain a score by extracting singing voice data, such as pitch data and volume data, from the singing voice signal input through the microphone 40 and comparing it with the reference data of the song that was sung. The scoring means 300 outputs the resulting scoring result to the correction means 400.

(Correction means)
When a certain user sings karaoke, the correction means 400 identifies the voiceprint data that matches the voiceprint authentication data extracted from that user's singing voice, and corrects the scoring result of that user's karaoke singing based on the order information associated with the identified voiceprint data.

The voiceprint authentication data is voiceprint data extracted from the singing voice of the user who is singing. For example, suppose that in the above example General Manager A, Employee B, General Manager C, and Section Manager D enter the karaoke room and then sing in turn. Once a karaoke song starts playing, the extraction means 100 analyzes the singing voice of the user who is singing and extracts that user's voiceprint data as the voiceprint authentication data.

The scoring result is corrected based on correction information that is set in advance and stored in the storage unit 10a. The correction information may specify a concrete number of points, such as "add 10 points to the scoring result of the first-ranked user", or a proportion, such as "raise the scoring result of the first-ranked user by 10%". If the scoring result has an upper limit (for example, 100 points), the correction is preferably adjusted so that the limit is not exceeded. The correction also need not apply only to the first-ranked user: the points added may differ by rank, and some ranks may have points deducted. For example, the correction information may be "1st: +10 points, 2nd: +5 points, 3rd: ±0 points, 4th: -5 points". When points are deducted, however, the scoring result may come out low even though the actual singing sounded good. In that situation the scoring result may arouse suspicion, and users may realize that it is being manipulated. Therefore, when the score from the scoring means 300 is at or above a certain value (for example, 90 points or more), it is preferable not to deduct points even if the rank would otherwise call for a deduction.

The correction means 400 identifies, from among the voiceprint data stored in the storage unit 10a, the voiceprint data that matches the voiceprint authentication data.

For example, suppose that Employee B sings. In this case, the correction means 400 identifies one item of voiceprint data in the storage unit 10a based on Employee B's voiceprint authentication data. Here, suppose that voiceprint data V2 shown in FIG. 3 is identified as Employee B's voiceprint data.

Suppose that Employee B finishes singing and the scoring means 300 calculates "80 points" as the scoring result. The correction means 400 processes the scoring result "80 points" based on the order information associated with the identified voiceprint data V2. Suppose the correction information is "add 10 points to the scoring result of the first-ranked user". In the example of FIG. 3, the identified voiceprint data V2 is not ranked first. The correction means 400 therefore outputs the scoring result "80 points" to the presentation means 500 unchanged, without correcting it.

Suppose, on the other hand, that General Manager C sings. In this case, the correction means 400 identifies one item of voiceprint data in the storage unit 10a based on General Manager C's voiceprint authentication data. Here, suppose that voiceprint data V3 shown in FIG. 3 is identified as General Manager C's voiceprint data.

Suppose that General Manager C finishes singing and the scoring means 300 calculates "85 points" as the scoring result. The correction means 400 corrects the scoring result "85 points" based on the order information associated with the identified voiceprint data V3. As above, suppose the correction information is "add 10 points to the scoring result of the first-ranked user". In the example of FIG. 3, the identified voiceprint data V3 is ranked first. The correction means 400 therefore adds "10 points" to the scoring result "85 points" and outputs "95 points" to the presentation means 500 as General Manager C's scoring result.
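
The point-based correction rules and the two worked examples above can be sketched as follows. The function name, parameter names, and the lower bound of 0 are assumptions made for illustration; the upper limit of 100 and the "no deduction at 90 points or more" threshold are the example values from the text.

```python
def correct_score(raw, rank, offsets, max_score=100, no_deduction_from=90):
    """Apply a rank-dependent point offset to a raw score."""
    delta = offsets.get(rank, 0)  # ranks without an entry are left unchanged
    if delta < 0 and raw >= no_deduction_from:
        delta = 0  # a visibly good performance is not marked down
    # Cap at the upper limit; the lower bound of 0 is an assumption.
    return max(0, min(max_score, raw + delta))


TOP_ONLY = {1: 10}                   # "add 10 points for the first-ranked user"
GRADED = {1: 10, 2: 5, 3: 0, 4: -5}  # graded example from the text

employee_b = correct_score(80, 4, TOP_ONLY)  # V2 is ranked 4th -> 80
manager_c = correct_score(85, 1, TOP_ONLY)   # V3 is ranked 1st -> 95
print(employee_b, manager_c)
```

With the graded table, a fourth-ranked singer scoring 80 would be presented 75, while one scoring 92 would keep 92 because of the threshold.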

[Presentation means]
The presentation means 500 presents to the user either the scoring result obtained by the scoring means 300 or the scoring result corrected by the correction means 400.

The scoring result can be presented in various ways. For example, the presentation means 500 can display the scoring result on the display screen of the display device 30. Alternatively, the presentation means 500 can announce the scoring result through the speaker 20.

== Processing in the karaoke apparatus ==
Next, a specific example of the processing in the karaoke apparatus 1 according to the present embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the processing in the karaoke apparatus 1.

After the users enter the karaoke room, the microphone 40 collects the conversation voices of all users in the karaoke room (collection of conversation voices; step 10).

The extraction means 100 extracts each user's voiceprint data from the conversation voices collected in step 10 (extraction of voiceprint data; step 11).

The determining means 200 determines the order of the voiceprint data based on the conversation voices collected in step 10 (determination of the order of the voiceprint data; step 12).

The determining means 200 stores the order information determined in step 12 in the storage unit 10a in association with the voiceprint data extracted in step 11 (storage of the order information and voiceprint data; step 13).

When a certain user then sings karaoke (Y in step 14), the microphone 40 collects that user's singing voice (collection of singing voice; step 15). The extraction means 100 extracts that user's voiceprint data (voiceprint authentication data) from the singing voice (extraction of voiceprint authentication data; step 16).

The correction means 400 identifies, from among the voiceprint data stored in step 13, the voiceprint data that matches the voiceprint authentication data extracted in step 16 (identification of voiceprint data; step 17).

The scoring means 300 evaluates that user's karaoke singing and obtains a scoring result (scoring of karaoke singing; step 18).

The correction means 400 corrects the scoring result obtained in step 18 based on the order information associated with the voiceprint data identified in step 17 (correction of the scoring result; step 19).

The presentation means 500 presents the scoring result corrected in step 19 (presentation of the scoring result; step 20).
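
The per-performance part of this flow (steps 16 to 19) can be sketched as a small function that wires the means together. Everything here is hypothetical scaffolding: the function and parameter names are invented, the voiceprint extraction, matching, and scoring are passed in as callables because the text treats them as known techniques, and returning the raw score when no voiceprint matches is an assumption the text does not address.

```python
def handle_performance(singing_voice, stored, extract, match, score, correct):
    """Run steps 16-19 for one performance; the result is presented in step 20.

    stored:  {voiceprint_id: rank}, as saved in step 13.
    extract: singing voice -> voiceprint authentication data   (step 16)
    match:   (auth data, stored ids) -> matching id or None    (step 17)
    score:   singing voice -> raw score                        (step 18)
    correct: (raw score, rank) -> corrected score              (step 19)
    """
    auth_data = extract(singing_voice)
    matched_id = match(auth_data, list(stored))
    raw = score(singing_voice)
    if matched_id is None:
        return raw  # assumption: no matching voiceprint means no correction
    return correct(raw, stored[matched_id])


# Toy stand-ins so that the sketch runs end to end.
stored_ranks = {"V1": 2, "V2": 4, "V3": 1, "V4": 3}
toy = dict(
    extract=lambda voice: voice["speaker"],
    match=lambda auth, ids: auth if auth in ids else None,
    score=lambda voice: voice["raw"],
    correct=lambda raw, rank: min(100, raw + 10) if rank == 1 else raw,
)
presented = handle_performance({"speaker": "V3", "raw": 85}, stored_ranks, **toy)
print(presented)  # 95
```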

As described above, the karaoke apparatus 1 according to the present embodiment has a scoring function that evaluates a user's karaoke singing and presents a scoring result. The karaoke apparatus 1 has the microphone 40, which can collect the conversation voices of all users within a predetermined range, and the control unit 10d, which controls the karaoke apparatus 1. The control unit 10d functions as: the extraction means 100, which extracts each user's voiceprint data from the collected conversation voices; the determining means 200, which determines the order of the voiceprint data based on the collected conversation voices and stores information on the order in association with each item of voiceprint data; and the correction means 400, which, when a certain user sings karaoke, identifies the voiceprint data that matches the voiceprint authentication data extracted from that user's singing voice and corrects the scoring result of that user's karaoke singing based on the order information associated with the identified voiceprint data.

With this karaoke apparatus 1, when a certain user sings karaoke, the scoring result can be corrected according to the automatically determined order of the voiceprint data. Because a higher-ranked person (for example, a client being entertained) then tends to receive a better scoring result, the scoring function can be used to liven up the occasion. That is, since no command has to be entered at the time of singing, the karaoke apparatus 1 according to the present embodiment can correct the scoring result of karaoke singing without the user who sang noticing.

The determining means 200 according to the present embodiment is also characterized in that it determines the ranks from a score calculated based on honorific-language (keigo) information obtained by processing the collected conversation voices. In general business etiquette, it is considered proper to use honorific language toward those of higher standing. Determining the ranks from a score calculated on the basis of such honorific-language information therefore allows the ranks of the voiceprint data to be determined more accurately.

<Second embodiment>
Next, a karaoke apparatus according to a second embodiment will be described with reference to FIGS. 5 to 8. This embodiment describes an example in which the ranks of the voiceprint data are determined using the honorific titles of the users contained in the conversation voices. Detailed description of configurations that are the same as in the first embodiment is omitted.

(First storage means)
As shown in FIG. 5, part of the storage area of the storage unit 10a in this embodiment functions as first storage means 600. The first storage means 600 stores an honorific title score table that associates users' honorific titles with predetermined scores. An honorific title is either a suffix attached to a name or the like to express respect for the other person ("san", "sama", "kun", and so on) or a word that by itself refers to the other person ("department manager", "teacher", and so on).

Each honorific title is associated with a predetermined score. Some honorific titles tend to be used by a superior toward a subordinate ("kun", "chan", and so on), while others tend to be used by a subordinate toward a superior ("san", "sama", and so on). In this embodiment, the score is set high for titles that superiors tend to use toward subordinates and low for titles that subordinates tend to use toward superiors. A speaker who mostly uses titles of the former kind therefore accumulates a higher score.

FIG. 6 shows an example of the honorific title score table. In this example, the table is configured so that the score is lowest for "sama" and increases in the order "san", "kun", "chan".

(Determining means)
The determining means 200 according to this embodiment determines the ranks from a score calculated based on the honorific title score table and the honorific title information obtained by processing the collected conversation voices.

As a specific example, suppose that department manager A takes employee B to his regular karaoke snack bar for what he calls a debriefing on the entertainment of company Y. In this karaoke snack bar, a plurality of microphones 40 are provided for a single karaoke main body 10, and each microphone can communicate with the karaoke main body 10. At least one microphone 40 is placed on each table in the snack bar and collects the conversation voices near that table. The "predetermined range" in this embodiment corresponds to the vicinity of the table on which a microphone 40 is placed.

The determining means 200 performs speech recognition on the conversation voices to detect the honorific title information contained in each conversation voice. It looks up the detected honorific title information in the honorific title score table to obtain a score for that conversation voice. The determining means 200 then totals the scores for each item of voiceprint data and determines the ranks according to the totals.
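The detection step can be pictured as counting title occurrences in the recognized text of each speaker. The following is a minimal sketch under stated assumptions, not the method of the embodiment: speech recognition and speaker attribution are assumed to have already produced text per voiceprint, the sample utterances and the names `count_titles` and `transcripts` are hypothetical, and matching is by plain substring search.

```python
from collections import Counter

# The four suffix-type honorific titles named in this embodiment.
TITLES = ("さま", "さん", "くん", "ちゃん")

def count_titles(utterances):
    """Count occurrences of each honorific title in recognized text.

    Naive substring matching: a practical system would need morphological
    analysis to avoid false hits such as the "さま" inside "お疲れさま".
    """
    counts = Counter({title: 0 for title in TITLES})
    for text in utterances:
        for title in TITLES:
            counts[title] += text.count(title)
    return counts

# Hypothetical recognized utterances, grouped by the speaker's voiceprint.
transcripts = {
    "Va": ["山田くん、次の曲を入れてくれ", "鈴木ちゃんも歌うかい"],
    "Vb": ["田中さん、ありがとうございます"],
}

title_counts = {vp: count_titles(texts) for vp, texts in transcripts.items()}
```

The resulting per-voiceprint counts are the input that a score table lookup would then consume.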

For example, suppose that processing the conversation between department manager A and employee B shows that the conversation voice corresponding to voiceprint data Va contains the honorific title "sama" once, "san" twice, "kun" 12 times, and "chan" three times. The determining means 200 refers to the honorific title score table, multiplies the score of each title by the number of times it occurs, and sums the products to obtain a total of 88 points. Suppose also that the conversation voice corresponding to voiceprint data Vb contains "sama" once, "san" 10 times, "kun" twice, and "chan" zero times. By the same calculation, the determining means 200 obtains a total of 41 points. In this case, the determining means 200 ranks voiceprint data Va first and voiceprint data Vb second (see FIG. 7). The method of calculating the score is not limited to this example. For instance, the determining means 200 may divide the total score by the number of honorific titles used and compare the results. Specifically, voiceprint data Va contains 1 + 2 + 12 + 3 = 18 honorific titles with a total of 88 points, so the determining means 200 calculates a score of 88 / 18 ≈ 4.89 points. Voiceprint data Vb contains 1 + 10 + 2 + 0 = 13 honorific titles with a total of 41 points, so the score is 41 / 13 ≈ 3.15 points. Comparing these scores, the determining means 200 can again rank voiceprint data Va first and voiceprint data Vb second.
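The arithmetic of this example can be checked with a short sketch. The per-title scores below are an assumption: the values of FIG. 6 are not given in the text, and 1, 3, 5, and 7 points were chosen only because they reproduce the totals of 88 and 41 points and respect the stated ordering of the table. The function names are likewise hypothetical.

```python
# Hypothetical table values (FIG. 6 is not reproduced in the text); chosen
# only because they yield the totals of 88 and 41 points in the example.
TITLE_SCORES = {"sama": 1, "san": 3, "kun": 5, "chan": 7}

# Title counts per voiceprint, taken from the example in the description.
title_counts = {
    "Va": {"sama": 1, "san": 2, "kun": 12, "chan": 3},
    "Vb": {"sama": 1, "san": 10, "kun": 2, "chan": 0},
}

def total_score(counts):
    """Sum of (score of title) x (number of occurrences)."""
    return sum(TITLE_SCORES[title] * n for title, n in counts.items())

def average_score(counts):
    """Alternative method: total score divided by the number of titles used."""
    uses = sum(counts.values())
    return total_score(counts) / uses if uses else 0.0

def rank_by(score_fn):
    """Rank voiceprints from highest to lowest score (1 = highest rank)."""
    ordered = sorted(title_counts, key=lambda vp: score_fn(title_counts[vp]),
                     reverse=True)
    return {vp: place for place, vp in enumerate(ordered, start=1)}

totals = {vp: total_score(c) for vp, c in title_counts.items()}
averages = {vp: round(average_score(c), 2) for vp, c in title_counts.items()}
ranks = rank_by(total_score)
```

Both scoring methods place Va above Vb here; with other counts the two methods could disagree, since the average removes the effect of how much each person talks.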

== Processing in the karaoke apparatus ==
Next, a specific example of processing in the karaoke apparatus 1 according to this embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of processing in the karaoke apparatus 1. The first storage means 600 in this embodiment stores the honorific title score table.

After the users enter the karaoke room, the microphone 40 collects the conversation voices of all users in the karaoke room (collection of conversation voices; step 30).

The extraction means 100 extracts voiceprint data of each user from the conversation voices collected in step 30 (extraction of voiceprint data; step 31).

The determining means 200 determines the rank of each item of voiceprint data from a score calculated based on the honorific title score table and the honorific title information obtained by processing the conversation voices collected in step 30 (determination of the ranks of the voiceprint data based on honorific title information and the like; step 32).

The determining means 200 stores the rank information determined in step 32 in the storage unit 10a in association with the voiceprint data extracted in step 31 (storage of rank information and voiceprint data; step 33).

The processing from step 34 onward is the same as the processing from step 14 onward in the first embodiment, so a detailed description is omitted.

As described above, the karaoke apparatus 1 according to this embodiment has first storage means 600 that stores an honorific title score table associating users' honorific titles with predetermined scores, and the determining means 200 determines the ranks from a score calculated based on the honorific title score table and the honorific title information obtained by processing the collected conversation voices. Using the honorific title information contained in the conversation voices in this way allows the ranks to be determined more reliably.

<Third embodiment>
Next, a karaoke apparatus according to a third embodiment will be described with reference to FIGS. 9 to 13. This embodiment describes an example in which the ranks of the voiceprint data are determined using each user's seating position. Detailed description of configurations that are the same as in the first or second embodiment is omitted.

The sound collection unit according to this embodiment is a plurality of directional microphones, one provided at each user's seating position.

FIG. 9 shows the interior of karaoke room R, in which the karaoke apparatus 1 is installed. Karaoke room R is a so-called "VIP room", suited to special occasions such as business entertainment. In addition to the karaoke apparatus 1, the room contains five seats (seats S1 to S5), two tables (tables T1 and T2), and one display D. In the example of FIG. 9, the display device 30 corresponds to the display D mounted on the wall of karaoke room R.

In the example of FIG. 9, directional microphones M1 to M5 are installed near the respective seats. Each directional microphone can collect only the conversation voice of the user sitting in the corresponding seat.

(Second storage means)
As shown in FIG. 10, part of the storage area of the storage unit 10a in this embodiment functions as second storage means 700. The second storage means 700 stores a seating position score table that associates users' seating positions within the predetermined range with predetermined scores.

A seating position score table is provided for each predetermined range. In the example of FIG. 9, karaoke room R corresponds to the predetermined range.

FIG. 11 shows the seating position score table for karaoke room R. As shown in FIG. 9, seats S1 to S5 are installed in karaoke room R. In general business etiquette, it is considered proper for higher-ranked persons to sit in the seats farthest from the entrance of a room. The table is therefore configured so that the score decreases from the seat farthest from the entrance of karaoke room R to the nearest one: 40, 30, 20, 10, and 0 points for seats S1, S2, S3, S4, and S5, respectively.

(Determining means)
The determining means 200 according to this embodiment determines the ranks from a score calculated based on the seating position score table and the seating position corresponding to each collected conversation voice.

As a specific example, suppose that department manager A and employee B (who holds no managerial title) of X Corporation are entertaining department manager C and section manager D of their client, Y Corporation, and that the party enters karaoke room R for an after-party. Suppose further that department manager C sits in seat S2, section manager D in seat S3, department manager A in seat S4, and employee B in seat S5. In this case, department manager A, employee B, department manager C, and section manager D are the users of the karaoke apparatus 1.

The directional microphones M1 to M5 collect conversation voices from the time each user sits down and output them to the extraction means 100. The extraction means 100 extracts voiceprint data for each seat from the conversation voices. Because there are four users in this example, four items of voiceprint data are normally extracted. Note that the karaoke apparatus 1 cannot identify who is sitting in each seat or which user each item of extracted voiceprint data belongs to.

The directional microphones and the seats correspond one to one. Therefore, for example, voiceprint data V11 extracted from a conversation voice collected by directional microphone M1 could be judged to be the voiceprint data of the highest-ranked user. In this example, seat S1 is unoccupied, and voiceprint data V12 to V15 are extracted from the conversation voices collected by directional microphones M2 to M5.

The determining means 200 looks up each user's seating position (that is, the installation position of each directional microphone) in the seating position score table and calculates a score for each item of voiceprint data. The determining means 200 then determines the ranks according to these scores.

For example, voiceprint data V12 is extracted from the conversation voice collected by directional microphone M2 installed at seat position S2. The determining means 200 therefore calculates a score of 30 points for voiceprint data V12 from seat position S2 and the seating position score table. Similarly, voiceprint data V13 is extracted from the conversation voice collected by directional microphone M3 installed at seat position S3, so the determining means 200 calculates a score of 20 points for voiceprint data V13. Voiceprint data V14 is extracted from the conversation voice collected by directional microphone M4 installed at seat position S4, so the determining means 200 calculates a score of 10 points for voiceprint data V14. Voiceprint data V15 is extracted from the conversation voice collected by directional microphone M5 installed at seat position S5, so the determining means 200 calculates a score of 0 points for voiceprint data V15.

In this case, the determining means 200 ranks voiceprint data V12 first, voiceprint data V13 second, voiceprint data V14 third, and voiceprint data V15 fourth (see FIG. 12).
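The seat-based lookup and ranking in this example can be sketched as follows. The seat scores are those stated for the table of FIG. 11; the dictionary layout and the names `SEAT_SCORES`, `voiceprint_seat`, and `rank_by_seat` are illustrative assumptions rather than the data structures of the embodiment.

```python
# Seat scores as described for FIG. 11: the seat farthest from the
# entrance (S1) scores highest.
SEAT_SCORES = {"S1": 40, "S2": 30, "S3": 20, "S4": 10, "S5": 0}

# Each voiceprint is tied to the seat whose directional microphone captured
# it. Seat S1 is unoccupied in the example, so no voiceprint maps to it.
voiceprint_seat = {"V12": "S2", "V13": "S3", "V14": "S4", "V15": "S5"}

def rank_by_seat(voiceprint_seat):
    """Look up each voiceprint's seat score and rank from highest to lowest."""
    scores = {vp: SEAT_SCORES[seat] for vp, seat in voiceprint_seat.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {vp: place for place, vp in enumerate(ordered, start=1)}
    return scores, ranks

scores, ranks = rank_by_seat(voiceprint_seat)
```

Because ranks are assigned only among the voiceprints actually extracted, V12 is ranked first even though seat S2 does not carry the highest score in the table.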

== Processing in the karaoke apparatus ==
Next, a specific example of processing in the karaoke apparatus 1 according to this embodiment will be described with reference to FIG. 13. FIG. 13 is a flowchart showing an example of processing in the karaoke apparatus 1. In this embodiment, it is assumed that a plurality of users have entered karaoke room R shown in FIG. 9 and have each taken a seat. The second storage means 700 in this embodiment stores the seating position score table.

After the users are seated, each directional microphone collects the conversation voice of the user in its seat (collection of conversation voices for each seat; step 50).

The extraction means 100 extracts voiceprint data of each user from the conversation voices collected in step 50 (extraction of voiceprint data; step 51).

The determining means 200 determines the rank of each item of voiceprint data from a score calculated based on the seating position score table and the seating position corresponding to the conversation voices collected in step 50 (determination of the ranks of the voiceprint data based on seating position and the like; step 52).

The determining means 200 stores the rank information determined in step 52 in the storage unit 10a in association with the voiceprint data extracted in step 51 (storage of rank information and voiceprint data; step 53).

The processing from step 54 onward is the same as the processing from step 14 onward in the first embodiment, so a detailed description is omitted.

As described above, the karaoke apparatus 1 according to this embodiment has second storage means 700 that stores a seating position score table associating users' seating positions within the predetermined range with predetermined scores, and the sound collection unit is a plurality of directional microphones M1 to M5 provided at the respective users' seating positions. The determining means 200 determines the ranks from a score calculated based on the seating position score table and the seating position corresponding to each collected conversation voice. Using the seating position corresponding to each conversation voice in this way allows the ranks to be determined more reliably.

The above embodiment describes an example using directional microphones installed at the respective seats, but the configuration is not limited to this. For example, two microphones may be used to collect the conversation voices of a plurality of users as a stereo signal. The karaoke apparatus 1 may then separate the collected conversation voices into individual sound sources and detect the localization and volume of each voice to determine each user's position (the seat in which the user is sitting). Known techniques can be used for the sound source separation and for detecting the localization and volume of the conversation voices.
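As one illustration of how localization from a stereo signal might map a voice to a seat, the sketch below uses a simple inter-channel level difference. This is a swapped-in textbook cue, not the technique of the embodiment, which only refers to known techniques. It assumes the voice has already been separated from the other sources, and the pan position assigned to each seat in `SEAT_PAN` is hypothetical and would need calibration in a real room.

```python
import math

# Hypothetical pan position of each seat as heard by the stereo pair:
# -1.0 is fully left, +1.0 is fully right.
SEAT_PAN = {"S1": -0.8, "S2": -0.4, "S3": 0.0, "S4": 0.4, "S5": 0.8}

def rms(samples):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def estimate_pan(left, right):
    """Inter-channel level difference, normalized to the range [-1, 1]."""
    left_level, right_level = rms(left), rms(right)
    if left_level + right_level == 0:
        return 0.0
    return (right_level - left_level) / (left_level + right_level)

def nearest_seat(pan):
    """Seat whose assumed pan position is closest to the estimate."""
    return min(SEAT_PAN, key=lambda seat: abs(SEAT_PAN[seat] - pan))

# One already-separated voice (a 200 Hz test tone sampled at 8 kHz),
# louder in the right channel than in the left.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
left = [0.3 * x for x in tone]
right = [0.7 * x for x in tone]

seat = nearest_seat(estimate_pan(left, right))
```

A level difference alone cannot distinguish seats at different distances along the same direction, which is one reason the description also mentions volume as a cue.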

The determining means 200 may also obtain gender information from the collected conversation voices using a known technique and determine the ranks using it together with the seat position information. For example, when a user is seated next to a person of the opposite sex, that user can be regarded as being attended to by a companion, and hence as a party being entertained, that is, a higher-ranked user.

<Others>
The first to third embodiments can be combined as appropriate. For example, for each item of voiceprint data, the determining means 200 may obtain the sum of (a) a score based on the proportion of honorific-language characters in the total number of characters, obtained by detecting honorific language in the transcribed conversation voice (see the first embodiment), (b) a score calculated by looking up the detected honorific title information in the honorific title score table (second embodiment), and (c) a score calculated based on the seating position score table and the seating position corresponding to the collected conversation voice (third embodiment), and may determine the ranks based on that sum.
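The combination described here amounts to adding three per-voiceprint scores and ranking by the sum. The sketch below assumes that all three scores are already expressed so that a higher value means a higher rank, and that they are added without weighting; the score values and the name `combined_ranking` are hypothetical.

```python
def combined_ranking(keigo, title, seat):
    """Sum three per-voiceprint scores and rank by the total (1 = highest)."""
    voiceprints = keigo.keys() & title.keys() & seat.keys()
    totals = {vp: keigo[vp] + title[vp] + seat[vp] for vp in voiceprints}
    # Sort by descending total; ties are broken by voiceprint name.
    ordered = sorted(totals, key=lambda vp: (-totals[vp], vp))
    ranks = {vp: place for place, vp in enumerate(ordered, start=1)}
    return totals, ranks

# Hypothetical scores for three voiceprints; all values are illustrative.
keigo_scores = {"V1": 5, "V2": 30, "V3": 20}    # first embodiment
title_scores = {"V1": 60, "V2": 20, "V3": 35}   # second embodiment
seat_scores = {"V1": 30, "V2": 10, "V3": 20}    # third embodiment

totals, ranks = combined_ranking(keigo_scores, title_scores, seat_scores)
```

Since the three scores are on different scales, a plain sum lets the largest-valued score dominate; whether and how to normalize them is left open by the description.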

The above embodiments are presented as examples and are not intended to limit the scope of the invention. The configurations described above can be combined as appropriate, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiments and their modifications fall within the scope and gist of the invention, and likewise within the invention described in the claims and its equivalents.

1 Karaoke apparatus
100 Extraction means
200 Determining means
300 Scoring means
400 Correction means
500 Presentation means
600 First storage means
700 Second storage means

Claims

1. A karaoke apparatus having a scoring function for evaluating karaoke singing by a user and presenting a scoring result, the karaoke apparatus comprising:
a sound collection unit capable of collecting conversation voices of all users within a predetermined range; and
a control unit that controls the karaoke apparatus,
wherein the control unit functions as:
extraction means for extracting voiceprint data of each user from the collected conversation voices;
determining means for determining a rank of each item of voiceprint data based on the collected conversation voices and storing information on the rank in association with the corresponding voiceprint data; and
correction means for, when a user sings karaoke, identifying voiceprint data that matches voiceprint authentication data extracted from the singing voice of that user and correcting the scoring result of the karaoke singing by that user based on the information on the rank associated with the identified voiceprint data.

2. The karaoke apparatus according to claim 1, wherein the determining means determines the rank from a score calculated based on honorific-language information obtained by processing the collected conversation voices.

3. The karaoke apparatus according to claim 1, further comprising first storage means for storing an honorific title score table that associates users' honorific titles with predetermined scores,
wherein the determining means determines the rank from a score calculated based on the honorific title score table and honorific title information obtained by processing the collected conversation voices.

4. The karaoke apparatus according to any one of the preceding claims, further comprising second storage means for storing a seating position score table that associates users' seating positions within the predetermined range with predetermined scores,
wherein the sound collection unit is a plurality of directional microphones, and
the determining means determines the rank from a score calculated based on the seating position score table and the seating position corresponding to the collected conversation voice.