JP5553446B2

JP5553446B2 - Amusement system

Info

Publication number: JP5553446B2
Application number: JP2010235043A
Authority: JP
Inventors: 基康田中
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2010-10-20
Filing date: 2010-10-20
Publication date: 2014-07-16
Anticipated expiration: 2030-10-20
Also published as: JP2012088521A

Description

The present invention relates to an amusement system in which a user can sing along with music data as it is played back.

In karaoke, the user generally sings along with a performance of a song that the user has selected. The monitor of the karaoke apparatus displays the lyrics of the selected song in step with the progress of the performance, so the user can enjoy karaoke without having memorized all of the lyrics. Behind the lyrics, the monitor shows video of a character singing the song, video matching the image of the song, or the like.

Karaoke is often enjoyed in small groups such as families or friends. Patent Document 1 discloses a device that lets the singing user generate sound effects and the like at a timing of the user's choosing in order to liven up karaoke.

The electronic percussion device according to Patent Document 1 below is shaped like a percussion instrument such as maracas and includes an acceleration sensor. The device lights an LED to notify the user of periods during which it can be operated, such as while an interlude of the song is being played. When the user shakes the device while the LED is lit, a percussion sound is played.

Patent Document 1: JP 2004-287020 A

As described above, karaoke is often enjoyed in small groups. The people who hear the user sing are limited to the family members, friends, and others taking part. In other words, karaoke does not let the user experience having a large audience listen to the user's singing, as a singer at a live concert does.

In view of the above problem, an object of the present invention is to provide an amusement system that lets a user have a simulated experience of a highly realistic live performance.

To solve the above problem, the invention according to claim 1 is an amusement system comprising: a main device; a voice input device that is held by a user and outputs voice input by the user as voice data; and a voice recognition device that performs voice recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user. The voice input device includes a first motion information output unit that outputs first motion information indicating motion of the voice input device. The main device includes: a playback unit that plays back music data selected by the user; a performance identification unit that identifies a performance of the user based on the phrase information and the first motion information; and a reaction instruction unit that determines an audience reaction level for the identified performance based on the time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of pieces of reaction data indicating audience reactions, playback reaction data corresponding to the identified performance and the audience reaction level, and instructs the playback unit to play back the playback reaction data.

The invention according to claim 2 is an amusement system comprising: a main device; a voice input device that is held by a user and outputs voice input by the user as voice data; an imaging device that captures the user and outputs video data; and a voice recognition device that performs voice recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user. The main device includes: a playback unit that plays back music data selected by the user; a video analysis unit that analyzes the video data and generates first motion information indicating motion of the user; a performance identification unit that identifies a performance of the user based on the phrase information and the first motion information; and a reaction instruction unit that determines an audience reaction level for the identified performance based on the time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of pieces of reaction data indicating audience reactions, playback reaction data corresponding to the identified performance and the audience reaction level, and instructs the playback unit to play back the playback reaction data.

The invention according to claim 3 is the amusement system according to claim 1 or claim 2, wherein the reaction instruction unit determines a playback condition for the playback reaction data based on the audience reaction level and instructs the playback unit of the playback condition.

The invention according to claim 4 is the amusement system according to claim 3, wherein the reaction instruction unit determines the audience reaction level based on an audience demographic set by the user.

The invention according to claim 5 is the amusement system according to any one of claims 1 to 4, wherein the performance identification unit determines that the user has performed a performance of stirring up the audience when it determines, based on the phrase information, that the user has called out a specific phrase in question form to the audience and, based on the first motion information, that the orientation of the voice input device has been reversed, and the reaction instruction unit selects, as the playback reaction data, reaction data in which video and audio of the audience responding in unison to the specific phrase are recorded.

The invention according to claim 6 is the amusement system according to any one of claims 1 to 4, wherein the performance identification unit determines that the user has performed a performance of requesting the audience to sing along when it determines, based on the phrase information, that the user has called out a specific phrase requesting the audience to sing along and, based on the first motion information, that the orientation of the voice input device has been reversed, and the reaction instruction unit selects, as the playback reaction data, reaction data in which video and audio of the audience singing along are recorded.

The invention according to claim 7 is the amusement system according to any one of claims 1 to 4, wherein the performance identification unit determines that the user has performed a performance of leading hand clapping when it determines, based on the motion information, that the user is clapping, and the reaction instruction unit selects, as the playback reaction data, reaction data in which video of the audience clapping and the sound of the clapping are recorded.

The invention according to claim 8 is the amusement system according to any one of claims 1 to 4, wherein the performance identification unit determines that the user has performed a performance of requesting the audience to wave both hands when it determines, based on the motion information, that the user is making an arm-waving motion, and the reaction instruction unit selects, as the playback reaction data, reaction data in which video of the audience waving both hands is recorded.

The invention according to claim 9 is the amusement system according to any one of claim 1 and claims 3 to 8, further comprising a controller that the user holds in the hand opposite the hand holding the voice input device, wherein the controller includes a second motion information output unit that outputs second motion information indicating motion of the controller, and the performance identification unit identifies the performance of the user based on the second motion information.

The invention according to claim 10 is the amusement system according to any one of claims 3 to 9, wherein the reaction instruction unit scores all performances performed in a first part of the music data based on the audience reaction level for each performance, and scores all performances performed in a second part of the music data based on the audience reaction level for each performance.

The invention according to claim 11 is an amusement system comprising a main device and a voice input device held by a user, wherein the voice input device includes a wireless communication unit that performs wireless communication with the main device, and the main device includes: a playback unit that plays back music data selected by the user; a performance identification unit that identifies a performance of the user based on the presence or absence of a wireless signal transmitted from the voice input device; and a reaction instruction unit that selects, from among a plurality of pieces of reaction data in which audio or video indicating audience reactions is recorded, playback reaction data corresponding to the identified performance and instructs the playback unit to play back the playback reaction data.

The invention according to claim 12 is the amusement system according to claim 11, wherein the performance identification unit determines that the user has entered a virtual live venue when it detects the wireless signal after playback of the music data has started.

The invention according to claim 13 is the amusement system according to claim 11 or claim 12, wherein the performance identification unit determines that the user has left the virtual live venue when it can no longer detect the wireless signal after playback of the music data has started.
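The entry and exit determinations of claims 12 and 13 can be pictured as edge detection on the presence of the wireless signal. The following Python fragment is only a minimal sketch under stated assumptions: the function name `venue_events`, the sample format, and the event labels are hypothetical and do not appear in the specification, and all samples are assumed to be taken after playback of the music data has started.

```python
def venue_events(signal_samples):
    """Turn (time, signal_present) samples into enter/exit events.

    Hypothetical helper: the samples are assumed to be taken after
    playback of the music data has started.
    """
    events = []
    present = False  # assume no signal is being received when playback starts
    for time, has_signal in signal_samples:
        if has_signal and not present:
            # Signal newly detected: treat as entering the virtual venue.
            events.append((time, "enter"))
        elif not has_signal and present:
            # Signal no longer detected: treat as leaving the virtual venue.
            events.append((time, "exit"))
        present = has_signal
    return events
```

Reporting only transitions means that a continuously present signal produces a single "enter" event rather than one event per sample.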

The invention according to claim 14 is a voice input device used in the amusement system according to any one of claims 1 to 13.

The invention according to claim 15 is a program for causing a computer mounted in a main device to function as a playback unit, a performance identification unit, and a reaction instruction unit, the main device being capable of communicating with a voice input device that is held by a user, outputs voice input by the user as voice data, and outputs first motion information indicating its own motion, and with a voice recognition device that performs voice recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user. The playback unit plays back music data selected by the user. The performance identification unit identifies a performance of the user based on at least one of the phrase information and the first motion information. The reaction instruction unit determines an audience reaction level for the identified performance based on the time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of pieces of reaction data indicating audience reactions, playback reaction data corresponding to the identified performance and the audience reaction level, and instructs the playback unit to play back the playback reaction data.

The invention according to claim 16 is a program for causing a computer mounted in a main device to function as a playback unit, a video analysis unit, a performance identification unit, and a reaction instruction unit, the main device being capable of communicating with a voice input device that is held by a user and outputs voice input by the user as voice data, with an imaging device that captures the user and outputs video data, and with a voice recognition device that performs voice recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user. The playback unit plays back music data selected by the user. The video analysis unit analyzes the video data and generates first motion information indicating motion of the user. The performance identification unit identifies a performance of the user based on at least one of the phrase information and the first motion information. The reaction instruction unit determines an audience reaction level for the identified performance based on the time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of pieces of reaction data indicating audience reactions, playback reaction data corresponding to the identified performance and the audience reaction level, and instructs the playback unit to play back the playback reaction data.

The invention according to claim 17 is a program for causing a computer mounted in a main device capable of wireless communication with a voice input device held by a user to function as: a playback unit that plays back music data selected by the user; a performance identification unit that identifies a performance of the user based on the presence or absence of a wireless signal transmitted from the voice input device; and a reaction instruction unit that selects, from among a plurality of pieces of reaction data indicating audience reactions, playback reaction data corresponding to the identified performance and instructs the playback unit to play back the playback reaction data.

In the present invention, the main device plays back audience status data indicating the state of a live audience and displays it on a monitor or the like. Based on phrase information indicating a phrase that the user has input to the microphone and motion information indicating the user's motion, the main device identifies the live performance performed by the user and plays back reaction data indicating the audience's reaction to that live performance. The user can thus enjoy karaoke while acting as a singer performing live for the displayed audience, and can have a simulated experience of a highly realistic live performance.

Overall view of a karaoke system according to an embodiment of the present invention.
Block diagram showing the functional configuration of the main device.
Block diagram showing the functional configuration of the microphone.
Block diagram showing the functional configuration of the controller.
Flowchart showing the flow of operation of the main device.
Diagram showing video of the audience displayed on the monitor screen.
Diagram showing a procedure for stirring up the audience in the screen.
Diagram showing a procedure for stirring up the audience in the screen.
Diagram showing a procedure for leading the audience in the screen in singing along.
Diagram showing a procedure for leading the audience in the screen in singing along.
Diagram showing a procedure for leading the audience in the screen in hand clapping.
Diagram showing a procedure for leading the audience in the screen in hand clapping.
Diagram showing a procedure for leading the audience in the screen in a wave (a motion of waving both hands widely).
Diagram showing a procedure for leading the audience in the screen in a wave.
Diagram showing a procedure by which the user virtually enters the live venue.
Diagram showing a setting screen for setting the audience demographic of the live performance.

Embodiments of the present invention are described below with reference to the drawings. In the present embodiment, a karaoke system is described as an example of the amusement system.

{1. Overall configuration of karaoke system 1}
FIG. 1 is an overall view of the karaoke system 1. The karaoke system 1 includes a main device 2, a monitor 3, a microphone 4, and a controller 5.

The main device 2 is a processing device that performs overall control of the karaoke system 1, and plays back music data and video data corresponding to the song selected by the user. The user selects the song to sing using a remote control (not shown).

The monitor 3 is a liquid crystal display or the like and displays video output from the main device 2. The monitor 3 includes a speaker (not shown) and outputs, as sound, the music data played back by the main device 2.

The microphone 4 is a voice input device into which the user inputs voice when singing. The microphone 4 has a motion detection function that detects its own motion, and transmits motion information indicating the user's motion to the main device 2.

The controller 5 has a motion detection function that detects its own motion, and transmits motion information to the main device 2. The user sings holding the microphone 4 in one hand and the controller 5 in the other. As shown in FIG. 1, the controller 5 has a rectangular parallelepiped shape, but it may instead have a shape that can be worn on the user's arm, like a wristband. In that case, the user can be prevented from accidentally dropping the controller 5.

The main device 2 outputs the performance sound of the song selected by the user and causes the monitor 3 to display video recording the scene at a live venue filled with a large audience. That is, the monitor 3 displays video of a large audience watching the singer (the user) standing on the stage of the live venue.

The user not only sings the selected song but also performs various live performances for the audience, for example by calling out through the microphone 4 to the audience in the screen 3a or by pointing the microphone 4 toward the audience. The main device 2 detects the motion of both of the user's hands based on the motion information transmitted from the microphone 4 and the controller 5. The main device 2 identifies the user's live performance based on the voice input to the microphone 4 and the motion information of the microphone 4 and the controller 5. Video showing the audience's reaction to the identified live performance is displayed on the monitor 3. By using the karaoke system 1, the user can thus behave like a singer actually performing at a live venue, and can have a simulated experience of a highly realistic live performance.

Next, the configuration of the main device 2 is described. FIG. 2 is a block diagram showing the functional configuration of the main device 2. The main device 2 includes a wireless communication unit 21, a data acquisition unit 22, a playback unit 23, a voice recognition unit 24, a performance identification unit 25, a reaction instruction unit 26, a storage unit 27, and an output unit 28.

The wireless communication unit 21 communicates wirelessly with the microphone 4 and the controller 5 and acquires motion information 42A and 51A from them. Bluetooth (registered trademark), wireless LAN, infrared communication, or the like can be used for the wireless communication.

The data acquisition unit 22 acquires music data 61, audience video data 62, and a plurality of pieces of reaction data 63, 63, ... corresponding to the song selected by the user from a music server via the Internet or the like. The music data 61 is data recording the performance sound of the song selected by the user. The audience video data 62 is audience status data indicating the state of the large number of spectators (the audience) who have entered the live venue, and records video and audio of the spectators. Here, "audience" refers to the people who listen at the live venue to the user singing along with the song, and is a concept that includes the spectators. The reaction data 63 is data recording video and audio that show the audience's reaction to a live performance. Each piece of reaction data 63 corresponds to one of a plurality of live performances and to one of the audience reaction levels. The reaction levels are described in detail later.
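Because each piece of reaction data 63 corresponds to one live performance and one reaction level, the set of reaction data can be pictured as a table keyed by that pair. The following Python fragment is a minimal sketch under stated assumptions: the class name `ReactionData`, the performance labels, the integer levels, and the clip identifiers are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionData:
    performance: str  # hypothetical label, e.g. "audience_call"
    level: int        # hypothetical integer audience reaction level
    clip: str         # hypothetical identifier of the recorded video/audio


def build_index(items):
    """Index reaction data by (performance, level) for direct lookup."""
    return {(item.performance, item.level): item for item in items}


# Illustrative entries only; the real data would come from the music server.
library = build_index([
    ReactionData("audience_call", 1, "call_weak"),
    ReactionData("audience_call", 2, "call_strong"),
    ReactionData("clapping", 1, "clap_weak"),
])
```

With such an index, selecting the reaction data for an identified performance and a determined level is a single lookup, for example `library[("audience_call", 2)]`.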

In the present embodiment, an example is described in which the audience video data 62 is used as the audience status data indicating the state of a large number of spectators. However, audio data indicating the state of a large number of spectators (an audience) may be used instead. Specifically, audio data recording the cheers of the spectators (the audience) or the like can be used in place of the audience video data 62.

The playback unit 23 plays back the music data 61 and the audience video data 62 acquired by the data acquisition unit 22. The voice recognition unit 24 performs voice recognition processing on the voice data 4A transmitted from the microphone 4 and detects phrases that the user has input to the microphone 4. The voice recognition unit 24 outputs phrase information 24A indicating the detected phrase.

The performance identification unit 25 identifies the live performance performed by the user based on the phrase information 24A and the motion information 42A and 51A. When the performance identification unit 25 determines that the phrase information 24A contains a specific phrase (a phrase with which a singer calls out to the audience) and that the user has made a motion corresponding to that phrase, it determines that the user has performed a live performance.
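The two-part condition applied by unit 25 (a specific phrase plus the motion that goes with it) can be sketched as a table lookup followed by a motion check. The Python fragment below is an illustration under stated assumptions: the phrases, the motion label `mic_reversed`, and the performance names are hypothetical placeholders, and the specification does not prescribe this data layout.

```python
from typing import Optional

# Hypothetical table: recognized phrase -> (required motion, performance name).
PHRASE_TABLE = {
    "are you having fun?": ("mic_reversed", "audience_call"),
    "sing along!": ("mic_reversed", "chorus_request"),
}


def identify_performance(phrase: str, motion: str) -> Optional[str]:
    """Return the performance name only if phrase and motion both match."""
    entry = PHRASE_TABLE.get(phrase.strip().lower())
    if entry is None:
        return None  # not a phrase addressed to the audience
    required_motion, performance = entry
    # The phrase alone is not enough; the corresponding motion must occur too.
    return performance if motion == required_motion else None
```

For example, `identify_performance("Sing along!", "mic_reversed")` yields `"chorus_request"`, while the same phrase without the matching motion yields `None`.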

The reaction instruction unit 26 determines the audience reaction level for the identified live performance. Reaction data 63 indicating the audience's reaction is selected based on the identified live performance and the audience reaction level. The reaction instruction unit 26 determines a playback condition for the selected reaction data 63 based on the audience reaction level. The playback condition is, for example, the volume at which the selected reaction data 63 is played back. The reaction instruction unit 26 instructs the playback unit 23 to play back the selected reaction data 63 under the determined playback condition.
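Claim 1 states that the reaction level is determined from the time difference between detection of the phrase and detection of the motion, and the paragraph above states that the level in turn sets a playback condition such as volume. The Python fragment below sketches one possible mapping. It rests on assumptions that are not in the specification: three levels, thresholds of 1 and 3 seconds, the rule that a smaller time difference gives a higher level, and the volume values.

```python
def reaction_level(phrase_time: float, motion_time: float) -> int:
    """Map the phrase/motion time difference (seconds) to a level 1..3.

    Assumption: a smaller time difference yields a higher level.
    The thresholds are illustrative only.
    """
    gap = abs(motion_time - phrase_time)
    if gap <= 1.0:
        return 3
    if gap <= 3.0:
        return 2
    return 1


def playback_volume(level: int) -> float:
    """Map a reaction level to a relative playback volume (assumed values)."""
    return {1: 0.4, 2: 0.7, 3: 1.0}[level]
```

Keeping the level and the playback condition as separate steps mirrors the description, in which the level is first used to select the reaction data 63 and then used again to decide how that data is played back.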

The storage unit 27 is a hard disk drive or the like and stores the music data 61, the audience video data 62, and the reaction data 63, 63, ... acquired by the data acquisition unit 22. The output unit 28 outputs the audio and video generated by the playback unit 23 to the monitor 3.

Next, the configuration of the microphone 4 is described. FIG. 3 is a block diagram showing the functional configuration of the microphone 4. The microphone 4 includes a voice input unit 41, a sensor unit 42, and a wireless communication unit 43.

The voice input unit 41 receives the voice uttered by the user, converts it into an electrical signal, and outputs voice data 4A. The sensor unit 42 includes an acceleration sensor that detects changes in the motion of the microphone 4, and outputs motion information 42A indicating the motion of the microphone 4. In addition to the acceleration sensor, the sensor unit 42 may include a geomagnetic sensor, a gyroscope, or the like. The wireless communication unit 43 communicates wirelessly with the main device 2 and transmits the voice data 4A and the motion information 42A to the main device 2.

Next, the configuration of the controller 5 is described. FIG. 4 is a block diagram showing the functional configuration of the controller 5. The controller 5 includes a sensor unit 51 and a wireless communication unit 52. Like the sensor unit 42 of the microphone 4, the sensor unit 51 includes an acceleration sensor, and outputs motion information 51A indicating the motion of the controller 5. The wireless communication unit 52 communicates wirelessly with the main device 2 and transmits the motion information 51A to the main device 2.

なお、カラオケシステム１は、ユーザの動きを撮影するカメラを備えていてもよい。この場合、本体装置２は、映像データを解析してユーザの動きを検出する画像解析部を備える。カメラは、たとえば、モニタ３の上に設置され、ユーザを撮影した映像データをリアルタイムに本体装置２に出力する。ユーザがモニタ３及びカメラの前でライブパフォーマンスを行うことにより、画像処理部は、ユーザの動きを示す動き情報を生成して出力する。パフォーマンス特定部２５は、フレーズ情報２４Ａと、画像解析部から出力される動き情報とに基づいて、ユーザが行ったライブパフォーマンスを特定する。この場合、カラオケシステム１は、コントローラ５を備える必要はなく、マイク４にセンサ部４２を設けなくてもよい。 The karaoke system 1 may include a camera that captures a user's movement. In this case, the main body device 2 includes an image analysis unit that analyzes video data and detects a user's movement. For example, the camera is installed on the monitor 3 and outputs video data obtained by photographing the user to the main unit 2 in real time. When the user performs a live performance in front of the monitor 3 and the camera, the image processing unit generates and outputs motion information indicating the user's motion. The performance specifying unit 25 specifies the live performance performed by the user based on the phrase information 24A and the motion information output from the image analysis unit. In this case, the karaoke system 1 does not need to include the controller 5, and the sensor unit 42 may not be provided in the microphone 4.

｛２．本体装置２の動作｝ {2. Operation of main device 2}
以下、ユーザのライブパフォーマンスに応じて、モニタ３から出力される観客の映像及び音声が変更される処理について、本体装置２の動作を中心に説明する。以下の説明では、マイク４及びコントローラ５から送信される動き情報４２Ａ，５１Ａを用いる例について説明する。カメラを用いてユーザの動きを検出する場合であっても、同様の処理が行われる。 Hereinafter, the process of changing the video and audio of the audience output from the monitor 3 in accordance with the user's live performance will be described, focusing on the operation of the main device 2. The following description uses an example in which the motion information 42A and 51A transmitted from the microphone 4 and the controller 5 is used. Similar processing is performed even when the user's movement is detected using a camera.

図５は、本体装置２の動作の流れを示すフローチャートである。最初に、ユーザは、図示しないリモコンを操作して、カラオケで歌いたい楽曲を選択する。データ取得部２２は、ユーザにより選択された楽曲に対応する楽曲データ６１と、観客映像データ６２と、リアクションデータ６３，６３，・・・とを楽曲サーバから取得して、記憶部２７に格納する（ステップＳ１）。 FIG. 5 is a flowchart showing an operation flow of the main device 2. First, the user operates a remote controller (not shown) to select a song that he wants to sing at karaoke. The data acquisition unit 22 acquires music data 61 corresponding to the music selected by the user, audience video data 62, reaction data 63, 63,... From the music server and stores them in the storage unit 27. (Step S1).

再生部２３は、楽曲データ６１及び観客映像データ６２の再生を開始する（ステップＳ２）。これにより、ユーザが選択した楽曲の演奏音と、ライブ会場でライブを楽しむ観客の映像及び歓声とが、モニタ３から出力される。 The reproducing unit 23 starts reproducing the music data 61 and the audience video data 62 (step S2). Thereby, the performance sound of the music selected by the user and the video and cheers of the audience enjoying the live performance at the live venue are output from the monitor 3.

図６は、多くの観客の映像が表示されたモニタ３を示す図である。ユーザは、画面３ａに多くの観客の映像が表示されたモニタ３の前に立つことにより、画面３ａ内の多くの観客と向かい合う。すなわち、ユーザは、モニタ３に映し出された観客に対してライブを行うシンガーとして、楽曲の演奏音に合わせて歌う。ユーザは、ライブを行うシンガーとして、歌を歌いながら、様々なライブパフォーマンスを行うことができる。 FIG. 6 is a diagram showing the monitor 3 on which many spectator videos are displayed. The user stands in front of the monitor 3 on which images of many spectators are displayed on the screen 3a, so as to face many spectators in the screen 3a. That is, the user sings along with the performance sound of the music as a singer that performs live for the audience displayed on the monitor 3. The user can perform various live performances while singing a song as a live singer.

本体装置２は、楽曲データ６１の再生が終了するまで（ステップＳ３においてＹｅｓ）、ステップＳ４〜ステップＳ９の処理を繰り返し実行する。 The main device 2 repeatedly executes the processes of steps S4 to S9 until the reproduction of the music data 61 ends (Yes in step S3).

マイク４は、音声入力部４１に入力された音声フレーズを、音声データ４Ａとしてリアルタイムに本体装置２に送信している。本体装置２が音声データ４Ａを受信した場合（ステップＳ４においてＹｅｓ）、音声認識部２４は、音声データ４Ａに対する音声認識処理を実行する（ステップＳ５）。音声認識部２４は、ユーザの発した音声フレーズを記録したフレーズ情報２４Ａを出力する。パフォーマンス特定部２５は、フレーズ情報２４Ａに基づいて、ユーザが画面３ａ内の観客に対して特定のフレーズを呼び掛けた否かを判定する（ステップＳ６）。 The microphone 4 transmits the voice phrase input to the voice input unit 41 to the main device 2 in real time as voice data 4A. When the main device 2 receives the voice data 4A (Yes in step S4), the voice recognition unit 24 executes voice recognition processing on the voice data 4A (step S5). The voice recognition unit 24 outputs phrase information 24A in which the voice phrase uttered by the user is recorded. Based on the phrase information 24A, the performance specifying unit 25 determines whether or not the user has called out a specific phrase to the audience in the screen 3a (step S6).

フレーズ情報２４Ａが特定のフレーズを含まない場合（ステップＳ６においてＮｏ）、マイク４に入力された音声は、ユーザの歌声であると判定される。本体装置２は、ステップＳ３の処理に戻る。一方、フレーズ情報２４Ａが特定のフレーズを含む場合（ステップＳ６においてＹｅｓ）、パフォーマンス特定部２５は、ユーザが観客に特定のフレーズを呼び掛けたと判断する。そして、パフォーマンス特定部２５は、特定のフレーズに応じた動きをしたか否かを、動き情報４２Ａ，５１Ａに基づいて確認する。 When the phrase information 24A does not include a specific phrase (No in step S6), the voice input to the microphone 4 is determined to be the user's singing voice, and the main device 2 returns to the process of step S3. On the other hand, when the phrase information 24A includes a specific phrase (Yes in step S6), the performance specifying unit 25 determines that the user has called out the specific phrase to the audience. The performance specifying unit 25 then checks, based on the motion information 42A and 51A, whether the user has made a movement corresponding to the specific phrase.

マイク４は、センサ部４２が出力する動き情報４２Ａを本体装置２にリアルタイムに送信する。同様に、コントローラ５は、センサ部５１が出力する動き情報５１Ａを本体装置２にリアルタイムに送信する。パフォーマンス特定部２５は、動き情報４２Ａ，５１Ａに基づいて、特定のフレーズに応じた動きをユーザが行ったか否かを判定する（ステップＳ７）。 The microphone 4 transmits the movement information 42A output from the sensor unit 42 to the main body device 2 in real time. Similarly, the controller 5 transmits the motion information 51A output from the sensor unit 51 to the main body device 2 in real time. The performance specifying unit 25 determines whether or not the user has made a movement according to the specific phrase based on the movement information 42A and 51A (step S7).

特定のフレーズに応じた動きが検出されなかった場合（ステップＳ７においてＮｏ）、本体装置２は、ステップＳ３の処理に戻る。 If no movement corresponding to the specific phrase is detected (No in step S7), main device 2 returns to the process in step S3.

特定のフレーズに応じた動きが検出された場合（ステップＳ７においてＹｅｓ）、パフォーマンス特定部２５は、ユーザが画面３ａ内の観客に対してライブパフォーマンスを行ったと判定する。ユーザが行うライブパフォーマンスとして、観客煽り、観客の合唱の先導、手拍子の先導などが挙げられる。 When a movement corresponding to the specific phrase is detected (Yes in step S7), the performance specifying unit 25 determines that the user has performed a live performance for the audience in the screen 3a. Examples of live performances performed by the user include exciting the audience, leading an audience chorus, and leading hand clapping.

反応指示部２６は、フレーズ情報２４Ａ及び動き情報４２Ａ，５１Ａを用いて、ユーザのライブパフォーマンスに対する観客の反応レベルを決定する（ステップＳ８）。反応指示部２６は、ユーザが行ったライブパフォーマンスと、反応レベルとに基づいて、リアクションデータ６３を選択する。たとえば、反応レベルが高い場合、反応指示部２６は、ユーザのライブパフォーマンスに対して全観客が反応する映像が記録されたリアクションデータ６３を選択する。 The reaction instruction unit 26 uses the phrase information 24A and the movement information 42A, 51A to determine the audience reaction level with respect to the user's live performance (step S8). The reaction instruction unit 26 selects the reaction data 63 based on the live performance performed by the user and the reaction level. For example, when the reaction level is high, the reaction instructing unit 26 selects the reaction data 63 in which an image in which the entire audience responds to the user's live performance is recorded.

反応指示部２６は、選択されたリアクションデータ６３の再生を、再生部２３に対して指示する。再生部２３は、指示されたリアクションデータ６３を、再生中の楽曲データ６１とともに再生する（ステップＳ９）。この結果、ユーザのライブパフォーマンスに対する観客のリアクションが画面３ａに表示されるとともに、楽曲の演奏音と、観客の歓声とが重なってモニタ３から出力される。 The reaction instruction unit 26 instructs the reproduction unit 23 to reproduce the selected reaction data 63. The reproducing unit 23 reproduces the instructed reaction data 63 together with the music data 61 being reproduced (step S9). As a result, the reaction of the audience with respect to the user's live performance is displayed on the screen 3a, and the performance sound of the music and the cheer of the audience overlap and are output from the monitor 3.

反応指示部２６は、リアクションデータ６３の再生条件を反応レベルに応じて変更してもよい。たとえば、反応レベルが高ければ、観客の歓声の音量を大きくしてもよい。反応レベルが低ければ、観客の歓声の音量を小さくしたり、観客の歓声が記録されたデータの再生速度を遅くしたりしてもよい。 The reaction instruction unit 26 may change the reproduction condition of the reaction data 63 according to the reaction level. For example, if the reaction level is high, the audience cheer volume may be increased. If the response level is low, the audience cheer volume may be reduced, or the playback speed of data recorded with the audience cheer may be reduced.
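
The adjustment of playback conditions described above can be sketched as follows. This is only a minimal illustration: the normalized reaction level in the range 0 to 1, the `PlaybackCondition` type, and the specific volume and speed ranges are assumptions made for this example and are not specified in this description.

```python
from dataclasses import dataclass


@dataclass
class PlaybackCondition:
    volume: float  # relative gain for the cheer audio, 1.0 = unchanged
    speed: float   # playback-rate multiplier, 1.0 = normal speed


def playback_condition(reaction_level: float) -> PlaybackCondition:
    """Map a reaction level in [0, 1] to hypothetical playback settings."""
    level = min(max(reaction_level, 0.0), 1.0)
    # Higher reaction level -> louder cheers (assumed range 0.5 .. 1.5).
    volume = 0.5 + level
    # Low reaction level -> slower playback (assumed floor of 0.8x),
    # reaching normal speed at a level of 0.5 and above.
    speed = min(1.0, 0.8 + 0.4 * level)
    return PlaybackCondition(volume=volume, speed=speed)
```

For instance, `playback_condition(0.0)` yields a quiet, slowed-down cheer, while `playback_condition(1.0)` yields a loud cheer at normal speed.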

図５に示すフローチャートでは、音声データ４Ａから特定のフレーズを検出した後に、検出した特定のフレーズに応じた動きが行われたか否かを判定している。しかし、ライブパフォーマンスの種別に応じて、ステップＳ４〜Ｓ６の処理と、動き検出（ステップＳ７）の処理の順序とを入れ替えてもよい。また、特定のフレーズ及びユーザの動きのいずれか一方に基づいて、ユーザのライブパフォーマンスを特定してもよい。 In the flowchart shown in FIG. 5, after detecting a specific phrase from the audio data 4 A, it is determined whether or not a movement according to the detected specific phrase has been performed. However, the processing order of steps S4 to S6 and the order of motion detection (step S7) may be interchanged according to the type of live performance. Moreover, you may identify a user's live performance based on any one of a specific phrase and a user's movement.

本実施の形態では、観客映像データ６２と、リアクションデータ６３とが異なるデータである例を説明しているが、これに限られない。たとえば、観客映像データ６２の各観客をオブジェクト化してもよい。この場合、本体装置２は、特定されたライブフォーマンと反応レベルとに基づいて、各観客オブジェクトの動きを変更する。たとえば、本体装置２は、特定されたライブパフォーマンスに対する観客の反応として、各観客オブジェクトの動作や歓声を変更することができる。また、反応指示部２６は、反応レベルに応じて、ユーザのライブパフォーマンスに対して反応する観客オブジェクトの割合を変化させることができる。 In the present embodiment, an example is described in which the audience video data 62 and the reaction data 63 are separate data, but the present invention is not limited to this. For example, each spectator in the audience video data 62 may be converted into an object. In this case, the main device 2 changes the movement of each audience object based on the identified live performance and the reaction level. For example, the main device 2 can change the motion and cheers of each audience object as the audience's reaction to the identified live performance. Further, the reaction instruction unit 26 can change the proportion of audience objects that react to the user's live performance according to the reaction level.
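
As a minimal sketch of varying the proportion of reacting audience objects, the function below picks a subset of object identifiers whose size grows with the reaction level. The function name, the identifiers, the normalized reaction level in 0 to 1, and the random selection are all assumptions for illustration; this description does not state how the reacting objects are chosen.

```python
import random


def reacting_audience(object_ids, reaction_level, seed=0):
    """Return the set of audience objects that react.

    The share of reacting objects is proportional to the reaction level
    (assumed here to be normalized to the range 0..1).
    """
    ids = list(object_ids)
    level = min(max(reaction_level, 0.0), 1.0)
    count = round(len(ids) * level)
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    return set(rng.sample(ids, count))
```

With ten audience objects, a reaction level of 0.3 makes three of them react, and a level of 1.0 makes all of them react.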

｛３．ライブパフォーマンスの具体例｝ {3. Specific examples of live performance}
以下、本体装置２が検出することができるライブパフォーマンスの具体例について説明する。 Hereinafter, specific examples of live performances that can be detected by the main device 2 will be described.

｛３．１．観客煽り｝ {3.1. Audience excitement}
図７Ａ及び図７Ｂは、ユーザが画面３ａ内の観客３１に対して観客煽りを行う手順を示す図である。図７Ａ及び図７Ｂでは、画面３ａ内の観客３１の動きを分かりやすく説明するために、画面３ａ内に一人の観客３１のみを表示している。実際には、再生部２３が観客映像データ６２を再生することにより、ライブ会場と、ライブ会場に来ている多数の観客３１とが画面３ａに表示される。図７Ａ、図７Ｂにおいて、本体装置２及びコントローラ５の表示を省略している。 FIG. 7A and FIG. 7B are diagrams illustrating a procedure in which the user excites the audience 31 in the screen 3a. In FIGS. 7A and 7B, only one spectator 31 is displayed in the screen 3a so that the movement of the spectator 31 is easy to follow. In practice, the reproduction unit 23 reproduces the audience video data 62, so that the live venue and the many spectators 31 attending it are displayed on the screen 3a. In FIGS. 7A and 7B, the main device 2 and the controller 5 are not shown.

観客煽りとは、ライブ中のシンガーが質問形式のフレーズを観客に対して呼びかけ、観客がシンガーに対して一斉に返事をするというライブパフォーマンスである。ユーザは、楽曲の前奏あるいは間奏が再生されているときなどに、観客煽りを行うことができる。 Audience excitement is a live performance in which a singer, during a live show, calls out a question-type phrase to the audience and the audience answers the singer in unison. The user can excite the audience, for example, while the prelude or an interlude of the music is being played.

本体装置２は、ライブパフォーマンスとして観客煽りを特定する場合、マイク４から送信される音声データ４Ａ及び動き情報４２Ａを使用し、コントローラ５から送信される動き情報５１Ａを使用しない。 When the main unit 2 specifies the audience excitement as the live performance, the main unit 2 uses the audio data 4A and the motion information 42A transmitted from the microphone 4, and does not use the motion information 51A transmitted from the controller 5.

本体装置２は、ライブパフォーマンスとして観客煽りを特定する場合、特定のフレーズの検出を先に行う。図７Ａに示すように、ユーザは、最初に、マイク４をユーザ自身に向けながら、たとえば、「のってるかーい？」という質問形式の特定のフレーズ（以下、「煽りフレーズ」と呼ぶ。）をマイク４に入力する。マイク４は、煽りフレーズを音声データ４Ａとして本体装置２に送信する。 When identifying audience excitement as the live performance, the main device 2 detects the specific phrase first. As shown in FIG. 7A, the user first points the microphone 4 at himself or herself and speaks into it a specific question-type phrase such as “Are you having fun?” (hereinafter referred to as an “excitement phrase”). The microphone 4 transmits the excitement phrase to the main device 2 as voice data 4A.

音声認識部２４は、受信した音声データ４１Ａに対して音声認識処理を行って（ステップＳ５）、フレーズ情報２４Ａを生成する。パフォーマンス特定部２５には、シンガーがライブパフォーマンスの際に観客に呼び掛ける様々なフレーズが設定されている。パフォーマンス特定部２５は、フレーズ情報２４Ａと、設定された様々なフレーズとを比較することにより、ユーザが煽りフレーズを観客に呼び掛けたと判断する（ステップＳ６においてＹｅｓ）。 The voice recognition unit 24 performs voice recognition processing on the received voice data 4A (step S5) to generate phrase information 24A. Various phrases that a singer calls out to the audience during a live performance are set in the performance specifying unit 25. By comparing the phrase information 24A with these set phrases, the performance specifying unit 25 determines that the user has called out an excitement phrase to the audience (Yes in step S6).
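
The comparison between the phrase information 24A and the set phrases could look like the following minimal sketch. The table layout, the performance labels, and the substring matching are assumptions for illustration; the example phrases themselves are the ones quoted in this description.

```python
from typing import Optional

# Hypothetical phrase table: performance label -> trigger phrases.
# The labels are illustrative names, not identifiers from this description.
PHRASES = {
    "audience_excitement": ["のってるかーい"],
    "chorus_lead": ["一緒に歌おう"],
    "clap_lead": ["手拍子よろしく"],
    "wave_lead": ["両手を大きく振って"],
}


def match_phrase(recognized_text: str) -> Optional[str]:
    """Return the performance whose phrase appears in the recognized text.

    Returns None when no set phrase is found, in which case the input
    would be treated as the user's singing voice (No in step S6).
    """
    for performance, phrases in PHRASES.items():
        if any(phrase in recognized_text for phrase in phrases):
            return performance
    return None
```

Substring matching is chosen here only because recognized text may carry extra words or punctuation around the set phrase; a real recognizer would likely need a more tolerant comparison.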

ユーザは、ユーザ自身に向けたマイク４に煽りフレーズを入力した後に、マイク４をユーザ自身の方向から画面３ａの方向に向ける。つまり、ユーザは、矢印４５（図７Ａ参照）に示すように、マイク４の向きを反転させる。マイク４のセンサ部４２は、ユーザの動きに応じた動き情報４２Ａを生成する。パフォーマンス特定部２５は、マイク４から送信された動き情報４２Ａに基づいて、マイク４の向きが反転されたと判定する。パフォーマンス特定部２５は、煽りフレーズがマイク４に入力され、かつ、マイク４の向きが反転されたことから、ユーザがライブパフォーマンスとして観客煽りを行ったと判定する（ステップＳ７においてＹｅｓ）。 After speaking the excitement phrase into the microphone 4 pointed at himself or herself, the user turns the microphone 4 from his or her own direction toward the screen 3a. That is, the user reverses the direction of the microphone 4 as indicated by the arrow 45 (see FIG. 7A). The sensor unit 42 of the microphone 4 generates motion information 42A corresponding to the user's movement. Based on the motion information 42A transmitted from the microphone 4, the performance specifying unit 25 determines that the direction of the microphone 4 has been reversed. Because the excitement phrase has been input to the microphone 4 and the direction of the microphone 4 has been reversed, the performance specifying unit 25 determines that the user has performed audience excitement as a live performance (Yes in step S7).
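
One way the reversal of the microphone 4 might be judged is sketched below. It assumes, purely for illustration, that the motion information 42A has already been converted into a sequence of unit vectors giving the direction in which the microphone points; this description does not specify the content of the motion information or the judgment method, and the threshold is a hypothetical value.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def is_reversed(directions: Sequence[Vec3], threshold: float = -0.5) -> bool:
    """Judge whether the pointing direction was reversed.

    `directions` is an assumed sequence of unit vectors sampled over time.
    The direction counts as reversed when the first and last vectors point
    in roughly opposite directions (dot product at or below `threshold`).
    """
    if len(directions) < 2:
        return False
    start, end = directions[0], directions[-1]
    dot = sum(a * b for a, b in zip(start, end))
    return dot <= threshold
```

A dot product of -1 corresponds to an exact 180-degree turn; the looser threshold of -0.5 accepts turns of 120 degrees or more.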

反応指示部２６は、ユーザが行った観客煽りに対する観客の反応レベルを決定する（ステップＳ８）。反応指示部２６は、煽りフレーズが入力された音声入力タイミングと、マイクの反転が検出された動き検出タイミングとの時間差（検出時間差）に基づいて、反応レベルを決定する。音声入力タイミングは、フレーズ情報２４Ａに含まれている。動き検出タイミングは、動き情報４２Ａに含まれている。 The reaction instruction unit 26 determines the audience's reaction level to the audience excitement performed by the user (step S8). The reaction instruction unit 26 determines the reaction level based on the time difference (detected time difference) between the voice input timing at which the excitement phrase was input and the motion detection timing at which the reversal of the microphone 4 was detected. The voice input timing is included in the phrase information 24A. The motion detection timing is included in the motion information 42A.

反応指示部２６には、反応レベルが最大となる、音声検出タイミングと動き検出タイミングとの時間差（理想時間差）が設定されている。反応指示部２６は、理想時間差と、検出時間差とのずれの大きさに基づいて反応レベルを決定する。たとえば、ずれが小さい場合、反応指示部２６は、観客煽りがスムーズに行われたと判断して、反応レベルを高くする。一方、マイクを反転させるタイミングが遅い場合など、ずれが大きい場合、煽りフレーズに対する観客の反応がばらつくと考えられる。この場合、反応レベルは低下する。また、反応指示部２６は、ユーザが観客煽りを、楽曲データ６１のどの再生位置で行ったかを考慮して、反応レベルを決定してもよい。 The time difference between the voice input timing and the motion detection timing at which the reaction level is maximized (ideal time difference) is set in the reaction instruction unit 26. The reaction instruction unit 26 determines the reaction level based on the magnitude of the deviation between the ideal time difference and the detected time difference. For example, when the deviation is small, the reaction instruction unit 26 determines that the audience excitement was performed smoothly and raises the reaction level. On the other hand, when the deviation is large, for example when the microphone is reversed too late, the audience's reaction to the excitement phrase is considered to be uneven. In this case, the reaction level is lowered. The reaction instruction unit 26 may also determine the reaction level in consideration of the playback position in the music data 61 at which the user performed the audience excitement.
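
The determination from the ideal and detected time differences can be sketched as follows. The linear falloff, the normalized output range of 0 to 1, and the default values for `ideal_diff` and `tolerance` are assumptions for illustration; this description only states that a smaller deviation gives a higher reaction level.

```python
def reaction_level(voice_time: float, motion_time: float,
                   ideal_diff: float = 1.0, tolerance: float = 2.0) -> float:
    """Return a reaction level in [0, 1] from two timestamps in seconds.

    detected_diff is the detected time difference; the level is highest
    when it equals ideal_diff and falls linearly to 0 once the deviation
    reaches `tolerance` (both values are hypothetical).
    """
    detected_diff = motion_time - voice_time
    deviation = abs(detected_diff - ideal_diff)
    return max(0.0, 1.0 - deviation / tolerance)
```

With these assumed defaults, reversing the microphone exactly one second after the phrase gives the maximum level, and a reversal two seconds after the phrase gives half of it.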

反応指示部２６は、観客煽りに対するリアクションデータ６３を、決定した反応レベルに基づいて、記憶部２７に格納されたリアクションデータ６３，６３，・・・の中から選択する。反応レベルが高ければ、観客全員が観客煽りに対して反応する映像が記録されたリアクションデータ６３が選択される。反応レベルが低ければ、観客煽りに対して反応する観客の数が少ない映像が記録されたリアクションデータ６３が選択される。 Based on the determined reaction level, the reaction instruction unit 26 selects the reaction data 63 for the audience excitement from the reaction data 63, 63, ... stored in the storage unit 27. If the reaction level is high, reaction data 63 recording a video in which the entire audience reacts to the audience excitement is selected. If the reaction level is low, reaction data 63 recording a video in which only a small number of spectators react to the audience excitement is selected.

反応指示部２６は、選択したリアクションデータ６３の再生を、再生部２３に指示する。このとき、選択されたリアクションデータ６３の再生条件も、再生部２３に通知される。再生条件は、観客煽りに対する反応レベルに基づいて決定される。たとえば、反応レベルが高ければ、リアクションデータ６３の再生時の音量が大きくなる。反応レベルが低ければ、リアクションデータ６３の再生時の音量を小さくしたり、観客の歓声の再生速度を小さくしたりすることができる。 The reaction instruction unit 26 instructs the reproduction unit 23 to reproduce the selected reaction data 63. At this time, the playback condition of the selected reaction data 63 is also notified to the playback unit 23. The reproduction condition is determined based on the reaction level with respect to the audience. For example, if the reaction level is high, the volume when the reaction data 63 is reproduced increases. If the reaction level is low, the volume of the reaction data 63 during playback can be reduced, and the playback speed of the audience cheer can be reduced.

再生部２３は、再生中の楽曲データに合わせて、反応指示部２６に指示されたリアクションデータ６３を再生する（ステップＳ９）。この結果、図７Ｂに示すように、観客煽りに対する観客の反応として、画面３ａ内の観客３１が右手を挙げるようすが画面３ａに表示されるとともに、「イェーイ！」という観客の歓声が再生される。実際には、画面３ａには、多くの観客が一斉に観客煽りに対して反応する様子が表示されるため、ユーザは、ライブ会場でのライブパフォーマンスを疑似的に体験することができる。観客煽りのやり方によっては、ライブ会場内の観客３１の反応が大きく変化するため、ユーザは、実際にライブを行うシンガーと同様の緊張感を楽しむことが可能となる。 The reproduction unit 23 reproduces the reaction data 63 specified by the reaction instruction unit 26 together with the music data being reproduced (step S9). As a result, as shown in FIG. 7B, the screen 3a shows the spectator 31 raising his or her right hand as the audience's reaction to the audience excitement, and the audience's cheer of “Yay!” is reproduced. In practice, the screen 3a shows many spectators reacting to the audience excitement all at once, so the user can have a simulated experience of a live performance at a live venue. Because the reaction of the audience 31 in the live venue changes greatly depending on how the audience excitement is performed, the user can enjoy the same sense of tension as a singer who actually performs live.

｛３．２．合唱の先導｝ {3.2. Leading a chorus}
ユーザは、ライブパフォーマンスとして、画面３ａ内の観客に対して、ユーザが歌っている楽曲の合唱を先導することができる。図８Ａ及び図８Ｂは、ユーザが観客の合唱を先導する手順を示す図である。 As a live performance, the user can lead the audience in the screen 3a in a chorus of the music the user is singing. FIGS. 8A and 8B are diagrams illustrating a procedure in which the user leads the audience's chorus.

図８Ａ及び図８Ｂは、図７Ａと同様に、画面３ａ内の観客の動きを分かりやすく説明するために、二人の観客３１，３１のみを表示している。図８Ａ、図８Ｂにおいて、本体装置２及びコントローラ５の表示を省略している。 8A and 8B show only two spectators 31 and 31 in order to explain the movement of the spectators in the screen 3a in an easy-to-understand manner, as in FIG. 7A. 8A and 8B, the display of the main body device 2 and the controller 5 is omitted.

本体装置２は、合唱を先導するライブパフォーマンスが行われたと判定する場合には、マイク４から送信される音声データ４Ａ及び動き情報４２Ａを使用する。コントローラ５から送信される動き情報５１Ａは、合唱の先導の判定に用いられない。 When determining that the live performance leading the chorus has been performed, the main device 2 uses the audio data 4A and the motion information 42A transmitted from the microphone 4. The motion information 51A transmitted from the controller 5 is not used for the determination of the chorus lead.

図８Ａに示すように、楽曲データ６１が再生されている間、画面３ａには、ライブ会場の観客の映像と、再生されている楽曲の歌詞３２とが表示されている。再生されている楽曲の中で観客の合唱が可能な部分（たとえば、楽曲のサビの部分）に対応する歌詞３２が、図８Ａに示すように、四角い枠で囲まれて表示される。これにより、ユーザは、合唱を先導するタイミングを知ることができる。 As shown in FIG. 8A, while the music data 61 is being reproduced, the screen 3a displays the video of the audience in the live venue and the lyrics 32 of the music being reproduced. As shown in FIG. 8A, the lyrics 32 corresponding to the portion (for example, the chorus portion of the song) that can be sung by the audience in the reproduced music are displayed surrounded by a square frame. Thereby, the user can know the timing which leads chorus.

本体装置２は、ライブパフォーマンスとして合唱の先導を特定する場合、ユーザが合唱を観客に呼び掛ける音声フレーズ（以下、「合唱フレーズ」と呼ぶ。）を先に検出する。ユーザは、マイク４をユーザ自身に向けながら、たとえば、「一緒に歌おう！」という合唱フレーズをマイク４に入力する。マイク４は、入力された合唱フレーズを音声データ４Ａとして本体装置２に送信する。上記と同様の手順で、パフォーマンス特定部２５は、ユーザが合唱フレーズを呼び掛けたと判定する。 When the main device 2 specifies the chorus lead as the live performance, the main device 2 first detects a voice phrase (hereinafter referred to as “choral phrase”) that calls the chorus to the audience. For example, the user inputs a chorus phrase “Let's sing together!” To the microphone 4 while pointing the microphone 4 at the user himself / herself. The microphone 4 transmits the input choral phrase to the main device 2 as audio data 4A. In the same procedure as described above, the performance specifying unit 25 determines that the user has called for a choral phrase.

ユーザは、合唱フレーズをマイク４に入力した後に、マイク４を画面３ａの方向（矢印４５の方向）へ反転させる。パフォーマンス特定部２５は、マイク４から送信された動き情報４２Ａに基づいて、マイク４の向きが反転されたと判定する。パフォーマンス特定部２５は、合唱フレーズがマイク４に入力され、かつ、マイク４の向きが反転されたことから、ユーザがライブパフォーマンスとして合唱を先導していると判定する。 After inputting the choral phrase into the microphone 4, the user reverses the microphone 4 in the direction of the screen 3 a (the direction of the arrow 45). The performance specifying unit 25 determines that the direction of the microphone 4 is reversed based on the motion information 42 A transmitted from the microphone 4. Since the chorus phrase is input to the microphone 4 and the direction of the microphone 4 is reversed, the performance specifying unit 25 determines that the user is leading chorus as a live performance.

反応指示部２６は、ユーザが行った合唱の先導に対する反応レベルを決定する。反応レベルの決定手順については、上記と同様であるため、その説明を省略する。 The reaction instruction unit 26 determines a reaction level for the lead of the chorus performed by the user. Since the procedure for determining the reaction level is the same as described above, the description thereof is omitted.

反応指示部２６は、観客が合唱をするリアクションデータ６３，６３，・・・のうち、決定した反応レベルに対応するリアクションデータ６３の再生を指示する。リアクションデータ６３の再生条件も再生部２３に通知される。これにより、合唱する観客の数や合唱の音量は、反応レベルに応じて変化する。図８Ｂに示すように、ユーザが行った合唱の要求に対する反応として、画面３ａ内の観客３１，３１が肩を組んで合唱する様子が表示される。また、ユーザが合唱を先導した歌詞に対応する観客の歌声が、モニタ３から出力される。実際には、画面３ａには、多くの観客が一斉に合唱をする様子が表示されるため、ユーザは、ライブ会場で、観客と一緒に歌を歌うというライブパフォーマンスを疑似的に体験することができる。 The reaction instruction unit 26 instructs reproduction of the reaction data 63 corresponding to the determined reaction level from among the reaction data 63, 63, ... in which the audience sings in chorus. The playback condition of the reaction data 63 is also notified to the reproduction unit 23. As a result, the number of spectators who sing and the volume of the chorus change according to the reaction level. As shown in FIG. 8B, the spectators 31, 31 in the screen 3a are shown singing in chorus with their arms around each other's shoulders in response to the user's call for a chorus. In addition, the audience's singing voice for the lyrics for which the user led the chorus is output from the monitor 3. In practice, the screen 3a shows many spectators singing in chorus all at once, so the user can have a simulated experience of the live performance of singing together with the audience at a live venue.

観客の合唱は、ユーザがマイク４を動かさない限り継続される。ユーザは、観客の合唱を停止させる場合、マイク４を画面３ａの方向からユーザ自身の方向へ反転させる。パフォーマンス特定部２５は、リアクションデータ６３の再生中にマイク４の向きが反転した場合、合唱を先導が終了したと判定する。これにより、楽曲データ６１及び観客映像データ６２の再生が再開される。 The audience's chorus is continued unless the user moves the microphone 4. When the user stops the chorus of the audience, the microphone 4 is reversed from the direction of the screen 3a to the user's own direction. If the direction of the microphone 4 is reversed during the reproduction of the reaction data 63, the performance specifying unit 25 determines that the lead has been completed. Thereby, the reproduction of the music data 61 and the audience video data 62 is resumed.

｛３．３．手拍子の先導｝ {3.3. Leading hand clapping}
ユーザは、ライブパフォーマンスとして、画面３ａ内の観客３１に対して手拍子を先導する動作を行うことができる。図９Ａ及び図９Ｂは、ユーザが画面３ａ内の観客３１，３１に対して手拍子を先導する手順を示す図である。 As a live performance, the user can perform an action of leading the audience 31 in the screen 3a in hand clapping. FIG. 9A and FIG. 9B are diagrams showing a procedure in which the user leads the spectators 31, 31 in the screen 3a in hand clapping.

図９Ａ及び図９Ｂは、図７Ａと同様に、画面３ａ内の観客３１の動きを分かりやすく説明するために、二人の観客３１，３１のみを表示している。図９Ａ、図９Ｂにおいて、本体装置２の表示を省略している。本体装置２は、手拍子の先導を検出する場合、マイク４から送信される音声データ４Ａ、動き情報４２Ａと、コントローラ５から送信される動き情報５１Ａを使用する。 9A and 9B show only two spectators 31 and 31 in order to explain the movement of the spectators 31 in the screen 3a in an easy-to-understand manner, as in FIG. 7A. 9A and 9B, the display of the main device 2 is omitted. The main body device 2 uses the audio data 4A, the motion information 42A transmitted from the microphone 4 and the motion information 51A transmitted from the controller 5 when detecting the lead of the clapping.

図９Ａに示すように、楽曲データ６１が再生されている間、ユーザは、ライブパフォーマンスとして手拍子を先導することができる。本体装置２は、手拍子の先導を検出する場合、手拍子を先導する音声フレーズ（以下、「手拍子フレーズ」と呼ぶ。）を先に検出する（ステップＳ６においてＹｅｓ）。 As shown in FIG. 9A, while the music data 61 is being reproduced, the user can lead hand clapping as a live performance. When detecting the leading of hand clapping, the main device 2 first detects a voice phrase that leads the hand clapping (hereinafter referred to as a “clapping phrase”) (Yes in step S6).

具体的には、ユーザは、マイク４に手拍子フレーズを入力する。手拍子フレーズは、たとえば、「みんな、手拍子よろしく！！」などである。マイク４は、入力された手拍子フレーズを音声データ４Ａとして本体装置２に送信する。上記と同様の手順で、パフォーマンス特定部２５は、ユーザが画面３ａ内の観客に対して、手拍子フレーズを呼び掛けたと判定する。 Specifically, the user inputs a clapping phrase into the microphone 4. The clapping phrase is, for example, “Everyone clapping!” The microphone 4 transmits the input clapping phrase to the main device 2 as audio data 4A. In the same procedure as described above, the performance specifying unit 25 determines that the user has called a clapping phrase to the audience in the screen 3a.

ユーザは、手拍子フレーズをマイク４に入力した後に、手拍子を先導する動作を行う。図９Ｂに示すように、ユーザは、楽曲の演奏音（再生される楽曲データ６１）のリズムに合わせて手拍子をする。ユーザが右手にマイク４を、左手にコントローラ５を持っているため、マイク４及びコントローラ５は、手拍子の動きを示す動き情報４２Ａ，５１Ａを本体装置２に送信する。 The user performs an operation of leading the clapping after inputting the clapping phrase to the microphone 4. As shown in FIG. 9B, the user clapping in time with the rhythm of the music performance sound (reproduced music data 61). Since the user has the microphone 4 in the right hand and the controller 5 in the left hand, the microphone 4 and the controller 5 transmit motion information 42 A and 51 A indicating the movement of the hand clapping to the main body device 2.

パフォーマンス特定部２５は、受信した動き情報４２Ａ，５１Ａに基づいて、ユーザの両手の動きを検出する。たとえば、マイク４及びコントローラ５が左右に繰り返し動き、かつ、マイク４が動く方向とコントローラ５が動く方向とが反対方向である場合、パフォーマンス特定部２５は、ユーザが手拍子の動作をしていると判断することができる（ステップＳ７においてＹｅｓ）。つまり、パフォーマンス特定部２５が、手拍子フレーズと、手拍子の動作とを検出することにより、ユーザが手拍子を先導していると判断する。 The performance specifying unit 25 detects the movement of both hands of the user based on the received movement information 42A and 51A. For example, when the microphone 4 and the controller 5 repeatedly move left and right, and the direction in which the microphone 4 moves and the direction in which the controller 5 moves are opposite directions, the performance specifying unit 25 indicates that the user is clapping. It can be judged (Yes in step S7). In other words, the performance specifying unit 25 determines that the user is leading the clapping by detecting the clapping phrase and the clapping action.
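
The hand-clapping judgment described above (repeated left-right movement with the microphone 4 and the controller 5 moving in opposite directions) can be sketched as follows. It assumes, for illustration only, that the motion information 42A and 51A has been reduced to signed horizontal velocity samples taken at the same instants; the thresholds are hypothetical values.

```python
from typing import Sequence


def count_direction_changes(samples: Sequence[float], min_speed: float) -> int:
    """Count sign flips among samples that are fast enough to matter."""
    signs = [1 if v > 0 else -1 for v in samples if abs(v) >= min_speed]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def is_clapping(mic_vx: Sequence[float], ctrl_vx: Sequence[float],
                min_speed: float = 0.2, min_changes: int = 2,
                min_ratio: float = 0.8) -> bool:
    """Judge clapping from assumed horizontal velocities of both hands."""
    # Keep only instants at which both devices are clearly moving.
    pairs = [(m, c) for m, c in zip(mic_vx, ctrl_vx)
             if abs(m) >= min_speed and abs(c) >= min_speed]
    if not pairs:
        return False
    # Opposite signs mean the two hands move toward or away from each other.
    opposite = sum(1 for m, c in pairs if m * c < 0)
    repeated = count_direction_changes(mic_vx, min_speed) >= min_changes
    return repeated and opposite / len(pairs) >= min_ratio
```

Requiring at least two direction changes is one simple way to express "repeatedly"; a single swing of both hands would not be judged as clapping.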

反応指示部２６は、手拍子フレーズを検出したタイミングと、手拍子の動きを検出したタイミングとに基づいて、観客の反応レベルを決定する（ステップＳ８）。反応レベルの決定手順は、基本的には、観客煽りの反応レベルの決定と同様である。また、反応指示部２６は、動き情報４２Ａ，５１Ａを用いて、ユーザの手拍子のリズムを算出し、算出したリズムに基づいて、反応レベルを決定してもよい。たとえば、反応指示部２６は、楽曲の演奏音のリズムとユーザの手拍子のリズムとのずれを検出し、検出したずれが小さいほど、反応レベルを高くすることができる。 The reaction instruction unit 26 determines the audience's reaction level based on the timing at which the clapping phrase was detected and the timing at which the clapping movement was detected (step S8). The procedure for determining the reaction level is basically the same as that for audience excitement. The reaction instruction unit 26 may also calculate the rhythm of the user's hand clapping using the motion information 42A and 51A and determine the reaction level based on the calculated rhythm. For example, the reaction instruction unit 26 can detect the deviation between the rhythm of the music's performance sound and the rhythm of the user's hand clapping, and raise the reaction level as the detected deviation becomes smaller.
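
The rhythm-based determination can be sketched as follows, assuming that the beat times of the music and the detected clap times are both available as lists of timestamps in seconds. The nearest-beat comparison and the linear scaling are assumptions for illustration; this description only states that a smaller deviation gives a higher reaction level.

```python
from typing import Sequence


def rhythm_reaction_level(clap_times: Sequence[float],
                          beat_times: Sequence[float]) -> float:
    """Return a reaction level in [0, 1] from rhythm deviation.

    Each clap is compared with its nearest beat. A mean deviation of 0
    gives 1.0; a mean deviation of half a beat interval (the worst
    possible case between two beats) gives 0.0.
    """
    if not clap_times or len(beat_times) < 2:
        return 0.0
    interval = (beat_times[-1] - beat_times[0]) / (len(beat_times) - 1)
    if interval <= 0:
        return 0.0
    deviations = [min(abs(c - b) for b in beat_times) for c in clap_times]
    mean_dev = sum(deviations) / len(deviations)
    return max(0.0, 1.0 - mean_dev / (interval / 2))
```

With beats at one-second intervals, claps that land a quarter of a second late on every beat give a level of 0.5 under these assumptions.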

反応指示部２６は、観客が手拍子をするリアクションデータ６３，６３・・・のうち、決定した反応レベルに対応するリアクションデータ６３の再生を指示する。リアクションデータ６３の再生条件も、再生部２３に通知される。これにより、楽曲データ６１とともに、観客が手拍子をするリアクションデータ６３が再生されることにより（ステップＳ９）、図９Ｂに示すように、画面３ａ内の観客３１，３１が頭上で手拍子をする映像が表示される。画面３ａに表示する観客のうち、手拍子をする観客の割合は、反応レベルによって変化する。また、手拍子の音量も、反応レベルに基づいて決められた再生条件によって変化する。 The reaction instruction unit 26 instructs reproduction of the reaction data 63 corresponding to the determined reaction level from among the reaction data 63, 63, ... in which the audience claps. The playback condition of the reaction data 63 is also notified to the reproduction unit 23. The reaction data 63 in which the audience claps is thus reproduced together with the music data 61 (step S9), and, as shown in FIG. 9B, a video of the spectators 31, 31 in the screen 3a clapping above their heads is displayed. Of the spectators displayed on the screen 3a, the proportion who clap changes according to the reaction level. The volume of the clapping also changes according to the playback condition determined based on the reaction level.

このように、ユーザが手拍子をする動きに合わせて、画面３ａ内の観客３１，３１が手拍子をする映像が表示されるため、ユーザは、ライブ会場で、観客と一体となって手拍子をするというライブパフォーマンスを疑似的に体験することができる。 In this way, the video of the audience 31, 31 clapping in the screen 3a is displayed in accordance with the movement of the user clapping, so the user is clapping with the audience at the live venue. You can experience live performance in a simulated manner.

｛３．４．ウェーブ（両手を大きく振る動作）の先導｝ {3.4. Leading a wave (swinging both hands widely)}
ユーザは、ライブパフォーマンスとして、画面３ａ内の観客に対してウェーブを先導する動作を行うことができる。ウェーブとは、ライブ会場の観客が、ライブ中の楽曲に合わせて両手を左右に大きく振る動作である。図１０Ａ及び図１０Ｂは、ユーザがウェーブを先導する手順を示す図である。 As a live performance, the user can perform an action of leading the audience in the screen 3a in a wave. A wave is an action in which the audience at a live venue swing both hands widely to the left and right in time with the music being performed. FIGS. 10A and 10B are diagrams illustrating a procedure in which the user leads a wave.

図１０Ａ及び図１０Ｂは、図９Ａと同様に、画面３ａ内の観客の動きを分かりやすく説明するために、二人の観客３１，３１のみを表示している。図１０Ａ、図１０Ｂにおいて、本体装置２の表示を省略している。本体装置２は、ウェーブの先導を検出する場合、マイク４から送信される音声データ４Ａ，動き情報４２Ａと、コントローラ５から送信される動き情報５１Ａを使用する。 10A and 10B show only two spectators 31 and 31 in order to explain the movement of the spectators in the screen 3a in an easy-to-understand manner, as in FIG. 9A. 10A and 10B, the display of the main device 2 is omitted. When detecting the leading of the wave, the main body device 2 uses the audio data 4A and motion information 42A transmitted from the microphone 4 and the motion information 51A transmitted from the controller 5.

図１０Ａに示すように、楽曲データ６１が再生されている間、ユーザは、ライブパフォーマンスとしてウェーブを先導することができる。本体装置２は、ウェーブの先導を検出する場合、ウェーブを呼び掛ける音声フレーズ（以下、「ウェーブフレーズ」と呼ぶ。）を先に検出する。 As shown in FIG. 10A, while the music data 61 is being reproduced, the user can lead the wave as a live performance. When the main device 2 detects the leading of the wave, the main device 2 first detects an audio phrase that calls the wave (hereinafter referred to as a “wave phrase”).

具体的には、ユーザは、ウェーブフレーズをマイク４に入力する。ウェーブフレーズは、たとえば、「みんな、両手を大きく振って！！」である。マイク４は、入力されたウェーブフレーズを音声データ４Ａとして本体装置２に送信する。上記と同様の手順で、パフォーマンス特定部２５は、ユーザが画面３ａ内の観客に対してウェーブフレーズを呼び掛けたと判定する（ステップＳ６においてＹｅｓ）。 Specifically, the user inputs a wave phrase to the microphone 4. For example, the wave phrase is “Everyone shake both hands!”. The microphone 4 transmits the input wave phrase to the main unit 2 as audio data 4A. In the same procedure as described above, the performance specifying unit 25 determines that the user has called a wave phrase to the audience in the screen 3a (Yes in step S6).

After speaking the wave phrase into the microphone 4, the user performs the wave motion. As shown in FIG. 10B, the user swings both hands widely overhead in time with the rhythm of the music (the music data 61 being reproduced). Because the user holds the microphone 4 in the right hand and the controller 5 in the left hand, the microphone 4 and the controller 5 transmit motion information 42A and 51A, which indicate the wave motion, to the main device 2.

The performance specifying unit 25 detects the movement of both of the user's hands from the received motion information 42A and 51A. For example, when the performance specifying unit 25 detects that the microphone 4 and the controller 5 are repeatedly making large side-to-side movements, and the direction in which the microphone 4 moves matches the direction in which the controller 5 moves, it determines that the user is performing the wave motion.
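
The wave determination described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the motion information 42A and 51A has already been reduced to equal-length lists of lateral velocity samples (positive values meaning rightward movement). The function names, thresholds, and units are illustrative assumptions and do not come from this description.

```python
from typing import Sequence


def _sign(v: float) -> int:
    """Return +1, -1, or 0 for the direction of a lateral velocity sample."""
    return (v > 0) - (v < 0)


def _reversals(vs: Sequence[float]) -> int:
    """Count how often the movement direction flips, ignoring zero samples."""
    signs = [_sign(v) for v in vs if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def is_wave(mic_vx: Sequence[float], ctrl_vx: Sequence[float],
            min_speed: float = 1.0, min_reversals: int = 2,
            min_agreement: float = 0.8) -> bool:
    """Hypothetical check: both devices swing widely, repeatedly, and together."""
    if not mic_vx or len(mic_vx) != len(ctrl_vx):
        return False
    # "Large" movement: each device reaches at least min_speed (assumed unit).
    large = (max(abs(v) for v in mic_vx) >= min_speed
             and max(abs(v) for v in ctrl_vx) >= min_speed)
    # "Repeated" movement: each device reverses direction often enough.
    repeated = (_reversals(mic_vx) >= min_reversals
                and _reversals(ctrl_vx) >= min_reversals)
    # "Matching direction": the two devices move the same way in most samples.
    same = sum(1 for m, c in zip(mic_vx, ctrl_vx) if _sign(m) == _sign(c))
    return large and repeated and same / len(mic_vx) >= min_agreement
```

With these assumed thresholds, two hands swinging in the same direction are accepted, while hands moving in opposite directions (as in clapping) are rejected by the direction-agreement test.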

When it is determined that the user is leading a wave (Yes in step S7), the reaction instruction unit 26 determines the reaction level of the audience (step S8). The reaction level is determined from the rhythm with which the user swings both hands. For example, the reaction instruction unit 26 can detect the deviation between the rhythm of the music and the rhythm of the user's wave, and raise the reaction level as the detected deviation becomes smaller.
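
One way to picture the rhythm comparison is the following Python sketch. It assumes, hypothetically, that the beat times of the music and the times of the user's swings are available as lists of seconds. The linear mapping, the five-level scale, and the 0.5-second tolerance are illustrative assumptions; the description only states that a smaller deviation gives a higher reaction level.

```python
from typing import Sequence


def mean_offset(beat_times: Sequence[float], swing_times: Sequence[float]) -> float:
    """Average distance in seconds from each swing to the nearest music beat."""
    if not beat_times or not swing_times:
        raise ValueError("beat_times and swing_times must be non-empty")
    nearest = (min(abs(s - b) for b in beat_times) for s in swing_times)
    return sum(nearest) / len(swing_times)


def reaction_level(beat_times: Sequence[float], swing_times: Sequence[float],
                   max_level: int = 5, tolerance: float = 0.5) -> int:
    """Map a smaller rhythm deviation to a higher level (assumed 1..max_level)."""
    offset = mean_offset(beat_times, swing_times)
    # closeness is 1.0 for a perfect match and 0.0 at or beyond the tolerance.
    closeness = max(0.0, 1.0 - offset / tolerance)
    return 1 + round(closeness * (max_level - 1))
```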

From among the items of reaction data 63, 63, ... in which the audience performs a wave, the reaction instruction unit 26 instructs reproduction of the reaction data 63 that corresponds to the determined reaction level. The reaction data 63 in which the audience performs a wave is thus reproduced together with the music data 61 (step S9), and, as shown in FIG. 10B, video of the audience members 31, 31 in the screen 3a waving overhead is displayed. The proportion of the displayed audience that joins the wave varies with the reaction level.

When leading clapping or a wave as a live performance, the user does not have to call out a clapping phrase or a wave phrase to the audience in the screen 3a. For example, even when the main device 2 detects no clapping phrase and detects only the user's clapping motion, it may reproduce the reaction data 63 in which the audience claps.

The user may lead clapping or a wave using only the microphone 4, without the controller 5. In that case, the performance specifying unit 25 determines that the user is leading clapping or a wave when the microphone 4 is repeatedly moving from side to side. Clapping and a wave can then be distinguished by how far the user moves the microphone 4 from side to side.
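
The microphone-only case can be sketched in Python as follows. The sketch assumes, hypothetically, that the motion information 42A is available as a list of lateral positions of the microphone 4. The amplitude thresholds and their units are invented for illustration; the description says only that the size of the side-to-side movement separates clapping from a wave.

```python
from typing import Optional, Sequence


def _reversals(xs: Sequence[float]) -> int:
    """Count direction changes in a sequence of lateral positions."""
    deltas = [b - a for a, b in zip(xs, xs[1:]) if b != a]
    return sum(1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0))


def classify_mic_motion(xs: Sequence[float],
                        wave_min_amplitude: float = 0.4,
                        clap_min_amplitude: float = 0.05,
                        min_reversals: int = 2) -> Optional[str]:
    """Return "wave", "clap", or None using assumed amplitude thresholds."""
    if len(xs) < 3 or _reversals(xs) < min_reversals:
        return None  # not a repeated side-to-side movement
    amplitude = (max(xs) - min(xs)) / 2
    if amplitude >= wave_min_amplitude:
        return "wave"
    if amplitude >= clap_min_amplitude:
        return "clap"
    return None
```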

{3.5. Entrance and exit}
As a live performance, the user can make a simulated entrance to, or exit from, the live venue displayed on the screen 3a.

First, the case in which the user makes a simulated entrance to the live venue is described. FIG. 11 illustrates the procedure for this simulated entrance. Instead of using the motion information 42A and 51A, the performance specifying unit 25 detects the user's movement from the presence or absence of a wireless signal from the microphone 4.

Before reproduction of the music data 61 starts, the user leaves the room in which the main device 2 is installed and waits outside. During this time, the wireless communication unit 21 of the main device 2 cannot detect the wireless signal transmitted from the microphone 4.

After reproduction of the music data 61 has started, the user enters the room in which the karaoke system 1 is installed, carrying the microphone 4. Once the user has entered the room, the wireless communication unit 21 can detect the wireless signal from the microphone 4. When the wireless signal from the microphone 4 is detected after reproduction of the music data 61 has started, the performance specifying unit 25 determines that the user has entered the live venue.

When it is determined that the user has entered the live venue, the reaction instruction unit 26 determines the reaction level of the audience based on the timing at which the wireless signal from the microphone 4 was detected.

The reaction instruction unit 26 instructs the reproducing unit 23 to reproduce the reaction data 63 that shows the audience's reaction to the singer's entrance, and also specifies the reproduction conditions for that reaction data 63. In step with the user's entrance, the reproducing unit 23 reproduces the reaction data 63, which contains video of the audience erupting in excitement together with the audience's cheers. The volume of the cheers is adjusted according to the reproduction conditions.

Next, the user's exit from the live venue is described. To make a simulated exit from the live venue, the user leaves the room carrying the microphone 4. When the wireless communication unit 21 can no longer detect the wireless signal from the microphone 4, the performance specifying unit 25 determines that the user has left the live venue.
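
The entrance and exit determinations described above amount to watching for changes in signal presence while the music is playing. The Python sketch below illustrates this under the assumption that some lower layer reports, at regular intervals, whether the wireless signal from the microphone 4 is currently detected. The class name, method names, and event strings are hypothetical.

```python
from typing import Optional


class PresenceTracker:
    """Hypothetical tracker that turns signal presence into entrance/exit events."""

    def __init__(self) -> None:
        self.playing = False          # True once reproduction of the music has started
        self.signal_present = False   # last observed state of the wireless signal

    def start_playback(self) -> None:
        self.playing = True

    def update(self, signal_detected: bool) -> Optional[str]:
        """Return "entrance", "exit", or None for the latest signal observation."""
        event = None
        if self.playing:
            if signal_detected and not self.signal_present:
                event = "entrance"   # signal appeared after playback started
            elif not signal_detected and self.signal_present:
                event = "exit"       # signal can no longer be detected
        self.signal_present = signal_detected
        return event
```

In this sketch, a signal that is already present when playback starts produces no entrance event, which matches the scenario in which the user must wait outside the room before the music begins.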

When it is determined that the user has left the live venue, the reaction instruction unit 26 determines the reaction level of the audience. It determines the reaction level at the time of exit based on the timing at which the user left the live venue and on the reaction levels of the live performances given before the exit. If the reaction level is high, the reaction instruction unit 26 instructs the reproducing unit 23 to reproduce reaction data 63 in which the audience calls for an encore. If the reaction level is low, the reaction instruction unit 26 can instruct the reproducing unit 23 to lower the volume of the audience's cheers, or to reproduce reaction data 63 in which the audience boos.

The microphone 4 may also be provided with switches for reporting that the user has entered or exited. In this case, when the entrance switch is pressed, the microphone 4 transmits an entrance signal to the main device 2; when the exit switch is pressed, the microphone 4 transmits an exit signal to the main device 2. The main device 2 changes the audience's reaction in response to receiving the entrance signal or the exit signal. The user then does not need to go in and out of the room in which the main device 2 is installed.

In this way, the user can have a simulated experience of the atmosphere of a live show even before starting to sing or after finishing a song.

{3.6. Other live performances}
In addition to the live performances described above, the user can give a variety of other live performances.

For example, the user may sing with the microphone 4 in the right hand while swinging the left hand, which holds the controller 5, in large circles. In this case, the controller 5 transmits motion information 51A indicating its rotating movement to the main device 2. The performance specifying unit 25 identifies the movement of the user's left hand from the motion information 51A. Reaction data containing video of the audience circling their left hands in time with the user's movement is then reproduced together with the music data 61.

The user may also combine the live performances described above. For example, the user may lead a chorus and a wave at the same time. After speaking a chorus phrase into the microphone 4, the user performs the wave motion shown in FIG. 10B. The performance specifying unit 25 determines from the detected chorus phrase that the user is leading a chorus, and determines from the detected movement of both of the user's hands that the user is leading a wave. As a result, video of the audience performing a wave is displayed on the screen 3a, and the music being reproduced and the singing voices of the audience are output from the monitor 3 at the same time.

{4. Functions other than live performance}
{4.1. Audience designation function}
When using the karaoke system 1, the user can specify the demographic of the audience at the live venue. The user then needs to give live performances suited to the chosen demographic.

FIG. 12 shows the setting screen for setting the audience demographic. The user operates a remote control (not shown) to set the demographic by gender and age. In the gender setting, the user selects male-dominated, female-dominated, or mixed. Male-dominated means that the proportion of men in the audience is set to 80%. Female-dominated means that the proportion of women is set to 80%. Mixed means that men and women each make up 50% of the audience. In the age setting, the user selects one of 15 to 25, 25 to 45, or 45 and over; the proportion of the audience in the selected age group is then set to 70%. As a result, audience video data 62 matching the chosen demographic is reproduced together with the music data 61.
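
The percentages given above can be collected into a small Python sketch of an audience profile. The 80%, 50%, and 70% figures come from the text; the setting keys, the dictionary layout, and in particular the even split of the remaining 30% across the two unselected age groups are assumptions made for illustration, since the description does not say how the remainder is distributed.

```python
# Share of men in the audience for each gender setting (figures from the text).
MALE_SHARE = {"male": 0.8, "female": 0.2, "mixed": 0.5}
# Age groups offered on the setting screen (labels are hypothetical keys).
AGE_GROUPS = ("15-25", "25-45", "45+")
# Share of the audience placed in the selected age group (figure from the text).
SELECTED_AGE_SHARE = 0.7


def audience_profile(gender: str, age_group: str) -> dict:
    """Build a hypothetical audience profile from the two settings."""
    if gender not in MALE_SHARE:
        raise ValueError(f"unknown gender setting: {gender}")
    if age_group not in AGE_GROUPS:
        raise ValueError(f"unknown age group: {age_group}")
    male = MALE_SHARE[gender]
    others = [g for g in AGE_GROUPS if g != age_group]
    # Assumption: the unselected age groups share the remainder equally.
    ages = {g: (1.0 - SELECTED_AGE_SHARE) / len(others) for g in others}
    ages[age_group] = SELECTED_AGE_SHARE
    return {"male": male, "female": 1.0 - male, "ages": ages}
```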

Once a demographic has been set, the audience's reaction level varies with the song the user selects and the live performances the user gives. The user therefore needs to keep the audience demographic in mind when performing. For example, if the user has specified a female-dominated audience aged 45 and over, the user can raise the reaction level by selecting a song with a relatively slow tempo and giving live performances that do not demand vigorous movement from the audience (such as leading a wave). Conversely, if the user performs a hard rock song for the same demographic and repeatedly rouses the audience as a live performance, the reaction level will be low. By setting the audience demographic in this way, the user can have a simulated experience of performing live for a variety of audiences.

{4.2. Battle mode}
The karaoke system 1 can run a battle mode in which multiple users compete on the quality of their live performances. The operation of the main device 2 in battle mode is described below.

In battle mode, two users give live performances in turn. The case described here uses a song with three verses. First, the first user sings the first verse while giving live performances. Next, the second user sings the second verse while giving live performances. The first user and the second user share the microphone 4 and the controller 5, although a separate microphone 4 and controller 5 may instead be provided for each participant in battle mode.

The live performances given by each user are scored by the main device 2. Of the first user and the second user, the one with the higher score gets to sing the third verse of the song.

The method of scoring live performances is as follows. As described above, the main device 2 determines a reaction level each time the user gives a live performance. Each time the first user gives a live performance, the main device 2 scores that performance based on the determined reaction level. The sum of the scores for the individual live performances given by the first user is calculated as the first user's score. The second user's score is calculated in the same way.
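
The scoring just described is a sum over per-performance scores. The Python sketch below assumes, purely for illustration, that each performance earns a fixed number of points per reaction level and that a tie produces no winner; neither detail is specified in the description.

```python
from typing import Optional, Sequence

POINTS_PER_LEVEL = 10  # assumed conversion from reaction level to points


def score_performance(reaction_level: int) -> int:
    """Score one live performance from its reaction level (assumed linear)."""
    return reaction_level * POINTS_PER_LEVEL


def user_score(reaction_levels: Sequence[int]) -> int:
    """Total score: the sum of the scores of every performance by one user."""
    return sum(score_performance(level) for level in reaction_levels)


def winner(first_levels: Sequence[int], second_levels: Sequence[int]) -> Optional[str]:
    """Return "first" or "second" for the higher total, or None on a tie."""
    first, second = user_score(first_levels), user_score(second_levels)
    if first == second:
        return None
    return "first" if first > second else "second"
```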

After the second user has finished performing, each user's score is displayed on the screen 3a. The winning user can then give a live performance during the third verse. Providing a battle mode in this way makes possible a new kind of karaoke system in which multiple users compete on their live performances.

{Modifications}
The microphone 4 and the controller 5 may have a vibration function. For example, when the audience's reaction level exceeds a certain level, the reaction instruction unit 26 of the main device 2 transmits a vibration instruction signal to the microphone 4 and the controller 5. On receiving the vibration instruction signal, the microphone 4 and the controller 5 vibrate for a fixed period (about 3 seconds). The user can thus sense the audience's reaction through the microphone 4 and the controller 5 as well as from the audience video in the screen 3a.

The main device 2 may have a live venue selection function. In this case, the user can operate a remote control (not shown) to select an arena, a baseball stadium, a concert hall, or the like as the live venue. The main device 2 reproduces audience video data corresponding to the selected venue. The user can thus have a simulated experience, as a live singer, of performing at a variety of live venues.

Although an example in which the main device 2 includes the voice recognition unit 24 has been described, as shown in FIG. 2, the microphone 4 may include the voice recognition unit 24 instead. In that case, each time the voice recognition unit 24 generates phrase information 24A, the microphone 4 transmits the generated phrase information 24A to the main device 2 in real time.

An example has been described in which the reaction data 63 is data containing video and audio that show the audience's reaction to a live performance. However, the reaction data 63 may instead contain only audio showing the audience's reaction. For example, data containing the audience's cheers in response to being roused, clapping in response to led clapping, or the audience's booing can be used as the reaction data 63.

The reaction data 63 need not be data that concretely shows the audience's actions in response to a live performance, as above; it may instead be data that shows the audience's reaction symbolically. For example, data containing video of multiple fireworks being launched from the live venue can be used as the reaction data 63. When the reaction instruction unit 26 changes the number of fireworks according to the reaction level, the user can gauge the audience's reaction to the live performance. Alternatively, as the reaction data 63 for a low reaction level, the reaction instruction unit 26 may select data containing video of a live venue with no audience, or of a raging sea.

DESCRIPTION OF SYMBOLS
1 Karaoke system
2 Main device
3 Monitor
3a Screen
4 Microphone
5 Controller
21, 43, 52 Wireless communication unit
22 Data acquisition unit
23 Reproducing unit
24 Voice recognition unit
25 Performance specifying unit
26 Reaction instruction unit
42, 51 Sensor unit

Claims

An amusement system comprising:
a main unit;
a voice input device that is held by a user and outputs voice input by the user as voice data; and
a speech recognition device that performs speech recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user,
wherein the voice input device includes
a first motion information output unit that outputs first motion information indicating the motion of the voice input device, and
the main unit includes:
a reproducing unit that reproduces music data selected by the user;
a performance specifying unit that specifies a performance of the user based on the phrase information and the first motion information; and
a reaction instruction unit that determines an audience reaction level for the specified performance based on a time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of items of reaction data indicating audience reactions, reproduction reaction data corresponding to the specified performance and the audience reaction level, and instructs the reproducing unit to reproduce the reproduction reaction data.

An amusement system comprising:
a main unit;
a voice input device that is held by a user and outputs voice input by the user as voice data;
an imaging device that images the user and outputs video data; and
a speech recognition device that performs speech recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user,
wherein the main unit includes:
a reproducing unit that reproduces music data selected by the user;
a video analysis unit that analyzes the video data and generates first motion information indicating the movement of the user;
a performance specifying unit that specifies a performance of the user based on the phrase information and the first motion information; and
a reaction instruction unit that determines an audience reaction level for the specified performance based on a time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of items of reaction data indicating audience reactions, reproduction reaction data corresponding to the specified performance and the audience reaction level, and instructs the reproducing unit to reproduce the reproduction reaction data.

The amusement system according to claim 1 or 2,
wherein the reaction instruction unit determines a reproduction condition for the reproduction reaction data based on the audience reaction level, and instructs the reproducing unit to apply the reproduction condition.

The amusement system according to claim 3,
wherein the reaction instruction unit determines the audience reaction level based on an audience demographic set by the user.

The amusement system according to any one of claims 1 to 4,
wherein the performance specifying unit determines that the user has given a performance of rousing the audience when it determines, based on the phrase information, that the user has called out a specific phrase in the form of a question, and determines, based on the first motion information, that the orientation of the voice input device has been reversed, and
the reaction instruction unit selects, as the reproduction reaction data, reaction data containing video and audio in which the audience responds in unison to the specific phrase.

The amusement system according to any one of claims 1 to 4,
wherein the performance specifying unit determines that the user has given a performance of requesting a chorus from the audience when it determines, based on the phrase information, that the user has called out a specific phrase requesting a chorus, and determines, based on the first motion information, that the orientation of the voice input device has been reversed, and
the reaction instruction unit selects, as the reproduction reaction data, reaction data containing video and audio in which the audience sings in chorus.

The amusement system according to any one of claims 1 to 4,
wherein the performance specifying unit determines that the user has given a performance of leading clapping when it determines, based on the motion information, that the user is clapping, and
the reaction instruction unit selects, as the reproduction reaction data, reaction data containing video of the audience clapping and the sound of the clapping.

The amusement system according to any one of claims 1 to 4,
wherein the performance specifying unit determines that the user has given a performance of requesting the audience to wave both hands when it determines, based on the motion information, that the user is making an arm-waving motion, and
the reaction instruction unit selects, as the reproduction reaction data, reaction data containing video of the audience waving both hands.

The amusement system according to any one of claims 1 and 3 to 8, further comprising
a controller that the user holds in the hand opposite to the hand holding the voice input device,
wherein the controller includes
a second motion information output unit that outputs second motion information indicating the motion of the controller, and
the performance specifying unit specifies the performance of the user based on the second motion information.

The amusement system according to any one of claims 3 to 9,
wherein the reaction instruction unit scores all performances given in a first part of the music data based on the audience reaction level for each performance, and scores all performances given in a second part of the music data based on the audience reaction level for each performance.

An amusement system comprising:
a main unit; and
a voice input device held by a user,
wherein the voice input device includes
a wireless communication unit that performs wireless communication with the main unit, and
the main unit includes:
a reproducing unit that reproduces music data selected by the user;
a performance specifying unit that specifies a performance of the user based on the presence or absence of a wireless signal transmitted from the voice input device; and
a reaction instruction unit that selects, from among a plurality of items of reaction data containing recorded audio or video showing audience reactions, reproduction reaction data corresponding to the specified performance, and instructs the reproducing unit to reproduce the reproduction reaction data.

The amusement system according to claim 11,
wherein the performance specifying unit determines that the user has entered a virtual live venue when the wireless signal is detected after reproduction of the music data has started.

The amusement system according to claim 11 or 12,
wherein the performance specifying unit determines that the user has left the virtual live venue when the wireless signal can no longer be detected after reproduction of the music data has started.

A voice input device used in the amusement system according to any one of claims 1 to 13.

A program that causes a computer mounted in a main unit, the main unit being capable of communicating with a voice input device that is held by a user, outputs voice input by the user as voice data, and outputs first motion information indicating the motion of the voice input device itself, and with a speech recognition device that performs speech recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user, to function as:
a reproducing unit that reproduces music data selected by the user;
a performance specifying unit that specifies a performance of the user based on at least one of the phrase information and the first motion information; and
a reaction instruction unit that determines an audience reaction level for the specified performance based on a time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of items of reaction data indicating audience reactions, reproduction reaction data corresponding to the specified performance and the audience reaction level, and instructs the reproducing unit to reproduce the reproduction reaction data.

A program that causes a computer mounted in a main unit, the main unit being capable of communicating with a voice input device that is held by a user and outputs voice input by the user as voice data, an imaging device that images the user and outputs video data, and a speech recognition device that performs speech recognition processing on the voice data and generates phrase information indicating a phrase spoken by the user, to function as:
a reproducing unit that reproduces music data selected by the user;
a video analysis unit that analyzes the video data and generates first motion information indicating the movement of the user;
a performance specifying unit that specifies a performance of the user based on at least one of the phrase information and the first motion information; and
a reaction instruction unit that determines an audience reaction level for the specified performance based on a time difference between the timing at which the phrase information is detected and the timing at which the first motion information is detected, selects, from among a plurality of items of reaction data indicating audience reactions, reproduction reaction data corresponding to the specified performance and the audience reaction level, and instructs the reproducing unit to reproduce the reproduction reaction data.

A program that causes a computer mounted in a main unit, the main unit being capable of wireless communication with a voice input device held by a user, to function as:
a reproducing unit that reproduces music data selected by the user;
a performance specifying unit that specifies a performance of the user based on the presence or absence of a wireless signal transmitted from the voice input device; and
a reaction instruction unit that selects, from among a plurality of items of reaction data indicating audience reactions, reproduction reaction data corresponding to the specified performance, and instructs the reproducing unit to reproduce the reproduction reaction data.