JP2006106411A - Sound output controller, musical piece reproduction device, sound output control method, program thereof and recording medium with the program recorded thereon

Info

Publication number: JP2006106411A
Application number: JP2004293801A
Authority: JP
Inventors: Ryuichiro Matsumoto; Kazuya Takahashi
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2004-10-06
Filing date: 2004-10-06
Publication date: 2006-04-20
Anticipated expiration: 2024-10-06
Also published as: JP4244338B2

Abstract

PROBLEM TO BE SOLVED: To provide a musical piece reproduction device that outputs an insertion sound in a satisfactory manner.

SOLUTION: When the calculating means 470 of the musical piece reproduction device 400 recognizes a request to reproduce a prescribed musical piece in, for example, a voice corresponding mode, it starts reproducing the prescribed musical piece and obtains in-vehicle sound, collected by a microphone 200, that includes a voice such as "yea". The calculating means 470 recognizes a voice BPM of the voice "yea" based on the musical piece BPM of the musical piece being reproduced, and recognizes a voice rhythm pattern of the voice corresponding to the voice BPM. Then, based on the insertion sound data associated with the rhythm pattern information corresponding to the voice rhythm pattern, the calculating means 470 makes an uttering means 450 output the insertion sound corresponding to the voice rhythm pattern, i.e. an insertion sound synchronized with the musical piece BPM, so that it matches the musical piece.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a sound output control device that performs control to output a predetermined sound, a music playback device, a sound output control method, a program therefor, and a recording medium on which the program is recorded.

Conventionally, various configurations that recognize voice and output voice data, such as robots, toys, and video game programs, have been widely used (see, for example, Patent Document 1). The configuration described in Patent Document 1, for example, is applied to a robot: a microphone disposed at a predetermined position on the head unit collects surrounding sound, including the user's utterances. Based on the resulting audio signal, a synthesized sound is generated in which the prosody in the state information of the model storage unit is controlled according to the values of an emotion model, and the synthesized sound is output from a speaker.

Patent Document 1: JP 2002-304187 A (page 3, right column to page 10, left column)

In a conventional configuration such as that of Patent Document 1, which outputs sound in response to an utterance, when a listener calls out an interjection (ainote) in time with a piece of music, for example, a sound is output in response to that interjection. However, even though the interjection itself is timed to the rhythm of the music, the synthesized sound output in response to it already lags the rhythm of the music. One example of the resulting problem is that the listener's enjoyment of the music may be impaired.

In view of the above, an object of the present invention is to provide a sound output control device, a music playback device, a sound output control method, a program therefor, and a recording medium on which the program is recorded, each capable of outputting a predetermined sound satisfactorily.

The invention according to claim 1 is a sound output control device that causes sound output means to output a predetermined sound, comprising: external state information acquisition means for acquiring external state information relating to an external state; music characteristic recognition means for recognizing a sound characteristic of music data output from music output means that outputs music data relating to a piece of music; and sound output control means for, upon recognizing that the external state information relating to a predetermined external state has been acquired, controlling the sound output means to output a sound in an output form tuned to the sound characteristic.

The invention according to claim 21 is a music playback device comprising: the sound output control device according to any one of claims 1 to 20; and music reproduction processing means for reproducing the music data and outputting it from the music output means.

The invention according to claim 22 is a music playback device comprising: the sound output control device according to any one of claims 1 to 20; superimposed music data generation means for acquiring the music data and generating superimposed music data in which the sound whose output is controlled by the sound output control means of the sound output control device is superimposed on the sound of the music data; and music reproduction processing means for reproducing the superimposed music data and outputting it from the music output means.

The invention according to claim 23 is a sound output control method for causing sound output means to output a predetermined sound, comprising: acquiring external state information relating to an external state; recognizing a sound characteristic of music data output from music output means that outputs music data relating to a piece of music; and, upon recognizing that the external state information relating to a predetermined external state has been acquired, controlling the sound output means to output a sound in an output form tuned to the sound characteristic.

The invention according to claim 24 is a sound output control program that causes calculation means to function as the sound output control device according to any one of claims 1 to 20.

The invention according to claim 25 is a sound output control program that causes calculation means to execute the sound output control method according to claim 23.

The invention according to claim 26 is a recording medium on which the sound output control program according to claim 24 or claim 25 is recorded so as to be readable by calculation means.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. In this embodiment, a device configuration mounted on a moving body, for example a vehicle, is described as an example of a music playback device provided with the sound output control device of the present invention. The music playback device of the present invention is not limited to a configuration mounted on a moving body; it can also be applied to a configuration installed in a building such as a house, or to a portable configuration. In this embodiment, the state of the voice uttered by the driver or a passenger of the vehicle (hereinafter, the user) is described as an example of the state of external sound, which is one external state. Likewise, the state of the user's body motion is described as an example of the state of body motion, which is another external state.

The external sound is not limited to the voice uttered by the user; various sounds can be targeted, such as the user's hand clapping, humming, or whistling, sounds propagating into the vehicle from outside, and the sound of instruments such as a tambourine, castanets, or maracas being played. The body motion may also be that of an animal inside or outside the vehicle, or of a person outside the vehicle. Furthermore, the external state may be, for example, the movement of an object inside or outside the vehicle relative to the vehicle.

FIG. 1 is a block diagram showing the schematic configuration of a music playback system according to an embodiment of the present invention. FIG. 2 is a schematic diagram showing the schematic configuration of the music list data. FIG. 3 is a schematic diagram showing the schematic configuration of the insertion sound list data. FIG. 4 is a schematic diagram showing the schematic configuration of the calculation means constituting the music playback device. FIG. 5 is a schematic diagram showing an example of how the insertion sound is output in the voice corresponding mode, where (A) is the lyrics of the music, (B) is the voice, and (C) is the insertion sound. FIG. 6 is a schematic diagram showing an example of how the insertion sound is output in the motion corresponding mode, where (A) is the lyrics of the music, (B) is the motion, and (C) is the insertion sound.

[Configuration of music playback system]
In FIG. 1, reference numeral 100 denotes a music playback system. The music playback system 100 outputs a sound corresponding to a voice uttered by a user inside a moving body, for example a vehicle, in time with the music being played back. It also outputs a sound corresponding to the user's motion in time with the music being played back. In the following, a sound output in response to a voice or motion is referred to as an insertion sound. The moving body is not limited to vehicles such as automobiles and trains; the system can be applied to any moving body, such as an airplane or a ship. The music playback system 100 includes a microphone 200 (hereinafter, mic 200), imaging means 300, a music playback device 400, and the like.

The mic 200 is disposed, for example, at approximately the center of the ceiling in the vehicle cabin, and acquires, that is, collects, sounds that can be perceived inside the vehicle (hereinafter, in-vehicle sound), such as the user's voice and sounds propagating into the vehicle from outside. The mic 200 then outputs in-vehicle sound information, which is external sound information serving as external state information corresponding to the collected in-vehicle sound, to the music playback device 400.

The imaging means 300 is disposed, for example, at the front of the ceiling in the vehicle cabin, and captures images of the user. It then outputs imaging data, which is body motion information serving as external state information corresponding to the captured content, to the music playback device 400. Instead of the imaging means 300, a sensor such as the following may be provided. For example, a sensor may be provided that detects the amount of heat radiated from the user in a predetermined region of the cabin and outputs external state information corresponding to that amount of heat. Alternatively, a sensor may be provided that irradiates a predetermined region with a predetermined light beam and outputs external state information corresponding to the reflection of that beam. Furthermore, the sensor need not be mounted on the vehicle: a sensor worn on a part of the body, such as a vibration gyroscope or an accelerometer, may be provided that outputs its detected or measured values as external state information.

The music playback device 400 plays back music stored in a general-purpose digital format such as the MIDI (Musical Instrument Digital Interface) format, the WAVE format, or an MPEG (Moving Picture Experts Group) format, as well as music recorded on a CD (Compact Disc), MD (Mini Disc), or the like. The music playback device 400 also outputs an insertion sound corresponding to the user's voice or motion in time with the music being played back. It operates on electric power supplied from, for example, a battery (not shown) mounted on the vehicle. The music playback device 400 includes music data storage means 410, insertion sound data storage means 420 serving as sound data storage means, a memory 430, input means 440, sound generation means 450 serving as both sound output means and music output means, display means 460, calculation means 470, and the like.

The music data storage means 410 stores music list data 500, as shown in FIG. 2, so that it can be read out as needed. Examples of the music data storage means 410 include a configuration having a drive or driver that readably stores data on a recording medium such as a magnetic disk such as an HD (Hard Disk), an optical disc such as a CD, MD, or DVD (Digital Versatile Disc), a magneto-optical disk, or a memory card.

The music list data 500 is data on the list of music pieces to be played back. It consists of at least one item of music individual data 510 associated together as a single data structure.

The music individual data 510 is information on a single piece of music. It has a table structure in which music data 511, music related information 512, and the like are organized as a single data structure. The music individual data 510 may also consist of the music data 511 alone.

The music data 511 is the data used when the music is played back. It is recorded in a format from which music can be reproduced, such as the MIDI, WAVE, or MPEG format.

The music related information 512 is information on the music reproduced from the music data 511. It has a table structure in which music name information 512A, performer information 512B, music rhythm information 512C, and the like are organized as a single data structure. The music related information 512 may also be configured to store at least one of the items 512A, 512B, and 512C.

The music name information 512A is data representing the name of the music. The performer information 512B is data representing the performer of the music. The music rhythm information 512C is data representing the rhythm of the music, such as its BPM (beats per minute) and time signature. A MIDI message is one example of the music rhythm information 512C.
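As an illustration only, the nesting of the music list data 500 described above could be modeled roughly as follows. This is a minimal sketch, not the structure used by the actual device: the class names, field names, and the `find_by_title` helper are hypothetical, and representing the music rhythm information 512C as a plain BPM value plus a time signature string is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MusicRelatedInfo:
    """Music related information 512; every field is optional."""
    title: Optional[str] = None           # music name information 512A
    performer: Optional[str] = None       # performer information 512B
    bpm: Optional[float] = None           # music rhythm information 512C (tempo, assumed form)
    time_signature: Optional[str] = None  # music rhythm information 512C (meter, assumed form)


@dataclass
class MusicRecord:
    """Music individual data 510: the music data plus optional metadata."""
    music_data: bytes                           # music data 511 (MIDI, WAVE, MPEG, ...)
    related: Optional[MusicRelatedInfo] = None  # None when only 511 is stored


@dataclass
class MusicList:
    """Music list data 500: a collection of music records."""
    records: List[MusicRecord] = field(default_factory=list)

    def find_by_title(self, title: str) -> Optional[MusicRecord]:
        """Return the first record whose title matches, or None (hypothetical helper)."""
        for record in self.records:
            if record.related is not None and record.related.title == title:
                return record
        return None
```

Keeping `related` optional mirrors the statement that the music individual data 510 may consist of the music data 511 alone.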

The insertion sound data storage means 420 stores insertion sound list data 600, as shown in FIG. 3, so that it can be read out as needed. The insertion sound data storage means 420 may have, for example, the same configuration as the music data storage means 410.

The insertion sound list data 600 is data on the list of insertion sounds that are output as appropriate, in response to the user's voice or motion, in time with the music being played back. The insertion sound list data 600 can be changed, added to, or deleted from as appropriate through the user's input operations on the input means 440. It consists of at least one item of insertion sound individual data 610 associated together as a single data structure.

The insertion sound individual data 610 is information on a single insertion sound. It has a table structure in which insertion sound data 611 serving as sound data, insertion sound related information 612, and the like are organized as a single data structure.

The insertion sound data 611 is the data used when the insertion sound is reproduced. In this embodiment it is recorded in the MIDI format. Examples of the insertion sound reproduced from the insertion sound data 611 include the sound of a single instrument such as a drum, a Japanese taiko drum, a piano, a cymbal, or a guitar, and the sound of several of these instruments played in combination. The insertion sound data 611 is not limited to the MIDI format and may be in any format from which the insertion sound can be reproduced, such as the WAVE or MPEG format. The insertion sound is likewise not limited to instrument sounds and may be any sound: for example, a voice such as "yay", "hyuu", "hey", or "yoisho"; humming or whistling; a natural sound such as waves, wind, or water; or an artificial sound such as a car horn.

The insertion sound related information 612 is information on the insertion sound reproduced from the insertion sound data 611. It has a table structure in which rhythm pattern information 612A, target language information 612B serving as predetermined external state information, target motion information 612C serving as predetermined external state information, and the like are organized as a single data structure.

The rhythm pattern information 612A is data representing the rhythm pattern of the insertion sound reproduced from the insertion sound data 611. Examples of the rhythm pattern include "rock", "jazz", and "rhythm and blues". The target language information 612B is data representing the words uttered as a voice. Examples of the words indicated by the target language information 612B include "yay", "yay-yay", "hyuu", "hey", and "yoisho". The target motion information 612C is data representing a motion of the user. Examples of the motion indicated by the target motion information 612C include swaying the hands, head, or torso, and thrusting a hand upward.
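To make the relationship between the insertion sound data 611 and its related information 612 concrete, the following sketch models one entry of the insertion sound list data 600 and a simple lookup. It is an illustration under assumptions: the class, the field names, and the `find_insertion_sound` function are hypothetical, and matching by uttered word with an optional rhythm pattern filter is only one plausible selection rule, not one stated in this passage.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class InsertionSound:
    """Insertion sound individual data 610."""
    sound_data: bytes                             # insertion sound data 611
    rhythm_pattern: str                           # rhythm pattern information 612A
    target_words: FrozenSet[str] = frozenset()    # target language information 612B
    target_motions: FrozenSet[str] = frozenset()  # target motion information 612C


def find_insertion_sound(
    sounds: Iterable[InsertionSound],
    word: str,
    rhythm_pattern: Optional[str] = None,
) -> Optional[InsertionSound]:
    """Return the first entry registered for `word`, optionally restricted
    to a given rhythm pattern; None if nothing matches (assumed rule)."""
    for sound in sounds:
        if word not in sound.target_words:
            continue
        if rhythm_pattern is not None and sound.rhythm_pattern != rhythm_pattern:
            continue
        return sound
    return None
```

For example, with a "rock" drum entry and a "jazz" piano entry both registered for "yay", calling `find_insertion_sound(sounds, "yay", "jazz")` would select the piano entry.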

The memory 430 readably stores the various information and data needed to play back music and output insertion sounds. It also stores various programs that run on an OS (Operating System) controlling the operation of the music playback device 400 as a whole. The memory 430 is preferably a memory that retains its contents even if power is suddenly lost, for example due to a power failure, such as a CMOS (Complementary Metal-Oxide Semiconductor) memory. The memory may also be configured with a drive or driver that readably stores data on a recording medium such as an HD, a DVD, an optical disc, or a memory card.

The input means 440 has various operation buttons, operation knobs, and the like (not shown) for input operations. Input operations on these buttons and knobs set items such as the operating conditions of the music playback device 400. Specific examples include starting and stopping music playback, adjusting the volume, setting the mode in which an insertion sound is output in response to a voice (hereinafter, the voice corresponding mode), and setting the mode in which an insertion sound is output in response to a motion (hereinafter, the motion corresponding mode). In response to an input operation, the input means 440 outputs the relevant information to the calculation means 470 as an operation signal so that the setting is applied. The input means 440 is not limited to operation buttons and knobs; any configuration that allows the various settings to be entered can be used, such as input through a touch panel provided on the display means 460 or input by voice. The input means 440 may also include, for example, a remote control light receiving unit that receives information transmitted by infrared from a remote controller (not shown; hereinafter, remote control) and passes that information to the calculation means 470 as an operation signal.

The sound generation means 450 has speakers (not shown) disposed, for example, in the instrument panel, the doors, and the rear dashboard of the vehicle. The sound generation means 450 is controlled by the calculation means 470 and outputs audio signals from the calculation means 470, such as those of the music data 511 and the insertion sound data 611, as sound from the speakers.

The display means 460 is controlled by the calculation means 470 and displays on its screen the image signal of image data from the calculation means 470. Examples of the image data include image data showing the name of the music being played back, the performer, the playing time, and the volume level. Examples of the display means 460 include a liquid crystal panel, an organic EL (electroluminescence) panel, a PDP (Plasma Display Panel), a CRT (Cathode-Ray Tube), an FED (Field Emission Display), and an electrophoretic display panel.

The calculation means 470 has various input/output ports (not shown), for example a microphone port to which the mic 200 is connected, an imaging port to which the imaging means 300 is connected, a music storage port to which the music data storage means 410 is connected, an insertion sound storage port to which the insertion sound data storage means 420 is connected, a memory port to which the memory 430 is connected, a key input port to which the input means 440 is connected, a sound generation control port to which the sound generation means 450 is connected, and a display control port to which the display means 460 is connected. As shown in FIG. 4, the calculation means 470 includes, as various programs, music reproduction processing means 471, music rhythm recognition means 472 serving as music characteristic recognition means, voice state recognition means 473 serving as external state information acquisition means and external state rhythm recognition means, motion state recognition means 474 serving as external state information acquisition means and external state rhythm recognition means, insertion sound reproduction control means 475 serving as sound output control means, and the like. The music rhythm recognition means 472, the voice state recognition means 473, the motion state recognition means 474, and the insertion sound reproduction control means 475 together constitute the sound output control device of the present invention.

The music reproduction processing means 471 performs processing to play back music based on the music data 511 stored in the music data storage means 410. Specifically, on acquiring an operation signal, based on an input operation on the input means 440, requesting playback of a given piece of music, the music reproduction processing means 471 searches for the music individual data 510 corresponding to that piece. It then determines whether music rhythm information 512C is included in the music individual data 510. If it is, the music reproduction processing means 471 outputs the music data 511 to the sound generation means 450 as an audio signal at a speed based on the rhythm given by the music rhythm information 512C. If it is not, the music reproduction processing means 471 outputs the music data 511 to the sound generation means 450 as an audio signal at a preset speed. In addition, if music name information 512A or performer information 512B is included in the music individual data 510, the music reproduction processing means 471 generates image data for displaying information on the music being played back, such as the name of the music, the performer, and the playing time, as given in the items 512A and 512B. It then outputs this image data to the display means 460 as an image signal.
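The branching performed by the music reproduction processing means 471 can be sketched as below. This is only an illustrative reading of the paragraph above: the function names are hypothetical, the preset speed of 120 BPM is an assumed placeholder (the source gives no value), and the display data is reduced to a plain text string rather than real image data.

```python
from typing import Optional

# Hypothetical preset speed; the source does not state a concrete value.
DEFAULT_BPM = 120.0


def playback_bpm(rhythm_bpm: Optional[float], default_bpm: float = DEFAULT_BPM) -> float:
    """Use the tempo from music rhythm information 512C when it is present,
    otherwise fall back to the preset speed."""
    if rhythm_bpm is not None:
        return rhythm_bpm
    return default_bpm


def display_text(title: Optional[str] = None, performer: Optional[str] = None) -> str:
    """Build the text to show from whichever of 512A and 512B is available."""
    parts = [p for p in (title, performer) if p]
    return " / ".join(parts)
```

A record carrying a tempo of 96 BPM would thus play at 96 BPM, while a record without rhythm information would play at the assumed preset.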

楽曲リズム認識手段４７２は、楽曲再生処理手段４７１で再生している楽曲のリズムを認識する。具体的には、楽曲リズム認識手段４７２は、発音手段４５０で出力されている楽曲データ５１１を有する楽曲個別データ５１０に楽曲リズム情報５１２Ｃが組み込まれているか否かを判断する。ここで、楽曲リズム情報５１２Ｃが組み込まれているか否かを判断する方法としては、例えば所定のＭＩＤＩメッセージの有無に基づいて判断する方法が例示できる。なお、楽曲のリズムに関する他の情報の有無に基づいて判断する方法としてもよい。楽曲リズム認識手段４７２は、楽曲リズム情報５１２Ｃが組み込まれていると判断した場合、この楽曲リズム情報５１２Ｃを取得して、楽曲のＢＰＭ（以下、楽曲ＢＰＭと称す）を認識する。そして、この楽曲ＢＰＭに関する楽曲ＢＰＭ情報をメモリ４３０に記憶させる。また、楽曲リズム認識手段４７２は、楽曲リズム情報５１２Ｃが組み込まれていないと判断した場合、例えば以下のような処理を実施して楽曲ＢＰＭを認識する。すなわち、楽曲リズム認識手段４７２は、楽曲データ５１１の音声信号を取得して、この音声信号に基づいて楽曲の出力波形を認識する。さらに、この出力波形のピークの間隔を認識し、このピーク間隔から楽曲ＢＰＭを認識する。そして、楽曲リズム認識手段４７２は、この楽曲ＢＰＭに関する楽曲ＢＰＭ情報をメモリ４３０に記憶させる。 The music rhythm recognition means 472 recognizes the rhythm of the music being played back by the music playback processing means 471. Specifically, the music rhythm recognition means 472 determines whether or not music rhythm information 512C is incorporated in the music individual data 510 that contains the music data 511 being output by the sound generation means 450. One way to make this determination is to check for the presence or absence of a predetermined MIDI message; the presence or absence of other information on the rhythm of the music may be used instead. When the music rhythm recognition means 472 determines that the music rhythm information 512C is incorporated, it acquires the music rhythm information 512C, recognizes the BPM of the music (hereinafter referred to as music BPM), and stores music BPM information on this music BPM in the memory 430. When it determines that the music rhythm information 512C is not incorporated, it recognizes the music BPM by, for example, the following processing. The music rhythm recognition means 472 acquires the audio signal of the music data 511 and recognizes the output waveform of the music based on this audio signal. It then recognizes the intervals between the peaks of this output waveform and recognizes the music BPM from these peak intervals. Finally, the music rhythm recognition means 472 stores music BPM information on this music BPM in the memory 430.
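The peak-interval approach described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `find_peaks` and `bpm_from_peaks`, the amplitude threshold, and the use of the median interval are assumptions made for illustration, and the input is assumed to be an amplitude envelope sampled at a known rate.

```python
from statistics import median


def find_peaks(samples, threshold):
    """Return indices of local maxima whose value exceeds threshold (hypothetical helper)."""
    peaks = []
    for i in range(1, len(samples) - 1):
        if samples[i] > threshold and samples[i - 1] < samples[i] >= samples[i + 1]:
            peaks.append(i)
    return peaks


def bpm_from_peaks(peak_indices, sample_rate):
    """Estimate BPM from the median spacing between consecutive peaks.

    Returns None when fewer than two peaks are available.
    """
    if len(peak_indices) < 2:
        return None
    intervals = [b - a for a, b in zip(peak_indices, peak_indices[1:])]
    seconds_per_beat = median(intervals) / sample_rate
    return 60.0 / seconds_per_beat


# Synthetic envelope: 100 samples per second, one peak every 50 samples (0.5 s).
envelope = [1.0 if i % 50 == 25 else 0.0 for i in range(500)]
music_bpm = bpm_from_peaks(find_peaks(envelope, threshold=0.5), sample_rate=100)
```

With a peak every 0.5 s, the estimate is 60 / 0.5 = 120 BPM. The median is used here only so that a single missed or spurious peak does not skew the result; the patent itself does not say how the intervals are combined.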

音声状態認識手段４７３は、入力手段４４０における入力操作に基づく音声対応モードに設定されたことを認識すると、利用者の音声の状態を適宜検出する。そして、音声状態認識手段４７３は、音声リズム認識手段４７３Ａと、言語認識手段４７３Ｂと、を備えている。 When the voice state recognition unit 473 recognizes that the voice corresponding mode based on the input operation in the input unit 440 is set, the voice state recognition unit 473 appropriately detects the voice state of the user. The voice state recognition unit 473 includes a voice rhythm recognition unit 473A and a language recognition unit 473B.

音声リズム認識手段４７３Ａは、音声のリズムを認識する。具体的には、音声リズム認識手段４７３Ａは、マイク２００に車内音を集音させて車内音情報を出力させる。そして、音声リズム認識手段４７３Ａは、マイク２００からの車内音情報を予め設定された所定時間だけ取得して、この車内音情報に基づいて車内音の出力波形を認識する。さらに、音声リズム認識手段４７３Ａは、この出力波形における出力値が予め設定された基準出力値以下の部分をゼロに設定、すなわち車内音から基準音量以下の音を削除する。ここで、基準出力値は、例えば車両の走行音の出力値や、利用者の通常の会話の出力値と略同一の値に設定されている。すなわち、基準出力値は、車内音の出力波形における基準出力値以下の部分がゼロに設定された際に、利用者が楽曲に合わせて意識的に発音した例えば「イェイ」や「イェイェイ」などの音声に対応する成分のみがゼロにならない状態に設定されている。 The voice rhythm recognition means 473A recognizes the rhythm of the voice. Specifically, the voice rhythm recognition means 473A causes the microphone 200 to collect in-vehicle sound and output in-vehicle sound information. It acquires the in-vehicle sound information from the microphone 200 for a preset predetermined time and recognizes the output waveform of the in-vehicle sound based on this information. It then sets to zero every portion of the output waveform whose output value is equal to or less than a preset reference output value; that is, it deletes sounds at or below a reference volume from the in-vehicle sound. Here, the reference output value is set to substantially the same value as, for example, the output value of the vehicle's running noise or of the user's ordinary conversation. In other words, the reference output value is chosen so that, once the portions of the in-vehicle sound waveform at or below it are set to zero, only the components corresponding to voice that the user consciously utters in time with the music, such as “Yay” or “Yay-yay”, remain non-zero.
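The zeroing step described above amounts to a simple level gate. The following Python sketch shows one way it could look; the function name `gate`, the use of absolute sample values, and the numeric levels are illustrative assumptions, not details from the patent.

```python
def gate(samples, reference_level):
    """Zero every sample whose magnitude is at or below reference_level.

    Assumes signed samples, so the comparison is made on the absolute value.
    """
    return [s if abs(s) > reference_level else 0.0 for s in samples]


# Quiet samples (road noise, ordinary talk) are removed; loud ones survive.
in_vehicle = [0.1, -0.2, 0.9, -0.95, 0.3]
gated = gate(in_vehicle, reference_level=0.3)
```

A sample exactly equal to the reference level is removed, which matches the "equal to or less than" wording of the text.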

そして、音声リズム認識手段４７３Ａは、車内音から削除した音の時間長が予め設定された外部音認識基準時間としての音声認識基準時間以下であることを認識すると、車内音に例えば「イェイ」などの利用者が意識して発音した音声が含まれていると判断する。さらに、音声リズム認識手段４７３Ａは、楽曲ＢＰＭ情報をメモリ４３０から取得して、この楽曲ＢＰＭ情報の楽曲ＢＰＭと、基準音量以下の音を削除した車内音の出力波形すなわち音声の出力波形のピークの間隔と、に基づいて、楽曲ＢＰＭを基準にした音声のＢＰＭ（以下、音声ＢＰＭと称す）を演算して認識する。そして、音声リズム認識手段４７３Ａは、この音声ＢＰＭに基づいて、音声のリズムパターン（以下、音声リズムパターンと称す）を認識し、この音声リズムパターンに関する音声リズムパターン情報をメモリ４３０に記憶させる。 When the voice rhythm recognition means 473A recognizes that the time length of the sound deleted from the in-vehicle sound is equal to or less than a voice recognition reference time, which serves as the preset external sound recognition reference time, it determines that the in-vehicle sound contains voice consciously uttered by the user, such as “Yay”. The voice rhythm recognition means 473A then acquires the music BPM information from the memory 430 and, based on the music BPM in this information and on the peak intervals of the output waveform of the in-vehicle sound from which sounds at or below the reference volume have been deleted (that is, the output waveform of the voice), calculates and recognizes the BPM of the voice relative to the music BPM (hereinafter referred to as voice BPM). Based on this voice BPM, the voice rhythm recognition means 473A recognizes the rhythm pattern of the voice (hereinafter referred to as voice rhythm pattern) and stores voice rhythm pattern information on this pattern in the memory 430.
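The two decisions in this paragraph can be sketched in Python as follows. The text does not say how the voice BPM is derived "relative to" the music BPM, so the snapping to fixed multiples of the music BPM in `voice_bpm` is purely an assumption for illustration, as are all function names, the `ratios` tuple, and the numeric values.

```python
def deleted_duration(gated, sample_rate):
    """Seconds of the capture window that are zero after gating."""
    return sum(1 for s in gated if s == 0.0) / sample_rate


def contains_deliberate_voice(gated, sample_rate, reference_seconds):
    """True when the deleted portion is no longer than the reference time."""
    return deleted_duration(gated, sample_rate) <= reference_seconds


def voice_bpm(peak_interval_seconds, music_bpm, ratios=(0.5, 1.0, 2.0, 4.0)):
    """Snap the raw voice tempo to the nearest allowed multiple of music_bpm.

    The set of ratios is a hypothetical design choice, not from the patent.
    """
    raw = 60.0 / peak_interval_seconds
    return min((music_bpm * r for r in ratios), key=lambda c: abs(c - raw))


window = [0.0, 0.9, 0.8, 0.0]  # already gated, 4 samples per second
has_voice = contains_deliberate_voice(window, sample_rate=4, reference_seconds=0.5)
snapped = voice_bpm(0.26, music_bpm=120)
```

Snapping keeps the voice tempo on the same grid as the music, which is one plausible reading of "based on the music BPM"; an implementation could equally keep the raw value and compare it to the music BPM later.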

言語認識手段４７３Ｂは、車内音情報に基づいて、音声リズム認識手段４７３Ａで音声リズムパターンが認識された音声により発音されている言語（以下、音声言語と称す）を認識する。例えば、言語認識手段４７３Ｂは、音声リズム認識手段４７３Ａで取得された車内音情報をデジタル信号化して、音響的な特徴を認識する。さらに、音声の統計的な情報をモデル化した音声モデルから、車内音情報に基づく音響的な特徴に最も近い言語を検索する。そして、言語認識手段４７３Ｂは、この検索した言語を音声言語として認識する。また、言語認識手段４７３Ｂは、音声言語を認識すると、この音声言語に関する音声言語情報をメモリ４３０に記憶させる。 The language recognizing unit 473B recognizes a language (hereinafter referred to as a “speech language”) that is pronounced by the sound whose sound rhythm pattern is recognized by the sound rhythm recognizing unit 473A based on the in-vehicle sound information. For example, the language recognizing unit 473B converts the in-vehicle sound information acquired by the sound rhythm recognizing unit 473A into a digital signal and recognizes an acoustic feature. Furthermore, a language closest to an acoustic feature based on in-vehicle sound information is searched from a speech model obtained by modeling speech statistical information. Then, the language recognition unit 473B recognizes the searched language as a speech language. Further, when recognizing the speech language, the language recognition unit 473B stores the speech language information related to the speech language in the memory 430.

動作状態認識手段４７４は、入力手段４４０における入力操作に基づく動作対応モードに設定されたことを認識すると、利用者の動作の状態を適宜認識する。そして、動作状態認識手段４７４は、動作リズム認識手段４７４Ａと、動作内容認識手段４７４Ｂと、を備えている。 When the motion state recognition means 474 recognizes that the operation corresponding mode has been set by an input operation on the input means 440, it recognizes the state of the user's motion as appropriate. The motion state recognition means 474 includes a motion rhythm recognition means 474A and a motion content recognition means 474B.

動作リズム認識手段４７４Ａは、動作のリズムを認識する。具体的には、動作リズム認識手段４７４Ａは、撮像手段３００に利用者を撮像させ、撮像データを出力させる。さらに、動作リズム認識手段４７４Ａは、撮像手段３００からの撮像データを予め設定された所定時間だけ取得して、この撮像データに基づいて利用者の身体の所定の部分、例えば頭や手などを把握する。そして、この把握した所定部分が所定の方向に予め設定された身体動作基準量以上動いたか否かを認識する。ここで、身体動作基準量は、例えば利用者の運転動作や窓を開閉する動作など、楽曲に合わせた意識的な動作以外の動作に対応する量に設定されている。なお、複数の撮像手段３００を設け、これらの撮像手段３００で利用者を撮像させて、身体の所定部分の動作を３次元的に認識する構成としてもよい。 The motion rhythm recognition means 474A recognizes the rhythm of the motion. Specifically, the motion rhythm recognition means 474A causes the imaging means 300 to image the user and output imaging data. It acquires the imaging data from the imaging means 300 for a preset predetermined time and, based on this data, identifies a predetermined part of the user's body, such as the head or a hand. It then recognizes whether or not the identified part has moved in a predetermined direction by at least a preset body motion reference amount. Here, the body motion reference amount is set to an amount corresponding to motions other than conscious motions made in time with the music, such as the user's driving motions or the motion of opening and closing a window. A plurality of imaging means 300 may also be provided to image the user so that the motion of the predetermined body part is recognized three-dimensionally.

そして、動作リズム認識手段４７４Ａは、身体の所定部分が身体動作基準量以上動いた、すなわち例えば手や頭あるいは胴体が左右方向に身体動作基準量以上動いたことや、手が上下方向に身体動作基準量以上動いたことを認識すると、利用者が楽曲に合わせて意識的に身体を動かしたと判断する。さらに、動作リズム認識手段４７４Ａは、楽曲ＢＰＭ情報をメモリ４３０から取得して、この楽曲ＢＰＭ情報の楽曲ＢＰＭと、例えば手が所定の位置から他の位置まで動くまでの時間の長さと、に基づいて、楽曲ＢＰＭを基準にした動作のＢＰＭ（以下、動作ＢＰＭと称す）を演算して認識する。そして、動作リズム認識手段４７４Ａは、この動作ＢＰＭに基づいて、動作のリズムパターン（以下、動作リズムパターンと称す）を認識し、この動作リズムパターンに関する動作リズムパターン情報をメモリ４３０に記憶させる。 When the motion rhythm recognition means 474A recognizes that the predetermined body part has moved by at least the body motion reference amount (for example, that a hand, the head, or the torso has moved left and right by at least that amount, or that a hand has moved up and down by at least that amount), it determines that the user has consciously moved in time with the music. The motion rhythm recognition means 474A then acquires the music BPM information from the memory 430 and, based on the music BPM in this information and on, for example, the length of time taken for the hand to move from one predetermined position to another, calculates and recognizes the BPM of the motion relative to the music BPM (hereinafter referred to as motion BPM). Based on this motion BPM, it recognizes the rhythm pattern of the motion (hereinafter referred to as motion rhythm pattern) and stores motion rhythm pattern information on this pattern in the memory 430.

動作内容認識手段４７４Ｂは、動作の内容を認識する。具体的には、動作内容認識手段４７４Ｂは、動作リズム認識手段４７４Ａで取得された撮像データに基づいて、身体動作基準量以上動いた身体の所定の部分と、この所定の部分が動いた方向と、を特定する。そして、この特定した所定の部分および動いた方向に基づいて、動作の内容を認識する。例えば、動作内容認識手段４７４Ｂは、頭が左右方向に移動していることを特定した場合に頭を揺らす動作であることを認識し、手が上下方向に移動していることを特定した場合に手を突き上げる動作であることを認識する。そして、動作内容認識手段４７４Ｂは、この動作内容に関する動作内容情報をメモリ４３０に記憶させる。 The motion content recognition means 474B recognizes the content of the motion. Specifically, based on the imaging data acquired by the motion rhythm recognition means 474A, the motion content recognition means 474B identifies the predetermined body part that has moved by at least the body motion reference amount and the direction in which that part moved. It then recognizes the content of the motion from the identified part and direction. For example, when it identifies that the head is moving left and right, it recognizes a motion of shaking the head; when it identifies that a hand is moving up and down, it recognizes a motion of pushing up the hand. The motion content recognition means 474B then stores motion content information on this motion content in the memory 430.
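The threshold test and the part-plus-direction lookup described for the motion recognition means can be sketched in Python as below. The displacement inputs `dx` and `dy`, the table `MOTION_TABLE`, and the function name `classify_motion` are hypothetical; the patent does not specify how displacement is measured from the imaging data.

```python
# Hypothetical lookup from (body part, dominant direction) to motion content.
MOTION_TABLE = {
    ("head", "horizontal"): "shaking the head",
    ("hand", "vertical"): "pushing up the hand",
}


def classify_motion(part, dx, dy, reference_amount):
    """Return the motion content, or None when the movement is too small.

    dx and dy are assumed displacements of the tracked part, in any
    consistent unit shared with reference_amount.
    """
    if max(abs(dx), abs(dy)) < reference_amount:
        return None  # below the body motion reference amount
    direction = "horizontal" if abs(dx) >= abs(dy) else "vertical"
    return MOTION_TABLE.get((part, direction))


head_result = classify_motion("head", dx=12, dy=2, reference_amount=10)
hand_result = classify_motion("hand", dx=1, dy=15, reference_amount=10)
small_result = classify_motion("hand", dx=3, dy=4, reference_amount=10)
```

Movements such as steering or operating a window would fall below the reference amount in this sketch and return `None`, mirroring the return to the acquisition step in the described flow.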

挿入音再生制御手段４７５は、楽曲再生処理手段４７１で再生されている楽曲に合わせて挿入音を再生する制御をする。そして、挿入音再生制御手段４７５は、挿入位置特定手段４７５Ａと、挿入音再生処理手段４７５Ｂと、などを備えている。 The insertion sound reproduction control means 475 controls to reproduce the insertion sound in accordance with the music being reproduced by the music reproduction processing means 471. The insertion sound reproduction control unit 475 includes an insertion position specifying unit 475A, an insertion sound reproduction processing unit 475B, and the like.

挿入位置特定手段４７５Ａは、挿入音を再生する位置、すなわち挿入音を楽曲に挿入する位置（以下、挿入位置と称す）を特定する。具体的には、挿入位置特定手段４７５Ａは、楽曲リズム認識手段４７２で認識された再生中の楽曲の出力波形に基づいて、楽曲の拍子を認識する。さらに、この楽曲の拍子に基づいて、楽曲の小節の区切りを認識する。そして、挿入位置特定手段４７５Ａは、音声対応モードに設定されている場合、言語認識手段４７３Ｂで音声言語が認識された音声が発せられた時点から２つ目に現れる楽曲の小節の始めを挿入位置として特定する。すなわち、挿入位置特定手段４７５Ａは、音声が発せられた時点から予め設定された出力処理期限時間以内に挿入音が出力されるように挿入位置を特定する。 The insertion position specifying means 475A specifies the position at which the insertion sound is reproduced, that is, the position at which the insertion sound is inserted into the music (hereinafter referred to as the insertion position). Specifically, the insertion position specifying means 475A recognizes the time signature of the music based on the output waveform of the music being played, as recognized by the music rhythm recognition means 472, and then recognizes the bar boundaries of the music based on this time signature. When the voice corresponding mode is set, the insertion position specifying means 475A specifies as the insertion position the start of the second bar of the music to appear after the point at which the voice whose speech language was recognized by the language recognition means 473B was uttered. In other words, it specifies the insertion position so that the insertion sound is output within a preset output processing time limit measured from the point at which the voice was uttered.

例えば、挿入位置特定手段４７５Ａは、図５（Ａ）に示すような再生時間Ｔ１，Ｔ３，Ｔ４，Ｔ５，Ｔ７，Ｔ８，Ｔ９に小節の始めが現れる楽曲が再生されている状態において、図５（Ｂ）に示すような再生時間Ｔ１および再生時間Ｔ３の間の再生時間Ｔ２に発せられた「イェイ」の音声を認識すると、再生時間Ｔ２から２つ目に現れる小節の始め、すなわち再生時間Ｔ４に現れる小節の始めを挿入位置として特定する。さらに、再生時間Ｔ５および再生時間Ｔ７の間の再生時間Ｔ６に発せられた「イェイェイ」の音声を認識すると、再生時間Ｔ８に現れる小節の始めを挿入位置として特定する。 For example, suppose that a piece of music in which bars start at reproduction times T1, T3, T4, T5, T7, T8, and T9, as shown in FIG. 5A, is being reproduced. When the insertion position specifying means 475A recognizes the “Yay” voice uttered at reproduction time T2, between reproduction times T1 and T3, as shown in FIG. 5B, it specifies as the insertion position the start of the second bar to appear after reproduction time T2, that is, the bar start at reproduction time T4. Likewise, when it recognizes the “Yay-yay” voice uttered at reproduction time T6, between reproduction times T5 and T7, it specifies the bar start at reproduction time T8 as the insertion position.

また、挿入位置特定手段４７５Ａは、動作対応モードに設定されている場合、動作内容認識手段４７４Ｂで動作内容が認識された動作が実施された時点から、２つ目に現れる楽曲の小節の始めを挿入位置として特定する。すなわち、挿入位置特定手段４７５Ａは、動作が実施された時点から出力処理期限時間以内に挿入音が出力されるように挿入位置を特定する。例えば、挿入位置特定手段４７５Ａは、図６（Ａ）に示すような図５（Ａ）と同じ楽曲が再生されている状態において、図６（Ｂ）に示すような再生時間Ｔ２に実施された「手を突き上げる」という動作を認識すると、再生時間Ｔ４に現れる小節の始めを挿入位置として特定する。さらに、再生時間Ｔ６に実施された「頭を揺らす」という動作を認識すると、再生時間Ｔ８に現れる小節の始めを挿入位置として特定する。 When the operation corresponding mode is set, the insertion position specifying means 475A specifies as the insertion position the start of the second bar of the music to appear after the point at which the motion whose content was recognized by the motion content recognition means 474B was performed. In other words, it specifies the insertion position so that the insertion sound is output within the output processing time limit measured from the point at which the motion was performed. For example, while the same music as in FIG. 5A is being reproduced, as shown in FIG. 6A, when the insertion position specifying means 475A recognizes the motion of “pushing up the hand” performed at reproduction time T2, as shown in FIG. 6B, it specifies the bar start at reproduction time T4 as the insertion position. Likewise, when it recognizes the motion of “shaking the head” performed at reproduction time T6, it specifies the bar start at reproduction time T8 as the insertion position.

なお、ここでは、例えば音声が発せられた時点から２つ目に現れる小節の始めを挿入位置として特定する構成について例示するが、これに限らず音声が発せられた時点から１つ目や３つ目の小節の始め、あるいは所定の小節の終わり、さらには所定の小節の始めから終わりの間を挿入位置として特定する構成としてもよい。 Although the configuration illustrated here specifies as the insertion position the start of the second bar to appear after the voice is uttered, the invention is not limited to this. The insertion position may instead be the start of the first or third bar after the voice is uttered, the end of a predetermined bar, or a point between the start and the end of a predetermined bar.
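The rule of choosing the second bar start after the trigger can be sketched with a binary search over the bar start times. In the Python sketch below, the function name `insertion_position`, the `nth` parameter, and the numeric times standing in for T1 to T9 are assumptions made for illustration; the patent gives no concrete time values.

```python
from bisect import bisect_right


def insertion_position(bar_starts, event_time, nth=2):
    """Return the nth bar start strictly after event_time.

    bar_starts must be sorted in ascending order. Returns None when the
    music has fewer than nth bar starts left after the event.
    """
    i = bisect_right(bar_starts, event_time) + (nth - 1)
    return bar_starts[i] if i < len(bar_starts) else None


# Hypothetical times in seconds: T1=0, T3=2, T4=4, T5=6, T7=8, T8=10, T9=12.
bars = [0, 2, 4, 6, 8, 10, 12]
after_t2 = insertion_position(bars, event_time=1)  # event at T2=1 -> T4
after_t6 = insertion_position(bars, event_time=7)  # event at T6=7 -> T8
```

Setting `nth=1` or `nth=3` covers the first-bar and third-bar variants mentioned in the text; the variants that insert at the end of a bar or partway through a bar would need a different rule.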

挿入音再生処理手段４７５Ｂは、音声や動作に対応する挿入音を楽曲に合わせて再生する処理をする。具体的には、挿入音再生処理手段４７５Ｂは、音声対応モードに設定されている場合、メモリ４３０から音声リズムパターン情報および音声言語情報を取得する。さらに、音声リズムパターン情報の音声リズムパターンに略一致するリズムパターンに関するリズムパターン情報６１２Ａと、音声言語情報の音声言語に略一致する言語に関する対象言語情報６１２Ｂと、が組み込まれた挿入音個別データ６１０を挿入音リストデータ６００から検索する。そして、挿入音再生処理手段４７５Ｂは、この検索した挿入音個別データ６１０に組み込まれた挿入音データ６１１を取得する。さらに、挿入音再生処理手段４７５Ｂは、この挿入音データ６１１のＢＰＭに関する情報の設定値を楽曲ＢＰＭ情報の楽曲ＢＰＭに対応する値に設定する。そして、挿入音再生処理手段４７５Ｂは、挿入位置特定手段４７５Ａで特定された挿入位置において挿入音が出力されるように、この取得した挿入音データ６１１を音声信号として発音手段４５０へ出力する。 The insertion sound reproduction processing means 475B performs processing for reproducing the insertion sound corresponding to the voice or the operation in accordance with the music. Specifically, the insertion sound reproduction processing unit 475B acquires the sound rhythm pattern information and the sound language information from the memory 430 when the sound corresponding mode is set. Furthermore, inserted sound individual data 610 in which rhythm pattern information 612A related to a rhythm pattern substantially matching the sound rhythm pattern of the sound rhythm pattern information and target language information 612B related to a language substantially matching the sound language of the sound language information are incorporated. Is searched from the insertion sound list data 600. Then, the insertion sound reproduction processing means 475B acquires the insertion sound data 611 incorporated in the searched insertion sound individual data 610. Further, the insertion sound reproduction processing means 475B sets the setting value of the information related to the BPM of the insertion sound data 611 to a value corresponding to the music BPM of the music BPM information. Then, the insertion sound reproduction processing means 475B outputs the acquired insertion sound data 611 as a sound signal to the sound generation means 450 so that an insertion sound is output at the insertion position specified by the insertion position specification means 475A.

例えば、挿入音再生処理手段４７５Ｂは、図５（Ｂ）に示すような「イェイ」の音声に対応する音声リズムパターン情報および音声言語情報に基づいて、ドラムを奏でる音である「ジャカジャン」の挿入音を出力させるための挿入音データ６１１を取得する。そして、挿入音再生処理手段４７５Ｂは、この挿入音データ６１１のＢＰＭに関する情報を楽曲ＢＰＭに対応させて設定して、図５（Ｃ）に示すように、「イェイ」の音声に対応する挿入位置、すなわち再生時間Ｔ４に「ジャカジャン」の挿入音の再生が開始され、再生時間Ｔ５に終了されるように出力する。また、挿入音再生処理手段４７５Ｂは、「イェイェイ」の音声に対応する挿入音データ６１１として、ドラムを奏でる音である「ジャカジャカジャン」の挿入音を出力させるための挿入音データ６１１を取得する。そして、「イェイェイ」の音声に対応して、再生時間Ｔ８に「ジャカジャカジャン」の挿入音の再生が開始され、再生時間Ｔ９に終了されるように挿入音データ６１１を出力する。さらに、挿入音再生処理手段４７５Ｂは、ここでは図示しないが「ヒュウ」の音声に対応する挿入音データ６１１として、和太鼓を奏でる音である「ドンドン」の挿入音を出力させるための挿入音データ６１１を取得して出力する。 For example, based on the voice rhythm pattern information and the speech language information corresponding to the “Yay” voice shown in FIG. 5B, the insertion sound reproduction processing means 475B acquires insertion sound data 611 for outputting the insertion sound “Jakajan”, which is a drum sound. The insertion sound reproduction processing means 475B sets the BPM information of this insertion sound data 611 to correspond to the music BPM and, as shown in FIG. 5C, outputs the data so that reproduction of the “Jakajan” insertion sound starts at the insertion position corresponding to the “Yay” voice, that is, at reproduction time T4, and ends at reproduction time T5. Similarly, as the insertion sound data 611 corresponding to the “Yay-yay” voice, it acquires insertion sound data 611 for outputting the insertion sound “Jakajakajan”, which is a drum sound, and outputs this data so that reproduction of the “Jakajakajan” insertion sound starts at reproduction time T8 and ends at reproduction time T9. Further, although not illustrated here, as the insertion sound data 611 corresponding to a “Hyu” voice, the insertion sound reproduction processing means 475B acquires and outputs insertion sound data 611 for outputting the insertion sound “Don Don”, which is the sound of a Japanese drum.

また、挿入音再生処理手段４７５Ｂは、動作対応モードに設定されている場合、メモリ４３０から動作リズムパターン情報および動作内容情報を取得する。さらに、動作リズムパターン情報の動作リズムパターンに略一致するリズムパターンに関するリズムパターン情報６１２Ａと、動作内容情報の動作内容に略一致する動きに関する対象動作情報６１２Ｃと、が組み込まれた挿入音個別データ６１０を挿入音リストデータ６００から検索する。そして、挿入音再生処理手段４７５Ｂは、この検索した挿入音個別データ６１０に組み込まれた挿入音データ６１１を、挿入位置特定手段４７５Ａで特定された挿入位置において挿入音が出力されるように、音声信号として発音手段４５０へ出力する。 When the operation corresponding mode is set, the insertion sound reproduction processing means 475B acquires the motion rhythm pattern information and the motion content information from the memory 430. It then searches the insertion sound list data 600 for insertion sound individual data 610 that incorporates both rhythm pattern information 612A on a rhythm pattern substantially matching the motion rhythm pattern in the motion rhythm pattern information and target motion information 612C on a motion substantially matching the motion content in the motion content information. The insertion sound reproduction processing means 475B then outputs the insertion sound data 611 incorporated in the retrieved insertion sound individual data 610 to the sound generation means 450 as an audio signal, so that the insertion sound is output at the insertion position specified by the insertion position specifying means 475A.

例えば、挿入音再生処理手段４７５Ｂは、図６（Ｂ）に示すような「手を突き上げる」という動作に対応する動作リズムパターン情報および動作内容情報に基づいて、シンバルを奏でる音である「ジャーン」の挿入音を出力させるための挿入音データ６１１を取得する。そして、挿入音再生処理手段４７５Ｂは、図６（Ｃ）に示すように、「手を突き上げる」という動作に対応して、再生時間Ｔ４に「ジャーン」の挿入音の再生が開始され、再生時間Ｔ５に終了されるように挿入音データ６１１を出力する。また、挿入音再生処理手段４７５Ｂは、「頭を揺らす」という動作に対応する挿入音データ６１１として、ピアノを奏でる音である「ポロローン」の挿入音を出力させるための挿入音データ６１１を取得する。そして、「頭を揺らす」という動作に対応して、再生時間Ｔ８に「ポロローン」の挿入音の再生が開始され、再生時間Ｔ９に終了されるように挿入音データ６１１を出力する。 For example, the inserted sound reproduction processing means 475B is a sound that plays a cymbal based on motion rhythm pattern information and motion content information corresponding to the motion of “pushing up a hand” as shown in FIG. The insertion sound data 611 for outputting the insertion sound is acquired. Then, as shown in FIG. 6C, the insertion sound reproduction processing means 475B starts reproduction of the insertion sound of “Jarn” at the reproduction time T4 in response to the operation of “pushing up the hand”. The insertion sound data 611 is output so as to end at T5. Also, the insertion sound reproduction processing means 475B acquires the insertion sound data 611 for outputting the insertion sound of “Pololone” that is a sound of playing the piano, as the insertion sound data 611 corresponding to the action of “shaking the head”. . Then, in response to the operation of “shaking the head”, the reproduction of the insertion sound of “Pololone” is started at the reproduction time T8, and the insertion sound data 611 is output so as to end at the reproduction time T9.
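The search of the insertion sound list data 600 in both modes can be pictured as a keyed lookup over records. The Python sketch below is only an illustration under several assumptions: the record layout `InsertionSound`, the pattern labels "A" and "B", and exact matching in place of the "substantially matching" comparison described in the text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InsertionSound:
    sound: str                      # stands in for insertion sound data 611
    rhythm_pattern: str             # stands in for rhythm pattern information 612A
    language: Optional[str] = None  # stands in for target language information 612B
    motion: Optional[str] = None    # stands in for target motion information 612C


# Hypothetical list contents based on the examples in the text.
SOUND_LIST = [
    InsertionSound("drum: Jakajan", "A", language="Yay"),
    InsertionSound("drum: Jakajakajan", "B", language="Yay-yay"),
    InsertionSound("cymbal: Jarn", "A", motion="pushing up the hand"),
    InsertionSound("piano: Pololone", "B", motion="shaking the head"),
]


def find_sound(sound_list, rhythm_pattern, language=None, motion=None):
    """Return the first entry matching the pattern and the mode-specific key."""
    for entry in sound_list:
        if (entry.rhythm_pattern == rhythm_pattern
                and entry.language == language
                and entry.motion == motion):
            return entry
    return None


voice_hit = find_sound(SOUND_LIST, "A", language="Yay")
motion_hit = find_sound(SOUND_LIST, "B", motion="shaking the head")
```

In the voice corresponding mode only `language` is supplied, and in the operation corresponding mode only `motion` is supplied, so one list can serve both modes. A real implementation would replace the equality tests with whatever similarity measure realizes "substantially matching".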

〔楽曲再生システムの動作〕 [Operation of music playback system]
次に、楽曲再生システム１００の動作を図面を参照して説明する。 Next, the operation of the music playback system 100 will be described with reference to the drawings.

（音声対応モードにおける挿入音出力処理） (Insertion sound output processing in the voice corresponding mode)
まず、楽曲再生システム１００の動作として、音声対応モードにおける挿入音出力処理について図７に基づいて説明する。ここでは、図５（Ａ）に示すような楽曲に合わせて、利用者が図５（Ｂ）に示すような音声を発音した状態を例示して説明する。図７は、音声対応モードにおける挿入音出力処理を示すフローチャートである。 First, as an operation of the music playback system 100, the insertion sound output processing in the voice corresponding mode will be described with reference to FIG. 7. Here, the description takes as an example a state in which the user utters the voices shown in FIG. 5B in time with the music shown in FIG. 5A. FIG. 7 is a flowchart showing the insertion sound output processing in the voice corresponding mode.

まず、利用者は、入力手段４４０の入力操作により、音声対応モードで所定の楽曲の再生を要求する旨を設定入力する。そして、楽曲再生システム１００の楽曲再生装置４００は、図７に示すように、演算手段４７０の楽曲再生処理手段４７１にて、音声対応モードで所定の楽曲の再生を要求する旨の要求を認識すると、この所定の楽曲に対応する楽曲データ５１１に基づいて楽曲を再生する（ステップＳ１０１）。この後、演算手段４７０は、楽曲リズム認識手段４７２にて、再生中の楽曲の楽曲ＢＰＭを認識し（ステップＳ１０２）、この楽曲ＢＰＭに関する楽曲ＢＰＭ情報をメモリ４３０に記憶させる。ここでは、演算手段４７０は、図５（Ａ）に示すような楽曲を再生して、この楽曲に対応する楽曲ＢＰＭ情報を記憶させる。さらに、演算手段４７０は、音声状態認識手段４７３の音声リズム認識手段４７３Ａにて、マイク２００から出力される車内音情報の車内音を所定時間だけ取得して（ステップＳ１０３）、この車内音から基準音量以下の音を削除する（ステップＳ１０４）。そして、音声リズム認識手段４７３Ａは、車内音から削除した音の時間長が音声認識基準時間以下か否かを判断する（ステップＳ１０５）。 First, the user inputs a setting for requesting reproduction of a predetermined music piece in the voice corresponding mode by an input operation of the input unit 440. Then, as shown in FIG. 7, the music playback device 400 of the music playback system 100 recognizes a request for requesting playback of a predetermined music in the audio corresponding mode in the music playback processing unit 471 of the calculation unit 470. The music is reproduced based on the music data 511 corresponding to the predetermined music (step S101). Thereafter, the calculation means 470 recognizes the music BPM of the music being reproduced by the music rhythm recognition means 472 (step S102), and stores the music BPM information related to the music BPM in the memory 430. Here, the calculation means 470 reproduces a song as shown in FIG. 5A and stores the song BPM information corresponding to this song. Further, the calculation means 470 acquires the in-vehicle sound of the in-vehicle sound information output from the microphone 200 for a predetermined time by the audio rhythm recognition means 473A of the audio state recognition means 473 (step S103), and uses the in-vehicle sound as a reference. The sound below the volume is deleted (step S104). Then, the voice rhythm recognition means 473A determines whether or not the time length of the sound deleted from the vehicle interior sound is equal to or shorter than the voice recognition reference time (step S105).

このステップＳ１０５において、音声認識基準時間以下ではない、すなわち音声認識基準時間よりも長いと判断した場合、車内音に利用者が楽曲に合わせて意識的に発音した音声が含まれていないと認識する。そして、音声リズム認識手段４７３Ａは、ステップＳ１０３に戻り車内音を取得する処理をする。一方、ステップＳ１０５において、音声リズム認識手段４７３Ａは、音声認識基準時間以下であると判断した場合、車内音に利用者が意識的に発音した音声が含まれていると認識する。ここでは、音声リズム認識手段４７３Ａは、再生時間Ｔ２や再生時間Ｔ６を含む時間の車内音を取得した際に、車内音に利用者が意識的に発音した「イェイ」や「イェイェイ」という音声が含まれていると認識する。そして、メモリ４３０に記憶された楽曲ＢＰＭ情報の楽曲ＢＰＭ、および、音声の出力波形のピーク間隔に基づいて、楽曲ＢＰＭを基準にした音声ＢＰＭを認識する（ステップＳ１０６）。さらに、音声リズム認識手段４７３Ａは、音声ＢＰＭに基づいて、音声に対応する音声リズムパターンを認識し（ステップＳ１０７）、この音声リズムパターンに関する音声リズムパターン情報をメモリ４３０に記憶させる。また、音声状態認識手段４７３は、言語認識手段４７３Ｂにて、音声リズム認識手段４７３Ａで外部リズムパターンが認識された音声の音声言語を認識し（ステップＳ１０８）、この音声言語に関する音声言語情報をメモリ４３０に記憶させる。ここでは、言語認識手段４７３Ｂは、音声リズム認識手段４７３Ａで再生時間Ｔ２や再生時間Ｔ６を含む時間の車内音が取得された際に、「イェイ」や「イェイェイ」という言語を音声言語として認識して、この音声言語に関する音声言語情報を記憶させる。 If it is determined in step S105 that the time length is not equal to or less than the voice recognition reference time, that is, that it is longer than the voice recognition reference time, the voice rhythm recognition means 473A recognizes that the in-vehicle sound does not contain voice consciously uttered by the user in time with the music, and returns to step S103 to acquire in-vehicle sound again. If, on the other hand, the voice rhythm recognition means 473A determines in step S105 that the time length is equal to or less than the voice recognition reference time, it recognizes that the in-vehicle sound contains voice consciously uttered by the user. Here, when the voice rhythm recognition means 473A acquires in-vehicle sound for a period including reproduction time T2 or reproduction time T6, it recognizes that the in-vehicle sound contains the voice “Yay” or “Yay-yay” consciously uttered by the user. It then recognizes the voice BPM relative to the music BPM, based on the music BPM in the music BPM information stored in the memory 430 and on the peak intervals of the output waveform of the voice (step S106). Further, the voice rhythm recognition means 473A recognizes the voice rhythm pattern corresponding to the voice based on the voice BPM (step S107), and stores voice rhythm pattern information on this pattern in the memory 430. The voice state recognition means 473 also uses the language recognition means 473B to recognize the speech language of the voice whose external rhythm pattern was recognized by the voice rhythm recognition means 473A (step S108), and stores speech language information on this speech language in the memory 430. Here, when the voice rhythm recognition means 473A has acquired in-vehicle sound for a period including reproduction time T2 or reproduction time T6, the language recognition means 473B recognizes the word “Yay” or “Yay-yay” as the speech language and stores the corresponding speech language information.

この後、演算手段４７０は、挿入音再生制御手段４７５の挿入音再生処理手段４７５Ｂにて、メモリ４３０に記憶された音声リズムパターン情報および音声言語情報に基づいて、音声リズムパターンおよび音声言語に対応する挿入音個別データ６１０を検索する。そして、挿入音再生処理手段４７５Ｂは、この検索した挿入音個別データ６１０の挿入音データ６１１を取得する（ステップＳ１０９）。さらに、挿入音再生制御手段４７５は、挿入位置特定手段４７５Ａにて、再生中の楽曲の小節の区切りを認識し、音声言語が認識された音声が発せられた時点から、２つ目に現れる楽曲の小節の始めを挿入位置として特定する（ステップＳ１１０）。そして、挿入音再生処理手段４７５Ｂは、挿入位置特定手段４７５Ａで特定された挿入位置に対応する時間に挿入音データ６１１を再生して（ステップＳ１１１）、楽曲に合わせた挿入音を出力する。ここでは、挿入音再生制御手段４７５は、図５（Ｃ）に示すように、「イェイ」の音声に対応する「ジャカジャン」の挿入音を再生時間Ｔ４から再生時間Ｔ５まで再生し、「イェイェイ」の音声に対応する「ジャカジャカジャン」の挿入音を再生時間Ｔ８から再生時間Ｔ９まで再生する。この後、音声リズム認識手段４７３Ａは、楽曲の再生が終了したか否かを判断する（ステップＳ１１２）。このステップＳ１１２において、楽曲の再生が終了したと判断した場合、音声対応モードにおける挿入音出力処理を終了する。一方、音声リズム認識手段４７３Ａは、楽曲の再生が終了していないと判断した場合、ステップＳ１０３に戻り音声を取得する処理をする。 Thereafter, the calculation means 470 uses the insertion sound reproduction processing means 475B of the insertion sound reproduction control means 475 to search for the insertion sound individual data 610 corresponding to the voice rhythm pattern and the speech language, based on the voice rhythm pattern information and the speech language information stored in the memory 430. The insertion sound reproduction processing means 475B then acquires the insertion sound data 611 of the retrieved insertion sound individual data 610 (step S109). Further, the insertion sound reproduction control means 475 uses the insertion position specifying means 475A to recognize the bar boundaries of the music being reproduced and to specify as the insertion position the start of the second bar to appear after the point at which the voice whose speech language was recognized was uttered (step S110). The insertion sound reproduction processing means 475B then reproduces the insertion sound data 611 at the time corresponding to the insertion position specified by the insertion position specifying means 475A (step S111), thereby outputting an insertion sound that matches the music. Here, as shown in FIG. 5C, the insertion sound reproduction control means 475 reproduces the “Jakajan” insertion sound corresponding to the “Yay” voice from reproduction time T4 to reproduction time T5, and the “Jakajakajan” insertion sound corresponding to the “Yay-yay” voice from reproduction time T8 to reproduction time T9. Thereafter, the voice rhythm recognition means 473A determines whether or not reproduction of the music has ended (step S112). If it determines in step S112 that reproduction of the music has ended, the insertion sound output processing in the voice corresponding mode ends. If, on the other hand, the voice rhythm recognition means 473A determines that reproduction of the music has not ended, the processing returns to step S103 to acquire sound again.

（動作対応モードにおける挿入音出力処理） (Insertion sound output processing in the operation corresponding mode)
次に、楽曲再生システム１００の動作として、動作対応モードにおける挿入音出力処理について図８に基づいて説明する。ここでは、図６（Ａ）に示すような楽曲に合わせて、利用者が図６（Ｂ）に示すような動作を実施した状態を例示して説明する。なお、音声対応モードにおける挿入音出力処理で実施される処理と同様の処理については、説明を簡略化する。図８は、動作対応モードにおける挿入音出力処理を示すフローチャートである。 Next, as an operation of the music playback system 100, the insertion sound output processing in the operation corresponding mode will be described with reference to FIG. 8. Here, the description takes as an example a state in which the user performs the motions shown in FIG. 6B in time with the music shown in FIG. 6A. Processing that is the same as in the insertion sound output processing in the voice corresponding mode is described only briefly. FIG. 8 is a flowchart showing the insertion sound output processing in the operation corresponding mode.

まず、利用者は、入力手段４４０の入力操作により、動作対応モードで所定の楽曲の再生を要求する旨を設定入力する。そして、楽曲再生システム１００の楽曲再生装置４００は、図８に示すように、演算手段４７０の楽曲再生処理手段４７１にて、動作対応モードで所定の楽曲の再生を要求する旨の要求を認識すると、この所定の楽曲を再生する（ステップＳ２０１）。この後、演算手段４７０は、楽曲リズム認識手段４７２にて、再生中の楽曲の楽曲ＢＰＭを認識して（ステップＳ２０２）、楽曲ＢＰＭ情報をメモリ４３０に記憶させる。さらに、演算手段４７０は、動作状態認識手段４７４の動作リズム認識手段４７４Ａにて、撮像手段３００から出力される撮像データを所定時間だけ取得して（ステップＳ２０３）、利用者の身体の所定部分が身体動作基準量以上動いたか否かを判断する（ステップＳ２０４）。 First, the user makes a setting input, by an input operation on the input means 440, requesting reproduction of a predetermined piece of music in the operation corresponding mode. As shown in FIG. 8, when the music playback processing means 471 of the calculation means 470 in the music playback device 400 of the music playback system 100 recognizes this request, it reproduces the predetermined music (step S201). Thereafter, the calculation means 470 uses the music rhythm recognition means 472 to recognize the music BPM of the music being reproduced (step S202) and stores the music BPM information in the memory 430. Further, the calculation means 470 uses the motion rhythm recognition means 474A of the motion state recognition means 474 to acquire the imaging data output from the imaging means 300 for a predetermined time (step S203) and determines whether or not a predetermined part of the user's body has moved by at least the body motion reference amount (step S204).

If it is determined in step S204 that the part has not moved by the body motion reference amount or more, the motion rhythm recognition means 474A recognizes that the user is not consciously moving the body in time with the music, and returns to step S203 to acquire imaging data. If, on the other hand, it determines in step S204 that the part has moved by the body motion reference amount or more, the motion rhythm recognition means 474A recognizes that the user has consciously moved the body in time with the music. Here, when imaging data for a period including reproduction time T2 or reproduction time T6 is acquired, the motion rhythm recognition means 474A recognizes that the user has consciously performed the motion of "thrusting a hand up" or the motion of "shaking the head". Then, based on the music BPM in the music BPM information stored in the memory 430 and on the length of time taken for the predetermined part of the body to move from a predetermined position to another position, it recognizes a motion BPM referenced to the music BPM (step S205).
Furthermore, the motion rhythm recognition means 474A recognizes the motion rhythm pattern corresponding to the motion based on the motion BPM (step S206), and stores motion rhythm pattern information regarding this motion rhythm pattern in the memory 430. The motion state recognition means 474 also recognizes, with the motion content recognition means 474B, the motion content of the motion whose motion rhythm pattern was recognized by the motion rhythm recognition means 474A (step S207), and stores motion content information regarding this motion content in the memory 430. Here, when the motion rhythm recognition means 474A has acquired imaging data for a period including reproduction time T2 or reproduction time T6, the motion content recognition means 474B recognizes the motion of "thrusting a hand up" or the motion of "shaking the head" as the motion content, and stores motion content information regarding this motion content.
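The description does not state how the motion BPM is derived from the movement time in step S205. The following Python sketch shows one possible reading, assuming that the tempo implied by a single movement is snapped to a simple multiple of the music BPM. The function name `motion_bpm` and the set of ratios are hypothetical and do not appear in the source.

```python
def motion_bpm(music_bpm: float, move_seconds: float,
               ratios=(0.25, 0.5, 1.0, 2.0, 4.0)) -> float:
    """Return a motion BPM referenced to the music BPM.

    Assumption (not from the source): one movement from a start position
    to an end position is treated as one beat, and the resulting tempo is
    snapped to the nearest simple multiple of the music BPM.
    """
    if music_bpm <= 0 or move_seconds <= 0:
        raise ValueError("music_bpm and move_seconds must be positive")
    raw_bpm = 60.0 / move_seconds  # tempo if one movement were one beat
    # Pick the ratio whose tempo is closest to the raw movement tempo.
    best_ratio = min(ratios, key=lambda r: abs(raw_bpm - r * music_bpm))
    return best_ratio * music_bpm
```

Under this assumption, a movement lasting about half a second against music at 120 BPM is treated as moving at the music BPM itself, while a movement lasting about one second is treated as moving at half that tempo.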

Thereafter, the calculation means 470 acquires, with the insertion sound reproduction processing means 475B of the insertion sound reproduction control means 475, the insertion sound data 611 corresponding to the motion rhythm pattern and the motion content, based on the motion rhythm pattern information and the motion content information stored in the memory 430 (step S208). Further, the insertion sound reproduction control means 475 specifies, with the insertion position specifying means 475A, the beginning of the second measure of the music that appears after the time at which the motion whose motion content was recognized was performed, as the insertion position (step S209). Then, the insertion sound reproduction processing means 475B reproduces the insertion sound data 611 at the time corresponding to the insertion position specified by the insertion position specifying means 475A (step S210), and outputs an insertion sound matched to the music. Here, as shown in FIG. 6C, the insertion sound reproduction control means 475 reproduces the insertion sound "Jaan" corresponding to the motion of "thrusting a hand up" from reproduction time T4 to reproduction time T5, and reproduces the insertion sound "Pororoon" corresponding to the motion of "shaking the head" from reproduction time T8 to reproduction time T9.
Thereafter, the motion rhythm recognition means 474A determines whether reproduction of the music has ended (step S211). If it is determined in step S211 that reproduction of the music has ended, the insertion sound output process in the motion-corresponding mode is terminated. If, on the other hand, the motion rhythm recognition means 474A determines that reproduction of the music has not ended, it returns to step S203 and acquires imaging data.
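The insertion position rule of step S209 (the beginning of the second measure that appears after the motion) can be illustrated with a short Python sketch. It assumes a constant music BPM, a fixed number of beats per measure, and a known time of the first downbeat; the function name `insertion_time` and its parameters are hypothetical.

```python
import math


def insertion_time(event_time: float, music_bpm: float,
                   beats_per_measure: int = 4,
                   first_downbeat: float = 0.0,
                   measures_ahead: int = 2) -> float:
    """Return the start time (seconds) of the second measure after an event.

    Assumptions (not from the source): constant tempo, constant meter,
    and a known first downbeat time.
    """
    if music_bpm <= 0 or beats_per_measure <= 0:
        raise ValueError("music_bpm and beats_per_measure must be positive")
    measure_seconds = beats_per_measure * 60.0 / music_bpm
    # Index of the measure in which the event occurred.
    current = math.floor((event_time - first_downbeat) / measure_seconds)
    # The next measure start is current + 1; the second one is current + 2.
    return first_downbeat + (current + measures_ahead) * measure_seconds
```

For music at 120 BPM in four-beat time, each measure lasts 2 seconds, so a motion at 3.1 seconds falls in the second measure and the insertion sound would start at 6.0 seconds under these assumptions.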

[Effects of the music reproduction system]
As described above, in the above embodiment, when the music reproduction processing means 471 of the calculation means 470 in the music reproduction device 400 recognizes a request to reproduce a predetermined piece of music in, for example, the voice-corresponding mode, it starts reproducing the predetermined music. The calculation means 470 also acquires, with the voice rhythm recognition means 473A of the voice state recognition means 473, the in-vehicle sound collected by the microphone 200, which includes, for example, the voice "Yay". The voice rhythm recognition means 473A then recognizes the voice BPM of the voice "Yay" referenced to the music BPM of the music being reproduced, and recognizes the voice rhythm pattern of the voice corresponding to this voice BPM. Thereafter, the calculation means 470 acquires, with the insertion sound reproduction processing means 475B of the insertion sound reproduction control means 475, the insertion sound data 611 associated with the rhythm pattern information 612A corresponding to the voice rhythm pattern. Based on this insertion sound data 611, the insertion sound reproduction processing means 475B causes the sound generation means 450 to output an insertion sound corresponding to the voice rhythm pattern, that is, an insertion sound matched to the music BPM, which is a characteristic of the music. Because the music reproduction device 400 thus inserts an insertion sound that is in tune with the music, it can reduce the sense of incongruity the user feels when listening to the music. In addition, because an insertion sound such as "Jakajan" that is in tune with the characteristics of the music is inserted into the music in response to, for example, the voice "Yay", the user can feel as if the music is reacting to the voice he or she utters, and can enjoy listening to the music more. The music reproduction device 400 can therefore output the insertion sound satisfactorily.

Furthermore, the insertion sound reproduction processing means 475B outputs an insertion sound corresponding to the voice rhythm pattern of the voice uttered by the user. Specifically, the insertion sound reproduction processing means 475B outputs, for example, the insertion sound "Jakajan" in response to "Yay" and the insertion sound "Jakajakajan" in response to "Yay-yay". With an insertion sound whose rhythm pattern corresponds to the voice the user utters, the user can feel that the music is reacting even more closely. The music reproduction device 400 can therefore output the insertion sound more satisfactorily.

The insertion sound reproduction processing means 475B also outputs an insertion sound corresponding to the spoken language of the voice or to the content of the motion. Specifically, the insertion sound reproduction processing means 475B outputs the drum sound "Jakajan" in response to the voice "Yay", the Japanese drum (wadaiko) sound "Dondon" in response to the voice "Hyu", the cymbal sound "Jaan" in response to the motion of "thrusting a hand up", and the piano sound "Pororoon" in response to the motion of "shaking the head". Because different insertion sounds are output according to the content of the voice or motion, the user can feel that the music is reacting still more closely. The music reproduction device 400 can therefore output the insertion sound still more satisfactorily.

In addition, the insertion sound reproduction processing means 475B outputs an insertion sound matched to the music BPM as an insertion sound that is in tune with the characteristics of the music. With an insertion sound matched to the music BPM, the music reproduction device 400 can further reduce the sense of incongruity the user feels when listening to the music. The music reproduction device 400 can therefore output the insertion sound more satisfactorily.

Furthermore, the insertion position specifying means 475A recognizes, as the insertion position at which the insertion sound is inserted, the beginning of the second measure of the music that appears after, for example, the time at which the voice whose spoken language was recognized by the language recognition means 473B was uttered. With an insertion sound inserted at a timing aligned with the measures, the music reproduction device 400 can still further reduce the sense of incongruity the user feels when listening to the music. The music reproduction device 400 can therefore output the insertion sound still more satisfactorily.

In addition, the insertion position specifying means 475A specifies the insertion position so that the insertion sound is output within the output processing time limit measured from, for example, the time at which the voice was uttered. Because the insertion sound is output within the output processing time limit after the user utters the voice, the user can feel that the music is reacting still more closely. The music reproduction device 400 can therefore output the insertion sound still more satisfactorily.

Furthermore, the insertion sound reproduction processing means 475B outputs the insertion sound based on the insertion sound data 611 stored in the insertion sound data storage means 420. The insertion sound reproduction processing means 475B can thus output the insertion sound by the simple method of merely acquiring and outputting the insertion sound data 611. The processing load of the insertion sound reproduction processing means 475B can therefore be reduced.

In addition, MIDI format data is used as the insertion sound data 611. The insertion sound reproduction processing means 475B sets the BPM-related information of the insertion sound data 611 to match the music BPM and reproduces the data as the insertion sound. The data amount of the insertion sound data 611 can thus be kept to a minimum. Moreover, the insertion sound reproduction processing means 475B can change the BPM of the insertion sound merely by changing the setting value of the BPM-related information in the insertion sound data 611, so a single piece of insertion sound data 611 can be output as insertion sounds matched to a plurality of BPMs. The capacity of the insertion sound data storage means 420 can therefore be reduced, and the cost of the music reproduction device 400 can be lowered.
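The description does not say where the BPM-related information sits inside the insertion sound data 611. As an illustration only, the Python sketch below assumes the data is a Standard MIDI File, in which tempo is stored in a Set Tempo meta event (bytes FF 51 03 followed by a 24-bit count of microseconds per quarter note). The function name `tempo_meta_event` is hypothetical.

```python
def tempo_meta_event(bpm: float) -> bytes:
    """Build a Standard MIDI File Set Tempo meta event for the given BPM.

    Assumption (not from the source): the insertion sound data is a
    Standard MIDI File and its tempo is changed by rewriting this event.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    us_per_quarter = round(60_000_000 / bpm)  # microseconds per quarter note
    if not 0 < us_per_quarter <= 0xFFFFFF:
        raise ValueError("tempo does not fit in 24 bits")
    # FF 51 03 = meta event, type Set Tempo, 3 data bytes (big-endian).
    return b"\xff\x51\x03" + us_per_quarter.to_bytes(3, "big")
```

Because only these few bytes change, the same note data can be replayed at any music BPM, which matches the storage saving described above.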

Furthermore, the insertion sound data storage means 420 is provided, which stores the insertion sound data 611 in association with the target language information 612B and the target motion information 612C. The insertion sound reproduction processing means 475B searches the insertion sound data storage means 420 for, for example, target language information 612B regarding a language that substantially matches the spoken language, and acquires the insertion sound data 611 associated with the retrieved target language information 612B. The insertion sound reproduction processing means 475B can thus acquire the insertion sound data 611 merely by searching for the target language information 612B corresponding to the spoken language. The insertion sound reproduction processing means 475B can therefore acquire the insertion sound data 611 more easily. In addition, the insertion sound reproduction processing means 475B can acquire the insertion sound data 611 more quickly than in a configuration in which the insertion sound data storage means 420 is provided in a server device connected via a wireless or wired network.
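A lookup of insertion sound data by a "substantially matching" spoken word could be sketched in Python as follows. The table contents, the file names, and the use of the standard library's `difflib.get_close_matches` for approximate matching are all assumptions made for illustration; the source does not specify a matching method.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical table: target language information -> insertion sound data.
INSERTION_SOUNDS = {
    "yay": "drum_jakajan.mid",
    "hyu": "wadaiko_dondon.mid",
}


def find_insertion_sound(spoken: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the insertion sound whose key roughly matches the spoken word.

    Returns None when no stored key is similar enough.
    """
    matches = get_close_matches(spoken.lower(), list(INSERTION_SOUNDS),
                                n=1, cutoff=cutoff)
    return INSERTION_SOUNDS[matches[0]] if matches else None
```

With this sketch, a slightly different recognition result such as "yayy" still resolves to the drum sound, while an unrelated word yields no insertion sound.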

The insertion sound reproduction processing means 475B also outputs, in time with the music, an insertion sound corresponding to the voice collected by the microphone 200. The user can thus have the insertion sound output by the simple method of merely uttering a predetermined voice. Moreover, because the user only needs to utter a voice, the insertion sound can be output without, for example, interrupting driving. The user can therefore easily enjoy listening to the music.

In addition, when the voice rhythm recognition means 473A recognizes that the in-vehicle sound contains a voice at or above the reference volume for at least the voice recognition reference time, the insertion sound reproduction processing means 475B outputs an insertion sound corresponding to this voice. The music reproduction device 400 therefore does not output an insertion sound in response to sounds such as vehicle running noise or ordinary conversation, and can output an insertion sound only in response to a voice the user utters in time with the music. The music reproduction device 400 can therefore output the insertion sound still more satisfactorily.
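The gating condition described above (a voice at or above a reference volume lasting at least a reference time) can be sketched in Python over a sequence of per-frame volume levels. The frame-based representation and the names `contains_intentional_voice`, `ref_level`, and `ref_seconds` are assumptions for illustration.

```python
from typing import Sequence


def contains_intentional_voice(levels: Sequence[float], frame_seconds: float,
                               ref_level: float, ref_seconds: float) -> bool:
    """Return True if the level stays at or above ref_level for at least
    ref_seconds of consecutive frames.

    Assumption (not from the source): the collected sound has already been
    reduced to one volume level per fixed-length frame.
    """
    run = 0  # number of consecutive frames at or above the reference level
    for level in levels:
        run = run + 1 if level >= ref_level else 0
        if run * frame_seconds >= ref_seconds:
            return True
    return False
```

A short spike above the reference level, such as a door closing, resets the run before the reference time is reached and so does not trigger an insertion sound in this sketch.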

The insertion sound reproduction processing means 475B also outputs, in time with the music, an insertion sound corresponding to the user's motion captured in the imaging data from the imaging means 300. The user can thus have the insertion sound output by the simple method of merely performing a predetermined motion. The user can therefore enjoy listening to the music more easily.

Furthermore, when the motion rhythm recognition means 474A recognizes from the imaging data that a predetermined part of the body has moved in a predetermined direction by the body motion reference amount or more, the insertion sound reproduction processing means 475B outputs an insertion sound corresponding to the motion of this part of the body. The music reproduction device 400 therefore does not output an insertion sound in response to, for example, driving motions, and can output an insertion sound only in response to a motion the user performs in time with the music. The music reproduction device 400 can therefore output the insertion sound still more satisfactorily.

The sound output control device of the present invention is applied to the music reproduction device 400, which performs music reproduction processing. An insertion sound corresponding to a voice can therefore be output, and the convenience of the music reproduction device 400 can be improved.

[Modifications of the embodiment]
The present invention is not limited to the embodiment described above, and also includes the modifications described below within a range in which the object of the present invention can be achieved.

That is, the insertion sound reproduction processing means 475B may be configured to output an insertion sound that is in tune with at least one of the pitch, loudness, and timbre of the music. For example, the volume of the insertion sound may be set according to the volume of the music, or the timbre of the insertion sound may be set according to the timbre of the music. With such a configuration, the music reproduction device 400 no longer outputs, for example, an insertion sound whose volume is markedly higher than that of the music or whose timbre differs markedly from that of the music, and can reduce the sense of incongruity the user feels when listening to the music.

Alternatively, for example, the music rhythm recognition means 472 may recognize the chord progression of the music being reproduced and predict how the chords will progress, and the insertion sound reproduction processing means 475B may output an insertion sound corresponding to the predicted chord. With such a configuration, the user can hear an insertion sound that harmonizes with the music and can enjoy listening to the music more.

The insertion sound may also be made to correspond only to the spoken language of the voice or the content of the motion, without corresponding to the voice rhythm pattern or the motion rhythm pattern. With such a configuration, the rhythm recognition means 473A and 474A need not be provided, and the configuration of the calculation means 470 can be simplified. The cost of the music reproduction device 400 can therefore be reduced.

Conversely, the insertion sound may be made to correspond only to the voice rhythm pattern or the motion rhythm pattern, without corresponding to the spoken language of the voice or the content of the motion. With such a configuration, the language recognition means 473B and the motion content recognition means 474B need not be provided, the configuration of the calculation means 470 can be simplified, and the cost of the music reproduction device 400 can be reduced.

Furthermore, the insertion sound need not correspond to a voice or motion at all; the insertion sound reproduction processing means 475B may, for example, select and output an insertion sound matched to the music BPM. With such a configuration, the state recognition means 473 and 474 need not be provided, the configuration of the calculation means 470 can be further simplified, and the cost of the music reproduction device 400 can be further reduced.

In addition, the insertion sound reproduction processing means 475B may be configured to output an insertion sound matched to a rhythm of the music such as its meter or beat, rather than to the music BPM. For example, when the insertion sound "Jakajan" is output twice for music in four-beat time as shown in FIG. 9A, each "Jakajan" may be output over two beats as shown in FIG. 9B or over one beat as shown in FIG. 9C. Likewise, when the insertion sound "Jakajakajan" is output twice, each "Jakajakajan" may be output over three beats as shown in FIG. 9D or over one beat as shown in FIG. 9E. Even with such a configuration, the music reproduction device 400 inserts an insertion sound that is in tune with the music, and can therefore reduce the sense of incongruity the user feels when listening to the music.

The insertion sound reproduction processing means 475B may also be configured to output insertion sounds that are in tune with the beginning of the music, the end of the music, and the climactic part of the music, the so-called chorus (sabi). For example, the insertion sound reproduction processing means 475B may output a relaxed, subdued insertion sound for the beginning of the music, a showy insertion sound such as the drum sound "Jakajakajakajakajakajakajan" for the chorus, and a drum sound suggestive of an ending, such as "Dekedekejakajakajakajakajaan", for the end. With such a configuration, the user can hear insertion sounds that are in tune with the mood of the music and can enjoy listening to the music still more.

Furthermore, the insertion sound reproduction processing means 475B may be configured to output the insertion sound not at a timing aligned with the measures of the music but at a timing aligned with a rhythm such as the meter, the beat, or the BPM of the music. Even with such a configuration, with an insertion sound inserted at a timing aligned with the rhythm, the music reproduction device 400 can still further reduce the sense of incongruity the user feels when listening to the music, and can output the insertion sound still more satisfactorily.

In addition, although a configuration in which the insertion position specifying means 475A specifies the insertion position in alignment with a measure has been described as an example, the configuration is not limited to this. The insertion position may instead be specified as the time at which an output processing start time, for example 3 seconds, has elapsed from the time at which the voice was uttered or the motion was performed. With such a configuration, the insertion position specifying means 475A can easily specify the insertion position without recognizing the measures of the music. The processing load of the insertion position specifying means 475A can therefore be reduced, and its configuration can be simplified.

Although a configuration in which the insertion position specifying means 475A specifies the insertion position so that the insertion sound is output within the output processing time limit from, for example, the time at which the voice was uttered has been described as an example, the configuration is not limited to this and may, for example, be as follows. That is, the insertion position specifying means 475A may specify the insertion position as appropriate so that the insertion sound is output after the output processing time limit has elapsed. Even with such a configuration, the music reproduction device 400 inserts an insertion sound that is in tune with the music, and can therefore reduce the sense of incongruity the user feels when listening to the music.

Furthermore, although a configuration in which the insertion sound reproduction processing means 475B outputs the insertion sound based on the insertion sound data 611 acquired from the insertion sound data storage means 420 has been described as an example, the configuration is not limited to this, and the insertion sound reproduction processing means 475B may generate insertion sound data and output the insertion sound. With such a configuration, the insertion sound data storage means 420 need not be provided in the music reproduction device 400, and the weight of the music reproduction device 400 can be reduced. The cost of the music reproduction device 400 can also be reduced.

Alternatively, the insertion sound data storage means 420 may be provided in a server device connected to the music reproduction device 400 via a wireless or wired network. With such a configuration, the weight of the music reproduction device 400 can be reduced. In addition, the insertion sound data 611 can be shared with other music reproduction devices 400 connected to the server device.

Although MIDI format data is used as the insertion sound data 611, data in other formats such as the WAVE format or the MPEG format may be used. With such a configuration, the music reproduction device 400 can output, as insertion sounds where appropriate, sounds that cannot be reproduced in the MIDI format, such as natural sounds or human voices. The music reproduction device 400 can therefore let the user enjoy listening to the music still more.

Furthermore, although a configuration in which the voice rhythm recognition means 473A has a function of determining whether the in-vehicle sound contains a voice at or above the reference volume for at least the voice recognition reference time has been described as an example, this function may be omitted. Likewise, although a configuration in which the motion rhythm recognition means 474A has a function of determining whether a predetermined part of the body has moved in a predetermined direction by the body motion reference amount or more has been described as an example, this function may be omitted. With these configurations, the processing of step S105 and step S204 can be omitted, and the processing load of the rhythm recognition means 473A and 474A can be reduced. In addition, the configurations of the rhythm recognition means 473A and 474A can be simplified, and the cost of the music reproduction device 400 can be reduced.

Although a configuration in which the music reproduction device 400 has both a function of outputting an insertion sound corresponding to a voice and a function of outputting an insertion sound corresponding to a motion has been described as an example, either one of these functions may be omitted. With such a configuration, one of the state recognition means 473 and 474 need not be provided, the configuration of the calculation means 470 can be simplified, and the cost of the music reproduction device 400 can be reduced.

The embodiment described a configuration that outputs an insertion sound in response to a voice uttered by the user, but the configuration is not limited to this. An insertion sound may instead be output in response to the user's hand clapping, humming, or whistling, or to the sound of an instrument being played, such as a tambourine, castanets, or maracas. Here, the instrument may or may not be connected to the music playback device 400 via a wired or wireless network. With such a configuration, the insertion sounds corresponding to these various sounds give the user the sensation that the music is responding to the sounds the user makes or to the instrument the user plays, so the user can enjoy listening to the music more.

Furthermore, for example, the music playback processing means 471 may be made to function as the superimposed music data generation means of the present invention. In that case, the music playback processing means 471 generates superimposed music data in which the insertion sound of the insertion sound data 611 acquired by the insertion sound playback processing means 475B is superimposed on the music of the music data 511 being played. The music playback processing means 471 then plays this superimposed music data and outputs the music with the insertion sound inserted. With such a configuration, the music and the insertion sound can be output without having the insertion sound playback processing means 475B perform output processing for the insertion sound. Therefore, compared with the above embodiment, in which the music and the insertion sound are output through the processing of both playback processing means 471 and 475B, the processing load on the calculation means 470 when outputting the insertion sound can be reduced.
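The superimposition described above can be pictured as mixing two sample buffers. The following is a minimal sketch, not the method of the embodiment: it assumes both sounds are already decoded into lists of floating-point samples in the range -1.0 to 1.0 at the same sampling rate, and the function name superimpose and the clipping step are assumptions made for illustration.

```python
def superimpose(music, insertion, offset):
    """Return a copy of `music` with `insertion` mixed in from sample `offset`.

    music, insertion : lists of float samples in [-1.0, 1.0] (assumed format)
    offset           : sample index in `music` where the insertion sound starts
    """
    mixed = list(music)  # leave the original music buffer untouched
    for i, sample in enumerate(insertion):
        j = offset + i
        if j < 0:
            continue      # part of the insertion sound lies before the music
        if j >= len(mixed):
            break         # the insertion sound runs past the end of the music
        # Sum the two signals and clip to the valid sample range.
        mixed[j] = max(-1.0, min(1.0, mixed[j] + sample))
    return mixed
```

Playing the returned buffer through a single playback path then yields both the music and the insertion sound, which is the point of the modification.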

Also, for example, the structure of the music being played may be predicted, and an insertion sound based on that structure may be output. Specifically, for example, the waveform of the music is analyzed to detect portions where the attack density of the sound elements is high and portions where it is low. A chord progression is then predicted from the detected portions, and patterns are detected in the predicted chord progression to predict the structure of the music. Then, for example, at the beginning or end of the music, an insertion sound whose base pattern imitates the voice or rhythm may be output. Before the chorus of the music, an insertion sound with an extended pattern, in which ornamental notes are added to the base pattern, may be output. Furthermore, at the most exciting part of the music, for example after a so-called solo interlude in which the interlude is played by a single instrument, an insertion sound with an extended pattern having a higher proportion of ornamentation than before the chorus may be output.

Configurations for generating an insertion sound with such an ornamented extended pattern include, for example, the following. The minimum resolution of the ornamental notes is determined from, for example, the finest beat feel of the music. Then, treating the base pattern as accents, an insertion sound is generated by adding ornamental notes at positions other than the accents on the grid of the determined minimum resolution; for example, ornamental notes may be added before the base pattern, after it, or both before and after it. Taking percussion as an example, the number of timbres used for the insertion sound may be increased, such as by adding tom-toms or cymbals to an insertion sound corresponding to a snare drum sound.
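One way to picture this ornamentation is on a step grid whose slot width is the minimum resolution. The sketch below is illustrative only and rests on assumptions not stated in the text: the base pattern is given as step indices within one bar, ornaments are placed on the slots immediately before and after each accent, and the timbre labels "snare" and "tom" as well as the name extend_pattern are hypothetical.

```python
def extend_pattern(accents, steps, before=1, after=0):
    """Add ornamental notes around the accents of a base pattern.

    accents : step indices of the base pattern (treated as accents)
    steps   : number of grid slots in the bar at the minimum resolution
    before  : ornament slots to fill before each accent
    after   : ornament slots to fill after each accent
    Returns a sorted list of (step, timbre) pairs.
    """
    accent_set = set(accents)
    ornaments = set()
    for a in accents:
        candidates = [a - k for k in range(1, before + 1)]
        candidates += [a + k for k in range(1, after + 1)]
        for p in candidates:
            # Ornaments go only on free slots inside the bar, never on an accent.
            if 0 <= p < steps and p not in accent_set:
                ornaments.add(p)
    notes = [(s, "snare") for s in accent_set] + [(s, "tom") for s in ornaments]
    return sorted(notes)
```

Raising before and after increases the proportion of ornamentation, which corresponds loosely to the denser extended pattern suggested for the most exciting part of the music.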

The embodiment described a configuration in which the music playback device 400 of the present invention is applied to vehicle-mounted equipment, so-called car audio, but it may also be applied to a personal computer, a robot, a so-called karaoke machine that outputs only the accompaniment of a song, and the like. In addition, the sound output control device of the present invention is not limited to being applied to the music playback device 400; the music rhythm recognition means 472, the voice state recognition means 473, the motion state recognition means 474, and the insertion sound playback control means 475 may be configured as an independent unit. This independent configuration of the means 472 to 475 may be provided in various electronic or electrical devices such as personal computers, robots, and mobile phones. In such an independent configuration, examples of music whose rhythm is recognized by the music rhythm recognition means 472 include music output from a television receiver or radio receiver, ringtones of mobile phones or landline telephones, and music performed at public concerts.

Each of the functions described above is implemented as a program, but they may instead be implemented in hardware such as a circuit board, or in an element such as a single IC (integrated circuit); any of these forms may be used. Note that a configuration in which the functions are provided as a program, or are read from a separate recording medium, is easy to handle and makes it easy to expand use.

In addition, the specific structures and procedures used in carrying out the present invention may be changed as appropriate to other structures and the like, within a range in which the object of the present invention can be achieved.

[Effects of the embodiment]
As described above, in the above embodiment, when the calculation means 470 of the music playback device 400 recognizes a request to play a predetermined piece of music in, for example, the voice corresponding mode, it starts playing that music and acquires the in-vehicle sound collected by the microphone 200, which includes, for example, the voice "Yay". The calculation means 470 then recognizes the voice BPM of the "Yay" voice relative to the music BPM of the music being played, and recognizes the voice rhythm pattern corresponding to this voice BPM. After that, based on the insertion sound data 611 associated with the rhythm pattern information 612A corresponding to the voice rhythm pattern, the calculation means 470 causes the sound generation means 450 to output, in time with the music, an insertion sound corresponding to the voice rhythm pattern, that is, an insertion sound matched to the music BPM, which is a characteristic of the music. Because the music playback device 400 thus inserts an insertion sound that is in tune with the music, it can reduce the sense of incongruity the user feels when listening to the music. The music playback device 400 can therefore output the insertion sound satisfactorily.
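The idea of outputting an insertion sound matched to the music BPM can be illustrated with a small timing calculation. This is a sketch under assumptions rather than the procedure of the embodiment: it assumes the music BPM and the time of the first beat are already known, that the insertion sound starts on the next beat after the voice is recognized, and that its rhythm pattern is given as offsets in beats. The names next_beat_time and schedule_insertion are hypothetical.

```python
import math


def next_beat_time(now_sec, music_bpm, first_beat_sec=0.0):
    """Return the time of the first beat at or after now_sec."""
    interval = 60.0 / music_bpm  # seconds per beat
    beats = math.ceil((now_sec - first_beat_sec) / interval)
    return first_beat_sec + max(beats, 0) * interval


def schedule_insertion(now_sec, music_bpm, pattern_beats, first_beat_sec=0.0):
    """Return onset times for an insertion sound aligned to the music beat.

    pattern_beats : note onsets of the rhythm pattern, in beats from its start
    """
    interval = 60.0 / music_bpm
    start = next_beat_time(now_sec, music_bpm, first_beat_sec)
    return [start + b * interval for b in pattern_beats]
```

For instance, at 120 BPM a voice recognized 1.2 seconds into the music would have its insertion sound start on the beat at 1.5 seconds, so the inserted notes fall on the beat grid of the music rather than at the moment the voice happened to be detected.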

Block diagram showing the schematic configuration of a music playback system according to an embodiment of the present invention.
Schematic diagram showing the schematic configuration of the music list data in the embodiment.
Schematic diagram showing the schematic configuration of the insertion sound list data in the embodiment.
Schematic diagram showing the schematic configuration of the calculation means constituting the music playback device in the embodiment.
Schematic diagram showing an example of the output state of the insertion sound in the voice corresponding mode in the embodiment, where (A) is the lyrics of the music, (B) is the voice, and (C) is the insertion sound.
Schematic diagram showing an example of the output state of the insertion sound in the motion corresponding mode in the embodiment, where (A) is the lyrics of the music, (B) is the motion, and (C) is the insertion sound.
Flowchart showing the insertion sound output processing in the voice corresponding mode in the embodiment.
Flowchart showing the insertion sound output processing in the motion corresponding mode in the embodiment.
Schematic diagram showing an example of the output state of an insertion sound according to another embodiment of the present invention, where (A) is the rhythm of the music, (B) is output pattern 1 of the "Jakajan" insertion sound, (C) is output pattern 2 of the "Jakajan" insertion sound, (D) is output pattern 1 of the "Jakajakajan" insertion sound, and (E) is output pattern 2 of the "Jakajakajan" insertion sound.

Explanation of symbols

400 Music playback device
420 Insertion sound data storage means, serving as sound data storage means
450 Sound generation means, serving as sound output means and music output means
470 Calculation means
471 Music playback processing means, which can also function as superimposed music data generation means
472 Music rhythm recognition means, serving as music characteristic recognition means constituting the sound output control device
473 Voice state recognition means, serving as external state information acquisition means and external state rhythm recognition means constituting the sound output control device
474 Motion state recognition means, serving as external state information acquisition means and external state rhythm recognition means constituting the sound output control device
475 Insertion sound playback control means, serving as sound output control means constituting the sound output control device
511 Music data
611 Insertion sound data, serving as sound data
612B Target language information, serving as predetermined external state information
612C Target motion information, serving as predetermined external state information

Claims

A sound output control device that causes sound output means to output a predetermined sound, the device comprising:
external state information acquisition means for acquiring external state information relating to an external state;
music characteristic recognition means for recognizing a sound characteristic of music data output from music output means that outputs music data relating to music; and
sound output control means for performing control, upon recognizing that the external state information relating to a predetermined external state has been acquired, to cause the sound output means to output the sound in an output form tuned to the sound characteristic.

The sound output control device according to claim 1,
wherein the sound output control means performs control to output the sound in an output form tuned to at least one of the pitch, intensity, and timbre of the music.

The sound output control device according to claim 1 or 2,
The sound output control means controls to output a sound in an output form synchronized with the chord of the music.

The sound output control device according to any one of claims 1 to 3,
further comprising external state rhythm recognition means for recognizing the rhythm of the predetermined external state,
wherein the sound output control means performs control to output the sound in an output form tuned to the rhythm of the predetermined external state.

The sound output control device according to any one of claims 1 to 4,
The sound output control device is characterized in that the sound output control means controls to output a sound in an output form corresponding to the content of the predetermined external state.

The sound output control device according to any one of claims 1 to 5,
The sound output control means controls to output a sound in an output form synchronized with the rhythm of the music.

The sound output control device according to claim 6,
The sound output control means controls to output a sound in an output form in accordance with BPM (Beats Per Minute) of the music.

The sound output control device according to claim 6 or 7,
The sound output control means controls to output a sound in an output form synchronized with the tune of the music.

The sound output control device according to any one of claims 1 to 5,
wherein the sound output control means performs control to output the sound at a timing according to the rhythm of the music.

The sound output control device according to claim 9,
The sound output control device is characterized in that the sound output control means controls to output the sound at a timing according to a measure of the music.

The sound output control device according to any one of claims 1 to 10,
wherein the sound output control means performs control to output the sound before an output processing deadline time, which is a predetermined time after the external state information relating to the predetermined external state is acquired.

The sound output control device according to any one of claims 1 to 8,
wherein the sound output control means performs control to output the sound upon recognizing that the time elapsed since the acquisition of the external state information relating to the predetermined external state has reached an output processing start time, which is a predetermined time.

The sound output control device according to any one of claims 1 to 12,
The sound output control device, wherein the sound output control means acquires sound data related to the sound output in the output form, and controls to output the sound in the output form based on the sound data.

The sound output control device according to claim 13,
The sound data conforms to the MIDI (Musical Instrument Digital Interface) standard,
The sound output control device, wherein the sound output control means performs control to change a set value of information relating to an output form in the sound data to output a sound of the output form.

The sound output control device according to claim 13 or 14,
Sound data storage means for storing the sound data in association with predetermined external state information relating to the predetermined external state;
The sound output control means retrieves the predetermined external state information related to the predetermined external state of the acquired external state information from the sound data storage means, and the sound associated with the retrieved predetermined external state information A sound output control device characterized by acquiring data.

The sound output control device according to any one of claims 1 to 15,
The external state information acquisition means acquires external sound information related to an external sound as the external state information,
When the sound output control means recognizes that the external sound information related to the predetermined external sound has been acquired, the sound output control means controls to output the sound.

The sound output control device according to claim 16,
wherein the sound output control means performs control to output the sound upon recognizing that the duration of external sound at or above a reference volume in the external sound of the external sound information is equal to or greater than an external sound recognition reference time, which is a predetermined time.

The sound output control device according to claim 16 or 17,
wherein the external sound is a voice.

The sound output control device according to any one of claims 1 to 18,
The external state information acquisition means acquires body motion information related to body motion as the external state information,
When the sound output control means recognizes that the body motion information related to a predetermined body motion has been acquired, the sound output control means controls to output the sound.

The sound output control device according to claim 19,
wherein the sound output control means performs control to output the sound upon recognizing that the amount of body movement in the body motion of the body motion information is equal to or greater than a body motion recognition reference amount, which is a predetermined movement amount.

The sound output control device according to any one of claims 1 to 20,
Music reproduction processing means for performing processing for reproducing the music data and outputting the music data from the music output means;
A music playback device characterized by comprising:

The sound output control device according to any one of claims 1 to 20,
Superimposed music data generating means for acquiring the music data and generating superimposed music data in which the sound whose output is controlled by the sound output control means of the sound output control device is superimposed on the sound of the music data;
Music reproduction processing means for performing processing for reproducing the superimposed music data and outputting the music from the music output means;
A music playback device characterized by comprising:

A sound output control method for causing sound output means to output a predetermined sound, the method comprising:
acquiring external state information relating to an external state;
recognizing a sound characteristic of music data output from music output means that outputs music data relating to music; and
upon recognizing that the external state information relating to a predetermined external state has been acquired, controlling the sound output means to output the sound in an output form tuned to the sound characteristic.

21. A sound output control program that causes a computing means to function as the sound output control device according to claim 1.

A sound output control program for causing a calculation means to execute the sound output control method according to claim 23.

26. A recording medium on which a sound output control program according to claim 24 or 25 is recorded so as to be readable by an arithmetic means.