JP2016066034A

JP2016066034A - Karaoke device, and control method of karaoke device

Info

Publication number: JP2016066034A
Application number: JP2014196219A
Authority: JP
Inventors: 薫満留; Kaoru Mitsudome
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2014-09-26
Filing date: 2014-09-26
Publication date: 2016-04-28
Anticipated expiration: 2034-09-26
Also published as: JP6337723B2

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke device capable of extracting a user's voice even while sound is being emitted from the karaoke device, and a control method of the karaoke device. SOLUTION: A karaoke device includes: acoustic control means for generating pronunciation information, which expresses the sound to be emitted from a speaker and includes music information expressing the reproduction sound of a piece of music and singing information expressing the singing voice of a singer; acquisition means for acquiring voice operation information, which includes user voice information expressing the voice of the user and information expressing the sound emitted from the speaker according to the pronunciation information; extraction means for extracting the user voice information included in the voice operation information by comparing the voice operation information with the pronunciation information; and first control means for generating instruction information expressing various kinds of instructions according to the user voice information extracted by the extraction means. SELECTED DRAWING: Figure 2

Description

The present invention relates to a karaoke device that allows a user to give various instructions by voice even while sound such as accompaniment is being output from a speaker.

Karaoke devices equipped with a music search function that uses voice recognition means are known. However, while a karaoke service is in use, karaoke music or the like is being played in the karaoke box, so the karaoke box is never silent. That is, even if a user performs a music search using the voice recognition means in the karaoke box, voice recognition is hindered by ambient sounds such as the karaoke music played by the karaoke device. To cope with this, the music performance device described in Patent Document 1, for example, has been proposed. This music performance device includes means for determining, when a voice is input, whether a piece of music is being played; when that means determines that music is being played, the device outputs a message indicating that recognition by the voice recognition means is not possible.

Patent Document 1: JP 11-265190 A

However, when a voice-based music search is used in a karaoke box, the sounds that hinder extraction of the user's voice during singing are mainly sounds output by the music playback device, such as the accompaniment of the music and the singer's singing voice. In addition, even when no one is singing, the music playback device plays advertising videos and the like and outputs advertising sound from the speaker. Therefore, the voice may not be recognized correctly even when no one is singing.

Accordingly, an object of the present invention is to provide a karaoke device that can extract a user's voice even while sound is being emitted from the karaoke device, and a method for controlling the karaoke device.

To achieve this object, the karaoke device according to claim 1 comprises: acoustic control means for generating pronunciation information that represents the sound to be emitted from a speaker and that includes music information representing the reproduction sound of a piece of music and singing information representing a singer's singing voice; acquisition means for acquiring voice operation information that includes user voice information representing the user's voice and information representing the sound emitted from the speaker according to the pronunciation information; extraction means for comparing the voice operation information with the pronunciation information to extract the user voice information included in the voice operation information; and first control means for generating instruction information representing various instructions in accordance with the user voice information extracted by the extraction means.

The karaoke device according to claim 2 further comprises synchronization sound information adding means for adding, to the pronunciation information, synchronization sound information representing a sound of a predetermined frequency in order to synchronize the voice operation information with the pronunciation information, wherein the extraction means synchronizes the voice operation information with the pronunciation information according to the predetermined frequency represented by the synchronization sound information, and extracts the user voice information.

In the karaoke device according to claim 3, the acquisition means acquires the sound of the predetermined frequency represented by the synchronization sound information, emitted from the speaker according to the pronunciation information; the synchronization sound information adding means ends the addition of the synchronization sound information when the acquisition means finishes acquiring the user voice information; and the extraction means extracts the user voice information by synchronizing the addition start position and the addition end position of the specific information, included in the voice operation information, that represents a sound of a frequency matching the synchronization sound information with those of the synchronization sound information included in the pronunciation information.

The karaoke device according to claim 4 is a karaoke device comprising a music playback device that plays back music and a movable terminal device, wherein the terminal device comprises the acquisition means, the extraction means, the first control means that executes various processes, and first communication means for communicating with the music playback device; the music playback device comprises the acoustic control means, the synchronization sound information adding means, second control means that executes various processes, and second communication means for communicating with the terminal device; and the first control means generates instruction information representing various instructions in accordance with the user voice information extracted by the extraction means.

In the karaoke device according to claim 5, the first control means executes: a process of causing the acquisition means to start acquiring the voice operation information; a process of sending to the second control means a generation instruction indicating that generation of the pronunciation information is to be started; a process of causing display means to display, when the voice operation information acquired by the acquisition means includes the synchronization sound information, a notification indicating that the user voice information may now be included in the voice operation information; a process of sending to the second control means, when inclusion of the user voice information in the voice operation information is to end, a generation end instruction indicating that generation of the pronunciation information is to be ended; a process of causing the extraction means to compare the voice operation information with the pronunciation information and extract the user voice information included in the voice operation information; and a process of generating instruction information representing various instructions in accordance with the extracted user voice information.
The second control means executes: a process of causing the acoustic control means to start generating the pronunciation information in accordance with the generation instruction from the first control means; a process of causing the synchronization sound information adding means to add the synchronization sound information to the pronunciation information and causing the speaker to emit the resulting sound; and a process of causing the acoustic control means to end generation of the pronunciation information and sending the pronunciation information to the first control means, in accordance with the generation end instruction from the first control means.

The karaoke device according to claim 1 comprises acoustic control means for generating the pronunciation information representing the sound to be emitted from the speaker; acquisition means for acquiring voice operation information that includes user voice information representing the user's voice and information representing the sound emitted from the speaker according to the pronunciation information; extraction means for comparing the voice operation information with the pronunciation information to extract the user voice information included in the voice operation information; and first control means for generating instruction information representing various instructions in accordance with the user voice information extracted by the extraction means. Because the acquisition means acquires voice operation information that includes both the user voice information and the information representing the sound emitted from the speaker according to the pronunciation information, the extraction means can extract the user voice information by comparing the voice operation information with the pronunciation information. Instruction information representing various instructions can then be generated in accordance with the extracted user voice information.
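The comparison described above can be illustrated with a deliberately simplified sketch. The source does not specify how the extraction means compares the two signals; the code below assumes, purely for illustration, that the captured signal and the reference pronunciation signal are already time-aligned at the same sample rate and that the speaker-to-microphone path is a single gain. The function name `extract_user_voice` is hypothetical.

```python
def extract_user_voice(captured, reference):
    """Remove the speaker output from the captured signal (illustrative only).

    captured:  samples of the voice operation information (user voice + speaker sound)
    reference: samples of the pronunciation information (what the speaker emitted)
    Assumption: both lists are time-aligned and the room acts as a single gain.
    """
    if len(captured) != len(reference):
        raise ValueError("signals must be aligned and of equal length")
    # Least-squares gain g that minimises sum((captured - g * reference) ** 2).
    energy = sum(r * r for r in reference)
    gain = sum(c * r for c, r in zip(captured, reference)) / energy if energy else 0.0
    # What remains after subtracting the scaled reference is treated as user voice.
    return [c - gain * r for c, r in zip(captured, reference)]
```

A real room adds delay, reverberation, and frequency-dependent attenuation, and the gain estimate is biased whenever the voice correlates with the reference, so a practical system would more likely use an adaptive filter; the sketch only shows why having the reference signal makes extraction possible at all.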

The karaoke device according to claim 2 comprises synchronization sound information adding means for adding, to the pronunciation information, synchronization sound information representing a sound of a predetermined frequency in order to synchronize the voice operation information with the pronunciation information, and the extraction means synchronizes the voice operation information with the pronunciation information according to the predetermined frequency represented by the synchronization sound information and extracts the user voice information. Because the synchronization sound information adding means adds the synchronization sound information to the pronunciation information, the extraction means can compare the pronunciation information with the voice operation information more accurately. The user voice information can therefore be extracted more reliably, and instruction information representing various instructions can be generated in accordance with the extracted user voice information.

In the karaoke device according to claim 3, the acquisition means acquires the sound of the predetermined frequency represented by the synchronization sound information, emitted from the speaker according to the pronunciation information; the synchronization sound information adding means ends the addition of the synchronization sound information when the acquisition means finishes acquiring the user voice information; and the extraction means extracts the user voice information by synchronizing the addition start position and the addition end position of the specific information, included in the voice operation information, that represents a sound of a frequency matching the synchronization sound information with those of the synchronization sound information included in the pronunciation information. By synchronizing the start positions and the end positions of the sound whose frequency matches the synchronization sound information in the voice operation information and in the pronunciation information, the extraction means can compare the pronunciation information with the voice operation information more accurately, so the user voice information can be extracted more reliably. Furthermore, because only the span from the synchronized start position to the synchronized end position needs to be compared, the load of the extraction process can be reduced.
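The start/end alignment described above can be sketched as follows. This is a minimal illustration, assuming that each signal has already been split into fixed-length frames and that a per-frame flag says whether the synchronization tone was present; the frame flags, the function names, and the truncation to the shorter segment are all assumptions, not details from the source.

```python
def tone_span(flags):
    """Return (first_frame, last_frame + 1) where the tone is present, or None."""
    present = [i for i, flag in enumerate(flags) if flag]
    if not present:
        return None
    return present[0], present[-1] + 1


def align_by_sync_tone(captured, reference, captured_flags, reference_flags, frame_len):
    """Crop both signals to the span marked by the synchronization tone."""
    cap_span = tone_span(captured_flags)
    ref_span = tone_span(reference_flags)
    if cap_span is None or ref_span is None:
        raise ValueError("synchronization tone not found in one of the signals")
    cap_seg = captured[cap_span[0] * frame_len:cap_span[1] * frame_len]
    ref_seg = reference[ref_span[0] * frame_len:ref_span[1] * frame_len]
    # Compare only the overlapping part so both segments have equal length.
    n = min(len(cap_seg), len(ref_seg))
    return cap_seg[:n], ref_seg[:n]
```

Only the cropped segments then need to be compared, which is the reduction in extraction work that the paragraph above mentions.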

The karaoke device according to claim 4 comprises a music playback device that plays back music and a movable terminal device. The terminal device comprises the acquisition means, the extraction means, the first control means that executes various processes, and first communication means for communicating with the music playback device. The music playback device comprises the acoustic control means, the synchronization sound information adding means, second control means that executes various processes, and second communication means for communicating with the terminal device. The first control means generates instruction information representing various instructions in accordance with the user voice information extracted by the extraction means. Because the karaoke device comprises the music playback device and the movable terminal device, instruction information can be generated by voice, using the movable terminal device, even at a location away from the music playback device.

A block diagram of the music playback device 10 and the remote control device 20a of the karaoke device 1 according to the first embodiment of the present invention.
A flowchart showing the processing executed by the control unit 100 of the music playback device 10 and the control unit 200 of the remote control device 20a.
A schematic diagram showing the user voice information extraction process executed by the control unit 200.
A flowchart showing the processing of the control unit 100 in the second embodiment of the present invention.
A schematic diagram showing the user voice information extraction process executed by the control unit 100 in a modification of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram of the music playback device 10 and the remote control device 20a of the karaoke device 1 according to the first embodiment. The karaoke device 1 includes the music playback device 10, the remote control device 20a, and a terminal device 20b.

[Configuration of Music Playback Device 10]
The music playback device 10 is able to communicate with the remote control device 20a and the terminal device 20b. The music playback device 10 includes a control unit 100, a memory 101, a storage unit 110, an operation processing unit 120, an operation unit 121, a video playback unit 130, a video control unit 131, an acoustic control unit 140, and a communication unit 150.

The control unit 100 controls the processing of the music playback device 10. The memory 101 includes a ROM and a RAM. The ROM stores the program executed by the control unit 100. The RAM is a temporary storage area that temporarily stores information related to control by the control unit 100.

The storage unit 110 includes a storage area and stores various types of information, such as music information representing the sound of the music played by the music playback device 10, image information to be displayed, and video information to be played back.

The operation processing unit 120 receives operation information representing the user's operation of the operation unit 121, converts it into a control request that switches the operation of the control unit 100, and sends the control request to the control unit 100. The operation unit 121 includes switches or a touch panel that the user presses. The operation unit 121 sends to the operation processing unit 120 operation information representing, for example, which switch the user pressed and how many times, or the position touched on the touch panel and the number of touches.

The video playback unit 130 plays back video information in accordance with a playback instruction sent from the control unit 100. The playback instruction contains the video information together with display start time information, and the video playback unit 130 plays back the video information at the indicated time. The played-back video information is sent to the video control unit 131. The video control unit 131 receives the tablature and lyric telop sent from the control unit 100 and the video information sent from the video playback unit 130, combines them, converts the result into a format that can be displayed on the monitor 132, and displays it on the screen of the monitor 132.

The acoustic control unit 140 plays back the music information in accordance with a playback instruction sent from the control unit 100. The acoustic control unit 140 combines the singing sound information sent from the singing microphone 141 with the music information to generate the pronunciation information that the speaker 142 is to emit. In accordance with a synchronization sound information generation instruction from the control unit 100, the acoustic control unit 140 generates synchronization sound information representing a sound of a predetermined frequency and mixes it into the pronunciation information. The acoustic control unit 140 causes the speaker 142 to emit sound according to the mixed pronunciation information, and also sends the mixed pronunciation information to the control unit 100. The sound of the predetermined frequency represented by the synchronization sound information lies outside the range of 20 Hz to 20,000 Hz that people can generally hear, and is a sound that the speaker 142 can emit and that the operation microphone 241, described later, can pick up.
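As a rough illustration of mixing a synchronization tone into the pronunciation information, the sketch below generates a sine tone and sums it with the music and singing samples. The 48 kHz sample rate, the 21 kHz tone frequency, the 0.1 amplitude, and the function names are assumptions chosen for the example; the source states only that the frequency lies outside the 20 Hz to 20,000 Hz range and can be handled by the speaker and the operation microphone.

```python
import math

SAMPLE_RATE = 48_000   # assumed sample rate in Hz
SYNC_FREQ = 21_000.0   # assumed tone frequency: above 20 kHz, below Nyquist (24 kHz)


def sync_tone(n_samples, freq=SYNC_FREQ, rate=SAMPLE_RATE, amplitude=0.1):
    """Generate n_samples of a sine tone at the synchronization frequency."""
    if not 0 < freq < rate / 2:
        raise ValueError("tone frequency must be below the Nyquist frequency")
    return [amplitude * math.sin(2 * math.pi * freq * n / rate)
            for n in range(n_samples)]


def mix(music, singing, tone=None):
    """Sum the sources sample by sample and clip the result to [-1.0, 1.0]."""
    n = len(music)
    if len(singing) != n or (tone is not None and len(tone) != n):
        raise ValueError("all sources must have the same length")
    mixed = []
    for i in range(n):
        sample = music[i] + singing[i] + (tone[i] if tone is not None else 0.0)
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Passing `tone=None` corresponds to the period in which no synchronization sound is being added; the same mixed list would be both sent to the speaker and kept as the reference for later comparison.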

The communication unit 150 has a wireless communication function that enables communication with the remote control device 20a and the terminal device 20b, described later. The communication unit 150 receives control requests for the control unit 100 from the remote control device 20a and the terminal device 20b and passes them to the control unit 100. The communication unit 150 also receives from the control unit 100 control requests for the remote control device 20a and the terminal device 20b and sends them to those devices. In addition, the communication unit 150 has a function of transmitting and receiving various types of information, such as the image information stored in the storage unit 110 and the pronunciation information.

[Configuration of Remote Control Device 20a]
Next, the configuration of the remote control device 20a will be described with reference to FIG. 1.

The remote control device 20a is able to communicate with the music playback device 10. The remote control device 20a includes a control unit 200, a memory 201, a storage unit 210, an operation processing unit 220, an operation unit 221, a video playback unit 230, a video control unit 231, a display unit 232, an acoustic control unit 240, an operation microphone 241, and a communication unit 250.

The control unit 200 controls the processing of the remote control device 20a. The memory 201 includes a ROM and a RAM. The ROM stores the program executed by the control unit 200. The RAM is a temporary storage area that temporarily stores information related to control by the control unit 200.

The storage unit 210 includes a storage area and stores various types of information, such as a music list, image information to be displayed by the remote control device 20a, and video information to be played back.

The operation processing unit 220 receives operation information representing the user's operation of the operation unit 221, converts it into a control request that switches the operation of the control unit 200, and sends the control request to the control unit 200. The operation unit 221 includes switches or a touch panel that the user presses. The operation unit 221 sends to the operation processing unit 220 operation information representing, for example, which switch the user pressed and how many times, or the position touched on the touch panel and the number of touches.

The video playback unit 230 plays back video information in accordance with a playback instruction sent from the control unit 200. The playback instruction contains the video information together with information representing the display start time, and the video playback unit 230 plays back the video information at the indicated time. The played-back video information is sent to the video control unit 231. The video control unit 231 receives the image information sent from the control unit 200 and the video information sent from the video playback unit 230, combines them, converts the result into a format that can be displayed on the display unit 232, and causes the display unit 232 to display it.

The acoustic control unit 240 sends the voice operation information received from the operation microphone 241 to the control unit 200. The operation microphone 241 includes an A/D converter, converts the input sound into voice operation information, and sends it to the acoustic control unit 240. The acoustic control unit 240 also determines whether specific information representing a sound of a frequency matching the synchronization sound information is included in the voice operation information at or above a predetermined strength, and if so, sends a detection signal to the control unit 200.
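One common way to test for a single known frequency is the Goertzel algorithm; the source does not say which method the acoustic control unit 240 uses, so the following is only a sketch of the kind of threshold check described. The 48 kHz sample rate, the 21 kHz tone, the 0.05 threshold, and the function names are assumptions.

```python
import math


def tone_amplitude(samples, freq, rate):
    """Estimate the amplitude of the `freq` component with the Goertzel algorithm."""
    n = len(samples)
    if n == 0:
        return 0.0
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the frequency component at `freq`.
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    # A sinusoid of amplitude A on an exact bin has magnitude A * n / 2.
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def sync_tone_detected(samples, freq=21_000.0, rate=48_000, threshold=0.05):
    """Return True when the tone is present at or above the threshold strength."""
    return tone_amplitude(samples, freq, rate) >= threshold
```

In this sketch a `True` result stands in for the detection signal sent to the control unit 200. The amplitude estimate is most accurate when the block contains a whole number of tone cycles; other block lengths introduce some spectral leakage.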

The communication unit 250 has a wireless communication function that enables communication with the music playback device 10. The communication unit 250 receives control requests for the control unit 200 from the music playback device 10 and passes them to the control unit 200. The communication unit 250 also receives from the control unit 200 control requests for the music playback device 10 and sends them to the music playback device 10. In addition, the communication unit 250 has a function of receiving various types of information, such as the image information stored in the storage unit 110 of the music playback device 10 and the pronunciation information.

Next, the processing of the music playback device 10 and the remote control device 20a will be described. FIG. 2 is a flowchart of the voice search processing performed by the music playback device 10 and the remote control device 20a. The processing of the music playback device 10 is performed by the control unit 100, and the processing of the remote control device 20a is performed by the control unit 200. The voice search processing starts when a control request indicating that a voice search is to be started is sent to the control unit 200 from the operation processing unit 220 or the communication unit 250. The voice search processing is executed on receipt of this control request even while a piece of music is being played or while background music is being played between pieces.

[Processing of Music Playback Device 10]
The processing performed by the control unit 100 of the music playback device 10 will now be described. The processing of the music playback device 10 starts when a control request is passed from the communication unit 150 to the control unit 100. That control request is sent from the remote control device 20a in step S202, described later, and indicates that storage of the pronunciation information is to be started. In accordance with the control request passed from the communication unit 150, the acoustic control unit 140 is instructed to send the pronunciation information to the control unit 100. Once this instruction has been given, the process of step S102 starts.

In step S102, storage of the pronunciation information is started. Specifically, the pronunciation information sent from the acoustic control unit 140 is stored in the storage unit 110. Once storage of the pronunciation information has started, the process proceeds to step S103. Storage of the pronunciation information in the storage unit 110 continues until the process of step S106, described later, is executed.

In step S103, generation of synchronization sound information is instructed. A synchronization sound information generation instruction is sent to the acoustic control unit 140, which then generates synchronization sound information representing a sound of a predetermined frequency and combines it with the pronunciation information. The speaker 142 emits the sound represented by the combined pronunciation information. The combined pronunciation information is also stored in the storage unit 110 by the storing process started in step S102. Once the synchronization sound information generation instruction has been sent to the acoustic control unit 140, the process proceeds to step S104.
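The combining step in S103 can be pictured as adding a fixed-frequency tone to the playback samples. The following is a minimal sketch under stated assumptions: the sample rate, the 1000 Hz tone frequency, the amplitude, and the function names `make_sync_tone` and `mix` are all hypothetical, since the description only says the tone has a predetermined frequency.

```python
import math


def make_sync_tone(freq_hz, sample_rate, n_samples, amplitude=0.2):
    """Return n_samples of a sine tone at freq_hz (hypothetical helper)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def mix(playback, tone):
    """Add the tone to the playback samples, sample by sample."""
    return [p + t for p, t in zip(playback, tone)]


SAMPLE_RATE = 8000          # assumed value, not taken from the description
playback = [0.0] * 80       # silent playback, for illustration only
mixed = mix(playback, make_sync_tone(1000.0, SAMPLE_RATE, len(playback)))
```

In this sketch the same `mixed` samples would be both sent to the speaker and stored, which mirrors how the combined pronunciation information is emitted by the speaker 142 and stored in the storage unit 110.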

In step S104, it is determined whether a control request to end the storage of the pronunciation information has been sent from the remote control device 20a. In step S206, described later, this control request is sent from the communication unit 250 of the remote control device 20a to the communication unit 150, which forwards it to the control unit 100. If the control request has been sent, the determination in step S104 is YES and the process proceeds to step S105. Otherwise the determination is NO and step S104 is repeated.

In step S105, the end of synchronization sound information generation is instructed. In accordance with the control request received from the communication unit 150 in step S104, a synchronization sound information generation end instruction is sent to the acoustic control unit 140, which then stops generating the synchronization sound information. Once the end instruction has been sent to the acoustic control unit 140, the process proceeds to step S106.

In step S106, storage of the pronunciation information ends. In accordance with the control request received from the communication unit 150 in step S104, the acoustic control unit 140 is instructed to stop sending the pronunciation information to the control unit 100, and the storing process started in step S102 is stopped. Once storage has ended, the process proceeds to step S107.

In step S107, the pronunciation information stored in the storage unit 110 is sent to the remote control device 20a. Specifically, the stored pronunciation information is read out and sent to the communication unit 150. When all of the pronunciation information has been sent to the communication unit 150, completion information indicating that transmission of the pronunciation information has finished is also sent to the communication unit 150. The communication unit 150 sends the pronunciation information to the communication unit 250 and then sends the completion information to the communication unit 250. When the completion information has been sent to the communication unit 150, the processing of the music playback device 10 in the flowchart of FIG. 2 ends.
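Steps S102 to S107 amount to a small recorder that buffers audio between a start request and an end request and then hands back the buffer with a completion marker. The sketch below illustrates that flow only; the class name, the request strings, and the `"complete"` marker are hypothetical, and the transport between the communication units 150 and 250 is left out.

```python
class PronunciationRecorder:
    """Illustrative model of the player-side flow (names are hypothetical)."""

    def __init__(self):
        self.recording = False
        self.sync_tone_on = False
        self.buffer = []

    def on_audio(self, chunk):
        # Audio is stored only between the start and end requests (S102-S106).
        if self.recording:
            self.buffer.extend(chunk)

    def handle_request(self, request):
        if request == "start_storage":      # request sent in S202
            self.buffer = []
            self.recording = True           # S102: start storing
            self.sync_tone_on = True        # S103: start the sync tone
            return None
        if request == "end_storage":        # request sent in S206
            self.sync_tone_on = False       # S105: stop the sync tone
            self.recording = False          # S106: stop storing
            return list(self.buffer), "complete"   # S107: data, then marker
        raise ValueError(f"unknown request: {request!r}")
```

A caller would feed audio chunks through `on_audio` and treat the tuple returned for the end request as the pronunciation information followed by the completion information.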

[Processing of Remote Control Device 20a]
The processing performed by the control unit 200 of the remote control device 20a is described below. This processing starts when a control request to start a voice search is sent from the operation processing unit 220 to the control unit 200.

In step S201, storage of the voice operation information starts. Specifically, the control unit 200 instructs the acoustic control unit 240 to send the voice operation information to the control unit 200, and the voice operation information sent from the acoustic control unit 240 is stored in the storage unit 210. Once storage has started, the process proceeds to step S202. Storing of the voice operation information in the storage unit 210 continues until step S208, described later, is executed.

In step S202, a control request to start storing the pronunciation information is sent to the communication unit 250. The communication unit 250 forwards the control request received from the control unit 200 to the communication unit 150. Once the control request has been sent, the process proceeds to step S203.

In step S203, it is determined whether the synchronization sound has been received. Specifically, it is determined whether the acoustic control unit 240 has sent the control unit 200 a detection signal indicating that the voice operation information contains, at or above a predetermined strength, a component at the frequency of the synchronization sound information. In step S103, the speaker 142 emits the sound represented by the pronunciation information combined with the synchronization sound information, and the voice operation information is sound information that includes this emitted sound. The acoustic control unit 240 performs frequency analysis, such as a Fourier transform, on the voice operation information and determines whether a component matching the frequency of the synchronization sound information is present at or above the predetermined strength. If it is, the acoustic control unit 240 sends the detection signal to the control unit 200. If the detection signal has been sent, the determination in step S203 is YES and the process proceeds to step S204. Otherwise the determination is NO and step S203 is repeated.
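The threshold test in S203 can be illustrated with a single-frequency detector. The description only says "frequency analysis such as a Fourier transform", so the sketch below swaps in the Goertzel algorithm, which evaluates one DFT bin without computing the whole spectrum. The function names, the sample rate, and the threshold value are assumptions made for illustration.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)      # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def sync_tone_detected(samples, sample_rate, target_hz, threshold):
    """True when the tone component is at or above the (assumed) threshold."""
    return goertzel_power(samples, sample_rate, target_hz) >= threshold
```

A caller would run `sync_tone_detected` on successive blocks of microphone samples and treat the first `True` as the detection signal. In practice the threshold would have to be tuned to the room and microphone; nothing in the description fixes its value.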

In step S204, display of an image indicating that the user may input voice is instructed. Specifically, a display instruction for an image indicating that the user may speak an instruction into the operation microphone 241 is sent to the video control unit 231. The image indicating permission is read from the storage unit 210 and sent to the video control unit 231, which sends it to the display unit 232. Once the image indicating permission has been sent to the video control unit 231, the process proceeds to step S205.

In step S205, it is determined whether the user's voice input has finished. Specifically, it is determined whether a control request indicating the end of the user's voice input has been sent from the operation processing unit 220. When the user finishes speaking, the user operates the operation unit 221. In response, operation information indicating the end of voice input is sent from the operation unit 221 to the operation processing unit 220, which converts it into a control request and sends it to the control unit 200. If this control request has been sent, the determination in step S205 is YES and the process proceeds to step S206. Otherwise the determination is NO and step S205 is repeated.

In step S206, a control request to end the storage of the pronunciation information is sent to the music playback device 10. The control unit 200 sends the control request to the communication unit 250, which forwards it to the communication unit 150. Once the control request has been sent from the control unit 200 to the communication unit 250, the process proceeds to step S207.

In step S207, it is determined whether the pronunciation information sent from the music playback device 10 has been received. Specifically, in step S107 the pronunciation information and the completion information are sent from the communication unit 150 to the communication unit 250. The pronunciation information received by the communication unit 250 is sent to the control unit 200 and stored in the storage unit 210. When transmission of the pronunciation information has finished, the completion information is sent from the communication unit 150 to the communication unit 250, which passes it to the control unit 200. If the completion information has been sent from the communication unit 250 to the control unit 200, the determination in step S207 is YES and the process proceeds to step S208. Otherwise the determination is NO and step S207 is repeated.

In step S208, storage of the voice operation information ends. Specifically, the process started in step S201 of storing the voice operation information in the storage unit 210 is stopped. In addition, a storage end instruction, which ends the sending of voice operation information to the control unit 200, is sent to the acoustic control unit 240. The acoustic control unit 240 stops sending the voice operation information to the control unit 200 in accordance with the storage end instruction. Once storage of the voice operation information has ended and the storage end instruction has been sent to the acoustic control unit 240, the process proceeds to step S209.

In step S209, the voice operation information and the pronunciation information are compared, and user voice information representing the user's voice is extracted. FIG. 3 is a schematic diagram of the user voice information extraction process executed by the control unit 200. The voice operation information contains a sound at the frequency of the synchronization sound information with at least the predetermined strength. The pronunciation information also contains the sound corresponding to the synchronization sound information. Within the section that contains the synchronization sound, the sounds corresponding to the music information and to the singing information are present in both the voice operation information and the pronunciation information. Therefore, the voice operation information and the pronunciation information can be synchronized by aligning the start and the end of the section containing the synchronization sound in one with the start and the end of the corresponding section in the other.
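The alignment described above can be sketched by locating the tone-bearing section in each recording and cutting both to that section. This is an illustrative simplification: it works at frame granularity, assumes a single contiguous tone section, and uses hypothetical names (`tone_power`, `tone_span`, `align`) and an assumed frame length and threshold.

```python
import math


def tone_power(frame, sample_rate, freq_hz):
    """Squared correlation of a frame with a sinusoid at freq_hz."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(frame))
    im = sum(x * math.sin(w * n) for n, x in enumerate(frame))
    return re * re + im * im


def tone_span(samples, sample_rate, freq_hz, frame_len, threshold):
    """(start, end) sample indices of the frames that contain the tone, or None."""
    hits = [i for i in range(0, len(samples) - frame_len + 1, frame_len)
            if tone_power(samples[i:i + frame_len], sample_rate, freq_hz) >= threshold]
    if not hits:
        return None
    return hits[0], hits[-1] + frame_len


def align(mic, ref, sample_rate, freq_hz, frame_len, threshold):
    """Cut both recordings to their tone sections so index i means the same time."""
    mic_span = tone_span(mic, sample_rate, freq_hz, frame_len, threshold)
    ref_span = tone_span(ref, sample_rate, freq_hz, frame_len, threshold)
    if mic_span is None or ref_span is None:
        raise ValueError("synchronization tone not found in one of the recordings")
    length = min(mic_span[1] - mic_span[0], ref_span[1] - ref_span[0])
    return (mic[mic_span[0]:mic_span[0] + length],
            ref[ref_span[0]:ref_span[0] + length])
```

Here `mic` stands for the voice operation information and `ref` for the pronunciation information. A finer alignment than one frame would need an extra step, such as cross-correlation around the detected boundaries, which the description does not specify.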

The strengths of the component at the synchronization sound frequency in the synchronized voice operation information and pronunciation information are compared at each synchronized time. From this comparison, the ratio between the two strengths is calculated for each time. The strength of every frequency component in the pronunciation information is then amplified according to the calculated ratio, so that the synchronization sound component in the pronunciation information has the same strength as the one in the voice operation information. Subtracting the frequency components of the pronunciation information from the voice operation information therefore removes the synchronization sound component. Because the music information and the singing information in the pronunciation information are amplified by the same ratio, the subtraction removes them as well. As shown in FIG. 3, the frequency components corresponding to the pronunciation information are thus removed from the voice operation information, and the user voice information is extracted. Once the user voice information has been extracted, the process proceeds to step S210.
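The scale-and-subtract idea can be shown in a reduced form. The sketch below makes several assumptions that go beyond the description: the two recordings are already aligned and equal in length, a single gain is used for the whole section instead of a ratio per time step, the subtraction is done on time-domain samples instead of on a spectrum, and the user's voice has negligible energy at the synchronization frequency. The function names are hypothetical.

```python
import math


def tone_amplitude(samples, sample_rate, freq_hz):
    """Estimated amplitude of the sinusoid at freq_hz within samples."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    im = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


def extract_user_voice(mic, ref, sample_rate, sync_hz):
    """Scale ref so its sync tone matches the one in mic, then subtract it."""
    ref_amp = tone_amplitude(ref, sample_rate, sync_hz)
    if ref_amp == 0.0:
        raise ValueError("reference recording contains no synchronization tone")
    gain = tone_amplitude(mic, sample_rate, sync_hz) / ref_amp
    return [m - gain * r for m, r in zip(mic, ref)]
```

Because the synchronization tone has a known level in the reference, the ratio of its levels in the two recordings serves as an estimate of how loudly the speaker output reached the microphone. A real room also adds delay, reverberation, and frequency-dependent attenuation, none of which this sketch models.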

In step S210, the user voice information is converted into characters, a control request is generated from those characters, and song titles are extracted in accordance with the control request. The consonants and vowels contained in the user voice information extracted in step S209 are identified from the way the spectrum of the corresponding sound changes over time. Once the consonants and vowels have been identified, the sound is replaced with the corresponding characters. When the character string contained in the user voice information has been identified, a control request to search for songs with that character string is generated. In accordance with the control request, song titles matching the character string are extracted and read from the song list stored in the storage unit 210. If the song list contains no title matching the character string, information indicating that there is no matching song is read from the storage unit 210.
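The lookup at the end of S210 can be sketched as a filter over a song list. The example assumes exact title matching after trimming whitespace, which is one reading of "matches the character string"; the `Song` fields, the function names, and the sample entries are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    """One entry of the song list (fields chosen for illustration)."""
    title: str
    singer: str
    genre: str


def search_songs(recognized_text, song_list):
    """Return the songs whose title equals the recognized character string."""
    key = recognized_text.strip()
    return [song for song in song_list if song.title == key]


def result_message(matches):
    """Text to display: the matches, or a notice that nothing matched."""
    if not matches:
        return "No matching song"
    return "; ".join(f"{s.title} / {s.singer} / {s.genre}" for s in matches)


# Placeholder data, not real catalogue entries.
SONGS = [Song("Example Song A", "Singer X", "Pop"),
         Song("Example Song B", "Singer Y", "Rock")]
```

Calling `result_message(search_songs(text, SONGS))` yields either the matching entries with their singer and genre or the no-match notice, which corresponds to the two display cases described for the search result.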

In step S211, the result of the search performed in step S210 is displayed. If a song title matching the character string was extracted from the song list in step S210, image information corresponding to the song title and to the information associated with it, such as the singer name and genre, is read from the storage unit 210. Each piece of image information read from the storage unit 210 is sent to the video control unit 231, converted into a displayable format, and sent to the display unit 232, which displays the song title and the other images. If the song list contains no title matching the character string, image information indicating that there is no matching song is read from the storage unit 210 and sent to the video control unit 231, converted into a displayable format, and sent to the display unit 232, which displays an image indicating that there is no matching song. Once the search result has been displayed, the voice search process ends.

[Second Embodiment]
Next, a second embodiment of the present invention is described. In the first embodiment, the music playback device 10 and the remote control device 20a are provided, the operation microphone 241 is part of the remote control device 20a, and the voice search process is executed jointly by the music playback device 10 and the remote control device 20a. In the second embodiment, by contrast, the music playback device 10 includes the operation microphone 241, which is connected to the acoustic control unit 140. The acoustic control unit 140 sends the voice operation information received from the operation microphone 241 to the control unit 100. The voice search process is executed by the control unit 100 of the music playback device 10, so the processing performed by the control unit 200 of the remote control device 20a in the first embodiment is performed by the control unit 100. In the second embodiment, the storage unit 210 is included in the storage unit 110, the operation processing unit 220 in the operation processing unit 120, the operation unit 221 in the operation unit 121, the video playback unit 230 in the video playback unit 130, the video control unit 231 in the video control unit 131, the display unit 232 in the monitor 132, and the acoustic control unit 240 in the acoustic control unit 140. The music playback device 10 of the second embodiment therefore performs the operations of both the music playback device 10 and the remote control device 20a of the first embodiment. The processing of the control unit 100 is described later with reference to FIG. 4. Unless otherwise noted, the configuration of the device and the meaning of each kind of information are the same as in the first embodiment, and their description is omitted.

[Processing of Music Playback Device 10]
FIG. 4 is a flowchart of the voice search process of the music playback device 10 in the second embodiment. The processing of the control unit 100 of the music playback device 10 is described below. This processing starts when a control request is sent from the communication unit 150 or the operation processing unit 120 to the control unit 100. The control request is a request to start storing pronunciation information. In accordance with the control request, the acoustic control unit 140 is instructed to send the pronunciation information and the voice operation information to the control unit 100, and the process of step S301 then starts.

In step S301, storage of the voice operation information and the pronunciation information starts. Specifically, the pronunciation information and the voice operation information sent from the acoustic control unit 140 are stored in the storage unit 110. Once storage of the pronunciation information has started, the process proceeds to step S302. Storing of the pronunciation information and the voice operation information in the storage unit 110 continues until step S306, described later, is executed.

In step S302, generation of synchronization sound information is instructed. Specifically, a synchronization sound information generation instruction is sent to the acoustic control unit 140, which then generates synchronization sound information representing a sound of a predetermined frequency and combines it with the pronunciation information. The speaker 142 emits the sound represented by the combined pronunciation information. The combined pronunciation information is also stored in the storage unit 110 by the storing process started in step S301. Once the synchronization sound information generation instruction has been sent to the acoustic control unit 140, the process proceeds to step S303.

In step S303, display of an image indicating that the user may input voice is instructed. Specifically, a display instruction for an image indicating that the user may speak an instruction into the operation microphone 241 is sent to the video control unit 131. The image information indicating permission is read from the storage unit 110 and sent to the video control unit 131, which converts it into a displayable format and sends it to the monitor 132. Once the image information indicating permission has been sent to the video control unit 131, the process proceeds to step S304.

In step S304, it is determined whether the user's voice input has finished. Specifically, it is determined whether a control request indicating the end of the user's voice input has been sent from the operation processing unit 120. When the user finishes speaking and operates the operation unit 121, operation information indicating the end of voice input is sent from the operation unit 121 to the operation processing unit 120, which converts it into a control request and sends it to the control unit 100. If this control request has been sent, the determination in step S304 is YES and the process proceeds to step S305. Otherwise the determination is NO and step S304 is repeated.

In step S305, the end of synchronization sound information generation is instructed. Specifically, a synchronization sound information generation end instruction is sent to the acoustic control unit 140, which then stops generating the synchronization sound information. Once the end instruction has been sent to the acoustic control unit 140, the process proceeds to step S306.

In step S306, storage of the voice operation information ends. Specifically, the process started in step S301 of storing the voice operation information in the storage unit 110 is stopped. In addition, a storage end instruction, which ends the sending of voice operation information to the control unit 100, is sent to the acoustic control unit 140. The acoustic control unit 140 stops sending the voice operation information to the control unit 100 in accordance with the storage end instruction. Once storage of the voice operation information has ended and the storage end instruction has been sent to the acoustic control unit 140, the process proceeds to step S307.

In step S307, the voice operation information and the pronunciation information are compared, and user voice information representing the user's voice is extracted. FIG. 3 is a schematic diagram of the user voice information extraction process, which in this embodiment is executed by the control unit 100. The voice operation information contains a sound at the frequency of the synchronization sound information with at least the predetermined strength. The pronunciation information also contains the sound corresponding to the synchronization sound information. Within the section that contains the synchronization sound, the sounds corresponding to the music information and to the singing information are present in both the voice operation information and the pronunciation information. Therefore, the voice operation information and the pronunciation information can be synchronized by aligning the start and the end of the section containing the synchronization sound in one with the start and the end of the corresponding section in the other.

The intensities of the sound at the frequency corresponding to the synchronization sound information, as contained in the synchronized voice operation information and in the synchronized pronunciation information, are compared at each synchronized time. From this comparison, the ratio between the two intensities is calculated for each time. The intensity of each frequency contained in the pronunciation information is amplified according to the calculated ratio. As a result, the sound at the frequency corresponding to the synchronization sound information has the same intensity in the pronunciation information as in the voice operation information. Subtracting the frequency components of the pronunciation information from the voice operation information then removes the sound at the frequency corresponding to the synchronization sound information. The music information and the singing information contained in the pronunciation information are amplified according to the same ratio, so they too are removed by the subtraction. In this way, as shown in FIG. 3, the components corresponding to the pronunciation information are removed from the voice operation information, and the user voice information is extracted. When the user voice information has been extracted, the process proceeds to step S308.
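
The scaling and subtraction described above can be illustrated with a short Python sketch. It assumes, beyond what the specification states, that both pieces of information are already synchronized and available as per-frame magnitude spectra (one list of bin magnitudes per time frame), that the synchronization tone falls in a single known bin, and that the user's voice has no energy in that bin. The function name `extract_user_voice` and the `eps` guard are hypothetical.

```python
def extract_user_voice(mic_frames, ref_frames, sync_bin, eps=1e-12):
    """Subtract gain-matched reference spectra from microphone spectra.

    mic_frames: magnitude spectra of the voice operation information.
    ref_frames: magnitude spectra of the pronunciation information.
    sync_bin:   index of the bin holding the synchronization tone.
    """
    result = []
    for mic, ref in zip(mic_frames, ref_frames):
        # Ratio of synchronization-tone intensities at this time frame.
        ratio = mic[sync_bin] / max(ref[sync_bin], eps)
        # Scale every bin of the reference by the ratio, then subtract.
        # Negative values are clamped to zero (an assumed design choice).
        result.append([max(m - ratio * r, 0.0) for m, r in zip(mic, ref)])
    return result
```

Subtracting magnitudes treats the spectra as additive, which is only an approximation for real audio; the sketch is meant to show the order of operations (ratio, amplification, subtraction), not a production-quality noise canceller.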

In step S308, the user voice information is converted into characters, and a control request is generated according to those characters. A music title is then extracted according to the control request. The consonants and vowels contained in the user voice information are identified from the way the spectrum of the sound corresponding to the user voice information extracted in step S307 changes over time. Once the consonants and vowels have been identified, the sound corresponding to the user voice information is replaced with the corresponding characters. When the character string contained in the user voice information has been identified, a control request for searching for music with that character string is generated. In accordance with the control request, music titles matching the identified character string are extracted and read from the music list stored in the storage unit 110. If the music list contains no title matching the character string, information indicating that there is no matching music is read from the storage unit 110.
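
The search part of step S308 can be sketched as below. The sketch starts from an already recognized character string and does not attempt the speech recognition itself. The `Song` record, the dictionary form of the control request, and the use of case-insensitive substring matching are all assumptions made for illustration; the specification says only that titles matching the character string are extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    singer: str
    genre: str


def make_search_request(recognized_text):
    """Build a (hypothetical) control request for a title search."""
    return {"type": "search", "query": recognized_text.strip()}


def run_search(request, song_list):
    """Return the songs whose title contains the query, ignoring case.

    An empty list stands for "no matching music".
    """
    query = request["query"].casefold()
    if not query:
        return []
    return [song for song in song_list if query in song.title.casefold()]
```

A caller would pass the returned list on to the display step, showing either the matching titles with their singer and genre or a "no matching music" notice when the list is empty.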

In step S309, the result of the search performed in step S308 is displayed. If a music title matching the character string was extracted from the music list in step S308, images corresponding to the music title and to the information associated with it, such as the singer name and genre, are read from the storage unit 110. Each image read from the storage unit 110 is sent to the video control unit 131, which sends it to the monitor 132 for display. If the music list contains no title matching the character string, an image indicating that there is no matching music is read from the storage unit 110, sent to the monitor 132 via the video control unit 131, and displayed. When the search result has been displayed, the voice search process ends.

[Modifications]
[Modification 1]
In both embodiments, generation of a control request for searching for music by voice has been described. However, a control request representing key control, which changes the key of the music, and a control request representing tempo control, which changes the playback tempo of the music, may also be generated by voice. Voice can then be used not only to search for music but also to switch among the various processes performed by the music playback device 10, by means of control requests generated from the voice.
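
One way to picture Modification 1 is a small dispatch step that turns the recognized character string into either a key control request, a tempo control request, or a search request. The following Python sketch is purely illustrative: the spoken phrases, the request dictionaries, and the function name `make_control_request` are hypothetical, since the specification does not define the command vocabulary.

```python
# Hypothetical phrase table; the specification does not list spoken commands.
COMMANDS = {
    "key up": {"type": "key_control", "delta": 1},
    "key down": {"type": "key_control", "delta": -1},
    "tempo up": {"type": "tempo_control", "delta": 1},
    "tempo down": {"type": "tempo_control", "delta": -1},
}


def make_control_request(recognized_text):
    """Map recognized text to a control request.

    Known command phrases produce key or tempo control requests;
    anything else is treated as a title search (assumed fallback).
    """
    text = " ".join(recognized_text.casefold().split())
    command = COMMANDS.get(text)
    if command is not None:
        return dict(command)  # copy so callers cannot mutate the table
    return {"type": "search", "query": text}
```

Falling back to a search for unrecognized phrases keeps the behavior of the two embodiments unchanged for ordinary title queries.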

[Modification 2]
The remote control device 20a of the first embodiment may be replaced by a terminal device 20b, such as a user's mobile phone, on which a remote control application is executed. Through the remote control application, the terminal device 20b is associated with the music playback device 10 and performs the same operations as the remote control device 20a. In this case, the call microphone generally provided in a terminal device such as a mobile phone corresponds to the operation microphone 241, and the component capable of communicating via infrared or an Internet connection corresponds to the communication unit 250 and need only communicate with the music playback device 10.

[Modification 3]
In the first embodiment, in step S209, the voice operation information and the pronunciation information were synchronized by aligning both the start and the end of the two sections, one in each piece of information, that contain the sound corresponding to the synchronization sound information. In Modification 3, however, the process of step S209 may synchronize them by aligning only the start positions of the two sections. In that case, the process of generating the synchronization sound information in step S103 need only be ended after a fixed period. The fixed period is a period long enough for the synchronization sound information to be detected in step S209. Because the process of generating the synchronization sound information in step S103 ends after the fixed period, the process of step S105 is not executed. As an example, FIG. 5 outlines the process of step S209 in the first embodiment when generation of the synchronization sound information in step S103 ends after the fixed period. The voice operation information and the pronunciation information can be synchronized by aligning the starts of the two sections that contain the sound corresponding to the synchronization sound information. The processing after synchronization is the same as in step S209. In Modification 3, the process of extracting the user voice information runs up to the end of whichever of the stored voice operation information and the stored pronunciation information was stored for the shorter period. The user voice information is thereby extracted. Similarly, in the second embodiment, the process of step S307 may synchronize the voice operation information and the pronunciation information by aligning only the start positions of the two sections. The process of generating the synchronization sound information in step S303 then need only be ended after a fixed period, and the process of step S305 is not executed.
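
The start-only alignment of Modification 3 can be sketched as follows. The sketch assumes that the start index of the synchronization section in each signal has already been detected by some other means; the function name `align_by_start` and the list-of-samples representation are hypothetical.

```python
def align_by_start(mic, mic_start, ref, ref_start):
    """Align two signals at the start of their synchronization sections.

    Both outputs are cut to the length of the shorter remaining signal,
    mirroring the rule that extraction runs to the end of whichever
    information was stored for the shorter period.
    """
    length = min(len(mic) - mic_start, len(ref) - ref_start)
    return (mic[mic_start:mic_start + length],
            ref[ref_start:ref_start + length])
```

For example, `align_by_start([0, 0, 1, 2, 3, 4], 2, [0, 1, 2, 3], 1)` yields two three-sample lists, because only three samples remain in the shorter signal after its start position.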

[Modification 4]
In both embodiments, music was searched for in the music list stored in the storage unit, but it may instead be searched for in a music list stored in a server device (not shown). In this modification, the music playback device 10, the remote control device 20a, or the terminal device 20b is connected to a server device that is not shown in FIG. 1. The storage unit of the server device stores the music list in its storage area. The control request generated in step S210 or step S308 is sent to the server device, and the server device extracts music titles from the music list according to the control request. The server device sends the extracted music titles to the music playback device 10, the remote control device 20a, or the terminal device 20b, and the device that receives them displays the retrieved music titles.

[Effects of the Embodiment]
The sound emitted by the music playback device 10 becomes noise when a voice search is performed. In the embodiment, therefore, the music playback device 10 stores pronunciation information representing the sound it emits, and the remote control device 20a uses the stored pronunciation information to extract the user voice information, which represents the user's voice, from the voice operation information that contains it. The pronunciation information is thereby removed from the voice operation information, so the user's voice can be recognized more accurately. Moreover, according to the embodiment, there is no need to provide a separate microphone for detecting noise.

[Correspondence between the Configuration Described in the Claims and the Embodiment]
The correspondence between the configuration described in the embodiment and the configuration described in the claims is as follows. The music playback device 10 is an example of the music playback device. The acoustic control unit 140 is an example of the acoustic control means, the acquisition means, and the synchronization sound information adding means. The control unit 100 is an example of the extraction means, the first control means, and the second control means. The control unit 200 is an example of the extraction means and the first control means. The acoustic control unit 240 is an example of the acquisition means. The remote control device 20a and the terminal device 20b are examples of the terminal device. The process of step S209 or step S307 is an example of the operation of the extraction means and of the extraction step. The processes from step S102 to step S106 and the processes from step S301 to step S306 are examples of the acoustic control step. The process of step S210 or step S308 is an example of the generation step.

[Description of Reference Numerals]
1 ... karaoke apparatus, 10 ... music playback device, 20a ... remote control device, 20b ... terminal device, 100 ... control unit, 110 ... storage unit, 120 ... operation control unit, 131 ... video control unit, 140 ... acoustic control unit, 150 ... communication unit, 200 ... control unit, 210 ... storage unit, 220 ... operation control unit, 231 ... video control unit, 240 ... acoustic control unit, 250 ... communication unit.

Claims

1. A karaoke apparatus comprising:
acoustic control means for generating pronunciation information that includes music information representing the reproduced sound of music and singing information representing the singing sound of a singer, the pronunciation information representing the sound to be emitted from a speaker;
acquisition means for acquiring voice operation information that includes user voice information representing a user's voice and information representing the sound emitted from the speaker according to the pronunciation information;
extraction means for comparing the voice operation information with the pronunciation information and extracting the user voice information included in the voice operation information; and
first control means for generating instruction information representing various instructions according to the user voice information extracted by the extraction means.

2. The karaoke apparatus according to claim 1, further comprising synchronization sound information adding means for adding, to the pronunciation information, synchronization sound information representing a sound of a predetermined frequency in order to synchronize the voice operation information and the pronunciation information,
wherein the extraction means synchronizes the voice operation information and the pronunciation information according to the predetermined frequency represented by the synchronization sound information, and extracts the user voice information.

3. The karaoke apparatus according to claim 2, wherein:
the acquisition means acquires the sound of the predetermined frequency represented by the synchronization sound information, emitted from the speaker according to the pronunciation information;
the synchronization sound information adding means ends the addition of the synchronization sound information when the acquisition means finishes acquiring the user voice information; and
the extraction means extracts the user voice information by synchronizing the start position and the end position of specific information, which is included in the voice operation information and represents a sound of a frequency matching the synchronization sound information, with the addition start position and the addition end position of the synchronization sound information included in the pronunciation information.

4. The karaoke apparatus according to claim 2, comprising a music playback device that plays back music and a movable terminal device, wherein:
the terminal device comprises the acquisition means, the extraction means, the first control means, which executes various processes, and first communication means for communicating with the music playback device;
the music playback device comprises the acoustic control means, the synchronization sound information adding means, second control means for executing various processes, and second communication means for communicating with the terminal device; and
the first control means generates instruction information representing various instructions according to the user voice information extracted by the extraction means.

5. The karaoke apparatus according to claim 4, wherein:
the first control means executes
a process of causing the acquisition means to start acquiring the voice operation information,
a process of sending, to the second control means, a generation instruction for generating the pronunciation information,
a process of displaying on a display unit, when the voice operation information acquired by the acquisition means includes the synchronization sound information, a notification indicating that the user voice information is permitted to be included in the voice operation information,
a process of sending, to the second control means, a generation end instruction for ending the generation of the pronunciation information when the inclusion of the user voice information in the voice operation information is to end,
a process of causing the extraction means to compare the voice operation information with the pronunciation information and to extract the user voice information included in the voice operation information, and
a process of generating instruction information representing various instructions according to the extracted user voice information; and
the second control means executes
a process of causing the acoustic control means to start generating the pronunciation information in accordance with the generation instruction from the first control means,
a process of causing the synchronization sound information adding means to add the synchronization sound information to the pronunciation information and causing the speaker to emit the resulting sound, and
a process of causing the acoustic control means to end the generation of the pronunciation information and sending the pronunciation information to the first control means, in accordance with the generation end instruction from the first control means.

6. A method for controlling a karaoke apparatus, comprising:
an acoustic control step of generating pronunciation information that includes music information representing the reproduced sound of music and singing information representing the singing sound of a singer, the pronunciation information representing the sound to be emitted from a speaker;
an acquisition step of acquiring voice operation information that includes user voice information representing a user's voice and information representing the sound emitted from the speaker;
an extraction step of comparing the voice operation information with the pronunciation information and extracting the user voice information included in the voice operation information; and
a generation step of generating instruction information representing various instructions according to the user voice information extracted in the extraction step.