JP7240271B2

JP7240271B2 - Karaoke input device

Info

Publication number: JP7240271B2
Application number: JP2019119502A
Authority: JP
Inventors: 宇将永沼
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2023-03-15
Anticipated expiration: 2039-06-27
Also published as: JP2021005027A

Description

The present invention relates to a karaoke input device.

Techniques are known in which commands and search words are input by voice through a remote control device attached to a karaoke device, in order to change the tempo or key of a karaoke performance or to search for songs.

For example, Patent Literature 1 discloses a song search system that automatically extracts each search term from a continuous stretch of voice data containing a plurality of search terms and performs a highly accurate song search.

Patent Literature 1: JP-A-2002-189483

Suppose that, while one user is singing karaoke, another user uses the remote control device to search for a song by voice input. In this case, the remote control device may extract the search word from the singing voice of the first user rather than from the voice of the other user. The remote control device would then misrecognize the search word, and a search result different from what the user wanted might be presented.

It is also possible for the lyrics of the song being sung to coincide with a command executed by voice input. For example, suppose the user utters the word "stop", which appears in the lyrics of a certain song. The remote control device may then misrecognize the lyric "stop" as the command "stop the karaoke performance" and stop the performance.

An object of the present invention is to provide a karaoke input device capable of reducing misrecognition when voice input is used in a karaoke singing setting.

One invention for achieving the above object is a karaoke input device used when singing karaoke, comprising: a data storage unit that stores a plurality of commands corresponding to processes executable during karaoke singing, and a trigger word for instructing execution of the commands or a search for songs, each in association with different text data; a voice processing unit that performs voice recognition processing on a user's voice signal output from sound collecting means and outputs the result as text data; a search unit that searches for songs based on a search word; a notification unit that notifies the user of the search result; an execution unit that executes commands; and a control unit that acquires the trigger word based on the text data output by the voice processing unit. The control unit performs a first process and a second process. In the first process, when the voice processing unit outputs certain text data before a first predetermined time elapses after the trigger word is acquired, the control unit instructs the search unit to search for songs using that text data as the search word. In the second process, when text data associated with a command is output, the control unit stores the associated command in storage means; when the trigger word is acquired while a command is stored in the storage means, the control unit instructs the execution unit to execute the stored command and then deletes it; and when a second predetermined time elapses from the storing of the latest command, the control unit deletes the stored command.
Other features of the present invention will become apparent from the description and drawings below.

According to the present invention, misrecognition when voice input is used in a karaoke singing setting can be reduced.

FIG. 1 is a diagram showing the karaoke device according to the embodiment.
FIG. 2 is a diagram showing the remote control device according to the embodiment.
FIG. 3 is a diagram showing the table stored in the data storage unit according to the embodiment.
FIG. 4A is a flowchart showing the first process performed by the control unit of the remote control device according to the embodiment.
FIG. 4B is a flowchart showing the second process performed by the control unit of the remote control device according to the embodiment.

A karaoke input device according to an embodiment will be described with reference to FIGS. 1 to 4B.

==Karaoke Device==
The karaoke device K is a device for playing karaoke performances of songs and for users to sing karaoke. As shown in FIG. 1, the karaoke device K includes a karaoke main body 10, a speaker 20, a display device 30, a microphone 40, and a remote control device 50.

The karaoke main body 10 performs various controls related to karaoke performance and karaoke singing, such as performance control of the selected song, display control of lyrics and background video, and processing of voice signals input through the microphone 40. The speaker 20 emits sound based on a sound emission signal from the karaoke main body 10. The display device 30 displays video and images on its screen based on signals from the karaoke main body 10. The microphone 40 converts the user's singing voice into an analog voice signal and inputs it to the karaoke main body 10. The remote control device 50 is a device for performing various operations on the karaoke main body 10 during karaoke singing. The remote control device 50 in this embodiment corresponds to the "karaoke input device".

==Remote Control Device==
As shown in FIG. 2, the remote control device 50 according to this embodiment includes storage means 50a, communication means 50b, display means 50c, input means 50d, sound collecting means 50e, and control means 50f. Each component is connected to a bus B via an interface (not shown).

[Storage means]
The storage means 50a is a large-capacity storage device that stores various data, for example a hard disk drive. In this embodiment, part of the area of the storage means 50a functions as the data storage unit 100.

(Data storage unit)
The data storage unit 100 stores a plurality of commands and a trigger word, each in association with different text data.

A command is an instruction corresponding to a process that can be executed during karaoke singing. Examples are instructions to execute processes such as "raise the tempo of the karaoke performance", "lower the key of the karaoke performance", "pause the karaoke performance", "raise the microphone volume", "lower the speaker volume", and "hide the lyrics display".

The trigger word is a word or short phrase for instructing execution of a command or a search for songs. The trigger word is preferably a coined word that does not appear in song lyrics or in conversation between users. One trigger word is set in advance for each remote control device.

Text data is data for identifying a command or the trigger word. Each of the commands and the trigger word is associated with its own distinct piece of text data.

FIG. 3 is an example of the table stored in the data storage unit 100. For example, command C01 (raise the tempo of the karaoke performance by 5%) is associated with the text data "tempo agete" (テンポアゲテ). In this example, the text data "Karaōkē" (カラオーケー) is associated with the trigger word. Text data not stored in the table is treated as having no corresponding command. The following description assumes that the data storage unit 100 stores the table of FIG. 3.
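The lookup against the FIG. 3 table can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify`, the dictionary layout, and the ASCII romanizations (`karaookee`, `tempo agete`, and so on, standing in for the katakana text data) are assumptions, and only the entries named in the description are included.

```python
from typing import Optional, Tuple

# Hypothetical ASCII romanization of the trigger word's text data.
TRIGGER_TEXT = "karaookee"

# Only the commands named in the description; the full FIG. 3 table
# is not reproduced in the text, so this is a partial stand-in.
COMMANDS = {
    "tempo agete": ("C01", "raise tempo by 5%"),
    "tempo sagete": ("C02", "lower tempo by 5%"),
    "kii sagete": ("C05", "lower key by one semitone"),
}


def classify(text: str) -> Tuple[str, Optional[str]]:
    """Map recognized text data to ("trigger", None), ("command", id) or ("none", None)."""
    if text == TRIGGER_TEXT:
        return ("trigger", None)
    if text in COMMANDS:
        return ("command", COMMANDS[text][0])
    # Text data absent from the table has no corresponding command.
    return ("none", None)
```

Exact string matching is used here only for simplicity; the description does not state how recognized text is compared with the stored text data.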

[Communication means, display means, input means, sound collecting means]
The communication means 50b provides an interface for communicating with the karaoke main body 10. The display means 50c displays various information. The input means 50d allows the user to input various instructions, and consists of buttons or the like provided on the remote control device 50. Alternatively, when the display means 50c is configured as a touch panel, the display means 50c also functions as the input means 50d. The sound collecting means 50e is a microphone that collects the voice uttered by the user and outputs it as a voice signal.

[Control means]
The control means 50f performs various controls in the remote control device 50. The control means 50f includes a CPU and a memory (neither shown). The CPU implements various functions by executing programs stored in the memory.

Suppose that a user of the karaoke device K wants to input commands or search words by voice. In this case, the user selects the "voice input" icon displayed on the display means 50c, for example via the input means 50d. Based on this selection, the CPU of the control means 50f executes the program stored in the memory and shifts to the voice input mode. In this mode, the control means 50f functions as a voice processing unit 200, a search unit 300, a notification unit 400, an execution unit 500, and a control unit 600.

(Voice processing unit)
The voice processing unit 200 performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data. A known method can be used for the voice recognition processing.

For example, suppose the user U utters "Karaōkē" (カラオーケー) toward the sound collecting means 50e. The sound collecting means 50e collects the voice and outputs it to the voice processing unit 200 as a voice signal. The voice processing unit 200 processes the voice signal and outputs "Karaōkē", which the signal represents, as text data.

(Search unit)
The search unit 300 searches for songs based on a search word. A search word is used to find a song to sing; specifically, it is a singer name, a song title, part of the lyrics, or the like. The search unit 300 extracts, from among a plurality of songs, the songs that contain the search word. A known method can be used for the search. In this embodiment, the search word is input by voice. The search unit 300 searches for songs in response to an instruction from the control unit 600 (details are described later).

(Notification unit)
The notification unit 400 notifies the user of the search result. For example, suppose the search unit 300 extracts songs X1, X2, and X3 as the search result. In this case, the notification unit 400 reports the result by displaying the titles and song IDs of songs X1, X2, and X3 on the display means 50c. The search result may instead be reported by voice through a speaker (not shown) provided in the remote control device 50.
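As a rough illustration of the search and notification steps, the sketch below filters a song list by a search word and formats the hits for display. The description leaves the search to "a known method", so the substring match, the `Song` fields, and the function names `search_songs` and `format_results` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Song:
    # Hypothetical record layout; the description names only song IDs,
    # titles, singer names and lyrics as searchable or displayable items.
    song_id: str
    title: str
    singer: str
    lyrics: str


def search_songs(songs: List[Song], search_word: str) -> List[Song]:
    """Return songs whose title, singer name or lyrics contain the search word."""
    return [
        s for s in songs
        if any(search_word in field for field in (s.title, s.singer, s.lyrics))
    ]


def format_results(results: List[Song]) -> List[str]:
    """Build the lines a display could show: song ID and title per hit."""
    return [f"{s.song_id}: {s.title}" for s in results]
```

A real catalog search would more likely use an index and fuzzy matching on readings; the linear scan is only meant to make the data flow from search word to displayed result concrete.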

(Execution unit)
The execution unit 500 executes commands. Based on an instruction from the control unit 600, the execution unit 500 executes the commands stored in the storage means 50a (details are described later).

(Control unit)
The control unit 600 acquires the trigger word based on the text data output by the voice processing unit 200.

For example, suppose the voice processing unit 200 outputs the text data "Karaōkē" (カラオーケー). The control unit 600 checks whether data corresponding to the output text data is stored in the data storage unit 100. In the example of FIG. 3, the text data "Karaōkē" is associated with the trigger word. In this case, the control unit 600 acquires the trigger word.

Suppose instead that the voice processing unit 200 outputs the text data "hara hetta" (ハラヘッタ, "I'm hungry"). The control unit 600 checks whether data corresponding to the output text data is stored in the data storage unit 100. In the example of FIG. 3, no trigger word or command is associated with the text data "hara hetta". In this case, the control unit 600 does not perform the processing described below.

The control unit 600 according to this embodiment performs a first process and a second process. Each is described in detail below.

The first process is as follows: when the voice processing unit 200 outputs certain text data before a first predetermined time elapses after the trigger word is acquired, the control unit 600 instructs the search unit 300 to search for songs using that text data as the search word. The first predetermined time is set in advance to a single value, for example 3 seconds.

Specifically, the control unit 600 starts timing when it acquires the trigger word. It then checks whether the voice processing unit 200 outputs text data before the first predetermined time elapses from acquisition of the trigger word. Suppose the user U utters "Iruma Isu" (入間椅子) before the first predetermined time elapses and, based on that utterance, the voice processing unit 200 outputs the text data "irumaisu" (イルマイス). In this case, the control unit 600 determines that the text data "irumaisu" is a search word and instructs the search unit 300 to search for songs using "irumaisu" as the search word. Based on this instruction, the search unit 300 searches for songs with the search word "irumaisu".

On the other hand, if the voice processing unit 200 outputs no text data before the first predetermined time elapses, the control unit 600 does not instruct the search unit 300 to search for songs.

When text data is output before the first predetermined time elapses, or when the first predetermined time elapses with no text data output, the control unit 600 stops timing and resets the timer.
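The first process can be sketched as a small state machine. This is an illustrative reading of the description, not the actual firmware: the class name `SearchGate`, the caller-supplied timestamps in seconds, the romanized trigger text, and the strict "less than" comparison at the window boundary are assumptions. The 3-second default comes from the example value in the text.

```python
from typing import Optional


class SearchGate:
    """Treats the first text data after the trigger word as a search word,
    provided it arrives before the first predetermined time elapses."""

    def __init__(self, trigger_text: str, window_s: float = 3.0) -> None:
        self.trigger_text = trigger_text
        self.window_s = window_s
        self._armed_at: Optional[float] = None  # None means the timer is not running

    def on_text(self, text: str, now: float) -> Optional[str]:
        """Feed one piece of recognized text; return a search word or None."""
        if self._armed_at is not None:
            armed_at = self._armed_at
            self._armed_at = None  # timing ends and the timer is reset either way
            if now - armed_at < self.window_s:
                return text  # within the window: this text is the search word
        if text == self.trigger_text:
            self._armed_at = now  # trigger word acquired: start timing
        return None
```

For example, with `gate = SearchGate("karaookee")`, calling `gate.on_text("karaookee", 0.0)` starts the timer and `gate.on_text("irumaisu", 2.0)` then returns `"irumaisu"` as the search word. Passing the clock in as an argument keeps the sketch testable without real delays.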

The second process is as follows. When text data associated with a command is output, the control unit 600 stores the associated command in the storage means 50a. When the trigger word is acquired while a command is stored in the storage means 50a, the control unit 600 instructs the execution unit 500 to execute the stored command and then deletes it. When a second predetermined time elapses from the storing of the latest command, the control unit 600 deletes the stored command. The second predetermined time is set in advance to a single value, for example 3 seconds. The first and second predetermined times may be the same or different.

For example, suppose the user U, having listened to the intro of song X1, feels that the key is too high and the tempo too fast to sing it. In this case, the user U utters "lower the key" and then "lower the tempo" toward the sound collecting means 50e. The voice processing unit 200 performs voice recognition processing on the resulting voice signals and outputs "kī sagete" (キーサゲテ) and "tempo sagete" (テンポサゲテ) as text data.

The control unit 600 checks whether data corresponding to the output text data is stored in the data storage unit 100. In the example of FIG. 3, the text data "kī sagete" is associated with command C05 (lower the key of the karaoke performance by one semitone), and the text data "tempo sagete" is associated with command C02 (lower the tempo of the karaoke performance by 5%). The control unit 600 therefore stores commands C05 and C02 in the storage means 50a in the order in which they were uttered.

When it stores a command, the control unit 600 resets the timer and starts timing. When a plurality of commands are stored, as in the example above, the control unit 600 resets the timer and starts timing anew each time a command is stored.

Suppose the user U then utters "Karaōkē" and the control unit 600 acquires the trigger word. In this case, the control unit 600 instructs the execution unit 500 to execute commands C05 and C02 stored in the storage means 50a. After giving this instruction, the control unit 600 deletes the stored commands C05 and C02 from the storage means 50a. When the commands are deleted, the control unit 600 stops timing.

On the other hand, if the second predetermined time elapses after command C02 is stored, the control unit 600 deletes commands C05 and C02 from the storage means 50a and stops timing. In other words, the control unit 600 deletes the stored commands when the second predetermined time elapses from the storing of the latest command.

When a plurality of commands are stored, the control unit 600 can instruct the execution unit 500 to execute all of them.

For example, the control unit 600 can instruct the execution unit 500 to execute all commands in the order in which they were stored. In the example above, when the trigger word is acquired while commands C05 and C02 are stored, the control unit 600 instructs the execution unit 500 to execute command C05 and then command C02. After giving this instruction, the control unit 600 deletes all stored commands from the storage means 50a.

Based on this instruction, the execution unit 500 first lowers the key of the karaoke performance of song X1 by one semitone and then lowers the tempo by 5%.

Alternatively, when a plurality of commands are stored, the control unit 600 may instruct the execution unit 500 to execute only some of them.

For example, the control unit 600 may instruct the execution unit 500 to execute only the latest command and then delete all stored commands.

In the example above, the storage means 50a stores command C05 and command C02 in that order. When the trigger word is acquired, the control unit 600 instructs the execution unit 500 to execute only the latest command, C02. Even when it instructs execution of only some of the stored commands in this way, the control unit 600 deletes all stored commands from the storage means 50a after giving the instruction.

Based on this instruction, the execution unit 500 executes only the process of lowering the tempo of the karaoke performance of song X1 by 5%.
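The second process, including both the "execute all in order" and "execute only the latest" variants, can be sketched as a buffer with an expiry measured from the most recent store. The class name `CommandBuffer`, the `latest_only` flag, and the caller-supplied timestamps in seconds are assumptions for illustration; the command IDs and the 3-second value are taken from the examples in the text.

```python
from typing import List


class CommandBuffer:
    """Holds recognized commands until the trigger word arrives or they expire."""

    def __init__(self, timeout_s: float = 3.0, latest_only: bool = False) -> None:
        self.timeout_s = timeout_s
        self.latest_only = latest_only
        self._pending: List[str] = []
        self._last_stored_at = 0.0

    def _expire(self, now: float) -> None:
        # The second predetermined time is counted from the latest store.
        if self._pending and now - self._last_stored_at >= self.timeout_s:
            self._pending.clear()

    def store(self, command_id: str, now: float) -> None:
        """Store a command and restart the timer."""
        self._expire(now)
        self._pending.append(command_id)
        self._last_stored_at = now

    def on_trigger(self, now: float) -> List[str]:
        """Return the commands to execute, then delete everything stored."""
        self._expire(now)
        to_run = self._pending[-1:] if self.latest_only else list(self._pending)
        self._pending.clear()
        return to_run
```

With the default settings, storing `"C05"` at 0.0 s and `"C02"` at 2.0 s and then triggering at 4.0 s yields both commands in stored order, because the timer restarted when `"C02"` was stored. With `latest_only=True` only `"C02"` would be returned, and in both cases the buffer is emptied afterwards.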

==Processing in the Remote Control Device==
Next, the processing in the control unit 600 of the remote control device 50 according to this embodiment will be described with reference to FIGS. 4A and 4B. FIG. 4A is a flowchart showing the processing in the control unit 600 when searching for songs based on voice input (the first process). FIG. 4B is a flowchart showing the processing in the control unit 600 when executing a command based on voice input (the second process). In this example, the voice input mode is assumed to be active, and the data storage unit 100 is assumed to store a plurality of commands and one trigger word, each in association with different text data.

(First process)
The voice processing unit 200 performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data (output of text data; step 10).

The control unit 600 acquires the trigger word based on the text data output by the voice processing unit 200. When it acquires the trigger word, the control unit 600 starts timing (acquisition of the trigger word and start of timing; step 11).

If the voice processing unit 200 outputs certain text data before the first predetermined time elapses from acquisition of the trigger word (Y in step 12), the control unit 600 instructs the search unit 300 to search for songs using that text data as the search word. The control unit 600 also stops timing and resets the timer (search instruction and end of timing; step 13).

The search unit 300 searches for songs based on the instruction of step 13 (song search; step 14).

The notification unit 400 notifies the user of the search result obtained in step 14 (notification of the search result; step 15).

On the other hand, if the voice processing unit 200 outputs no text data before the first predetermined time elapses after the trigger word is acquired (N in step 12), the control unit 600 stops timing, resets the timer, and performs no further processing (end of timing; step 16).

(Second process)
The voice processing unit 200 performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data (output of text data; step 20).

When text data associated with a command is output, the control unit 600 stores the associated command in the storage means 50a. When it stores a command, the control unit 600 resets the timer and starts timing (storing of the command, timer reset, and start of timing; step 21).

If the trigger word is acquired while a command is stored (Y in step 22), the control unit 600 instructs the execution unit 500 to execute the stored command (command execution instruction; step 23). The execution unit 500 executes the command based on the instruction of step 23.

The control unit 600 then deletes the stored command and stops timing (deletion of the command and end of timing; step 24). On the other hand, if the second predetermined time elapses after timing starts (Y in step 25), the control unit 600 deletes the stored command and stops timing (deletion of the command and end of timing; step 24).

As is clear from the above, the remote control device 50 according to this embodiment is used when singing karaoke. The remote control device 50 includes: the data storage unit 100, which stores a plurality of commands corresponding to processes executable during karaoke singing, and a trigger word for instructing execution of the commands or a search for songs, each in association with different text data; the voice processing unit 200, which performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data; the search unit 300, which searches for songs based on a search word; the notification unit 400, which notifies the user of the search result; the execution unit 500, which executes commands; and the control unit 600, which acquires the trigger word based on the text data output by the voice processing unit 200. The control unit 600 performs the first process and the second process. In the first process, when the voice processing unit 200 outputs certain text data before the first predetermined time elapses after the trigger word is acquired, the control unit 600 instructs the search unit 300 to search for songs using that text data as the search word. In the second process, when text data associated with a command is output, the control unit 600 stores the associated command in the storage means 50a; when the trigger word is acquired while a command is stored in the storage means 50a, it instructs the execution unit 500 to execute the stored command and then deletes it; and when the second predetermined time elapses from the storing of the latest command, it deletes the stored command.

With such a remote control device 50, a music search by voice input is triggered by acquisition of the trigger word, which reduces the likelihood that a search word is misrecognized. Likewise, when a command is entered by voice, the command is executed upon acquisition of the trigger word, which reduces the likelihood that an unintended command is executed because of a misrecognized command. In other words, the remote control device according to the present embodiment can reduce misrecognition when voice input is used where karaoke singing takes place.
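The first and second processes can be sketched as a small state machine. This is only an illustrative sketch, not the patented implementation: the class `Controller`, the trigger and command texts, and both time limits are hypothetical. It also assumes that a trigger word arriving while commands are stored executes them without opening a search window, and that any text arriving inside an open search window is treated as the search word.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All names and values below are hypothetical, chosen only for illustration.
TRIGGER_TEXT = "remote"                               # text data tied to the trigger word
COMMAND_TEXTS = {"stop": "STOP", "key up": "KEY_UP"}  # text data -> command
FIRST_TIME = 5.0    # first predetermined time, in seconds (assumed)
SECOND_TIME = 10.0  # second predetermined time, in seconds (assumed)


@dataclass
class Controller:
    pending: List[str] = field(default_factory=list)   # commands held in storage
    last_stored_at: Optional[float] = None              # when the latest command was stored
    trigger_at: Optional[float] = None                  # start of an open search window
    executed: List[str] = field(default_factory=list)   # stands in for the execution unit
    searches: List[str] = field(default_factory=list)   # stands in for the search unit

    def on_text(self, text: str, now: float) -> None:
        """Handle one piece of recognized text at time `now` (seconds)."""
        # Second process: drop stored commands once the second time has elapsed.
        if self.pending and now - self.last_stored_at >= SECOND_TIME:
            self.pending.clear()

        if text == TRIGGER_TEXT:
            if self.pending:
                # Trigger word with stored commands: execute them, then delete.
                self.executed.extend(self.pending)
                self.pending.clear()
                self.trigger_at = None
            else:
                # Trigger word alone: open the search window (first process).
                self.trigger_at = now
            return

        if self.trigger_at is not None:
            in_window = now - self.trigger_at < FIRST_TIME
            self.trigger_at = None
            if in_window:
                self.searches.append(text)  # the text becomes the search word
                return

        command = COMMAND_TEXTS.get(text)
        if command is not None:
            # Store the command; it runs only if the trigger word follows in time.
            self.pending.append(command)
            self.last_stored_at = now
```

In this sketch, a sung lyric that happens to match a command text is stored but never executed unless the trigger word follows within the second predetermined time, which is the behavior the paragraph above relies on.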

When a plurality of commands are stored, the control unit 600 instructs the execution unit 500 to execute all the commands in the order in which they were stored, and then deletes all the stored commands. With this configuration, all commands can be executed in the order the user desires. Moreover, because all stored commands are deleted once execution has been instructed, the next voice input can be accepted.

Alternatively, when a plurality of commands are stored, the control unit 600 may instruct the execution unit 500 to execute only the latest command and then delete all the stored commands. With this configuration, only some of the commands the user desires are executed. Moreover, because all stored commands are deleted once execution has been instructed, the next voice input can be accepted.
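The two policies for handling a plurality of stored commands differ only in which commands are handed to the execution unit before the storage is cleared. A minimal sketch follows; the function `flush` and its `policy` parameter are hypothetical names, and `execute` stands in for the execution unit 500.

```python
from typing import Callable, List


def flush(pending: List[str], execute: Callable[[str], None],
          policy: str = "all") -> List[str]:
    """Run stored commands under the given policy, then delete all of them."""
    if policy == "all":
        to_run = list(pending)   # every command, in the order it was stored
    elif policy == "latest":
        to_run = pending[-1:]    # only the most recently stored command
    else:
        raise ValueError(f"unknown policy: {policy}")
    for command in to_run:
        execute(command)
    pending.clear()              # all stored commands are deleted either way
    return to_run
```

Under either policy the storage ends up empty, so the next voice input starts from a clean state.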

<Others>
The above embodiment has been described using the remote control device 50 as an example of the karaoke input device. Alternatively, the karaoke apparatus K itself may function as the karaoke remote control device. In this case, the karaoke main body 10 includes at least the storage means 50a (data storage unit 100), the communication means 50b, the input means 50d, and the control means 50f (voice processing unit 200, search unit 300, notification unit 400, execution unit 500, and control unit 600). In addition, the display device 30 functions as the display means 50c, the remote control device 50 functions as the input means 50d, and the microphone 40 functions as the sound collecting means 50e.

In the example of the above embodiment, priorities may be set for the plurality of commands stored in the data storage unit 100. In this case, the data storage unit 100 stores, for each command, priority information indicating its priority when a plurality of commands are to be executed. When a plurality of commands are stored, the control unit 600 instructs the execution unit 500 to execute commands based on the priority information and then deletes all the stored commands. For example, suppose that command C05 stored in the storage means 50a has priority "high" and command C02 has priority "medium". The control unit 600 can then instruct the execution unit 500 either to execute only the high-priority command C05 or to execute the high-priority command C05 first.
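Priority-based execution can be sketched as a sort over the stored commands. The priorities for C05 and C02 follow the example above; the numeric ranks, the "low" level, the function name `flush_by_priority`, and the `highest_only` flag are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

# Smaller rank means higher priority. Ranks and the "low" level are assumed.
RANK: Dict[str, int] = {"high": 0, "medium": 1, "low": 2}
# Priority information per command, as in the example in the text.
PRIORITY: Dict[str, str] = {"C05": "high", "C02": "medium"}


def flush_by_priority(pending: List[str], execute: Callable[[str], None],
                      highest_only: bool = False) -> List[str]:
    """Execute stored commands by priority, then delete all stored commands."""
    # sorted() is stable, so commands of equal priority keep their stored order.
    ordered = sorted(pending, key=lambda c: RANK[PRIORITY[c]])
    if highest_only and ordered:
        best = RANK[PRIORITY[ordered[0]]]
        ordered = [c for c in ordered if RANK[PRIORITY[c]] == best]
    for command in ordered:
        execute(command)
    pending.clear()
    return ordered
```

With C02 stored before C05, the default mode runs C05 first and then C02, while `highest_only=True` runs C05 alone; both modes clear the storage afterwards.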

Some users may want to cancel a command they have already entered by voice. The data storage unit 100 may therefore store a cancel word, which cancels command input, in association with text data. In this case, the control unit 600 can acquire the cancel word based on the text data output by the voice processing unit 200. When it acquires the cancel word, the control unit 600 deletes all the stored commands without instructing the execution unit 500 to execute them.

For example, suppose that in the example of the embodiment, command C05 and command C02 are stored in the storage means 50a, and that the text data "モトイ" ("motoi") is associated with the cancel word.

To cancel the commands entered by voice, the user utters the cancel word "もとい" ("motoi"). The voice processing unit 200 outputs the text data "モトイ". The control unit 600 checks whether data corresponding to the output text data is stored in the data storage unit 100. As noted above, the text data "モトイ" is associated with the cancel word, so the control unit 600 acquires the cancel word. The control unit 600 then deletes commands C05 and C02 from the storage means 50a without instructing the execution unit 500 to execute them.
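The cancel flow above amounts to a lookup of the recognized text followed by a dispatch on what it is associated with. The sketch below is hypothetical: the `REGISTRY` layout, the function `handle`, and every text other than "モトイ" are placeholders, since the text data tied to the trigger word and to commands C05 and C02 is not given here.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for the data storage unit: text data -> (kind, command id).
# Only the cancel entry comes from the text; the other keys are placeholders.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "モトイ": ("cancel", ""),
    "trigger-text": ("trigger", ""),
    "text-for-C05": ("command", "C05"),
    "text-for-C02": ("command", "C02"),
}


def handle(text: str, pending: List[str],
           execute: Callable[[str], None]) -> str:
    """Dispatch one recognized text; return what it was treated as."""
    kind, command = REGISTRY.get(text, ("ignored", ""))
    if kind == "command":
        pending.append(command)   # store until a trigger or cancel word arrives
    elif kind == "trigger":
        for stored in pending:    # execute in stored order, then delete
            execute(stored)
        pending.clear()
    elif kind == "cancel":
        pending.clear()           # delete without executing anything
    return kind
```

After C05 and C02 are stored, `handle("モトイ", pending, execute)` empties the storage without calling `execute`, so a later trigger word finds nothing to run.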

The first process and the second process of the above embodiment may also be performed in succession. That is, after instructing the execution unit 500 to execute the commands stored in the storage means 50a, if the voice processing unit 200 outputs text data before the first predetermined time elapses, the control unit 600 may instruct the search unit 300 to search for music using that text data as the search word.

Conversely, after instructing the execution unit 500 to execute the commands stored in the storage means 50a, the control unit 600 may refrain from instructing the search unit 300 to search for music until a new trigger word is entered by voice, even if the voice processing unit 200 outputs text data before the first predetermined time elapses.

The above embodiment is presented as an example and is not intended to limit the scope of the invention. The configurations described above can be combined as appropriate, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The above embodiment and its modifications fall within the scope and gist of the invention, and likewise within the invention described in the claims and its equivalents.

50 remote control device
100 data storage unit
200 voice processing unit
300 search unit
400 notification unit
500 execution unit
600 control unit

Claims

1. A karaoke input device used when karaoke singing is performed, comprising:
a data storage unit that stores a plurality of commands corresponding to processes executable during karaoke singing, and a trigger word for instructing execution of the commands or a search for music, each in association with different text data;
a voice processing unit that performs speech recognition on a user's voice signal output from sound collecting means and outputs the result as text data;
a search unit that searches for music based on a search word;
a notification unit that notifies the user of a result of the search;
an execution unit that executes the commands; and
a control unit that acquires the trigger word based on the text data output by the voice processing unit, the control unit performing: a first process of, when the voice processing unit outputs certain text data before a first predetermined time elapses after the trigger word is acquired, instructing the search unit to search for music using the certain text data as the search word; and a second process of, when text data associated with a command is output, storing the command associated with that text data in storage means, and, when the trigger word is acquired while a command is stored in the storage means, instructing the execution unit to execute the stored command and then deleting the stored command, while deleting the stored command when a second predetermined time has elapsed since the latest command was stored.

2. The karaoke input device according to claim 1, wherein, when a plurality of the commands are stored, the control unit instructs the execution unit to execute all the commands in the order in which they were stored, and then deletes all the stored commands.

3. The karaoke input device according to claim 1, wherein, when a plurality of the commands are stored, the control unit instructs the execution unit to execute only the latest command, and then deletes all the stored commands.

4. The karaoke input device according to claim 1, wherein the data storage unit stores, for each command, priority information indicating a priority when a plurality of the commands are executed, and
when a plurality of the commands are stored, the control unit instructs the execution unit to execute the commands based on the priority information, and then deletes all the stored commands.

5. The karaoke input device according to any one of claims 1 to 4, wherein the data storage unit stores a cancel word for canceling input of the command in association with text data, and
when the control unit acquires the cancel word based on the text data output by the voice processing unit, the control unit deletes all the stored commands without instructing the execution unit to execute the stored commands.