JP2021021848A - Input device for karaoke

Info

Publication number: JP2021021848A
Application number: JP2019138661A
Authority: JP
Inventors: 橘　聡; Satoshi Tachibana; 聡橘
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2019-07-29
Filing date: 2019-07-29
Publication date: 2021-02-18
Anticipated expiration: 2039-07-29
Also published as: JP7312639B2

Abstract

To provide a karaoke input device with which a user can easily perform voice input even if the user stutters. SOLUTION: A karaoke input device comprises: a voice processing unit that performs voice recognition on a user's voice signal output from sound collecting means and outputs the result as text data; an execution unit that executes a command corresponding to the input voice; a determination unit that determines whether the voice signal contains stuttering; and a guide unit that outputs a guide sound for guiding the user's voice input when stuttering is determined to be contained. SELECTED DRAWING: Figure 2

Description

The present invention relates to a karaoke input device.

A known technique uses a remote control device attached to a karaoke device to accept voice input of words or short phrases corresponding to operation or search commands, in order to change the tempo or key of a karaoke performance or to search for songs.

For example, Patent Document 1 discloses a song search system that automatically extracts each search term from a continuous stretch of voice data containing a plurality of search terms and performs a highly accurate song search.

Patent Document 1: JP-A-2002-189483

When the user stutters, karaoke singing itself is not much affected, but voice input of the words or short phrases corresponding to commands may be difficult.

An object of the present invention is to provide a karaoke input device that allows voice input to be performed easily even when the user stutters.

One invention for achieving the above object is a karaoke input device for executing a predetermined command by voice input, comprising: a voice processing unit that performs voice recognition processing on a user's voice signal output from sound collecting means and outputs the result as text data; an execution unit that executes, based on the text data, a command corresponding to the input voice; a determination unit that determines whether the voice signal contains stuttering; and a guide unit that outputs a guide sound for guiding voice input by the user when stuttering is determined to be contained.
Other features of the present invention will become clear from the description and drawings below.

According to the present invention, voice input can be performed easily even when the user stutters.

FIG. 1 is a diagram showing the karaoke device according to the first embodiment.
FIG. 2 is a diagram showing the remote control device according to the first embodiment.
FIG. 3 is a flowchart showing processing by the remote control device according to the first embodiment.
FIG. 4 is a diagram showing the remote control device according to the second embodiment.
FIG. 5 is a table, stored by the storage means according to the second embodiment, of the guide sound tempo and volume according to the degree of stuttering.

<First Embodiment>
The karaoke input device according to the present embodiment will be described with reference to FIGS. 1 to 3.

== Karaoke device ==
The karaoke device K is a device for karaoke performance of songs and for users to sing karaoke. As shown in FIG. 1, the karaoke device K includes a karaoke main body 10, a speaker 20, a display device 30, a microphone 40, and a remote control device 50.

The karaoke main body 10 performs various kinds of control related to karaoke performance and karaoke singing, such as performance control of the selected song, display control of lyrics and background video, and processing of voice signals input through the microphone 40. The speaker 20 emits sound based on the sound emission signal from the karaoke main body 10. The display device 30 displays video and images on its screen based on signals from the karaoke main body 10. The microphone 40 converts the user's singing voice into an analog voice signal and inputs it to the karaoke main body 10. The remote control device 50 is a device for performing various operations on the karaoke main body 10. The remote control device 50 in the present embodiment corresponds to the "karaoke input device".

== Remote control device ==
As shown in FIG. 2, the remote control device 50 according to the present embodiment includes storage means 50a, communication means 50b, display means 50c, input means 50d, sound collecting means 50e, sound emitting means 50f, and control means 50g. Each component is connected to a bus B via an interface (not shown).

[Storage means]
The storage means 50a is a large-capacity storage device that stores various kinds of data.

The storage means 50a in the present embodiment stores a plurality of commands, each associated with different text data.

A command is an instruction corresponding to a process that can be executed during karaoke singing. Commands are, for example, instructions for executing processes such as "raise the tempo of the karaoke performance", "lower the key of the karaoke performance", "pause the karaoke performance", "raise the microphone volume", "lower the volume from the speaker", "hide the lyrics display", and "search for a song".

Text data is data for identifying a command. Each of the plurality of commands is associated with one distinct piece of text data. For example, the command "raise the tempo of the karaoke performance by 5%" is associated with the text data "tempo agete" (Japanese for "raise the tempo"). Text data that is not stored in the table is treated as having no corresponding command.
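The association between text data and commands described above can be pictured as a simple lookup table. The following is a minimal sketch, assuming a plain dictionary keyed by recognized text; the names `COMMAND_TABLE` and `lookup_command`, the command identifiers, and every key other than "tempo agete" are hypothetical and do not come from the description.

```python
from typing import Optional

# Hypothetical command table. Only the pairing of "tempo agete" with the
# "raise tempo by 5%" command is taken from the description; the other
# keys are invented text data used purely for illustration.
COMMAND_TABLE = {
    "tempo agete": "raise_tempo_5_percent",
    "kii sagete": "lower_key",        # assumed text data
    "teishi": "pause_performance",    # assumed text data
}


def lookup_command(text_data: str) -> Optional[str]:
    """Return the command associated with the text data, or None.

    Text data that is not stored in the table has no corresponding command.
    """
    return COMMAND_TABLE.get(text_data)
```

Returning `None` for unknown text mirrors the rule that unregistered text data is treated as having no command.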

[Communication means / Display means / Input means / Sound collecting means / Sound emitting means]
The communication means 50b provides an interface for communicating with the karaoke main body 10. The display means 50c displays various kinds of information. The input means 50d allows the user to input various instructions. The input means 50d is, for example, buttons provided on the remote control device 50. Alternatively, when the display means 50c is configured as a touch panel, the display means 50c also functions as the input means 50d. The sound collecting means 50e is a microphone that collects the voice uttered by the user and outputs it as a voice signal. The sound emitting means 50f is a speaker that emits various sounds.

[Control means]
The control means 50g performs various kinds of control in the remote control device 50. The control means 50g includes a CPU and a memory (neither shown). The CPU realizes various functions by executing programs stored in the memory.

Suppose that a user of the karaoke device K wants to instruct the execution of various commands by voice input. In this case, the user selects the "voice input" icon displayed on the display means 50c, for example via the input means 50d. Based on this selection, the CPU of the control means 50g executes the program stored in the memory and shifts to the voice input mode. In this mode, the control means 50g functions as a voice processing unit 100, an execution unit 200, a determination unit 300, and a guide unit 400.

(Voice processing unit)
The voice processing unit 100 performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data. A known method can be used for the voice recognition processing.

For example, suppose that the user U utters "tempo agete" toward the sound collecting means 50e. The sound collecting means 50e collects the voice and outputs it as a voice signal to the voice processing unit 100. The voice processing unit 100 processes the voice signal and outputs "tempo agete", the content of the voice signal, as text data.

(Execution unit)
The execution unit 200 executes, based on the text data, the command corresponding to the input voice.

For example, suppose that the text data "tempo agete" is output from the voice processing unit 100. The execution unit 200 checks whether data corresponding to the output text data is stored in the storage means 50a. If it is stored, the execution unit 200 reads the command corresponding to the text data and executes the process of raising the tempo of the karaoke performance.

On the other hand, if data corresponding to the text data output from the voice processing unit 100 is not stored in the storage means 50a, no command corresponding to the text data exists. In this case, the execution unit 200 does not execute any command.
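The behavior of the execution unit can be sketched as a dispatch on the recognized text. This is only an illustration under stated assumptions: the `Performance` class, the `HANDLERS` mapping, and the `execute` function are hypothetical names, and the 5% tempo increase is modeled as adding five percentage points to a relative tempo value.

```python
from typing import Callable, Dict


class Performance:
    """Hypothetical stand-in for the state of the karaoke performance."""

    def __init__(self) -> None:
        self.tempo_percent = 100  # tempo relative to the original, in percent


def raise_tempo(perf: Performance) -> None:
    # Assumption: "raise by 5%" is modeled as +5 percentage points.
    perf.tempo_percent += 5


# Hypothetical mapping from stored text data to the process to run.
HANDLERS: Dict[str, Callable[[Performance], None]] = {
    "tempo agete": raise_tempo,
}


def execute(text_data: str, perf: Performance) -> bool:
    """Run the command linked to text_data; do nothing if none is stored."""
    handler = HANDLERS.get(text_data)
    if handler is None:
        return False  # no corresponding command, so nothing is executed
    handler(perf)
    return True
```

The boolean return value is an addition for illustration; the description only states that no command is executed when the text data is not stored.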

(Determination unit)
The determination unit 300 determines whether the voice signal contains stuttering. A known method (see, for example, Japanese Patent Application Laid-Open No. 2019-056791) can be used to determine whether the voice signal contains stuttering.

Specifically, when a voice signal is output from the sound collecting means 50e, the determination unit 300 determines whether that voice signal contains stuttering.

When it determines that stuttering is contained, the determination unit 300 outputs a signal to that effect to the guide unit 400. When it determines that stuttering is not contained, the determination unit 300 outputs a signal to that effect to the voice processing unit 100, and the voice processing unit 100 starts voice recognition processing of the voice signal based on that signal. The voice signal may contain stuttering to a degree that does not affect the voice recognition processing by the voice processing unit 100. That is, the determination unit 300 determines the presence or absence of stuttering on the basis of whether the voice recognition processing is affected.

Alternatively, when a voice signal is output from the sound collecting means 50e, the voice processing unit 100 may first perform voice recognition processing on it. If the voice recognition succeeds, the voice processing unit 100 outputs the text data to the execution unit 200. If the voice recognition fails, the voice processing unit 100 outputs a signal to that effect to the determination unit 300. The determination unit 300 may thus determine whether the voice signal contains stuttering only when voice recognition has failed. Note that voice recognition can fail for various reasons other than stuttering, such as the voice being too quiet or there being too much noise. In other words, a failure of voice recognition does not necessarily mean that the voice signal contains stuttering.
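The alternative ordering, in which recognition is attempted first and the stuttering check runs only on failure, can be sketched as follows. The function name `route_recognition_first` and the callables `recognize` and `contains_stutter` are hypothetical placeholders for the known recognition and stutter-detection methods. The `"no_action"` outcome is an assumption, since the description does not say what happens when recognition fails for reasons other than stuttering.

```python
from typing import Callable, Optional, Tuple


def route_recognition_first(
    signal: bytes,
    recognize: Callable[[bytes], Optional[str]],
    contains_stutter: Callable[[bytes], bool],
) -> Tuple[str, Optional[str]]:
    """Try recognition first; check for stuttering only if it fails."""
    text = recognize(signal)  # returns None when recognition fails
    if text is not None:
        return ("execute", text)  # text data goes to the execution unit
    if contains_stutter(signal):
        return ("guide", None)  # the guide unit outputs the guide sound
    # Failure for another reason (quiet voice, noise, ...): assumed no action.
    return ("no_action", None)
```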

(Guide unit)
The guide unit 400 outputs a guide sound when it is determined that stuttering is contained. Speaking in time with a guide sound reduces stuttering. For example, using a metronome sound as a guide sound is a method employed in stuttering training in daily life.

The guide sound is a sound for guiding the voice input of a user who stutters. The guide sound is, for example, an electronic metronome sound with a predetermined tempo and volume, rhythmic background music, or a rhythm pattern. A single guide sound may be set in advance for each karaoke device, or the guide sound may be one that the user freely selects from a plurality of guide sounds.
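For a metronome-style guide sound, the tempo determines the spacing of the clicks: one beat lasts 60 / BPM seconds. The sketch below computes click start times for a given tempo. The function name `click_times` and the idea of scheduling clicks by timestamp are assumptions for illustration, not part of the description.

```python
from typing import List


def click_times(bpm: float, duration_s: float) -> List[float]:
    """Return the start time of each metronome click within duration_s."""
    interval = 60.0 / bpm  # seconds per beat
    times = []
    n = 0
    # Multiply by the beat index rather than accumulating, to avoid drift.
    while n * interval < duration_s:
        times.append(n * interval)
        n += 1
    return times
```

For example, at 120 BPM the clicks fall every 0.5 seconds, so a 2-second guide contains four clicks.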

On receiving from the determination unit 300 the signal indicating that stuttering is contained, the guide unit 400 outputs the guide sound via the sound emitting means 50f. A user who stutters can then perform voice input calmly in time with the guide sound.

The guide sound can be stopped at various timings. For example, the guide unit 400 can stop outputting the guide sound when the user selects, via the input means 50d, the "end voice input" icon displayed on the display means 50c.

Alternatively, the guide unit 400 may stop outputting the guide sound when the determination unit 300 determines that the voice signal does not contain stuttering, when the voice processing unit 100 completes the voice recognition processing, or when the sound collecting means 50e has received no voice input for a predetermined time.
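The stop conditions listed above can be combined into a single predicate. This is a sketch under stated assumptions: the `GuideState` fields and the function `should_stop_guide` are hypothetical names, and the 10-second default stands in for the unspecified "predetermined time".

```python
from dataclasses import dataclass


@dataclass
class GuideState:
    """Hypothetical snapshot of the conditions relevant to stopping."""

    end_icon_selected: bool = False      # user chose "end voice input"
    stutter_free_input: bool = False     # determination unit found no stutter
    recognition_completed: bool = False  # voice recognition has finished
    silence_seconds: float = 0.0         # time since the last voice input


def should_stop_guide(state: GuideState, silence_timeout_s: float = 10.0) -> bool:
    """Return True if any of the described stop conditions holds."""
    return (
        state.end_icon_selected
        or state.stutter_free_input
        or state.recognition_completed
        or state.silence_seconds >= silence_timeout_s  # assumed timeout value
    )
```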

== Processing in the remote control device ==
Processing in the remote control device 50 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the processing in the remote control device 50. In this example, the voice input mode is assumed to be running.

The user performs voice input via the sound collecting means 50e. The sound collecting means 50e collects the voice and outputs it as a voice signal to the determination unit 300 (output of the voice signal; step 10).

The determination unit 300 determines whether the voice signal output in step 10 contains stuttering.

When it is determined that stuttering is contained (Y in step 11), the guide unit 400 outputs a guide sound for guiding voice input by the user (output of the guide sound; step 12). The user performs voice input again in time with the guide sound.

On the other hand, when it is determined that stuttering is not contained (N in step 11), the voice processing unit 100 performs voice recognition processing on the voice signal output in step 10 and outputs the result as text data (output of the text data; step 13).

The execution unit 200 executes, based on the text data output in step 13, the command corresponding to the input voice (execution of the command; step 14).
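Steps 10 to 14 can be summarized as one function that takes the collected voice signal and routes it. This is a minimal sketch: the function `process_voice_input` and its callable parameters are hypothetical stand-ins for the determination unit, voice processing unit, execution unit, and guide unit, whose internals the description leaves to known methods.

```python
from typing import Callable


def process_voice_input(
    signal: bytes,                             # step 10: collected voice signal
    contains_stutter: Callable[[bytes], bool],
    recognize: Callable[[bytes], str],
    execute: Callable[[str], None],
    play_guide: Callable[[], None],
) -> str:
    """Route one voice signal through the flow of steps 11 to 14."""
    if contains_stutter(signal):   # step 11: stuttering determination
        play_guide()               # step 12: output the guide sound
        return "guide"             # the user then speaks again
    text = recognize(signal)       # step 13: output text data
    execute(text)                  # step 14: execute the command
    return "executed"
```

When the function returns `"guide"`, the caller would collect a new voice signal and call the function again, which corresponds to the user repeating the voice input in time with the guide sound.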

As is clear from the above, the remote control device 50 according to the present embodiment is a device for executing a predetermined command by voice input. The remote control device 50 has: a voice processing unit 100 that performs voice recognition processing on the user's voice signal output from the sound collecting means 50e and outputs the result as text data; an execution unit 200 that executes, based on the text data, the command corresponding to the input voice; a determination unit 300 that determines whether the voice signal contains stuttering; and a guide unit 400 that outputs a guide sound for guiding voice input by the user when stuttering is determined to be contained.

With this remote control device 50, a guide sound is output when the voice signal contains stuttering. By performing voice input in time with the guide sound, a user who stutters can perform voice input without being affected by the stuttering. That is, with the remote control device according to the present embodiment, voice input can be performed easily even when the user stutters.

<Second Embodiment>
Next, the karaoke input device according to the second embodiment will be described with reference to FIGS. 4 and 5. This embodiment describes an example of outputting a guide sound that depends on the degree of stuttering. Description of configurations identical to those of the first embodiment is omitted.

[Control means]
As in the first embodiment, suppose that a user of the karaoke device K wants to instruct the execution of various commands by voice input. In this case, the user selects, via the input means 50d, the "voice input" icon displayed on the display means 50c. Based on this selection, the CPU of the control means 50g executes the program stored in the memory and shifts to the voice input mode. In this mode, the control means 50g functions as a voice processing unit 100, an execution unit 200, a determination unit 300, a guide unit 400, and a setting unit 500 (see FIG. 4).

(Setting unit)
When it is determined that stuttering is contained, the setting unit 500 sets the tempo and/or volume of the guide sound based on the degree of the stuttering.

The degree of stuttering can be expressed, for example, as the number of occurrences within a predetermined time. The guide sound tempo and volume for each degree of stuttering are set in advance. For example, the storage means 50a stores the guide sound tempo and volume according to the degree of stuttering, as shown in the table of FIG. 5.

When the determination unit 300 determines that stuttering is contained, it measures the degree of the stuttering. The determination unit 300 outputs information indicating the measured degree of stuttering to the setting unit 500.

Based on the signal indicating that stuttering is contained and the information indicating the degree of stuttering, both output by the determination unit 300, the setting unit 500 sets the guide sound so that it suits the degree of stuttering. The guide sound setting is made for at least one of tempo and volume.

The guide unit 400 outputs the guide sound at the tempo and/or volume set by the setting unit 500.

As a specific example, suppose that the determination unit 300 outputs "stuttering occurred three or more times in 5 seconds" as information indicating the degree of stuttering of a user U1.

In this case, the setting unit 500 refers to the table of FIG. 5 and sets the guide sound tempo to slow (for example, BPM = 80) and the volume to loud ("4" on a 10-level scale). The guide unit 400 outputs the guide sound at the set tempo and volume.

On the other hand, suppose that the determination unit 300 outputs "stuttering occurred twice in 5 seconds" as information indicating the degree of stuttering of a user U2. In this case, the stuttering of the user U2 is considered milder than that of the user U1.

The setting unit 500 therefore sets the guide sound tempo to slightly slow (for example, BPM = 100) and the volume to slightly loud ("3" on a 10-level scale). The guide unit 400 outputs the guide sound at the set tempo and volume.

Although this example sets both the tempo and the volume of the guide sound, the setting unit 500 may set only one of them.
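The two worked examples above can be written as a small lookup. Only the two rows given in the description are encoded (three or more occurrences in 5 seconds, and exactly two). The function name `guide_settings` is hypothetical, and returning `None` for other counts is an assumption, because the remaining rows of the FIG. 5 table are not reproduced in the text.

```python
from typing import Optional, Tuple


def guide_settings(stutters_in_5s: int) -> Optional[Tuple[int, int]]:
    """Return (tempo in BPM, volume on a 10-level scale) for a stutter count."""
    if stutters_in_5s >= 3:
        return (80, 4)    # slow tempo, loud volume (user U1 example)
    if stutters_in_5s == 2:
        return (100, 3)   # slightly slow tempo, slightly loud (user U2 example)
    return None           # rows for other counts are not given in the text
```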

As described above, the remote control device 50 according to the present embodiment has a setting unit 500 that, when stuttering is determined to be contained, sets the tempo and/or volume of the guide sound based on the degree of the stuttering. The guide unit 400 outputs the guide sound at the set tempo and/or volume. Such a remote control device can output a guide sound appropriate to the degree of the user's stuttering (that is, a guide sound that makes voice input easier for the user).

<Modification 1>
When a guide sound is output at a certain tempo and volume as in the second embodiment, the user attempts voice input again in time with that guide sound. However, the set tempo or volume may not be appropriate for the user. In such a case, it is preferable to set anew a guide sound tempo and volume that suit the user.

Accordingly, when a new voice signal is output from the sound collecting means 50e while the guide unit 400 is outputting the guide sound, the determination unit 300 determines whether the new voice signal contains stuttering.

The setting unit 500 resets the tempo and/or volume according to the determination result of the determination unit 300. Specifically, when the determination unit 300 determines that the voice signal contains stuttering, the setting unit 500 resets the guide sound by changing the tempo or volume.

The guide unit 400 outputs a new guide sound at the reset tempo and/or volume. The remote control device 50 can reset the guide sound repeatedly until it is determined that stuttering is not contained.

In this way, when a voice signal input in time with the guide sound contains stuttering, resetting the tempo and volume of the guide sound makes it possible to output a guide sound suited to the degree of the user's stuttering.
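The repeated resetting can be sketched as a loop over successive attempts. The description states only that the tempo or volume is changed; the policy used here (slowing the tempo by a fixed step, down to a floor) is an assumption chosen for illustration, as are the function name `retune_guide` and its parameters.

```python
from typing import Iterable


def retune_guide(
    bpm: int,
    attempts_have_stutter: Iterable[bool],
    step_bpm: int = 10,   # assumed adjustment per retry
    min_bpm: int = 40,    # assumed lower bound on the tempo
) -> int:
    """Slow the guide tempo after each attempt that still contains stuttering."""
    for has_stutter in attempts_have_stutter:
        if not has_stutter:
            break  # stuttering no longer detected: keep the current setting
        bpm = max(min_bpm, bpm - step_bpm)
    return bpm
```

Starting from 100 BPM, two stuttered attempts followed by a clean one would leave the guide at 80 BPM under this assumed policy.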

<Modification 2>
When voice input is performed in time with the guide sound as in the above embodiments, the sound collected by the sound collecting means 50e may contain the guide sound in addition to the voice input. The presence of the guide sound may cause the voice processing unit 100 to err in the voice recognition processing.

Accordingly, when a new voice signal is output from the sound collecting means 50e while the guide unit 400 is outputting the guide sound, the voice processing unit 100 performs, when carrying out voice recognition processing on the new voice signal, preprocessing that removes the guide sound output from the sound collecting means 50e.

Specifically, the voice processing unit 100 acquires the voice signal corresponding to the guide sound from the guide unit 400. The voice processing unit 100 subtracts the voice signal corresponding to the guide sound from the voice signal corresponding to the sound collected by the sound collecting means 50e, thereby extracting only the voice signal corresponding to the voice input, and performs voice recognition processing based on the extracted voice signal.

The voice signal corresponding to the guide sound need not be removed completely. It is sufficient that it be weakened to the extent that voice recognition processing can be performed based on the voice signal corresponding to the voice input.

Performing such preprocessing makes it possible to reliably extract only the voice signal corresponding to the voice input from the sound collected by the sound collecting means 50e.

When the volume of the guide sound has been set by the setting unit 500, the preprocessing that removes the guide sound may be performed depending on the set volume. For example, in the table of FIG. 5, the preprocessing may be performed when the guide sound volume is set to loud ("4" on a 10-level scale) and skipped when it is set to slightly loud ("3" on a 10-level scale) or lower. This allows the voice processing unit 100 to perform voice recognition processing more reliably.
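The subtraction-based preprocessing and the volume-dependent switch can be sketched together. This is a simplified illustration under stated assumptions: the guide reference and the microphone signal are taken to be time-aligned sample sequences at the same rate, and a single scalar `gain` models how loudly the guide appears in the microphone signal. In a real room the guide sound reaches the microphone delayed and filtered, so a practical system would likely need something closer to adaptive echo cancellation. The names `remove_guide` and `preprocess` are hypothetical.

```python
from typing import List, Sequence


def remove_guide(
    mic: Sequence[float], guide: Sequence[float], gain: float = 1.0
) -> List[float]:
    """Subtract the (scaled) guide reference from the microphone samples."""
    # Assumes both sequences are time-aligned and sampled at the same rate.
    return [m - gain * g for m, g in zip(mic, guide)]


def preprocess(
    mic: Sequence[float],
    guide: Sequence[float],
    guide_volume: int,
    threshold: int = 4,  # level "4" triggers removal in the FIG. 5 example
) -> List[float]:
    """Remove the guide sound only when its set volume reaches the threshold."""
    if guide_volume >= threshold:
        return remove_guide(mic, guide)
    return list(mic)  # volume "3" or lower: skip the preprocessing
```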

<Others>
The above embodiments have been described using the remote control device 50 as an example of the karaoke input device. However, the karaoke device K itself may function as the karaoke input device. In this case, the karaoke main body 10 includes at least the storage means 50a, the communication means 50b, and the control means 50g (voice processing unit 100, execution unit 200, determination unit 300, guide unit 400). In addition, the display device 30 functions as the display means 50c, the remote control device 50 functions as the input means 50d, the microphone 40 functions as the sound collecting means 50e, and the speaker 20 functions as the sound emitting means 50f.

The above embodiments are presented as examples and do not limit the scope of the invention. The above configurations can be implemented in appropriate combinations, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The above embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

50 Remote control device
100 Voice processing unit
200 Execution unit
300 Determination unit
400 Guide unit
500 Setting unit

Claims

1. A karaoke input device for executing a predetermined command by voice input, comprising:
a voice processing unit that performs voice recognition processing on a user's voice signal output from sound collecting means and outputs the result as text data;
an execution unit that executes, based on the text data, a command corresponding to the input voice;
a determination unit that determines whether the voice signal contains stuttering; and
a guide unit that outputs a guide sound for guiding voice input by the user when stuttering is determined to be contained.

2. The karaoke input device according to claim 1, further comprising a setting unit that, when stuttering is determined to be contained, sets the tempo and/or volume of the guide sound based on the degree of the stuttering,
wherein the guide unit outputs the guide sound at the set tempo and/or volume.

3. The karaoke input device according to claim 2, wherein, when a new voice signal is output from the sound collecting means while the guide unit is outputting the guide sound, the determination unit determines whether the new voice signal contains stuttering,
the setting unit resets the tempo and/or volume according to the determination result of the determination unit, and
the guide unit outputs a new guide sound at the reset tempo and/or volume.

4. The karaoke input device according to any one of claims 1 to 3, wherein, when a new voice signal is output from the sound collecting means while the guide unit is outputting the guide sound, the voice processing unit performs, when carrying out voice recognition processing on the new voice signal, preprocessing that removes the guide sound output from the sound collecting means.