JP2010224013A

JP2010224013A - Karaoke machine stopping performance with voice uttered by singer to microphone

Info

Publication number: JP2010224013A
Application number: JP2009068419A
Authority: JP
Inventors: Rie Shigyo; 里恵執行
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2009-03-19
Filing date: 2009-03-19
Publication date: 2010-10-07

Abstract

PROBLEM TO BE SOLVED: To let a singer who is holding a microphone in mid-performance stop the accompaniment music partway through, amicably and smoothly, using only a voice instruction to the microphone and without having to be aware of chores such as registering voice feature parameters.

SOLUTION: The singer's voice is analyzed and feature parameters are extracted, based on the singing voice signal obtained from the singer's microphone 21 and on the lyric character data. During an interlude section of the karaoke accompaniment music, a voice recognition means analyzes whether the singer who was singing immediately before has uttered a performance-stop instruction word, based on the feature parameters that the voice analysis means had analyzed up to that point and on the performance-stop instruction words stored in a memory. When the voice recognition means determines that a performance-stop instruction word has been uttered, the karaoke accompaniment music being performed is terminated partway through.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a karaoke apparatus that stops a performance in response to a voice uttered by a singer into a microphone, and more particularly to one in which the karaoke accompaniment music being played can be terminated partway through by the singer's voice instruction.

A karaoke apparatus generates and acoustically reproduces the accompaniment music of a song freely chosen by a user, generates time-series lyric character data synchronized with the progress of that accompaniment music, and displays the lyrics superimposed on a background video (see, for example, Patent Documents 1 and 2).

Such a karaoke apparatus is often used by groups of several people. For example, the karaoke apparatuses installed one per room in a karaoke box are typically shared by the several customers in each room.

For this reason, a single karaoke apparatus receives performance reservations from several customers, and a queue of performances waiting to be processed forms. Because reservations can be accepted at any time, a queue forms even when there is only one user. When the queue empties, the karaoke apparatus enters a standby (idling) state until the next reservation is made, but this standby wastes time and fees and is therefore usually avoided. Normally the apparatus is used with a queue always present.

When a single karaoke apparatus is shared by several people, it is desirable that all of them take turns and get an equal chance to sing. For that to happen, each user needs to use the apparatus efficiently so that the queue of performance reservations is cleared smoothly.

With a karaoke apparatus, a user sings the reserved song along with the accompaniment music, but the singer does not always sing until the accompaniment has played to the end. For example, a singer may want to sing only the first verse, skip the second and later verses, and have the next reserved song start early. Or a singer may want to sing only a favorite phrase of the lyrics and then move on to the next reserved song. Among young users in particular, a fast-paced style of play is common: they request one song after another, sing only as far as they like when their turn comes, stop the performance at once, and quickly start the next requested song.

When accompaniment music is to be terminated partway through, performing the operation during an interlude section, where there is no singing, lets the music end without much sense of incongruity, that is, without spoiling the mood, so that the next reserved song can be brought forward as a natural transition.

In this case, the person best placed to terminate the accompaniment music amicably and smoothly is the singer. The singer can freely judge whether to end the accompaniment early and when to do so, whereas a third party may fail to operate the apparatus at the right moment or may operate it against the singer's wishes.

Although the early-termination operation can be performed with a remote control, a singer who wants to stop but does not have the remote control at hand may have to ask a companion who does to perform the operation. Miscommunication among companions can then occur, for instance a companion stopping the performance while the singer still intends to sing.

However, it is quite cumbersome and difficult for a singer who is holding a microphone and poised to sing to perform the operation that terminates the accompaniment music at just the right moment.

To address this, a technique that allows a karaoke apparatus to be operated while holding the microphone has been proposed (Patent Document 3). Patent Document 3 describes a wireless microphone with a voice recognition remote control function. If the karaoke apparatus is operated through this remote control function, a singer who is holding the microphone and poised to sing can terminate the accompaniment music partway through at any desired moment without difficulty.

Patent Document 1: JP 2004-317923 A
Patent Document 2: JP2004-4896
Patent Document 3: JP2002-062894

The wireless microphone with a voice recognition remote control function disclosed in Patent Document 3 performs recognition processing on the voice input to the wireless microphone and, when it recognizes that a specific word has been input, modulates the code stored in association with that word into a suitable radio signal and transmits it. All of the necessary components are built into the wireless microphone. In other words, the configuration is completely independent of the karaoke apparatus and is self-contained within the wireless microphone.

There are two kinds of speech recognition: speaker-dependent (specific-speaker) recognition, which targets only the voice of a specific speaker, and speaker-independent (unspecified-speaker) recognition, which targets the voices of an unspecified number of speakers.
Speaker-dependent recognition processes input speech using voice feature parameters of the specific speaker that have been stored and registered in a memory in advance. It has the drawback that the voice feature parameters of the user to be recognized must be registered beforehand, but in return it offers high recognition accuracy and few malfunctions even in environments with loud background sound.
Speaker-independent recognition, on the other hand, is not restricted to particular speakers: anyone's voice can be recognized, and there is no need to register a user's voice feature parameters in advance. It is therefore well suited to performing speech recognition within a single self-contained device, such as the wireless microphone with a voice recognition remote control function of Patent Document 3. In return, its recognition accuracy is much lower than that of speaker-dependent recognition, and it is prone to malfunction in environments with loud background sound.

A karaoke venue, where singing voices, conversation, and cheers are constantly in the air, is a very poor environment for speech recognition. For voice-operated control to work reliably in such a place, speaker-independent recognition is unsuitable, and at the very least speaker-dependent recognition is required.
However, in a karaoke apparatus used by an unspecified number of people, speaker-dependent recognition, which restricts the speaker, does not fit how the apparatus is actually used. If an unspecified number of customers were to operate the apparatus through speaker-dependent recognition, each customer would first have to be made to understand the function fully and would then have to be forced to register voice feature parameters in advance.

The present invention has been made in view of this technical background specific to karaoke. Its object is to provide a karaoke apparatus with which a singer who is holding a microphone and poised to sing can terminate the accompaniment music amicably and smoothly partway through, using only a voice instruction to that microphone and without having to be aware of chores such as registering voice feature parameters, so that, for example, the next reserved song can be brought forward as a natural transition without spoiling the lively atmosphere of the karaoke session.

The karaoke apparatus according to the present invention is specified by the following items (1) to (4).

(1) It is a karaoke apparatus comprising voice analysis means, voice recognition means, and performance stop means.
(2) The voice analysis means analyzes the singer's voice and extracts feature parameters, based on the singing voice signal obtained from the microphone of a singer singing along with the progress of the karaoke accompaniment music and on the time-series lyric character data synchronized with the progress of the accompaniment music.
(3) The voice recognition means, during an interlude section of the karaoke accompaniment music, analyzes whether the singer who was singing immediately before has uttered a performance-stop instruction word, based on the feature parameters analyzed by the voice analysis means up to that point and on the performance-stop instruction words stored in a memory.
(4) The performance stop means terminates the karaoke accompaniment music being played partway through when the voice recognition means determines that a performance-stop instruction word has been uttered.

With this karaoke apparatus, a singer who is holding a microphone and poised to sing can terminate the accompaniment music amicably and smoothly partway through, using only a voice instruction to that microphone and without having to be aware of chores such as registering voice feature parameters. As a result, for example, the next reserved song can be brought forward as a natural transition without spoiling the lively atmosphere of the karaoke session.

FIG. 1 is a functional block diagram of a karaoke apparatus constituting one embodiment of the present invention.
FIG. 2 is a block diagram outlining the functions added to the central processing unit and their operation.

=== Basic configuration and operation of the karaoke apparatus ===
FIG. 1 illustrates the schematic configuration of a karaoke apparatus according to an embodiment of the present invention.
This karaoke apparatus is a computer-based device comparable to a well-known personal computer. The central processing unit 11 at its core forms the computer main body, including a CPU, RAM, and ROM.

The following are provided under the control of the central processing unit 11: a hard disk device 12 serving as large-capacity external storage; an optical disc playback device 13 for CD-ROMs, DVD-ROMs, and the like; a communication control device 14 that communicates with a karaoke host device over a public communication line such as an optical line; a user interface device 15 that exchanges input from users and responses to users; a music generation device 16 that generates the acoustic signal of accompaniment music from MIDI-format music performance data; an acoustic device 17 that amplifies the accompaniment music and the acoustic signal from a microphone 21 and outputs them from a speaker 22; a display 18 using an LCD, PDP, or the like; and a video processing device 19 that processes the video data to be shown on the display 18.

For a large number of karaoke songs, the hard disk device 12 stores karaoke data comprising accompaniment music data, consisting mainly of MIDI data, and the lyric character data from which lyric images are generated. It also stores long-duration moving image data in a predetermined format, script data that defines the processing sequence for that moving image data (such as the storage locations and processing order of the data to be processed), and index information for the playable karaoke songs, such as song title, artist name, release year, and the opening words of the lyrics.

The central processing unit 11 identifies the karaoke data, script data, and index information of each song by song number and manages them as a karaoke database.

When the central processing unit 11 receives a performance reservation command from the user interface device 15, it registers the song ID (song identification code) contained in the command in the performance reservation queue, in the order received. It then takes song IDs out of the queue in registration order, retrieves the karaoke data for the corresponding song from the karaoke database, and passes it to performance processing.
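The reservation handling described above amounts to a first-in, first-out queue of song IDs plus a lookup in the karaoke database. The following is a minimal sketch under that reading; the names `ReservationQueue` and `load_karaoke_data`, and the use of a plain dictionary as the database, are illustrative assumptions and do not come from the patent.

```python
from collections import deque


class ReservationQueue:
    """FIFO queue of reserved song IDs (hypothetical helper, not from the source)."""

    def __init__(self):
        self._pending = deque()

    def reserve(self, song_id):
        """Register a song ID at the tail, in the order commands arrive."""
        self._pending.append(song_id)

    def next_song(self):
        """Remove and return the oldest reservation, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def __len__(self):
        return len(self._pending)


def load_karaoke_data(database, song_id):
    """Look up one song's karaoke data; `database` is assumed to be a dict keyed by song ID."""
    return database[song_id]
```

An empty queue returning `None` corresponds to the standby (idling) state mentioned earlier in the description.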

The music generation device 16 generates accompaniment music from the accompaniment music data in the karaoke data. From the lyric character data, lyric images in which the portion to be sung changes color in synchronization with the accompaniment music are successively rendered as bitmaps into video RAM. In addition, based on the script data, predetermined moving image data is transferred in a predetermined order to the video processing device 19, which decodes it as the background video for the lyric images.

The acoustic device 17 includes a mixing amplifier. It mixes and amplifies the accompaniment music generated by the music generation device 16 and the singing voice input to the microphone 21, and outputs the result from the speaker 22. The acoustic device 17 also includes an AD converter 171 for digitally processing the audio signal input from the microphone 21.

The video processing device 19 superimposes the lyric images on the decoded video and outputs the result to the display 18. The lyric images are created successively from the time-series lyric character data synchronized with the progress of the accompaniment music.

The user interface device 15 includes the operation panel on the karaoke apparatus main body and a karaoke remote control device, and is equipped with short-range wireless communication means capable of two-way communication (an IrDA transceiver, an infrared LED, and an infrared light-receiving element).

=== Functions added to the central processing unit 11 and their operation ===
In the karaoke apparatus according to the present invention, the operation of the whole apparatus is carried out under the control of the central processing unit 11. As shown in FIG. 2, functional units for voice analysis, voice recognition, interlude section discrimination, performance stop, and so on are added to the central processing unit 11 in software.

The voice analysis function unit operates during singing sections, in which the music generation device 16 is generating accompaniment music and lyric character data is being output successively in synchronization with it. Based on the singing voice signal obtained from the microphone 21 of the singer singing along with the karaoke accompaniment music and on the time-series lyric character data synchronized with the progress of the accompaniment music, it analyzes the singer's voice and extracts feature parameters.

In a singing section, the singer's voice is analyzed in correspondence with the lyric characters to be sung, and the feature parameters needed to recognize that singer's speech are extracted. The extracted feature parameters are accumulated in a memory and updated successively, so that the memory always holds only the latest feature parameters. Where the same lyric character occurs more than once, averaged feature parameters may be stored instead.
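The averaging option mentioned above can be sketched as a small store keyed by lyric character. This is only an illustration: the patent does not specify what a feature parameter looks like, so the sketch assumes each observation is a fixed-length list of numbers, and `FeatureStore` with its methods is a hypothetical name.

```python
class FeatureStore:
    """Per-character feature parameters, averaged over repeated occurrences.

    Hypothetical sketch: features are assumed to be equal-length lists of floats.
    """

    def __init__(self):
        self._sums = {}    # lyric character -> element-wise sum of feature vectors
        self._counts = {}  # lyric character -> number of observations

    def update(self, lyric_char, features):
        """Add one observation of `lyric_char` sung by the current singer."""
        if lyric_char not in self._sums:
            self._sums[lyric_char] = list(features)
            self._counts[lyric_char] = 1
            return
        sums = self._sums[lyric_char]
        if len(features) != len(sums):
            raise ValueError("feature vectors must have the same length")
        for i, value in enumerate(features):
            sums[i] += value
        self._counts[lyric_char] += 1

    def get(self, lyric_char):
        """Return the averaged feature vector, or None if the character was never sung."""
        if lyric_char not in self._sums:
            return None
        count = self._counts[lyric_char]
        return [total / count for total in self._sums[lyric_char]]

    def clear(self):
        """Discard everything, for example when a new song starts."""
        self._sums.clear()
        self._counts.clear()
```

Calling `clear()` at the start of each song is one way to keep the store limited to the singer currently holding the microphone; the patent itself only says that the memory holds the latest parameters.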

Singing sections and interlude sections are distinguished on the basis of, for example, the time data contained in the karaoke data of the song being played or the pattern in which the lyric character data is output over time.
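One way to read "the pattern in which the lyric character data is output over time" is to treat any sufficiently long gap between lyric lines as an interlude. The sketch below assumes the lyric timing is available as `(start, end)` pairs in seconds and uses an arbitrary 5-second threshold; both the data layout and the threshold are assumptions made for illustration, and the intro and outro are deliberately ignored.

```python
def find_interludes(lyric_spans, min_gap=5.0):
    """Return (start, end) gaps between lyric spans that last at least `min_gap` seconds.

    `lyric_spans` is an iterable of (start, end) times in seconds (assumed layout).
    Only gaps between lyrics are reported; the intro and outro are not.
    """
    interludes = []
    latest_end = None
    for start, end in sorted(lyric_spans):
        if latest_end is not None and start - latest_end >= min_gap:
            interludes.append((latest_end, start))
        # Track the furthest end time seen so overlapping spans are handled.
        latest_end = end if latest_end is None else max(latest_end, end)
    return interludes


def is_interlude(t, interludes):
    """True if playback time `t` (seconds) falls inside any interlude."""
    return any(start <= t < end for start, end in interludes)
```

Restricting the result to gaps that follow at least one lyric span matches the requirement that recognition relies on feature parameters gathered while the singer was singing just before.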

The voice recognition function unit operates during interlude sections, in which the accompaniment music continues but no lyric character data is output. It recognizes the singer's utterance obtained from the microphone 21 on the basis of the feature parameters that the voice analysis function unit analyzed and stored in the memory up to just before that interlude section. Based on the result of this recognition and on the performance-stop instruction words stored in the memory in advance, it analyzes whether the singer who was singing until just before the interlude section has uttered a performance-stop instruction word (for example, "owari", "chuushi", "tsugi", "stop", or "next").
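The final comparison against the stored words can be sketched as follows. The sketch assumes that a recognizer, not shown here, has already turned the interlude utterance into text using the singer's feature parameters; it covers only the lookup against the stored performance-stop instruction words. `STOP_WORDS` and `is_stop_instruction` are hypothetical names, and the romanized word list simply mirrors the examples in the text.

```python
# Romanized versions of the example words in the description (assumed spelling).
STOP_WORDS = frozenset({"owari", "chuushi", "tsugi", "stop", "next"})


def is_stop_instruction(recognized_text, stop_words=STOP_WORDS):
    """Return True if any whole word in the recognized text is a stored stop word."""
    for token in recognized_text.lower().split():
        # Strip trailing or leading punctuation that a recognizer might attach.
        if token.strip(".,!?") in stop_words:
            return True
    return False
```

Matching whole words rather than substrings avoids treating a word such as "nonstop" as a stop instruction.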

When it is determined that a performance-stop instruction word has been uttered, the voice recognition function unit issues that result as a performance stop command. In response to the command, the central processing unit 11 executes the procedure for terminating the accompaniment music partway through and, if there is a next reserved song, brings that song forward and plays it.

If it is determined that no performance-stop instruction word has been uttered, the performance simply continues.
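The two outcomes described above (stop and advance, or continue) can be sketched as a single decision function. This is a simplified illustration: `handle_interlude_utterance` is a hypothetical name, the reservation queue is assumed to be a `collections.deque` of song IDs, and `None` stands in for the idle state when nothing is reserved.

```python
from collections import deque


def handle_interlude_utterance(stop_word_detected, current_song, reserved):
    """Return the song that should be playing after an interlude utterance.

    `reserved` is a deque of waiting song IDs (assumed representation).
    """
    if not stop_word_detected:
        # No stop instruction word: the current performance continues unchanged.
        return current_song
    # Stop command: end the current song and bring the next reservation forward.
    return reserved.popleft() if reserved else None
```

For example, with `reserved = deque(["B", "C"])`, a detected stop word while "A" is playing returns "B" and leaves "C" waiting.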

As described above, with the karaoke apparatus according to the present invention, a singer who is holding a microphone and poised to sing can terminate the accompaniment music amicably and smoothly partway through, using only a voice instruction to that microphone. The singer's voice instruction is recognized on the basis of feature parameters analyzed and extracted from that singer's own singing.

Speaker-dependent speech recognition restricted to that singer is thus performed, and whether the singer has uttered a performance-stop instruction word is analyzed from its result. Compared with speaker-independent recognition, speaker-dependent recognition has higher accuracy and malfunctions less even in a karaoke environment with loud background sound. Limiting the recognition to interlude sections makes the singer's voice instruction to stop the performance more reliable still.

Speaker-dependent speech recognition normally requires the tedious preparatory step of registering the specific speaker's voice feature parameters. With the karaoke apparatus according to the present invention, however, the singer need not be aware of any such chore: simply by singing as usual during the singing sections, the feature parameters needed for speaker-dependent recognition are extracted and stored and registered in the memory.

The singer then only has to speak a voice instruction into the microphone during an interlude section to terminate the accompaniment music amicably and smoothly partway through. As a result, for example, the next reserved song can be brought forward as a natural transition without spoiling the lively atmosphere of the karaoke session.

DESCRIPTION OF SYMBOLS
11 Central processing unit
12 Hard disk device
13 Optical disc playback device
14 Communication control device
15 User interface device
16 Music generation device
17 Acoustic device
171 AD converter
18 Display
19 Video processing device
21 Microphone
22 Speaker

Claims

A karaoke apparatus comprising voice analysis means, voice recognition means, and performance stop means, wherein:
the voice analysis means analyzes a singer's voice and extracts feature parameters, based on a singing voice signal obtained from the microphone of the singer singing along with the progress of karaoke accompaniment music and on time-series lyric character data synchronized with the progress of the accompaniment music;
the voice recognition means, during an interlude section of the karaoke accompaniment music, analyzes whether the singer who was singing immediately before has uttered a performance-stop instruction word, based on the feature parameters analyzed by the voice analysis means up to that point and on performance-stop instruction words stored in a memory; and
the performance stop means terminates the karaoke accompaniment music being played partway through when the voice recognition means determines that the performance-stop instruction word has been uttered.