JP2015072432A - Karaoke device and program

Info

Publication number: JP2015072432A
Application number: JP2013209192A
Authority: JP
Inventors: 哲也水谷; Tetsuya Mizutani
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2013-10-04
Filing date: 2013-10-04
Publication date: 2015-04-16
Anticipated expiration: 2033-10-04
Also published as: JP6060869B2

Abstract

PROBLEM TO BE SOLVED: To minimize user discomfort by keeping the delay of a synthesized voice, such as a guide vocal, relative to the karaoke performance as small as possible when returning to normal playback from so-called trick play such as pause, fast-forward, or rewind. SOLUTION: When trick play such as a pause ends, the performance speed of the karaoke song is restored to its original value, and a synthesized-voice synchronization timing, which is the timing at which output of the synthesized voice is synchronized with the performance of the karaoke song, is set. The synthesized voice is then generated and output in accordance with that timing. If generation of the synthesized voice is not in time for the synthesized-voice synchronization timing, the digital audio signal is generated first. After generation of the digital audio signal has finished, processing is concentrated on generating the synthesized voice, which is performed at a speed greater than the performance speed of the karaoke song until generation of the synthesized voice catches up with the performance of the karaoke song.

Description

The present invention relates to a karaoke apparatus that performs a karaoke song based on MIDI data and outputs a synthesized voice, generated by voice synthesis based on compressed voice data, in time with the performance of the karaoke song.

Patent Document 1 describes a karaoke apparatus with a guide vocal function that supports singing by outputting, during the karaoke performance of a song, a guide vocal synthesized from a singer's voice data.

Patent Document 1: JP 2009-244789 A

However, the technique described in Patent Document 1 has the following problem. Voice synthesis for generating a guide vocal as described above places a heavy processing load on the processing device, such as the CPU, of the karaoke apparatus. Consequently, when returning to normal playback from so-called trick play such as pause, fast-forward, or rewind, voice synthesis cannot keep up with the karaoke performance, which is unpleasant for the user. The denser the sounds to be voiced in the voice data, the greater the processing load of voice synthesis on the processing device, and the more noticeable the delay of the synthesized voice relative to the karaoke performance.

The present invention has been made in view of this problem. Its object is to provide a technique that, when returning to normal playback from so-called trick play such as pause, fast-forward, or rewind, keeps the delay of a synthesized voice such as a guide vocal relative to the karaoke performance as small as possible and thereby minimizes user discomfort.

The karaoke apparatus according to claim 1, made to solve the above problem, comprises: music performance means for performing a karaoke song at a predetermined performance speed based on music data; voice synthesis processing means for generating, by voice synthesis based on voice synthesis data, a synthesized voice that sings the karaoke song, and for outputting the generated synthesized voice together with the performance of the karaoke song by the music performance means at a speed equal to the performance speed of the karaoke song; control means for controlling the performance of the karaoke song by the music performance means and the generation and output of the synthesized voice by the voice synthesis processing means; and instruction receiving means for receiving a change instruction to temporarily change the performance speed of the karaoke song by the music performance means, or a release instruction to cancel that change.

When the instruction receiving means receives the change instruction, the control means changes the performance speed of the karaoke song by the music performance means and stops both the generation and the output of the synthesized voice by the voice synthesis processing means.

When the instruction receiving means subsequently receives the release instruction, the control means restores the original performance speed of the karaoke song by the music performance means and sets a synthesized-voice synchronization timing, which is the timing at which the output of the synthesized voice by the voice synthesis processing means is synchronized with the performance of the karaoke song by the music performance means. The control means then causes the voice synthesis processing means to generate and output the synthesized voice in accordance with the synthesized-voice synchronization timing.

If the generation of the synthesized voice by the voice synthesis processing means is not in time for the synthesized-voice synchronization timing, the control means causes the synthesized voice to be generated at a speed greater than the performance speed of the karaoke song until the generation of the synthesized voice catches up with the performance of the karaoke song by the music performance means.
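The catch-up behaviour can be illustrated with a small calculation. The following is a minimal sketch, not taken from the patent: it assumes that generation and performance speeds are expressed as song-seconds processed per second of real time, and the function name `catch_up_time` and its parameters are hypothetical.

```python
def catch_up_time(lag_s: float, generation_rate: float,
                  performance_rate: float = 1.0) -> float:
    """Real-time seconds until voice generation catches up with the performance.

    lag_s            -- how far generation is behind the performance (song-seconds)
    generation_rate  -- song-seconds of voice generated per real-time second
    performance_rate -- song-seconds of accompaniment performed per real-time second
    All three quantities are illustrative assumptions, not values from the patent.
    """
    if lag_s < 0:
        raise ValueError("lag_s must be non-negative")
    if generation_rate <= performance_rate:
        # Generation can only close the gap if it runs faster than the performance.
        raise ValueError("generation_rate must exceed performance_rate")
    # The gap shrinks by (generation_rate - performance_rate) every real-time second.
    return lag_s / (generation_rate - performance_rate)


if __name__ == "__main__":
    # Generation is 0.5 song-seconds behind and runs at 1.5x the performance speed.
    print(catch_up_time(0.5, 1.5))
```

Under these assumptions, the larger the margin between the generation speed and the performance speed, the sooner the synthesized voice is back in step with the accompaniment.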

With the karaoke apparatus of the present invention configured in this way, when returning to normal playback from so-called trick play such as pause, fast-forward, or rewind, the delay of a synthesized voice such as a guide vocal relative to the karaoke performance is kept as small as possible, which minimizes user discomfort.

The karaoke apparatus according to claim 2 is the karaoke apparatus according to claim 1, further comprising live sound reproduction processing means for generating a digital audio signal of the karaoke song based on compressed audio data and outputting the generated digital audio signal together with the performance of the karaoke song by the music performance means at a speed equal to the performance speed of the karaoke song. The control means controls the generation and the output of the digital audio signal by the live sound reproduction processing means.

When the instruction receiving means receives the change instruction, the control means stops the generation and the output of the digital audio signal by the live sound reproduction processing means. When the instruction receiving means subsequently receives the release instruction, the control means sets an audio-signal synchronization timing, which is the timing at which the output of the digital audio signal by the live sound reproduction processing means is synchronized with the performance of the karaoke song by the music performance means, and causes the live sound reproduction processing means to generate and output the digital audio signal in accordance with the audio-signal synchronization timing. The synthesized-voice synchronization timing, however, is set to a time delayed by a predetermined period from the time at which the audio-signal synchronization timing is set.

With the karaoke apparatus of the present invention configured in this way, when the generation of the synthesized voice by the voice synthesis processing means is not in time for the synthesized-voice synchronization timing, the generation of the digital audio signal by the live sound reproduction processing means is started first. Once the generation of the digital audio signal has finished, processing is concentrated on the generation of the synthesized voice by the voice synthesis processing means, so the generation of the synthesized voice catches up with the performance of the karaoke song by the music performance means sooner. The delay of a synthesized voice such as a guide vocal relative to the karaoke performance is therefore kept as small as possible, which minimizes user discomfort.

The karaoke apparatus according to claim 3 is the karaoke apparatus according to claim 1 or claim 2, characterized in that, if the generation of the synthesized voice by the voice synthesis processing means is not in time for the synthesized-voice synchronization timing, the control means reduces the output level of the synthesized voice from the voice synthesis processing means until the generation of the synthesized voice catches up with the performance of the karaoke song by the music performance means.

With the karaoke apparatus of the present invention configured in this way, when returning to normal playback from so-called trick play such as pause, fast-forward, or rewind, the user is made as unaware as possible of the delay of a synthesized voice such as a guide vocal relative to the karaoke performance, which minimizes user discomfort.
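The level reduction of claim 3 can be pictured as a gain applied to the synthesized voice while it lags the performance. The following is a minimal sketch under stated assumptions: the names `vocal_gain` and `apply_gain` and the reduced gain of 0.3 are hypothetical, and the patent does not specify how far the level is reduced.

```python
from typing import List


def vocal_gain(synth_pos_s: float, performance_pos_s: float,
               reduced_gain: float = 0.3) -> float:
    """Gain for the synthesized voice (reduced_gain is an assumed value).

    Full level once generation has caught up with the performance,
    a reduced level while it is still behind.
    """
    return 1.0 if synth_pos_s >= performance_pos_s else reduced_gain


def apply_gain(samples: List[int], gain: float) -> List[int]:
    """Scale integer PCM samples by the given gain (truncating toward zero)."""
    return [int(s * gain) for s in samples]


if __name__ == "__main__":
    gain = vocal_gain(synth_pos_s=9.5, performance_pos_s=10.0)
    print(gain, apply_gain([1000, -1000], gain))
```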

As set out in claim 4, the present invention can also be realized as a program for causing a computer to execute: a music performance process of performing a karaoke song at a predetermined performance speed based on music data; a voice synthesis process of generating, by voice synthesis based on voice synthesis data, a synthesized voice that sings the karaoke song, and outputting the generated synthesized voice together with the performance of the karaoke song at a speed equal to the performance speed of the karaoke song; an instruction receiving process of receiving a change instruction to temporarily change the performance speed of the karaoke song or a release instruction to cancel that change; and a control process. In the control process, when the change instruction is received, the performance speed of the karaoke song is changed and the generation and the output of the synthesized voice are stopped. When the release instruction is subsequently received, the performance speed of the karaoke song is restored, a synthesized-voice synchronization timing, which is the timing at which the output of the synthesized voice is synchronized with the performance of the karaoke song, is set, and the synthesized voice is generated and output in accordance with the synthesized-voice synchronization timing. If the generation of the synthesized voice is not in time for the synthesized-voice synchronization timing, the synthesized voice is generated at a speed greater than the performance speed of the karaoke song until the generation of the synthesized voice catches up with the performance of the karaoke song.

A block diagram showing the schematic configuration of the karaoke apparatus 1 of the embodiment.
A flowchart showing the procedure of the input reception process.
A flowchart showing the procedure of the hard MIDI performance process.
A flowchart showing the procedure of the live sound performance process.
A flowchart showing the procedure of the soft MIDI performance process.
A flowchart showing the procedure of the singing voice synthesis playback process.
An explanatory diagram illustrating the various processes executed by the main control unit 10.
An explanatory diagram illustrating the various processes executed by the main control unit 10.

Hereinafter, an embodiment of the present invention is described with reference to the drawings.
[1. Description of configuration of karaoke apparatus 1]
As shown in FIG. 1, the karaoke apparatus 1 includes a main control unit 10, a hard disk drive (HDD) 20, an audio processing module 30, an operation processing unit 40, an operation unit 41, a communication interface unit 42, a microphone amplifier 43, an analog-to-digital converter (ADC) 44, a video processing unit 45, and so on. A monitor 50, a speaker amplifier 51, a speaker 52, and a microphone 53 are connected to the karaoke apparatus 1.

[1.1. Description of configuration of main control unit 10 and HDD 20]
The main control unit 10 is an information processing device that controls the karaoke apparatus 1 as a whole. It is built mainly around a CPU, ROM, RAM, I/O, and the like, and executes various processes based on programs and data read from the HDD 20.

The HDD 20 stores various data, including a karaoke database containing a large number of music data items representing karaoke accompaniment and lyrics data items representing lyrics, the system programs and application programs that control the operation of the karaoke apparatus 1, and data used to play back various information content. The music data stored in the HDD 20 comprises performance data for accompaniment tones created in accordance with the MIDI standard (MIDI data), compressed audio data obtained by converting live accompaniment sound into digital data using a predetermined audio compression format, and voice synthesis data consisting of score data for the guide vocal, voice synthesis parameters, and phoneme data.

As functional components related to the performance of karaoke songs, the main control unit 10 includes a live sound reproduction processing unit 11, a MIDI sequencer 12, a MIDI performance control unit 13, a software MIDI sound source unit 14, and a voice synthesis processing unit 15. These components are implemented by programs executed by the main control unit 10.

Of these, the live sound reproduction processing unit 11 is a music playback function for so-called live sound playback: it decodes music data in the form of compressed audio data read from the HDD 20, generates the karaoke song, and outputs it at a speed equal to the performance speed of the karaoke song. The digital audio signal of the song output from the live sound reproduction processing unit 11 is input to the key control unit 31 of the audio processing module 30.

The MIDI sequencer 12 is a software sequencer that causes the MIDI sound source 32 mounted in the audio processing module 30 to perform a song based on MIDI data. Based on the MIDI data read from the HDD 20, the MIDI sequencer 12 sends performance control data (MIDI messages) to the MIDI sound source 32 of the audio processing module 30 and causes the MIDI sound source 32 to generate the musical tones of the song. In this embodiment, the generated musical tones take the form of PCM data.
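For readers unfamiliar with MIDI messages, the following minimal sketch builds the byte sequences for Note On and Note Off channel messages as defined by the MIDI 1.0 standard (status byte 0x90 or 0x80 combined with the channel number, followed by two 7-bit data bytes). The helper names are hypothetical, and the sketch does not represent the internal implementation of the MIDI sequencer 12.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    """Validate the ranges that MIDI 1.0 allows for channel messages."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 0 <= note <= 127:
        raise ValueError("note must be 0..127")
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be 0..127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Three-byte Note On message: status 0x9n, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Three-byte Note Off message: status 0x8n, note number, release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


if __name__ == "__main__":
    # Middle C (note 60) on the first channel at velocity 100.
    print(note_on(0, 60, 100).hex(), note_off(0, 60).hex())
```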

The MIDI performance control unit 13 is a software sequencer that causes the software MIDI sound source unit 14 of the main control unit 10 to perform the karaoke song at a predetermined performance speed based on MIDI data.

The software MIDI sound source unit 14 is software that emulates a MIDI sound source by means of a program executed on the CPU of the main control unit 10 and outputs a digital audio signal (PCM data or the like) of musical tones based on MIDI data.

The voice synthesis processing unit 15 is a software sequencer that includes a voice synthesis engine. Based on voice synthesis data consisting of score data for the guide vocal, voice synthesis parameters, and phoneme data, it generates a synthesized voice that renders an ideal singing voice for the song in the voice of the user who provided the phoneme data, and outputs it at a speed equal to the performance speed of the karaoke song.

Techniques for generating a singing voice with a voice synthesis engine from score data, voice synthesis parameters, and phoneme data or waveform data are disclosed in a number of prior art documents, for example Japanese Patent Application Laid-Open Nos. H09-90966, 2013-114131, and 2013-190473, and are already well known, so they are not described in detail here.

The karaoke apparatus 1 of this embodiment is assumed to perform a song using multiple sound sources at once: live sound playback based on compressed audio data, the hardware MIDI sound source, the software MIDI sound source, and playback of a synthesized voice based on voice synthesis data. The main control unit 10 therefore runs the processes of the live sound reproduction processing unit 11, the MIDI sequencer 12, the MIDI performance control unit 13, and the voice synthesis processing unit 15 in parallel.

The main control unit 10 corresponds to the music performance means, the voice synthesis processing means, the live sound reproduction processing means, the control means, and the instruction receiving means. The processes performed by the main control unit 10 correspond to the music performance process, the voice synthesis process, the instruction receiving process, and the control process.

[1.2. Description of configuration of audio processing module 30]
The audio processing module 30 is an audio control device that applies various kinds of digital audio processing to the input signals for each sound source from the main control unit 10 and to the digital audio signal of the singing voice input through the microphone 53 connected to the karaoke apparatus 1. The audio processing module 30 includes a key control unit 31, a MIDI sound source 32, a mute switch unit 33, an effect unit 34, a mixer 35, a digital-to-analog converter (DAC) 36, and a mute switch unit 37.

The key control unit 31 changes the pitch of the live sound playback digital audio signal output from the live sound reproduction processing unit 11 of the main control unit 10 to match the key of the performance set by the user via the operation unit 41 or the like, and outputs the result to the mixer 35. For the hardware MIDI sound source and the software MIDI sound source, the key is adjusted by the MIDI sequencer 12 and the MIDI performance control unit 13 of the main control unit 10.
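The difference between the two key-change paths can be sketched as follows, assuming twelve-tone equal temperament. For MIDI sources a key change amounts to shifting note numbers, whereas for a decoded audio signal it corresponds to a frequency ratio of 2^(n/12) for n semitones. The function names are hypothetical, and the patent does not describe the pitch-shifting algorithm actually used by the key control unit 31.

```python
def transpose_note(note: int, semitones: int) -> int:
    """Shift a MIDI note number by a key offset, clamped to the valid 0..127 range."""
    return max(0, min(127, note + semitones))


def pitch_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # Raise the key by two semitones.
    print(transpose_note(60, 2), pitch_ratio(2))
```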

The MIDI sound source 32 generates a digital audio signal of the musical tones of the song based on the MIDI messages sent from the MIDI sequencer 12, and outputs it to the mixer 35. The MIDI sound source 32 functions in the same way as the well-known hardware MIDI sound sources mounted in karaoke apparatuses and the like.

The mute switch unit 33 is a switch located partway along the output path of the digital audio signal output from the software MIDI sound source unit 14 of the main control unit 10. Under the control of the MIDI performance control unit 13, the mute switch unit 33 switches the input of the digital audio signal from the software MIDI sound source unit 14 to the mixer 35 on and off.

The effect unit 34 adds acoustic effects such as echo to the digital audio signal of the microphone input converted by the ADC 44, and outputs the result to the mixer 35.
The mixer 35 mixes the digital audio signals of the multiple sound sources input from the key control unit 31, the MIDI sound source 32, the mute switch unit 33, the effect unit 34, and the mute switch unit 37, adjusting the balance of volume, tone, and so on, and outputs the mixed digital audio signal to the DAC 36.
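The mixing step can be illustrated with a weighted sum of samples. The following is a minimal sketch, assuming 16-bit signed PCM samples and one gain per source. The function name `mix` and the clipping behaviour are assumptions for illustration and are not taken from the patent. A muted source can be modelled with a gain of 0.

```python
from typing import List, Sequence


def mix(sources: Sequence[Sequence[int]], gains: Sequence[float],
        lo: int = -32768, hi: int = 32767) -> List[int]:
    """Mix equal-length PCM sources with per-source gains, clipping to 16 bits."""
    if len(sources) != len(gains):
        raise ValueError("one gain per source is required")
    if not sources:
        return []
    length = len(sources[0])
    if any(len(src) != length for src in sources):
        raise ValueError("all sources must have the same length")
    mixed = []
    for i in range(length):
        total = sum(g * src[i] for src, g in zip(sources, gains))
        # Clip so the mixed sample stays within the 16-bit signed range.
        mixed.append(max(lo, min(hi, int(round(total)))))
    return mixed


if __name__ == "__main__":
    accompaniment = [1000, 2000]
    vocal = [3000, -4000]
    print(mix([accompaniment, vocal], [0.5, 1.0]))
```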

The DAC 36 converts the digital audio signal produced by mixing in the mixer 35 into an analog audio signal. The analog audio signal converted by the DAC 36 is amplified by the speaker amplifier 51 connected to the karaoke apparatus 1 and emitted from the speaker 52 connected to the speaker amplifier 51.

The mute switch unit 37 is a switch located partway along the output path of the synthesized voice output from the voice synthesis processing unit 15 of the main control unit 10. Under the control of the voice synthesis processing unit 15, the mute switch unit 37 switches the input of the synthesized voice from the voice synthesis processing unit 15 to the mixer 35 on and off.

[1.3. Description of other components of karaoke apparatus 1]
The operation unit 41 is an input device for performing various operations, such as specifying the karaoke song to be performed, changing the key during a performance, temporarily changing the performance speed (change instruction), and cancelling a temporary change of the performance speed (release instruction). Operations that temporarily change the performance speed include so-called trick play such as pause, fast-forward, and rewind. The operation processing unit 40 processes signals from the operation unit 41 and inputs them to the main control unit 10. The communication interface unit 42 connects the karaoke apparatus 1 to a LAN 100 for communication with other karaoke apparatuses, remote control terminals, and information processing devices such as external servers.

The microphone amplifier 43 is an amplifier that amplifies the singer's voice (an analog audio signal) input from the microphone 53 connected to the karaoke apparatus 1.
The ADC 44 is a converter that converts the analog audio signal from the microphone amplifier 43 into a digital audio signal.

The video processing unit 45 is a video processing device consisting of a graphics engine that renders image information as video and a decoder that plays back compressed video data. The image information rendered by the video processing unit 45 is displayed on the monitor 50 connected to the karaoke apparatus 1.

The other functions and components of the karaoke apparatus 1 follow known techniques, so a detailed description is omitted here.
[2. Description of various processes of main control unit 10]
Next, the various processes executed by the main control unit 10 are described in order. These processes are described for the case where a fast-forward operation input is received, but the same applies when another trick play operation input, such as rewind or pause, is received.

[2.1. Description of input reception process]
The procedure of the input reception process executed by the main control unit 10 is described with reference to the flowchart of FIG. 2. This process runs in parallel with the other processes executed by the main control unit 10.

In the first step, S110, it is determined whether an input requesting fast-forward (fast-forward input) has been made on the operation unit 41. If it is determined that a fast-forward input has been made (S110: YES), the process proceeds to S120. If it is determined that no fast-forward input has been made (S110: NO), the process proceeds to S130.

In S120, the hard MIDI performance process, the live sound performance process, the soft MIDI performance process, and the singing voice synthesis playback process are notified that a fast-forward input has been made. The process then returns to S110.

In S130, it is determined whether an input cancelling fast-forward (fast-forward release) has been made on the operation unit 41. If it is determined that a fast-forward release has been made (S130: YES), the process proceeds to S140. If it is determined that no fast-forward release has been made (S130: NO), the process returns to S110 to wait until fast-forward is released.
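The branching in S110 to S130 can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: the event strings, the process names, and the `notify` callback are hypothetical stand-ins for the operation input and the inter-process notification, which the patent does not specify in detail.

```python
from typing import Callable, Optional

# Hypothetical names for the four processes notified in S120.
PLAYBACK_PROCESSES = ("hard_midi", "live_sound", "soft_midi", "vocal_synthesis")


def handle_input(event: Optional[str],
                 notify: Callable[[str, str], None]) -> str:
    """Return the label of the step to execute next.

    event  -- "fast_forward", "fast_forward_release", or None (assumed encoding)
    notify -- callback taking (process name, message); assumed interface
    """
    if event == "fast_forward":             # S110: YES
        for name in PLAYBACK_PROCESSES:     # S120: notify every playback process
            notify(name, "fast_forward")
        return "S110"
    if event == "fast_forward_release":     # S130: YES
        return "S140"                       # go on to compute return seek positions
    return "S110"                           # S130: NO, keep waiting


if __name__ == "__main__":
    log = []
    next_step = handle_input("fast_forward", lambda name, msg: log.append((name, msg)))
    print(next_step, log)
```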

In S140, the return seek position for soft MIDI performance is calculated according to the number of live sound streams (the number of digital audio signals of the song that the live sound reproduction processing unit 11 outputs simultaneously). The return seek position for soft MIDI performance is the timing at which the output of the digital audio signal of musical tones based on MIDI data by the software MIDI sound source unit 14 is synchronized with the performance of musical tones by the MIDI sound source 32 (the performance of the karaoke song). Here, it is calculated as the position delayed from the current performance position by the product of the number of live sound streams and a predetermined value set in advance through experiment or the like. The process then proceeds to S150.

In S150, the return seek position for live sound performance (the audio signal synchronization timing) is calculated from the return seek position for soft MIDI performance calculated in S140. The return seek position for live sound performance is the timing at which the output of the digital audio signal of the music by the live sound reproduction processing unit 11 is synchronized with the performance of musical tones by the MIDI sound source 32 (the performance of the karaoke music). Here, it is calculated as the position delayed from the return seek position for soft MIDI performance by a predetermined value set in advance through experiments or the like. The process then proceeds to S160.

In S160, the return seek position for singing synthesis performance (the synthesized voice synchronization timing) is calculated from the return seek position for soft MIDI performance. The return seek position for singing synthesis performance is the timing at which the output of the synthesized voice by the speech synthesis processing unit 15 is synchronized with the performance of musical tones by the MIDI sound source 32 (the performance of the karaoke music). Here, it is calculated as the position delayed from the return seek position for soft MIDI performance by a predetermined value set in advance through experiments or the like. This return seek position is set to a time delayed by a predetermined time from the time at which the return seek position for live sound performance is set. The process then proceeds to S170.
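The calculations in S140 to S160 can be sketched as follows. This is only an illustrative sketch: the function name, the millisecond units, and the three constants are assumptions made for the example, since the description says only that the values are set in advance through experiments or the like.

```python
from dataclasses import dataclass

# Hypothetical tuning constants; the description does not give actual values.
SOFT_MIDI_DELAY_PER_STREAM_MS = 20   # multiplied by the number of live sound streams (S140)
LIVE_SOUND_EXTRA_DELAY_MS = 30       # added to the soft MIDI position (S150)
SINGING_SYNTH_EXTRA_DELAY_MS = 80    # added to the soft MIDI position (S160); larger than
                                     # the live sound value so the synthesized voice
                                     # resumes after the live sound


@dataclass(frozen=True)
class ReturnSeekPositions:
    soft_midi_ms: int
    live_sound_ms: int
    singing_synth_ms: int


def compute_return_seek_positions(current_ms: int, live_stream_count: int) -> ReturnSeekPositions:
    """Compute the three return seek positions from the current performance position."""
    # S140: delay grows with the number of simultaneously output live sound streams.
    soft_midi = current_ms + live_stream_count * SOFT_MIDI_DELAY_PER_STREAM_MS
    # S150: live sound resumes a fixed amount after the soft MIDI position.
    live_sound = soft_midi + LIVE_SOUND_EXTRA_DELAY_MS
    # S160: synthesized voice also derives from the soft MIDI position.
    singing_synth = soft_midi + SINGING_SYNTH_EXTRA_DELAY_MS
    return ReturnSeekPositions(soft_midi, live_sound, singing_synth)
```

With these assumed constants the synthesized voice position always falls after the live sound position, which matches the ordering described for S160.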

In S170, the hard MIDI performance process, the live sound performance process, the soft MIDI performance process, and the singing voice synthesis and playback process are each notified that fast-forward has been canceled. In addition, the return seek position for soft MIDI performance calculated in S140 is sent to the soft MIDI performance process, the return seek position for live sound performance calculated in S150 is sent to the live sound performance process, and the return seek position for singing synthesis performance calculated in S160 is sent to the singing voice synthesis and playback process. The process then proceeds to S110 to prepare for the next fast-forward input.

[2.2. Description of the hard MIDI performance process]
Next, the procedure of the hard MIDI performance process executed by the main control unit 10 is described with reference to the flowchart of FIG. 3. This process runs in parallel with the other processes executed by the main control unit 10.

In the first step, S210, it is determined whether a fast-forward input notification has been received from the input reception process. If it is determined that the notification has been received (S210: YES), the performance speed of the karaoke music is changed to the fast-forward speed and the process proceeds to S220. If it is determined that no notification has been received (S210: NO), the process proceeds to S240.

In S220, it is determined whether a fast-forward cancellation notification has been received from the input reception process. If it is determined that the notification has been received (S220: YES), the process proceeds to S240. If it is determined that no notification has been received (S220: NO), the process proceeds to S230.

In S230, fast-forward performance is executed, in which the MIDI sound source 32 performs the musical tones (the karaoke music) at the fast-forward performance speed. The process then proceeds to S210 to prepare for fast-forward cancellation.

In S240, it is determined whether live-sound-type performance is being carried out by the live sound performance process, specifically, whether data is being output from the live sound reproduction processing unit 11, the software MIDI sound source unit 14, or the speech synthesis processing unit 15. If it is determined that no live-sound-type performance is being carried out (S240: NO), the process proceeds to S250. If it is determined that live-sound-type performance is being carried out (S240: YES), the process proceeds to S260.

The processes that generate the data output from the live sound reproduction processing unit 11, the software MIDI sound source unit 14, and the speech synthesis processing unit 15 are described later.

In S250, constant-speed performance is executed, in which the MIDI sound source 32 performs the musical tones (the karaoke music) at the normal playback speed. The process then proceeds to S210.
In S260, the same constant-speed performance by the MIDI sound source 32 is executed while being synchronized with the live-sound-type performance carried out by the live sound performance process. The process then proceeds to S210.
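The branching in S210 to S260 amounts to a three-way decision made on each pass through the loop. The sketch below expresses that decision as a pure function; the enum, the function name, and the boolean parameters are hypothetical names introduced for illustration and do not appear in the description.

```python
from enum import Enum


class MidiAction(Enum):
    FAST_FORWARD = "fast_forward"        # S230: perform at the fast-forward speed
    CONSTANT = "constant"                # S250: perform at the normal speed
    CONSTANT_SYNCED = "constant_synced"  # S260: normal speed, synced to live-sound-type output


def hard_midi_step(ff_notified: bool, ff_cancelled: bool, live_output_active: bool) -> MidiAction:
    """Pick the action for one pass of the hard MIDI performance loop."""
    # S210 YES and S220 NO: fast-forward is in effect.
    if ff_notified and not ff_cancelled:
        return MidiAction.FAST_FORWARD
    # S240: is any live sound, soft MIDI, or synthesized voice data being output?
    if live_output_active:
        return MidiAction.CONSTANT_SYNCED
    return MidiAction.CONSTANT
```

Keeping the decision separate from the actual tone generation makes each branch of the flowchart easy to check in isolation.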

[2.3. Description of the live sound performance process]
Next, the procedure of the live sound performance process executed by the main control unit 10 is described with reference to the flowchart of FIG. 4. This process runs in parallel with the other processes executed by the main control unit 10.

In the first step, S310, it is determined whether a fast-forward input notification has been received from the input reception process. If it is determined that the notification has been received (S310: YES), live sound playback is stopped and the process proceeds to S320. If it is determined that no notification has been received (S310: NO), the process proceeds to S350.

In S320, it is determined whether a fast-forward cancellation notification has been received from the input reception process. If it is determined that no notification has been received (S320: NO), S320 is executed again to wait for the notification. If it is determined that the notification has been received (S320: YES), the process proceeds to S330.

In S330, the return seek position for live sound performance is received from the input reception process. The process then proceeds to S340.
In S340, decoding from the return seek position for live sound performance received in S330 is prepared. This preparation is needed because, whereas in an ideal performance operation sound is produced as soon as the performance starts (or resumes) (see FIG. 7(a)), live sound in practice is not produced immediately after the performance starts (or resumes); there is a time lag (see FIG. 7(b)). The process then proceeds to S350.

In S350, when the process arrives from S310, the live sound data (music data in compressed audio form) is decoded sequentially from the current performance position to generate PCM (Pulse Code Modulation) data; when the process arrives from S340, the live sound data is decoded from the return seek position for live sound performance to generate PCM data. The process then proceeds to S360.

In S360, the generated data (the PCM data generated in S350 by sequentially decoding the live sound data) is passed to the audio processing module 30. The process then proceeds to S310.

[2.4. Description of the soft MIDI performance process]
Next, the procedure of the soft MIDI performance process executed by the main control unit 10 is described with reference to the flowchart of FIG. 5. This process runs in parallel with the other processes executed by the main control unit 10.

In the first step, S410, it is determined whether a fast-forward input notification has been received from the input reception process. If it is determined that the notification has been received (S410: YES), the performance speed of the karaoke music is changed to the fast-forward speed and the process proceeds to S420. If it is determined that no notification has been received (S410: NO), the process proceeds to S450.

In S420, it is determined whether a fast-forward cancellation notification has been received from the input reception process. If it is determined that no notification has been received (S420: NO), S420 is executed again to wait for the notification. If it is determined that the notification has been received (S420: YES), the process proceeds to S430.

In S430, the return seek position for soft MIDI performance is received from the input reception process. The process then proceeds to S440.
In S440, interpretation and rendering from the return seek position for soft MIDI performance received in S430 is prepared. The process then proceeds to S450.

In S450, when the process arrives from S410, the MIDI data is interpreted and rendered from the current performance position to generate PCM data of the soft MIDI musical tones; when the process arrives from S440, the MIDI data is interpreted and rendered from the return seek position for soft MIDI performance to generate that PCM data. The process then proceeds to S460.

In S460, the generated data (the PCM data generated in S450 by interpreting and rendering the MIDI data) is passed to the audio processing module 30. The process then proceeds to S410.

[2.5. Description of the singing voice synthesis and playback process]
Next, the procedure of the singing voice synthesis and playback process executed by the main control unit 10 is described with reference to the flowchart of FIG. 6. This process runs in parallel with the other processes executed by the main control unit 10.

In the first step, S510, it is determined whether a fast-forward input notification has been received from the input reception process. If it is determined that the notification has been received (S510: YES), playback of the synthesized voice is stopped and the process proceeds to S520. If it is determined that no notification has been received (S510: NO), the process proceeds to S560.

In S520, it is determined whether a fast-forward cancellation notification has been received from the input reception process. If it is determined that no notification has been received (S520: NO), S520 is executed again to wait for the notification. If it is determined that the notification has been received (S520: YES), the process proceeds to S530.

In S530, the return seek position for singing voice synthesis is received from the input reception process. The process then proceeds to S540.
In S540, singing voice synthesis from the return seek position received in S530 is prepared. This preparation is needed because, whereas in an ideal performance operation sound is produced as soon as the performance starts (or resumes) (see FIG. 7(a)), speech synthesis, like live sound, in practice does not produce sound immediately after the performance starts (or resumes); there is a time lag (see FIG. 7(b)). The process then proceeds to S550.

In S550, it is determined whether synthesis is lagging behind the overall progress. Specifically, the performance position of the karaoke music is compared with the playback position of the synthesized voice to determine whether the latter is behind the former. If it is determined that synthesis is not lagging (S550: NO), the process proceeds to S560. If it is determined that synthesis is lagging (S550: YES), the process proceeds to S580.

When it is determined in this step that synthesis is lagging behind the overall progress (S550: YES), the mute switch unit 37 of the audio processing module 30 is also turned on. The on/off control of the mute switch unit 37 is described in detail later.

In S560, when the process arrives from S510, the score and lyric data (the speech synthesis data) is interpreted from the current performance position, and the singing voice is synthesized from the speech synthesis parameters and the phoneme data at constant speed (the same speed as the karaoke performance). Likewise, when the process arrives from S550, the score and lyric data is interpreted from the return seek position for singing voice synthesis, and the singing voice is synthesized from the speech synthesis parameters and the phoneme data at constant speed. The process then proceeds to S570.

In S570, the generated data (the PCM data of the singing voice synthesized at constant speed in S560 by interpreting the score and lyric data) is passed to the audio processing module 30. The process then proceeds to S510.

In S580, the score and lyric data (the speech synthesis data) is interpreted and the singing voice is synthesized at the fastest speed (a speed greater than the performance speed of the karaoke performance); the generated data of the synthesized singing voice is produced and passed to the audio processing module 30. The process then proceeds to S550.

At this time, in the singing voice synthesis and playback process, during the period in which synthesis lags behind the overall progress, that is, during the loop from an affirmative determination in S550 until the process returns to S550, the main control unit 10 instructs the audio processing module 30 to turn on the mute of the mute switch unit 37. During this period, the synthesized voice is therefore silenced or reduced in volume.

In addition, when a negative determination is made in S550 and synthesis no longer lags behind the overall progress, the main control unit 10 instructs the audio processing module 30 to switch the mute of the mute switch unit 37 from on to off, releasing the mute. Input of the generated data to the mixer 35 then resumes, and the input volume of the synthesized voice to the mixer 35 is restored from silence, or its reduction is lifted.
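The catch-up loop between S550 and S580, together with the mute control, can be illustrated with a small simulation. This is a sketch under assumed values: the tick length, the catch-up rate of 4, and the function name are all hypothetical, and the description states only that synthesis runs faster than the performance speed while it lags.

```python
def simulate_catch_up(synth_pos_ms, play_pos_ms, tick_ms=10, catch_up_rate=4.0):
    """Simulate the S550/S580 loop; return (muted_ticks, synth_pos, play_pos).

    While the synthesized voice lags the performance (S550: YES), the voice is
    muted and synthesized at catch_up_rate times the performance speed (S580).
    The performance itself keeps advancing at normal speed.
    """
    if catch_up_rate <= 1.0:
        raise ValueError("catch_up_rate must exceed 1.0 or the loop never terminates")
    synth = float(synth_pos_ms)
    play = float(play_pos_ms)
    muted_ticks = 0
    while synth < play:                    # S550: lagging, so mute stays on
        synth += tick_ms * catch_up_rate   # S580: fastest-speed synthesis
        play += tick_ms                    # karaoke performance continues
        muted_ticks += 1
    # S550: NO. Mute is released here and constant-speed synthesis (S560) resumes.
    return muted_ticks, synth, play
```

For instance, a 90 ms lag with these assumed parameters closes by 30 ms per tick, so the voice stays muted for three ticks before constant-speed synthesis resumes.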

[3. Description of the processing executed by the audio processing module 30]
Next, the processing of the audio processing module 30, which outputs the karaoke music including the synthesized singing voice, is described. In the audio processing module 30, when a control signal that turns the mute on or off is input from the main control unit 10 to the mute switch units 33 and 37, the input of each piece of generated data to the mixer 35 is controlled according to that signal, and the pieces of generated data are mixed by the mixer 35 and output to the output device (DAC 36). The output device (DAC 36) converts the result into an audible music signal, which is input to the speaker amplifier 51, so that the karaoke music including the synthesized singing voice is output from the speaker 52.
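The gating of mixer inputs by the mute switches can be sketched as below. The source names, the sample representation as lists of floats, and the function itself are assumptions made for illustration; the description does not specify how the mixer 35 combines its inputs, so simple sample-wise addition is used here.

```python
def mix_frames(inputs: dict[str, list[float]], muted: set[str]) -> list[float]:
    """Sum the samples of every source whose mute switch is off.

    inputs maps a hypothetical source name (e.g. "midi", "synth") to its PCM samples;
    muted holds the names of sources whose mute switch is currently on.
    """
    active = [samples for name, samples in inputs.items() if name not in muted]
    if not active:
        return []
    # Mix only as many samples as every active source can supply.
    length = min(len(samples) for samples in active)
    return [sum(samples[i] for samples in active) for i in range(length)]
```

Muting by excluding a source from the sum corresponds to the silencing case; the volume-reduction variant mentioned in the description would scale the source instead of dropping it.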

[4. Effects]
By executing the processes described above, the karaoke apparatus 1 of this embodiment provides the following effects.

Specifically, when a change instruction to temporarily change the performance speed of the karaoke music (a trick play execution instruction) is received, the performance speed of the karaoke music is changed and the generation and output of the synthesized voice are stopped. When a release instruction to cancel the change (a trick play end instruction) is subsequently received, the performance speed is restored, a timing at which the output of the synthesized voice is synchronized with the performance of the karaoke music (the synthesized voice synchronization timing) is set, and the synthesized voice is generated and output in accordance with that timing (see FIG. 8(a), which illustrates a pause as the trick play). If generation of the synthesized voice is not in time for the synthesized voice synchronization timing, generation of the digital audio signal is performed first; once that has finished, processing concentrates on the synthesized voice, which is generated at a speed greater than the performance speed of the karaoke music, with its output level muted, until its generation catches up with the performance of the karaoke music (see FIG. 8(b), which illustrates a pause as the trick play).

Therefore, when returning to the normal playback state from so-called trick play such as pause, fast-forward, or rewind, the karaoke apparatus 1 of this embodiment keeps the delay of synthesized voice such as a guide vocal relative to the karaoke performance as small as possible, makes that delay as unnoticeable to the user as possible, and thereby minimizes the discomfort given to the user.

[5. Other embodiments]
One embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various forms such as the following.

In the above embodiment, when generation of the synthesized voice is not in time for the synthesized voice synchronization timing, the output level of the synthesized voice is muted until its generation catches up with the performance of the karaoke music. The invention is not limited to this; the output level of the synthesized voice may instead be reduced until its generation catches up with the performance of the karaoke music.

This configuration also makes the delay of the synthesized voice as unnoticeable to the user as possible and minimizes the discomfort given to the user.

Description of symbols: 1…karaoke apparatus, 10…main control unit, 11…live sound reproduction processing unit, 12…MIDI sequencer, 13…MIDI performance control unit, 14…software MIDI sound source unit, 15…speech synthesis processing unit, 20…HDD, 30…audio processing module, 31…key control unit, 32…MIDI sound source, 33…mute switch unit, 34…effect unit, 35…mixer, 36…DAC, 37…mute switch unit, 40…operation processing unit, 41…operation unit, 42…communication interface unit, 43…microphone amplifier, 44…ADC, 45…video processing unit, 50…monitor, 51…speaker amplifier, 52…speaker, 53…microphone, 100…LAN.

Claims

Music performance means for performing karaoke music performance at a predetermined performance speed based on the music data;
A synthesized voice for singing the karaoke music is generated by voice synthesis based on the voice synthesis data, and the generated synthesized voice is played at the same value as the performance speed of the karaoke music along with the performance of the karaoke music by the music playing means. Speech synthesis processing means for outputting;
Control means for controlling performance of karaoke music by the music performance means, generation of synthesized speech by the speech synthesis processing means, and output of synthesized speech by the voice synthesis processing means;
An instruction receiving means for receiving a change instruction for temporarily changing the performance speed of the karaoke music by the music playing means or a release instruction for releasing the change;
With
The control means includes
When the instruction accepting means accepts the change instruction, it changes the performance speed of the karaoke music by the music playing means, generates synthesized speech by the speech synthesis processing means, and generates synthesized speech by the speech synthesis processing means. Stop the output,
Thereafter, when the instruction receiving means receives the release instruction, the performance speed of the karaoke music by the music performance means is restored, and the output of the synthesized voice by the voice synthesis processing means is output to the karaoke by the music performance means. Set a synthesized voice synchronization timing that is a timing to synchronize with the performance of the music, generate a synthesized voice by the voice synthesis processing means in accordance with the synthesized voice synchronization timing, and output a synthesized voice by the voice synthesis processing means,
If the generation of the synthesized speech by the speech synthesis processing means is not in time for the synthesized speech synchronization timing, the synthesized speech is generated by the speech synthesis processing means at a speed greater than the performance speed of the karaoke music until the generation of the synthesized speech by the speech synthesis processing means catches up with the performance of the karaoke music by the music performance means. A karaoke apparatus characterized by the above.

The karaoke apparatus according to claim 1,
further comprising
live sound reproduction processing means for generating a digital audio signal of karaoke music based on the compressed audio data, and outputting the generated digital audio signal at the same value as the performance speed of the karaoke music along with the performance of the karaoke music by the music performance means,
The control means includes
Controlling the generation of the digital audio signal by the raw sound reproduction processing means, and the output of the digital audio signal by the raw sound reproduction processing means,
When the instruction receiving unit receives the change instruction, the digital sound signal generation by the raw sound reproduction processing unit and the output of the digital sound signal by the raw sound reproduction processing unit are stopped,
Thereafter, when the instruction receiving unit receives the release instruction, an audio signal synchronization timing is set which is a timing for synchronizing the output of the digital audio signal by the raw sound reproduction processing unit with the performance of the karaoke music by the music playing unit. Then, in accordance with the audio signal synchronization timing, generation of the digital audio signal by the raw sound reproduction processing unit and output of the digital audio signal by the raw sound reproduction processing unit,
The synthesized voice synchronization timing is set to a time delayed by a predetermined time from the time when the voice signal synchronization timing is set.

In the karaoke apparatus according to claim 1 or 2,
When the synthesized speech generation by the speech synthesis processing unit is not in time for the synthesized speech synchronization timing, the control unit generates the synthesized speech by the speech synthesis processing unit for the performance of the karaoke music by the music performance unit. A karaoke apparatus characterized in that the output level of synthesized speech by the speech synthesis processing means is reduced until it catches up.

A music performance process for performing karaoke music at a predetermined performance speed based on the music data;
Voice synthesis that generates synthesized voice for singing the karaoke music by voice synthesis based on the data for voice synthesis, and outputs the generated synthesized voice at the same value as the performance speed of the karaoke music along with the performance of the karaoke music. Processing,
An instruction reception process for receiving a change instruction for temporarily changing the performance speed of the karaoke music piece or a release instruction for releasing the change;
When the change instruction is received, the performance speed of the karaoke piece is changed, and the generation of the synthesized voice and the output of the synthesized voice are stopped. Returning the performance speed of the music, setting a synthesized voice synchronization timing that is a timing to synchronize the output of the synthesized voice with the performance of the karaoke song, and generating the synthesized voice according to the synthesized voice synchronization timing and When the synthesized speech is output and the synthesized speech is not generated in time for the synthesized speech synchronization timing, the synthesized speech is generated until the synthesized speech catches up with the performance of the karaoke song. Control processing performed at a speed greater than the performance speed of
A program that causes a computer to execute.