JP2006071712A

JP2006071712A - Karaoke system

Info

Publication number: JP2006071712A
Application number: JP2004251821A
Authority: JP
Inventors: Hideyuki Tsuchiya; 秀之土屋; Hiroyuki Matsui; 浩行松井; Yoshihide Okubo; 嘉英大久保; Makoto Umeda; 誠梅田
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2004-08-31
Filing date: 2004-08-31
Publication date: 2006-03-16

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke system that makes effective use of the functions of a conventional karaoke system to output easy-to-view video of video content such as movies and dramas, and to output mixed audio in which the sound output of the video content is mixed with the user's spoken lines, so that the user can experience realistic video and sound in which the user takes part, and can efficiently practice speaking the lines of a character in the movie or drama with suitable assistance for those lines.

SOLUTION: The karaoke system consists mainly of a karaoke performance device (1) that is equipped with a display means (D) and a bone conduction means (B), holds video content including characters with spoken lines, outputs the sound of the video content with the speech of all of the characters or of a predetermined character erased, and is equipped with a mixing audio output means (13) that mixes this sound output with the user's spoken lines and outputs the mixed audio. The system includes an output control means (18) that performs control so that the video of the video content is output to the display means (D) and the erased speech is output by bone conduction through the bone conduction means (B), synchronized with the video output with a predetermined time lead.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a karaoke system built mainly around a karaoke performance device that is provided with bone conduction means and that can output the video of video content while mixing the sound output of that content with the user's spoken lines.

In recent years karaoke has become popular with people of all ages and both sexes, and so-called "karaoke boxes", which contain many private karaoke rooms each fitted with a karaoke performance device, have become especially widespread. A karaoke room is soundproofed so that almost no sound leaks outside even when music is played loudly or a user sings at high volume in the room.

Furthermore, advances in liquid crystal and plasma technology have recently made the display devices attached to karaoke performance devices thinner and larger. Combined with the existing soundproofing, this has led to services for watching video content other than karaoke, specifically movies and dramas, in karaoke rooms. Regarding such services, Patent Document 1, for example, discloses a karaoke system in which the devices attached to a karaoke performance device are used together to allow movie viewing.

Video content such as movies and dramas can also be used in other ways. For example, people who want to imitate the voice actors of animation or foreign films, or who seriously aim to become voice actors, can practice as if they were doing after-recording (so-called "afureko"): the environmental sound (natural sounds, sound effects, and other sound besides speech) is kept, the characters' speech is erased from the sound output, and the user practices speaking the erased lines while reading the script. Likewise, language learners can practice speaking foreign-language lines in the same way.

To practice lines with such video content, the speech of the character being practiced must be erased, which is not particularly difficult with current audio processing technology. For example, Patent Document 1 discloses a system that processes the video signal and the audio signal of a movie separately, so that audio can be added to or removed from the video. Moreover, the audio can now be built by superimposing, on top of the environmental sound, a separate audio signal for each character, or audio signals in a specific voice frequency band can be extracted from the complete audio signal data. Processing each audio signal separately makes it possible not only to erase the speech of all characters while keeping the environmental sound, but also to erase the lines of only a predetermined character.

In addition, the user generally needs a script to practice lines. Because video content is watched mainly for its picture, however, superimposing caption text that assists the lines, like the lyric captions superimposed on a karaoke background video, makes the picture hard to see and is undesirable. A method that assists the lines by voice, without captions, is therefore preferable. For conventional karaoke singing services, techniques have been devised that convey the lyrics to the user through a lyric assist voice, a so-called "model vocal", instead of lyric captions. For example, Patent Document 2 discloses a technique for controlling the output of a lyric assist voice so that the singer is told the next lyrics ahead of the karaoke performance, without relying on lyric captions displayed on a TV monitor or the like. Patent Document 3 discloses a technique in which reading condition data, consisting of reading timing conditions and voice quality conditions, is added to the lyric data, and a synthesized voice of the lyrics (lyric assist voice) is generated according to the voice quality conditions, so that a car driver who cannot look at lyric captions can confirm the lyrics without looking at a text display.
JP-A-8-292780; JP-A-11-133989; JP-A-11-167392

However, it is difficult to apply these conventional techniques to practicing lines with video content. When singing karaoke, the accompaniment music means that a faintly audible model vocal is not very bothersome, but video content such as movies and dramas has no accompaniment music, so an audible line-assist voice spoils the atmosphere. One conceivable remedy is to emit the line-assist voice from a dedicated head-mounted (headphone or earphone type) sound-pressure speaker worn by the user, so that other users cannot hear it. With headphones covering the ears, however, the user has difficulty hearing the other characters' lines, which is undesirable.

Accordingly, an object of the present invention is to provide a karaoke system that makes effective use of the functions of a conventional karaoke system and, using video content such as movies and dramas, outputs easy-to-view video of the content together with mixed audio of the content's sound output and the user's spoken lines, so that the user can experience realistic video and sound in which the user takes part, can be suitably assisted with the lines, and can efficiently practice speaking a character's lines.

In view of the above problem, the present inventors found that it can be solved by constructing a karaoke system that has display means and bone conduction means and that, when outputting video content, erases the speech of the character chosen by the user, mixes the resulting sound output with the user's spoken lines, outputs the video to the display means without captions or the like, and outputs the erased lines through the bone conduction means in synchronization with the video output with a predetermined time lead. This led to the karaoke system of the present invention.

That is, the karaoke system of the present invention consists mainly of a karaoke performance device that has display means and bone conduction means, holds video content including characters with spoken lines, outputs the sound of the video content with the speech of all of the characters or of a predetermined character erased, and has mixing audio output means for mixing that sound output with the user's spoken lines and outputting the result,
and is characterized by having output control means for performing control such that
the video of the video content is output to the display means, and
the erased speech is output by bone conduction through the bone conduction means, synchronized with the video output of the video content with a predetermined time lead.

According to the karaoke system of the present invention, the speech of the character chosen by the user is erased from the sound output, this sound output is mixed with the user's spoken lines, the video of the video content is output to the display means, and the erased lines are output by bone conduction through the bone conduction means. As a result, the user can experience realistic video and sound in which the user takes part; the user's own voice and the line-assist voice are not confused and each is heard clearly; other users hear only the user's spoken lines; and lines can be practiced efficiently using video content.

The karaoke system of the present invention is described below with reference to a preferred embodiment. First, the overall configuration of the system is described based on FIG. 1 (block diagram of the karaoke performance device constituting the karaoke system), FIG. 2 (schematic structure of the video content data), FIG. 3 (conceptual diagram of the method of synchronizing the video output with the bone conduction output of the speech), and FIG. 4 (external perspective view of the bone conduction means).

The karaoke performance device (1) constituting the karaoke system of this embodiment consists of a central control means (2), which controls the operation of the whole device, and various components connected to it: a hard disk (3), RAM (4), a sound source (synthesizer: 5), a mixer (6), a PCM decoder (9), an MPEG decoder (10), a synthesis circuit (11), a remote control device (R), and so on. The hard disk (3) stores singing content data (C1), consisting of performance data (21), background video data (22), lyric caption data (23), and the like linked to a music code (20) that identifies each karaoke song. It also stores video content data (C2), consisting of video data (25), acoustic data (26), and the like linked to a video code (24) that identifies each item of video content.

Each functional means is now described briefly. The communication control means (12) controls the connection to a karaoke host device (not shown) over an ADSL line, ISDN line, ordinary telephone line, or the like, using a VPN (Virtual Private Network) that the karaoke operator manages on the Internet. The karaoke host device distributes various data relating to the singing content data (C1) and the video content data (C2) to the individual karaoke performance devices (1).

For singing content data (C1), the sound source (5) forms musical sound signals according to the performance data supplied by the music sequencer (15), which is executed by the central control means (2). The musical sound signals are input to the mixer (6), which mixes, in a suitable balance, the musical sound signals generated by the sound source (5) with the user's singing voice signal input from the karaoke microphone (M) through the A/D converter (8). The mixed digital audio signal is input to the sound system (SS: 7), which has a power amplifier, converts the digital signal to an analog signal, amplifies it, and emits the musical sound and singing voice from the speaker (S). The singing voice signal digitized by the A/D converter (8) is also input to a vocal adapter (not shown), and the singing frequency of the signal captured by the vocal adapter is used for purposes such as automatic scoring of singing ability.

For video content data (C2), on the other hand, the mixing audio output means (13) extracts the acoustic data (26), which is recorded in PCM (Pulse Code Modulation) format, has the PCM decoder (9) play it back as sound output, and feeds that output to the mixer (6). As with singing content, the mixer (6) mixes in, with a suitable balance, the user's spoken lines input from the karaoke microphone (M) through the A/D converter (8), and the mixed digital audio signal is input to the sound system (7). As with the singing content data (C1), the speech signal digitized by the A/D converter (8) can also be input to the vocal adapter (not shown). If the system is configured so that the utterance frequency of the captured signal is used, for example, for automatic scoring of foreign-language lines, it can also be used effectively for foreign-language speaking practice.
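The mixing step described above can be illustrated with a minimal sketch. It assumes signed 16-bit PCM samples at a common sample rate; the function name `mix_pcm` and the gain parameters are hypothetical and only stand in for the "suitable balance" mentioned in the text, which the source does not quantify.

```python
def mix_pcm(content, voice, content_gain=0.7, voice_gain=0.7):
    """Mix two sequences of signed 16-bit PCM samples.

    content: samples of the content's sound output (speech of the
             selected character already removed).
    voice:   samples of the user's spoken lines from the microphone.
    The gains are illustrative assumptions, not values from the source.
    """
    length = max(len(content), len(voice))
    mixed = []
    for i in range(length):
        c = content[i] if i < len(content) else 0  # pad the shorter input with silence
        v = voice[i] if i < len(voice) else 0
        sample = int(round(c * content_gain + v * voice_gain))
        # Clip to the 16-bit range so loud passages do not wrap around.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Clipping is one simple way to handle overload; a real mixer might instead apply limiting or automatic gain control.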

The background video data (22) of the singing content data (C1) and the video data (25) of the video content data (C2) on the hard disk (3) are both recorded in MPEG (Moving Picture Experts Group) format. The output control means (18) reads them out and inputs them to the MPEG decoder (10) for playback. For singing content data (C1), the MPEG decoder (10) converts the input MPEG data into an NTSC video signal and inputs it to the synthesis circuit (11), which OSD-composites the lyric captions onto this video signal in VRAM (Video RAM) and displays the resulting video signal on the display means (D).

The synthesis circuit (11) also has a fade function: it can fade lyric captions in and out and, when several lines of lyric captions are displayed, fade only some of the lines in or out. For video content data (C2), the output control means (18) performs control so that the compressed video data (25) is decompressed by the MPEG decoder (10), converted into a composite video signal, and then output to the display means (D). At this time, caption data for the lines is not composited.

The remote control device (R) consists of a user interface, a remote control signal transmission circuit, and the like, and inputs an operation signal corresponding to the user's operation to the central control means (2). The central control means (2) detects this operation signal with the operation input processing means (19) and performs the corresponding processing. For example, when a music code is entered on the remote control device (R), the operation input processing means (19) detects it and passes it to the sequencer (14) as a request for a karaoke song; likewise, when a video code is entered, it is passed to the sequencer (14) as a request for video content. In response, the sequencer (14) reads the performance data (21), video data (25), and the like identified by the music code (20) or video code (24) from the hard disk (3).

The sequencer (14) consists mainly of a music sequencer (15) and a character sequencer (16). The music sequencer (15) reads track data such as the performance data track and guide melody track in the performance data and controls the sound source (5) with this data to play the karaoke song. The character sequencer (16) has a character pattern creation means (17), which creates image patterns from the lyric caption data (23) of the singing content data (C1) and outputs them to the synthesis circuit (11) described above.

The output control means (18) then outputs the erased speech by bone conduction through the bone conduction means (B). Specifically, from the acoustic data (26) recorded in PCM format it extracts the speech data of the character designated by the user, has the PCM decoder (9) decode this data, converts the result into a bone conduction signal, and outputs it through the bone conduction means (B). The bone conduction means of the present invention may be any currently known device that conducts vibrations such as voice and musical sound through human bone and thereby transmits the information to the brain; some mobile phones have recently been fitted with such devices. Information is usually transmitted through bones such as the skull and mandible, but performance has recently improved to the point where it can be transmitted even from the back of the hand.

As shown in FIG. 2, each item of video content data (C2) in this embodiment is identified by a video code (24) and consists mainly of video data (25) and acoustic data (26). The acoustic data (26) consists of environmental sound data (26x) and speech data (26a), (26b), (26c), and (26d), which are linked to "person code a" through "person code d" identifying the speaking characters A through D. Each set of speech data is recorded separately so that it can be processed individually. By selecting the desired characters, the user can erase the speech data (26a), (26b), (26c), and (26d) of all characters while keeping the environmental sound data (26x), or erase the lines of only a predetermined character. The acoustic data excluding the erased speech data is output as sound, and the erased speech is output by bone conduction.
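The data layout of FIG. 2 and the routing of tracks can be sketched as follows. This is only an illustration under assumptions: the class `VideoContent`, the function `split_tracks`, and the use of strings as stand-ins for audio tracks are hypothetical and do not come from the source, which does not specify a storage format beyond separately recorded per-character speech data.

```python
from dataclasses import dataclass, field


@dataclass
class VideoContent:
    video_code: str          # identifies the content (24)
    video_data: str          # stand-in for the MPEG video data (25)
    environment_sound: str   # stand-in for environmental sound data (26x)
    # person code -> stand-in for that character's speech data (26a..26d)
    speech_tracks: dict = field(default_factory=dict)


def split_tracks(content, selected_codes):
    """Return (speaker_tracks, bone_conduction_tracks).

    Speech of the selected characters is removed from the speaker output
    and routed to bone conduction; everything else stays on the speaker.
    """
    selected = set(selected_codes)
    unknown = selected - set(content.speech_tracks)
    if unknown:
        raise KeyError(f"unknown person codes: {sorted(unknown)}")
    speaker = [content.environment_sound]
    bone = []
    for code, track in content.speech_tracks.items():
        (bone if code in selected else speaker).append(track)
    return speaker, bone
```

Selecting every person code leaves only the environmental sound on the speaker, which corresponds to the "erase all characters" case in the text.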

As shown in FIG. 3, the video data carries time code data that tracks the progress of the video output. Based on this time code data, the bone conduction output timing of the speech is set for each of characters A to D so that it is synchronized with the video output of the video content. In the present invention this synchronization includes a predetermined time lead. In this embodiment, for example, the bone conduction output leads the time code data by 5 seconds, so the speech is output by bone conduction starting 5 seconds before the character actually speaks the line. The purpose is to tell the user the line in advance as assistance, and the lead time can be set arbitrarily. More preferably, the bone conduction output is played faster than normal speech speed so that it does not overlap with the next speech.
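The lead-time synchronization can be sketched as a small scheduling calculation. The 5-second default comes from the embodiment; everything else is an assumption made for illustration: the function name `schedule_assist`, the cue format, and in particular the rule for the playback speed, which here is chosen so that each assist playback ends before the next assist playback begins. The source only says the output should be faster than normal so that it does not overlap the next speech.

```python
def schedule_assist(cues, lead=5.0):
    """Plan bone conduction playback for one character's lines.

    cues: list of (start_sec, duration_sec) in time-code order, giving
          when each line is spoken in the video and how long it lasts.
    Returns a list of (assist_start_sec, speed_factor), where
    speed_factor >= 1.0 is the playback speed relative to normal.
    """
    # Each assist playback starts `lead` seconds early, but never before 0.
    starts = [max(0.0, start - lead) for start, _ in cues]
    plan = []
    for i, (_, duration) in enumerate(cues):
        if i + 1 < len(cues):
            window = starts[i + 1] - starts[i]  # time until the next assist starts
        else:
            window = duration                   # last line: no constraint
        # Speed up only as much as needed to fit the window (assumed rule).
        speed = max(1.0, duration / window) if window > 0 else 1.0
        plan.append((starts[i], speed))
    return plan
```

For example, with lines at 10 s (4 s long) and 12 s (3 s long), the first assist would start at 5 s and need double speed to finish before the second assist starts at 7 s.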

As described above, bone conduction transmits sound as bone-conducted sound through the skull or mandible, and a bone conduction speaker is normally used in close contact with the area around the ear. FIG. 4(a) is an external view of a finger-held bone conduction means (B): the user inserts two or three fingers between the holding arm (B1) and the bone conduction speaker (B2) to hold the device and listens to the speech while pressing it against the temple or lower jaw. For example, the user holds the microphone in one hand and the bone conduction means in the other while speaking the lines. FIG. 4(b) is an external view of a head-mounted bone conduction means (B), which is worn so that the user's temples are held between the holding arm (B1) and the left and right bone conduction speakers (B2). With this configuration the user does not need to hold the bone conduction means by hand. In this embodiment the speech is sent from the karaoke performance device over a wired communication line (B3), but the present invention is not limited to this and a wireless communication method may also be used.

The processing procedure of the karaoke system of the present invention is described in detail below based on FIG. 5 (system flow diagram of the karaoke system), FIG. 6 (external perspective view of the karaoke remote control device), and FIGS. 7 and 8 (user interface display screens of the karaoke remote control device).

First, as shown in FIG. 5, the user designates video content by its video code (S1). The video content is designated with the karaoke remote control device (R) shown in FIG. 6, which is described briefly here. The karaoke remote control device (R) has a liquid crystal display (Ra) and a touch sensor (Rb) layered on its front face. The touch sensor (Rb) provides the user with a GUI (Graphical User Interface) environment, forming the user interface, and conveys the user's instructions to the karaoke performance device.

The display screen (Ra1) in FIG. 7 shows the user interface for designating video content. It provides a "Karaoke content" icon (30) and a "Cinema content" icon (31) for choosing between singing content and video content, and when the user selects the latter, the screen switches to the display screen (Ra2). On the display screen (Ra2), the user enters the video code of the desired video content in the video code input field (34) using the numeric keypad (33), in the same way as when designating a karaoke song. After entry, selecting the "Transfer" icon (35) formally designates the video content.

Once the video content is formally designated, the user next designates the desired character by its person code, as shown in FIG. 5 (S2). The display screen (Ra3) in FIG. 8 is the user interface display screen for designating characters. It provides a character designation field (36) that shows a face photograph (or animation image) of each speaking character together with the character's ID, such as a name or nickname, along with a "Designate" icon (36a) for each character. When the user designates the desired characters and selects the "Transfer" icon (35), the characters are formally designated. In the present invention, all of the characters or only predetermined characters may be designated, and in the latter case one or several characters may be designated.

Once the characters are formally designated, as shown in FIG. 5, the acoustic data other than the designated characters' speech data is extracted for the sound output of the video content (S3), and this extracted acoustic data is output (S4). In synchronization with it, the video data is output on the display means (S5), and the designated characters' speech is output through the bone conduction means, synchronized with the video output with the predetermined time lead (S6). When the user speaks the lines while listening to this bone conduction output during video playback, the user's voice is mixed with the sound output and output as audio (S7).
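Steps S3 to S7 can be summarized as a routing table that says which sources go to which output. The sketch below is a hypothetical illustration: the function `playback_plan`, the destination names, and the tuple layout are assumptions, and only the step labels, the routing, and the 5-second lead of the embodiment come from the text.

```python
def playback_plan(person_codes, selected, lead_seconds=5.0):
    """Return a list of (step, destination, sources, lead_seconds).

    person_codes: all characters with speech tracks, in stored order.
    selected:     person codes designated by the user in S2.
    """
    chosen = set(selected)
    # S3: keep the environmental sound and the speech of non-selected characters.
    kept = ["environment"] + [c for c in person_codes if c not in chosen]
    erased = [c for c in person_codes if c in chosen]
    return [
        ("S4", "speaker", kept, 0.0),                       # sound output
        ("S5", "display", ["video"], 0.0),                  # video, no captions
        ("S6", "bone_conduction", erased, lead_seconds),    # assist, played early
        ("S7", "speaker", ["microphone"], 0.0),             # user's voice, mixed in
    ]
```

Only the S6 entry carries a non-zero lead, reflecting that the bone conduction output alone runs ahead of the video.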

As described in detail above, according to the karaoke system of the present invention, the speech of the character chosen by the user is erased from the sound output, this sound output is mixed with the user's spoken lines, the video of the video content is output to the display means, and the erased lines are output by bone conduction through the bone conduction means. As a result, the user can experience realistic video and sound in which the user takes part; the user's own voice and the line-assist voice are not confused and each is heard clearly; other users hear only the user's spoken lines; and lines can be practiced efficiently using video content.

FIG. 1: Block diagram of the karaoke performance device constituting the karaoke system of the present invention.
FIG. 2: Schematic structure of the video content data.
FIG. 3: Conceptual diagram of the method of synchronizing the video output with the bone conduction output of the speech.
FIG. 4: External perspective view of the bone conduction means.
FIG. 5: System flow diagram of the karaoke system of the present invention.
FIG. 6: External perspective view of the karaoke remote control device.
FIG. 7: User interface display screen of the karaoke remote control device.
FIG. 8: User interface display screen of the karaoke remote control device.

Explanation of symbols

1 Karaoke performance device
2 Central control means
3 Hard disk
6 Mixer
9 PCM decoder
10 MPEG decoder
11 Synthesis circuit
13 Mixing audio output means
18 Output control means
24 Video code
25 Video data
26 Acoustic data
B Bone conduction means
D Display means

Claims

A karaoke system consisting mainly of a karaoke performance device that comprises display means and bone conduction means, holds video content including characters with spoken lines, outputs the sound of the video content with the speech of all of the characters or of a predetermined character erased, and comprises mixing audio output means for mixing that sound output with the speech of a user and outputting the result,
the karaoke system comprising output control means for performing control such that
the video of the video content is output to the display means, and
the erased speech is output by bone conduction through the bone conduction means, synchronized with the video output of the video content with a predetermined time lead.