JP2016102982A

JP2016102982A - Karaoke system, program, karaoke voice reproduction method, and voice input processing device

Info

Publication number: JP2016102982A
Application number: JP2014242533A
Authority: JP
Inventors: 直樹楠本; Naoki Kusumoto; 友彦佐藤; Tomohiko Satou; 浩行鳥居; Hiroyuki Torii; 司濱中; Tsukasa Hamanaka; 慶彦秋山; Yoshihiko Akiyama; 素之池田; Motoyuki Ikeda; 宝玉宋; Bao-Yu Song; 充弘吉谷; Mitsuhiro Yoshiya; 正人松浦; Masato Matsuura; 陽平鶴貝; Yohei Tsurugai
Original assignee: Xing Inc; Pixela Corp; Jupiter Entertainment Co Ltd
Current assignee: Xing Inc; Pixela Corp; Jupiter Entertainment Co Ltd
Priority date: 2014-11-28
Filing date: 2014-11-28
Publication date: 2016-06-02
Anticipated expiration: 2034-11-28
Also published as: JP6568351B2

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke device that can prevent user satisfaction from deteriorating due to the output delay of the singing voice caused by various kinds of processing in a communication terminal device. SOLUTION: A communication terminal device constituting a karaoke system is provided with a new audio library 706 on a platform 70. The new library 706 outputs voice data from a microphone to an HDMI (R) port 402 in a pass-through manner and generates duplicate voice data, which it supplies to a karaoke application 80. SELECTED DRAWING: Figure 3

Description

The present invention relates to a karaoke system, a voice input processing device that executes voice input processing, and the like.

Conventionally, home karaoke network systems have been realized in which a communication terminal device such as a game terminal device, a set-top box (hereinafter referred to as "STB"), or a PC (personal computer) is provided with various interfaces such as USB (Universal Serial Bus) and HDMI (registered trademark) (High-Definition Multimedia Interface), and a microphone and a television receiver are connected to those interfaces so that a user can perform karaoke.

Specifically, in this type of home karaoke network system, a communication terminal device such as an STB acquires, from a server device connected to the network, karaoke music data (hereinafter referred to as "music data") that includes accompaniment sound data containing a guide melody for karaoke (hereinafter referred to as "accompaniment sound data") and lyrics data indicating the lyrics of the song.

The communication terminal device then causes the television receiver connected to the interface to output the accompaniment sound and, in synchronization with the accompaniment sound, to display the lyrics. The voice of the user singing along with the accompaniment sound, collected by the microphone (hereinafter referred to as "singing voice"), is also output from the television receiver.

In particular, the collected singing voice is converted by the microphone into audio data of a predetermined format (hereinafter referred to as "dry sound data"). The communication terminal device applies various audio effect processes (for example, reverberation, key change, and voice change) to the dry sound data and causes effect sounds such as reverberation to be output from the television receiver together with the singing voice and the accompaniment sound.

In this kind of karaoke system, the display timing of the lyrics must be synchronized with the output timing of the accompaniment sound, the singing voice, and so on, so that the user can sing without a sense of incongruity. Systems and methods have therefore recently been proposed that acquire in advance the delay time of the lyrics display in the television receiver and delay the output of the singing voice corresponding to the dry sound data output from the microphone, the accompaniment sound, and the effect sound by that delay time, thereby synchronizing the display timing of the lyrics with the output timing of the singing voice, the accompaniment sound, the effect sound, and so on (for example, Patent Document 1).
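The delay-based synchronization described above can be pictured as a fixed delay line on the audio path. The following is only a minimal sketch of that general idea, not the method of Patent Document 1: the class name `DelayLine`, the default sample rate, and the delay value are assumptions chosen for illustration.

```python
from collections import deque


class DelayLine:
    """Delays a stream of audio samples by a fixed number of samples (hypothetical helper)."""

    def __init__(self, delay_ms: float, sample_rate_hz: int = 48_000):
        # Convert the display delay measured in advance into a sample count.
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence so the first real sample appears after the delay.
        self._buf = deque([0] * self.delay_samples)

    def process(self, samples):
        """Push new samples in and return the same number of delayed samples."""
        out = []
        for s in samples:
            self._buf.append(s)
            out.append(self._buf.popleft())
        return out
```

Note that this kind of delay only aligns audio with a late display. It adds latency to the singing voice rather than removing it, which is the limitation the following paragraphs address.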

Patent Document 1: JP 2010-276949 A

However, although the method described in Patent Document 1 synchronizes the display timing with the audio output timing, it does not eliminate the output delay of the singing voice caused by processing inside the communication terminal device. It is therefore insufficient as a way to improve user satisfaction when karaoke is performed at home or in similar settings.

The present invention has been made to solve the above problem, and its object is to provide a karaoke system and the like that can prevent a decrease in user satisfaction caused by the output delay of the singing voice resulting from various kinds of processing in the communication terminal device.

To solve the above problem, the present invention is characterized by comprising: receiving means for receiving a user's instruction input that selects at least one piece of music data from a plurality of pieces of music data; acquisition means for acquiring the selected music data; an interface that is connected to an external device and outputs predetermined input data to the external device in accordance with a predefined procedure; input/output management means that receives, as input audio data, audio data from audio input means which, when a voice uttered by the user is input, converts the voice into audio data of a predetermined format and outputs it, that duplicates the input audio data to generate duplicate audio data, and that outputs one of the input audio data and the duplicate audio data to the interface; and data processing means that acquires the audio data other than the audio data output from the input/output management means to the interface, executes predetermined processing based on the acquired audio data and the acquired music data, and outputs the processed data to the interface.

With this configuration, the present invention can, for example, pass the audio data output from audio input means such as a microphone (that is, the dry sound data) through to the interface, or output duplicate audio data copied from the dry sound directly to the interface. This prevents the decrease in user satisfaction caused by the output delay of the singing voice resulting from various kinds of processing in the communication terminal device.

In general, a karaoke system used as consumer equipment, for example with a home television receiver, acquires the data of the sound (that is, the dry sound) obtained by A/D (analog-to-digital) conversion of the user's singing voice collected by a microphone (the dry sound data) and executes audio effect processing based on the dry sound data. It then mixes the generated effect sound with the singing voice and the accompaniment sound and outputs the mixed data to a display device with audio output, such as a television receiver, connected to the interface.

On the other hand, because the communication terminal device applies effect processing and the like to the dry sound data in digital form, adopting a platform configuration with multiple layers to run the karaoke application increases the delay caused by buffering in each layer.

The present invention therefore outputs the dry sound data or its duplicate audio data directly to the interface. Even on a platform with multiple layers for running a karaoke application, this prevents the output delay of the singing voice caused by the various processes and thus prevents a decrease in user satisfaction.

The voice input processing device and the like of the present invention can prevent a decrease in user satisfaction caused by the output delay of the singing voice resulting from processing in the communication terminal device.

FIG. 1 is a system configuration diagram showing the configuration of one embodiment of the karaoke network system according to the present invention.
FIG. 2 is a block diagram showing the configuration of the communication terminal device of one embodiment.
FIG. 3 is a diagram showing an example of the platform configuration of the communication terminal device of one embodiment.
FIG. 4 is a diagram showing an example of a karaoke background image, including evaluation data, displayed in the karaoke system of one embodiment.
FIG. 5 is a flowchart executed when karaoke is performed in the communication terminal device of one embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below applies the karaoke device, program, karaoke voice reproduction method, and voice input processing device of the present invention to a karaoke network system that includes home karaoke systems and provides karaoke music data over a network so that users can perform karaoke.

[1] Karaoke Network System
First, the general configuration of the karaoke network system 1 of this embodiment is described with reference to FIG. 1. FIG. 1 is a system configuration diagram showing the configuration of the karaoke network system 1 in this embodiment. To keep the figure from becoming cluttered, FIG. 1 shows only some of the karaoke systems 10; the actual karaoke network system 1 contains more karaoke systems 10 than are shown in FIG. 1.

The karaoke network system 1 of this embodiment provides karaoke music data corresponding to a song the user wishes to sing, via the network 20, to a device (for example, a consumer karaoke system) installed in a facility such as the user's home. The device plays back the accompaniment sound data contained in the music data while displaying the lyrics of the song, so that the user can enjoy karaoke.

Specifically, the karaoke network system 1 of this embodiment includes a plurality of karaoke systems 10 that are installed in homes and facilities and used by users, a network 20, and a music providing server device 30 that manages music data and provides the relevant music data in response to requests from the karaoke systems 10.

The network 20 is configured, for example, by interconnecting a public telephone network, including a mobile telephone network, and an IP (Internet Protocol) network. However, the configuration of the network 20 is not limited to this.

The karaoke system 10 of this embodiment includes: a communication terminal device 40 that has interfaces conforming to predetermined standards such as USB or HDMI (registered trademark) and performs processing for allowing the user to perform karaoke; a television receiver 50 that is connected to an output port of the communication terminal device 40 for outputting audio and images, such as an HDMI (registered trademark) port (hereinafter also referred to as a "digital output port"), and outputs audio and images; and a microphone 60 that is connected to an input/output port for data transmission of the communication terminal device 40, such as a USB port, and collects the singing voice uttered by the user.

The communication terminal device 40 is a terminal device such as: a set-top box that receives broadcast signals, for example cable television broadcasts, satellite broadcasts, terrestrial television broadcasts (including digital and analog broadcasts), and IP broadcasts (for example, broadband VOD), and converts them into signals viewable on the television receiver 50; a personal computer; a game device (including a portable game device); or a portable information terminal device (for example, a smartphone or a tablet terminal). It is communicatively connected to the network 20 and, working together with the music providing server device 30, executes processing for allowing the user to perform karaoke.

Specifically, the communication terminal device 40 runs on a platform that is formed by predetermined hardware such as a CPU and memory, a Linux (registered trademark)-based operating system (OS), and applications running on that operating system, and that has a hierarchical structure comprising:
(1) a first layer formed by an interface that includes the various kernel components that function as device drivers for external devices such as the television receiver 50 and the microphone 60 and that manage the input and output of data;
(2) a second layer that sits on the first layer, consists of libraries (programs or groups of programs with specific general-purpose functions) such as a virtual machine, the operating system excluding the kernel, and APIs (application programming interfaces), and controls the interface in the first layer; and
(3) a third layer that sits on the second layer, runs on the APIs, libraries, and operating system (OS), and executes the karaoke application (hereinafter referred to as the "karaoke app") 80 and other applications.
On this platform, the communication terminal device 40 executes the functions for performing karaoke.

As the functions for performing karaoke, the communication terminal device 40 is configured to implement:
(1) a music data acquisition function that receives the user's instruction input and, based on the acquired music data, acquires from the music providing server device 30 the music data corresponding to the song the user wants;
(2) a music data playback and display control function that, based on the acquired music data, causes the television receiver 50 to play the accompaniment sound and to display the lyrics of the song;
(3) an effect processing function that, when the voice uttered by the user in accordance with the accompaniment sound and the lyrics is input, acquires the dry sound data converted into a digital signal by the microphone 60, executes effect processing based on the dry sound data, and generates effect audio data; and
(4) a scoring function that evaluates the user's singing ability based on the dry sound data and the accompaniment sound data contained in the music data, generates image data indicating the evaluation result as evaluation data, and causes the television receiver 50 to display the generated evaluation data.

In particular, when the communication terminal device 40 receives the dry sound data output from the microphone 60 as input audio data, it duplicates the input audio data to generate duplicate audio data. It then outputs one of the two (the input audio data or the duplicate audio data) directly to the television receiver 50 via the device driver and the input/output port, without any processing by an application (that is, the karaoke app 80).

Normally, when an application such as a karaoke app is implemented on the platform structure described above, various processes are executed in every layer, sublayers included, from the first layer, which controls the actual hardware and manages data input and output, up to the top layer, which runs the karaoke app (hereinafter also referred to as the "application layer").

Consequently, the input audio data output from the microphone 60, that is, the dry sound data, is buffered in each layer of the platform, sublayers included, and each buffering step adds a delay on the order of several tens of msec.

In particular, the dry sound data output from the microphone 60 is supplied to the karaoke app running in the top application layer, where effect processing is applied. The effect-processed audio data (hereinafter referred to as "effect audio data") is then mixed with the dry sound data used to produce it, predetermined processing is executed in each layer, sublayers included, and the result is supplied from the lowest-level device driver to the television receiver 50.

As a result, even if transmission over HDMI (registered trademark) and the delay in the television receiver 50 are ignored, the effect audio data and the dry sound data reach the television receiver 50 about 110 to 130 msec after the user sings or speaks.
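The way per-layer buffering accumulates can be illustrated with simple arithmetic: a buffer holding N frames at sample rate f adds roughly N / f seconds before its contents can move on. The sketch below is an illustration only. The sample rate, the buffer sizes, and the number of stages are hypothetical values picked for round numbers and are not taken from this publication, which states only the approximate per-stage and total delays.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time needed to fill a buffer of `frames` samples, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def path_latency_ms(buffer_frames: list[int], sample_rate_hz: int) -> float:
    """Sum the buffering latency of every stage a signal passes through."""
    return sum(buffer_latency_ms(n, sample_rate_hz) for n in buffer_frames)


# Hypothetical figures: six buffering stages on the application path
# (up through the layers and back down), two on a pass-through path.
SAMPLE_RATE_HZ = 48_000        # assumed
app_path = [960] * 6           # assumed: 20 ms per stage
pass_through_path = [960] * 2  # assumed

print(path_latency_ms(app_path, SAMPLE_RATE_HZ))           # 120.0
print(path_latency_ms(pass_through_path, SAMPLE_RATE_HZ))  # 40.0
```

Under these assumptions, the total grows linearly with the number of buffering stages, which is why shortening the path matters more than tuning any single stage.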

Therefore, even if the user sings along with the accompaniment sound output from the television receiver 50, the user's own singing is output late and falls out of step with the accompaniment. This not only makes singing difficult but also sounds unnatural to listeners, and in many cases the scoring function and other auxiliary karaoke functions cannot work to their full potential.

The communication terminal device 40 of this embodiment therefore duplicates the input dry sound data and separates the data used for application data processing from the data output to the television receiver 50 over the minimum necessary path. This reduces the output delay of the dry sound data while allowing the karaoke app to perform its processing accurately, realizing karaoke that prevents a decrease in user satisfaction.

In particular, the communication terminal device 40 provides a new audio library (hereinafter referred to as the "new library") 706 within the platform (specifically, in the second layer). The new library 706 duplicates the dry sound data input from the microphone 60 via the USB port 401, the Kernel 701, and the Alsa-Lib 702. It outputs either the dry sound data used as the source of the copy or the dry sound data generated by the copy (the duplicate audio data) directly to the television receiver 50 (pass-through output), and supplies the other to the karaoke app 80.

In this embodiment, the dry sound data used as the source of the copy may be output directly to the television receiver 50 via the interface, and the duplicated dry sound data may be used by the karaoke app 80.
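The duplicate-and-pass-through behavior described for the new library 706 can be sketched as a simple "tee" on the capture path. This is a minimal illustration under stated assumptions, not the library's actual implementation: the class `PassThroughTee`, its method `on_capture`, and the two sink callbacks are hypothetical names, and a real implementation would sit on top of the audio driver rather than Python callables.

```python
from typing import Callable, Union

Chunk = Union[bytes, bytearray]
Sink = Callable[[Chunk], None]


class PassThroughTee:
    """Forwards each captured chunk to two paths (hypothetical sketch).

    direct_out: low-latency path toward the output port (pass-through).
    app_in:     path toward the karaoke application (effects, scoring).
    """

    def __init__(self, direct_out: Sink, app_in: Sink):
        self._direct_out = direct_out
        self._app_in = app_in

    def on_capture(self, chunk: Chunk) -> None:
        # Take an immutable snapshot so the application path is unaffected
        # if the capture buffer is reused or modified afterwards.
        duplicate = bytes(chunk)
        # The original chunk goes straight out first, without waiting
        # for any application-level processing.
        self._direct_out(chunk)
        # The duplicate is handed to the application path.
        self._app_in(duplicate)
```

Here the original chunk takes the direct path and the copy goes to the application, matching the variant in the preceding paragraph; swapping the two arguments in `on_capture` would give the other variant the text allows.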

The television receiver 50 is a consumer television receiver that has at least an HDMI (registered trademark) port as a digital input port. It includes a display element such as a liquid crystal panel or a plasma display panel, a drive circuit that drives the display element, speakers, and the like. Based on the audio data and image data supplied via, for example, the HDMI (registered trademark) port of the communication terminal device 40, it outputs the singing voice, the accompaniment sound, and the effect sound and displays the lyrics and the evaluation data.

The microphone 60 is, for example, a USB microphone. It collects the user's singing voice, A/D converts it to generate audio data (that is, dry sound data), and outputs the audio data to the communication terminal device 40.

The music providing server device 30 has a music database (hereinafter, "database" is abbreviated as "DB") 300 in which (1) music data that can be provided to the karaoke systems 10, (2) a music ID for identifying each song, and (3) music attribute information including the song title, genre information indicating the genre to which the song belongs, and the artist name are recorded in association with one another. It is configured to distribute the relevant music data to the communication terminal device 40 in response to a request received from the communication terminal device 40 via the network 20 (hereinafter referred to as a "music data acquisition request").

In particular, the music providing server device 30 searches the music DB 300 based on the music data acquisition request, generates a list of the songs found by the search (for example, a list of singer names or song titles), and provides it to the requesting communication terminal device 40. It then reads the music data of the song the user selects from the list out of the music DB 300 and distributes it to the communication terminal device 40.
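The search step can be sketched with an in-memory stand-in for the music DB 300. The record fields mirror the attributes named above (music ID, title, genre, artist); everything else, including the names `SongRecord` and `search_songs` and the matching rules, is an assumption made for illustration and says nothing about how the actual server is built.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SongRecord:
    """One entry of a hypothetical in-memory music DB."""
    music_id: str
    title: str
    genre: str
    artist: str


def search_songs(db: Iterable[SongRecord],
                 title: Optional[str] = None,
                 artist: Optional[str] = None,
                 genre: Optional[str] = None) -> List[SongRecord]:
    """Return the records matching every criterion that was supplied.

    Title and artist match on case-insensitive substrings; genre must
    match exactly (case-insensitive). These rules are assumptions.
    """
    def matches(rec: SongRecord) -> bool:
        return ((title is None or title.lower() in rec.title.lower())
                and (artist is None or artist.lower() in rec.artist.lower())
                and (genre is None or genre.lower() == rec.genre.lower()))

    return [rec for rec in db if matches(rec)]
```

The returned list corresponds to the song list sent back to the terminal; the music ID of the entry the user picks would then be used to read the music data itself.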

The music data recorded in the music DB 300 includes lyrics data corresponding to the lyrics of the song and accompaniment sound data for karaoke. The accompaniment sound data is in, for example, MIDI (Musical Instrument Digital Interface) format and includes guide melody data and data corresponding to each part, such as guitar.

[2] Communication Terminal Device
Next, the configuration of the communication terminal device 40 of this embodiment is described with reference to FIGS. 2 to 4. FIG. 2 is a block diagram showing the configuration of the communication terminal device 40 of this embodiment, and FIG. 3 is a diagram showing the platform configuration of the communication terminal device 40. FIG. 4 is a diagram showing a display example of a karaoke background image including evaluation data.

As shown in FIG. 2, the communication terminal device 40 of this embodiment includes: an interface unit 400 that mediates the exchange of data with external devices; a communication control unit 410; a display control unit 440 that works together with the terminal management control unit 470 and the program management control unit 480 to generate drawing data to be supplied to the television receiver 50; an operation unit 460; a terminal management control unit 470; and a program management control unit 480 that executes various applications including the karaoke app 80. These units are interconnected via a bus B so that data can be exchanged between them.

The operation unit 460 can also be implemented with a remote control. In that case, the operation unit 460 may be provided with an infrared receiver or a ZigBee receiver and configured to supply commands based on the received signal to the bus B.

The interface unit 400 is provided with a USB port 401 and an HDMI (registered trademark) port 402, as well as input/output control chips that correspond to each interface, such as device drivers, and that execute data conversion and other processing in accordance with the USB and HDMI (registered trademark) standards.

In particular, as shown in FIG. 3, on the platform 70 the interface unit 400 of this embodiment has a Kernel 701 (for example, that of Linux (registered trademark)) and an Alsa-Lib 702 that function as device drivers, and constitutes the first layer of the platform 70.

The Kernel 701 and the Alsa-Lib 702 have the same functions as their conventional counterparts, so a detailed description is omitted. The interface unit 400 including the Kernel 701 and the Alsa-Lib 702 of this embodiment constitutes, for example, the first layer (interface) of the present invention.

The microphone 60, for example, is connected to the USB port 401, and the dry sound data output from the microphone 60 is output to the bus B. The television receiver 50 is connected to the HDMI (registered trademark) port 402, through which image data and audio data are output to the television receiver 50.

The interface unit 400, together with part of the program management control unit 480, for example, constitutes the interface of the present invention.

The communication control unit 410 is a predetermined network interface. It is communicatively connected to the network 20 by wire or wirelessly and exchanges data with the music providing server device 30 via the network 20.

The display control unit 440 works together with the terminal management control unit 470 and the program management control unit 480. Based on the lyrics data and evaluation data supplied from the program management control unit 480, it generates drawing data for displaying images showing the lyrics and the evaluation data over the karaoke background image on the television receiver 50, and supplies the drawing data to the television receiver 50 via the HDMI (registered trademark) port 402.

The recording unit 450 is composed of a non-volatile memory such as an EEPROM (Electrically Erasable Programmable Read-Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive). Its recording area contains a program recording unit 451, in which programs for forming the platform, such as libraries and the operating system (OS), are recorded; an application recording unit 452, in which the karaoke application 80 and other applications are recorded; a music data recording unit 453; and a RAM 454 used as a work area.

The program recording unit 451 records the various data used to construct the platform 70, which controls the interface unit 400 having the USB port 401 and the HDMI (registered trademark) port 402 and operates the program management control unit 480. The platform 70 of this embodiment is described later.

The music data recording unit 453 records the music data that the communication terminal device 40 has acquired (downloaded) from the music providing server device 30, in association with the corresponding music ID and music attribute information, so that the acquired music data can be used when performing karaoke.

The terminal management control unit 470 is composed mainly of a central processing unit (CPU) and includes various input/output ports such as a key input port and a display control port. By executing the programs and applications recorded in the program recording unit 451 and the application recording unit 452, it controls the overall functions of the communication terminal device 40 in an integrated manner.

The program management control unit 480 is composed of the same CPU as the terminal management control unit 470 or of an independent CPU. Under the control of the terminal management control unit 470, and working together with the RAM 454 and the interface unit 400, the program management control unit 480 executes programs such as the operating system (OS) recorded in the program recording unit 451 to construct the platform 70. It then executes the karaoke application 80 in the environment provided by the platform 70 to realize the karaoke functions.

In particular, the program management control unit 480 has a program execution unit 481 and an application execution unit 482. The program execution unit 481 executes a Linux (registered trademark)-based operating system (OS) such as Android, along with other necessary programs, in order to construct the platform 70 together with hardware such as the CPU and the interface unit 400 while the communication terminal device 40 is running. The application execution unit 482 executes the karaoke application 80, which runs on that operating system.

At a predetermined timing, such as when the power is turned on, the program execution unit 481 works together with the interface unit 400, reads the Linux (registered trademark)-based operating system (OS) such as Android and other necessary programs from the program recording unit 451, and forms the second layer of the platform 70 shown in FIG. 3.

In particular, by executing the operating system (OS), the program execution unit 481 forms HAL 703, Audio-Fringer 704 and OpenSLES 705, and executes the functions of the OS other than certain OS functions such as the device drivers.

Specifically, the program execution unit 481 controls the interface unit 400 (that is, the device drivers). Based on operation of the operation unit 460 or control by the application execution unit 482, it controls the input and output of dry sound data, music data and other data, and executes the processing required by the various processes in the application execution unit 482.

As a characteristic configuration of this embodiment, the program execution unit 481 also forms a new library 706 on the second layer and controls its execution. The new library 706 receives the dry sound data supplied from the microphone 60 through the USB port 401 and through Kernel 701 and Alsa-Lib 702 (that is, the device drivers) included in the interface unit 400. It passes that dry sound data directly through to the HDMI (registered trademark) port 402 via Alsa-Lib 702 and Kernel 701, and at the same time duplicates the dry sound data and supplies the resulting duplicate audio data to the karaoke application 80.

In other words, the program execution unit 481 is configured to supply the dry sound data input to the USB port 401 directly to the HDMI (registered trademark) port 402, without passing it through HAL 703, Audio-Fringer 704 or OpenSLES 705 of the platform 70 of this embodiment.
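The pass-through and duplication performed by the new library 706 can be pictured with a minimal sketch. The names below (`tee_dry_frame`, `output_sink`, `app_queue`) are hypothetical and are not taken from the patent or from any Android or ALSA API; the sketch only assumes that dry sound arrives as frames of PCM bytes.

```python
from collections import deque

def tee_dry_frame(frame: bytearray, output_sink: list, app_queue: deque) -> None:
    """Forward one dry-sound frame to the output path and hand a copy to the app.

    Hypothetical stand-in for the role of the new library 706: `output_sink`
    plays the part of the Alsa-Lib/Kernel path to the HDMI port, and
    `app_queue` the path to the karaoke application.
    """
    output_sink.append(frame)        # pass-through: no intermediate sublayers
    app_queue.append(bytes(frame))   # independent copy for effect and scoring

output_sink: list = []
app_queue: deque = deque()
for raw in (b"\x01\x02", b"\x03\x04"):
    tee_dry_frame(bytearray(raw), output_sink, app_queue)
```

The copy is made only for the application path, so the output path never waits on effect or scoring work.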

With this configuration, this embodiment can effectively suppress the delay between the user's utterance and the output of the sound data to the television receiver 50, and can also execute the effect processing and scoring processing accurately on the basis of the duplicate audio data. User satisfaction when performing karaoke is thereby improved.

In particular, when the delay of the singing voice output from the television receiver 50 was measured on the platform 70 of this embodiment, the delay was 75 to 85 msec. Compared with the delay of about 110 to 130 msec in a conventional karaoke system, this confirms that the delay of the singing voice can be shortened by about 22 to 42%.
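The quoted reduction range can be checked with a short calculation. The patent does not say how the endpoints were paired; the sketch below assumes the smallest reduction compares the slowest new result with the fastest conventional one, and the largest reduction the reverse.

```python
new_ms = (75.0, 85.0)             # measured delay range in this embodiment
conventional_ms = (110.0, 130.0)  # approximate delay range of a conventional system

# Smallest reduction: slowest new result against the fastest conventional one.
min_reduction = 1.0 - new_ms[1] / conventional_ms[0]
# Largest reduction: fastest new result against the slowest conventional one.
max_reduction = 1.0 - new_ms[0] / conventional_ms[1]

print(f"{min_reduction:.1%} to {max_reduction:.1%}")  # prints "22.7% to 42.3%"
```

Under that assumption the result agrees with the stated range of about 22 to 42%.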

Furthermore, the time required to generate the duplicate audio data can be made shorter than the sum of the buffering times in the sublayers of the platform 70. The program execution unit 481 may therefore output the duplicate audio data, instead of the dry sound data, to the HDMI (registered trademark) port 402 via Alsa-Lib 702 and Kernel 701.

HAL 703, Audio-Fringer 704 and OpenSLES 705, which make up the platform 70, function in the same way as in the related art, so their details are omitted. In this embodiment, for example, HAL 703, Audio-Fringer 704, OpenSLES 705 and the new library 706 constitute the second layer of the present invention, and the new library 706 constitutes the input/output management means of the present invention.

By executing the karaoke application 80 on the running OS, the application execution unit 482 realizes a music data acquisition management unit 483 that acquires music data, an effect processing unit 484 that executes effect processing, a karaoke processing unit 485 that plays back and displays music based on the music data, and a scoring processing unit 486 that scores the karaoke performance.

In particular, the application execution unit 482 works together with the program execution unit 481. At a predetermined timing, such as when the power is turned on, it reads the karaoke application 80 from the application recording unit 452, forms the third layer of the platform 70 shown in FIG. 3, and realizes the units listed above. In this embodiment, for example, the karaoke application 80 constitutes the third layer of the present invention.

The music data acquisition management unit 483 executes processing for acquiring music data from the music providing server device 30 via the communication control unit 410, and records the acquired music data in the music data recording unit 453 in association with the corresponding music ID and related information.

The effect processing unit 484 executes audio effect processing based on the duplicate audio data supplied from the new library 706.

For example, the effect processing unit 484 generates a reverberation sound based on the duplicate audio data, generates effect sound data by convolving that reverberation sound, and outputs the effect sound data to the HDMI (registered trademark) port 402 via the platform 70.
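A reverberation effect of this kind can be sketched as a convolution of the voice signal with an impulse response. The patent does not disclose the actual reverberation algorithm; the function name `convolve` and the three-tap impulse response below are assumptions made only for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

dry = [1.0, 0.0, 0.0, 0.0]            # a single click as the duplicated voice
impulse_response = [0.0, 0.5, 0.25]   # hypothetical: echoes only, no direct sound
wet = convolve(dry, impulse_response)
```

Leaving the direct sound out of the impulse response matches the arrangement described here, in which the dry sound reaches the output by its own path and only the effect component comes from the application.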

A delay in the effect sound does not lower user satisfaction. The output delay caused by buffering in these sublayers is therefore not a problem, even when the effect sound data is output to the HDMI (registered trademark) port 402 via HAL 703, Audio-Fringer 704 and OpenSLES 705.

The karaoke processing unit 485 acquires music data from the music data recording unit 453 in response to operation of the operation unit 460, converts the MIDI-format accompaniment sound data contained in the acquired music data into sound data of a predetermined format (hereinafter simply "accompaniment sound data"), and supplies it to the HDMI (registered trademark) port 402 via the platform 70.

At this time, in the platform 70, the accompaniment sound data is buffered in each sublayer, and Kernel 701 or Alsa-Lib 702 combines the accompaniment sound data with the dry sound data and supplies the result to the HDMI (registered trademark) port 402.

Working together with the display control unit 440, the karaoke processing unit 485 also supplies the lyrics data contained in the music data to the HDMI (registered trademark) port 402 via the platform 70.

The scoring processing unit 486 scores the user's singing ability based on the duplicate audio data supplied from the new library 706 and the accompaniment sound data contained in the music data acquired by the karaoke processing unit 485, and generates evaluation data indicating the scoring result. Working together with the display control unit 440, it also generates drawing data for displaying a predetermined image based on the generated evaluation data.

Specifically, the scoring processing unit 486 compares how the scale and other properties of the singing voice corresponding to the duplicate audio data vary over time with the scale, note values and rests contained in the MIDI-format accompaniment sound data, and generates evaluation data indicating the result of the comparison.

In particular, as shown in FIG. 4, the scoring processing unit 486 sets the scale along the vertical axis of a staff and time along the horizontal axis. It represents the scale, note values and rests indicated by the accompaniment sound data as predetermined rectangles on the staff (see FIG. 4), and generates evaluation data (also called piano roll data) in which the variation of the singing voice is drawn as a solid line.

In other words, as shown in FIG. 4, the scoring processing unit 486 generates, as the evaluation data, drawing data containing an image that shows how far the pitch of the singing voice deviates from the scale and other properties indicated by the accompaniment sound. The image is drawn over a video containing the lyrics image or another karaoke background image (hereinafter "karaoke background image").
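The comparison between the sung pitch and the guide notes can be sketched as follows. The patent does not disclose a scoring formula; representing pitch as MIDI note numbers, the tolerance of half a semitone, and the name `score_pitch` are all assumptions made for illustration.

```python
def score_pitch(guide_notes, sung_track, tolerance=0.5):
    """Return the fraction of sung samples that match the guide note active at that time.

    guide_notes: (start_s, end_s, midi_note) rectangles, as on a piano roll.
    sung_track:  (time_s, midi_pitch) samples of the detected singing pitch.
    Samples that fall in a rest (no active guide note) are not judged.
    """
    judged = hits = 0
    for t, pitch in sung_track:
        for start, end, note in guide_notes:
            if start <= t < end:
                judged += 1
                if abs(pitch - note) <= tolerance:
                    hits += 1
                break
    return hits / judged if judged else 0.0

guide_notes = [(0.0, 1.0, 60), (1.0, 2.0, 62)]
sung_track = [(0.25, 60.2), (0.75, 61.0), (1.5, 62.0), (2.5, 64.0)]
score = score_pitch(guide_notes, sung_track)   # 2 of the 3 judged samples match
```

The rectangles in `guide_notes` correspond to the rectangular note shapes of FIG. 4, and `sung_track` to the solid line drawn for the singing voice.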

The scoring processing unit 486 then supplies the evaluation data to the HDMI (registered trademark) port 402 via the platform 70, in cooperation with the display control unit 440. Specifically, the scoring processing unit 486 controls the display control unit 440 so that it generates drawing data based on the evaluation data and the lyrics data supplied from the karaoke processing unit 485, and supplies the generated drawing data to the HDMI (registered trademark) port 402 via the platform 70.

When the television receiver 50 receives the drawing data output from the HDMI (registered trademark) port 402, it displays an image such as the one illustrated in FIG. 4.

[3] Operation of Karaoke System
Next, the operation of the communication terminal device 40 of the karaoke system 10 of this embodiment is described with reference to FIG. 5. FIG. 5 is a flowchart showing the processing executed by the communication terminal device 40 of this embodiment.

In this operation, it is assumed that the microphone 60 has already been connected to the USB port 401 of the communication terminal device 40 and that the television receiver 50 has already been connected to the HDMI (registered trademark) port 402.

It is also assumed that, working together with the interface unit 400, the program execution unit 481 is executing the operating system (OS) and the application execution unit 482 is executing the karaoke application 80, so that the platform 70 shown in FIG. 3 has been formed. It is further assumed that the application execution unit 482 (specifically, the music data acquisition management unit 483) has already acquired the music data desired by the user and recorded it in the music data recording unit 453.

First, when the karaoke processing unit 485 detects a predetermined input operation by the user on the operation unit 460 (step S101), it reads the corresponding music data from the music data recording unit 453 and starts playback of the music and of a predetermined image (step S102).

Specifically, the karaoke processing unit 485 generates audio data of a predetermined format based on the MIDI-format accompaniment sound data contained in the read music data, and starts outputting the generated audio data to the HDMI (registered trademark) port 402 via the platform 70.

In cooperation with the display control unit 440, the karaoke processing unit 485 then generates drawing data for displaying the karaoke background image and, over it, an image of the lyrics data contained in the music data, and starts outputting the generated drawing data to the HDMI (registered trademark) port 402 via the platform 70.

Once playback of the music has started, each unit repeats the following processing until the music ends.

First, when the new library 706 receives, via Kernel 701 and Alsa-Lib 702, the audio data collected by the microphone 60, A/D converted, and output from the USB port 401 (that is, the dry sound data) (step S103), it duplicates the dry sound data to generate duplicate audio data (step S104).

Next, the new library 706 outputs the dry sound data received via Kernel 701 and Alsa-Lib 702 back to the device drivers, Kernel 701 and Alsa-Lib 702, and outputs the duplicate audio data copied from the dry sound data to the effect processing unit 484 (step S105).

Next, the effect processing unit 484 executes effect processing on the duplicate audio data output from the new library 706 to generate effect sound data (step S106), and outputs the audio data generated by the effect processing (that is, the effect sound data) to the first layer (Kernel 701 and Alsa-Lib 702) via the second layer (HAL 703, Audio-Fringer 704 and OpenSLES 705) (step S107).

Next, the device drivers, Kernel 701 and Alsa-Lib 702, combine the dry sound data output from the new library 706 with the effect sound data output from the effect processing unit 484 (step S108), and output the result to the television receiver 50 via the HDMI (registered trademark) port 402 (step S109).

The effect sound data, which the effect processing unit 484 generates from the output of the new library 706, reaches the device drivers (Kernel 701 and Alsa-Lib 702) by way of the second layer (HAL 703, Audio-Fringer 704 and OpenSLES 705), so it lags behind the dry sound data as described above. The device drivers therefore combine the dry sound data and the effect sound data as they stand at the time each is acquired. The device drivers also combine the accompaniment sound data with the dry sound data and the effect sound data as appropriate.
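The combining step can be sketched as a sample-wise sum of whatever frames are available at that moment. The patent does not specify the sample format or mixing rule; the sketch below assumes 16-bit PCM samples and simple summation with clipping, and `mix_frames` is a hypothetical name.

```python
def mix_frames(*frames):
    """Sum equally long frames of 16-bit samples, clipping to the int16 range."""
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        mixed.append(max(-32768, min(32767, total)))
    return mixed

dry = [1000, -2000, 30000]        # current dry-sound frame
effect = [100, -200, 5000]        # derived from earlier dry frames, so it lags behind
accompaniment = [10, 20, 30]
out = mix_frames(dry, effect, accompaniment)   # last sample clips at 32767
```

No attempt is made to re-align the effect frame with the dry frame it was derived from, which reflects the statement that the data is combined at the timing at which it is acquired.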

The accompaniment sound data may instead be combined with the effect sound data in the second layer (HAL 703, Audio-Fringer 704 and OpenSLES 705), or in the third layer, which executes the application. In that case, the device drivers combine the dry sound data with the already combined effect sound data and accompaniment sound data.

Meanwhile, in parallel with steps S105 to S108, the scoring processing unit 486 executes scoring processing at a predetermined timing based on the accompaniment sound data contained in the music data and on the duplicate audio data, and generates evaluation data (step S111). It outputs the generated evaluation data to the HDMI (registered trademark) port 402 and the television receiver 50 via the second layer (HAL 703, Audio-Fringer 704 and OpenSLES 705) and the first layer (Kernel 701 and Alsa-Lib 702) (step S112).

When the television receiver 50 receives the evaluation data, it displays an image such as that shown in FIG. 4 together with the other image data, while outputting the music data, the dry sound data and the effect-processed data as sound.

Finally, the karaoke processing unit 485 determines whether the accompaniment sound data has ended or an end operation has been made on the operation unit 460 (that is, whether playback has ended) (step S110). If it determines that neither the end of the accompaniment sound data nor an end operation on the operation unit 460 has been detected, the processing returns to step S103. If it determines that the end of the accompaniment sound data or an end operation on the operation unit 460 has been detected, this operation ends.
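The repeated part of the flowchart (steps S103 to S110) can be summarized in a simplified sketch. It is an illustration only: `run_song`, `apply_effect` and `mix` are hypothetical names, the scoring branch (S111 and S112) is omitted, and the extra delay of the effect path is ignored.

```python
def run_song(mic_frames, apply_effect, mix):
    """Process microphone frames until the song ends (here: frames are exhausted)."""
    output = []
    for dry in mic_frames:                    # S103: accept a dry-sound frame
        duplicate = list(dry)                 # S104: duplicate it
        effect = apply_effect(duplicate)      # S105-S107: effect processing on the copy
        output.append(mix(dry, effect))       # S108-S109: combine and output
    return output                             # S110: loop ends with the song

frames = [[1, 2], [3, 4]]
result = run_song(
    frames,
    apply_effect=lambda f: [x // 2 for x in f],
    mix=lambda a, b: [x + y for x, y in zip(a, b)],
)
```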

As described above, in the karaoke system 10 of this embodiment, the new library 706 is provided on the platform 70 of the communication terminal device 40. It passes the dry sound data through to Alsa-Lib 702 and also generates duplicate audio data and supplies it to the karaoke application 80. This shortens the time by which the output of the singing voice from the television receiver 50 lags behind the user's utterance, and so improves the satisfaction of the user performing karaoke.

[4] Modifications
[4.1] Modification 1
In the above embodiment, effect sound and other audio output from the karaoke application 80 to the HDMI (registered trademark) port 402 is supplied via OpenSLES 705, Audio-Fringer 704 and HAL 703. Alternatively, it may bypass them and be output to the HDMI (registered trademark) port 402 via Alsa-Lib 702 and Kernel 701 only.

In that case, a new library may be provided in their place, and the audio may be supplied to Alsa-Lib 702 through that library.

This configuration reduces the output delay of the effect sound and other audio, further improving user satisfaction.

[4.2] Modification 2
In the above embodiment, the music DB 300 is managed by the music providing server device 30. However, another computer system may be used to manage the music DB 300.

[4.3] Modification 3
In the above embodiment, the platform structure described above is applied to the communication terminal device 40 used in the karaoke system 10. It is not limited to the karaoke system 10 or its communication terminal device 40, however, and can also be applied to a voice input processing device, such as a public address device, that performs input processing on audio data collected by the microphone 60 and A/D converted.

For example, such a voice input processing device is applied to a device that, when a user inputs voice while adjusting the sound field of a karaoke box or other venue, outputs the dry sound and also processes a duplicate of the dry sound to display its waveform. This configuration prevents a discrepancy between the waveform display and the sound output from the speaker.

DESCRIPTION OF SYMBOLS
1 ... Karaoke network system
10 ... Karaoke system
20 ... Network
30 ... Music providing server device
40 ... Communication terminal device
50 ... Television receiver
60 ... Microphone
70 ... Platform
80 ... Karaoke application
410 ... Communication control unit
440 ... Display control unit
450 ... Recording unit
451 ... Program recording unit
452 ... Application recording unit
453 ... Music data recording unit
454 ... RAM
470 ... Terminal management control unit
480 ... Application execution unit
481 ... Music data acquisition management unit
482 ... Effect processing unit
483 ... Karaoke processing unit
484 ... Scoring processing unit
701 ... Kernel
702 ... Alsa-Lib
703 ... HAL
704 ... Audio-Fringer
705 ... OpenSLES
706 ... New library

Claims

A karaoke system comprising:
receiving means for receiving an instruction input from a user who selects at least one piece of music data from a plurality of pieces of music data;
obtaining means for obtaining the selected music data;
an interface that is connected to an external device and outputs predetermined input data to the external device in accordance with a predetermined procedure;
input/output management means for accepting, as input audio data, audio data from voice input means that, when a voice uttered by the user is input, converts the voice into audio data of a predetermined format and outputs it, for duplicating the input audio data to generate duplicate audio data, and for outputting either the input audio data or the duplicate audio data to the interface; and
data processing means for acquiring, from the input/output management means, the audio data other than the audio data output to the interface, executing predetermined processing based on the acquired audio data and the obtained music data, and outputting the processed predetermined data to the interface.

The karaoke system according to claim 1,
having a platform structure that includes at least:
a first layer having the interface, which functions as a device driver for the external device;
a second layer built on the first layer and having the input/output management means; and
a third layer built on the second layer and having the data processing means,
wherein the second layer has sound control means for controlling the interface based on sound-related data output from the data processing means, and
the input/output management means outputs either the input audio data or the duplicate audio data directly to the interface.

The karaoke system according to claim 1 or 2,
wherein the interface combines the sound data processed by the data processing means with the audio data output from the input/output management means, and outputs the result to the external device.

The karaoke system according to any one of claims 1 to 3,
wherein the interface converts input data into digital data of the predetermined format and outputs the converted digital data to the external device.

The karaoke system according to any one of claims 1 to 4,
wherein the data processing means executes, as the predetermined processing, processing that generates effect sound data corresponding to an effect sound obtained by applying audio effect processing to the input audio data, and
outputs the generated effect sound data to the interface.

The karaoke system according to any one of claims 1 to 5,
wherein the music data includes at least guide melody data including at least one of the scale, note values, and rests of the corresponding music as it is to be sung, and
the data processing means executes scoring processing that evaluates the singing ability of the user based on the guide melody data and the input audio data.

The karaoke system according to claim 6,
wherein the data processing means, as the scoring processing,
compares the variation of the voice corresponding to the input audio data or the duplicate audio data with the scale, note values, and rests included in the guide melody data,
generates image data showing the comparison result as result data, and
outputs the generated result data to the interface.

A program causing a computer to function as:
receiving means for receiving an instruction input from a user who selects at least one piece of music data from a plurality of pieces of music data;
obtaining means for obtaining the selected music data;
an interface that is connected to an external device and outputs predetermined input data to the external device in accordance with a predetermined procedure;
input/output management means for accepting, as input audio data, audio data from voice input means that, when a voice uttered by the user is input, converts the voice into audio data of a predetermined format and outputs it, for duplicating the input audio data to generate duplicate audio data, and for outputting either the input audio data or the duplicate audio data to the interface; and
data processing means for acquiring, from the input/output management means, the audio data other than the audio data output to the interface, executing predetermined processing based on the acquired audio data and the obtained music data, and outputting the processed predetermined data to the interface.

A karaoke voice reproduction method for reproducing karaoke voice, comprising:
a first step of receiving an instruction input from a user selecting at least one piece of music data from among a plurality of pieces of music data;
a second step of obtaining the selected music data;
a third step of accepting, as input voice data, voice data from voice input means that converts a voice uttered by the user into voice data of a predetermined format and outputs it;
a fourth step of duplicating the input voice data to generate duplicate voice data; and
a fifth step of outputting either the input voice data or the duplicate voice data to an external device via an interface, acquiring the voice data other than the voice data output to the external device via the interface, executing predetermined processing based on the acquired voice data and the obtained music data, and outputting the resulting predetermined data to the interface.
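
As a non-limiting illustration, the third to fifth steps can be sketched as below. The class name `InputOutputManager`, the callback-based sinks, and the use of a mutable `bytearray` as the incoming buffer are assumptions made for this sketch. Here the original buffer is passed through to the interface and the copy is handed to the application; the claims equally permit the reverse assignment.

```python
from typing import Callable


class InputOutputManager:
    """Passes voice data through to the interface and hands a copy to the app."""

    def __init__(self,
                 to_interface: Callable[[bytearray], None],
                 to_application: Callable[[bytes], None]) -> None:
        self._to_interface = to_interface      # e.g. the output port driver
        self._to_application = to_application  # e.g. the karaoke application

    def on_voice_data(self, frame: bytearray) -> None:
        # Fourth step: duplicate the input voice data (independent copy).
        duplicate = bytes(frame)
        # Fifth step, first half: output one of the two without further
        # processing, so the sung voice is not held up by the application.
        self._to_interface(frame)
        # Fifth step, second half: the other copy goes to data processing.
        self._to_application(duplicate)
```

Because the application receives its own copy, any effect or scoring work it performs does not delay the pass-through path.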

A voice input processing device for processing input voice, comprising:
an interface that is connected to an external device and outputs predetermined input data to the external device in accordance with a predetermined procedure;
input/output management means for accepting, as input voice data, voice data from voice input means that, when a voice uttered by the user is input, converts the voice into voice data of a predetermined format and outputs it, duplicating the input voice data to generate duplicate voice data, and outputting either the input voice data or the duplicate voice data to the interface; and
data processing means for acquiring the voice data other than the voice data output from the input/output management means to the interface, and executing predetermined processing based on the acquired voice data.

A program for processing voice input to a computer, the program causing the computer to function as:
an interface that is connected to an external device and outputs predetermined input data to the external device in accordance with a predetermined procedure;
input/output management means for accepting, as input voice data, voice data from voice input means that, when a voice uttered by the user is input, converts the voice into voice data of a predetermined format and outputs it, duplicating the input voice data to generate duplicate voice data, and outputting either the input voice data or the duplicate voice data to the interface; and
data processing means for acquiring the voice data other than the voice data output from the input/output management means to the interface, and executing predetermined processing based on the acquired voice data.