JP6070952B2

JP6070952B2 - Karaoke device and karaoke program

Info

Publication number: JP6070952B2
Application number: JP2013269370A
Authority: JP
Inventors: 隆喜愛葉
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2013-12-26
Filing date: 2013-12-26
Publication date: 2017-02-01
Anticipated expiration: 2033-12-26
Also published as: JP2015125268A

Description

The present invention relates to a karaoke apparatus that lets a user enjoy singing along with a performance, and to a karaoke program that can be executed on various information processing apparatuses such as personal computers.

In the field of speech synthesis, technical progress has made it possible to synthesize not only the reading of text but also the singing of songs. Such synthesized singing, distributed over the Internet and other media, is gaining popularity just as conventional human singing has.

Patent Document 1 discloses, for a speech synthesis system that sings using such speech synthesis technology, how to extract the parameters needed for speech synthesis. The system acquires the speech parameters used for synthesis from the user's singing voice, identifies an emotion (bright or dark) from the mood of the song, and stores the speech parameters in association with that emotion. Using speech parameters acquired in this way makes it possible to diversify the speaker characteristics of the synthesized output.

JP 2013-114191 A (Japanese Unexamined Patent Application Publication No. 2013-114191)

Diversifying the characteristics of the speaker, or in singing, enriching the expressiveness of the singing, requires a large amount of speech synthesis data. In Patent Document 1, for example, speech parameters are acquired separately for each emotion even for the same pronunciation name "あ" (a). Patent Document 1 makes it possible to acquire speech synthesis data during singing, but it does not disclose which songs should be sung in order to collect the required data, so the speech synthesis data could not be collected efficiently.

Furthermore, in conventional speech synthesis, acquiring speech synthesis data for a given speaker requires that speaker to read out meaningless sentences at length, which is a heavy burden. Improving the quality of speech synthesis thus requires a large number of speech parameters, and the workload and burden placed on the speaker have been large.

To solve the problems described above, the karaoke apparatus according to the present invention employs the following configuration.
A karaoke apparatus comprising a performance unit and a control unit, wherein
the performance unit is capable of giving a performance based on performance information,
the control unit is capable of executing a performance process, a collection process, and a recommendation process,
the performance process causes the performance unit to play the performance information corresponding to a designated song,
the collection process collects speech synthesis data from the singing voice information input while the performance process is running, associates the collected speech synthesis data with the pronunciation names in the lyrics information of the song being played, and stores it in a per-user speech synthesis database, and
the recommendation process extracts the speech synthesis data that is missing from the per-user speech synthesis database as missing speech synthesis data, and recommends, as a song to be played, a song whose lyrics information contains a pronunciation name corresponding to the missing speech synthesis data.
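The recommendation process in the configuration above can be sketched as a set operation. This is only an illustrative sketch, not the implementation described in the patent: the `Song` class, the function names, and the idea of passing in a fixed set of required pronunciation names are all assumptions introduced for the example.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Song:
    # Hypothetical record: a song ID plus the pronunciation names
    # that appear in its lyrics information.
    song_id: str
    pronunciations: set[str]


def missing_pronunciations(required: set[str], collected: set[str]) -> set[str]:
    """Pronunciation names that have no speech synthesis data yet."""
    return required - collected


def recommend_songs(songs: list[Song], required: set[str],
                    collected: set[str]) -> list[Song]:
    """Return songs whose lyrics contain at least one missing pronunciation name."""
    missing = missing_pronunciations(required, collected)
    return [s for s in songs if s.pronunciations & missing]
```

A song is a candidate as soon as its lyrics share one name with the missing set; once nothing is missing, no song is recommended.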

Further, in the karaoke apparatus according to the present invention,
the recommendation process recommends a song that contains, in its lyrics information, a pronunciation name corresponding to the missing speech synthesis data and that also matches the user information.

Further, in the karaoke apparatus according to the present invention,
the recommendation process gives priority, among the songs eligible for recommendation, to songs that contain more of the missing speech synthesis data.
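One way to read the prioritization above is as a sort on how many missing pronunciation names each candidate song covers. The sketch below assumes that interpretation; the function name, the dictionary layout, and the tie-break by song ID are hypothetical choices, not taken from the patent.

```python
def rank_by_missing_coverage(song_lyrics: dict[str, set[str]],
                             missing: set[str]) -> list[str]:
    """Order song IDs by how many missing pronunciation names their lyrics contain.

    song_lyrics maps a song ID to the pronunciation names in its lyrics.
    Songs covering no missing name are dropped; ties are broken by song ID.
    """
    scored = [(len(names & missing), song_id)
              for song_id, names in song_lyrics.items()]
    scored = [(count, song_id) for count, song_id in scored if count > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [song_id for _, song_id in scored]
```

Counting distinct names is the simplest scoring; a real system might instead weight names by how often they occur in the lyrics.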

Further, the karaoke apparatus according to the present invention
executes a speech synthesis process based on the speech synthesis data stored in the speech synthesis database.

Further, in the karaoke apparatus according to the present invention,
the speech synthesis process is executed in synchronization with the performance process.

Further, in the karaoke apparatus according to the present invention,
the collection process collects the speech synthesis data for a pronunciation name from the time region of the singing voice information that corresponds to the singing timing of that pronunciation name in the lyrics information.
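Collecting by singing timing amounts to cutting the recorded voice at the time regions given by the lyrics. The following is a minimal sketch under stated assumptions: the recording is a plain list of samples, the lyric timing is available as (pronunciation name, start second, end second) tuples, and the function name is hypothetical. The patent does not specify these formats.

```python
def slice_by_timing(samples: list, sample_rate: int,
                    timings: list[tuple[str, float, float]]) -> list[tuple[str, list]]:
    """Cut one segment of the recording per pronunciation name.

    timings holds (pronunciation_name, start_sec, end_sec) taken from the
    lyric timing; regions outside the recording are clipped, and empty
    regions are skipped.
    """
    segments = []
    for name, start_sec, end_sec in timings:
        lo = max(0, int(round(start_sec * sample_rate)))
        hi = min(len(samples), int(round(end_sec * sample_rate)))
        if hi > lo:
            segments.append((name, samples[lo:hi]))
    return segments
```

Because the time regions come straight from the lyric timing, no analysis of the audio is needed to decide where each pronunciation name lies, which is consistent with the reduced processing load the text attributes to this approach.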

Further, in the karaoke apparatus according to the present invention,
the collection process analyzes the singing voice information and collects the speech synthesis data corresponding to the pronunciation names in the lyrics information.

The karaoke program according to the present invention is
a karaoke program that causes a computer to execute a performance process, a collection process, and a recommendation process, wherein
the performance process causes the performance information corresponding to a designated song to be played,
the collection process collects speech synthesis data from the singing voice information input while the performance process is running, associates the collected speech synthesis data with the pronunciation names in the lyrics information of the song being played, and stores it in a per-user speech synthesis database, and
the recommendation process extracts the speech synthesis data that is missing from the per-user speech synthesis database as missing speech synthesis data, and recommends, as a song to be played, a song whose lyrics information contains a pronunciation name corresponding to the missing speech synthesis data.

According to the karaoke apparatus and the karaoke program of the present invention, the speech synthesis data missing from the per-user speech synthesis database is extracted as missing speech synthesis data, and songs whose lyrics information contains pronunciation names corresponding to the missing data are recommended as songs to be played. This makes it possible to efficiently collect the speech synthesis parameters required for the speech synthesis process.

Further, according to the karaoke apparatus and the karaoke program of the present invention, recommending songs that match the user information lets the user collect the necessary speech synthesis data by singing songs he or she likes, which reduces the sense that collecting speech synthesis data is a chore.

Further, according to the karaoke apparatus and the karaoke program of the present invention, giving priority, among the songs eligible for recommendation, to songs that contain more of the missing speech synthesis data makes it possible to collect the speech synthesis parameters required for the speech synthesis process even more efficiently.

Further, according to the karaoke apparatus and the karaoke program of the present invention, executing the speech synthesis process based on the speech synthesis data stored in the speech synthesis database makes it possible to synthesize speech, such as read-aloud text or singing, in the voice of the user to whom the database belongs.

Further, according to the karaoke apparatus and the karaoke program of the present invention, executing the speech synthesis process in synchronization with the performance process makes it possible to add a synthesized chorus, harmony, model vocal, duet part, or the like to ordinary singing.

Further, according to the karaoke apparatus and the karaoke program of the present invention, collecting the speech synthesis data for a pronunciation name from the time region of the singing voice information that corresponds to the singing timing of that pronunciation name in the lyrics information reduces the processing load of the collection process.

Further, according to the karaoke apparatus and the karaoke program of the present invention, analyzing the singing voice information and collecting the speech synthesis data corresponding to the pronunciation names in the lyrics information makes it possible to collect highly accurate speech synthesis data for each pronunciation name.

A diagram showing the configuration of the karaoke system according to the embodiment of the present invention.
A diagram showing the top screen according to the embodiment of the present invention.
A diagram showing the data structure of the user information according to the embodiment of the present invention.
A diagram showing a configuration example of the speech synthesis database according to the embodiment of the present invention.
A diagram explaining the transmission and reception of user information according to the embodiment of the present invention.
A diagram showing the active user top screen according to the embodiment of the present invention.
A diagram showing the data structure of the song information according to the embodiment of the present invention.
A flowchart showing the recommendation process according to the embodiment of the present invention.
A diagram for explaining the collection of speech synthesis data according to the embodiment of the present invention.
A diagram showing the song recommendation screen according to the embodiment of the present invention.
A diagram showing the song confirmation screen according to the embodiment of the present invention.
A flowchart showing the song playback process according to the embodiment of the present invention.
A diagram for explaining a form of speech synthesis data collection according to the embodiment of the present invention.
A diagram for explaining a form of speech synthesis data collection according to the embodiment of the present invention.
A diagram showing the collection result display screen according to the embodiment of the present invention.

The karaoke apparatus of this embodiment is a karaoke apparatus for enjoying singing that can also collect the speech synthesis data required to perform speech synthesis using the user's own voice.

FIG. 1 shows the configuration of the karaoke system used in the singing posting system according to the embodiment of the present invention. The karaoke system in this embodiment includes a karaoke device 2 (commander) and a remote control device 1. In the example shown in FIG. 1, two remote control devices 1a and 1b are used with one karaoke device 2. Because the remote control devices 1a and 1b have the same configuration, they are described collectively as the remote control device 1. The karaoke device 2 and the remote control device 1 are connected so as to form a network using a LAN 100 and an access point 110.

The karaoke device 2, installed in a venue such as a karaoke box, includes an acoustic control unit 25 as the performance unit for playing songs. The karaoke device 2 also includes an operation unit 21, made up of switches and the like that accept various inputs from the user, as the input unit on the karaoke device 2 side, and an operation processing unit 22 that interprets input from the operation unit 21 and passes it to a CPU 30. In addition, the karaoke device 2 includes an HDD 32 (hard disk) as a karaoke-device-side storage unit that stores various information, and a LAN communication unit 24 as karaoke-device-side communication means for connecting to the LAN 100 and joining the network.

The karaoke device 2 also includes video playback means for displaying lyrics video and background video on a monitor 41. The video playback means comprises a video playback unit 29 that plays video based on video information, a video RAM 28 that temporarily stores the video to be played, and a video control unit 31 that superimposes lyric captions on the played video and applies video effects.

In addition to the externally connected monitor 41, the karaoke device 2 can display various information on a touch panel monitor 33. The touch panel monitor 33 consists of a display unit 35, which displays the video information input from the video control unit 31, overlaid with a touch panel 34, which outputs the touched position to the operation processing unit 22. Like the operation unit 21 of the karaoke device 2 and the touch panel monitor 11 of the remote control device 1, the touch panel monitor 33 functions as an input unit. The user can perform various operations on the karaoke device 2 from the touch panel monitor 33, such as selecting a song and reserving it directly on the karaoke device 2.

The karaoke device 2 further includes the CPU 30, which controls the components as a whole, and a memory 27, which temporarily stores the information needed to execute various programs.

With this configuration the karaoke device 2 executes various processes; its main functions include a song designation process and a song playback process. The song designation process designates and reserves a song based on the user's selection and is executed in cooperation with the remote control device 1. Reservation information specified by user operation on an input unit such as the remote control device 1 is registered in a reservation table in the memory 27. The song playback process plays a reserved song; in it, a song performance process and a lyrics playback process are executed in synchronization.

The song performance process causes the acoustic control unit 25 to perform based on the performance information included in the song information. The song played by the acoustic control unit 25 is emitted from a speaker 42 together with the singing voice input from singing microphones 44a and 44b. The lyrics playback process assists the singer by displaying the lyrics information included in the song information on the monitor 41. A background video display process may also be executed to display a background video together with the lyrics shown by the lyrics playback process.

The remote control device 1, for its part, sends various instructions such as reservation information to the karaoke device 2 and receives various information from the karaoke device 2 or from a host device 5 connected via the Internet. In this embodiment it provides, as its user interface, an operation unit 17 consisting of buttons and the like and a touch panel monitor 11. The touch panel monitor 11 comprises a display unit 11a and a touch panel 11b; it displays various interfaces on the display unit 11a and accepts touch input from the user.

The remote control device 1 further includes a memory 14, serving as a remote-control-side storage unit that stores the database required for song search, various programs, and various information generated as the programs run, and a remote-control-side control unit that controls these components as a whole. The remote-control-side control unit includes a CPU 15, a video control unit 13 that forms the video displayed on the touch panel monitor 11, a video RAM 12 that temporarily stores the video information to be displayed, and an operation processing unit 18 that interprets input from the touch panel monitor 11 or the operation unit 17 and passes it to the CPU 15.

The remote control device 1 connects to the network formed by the LAN 100 through a wireless connection between its wireless LAN communication unit 16 and the access point 110. Each remote control device 1 is associated in advance with a specific karaoke device 2, and the commands output from the remote control device 1 are received by the associated karaoke device 2.

With this configuration, the remote control device 1 accepts various inputs from the user through the touch panel monitor 11 or the operation unit 17 and presents information by displaying video on the touch panel monitor 11, allowing the user to issue various instructions, such as song reservations, to the karaoke device 2.

The operation of the karaoke system is described next. FIG. 2 shows the top screen according to the embodiment of the present invention, that is, the screen displayed immediately after the remote control device 1 starts up. In this embodiment, information is presented to the user operating the remote control device 1 on its display unit 11a.

In the karaoke system of this embodiment, a user can log in by entering authentication information (the user identification information and password described later) on the touch panel monitor 11 of a remote control device 1. The system allows multiple users to be logged in at once, and information about the logged-in users is always displayed at the top of the screen, where a login user field 103 shows each logged-in user's avatar. Besides an image of a person or character, the avatar may take other forms such as a figure or a symbol.

The screen also shows a guest icon 102 for users who have no account and a user switching button 101 for changing users. A logged-in user can receive services that use his or her own user information by selecting his or her avatar in the login user field 103 or by operating the user switching button 101. A user without an account can receive general services such as song selection by operating the guest icon 102. Switching users may require authentication with a password or the like, or it may be done simply, without authentication.

FIG. 3 illustrates the user information stored in the host device 5 for each user. In this embodiment, the user information comprises various information about the user, such as personal information, a My Songs table, a My Artists table, a history table, and a speech synthesis database. The user information is stored and managed in a storage unit 51 of the host device 5, and it includes the authentication information (user identification information and password) with which the user logs in. The personal information includes a user name (not necessarily a real name; a nickname is acceptable), gender, age, "song age", and the like. These items can also be set by logging in from the remote control device 1 or from a personal computer, mobile terminal, or similar device (not shown) connected to a network such as the Internet.

The My Songs table stores the songs registered by the user and includes song identification information, a pitch adjustment value, past scoring information (the highest score, or the scores from several past performances), and the like. Using this table, the user can call up a previously registered favorite song to reserve and play it. The registered pitch adjustment value can then be applied so that the song is played at a pitch that suits the user's voice.

The My Artists table stores the artists (singers) registered by the user and includes singer identification information. Using this table, the user can call up songs by a registered favorite artist (the artist's own songs or related songs) to reserve and play them.

The history table records various information about the songs the user has played in the past. In this embodiment, for each song that was played because the user reserved it, the table includes the song identification information, the singing date and time at which it was played, scoring information, and the like. Using this table, the user can call up a previously played song and sing it again.

The speech synthesis database accumulates speech synthesis data formed from the user's voice; the data is collected while the user sings songs. FIG. 4 shows a configuration example of the speech synthesis database according to the embodiment of the present invention. Speech synthesis data is associated with pronunciation names. In this embodiment, each phonetic notation such as "あ" (a) or "い" (i) is further classified by the form of its frequency change as flat, rising, or falling. For speech synthesis data collected while a song was sung, the "collected" column is marked with a circle, and the data is registered as a file in the speech synthesis database (the figure shows the file names of the registered data). For speech synthesis data that is still missing from the database, the "collected" column shows "−".
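The table described for FIG. 4 can be pictured as a mapping from (phonetic notation, frequency contour) to a registered file. The sketch below is an assumption-laden illustration rather than the patent's data format: the class and method names, the ASCII marks, and the example file name are all hypothetical.

```python
from enum import Enum


class Contour(Enum):
    # Form of the frequency change, as classified in the embodiment.
    FLAT = "flat"
    RISING = "rising"
    FALLING = "falling"


class SpeechSynthesisDB:
    """Per-user table keyed by pronunciation name (notation + contour)."""

    def __init__(self, notations):
        # None means no speech synthesis data has been collected yet.
        self._files = {(n, c): None for n in notations for c in Contour}

    def register(self, notation, contour, file_name):
        """Record the file that holds the collected data for one entry."""
        self._files[(notation, contour)] = file_name

    def collected_mark(self, notation, contour):
        """'o' for collected entries, '-' for missing ones."""
        return "o" if self._files[(notation, contour)] is not None else "-"

    def missing(self):
        """Pronunciation names that still lack speech synthesis data."""
        return [key for key, file_name in self._files.items() if file_name is None]
```

For example, after `db = SpeechSynthesisDB(["a", "i"])` and `db.register("a", Contour.FLAT, "a_flat.wav")`, the remaining five entries are reported by `db.missing()`; that list is the kind of input a recommendation step would work from.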

The speech synthesis data may be waveform information cut out of the recorded voice, or parameters obtained by analyzing the recorded voice (formant information, linear prediction coefficients, and so on). In either form, waveform information or parameters, the data is registered in the speech synthesis database in association with a pronunciation name. The techniques for creating speech synthesis data are already known, as described in Patent Document 1 and elsewhere, and are not detailed here.

In this embodiment a pronunciation name is a phonetic notation further classified as flat, rising, or falling, but other classifications may be adopted as appropriate. For example, the classification may be simplified to the phonetic notation alone ("あ", "い", and so on), or multiple pitches or multiple note lengths may be added for each phonetic notation. When the speech synthesis database is used to synthesize singing (singing voice synthesis), adding multiple pitches for each phonetic notation is preferable for achieving high-quality results.

Conventionally, a speech synthesis database for a person was generally created by having that person read meaningless sentences aloud for a long time to obtain the necessary speech synthesis data. Such a method can be distressing for the person being recorded. In place of that method, this embodiment uses the karaoke device 2 to acquire the speech synthesis data required for synthesis while the user sings songs, and builds the speech synthesis database from it. With the karaoke device 2 of this embodiment, the user can therefore create a speech synthesis database while enjoying singing.

FIG. 5 shows how user information is transmitted and received. A user visiting a karaoke venue executes authentication process S101 either by entering authentication information consisting of his or her user identification information and password, or by having an IC card reader provided in the remote control device 1 or the like read the authentication information stored on an IC card. The karaoke device 2 or the remote control device 1 sends the authentication information to the host device 5 (S102). On receiving it, the host device 5 extracts the corresponding user's information from the database stored in the storage unit 51 (S121) and sends it to the karaoke device 2 or remote control device 1 that made the inquiry (S122).

The karaoke apparatus 2, having received the user information (S103), provides various service processes, such as song designation processing and song playback processing, based on that user information (S104). When the user requests logout (S105), the service processing is interrupted, and the histories (logs) generated during the service processing, or user information reflecting setting changes made by the user, are transmitted to the host device 5 (S106). The user information may be transmitted in full, or only the updated differences may be transmitted. The host device 5, having received the user information (S123), updates the database stored in the storage unit 51 based on it (S124).
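
The differential transmission in S106 and the database update in S124 could be sketched as follows. This is a minimal illustration only: the flat dictionary layout of the user information and the helper names `diff_user_info` and `apply_update` are assumptions made for the example, not part of the embodiment, and deleted entries are not handled.

```python
def diff_user_info(received: dict, current: dict) -> dict:
    """Return only the entries that changed since the user information was received (S103)."""
    return {key: value for key, value in current.items()
            if key not in received or received[key] != value}


def apply_update(database: dict, user_id: str, update: dict) -> None:
    """Host-side update (S124): merge the transmitted entries into the stored record."""
    database.setdefault(user_id, {}).update(update)


# Hypothetical user record as delivered by the host device in S122.
received = {"nickname": "A", "key_shift": 0, "history": ("song-001",)}
# State at logout (S105) after one more song and one setting change.
current = {"nickname": "A", "key_shift": -2, "history": ("song-001", "song-002")}

host_db = {"user-A": dict(received)}
apply_update(host_db, "user-A", diff_user_info(received, current))
```

Sending the full `current` record to `apply_update` gives the same stored result, which corresponds to the option of transmitting all of the user information.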

The flow from a user's authentication processing (login processing) to logout has been described above. The karaoke apparatus 2 and the remote control device 1 of this embodiment can also be used with a plurality of authenticated persons (authenticated users) logged in at the same time. Using the user interface described with reference to FIG. 2, the user to whom services are provided (hereinafter, the "active user") can be switched.

FIG. 6 is a diagram showing the active user top screen according to the embodiment of the present invention. As shown in the figure, when a plurality of users are logged in, the avatars 103a, 103b, and 103e (face portions in this embodiment) of the logged-in users are displayed in the logged-in user field 103. The user shown at the right end of the logged-in user field 103 with a highlighted (white) background is the active user 103e (user A). In the state shown in the figure, services for the active user 103e, that is, services that use the user information of the active user 103e, are being executed. On this active user top screen, selecting "Find a song" allows a search by singer name or song name.

FIG. 7 shows the data structure of the music information to be played on the karaoke apparatus 2. The music information of this embodiment includes a music identifier (music ID) for designating the song, performance information played by the acoustic control unit 25 serving as the performance means, and lyrics information that is reproduced by the video reproduction unit 29 and displayed on the monitor 41. FIG. 7 also shows the details of the lyrics information. Lyrics information normally includes kanji notation; in this embodiment it includes the pronunciation names required for collecting speech synthesis data. The pronunciation names used in this embodiment follow the form of the pronunciation names in the speech synthesis database described with reference to FIG. 4, and distinguish a flat type, a rising type, and a falling type for each pronunciation notation. Therefore, when a song is sung, speech synthesis data corresponding to the pronunciation names contained in the lyrics information of that song can be acquired.
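
One way to picture the music information described for FIG. 7 is the following sketch. The class and field names, the romanized pronunciation notation, and the per-unit start and end times are all assumptions introduced for illustration; the embodiment only states that the lyrics information contains pronunciation names with flat, rising, and falling types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class PitchType(Enum):
    FLAT = "flat"
    RISING = "rising"
    FALLING = "falling"


@dataclass(frozen=True)
class PronunciationName:
    notation: str          # pronunciation notation, romanized here for simplicity
    pitch_type: PitchType  # flat, rising, or falling


@dataclass(frozen=True)
class LyricUnit:
    name: PronunciationName
    start_sec: float       # assumed: start of the performance period for this unit
    end_sec: float         # assumed: end of the performance period for this unit


@dataclass
class MusicInfo:
    music_id: str            # music identifier (music ID)
    performance_info: bytes  # data played by the acoustic control unit
    display_lyrics: str      # text shown on the monitor (may contain kanji)
    units: List[LyricUnit]   # lyrics information expressed as pronunciation names

    def pronunciation_names(self) -> Set[PronunciationName]:
        """Pronunciation names that singing this song can supply."""
        return {unit.name for unit in self.units}


song = MusicInfo(
    music_id="song-001",
    performance_info=b"",
    display_lyrics="(lyrics text shown on the monitor)",
    units=[
        LyricUnit(PronunciationName("ku", PitchType.FLAT), 1.0, 1.5),
        LyricUnit(PronunciationName("no", PitchType.FLAT), 3.0, 3.8),
        LyricUnit(PronunciationName("ku", PitchType.FLAT), 6.2, 6.6),
    ],
)
```

Because `PronunciationName` is a frozen dataclass, it is hashable, so repeated units in the lyrics collapse to a single entry in `pronunciation_names()`.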

The karaoke apparatus 2 of this embodiment can collect speech synthesis data through the singing of songs. In particular, it performs recommendation processing that recommends songs from which missing speech synthesis data can be collected. Executing this recommendation processing makes it possible to collect effectively the speech synthesis data that is missing from the speech synthesis database.

The recommendation processing is started by selecting the menu item "Create speech synthesis DB" displayed on the active user top screen shown in FIG. 6. FIG. 8 is a flowchart of the recommendation processing according to the embodiment of the present invention. The recommendation processing first detects the speech synthesis data that is missing from the speech synthesis database in the user information (S201). FIG. 9 is a diagram for explaining the collection of speech synthesis data according to the embodiment of the present invention. FIG. 9(A) shows the required speech synthesis database, that is, the collection target. The collection target can be set as appropriate. For example, as shown in FIG. 4, all of the flat, rising, and falling types for "a" through "n" may be set as the collection target. Alternatively, only the speech synthesis data needed to synthesize the singing voice of a particular song may be set as the collection target.

FIG. 9(B) shows the collected speech synthesis data. The missing speech synthesis data is what remains after subtracting the collected speech synthesis data from the required speech synthesis database. This embodiment aims to collect this missing speech synthesis data efficiently.
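
The subtraction described for FIG. 9 is a set difference. The sketch below assumes that each unit of speech synthesis data is identified by a (pronunciation notation, pitch type) pair and uses a tiny three-notation target; the function names and the sample data are hypothetical.

```python
from itertools import product

PITCH_TYPES = ("flat", "rising", "falling")


def required_units(notations):
    """Collection target: every pronunciation notation combined with every pitch type."""
    return set(product(notations, PITCH_TYPES))


def missing_units(required, collected):
    """S201: units of the target database that have not been collected yet."""
    return required - collected


required = required_units(["a", "ku", "no"])
collected = {("a", "flat"), ("a", "rising"), ("ku", "rising"), ("no", "falling")}
missing = missing_units(required, collected)
```

Restricting the target to the units needed for one particular song only changes the argument passed as `required`; the subtraction itself stays the same.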

The recommendation processing next determines the songs that can be played on the karaoke apparatus 2 (S202). From the songs obtained in S202, it then determines the songs that contain the missing speech synthesis data detected in S201 (S203). In this embodiment, the songs containing the missing speech synthesis data are further filtered according to the user's music preferences (S204). As a result of this filtering, the user can collect the speech synthesis data still needed in the speech synthesis database by singing songs that he or she likes.

The filtering can select, from the user information, songs registered in the My Uta table, songs by singers registered in the My Artist table, songs sung in the past and registered in the history table, and so on. In addition to songs that correspond directly to the user information in this way, songs that correspond to it indirectly may also be selected, for example songs in the user's preferred genre as determined from the history table. The filtering result of S204 is displayed on the music recommendation screen as recommended songs (S205).
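
Steps S203 and S204 can be illustrated with the following sketch. It assumes that the playable songs of S202 are available as a dictionary and that the My Uta table, My Artist table, and history table are plain sets; these representations, the function name `recommend`, and the sample data are assumptions, and only the direct correspondence to the user information is modeled.

```python
def recommend(playable, missing, my_uta, my_artists, history):
    """Return IDs of songs that contain missing units and match the user's preferences.

    playable: dict mapping music ID -> {"artist": str, "units": set of (notation, pitch type)}
    """
    # S203: keep songs whose lyrics contain at least one missing unit.
    candidates = {music_id: info for music_id, info in playable.items()
                  if info["units"] & missing}
    # S204: keep candidates tied directly to the user information.
    return sorted(
        music_id for music_id, info in candidates.items()
        if music_id in my_uta or music_id in history or info["artist"] in my_artists
    )


playable = {
    "s1": {"artist": "X", "units": {("ku", "flat")}},
    "s2": {"artist": "Y", "units": {("no", "flat")}},
    "s3": {"artist": "Z", "units": {("a", "flat")}},
    "s4": {"artist": "Y", "units": {("ku", "flat"), ("no", "flat")}},
    "s5": {"artist": "W", "units": {("ku", "flat")}},
}
missing = {("ku", "flat"), ("no", "flat")}
recommended = recommend(playable, missing,
                        my_uta={"s1"}, my_artists={"Y"}, history={"s3"})
```

Here `s3` is dropped in S203 because it supplies nothing that is missing, and `s5` is dropped in S204 because nothing in the user information points to it.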

FIG. 10 shows the music recommendation screen according to the embodiment of the present invention. This screen is displayed when "Create speech synthesis DB" is selected on the active user top screen shown in FIG. 6. For each recommended song filtered on the basis of the user information in S204, the screen displays the song name, the singer name, and the speech synthesis DB (database) up rate. In this embodiment, the recommended songs are displayed in descending order of speech synthesis DB up rate. The speech synthesis DB up rate is an index of how much the proportion of collected speech synthesis data in the speech synthesis database to be completed would increase if the recommended song were sung. The user can refer to it when choosing which recommended song to sing.
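
The embodiment does not give a formula for the speech synthesis DB up rate. One plausible reading, used in the sketch below as an assumption, is the gain in acquisition rate in percentage points: the number of still-missing units that the song would supply, divided by the size of the target database. The function names and sample data are hypothetical.

```python
def acquisition_rate(required, collected):
    """Percentage of the target database that has already been collected."""
    return 100.0 * len(required & collected) / len(required)


def db_up_rate(required, collected, song_units):
    """Assumed definition: percentage-point gain obtained by singing the song."""
    gained = (song_units & required) - collected
    return 100.0 * len(gained) / len(required)


def rank_by_up_rate(required, collected, songs):
    """Order songs by descending up rate, breaking ties by music ID."""
    rated = [(db_up_rate(required, collected, units), music_id)
             for music_id, units in songs.items()]
    rated.sort(key=lambda item: (-item[0], item[1]))
    return [(music_id, rate) for rate, music_id in rated]


required = {("u%d" % i, "flat") for i in range(10)}
collected = {("u%d" % i, "flat") for i in range(4)}
songs = {
    "s1": {("u3", "flat"), ("u4", "flat"), ("u5", "flat")},
    "s2": {("u4", "flat"), ("u5", "flat"), ("u6", "flat"), ("u7", "flat")},
    "s3": {("u0", "flat"), ("u1", "flat")},
}
ranking = rank_by_up_rate(required, collected, songs)
```

Under this reading, the acquisition rate after singing equals the current acquisition rate plus the up rate, which matches the three values listed for the music confirmation screen.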

When the user selects a recommended song on the music recommendation screen, the music confirmation screen shown in FIG. 11 is displayed. The music confirmation screen shows detailed information on the recommended song selected on the music recommendation screen, and the user can check the details and reserve the song. In this embodiment, the song name, singer name, and opening lyrics of the recommended song are displayed together with the speech synthesis DB status. The speech synthesis DB status shows the current acquisition rate of the speech synthesis database, the speech synthesis DB up rate, and the acquisition rate of the speech synthesis database after singing.

A user who has checked the detailed information on the music confirmation screen reserves the recommended song by operating the "Reserve" button. The music identification information of the reserved song is registered in the reservation table, and reserved songs are played in order by the music playback processing.

FIG. 12 is a flowchart of the music playback processing according to the embodiment of the present invention. When music playback processing starts on the karaoke apparatus 2, the reservation table stored in a storage unit such as the memory 27 is checked (S301) to determine whether there is a song to be played next. If there is (S302: Yes), performance processing of the music information corresponding to the music identification information registered in the reservation table is started (S303). Singing recording processing is started in synchronization with the start of the performance processing (S304). The singing recording processing records the audio input to the singing microphone 44 during the performance so that speech synthesis data can be extracted from the singing user's voice.

When it is determined that playback of the song has ended (S305: Yes), the singing recording processing is ended (S306), and collection processing is executed that collects the speech synthesis data missing from the speech synthesis database, based on the recorded information obtained by recording the singing voice throughout the performance period of the song. FIG. 13 is a diagram for explaining one form of speech synthesis data collection according to the embodiment of the present invention. It shows the relationship on the time axis between the pronunciation names of the lyrics information (FIG. 13(A)) and the recorded information captured during the performance period (FIG. 13(B)). For example, if the speech synthesis data missing from the database is "ku (flat type)" and "no (flat type)", the performance periods T1 and T2 corresponding to those pronunciation names in FIG. 13(A) become the periods cut out of the recorded information. When the speech synthesis data is waveform information, the cut-out recorded information itself serves as the speech synthesis data. When the speech synthesis data is a set of parameters, the result of analyzing the cut-out recorded information serves as the speech synthesis data.
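
For the waveform case, cutting out the periods T1 and T2 amounts to slicing the recording by sample index. The sketch below assumes the recording is a list of samples at a known sample rate and that each pronunciation name carries start and end times; the function names and the toy sample rate of 10 samples per second are assumptions for illustration.

```python
def clip(samples, sample_rate, start_sec, end_sec):
    """Return the samples recorded during the period from start_sec to end_sec."""
    start = int(round(start_sec * sample_rate))
    end = int(round(end_sec * sample_rate))
    return samples[start:end]


def collect_missing(samples, sample_rate, timed_units, missing):
    """Cut out the performance period of each missing unit (first occurrence only)."""
    collected = {}
    for unit, start_sec, end_sec in timed_units:
        if unit in missing and unit not in collected:
            collected[unit] = clip(samples, sample_rate, start_sec, end_sec)
    return collected


sample_rate = 10                     # toy value; real audio uses far higher rates
recording = list(range(50))          # stands in for 5 seconds of recorded audio
timed_units = [
    (("a", "flat"), 0.0, 1.0),
    (("ku", "flat"), 1.0, 1.5),      # period T1
    (("no", "flat"), 3.0, 3.8),      # period T2
]
missing = {("ku", "flat"), ("no", "flat")}
collected = collect_missing(recording, sample_rate, timed_units, missing)
```

In the parameter case, each clipped segment would be passed to an analysis step instead of being stored directly; that step is outside the scope of this sketch.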

As shown in FIG. 13, the performance period corresponding to a pronunciation name may be cut out and used directly as speech synthesis data. In karaoke singing, however, the singer may sing early or late relative to the performance. For that reason, another form of speech synthesis data collection, such as that shown in FIG. 14, may be adopted.

Like FIG. 13, FIG. 14 shows the relationship on the time axis between the pronunciation names of the lyrics information (FIG. 14(A)) and the recorded information captured during the performance period (FIG. 14(B)). Here too, the missing speech synthesis data is assumed to be "ku (flat type)" and "no (flat type)". In this form of collection, the period cut out of the recorded information is the performance period T1 or T2 corresponding to the pronunciation name, as shown in FIG. 13(A), extended by a margin before and after it. The recorded information for the required pronunciation name can therefore be cut out even when the singer sings earlier or later than the intended timing. In this case, however, recorded information for the neighboring pronunciation names, which is not actually needed, is cut out as well. This form of collection therefore requires analyzing the cut-out recorded information to collect the speech synthesis data for the required pronunciation name.
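
The margin-extended cut-out of FIG. 14 can be sketched as below. The margin length is not specified in the embodiment, so `margin_sec` is a hypothetical parameter, and the window is clamped to the bounds of the recording so that periods near the start or end of the song remain valid.

```python
def clip_with_margin(samples, sample_rate, start_sec, end_sec, margin_sec):
    """Cut out a performance period widened by margin_sec on both sides.

    Returns (start_index, end_index, clipped_samples).
    """
    start = max(0, int(round((start_sec - margin_sec) * sample_rate)))
    end = min(len(samples), int(round((end_sec + margin_sec) * sample_rate)))
    return start, end, samples[start:end]


sample_rate = 10                 # toy value for illustration
recording = list(range(50))      # stands in for 5 seconds of recorded audio

middle = clip_with_margin(recording, sample_rate, 1.0, 1.5, margin_sec=0.5)
at_start = clip_with_margin(recording, sample_rate, 0.0, 0.5, margin_sec=0.5)
at_end = clip_with_margin(recording, sample_rate, 4.5, 5.0, margin_sec=0.5)
```

The widened segment still contains neighboring pronunciations, so a further analysis step is needed to isolate the required one, as the text notes.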

In both collection forms of FIGS. 13 and 14, the recorded information is cut out using the performance timing corresponding to the pronunciation name, and the speech synthesis data is collected from the cut-out portion. It is also conceivable to collect the speech synthesis data for the required pronunciation names by analyzing the entire recorded information without cutting it out. One possible analysis method is to prepare a template of the speech synthesis data corresponding to each pronunciation name and to treat the periods of the recorded information that match the template as the source of the speech synthesis data.
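
The embodiment does not specify how the template is matched. As a toy stand-in, the sketch below slides the template over the recording and scores each position with normalized cross-correlation on raw sample values; a practical system would more likely compare acoustic features. The function names and the threshold of 0.9 are assumptions.

```python
import math


def normalized_correlation(window, template):
    """Pearson-style correlation between two equal-length sample sequences."""
    mean_w = sum(window) / len(window)
    mean_t = sum(template) / len(template)
    dw = [w - mean_w for w in window]
    dt = [t - mean_t for t in template]
    norm = math.sqrt(sum(x * x for x in dw) * sum(x * x for x in dt))
    if norm == 0.0:
        return 0.0  # a constant window carries no shape to compare
    return sum(a * b for a, b in zip(dw, dt)) / norm


def find_best_match(recording, template, threshold=0.9):
    """Return the offset of the best-scoring window above threshold, or None."""
    best_offset, best_score = None, threshold
    for offset in range(len(recording) - len(template) + 1):
        window = recording[offset:offset + len(template)]
        score = normalized_correlation(window, template)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


template = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0]
recording = [0.0] * 7 + template + [0.0] * 5
offset = find_best_match(recording, template)
```

The matched period would then run from `offset` to `offset + len(template)` in the recording and become the source of the speech synthesis data for that pronunciation name.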

The speech synthesis data collected by the collection processing (S307) is stored in the speech synthesis database in the user information of the corresponding user. The karaoke apparatus 2 displays the collection result on the monitor 41 (S308). FIG. 15 shows the collection result display screen according to the embodiment of the present invention. In this embodiment, the screen shows the degree of completion of the speech synthesis database of the user who has just finished singing and the speech synthesis DB up rate achieved by this performance. In addition, after the singing ends, recommendation processing (S200) is executed to recommend what to sing next. As when "Create speech synthesis DB" is selected on the active user top screen of FIG. 6, this recommendation processing recommends songs that bring the speech synthesis database closer to completion; here, the recommended songs are displayed on the monitor 41 so that the singer can check them. In this recommendation processing (S200), it may also be made possible to reserve a song from among the recommended songs displayed on the monitor 41.

By using the collection processing performed during the music playback processing described above, the user can collect speech synthesis data through singing, without it feeling like a chore. Recommending the songs used for collection according to the user's preferences, based on the user information, further reduces that sense of work. The speech synthesis data stored in the speech synthesis database can be used for various kinds of speech synthesis processing. In the karaoke apparatus 2, the acquired data could be used for singing voice synthesis that adds a chorus, harmony, model vocal, duet part, or the like to normal singing. Alternatively, it may be used in the karaoke apparatus 2 for spoken announcements, such as reading text aloud.

In this example, the user information shown in FIG. 3 is stored in the storage unit 51 of the host device 5, which is reached via the Internet. In a karaoke box or similar venue where a plurality of karaoke apparatuses 2 operate, however, any one of the karaoke apparatuses 2 connected via the LAN 100 may include the storage unit 51. In a small-scale store in the night market or the like, a single karaoke apparatus may include the storage unit 51.

Use is not limited to the karaoke apparatus 2. By providing the speech synthesis database to various information processing apparatuses, such as personal computers and mobile terminals, speech synthesis processing based on the user's voice can be performed on those apparatuses.

The present invention is not limited to these embodiments; embodiments formed by appropriately combining the configurations of the respective embodiments also fall within the scope of the present invention.

DESCRIPTION OF SYMBOLS
1a, 1b ... Remote control device
11 ... Touch panel monitor
11a ... Display unit
11b ... Touch panel
12 ... Video RAM
13 ... Video control unit
14 ... Memory
15 ... CPU
16 ... Wireless LAN communication unit
17 ... Operation unit
18 ... Operation processing unit
19 ... Infrared communication unit
2 ... Karaoke apparatus (commander)
21 ... Operation unit
22 ... Operation processing unit
23 ... Infrared communication unit
24 ... LAN communication unit
25 ... Acoustic control unit
27 ... Memory
28 ... Video RAM
29 ... Video reproduction unit
30 ... CPU
31 ... Video control unit
32 ... HDD (hard disk)
41 ... Monitor
42 ... Speaker
44a, 44b ... Singing microphone
100 ... LAN
110 ... Access point
120 ... Router
5 ... Host device
51 ... Storage unit

Claims

A karaoke apparatus comprising a performance unit and a control unit, wherein
the performance unit is capable of performing based on performance information,
the control unit is capable of executing performance processing, collection processing, and recommendation processing,
the performance processing causes the performance unit to play the performance information corresponding to a designated song,
the collection processing collects speech synthesis data from singing voice information input during execution of the performance processing, associates the collected speech synthesis data with the pronunciation names of the lyrics information corresponding to the song being played, and stores it in a per-user speech synthesis database, and
the recommendation processing extracts, from the per-user speech synthesis database, the speech synthesis data that is lacking as missing speech synthesis data, and recommends, as a song to be played, a song whose lyrics information includes a pronunciation name corresponding to the missing speech synthesis data.

The karaoke apparatus according to claim 1, wherein the recommendation processing recommends a song whose lyrics information includes a pronunciation name corresponding to the missing speech synthesis data and which corresponds to the user information.

The karaoke apparatus according to claim 1, wherein the recommendation processing preferentially recommends, among the songs to be recommended, a song that contains more of the missing speech synthesis data.

The karaoke apparatus according to any one of claims 1 to 3, wherein speech synthesis processing is executed based on the speech synthesis data stored in the speech synthesis database.

The karaoke apparatus according to claim 4, wherein the speech synthesis processing is executed in synchronization with the performance processing.

The karaoke apparatus as described in 1, wherein the collection processing collects the speech synthesis data corresponding to a pronunciation name from the time region of the singing voice information that corresponds to the singing timing of that pronunciation name in the lyrics information.

The karaoke apparatus according to any one of claims 1 to 6, wherein the collection process analyzes singing voice information and collects data for speech synthesis corresponding to a pronunciation name of lyrics information.

A karaoke program that causes a computer to execute performance processing, collection processing, and recommendation processing, wherein
the performance processing causes the performance information corresponding to a designated song to be played,
the collection processing collects speech synthesis data from singing voice information input during execution of the performance processing, associates the collected speech synthesis data with the pronunciation names of the lyrics information corresponding to the song being played, and stores it in a per-user speech synthesis database, and
the recommendation processing extracts, from the per-user speech synthesis database, the speech synthesis data that is lacking as missing speech synthesis data, and recommends, as a song to be played, a song whose lyrics information includes a pronunciation name corresponding to the missing speech synthesis data.