JP4595948B2

JP4595948B2 - Data reproducing apparatus, data reproducing method and program

Info

Publication number: JP4595948B2
Application number: JP2007031063A
Authority: JP
Inventors: 卓朗曽根; 孝浩田中
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-02-09
Filing date: 2007-02-09
Publication date: 2010-12-08
Anticipated expiration: 2027-02-09
Also published as: JP2008197269A

Description

The present invention relates to a technique for reproducing other data in synchronization with input data.

In a typical karaoke apparatus, music data consisting of, for example, accompaniment data in MIDI (Musical Instrument Digital Interface: registered trademark) format, sequence data for displaying the lyrics telop of a song, and video data are reproduced in synchronization with one another, so that the user can enjoy karaoke of that song. Patent Document 1 discloses a technique for receiving accompaniment data and video data separately from a server and reproducing them in synchronization. Patent Document 2 discloses a technique for providing a karaoke performance with the realism of a live performance.

Techniques for displaying lyrics are used not only in karaoke apparatuses but also in music programs on television. In television broadcasting, when a lyrics telop is displayed along with a singer's performance in, for example, a live music program, an operator causes the lyrics to be displayed at predetermined timings as the song progresses.
Cited patent documents: JP 2003-15675 A; JP 2000-347676 A

However, with the technique disclosed in Patent Document 1, the accompaniment is reproduced from MIDI-format data by a sound source capable of playing such data, so the sound quality may be poor and the performance may have a monotonous tempo. The technique disclosed in Patent Document 2 provides the realism of a live performance, but because the progression of the song is not uniform, sequence data such as that used to display the lyrics telop has to be created in advance to match the progression of the song. Moreover, the operator's work in television broadcasting described above allows no mistakes during the actual broadcast and requires practice beforehand to match the lyric display timing to the progression of the song, which is a heavy burden.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a data reproducing apparatus, a data reproducing method, and a program that can easily reproduce data corresponding to a song in step with the flow of the song, even when the tempo of the song fluctuates as in a live performance.

To solve the above problem, the present invention provides a data reproducing apparatus comprising: storage means for storing a plurality of sets each consisting of first data having synchronization information that defines a time for each part of the data, and first audio data that represents a sound waveform and has a time defined for each part of the data; specifying means for specifying one piece of first audio data from the plurality of pieces of first audio data stored in the storage means, based on second audio data that is supplied from outside and represents a sound waveform; time alignment means for sequentially acquiring parts of the externally supplied second audio data, associating parts of the data with each other by comparing each acquired part of the second audio data with each part of the first audio data specified by the specifying means, and generating time information according to the time defined in the first audio data for the part corresponding to each acquired part of the second audio data; data reading means for reading, from the storage means, the first data paired with the first audio data specified by the specifying means, based on the correspondence between the time information and the synchronization information; video data generating means for generating first video data based on the first data read by the data reading means; and delay means for delaying the externally supplied second audio data by a predetermined amount and outputting it.

In another preferred aspect, the specifying means may specify the first audio data from the plurality of pieces of first audio data stored in the storage means based on the result of comparing the externally supplied second audio data with each piece of first audio data stored in the storage means.

In another preferred aspect, the apparatus may further comprise sound collecting means for generating audio data by collecting sound, and the specifying means may specify the first audio data from the plurality of pieces of first audio data stored in the storage means based on the result of comparing, in place of the externally supplied second audio data, the audio data generated by the sound collecting means with each piece of first audio data stored in the storage means.

In another preferred aspect, the apparatus may further comprise re-specification instructing means for causing the specifying means to specify the first audio data again.

The present invention also provides a data reproducing apparatus comprising: instructing means for issuing, to a server connected to a network, an instruction to specify first audio data that represents a sound waveform and has a time defined for each part of the data; receiving means for receiving data transmitted from the server in response to the instruction of the instructing means, as the first audio data and as first data having synchronization information that defines a time for each part of the data; storage means for storing the first audio data and the first data received by the receiving means; time alignment means for sequentially acquiring parts of second audio data that is supplied from outside and represents a sound waveform, associating parts of the data with each other by comparing each acquired part of the second audio data with each part of the first audio data stored in the storage means, and generating time information according to the time defined in the first audio data for the part corresponding to each acquired part of the second audio data; data reading means for reading the first data from the storage means based on the correspondence between the time information and the synchronization information; video data generating means for generating first video data based on the first data read by the data reading means; and delay means for delaying the externally supplied second audio data by a predetermined amount and outputting it.

The present invention also provides a data reproducing apparatus comprising: instructing means for issuing, to a server connected to a network, an instruction to specify first audio data; receiving means for receiving, as music information data, data transmitted from the server in response to the instruction of the instructing means; storage means for storing a plurality of sets each consisting of first data having synchronization information that defines a time for each part of the data, and first audio data that represents a sound waveform and has a time defined for each part of the data; selecting means for selecting one piece of first audio data from the plurality of pieces of first audio data stored in the storage means, based on the music information data; time alignment means for sequentially acquiring parts of second audio data supplied from outside, associating parts of the data with each other by comparing each acquired part of the second audio data with each part of the first audio data selected by the selecting means, and generating time information according to the time defined in the first audio data for the part corresponding to each acquired part of the second audio data; data reading means for reading, from the storage means, the first data paired with the first audio data selected by the selecting means, based on the correspondence between the time information and the synchronization information; video data generating means for generating first video data based on the first data read by the data reading means; and delay means for delaying the externally supplied second audio data by a predetermined amount and outputting it.

In another preferred aspect, the instructing means may issue the instruction to specify the first audio data to the server connected to the network and also transmit the externally supplied second audio data to the server.

In another preferred aspect, the instructing means may issue the instruction to specify the first audio data to the server connected to the network and also transmit a part of the externally supplied second audio data to the server.

In another preferred aspect, the apparatus may further comprise sound collecting means for generating audio data by collecting sound, and the instructing means may issue the instruction to specify the first audio data to the server connected to the network and also transmit the audio data generated by the sound collecting means to the server.

In another preferred aspect, the apparatus may further comprise re-specification instructing means for causing the instructing means to issue again the instruction that causes the server connected to the network to specify the first audio data.

In another preferred aspect, the apparatus may further comprise video synthesizing means that receives a plurality of pieces of video data and superimposes, on the video of one piece of video data, the video of another piece of video data; the delay means may delay, by the predetermined amount, second video data that is supplied from outside and time-synchronized with the second audio data; and the video synthesizing means may superimpose the video of the first video data generated by the video data generating means on the video of the second video data delayed by the delay means.

The present invention also provides a data reproducing method comprising: a specifying step of specifying one piece of first audio data from a plurality of pieces of first audio data that are stored in storage means, each representing a sound waveform and having a time defined for each part of the data, based on second audio data that is supplied from outside and represents a sound waveform; a time alignment step of sequentially acquiring parts of the externally supplied second audio data, associating parts of the data with each other by comparing each acquired part of the second audio data with each part of the first audio data specified in the specifying step, and generating time information according to the time defined in the first audio data for the part corresponding to each acquired part of the second audio data; a data reading step of reading, from the storage means, which stores a plurality of sets each consisting of the first audio data and first data having synchronization information that defines a time for each part of the data, the first data paired with the first audio data specified in the specifying step, based on the correspondence between the time information and the synchronization information; a video data generating step of generating first video data based on the first data read in the data reading step; and a delay step of delaying the externally supplied second audio data by a predetermined amount and outputting it.

In another preferred aspect, the method may further comprise a sound collecting step of generating audio data by collecting sound, and the specifying step may specify the first audio data from the plurality of pieces of first audio data stored in the storage means based on the result of comparing, in place of the externally supplied second audio data, the audio data generated in the sound collecting step with each piece of first audio data stored in the storage means.

The present invention also provides a program for causing a computer to realize: a specifying function of specifying one piece of first audio data from a plurality of pieces of first audio data that are stored in storage means, each representing a sound waveform and having a time defined for each part of the data, based on second audio data that is supplied from outside and represents a sound waveform; a time alignment function of sequentially acquiring parts of the externally supplied second audio data, associating parts of the data with each other by comparing each acquired part of the second audio data with each part of the first audio data specified by the specifying function, and generating time information according to the time defined in the first audio data for the part corresponding to each acquired part of the second audio data; a data reading function of reading, from the storage means, which stores a plurality of sets each consisting of the first audio data and first data having synchronization information that defines a time for each part of the data, the first data paired with the first audio data specified by the specifying function, based on the correspondence between the time information and the synchronization information; a video data generating function of generating first video data based on the first data read by the data reading function; and a delay function of delaying the externally supplied second audio data by a predetermined amount and outputting it.

In another preferred aspect, the program may further realize a sound collecting function of generating audio data by collecting sound, and the specifying function may specify the first audio data from the plurality of pieces of first audio data stored in the storage means based on the result of comparing, in place of the externally supplied second audio data, the audio data generated by the sound collecting function with each piece of first audio data stored in the storage means.

According to the present invention, it is possible to provide a data reproducing apparatus, a data reproducing method, and a program that can easily reproduce data corresponding to a song in step with the flow of the song, even when the tempo of the song fluctuates as in a live performance.

An embodiment of the present invention is described below.

<Embodiment>
FIG. 1 is a block diagram showing the hardware configuration of a video display apparatus 1 having a data reproducing apparatus according to the embodiment of the present invention.

The CPU (Central Processing Unit) 11 reads a program stored in the ROM (Read Only Memory) 12, loads it into the RAM (Random Access Memory) 13, and executes it, thereby controlling each unit of the video display apparatus 1 via the bus 10. The RAM 13 also functions as a work area when the CPU 11 processes stored data.

Further, the RAM 13 has a live data buffer area 13a that temporarily buffers live video data (second video data) and live musical sound data (second audio data) received from the communication unit 17 for streaming reproduction. The CPU 11 reads the live video data and live musical sound data buffered in the RAM 13 and performs streaming reproduction after the processing described later. The RAM 13 also has a reference data storage area 1002 for storing a part of the live musical sound data. The reference data storage area 1002 is described later.

Here, the live video data is video data of footage of a live performance of a song (hereinafter, live video). The live musical sound data is audio data of a recording of that live performance (hereinafter, live performance), which includes the singer's voice (hereinafter, live vocal), accompaniment, and so on.

The storage unit (storage means) 14 is large-capacity storage such as a hard disk, and stores, in a music data storage area 14a, reference music data that serves as the reference for each song. Each piece of reference music data has reference musical sound data (first audio data) and lyrics data (first data). The reference musical sound data is audio data of a recording of a model performance of the song, including singing voice and accompaniment (hereinafter, reference performance), and carries time codes indicating its reproduction time. The lyrics data is sequence data consisting of text data representing the lyrics of the song and data indicating the display timing of each piece of text, and carries time codes (synchronization information) indicating the time at which the sequence data is read. Reading the reference musical sound data and the lyrics data with the same time code reproduces them in time synchronization, so that the lyrics matching the model singing voice are displayed. Each piece of reference music data also has information for identifying its song (hereinafter, music information data). Any information that can identify the song, such as a title or an identification number, may be used.
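The pairing of reference musical sound data and lyrics data under a shared time code can be pictured with a small data structure. The following is only an illustrative sketch: the class names, the field names, and the use of seconds as the time-code unit are assumptions made for the example and do not come from the specification.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LyricEvent:
    time_code: float  # assumed unit: seconds from the start of the reference performance
    text: str


@dataclass
class ReferenceSong:
    song_id: str         # music information data (title, identification number, ...)
    audio_frames: list   # reference musical sound data, one entry per frame
    lyrics: list         # LyricEvent items sorted by time_code


def lyric_at(song, time_code):
    """Return the lyric text whose display time was reached most recently, or None."""
    times = [event.time_code for event in song.lyrics]
    i = bisect_right(times, time_code)
    return song.lyrics[i - 1].text if i > 0 else None
```

With this layout, any component that knows the current time code of the reference performance can look up the lyric to show without knowing anything about the live signal.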

The display unit 15 is a display device such as a liquid crystal display that shows video on a screen, and displays video based on the input video data. It also displays various screens such as a menu screen for operating the video display apparatus 1. The operation unit 16 is, for example, a keyboard or a mouse; when the user of the video display apparatus 1 operates it, data representing the operation is output to the CPU 11.

The communication unit 17 is communication means such as a tuner that receives data by wire or wirelessly. As described above, in this embodiment it receives the live video data and live musical sound data and buffers them in the live data buffer area 13a of the RAM 13.

The audio output unit 18 has sound emitting means such as a speaker and emits sound based on the input audio data.

Next, the functions realized by the CPU 11 executing the program stored in the ROM 12 are described. FIG. 2 is a block diagram showing the software configuration of the functions realized by the CPU 11. FIG. 3 is a block diagram showing the detailed configuration of the music search unit 100 in FIG. 2.

The music search unit (specifying means) 100 acquires the live musical sound data read from the live data buffer area 13a by the CPU 11 and, based on it, outputs the music information data of the song corresponding to that live musical sound data to the music data selection unit 101. The music search unit 100 is described first, with reference to FIG. 3.

The music search unit 100 has a musical sound data acquisition unit 1001, the reference data storage area 1002, and a search processing unit 1003. From the live musical sound data read from the live data buffer area 13a by the CPU 11, the musical sound data acquisition unit 1001 acquires the portion covering the first predetermined period of the live performance (3 seconds in this embodiment) and stores it in the reference data storage area 1002. The acquired data is hereinafter called the reference data; it is an excerpt of the live signal and is distinct from the reference music data held in the storage unit 14. When storage in the reference data storage area 1002 is complete, the musical sound data acquisition unit 1001 instructs the search processing unit 1003 to start the search process.

The search processing unit 1003 starts the search process when instructed to do so by the musical sound data acquisition unit 1001. The search process is as follows. First, the search processing unit 1003 compares the reference data (the opening excerpt of the live performance) stored in the reference data storage area 1002 with the opening portion of the reference musical sound data of each piece of reference music data stored in the music data storage area 14a of the storage unit 14, and identifies the reference musical sound data that is most similar to the reference data, that is, the one with the highest similarity. The search processing unit 1003 then outputs the music information data corresponding to the identified reference musical sound data to the music data selection unit 101. The similarity may be calculated based on, for example, the accumulated value of the DP matching scores calculated in DP (Dynamic Programming) matching. DP matching and the DP matching score are described later.
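The search process amounts to scoring the opening excerpt against the opening of every stored reference and keeping the best match. The sketch below illustrates only that selection loop. The function names are hypothetical, the inputs are assumed to be lists of per-frame feature vectors, and a plain frame-by-frame Euclidean distance stands in for the accumulated DP matching score; a lower score is treated as a higher similarity.

```python
import math


def framewise_distance(a, b):
    """Stand-in score: sum of Euclidean distances between frames paired by index."""
    n = min(len(a), len(b))
    return sum(math.dist(a[i], b[i]) for i in range(n))


def identify_song(excerpt_frames, references, score=framewise_distance):
    """Return the song ID whose opening frames give the lowest score.

    references maps a song ID to that song's list of feature vectors.
    """
    best_id, best_score = None, math.inf
    for song_id, ref_frames in references.items():
        # Compare only the opening portion of each reference.
        s = score(excerpt_frames, ref_frames[:len(excerpt_frames)])
        if s < best_score:
            best_id, best_score = song_id, s
    return best_id
```

Passing the scoring function as a parameter keeps the loop unchanged if the stand-in is replaced by an accumulated DP matching score.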

Returning to FIG. 2, the music data selection unit 101 recognizes which song the data being received by the communication unit 17 belongs to, based on the music information data output from the music search unit 100. It then selects, from the reference music data stored in the music data storage area 14a, the reference music data corresponding to the recognized song and reads the reference musical sound data and lyrics data of that reference music data. The music data selection unit 101 outputs the read reference musical sound data to the time alignment unit (time alignment means) 102 and buffers the read lyrics data in the RAM 13 until it is read by the data reading unit (data reading means) 103 described later.

The time alignment unit 102 acquires the live musical sound data read from the live data buffer area 13a by the CPU 11, compares it with the reference musical sound data output from the music data selection unit 101, detects the deviation in the progression of the song between the live performance and the reference performance, and outputs a time code (time information) based on that deviation.

The deviation in the progression of the song is detected by dividing each set of data into frames of a predetermined time length, applying an FFT (Fast Fourier Transform) to each frame to calculate the spectrum of each set of data, and associating frames with similar spectra. In this embodiment, DP (Dynamic Programming) matching is used to detect the deviation. Specifically, the processing is as follows.

The time alignment unit 102 forms a coordinate plane as shown in FIG. 4 (hereinafter, DP plane) in the RAM 13. The vertical axis of the DP plane carries, in time order, the parameters a1, a2, a3 ... an obtained from the live musical sound data: the data is divided into frames of a predetermined time length, an FFT is applied to each frame, and the inverse Fourier transform of the logarithm of the absolute value of each frame's spectrum gives that frame's parameter (cepstrum). The horizontal axis carries b1, b2, b3 ... bn, obtained in the same way from the reference musical sound data and likewise arranged in time order. The spacing of a1, a2, a3 ... an on the vertical axis and of b1, b2, b3 ... bn on the horizontal axis both correspond to the frame length. Each grid point of the DP plane is assigned a DP matching score, which is the Euclidean distance between the corresponding parameter among a1, a2, a3 ... an and the corresponding parameter among b1, b2, b3 ... bn. For example, the grid point (a1, b1) located by a1 and b1 is assigned the Euclidean distance between the parameter obtained from the first frame of the live musical sound data and the parameter obtained from the first frame of the reference musical sound data.
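As a rough illustration of how such a DP plane could be filled in, the sketch below computes a cepstrum per frame and then the Euclidean distance for every pair of frames. Several details are assumptions made for the example and are not taken from the specification: a naive DFT replaces the FFT so that only the standard library is needed, the frame length is 32 samples, only the first 8 cepstral coefficients are kept, and a small constant is added before taking the logarithm to avoid log(0).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def cepstrum(frame, n_coeffs=8, eps=1e-10):
    """Inverse transform of the log magnitude spectrum, truncated to n_coeffs values."""
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)[:n_coeffs]]


def split_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dp_plane(live, ref, frame_len=32):
    """plane[i][j] is the DP matching score between live frame i and reference frame j."""
    a = [cepstrum(f) for f in split_frames(live, frame_len)]
    b = [cepstrum(f) for f in split_frames(ref, frame_len)]
    return [[math.dist(ai, bj) for bj in b] for ai in a]
```

Rows correspond to the vertical axis (live frames a1, a2, ...) and columns to the horizontal axis (reference frames b1, b2, ...), so identical frames score zero.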

After forming a DP plane with this structure, the time alignment unit 102 searches all paths from the lattice point (a1, b1), which is the start point, to the lattice point (an, bn), which is the end point, and for each path found it accumulates the DP matching scores of the lattice points traversed from start to end to obtain an accumulated value. Note that the start and end points are not the first and last frames of each data stream as a whole. Matching is performed in units of a predetermined number of frames, from the first frame to the last frame of each unit, and the units are processed in sequence until the last frame of each data stream is reached.

Then, the path with the smallest accumulated DP matching score is identified on the DP plane, and the lattice points on that path associate each frame of the live musical sound data with a frame of the reference musical sound data. This correspondence makes it possible to detect a deviation in the progression of the music. For example, the path drawn on the DP plane shown in FIG. 3 advances from the lattice point (a1, b1) to the lattice point (a2, b2) diagonally above and to its right. In this case, frame a2 and frame b2 occupy the same position on the time axis from the outset. The path then advances from the lattice point (a2, b2) to the lattice point (a2, b3) on its right. If there were no deviation, the path would advance to the lattice point (a3, b3), and frame a3 would correspond to the position of frame b3 on the time axis. Because the path instead advances to (a2, b3), frame b3 is associated with the time-axis position of frame a2 rather than a3, which means that a deviation in the progression of the music has occurred. In other words, the content that the reference musical sound data reaches by frame b3 has already been reached by frame a2 of the live musical sound data, so at this point the live performance is progressing faster than the reference performance. A deviation in the progression of the music is detected in this way. Frames of the reference musical sound data are associated with all frames of the live musical sound data in the same manner. The above is the mechanism of DP matching.
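The search for the minimum-score path can be sketched as follows. This is a hedged illustration: the text describes searching all paths and keeping the one with the smallest accumulated score, whereas the sketch reaches the same minimum with the usual dynamic-programming recurrence. The function name `dp_match` and the set of allowed steps (advance a, advance b, or advance both) are assumptions for the example.

```python
def dp_match(grid):
    """Return (accumulated score, path) of the minimum-score path through grid.

    grid[i][j] is the DP matching score of lattice point (a(i+1), b(j+1)).
    Allowed steps (an assumption): advance a, advance b, or advance both.
    """
    n, m = len(grid), len(grid[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]    # smallest accumulated score so far
    back = [[None] * m for _ in range(n)]  # predecessor on the best path
    acc[0][0] = grid[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i][j - 1], (i, j - 1)))
            best, prev = min(candidates)
            acc[i][j] = best + grid[i][j]
            back[i][j] = prev
    # Walk back from the end point (an, bn) to the start point (a1, b1).
    path = []
    node = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return acc[n - 1][m - 1], path
```

With one-dimensional parameters a = [0, 1, 3] and b = [0, 1, 1, 3] and the absolute difference as the score, the path returned is (a1, b1), (a2, b2), (a2, b3), (a3, b4), which reproduces the step from (a2, b2) to (a2, b3) discussed above.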

Next, the function by which the time alignment unit 102 sequentially outputs time codes based on the detected deviation in the progression of the music is described. As described above, the time alignment unit 102 associates a frame of the reference musical sound data with each frame of the live musical sound data, so it can recognize the position on the time axis of the input live musical sound data as a position on the time axis of the reference musical sound data (hereinafter, the reproduction position). It can also recognize the tempo from the change of this reproduction position over time. At predetermined intervals, the time alignment unit 102 generates a time code based on the recognized reproduction position and tempo, and outputs the time codes sequentially. If the reference musical sound data were read out and reproduced with reference to these time codes, the time axis of the reference performance would be stretched or compressed, and the data would be reproduced with the same progression as the live performance. Note that the reference musical sound data output from the music data selection unit 101 lags behind the live musical sound data by the time the music search unit 100 needs for its search. Depending on that search time, the live musical sound data and the reference musical sound data are therefore not associated for the first part of the piece, and the association starts partway through. In that case, the association may start after the intro of the piece has ended. This can be achieved by adding information indicating the end of the intro to the reference musical sound data and having the time alignment unit 102 start the association when it reads that information.
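One way to derive a reproduction position, a tempo, and a time code from the frame correspondence is sketched below. The patent does not give formulas for these quantities, so everything here is an assumption for illustration: the function names, the use of the furthest matched reference frame as the position, the tempo expressed as reference seconds per live second, and the linear extrapolation.

```python
def playback_positions(path, frame_sec):
    """Map each live frame to a position on the reference time axis, in seconds.

    path is a list of (live_index, reference_index) pairs in path order.
    """
    latest = {}
    for live_idx, ref_idx in path:
        latest[live_idx] = ref_idx  # keep the furthest reference frame per live frame
    return [latest[i] * frame_sec for i in sorted(latest)]

def tempo_ratio(positions, frame_sec, window=4):
    """Reference seconds advanced per live second over the last `window` frames."""
    w = min(window, len(positions) - 1)
    if w <= 0:
        return 1.0  # not enough history yet; assume equal tempo
    return (positions[-1] - positions[-1 - w]) / (w * frame_sec)

def time_code(positions, frame_sec, elapsed=0.0, window=4):
    """Extrapolate the current reproduction position from position and tempo.

    elapsed is the live time in seconds since the last matched frame.
    """
    return positions[-1] + tempo_ratio(positions, frame_sec, window) * elapsed
```

A ratio above 1.0 means that the live performance covers the reference material faster than the reference performance does, as in the (a2, b3) example above.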

Returning to FIG. 2, the description continues. The data reading unit 103 reads the lyrics data that the music data selection unit 101 buffered in the RAM 13, matching the time codes sequentially output from the time alignment unit 102 against the time codes attached to the lyrics data, and outputs the lyrics data sequentially to the data processing unit 104. The data processing unit (video data generation means) 104 generates lyrics video data (first video data) based on the lyrics data sequentially output from the data reading unit 103 and outputs it to the video synthesis unit (video synthesis means) 106. The lyrics data, which contains text data indicating the lyrics of the piece and data indicating the display timing of that text, is sequence data that the data reading unit 103 outputs by reading it with reference to the time codes output from the time alignment unit 102. The lyrics video data is therefore generated as video data in which the lyrics are displayed at timings that follow the progression of the live performance.
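The reading of lyrics data by time code can be sketched as below. The class name `LyricReader`, the representation of lyrics data as (time code in seconds, text) pairs, and the rule that each entry is output once are assumptions made for the example.

```python
from bisect import bisect_right

class LyricReader:
    """Hands out lyric entries whose attached time code has been reached."""

    def __init__(self, events):
        # events: list of (time_code_seconds, text) pairs, in any order
        self.events = sorted(events)
        self.times = [t for t, _ in self.events]
        self.next_index = 0

    def read(self, time_code):
        """Return the texts due at or before time_code and not yet returned."""
        end = bisect_right(self.times, time_code)
        due = self.events[self.next_index:end]
        self.next_index = max(self.next_index, end)
        return [text for _, text in due]
```

Calling `read` with each time code output by the alignment step returns only the entries that have newly become due, so a time code that momentarily moves backward does not output a line twice.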

The delay unit (delay means) 105 applies a delay of a predetermined time to the live video data and live musical sound data read from the live data buffer area 13a by the CPU 11, and outputs them. The predetermined time is set to the time required for processing from when the time alignment unit 102 acquires the live musical sound data until the data processing unit 104 outputs the lyrics video data. As a result, the live musical sound data and live video data output from the delay unit 105 are time-synchronized with the lyrics video data output from the data processing unit 104.
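A fixed delay of this kind can be sketched with a first-in, first-out queue. The class name `FixedDelay` and the choice to express the predetermined time as a number of frames are assumptions for the example.

```python
from collections import deque

class FixedDelay:
    """Delays a stream of frames by a fixed number of frames."""

    def __init__(self, delay_frames):
        self.delay_frames = delay_frames
        self.queue = deque()

    def push(self, frame):
        """Accept one frame; return the frame pushed delay_frames calls earlier.

        Returns None while the queue is still filling.
        """
        self.queue.append(frame)
        if len(self.queue) > self.delay_frames:
            return self.queue.popleft()
        return None
```

Setting `delay_frames` to the number of frames that the alignment and lyrics processing take would keep the delayed live data in step with the lyrics video.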

The video synthesis unit 106 generates synthesized video data in which the video of the lyrics represented by the lyrics video data output from the data processing unit 104 (hereinafter, the lyrics video) is superimposed on the live video represented by the live video data output from the delay unit 105, and outputs it to the display unit 15. The synthesized video data thus superimposes, on the live video delayed by the predetermined time in the delay unit 105, a time-synchronized lyrics video, that is, a video in which the lyrics are displayed in step with the progression of the live performance. By synthesizing the lyrics video with the live video, video data is generated in which the lyrics video matches the live vocal, the live performance, and the live video.

In this way, the synthesized video data is output to the display unit 15 and the live musical sound data is output to the audio output unit 18. The video display device 1 having the data reproducing apparatus according to this embodiment can thereby recognize the piece of music in the input data and reproduce the music together with a video in which a lyrics video, time-synchronized with the original live video (that is, following the progression of the music), has been synthesized. If the search process is completed during the intro of the piece, it can also be completed before the first lyrics video is displayed.

Although an embodiment of the present invention has been described above, the present invention can be implemented in various other forms, as follows.

<Modification 1>
In the embodiment, the music search unit 100 acquires the live musical sound data automatically and outputs the music information data based on the search result. Alternatively, the user may operate the operation unit 16 to make the music search unit 100 perform the search process again. In this case, live musical sound data for a predetermined time after the operation unit 16 is operated may be acquired and used as the reference data. Then, if the search result was not the right piece and the lyrics video displayed on the display unit 15 does not match the live vocal, the search can be run again. In addition, because a different region of the live musical sound data serves as the reference data, the similarity between the live musical sound data and each reference musical sound data also changes, so a correct search result can be expected.

<Modification 2>
In the embodiment, the music search unit 100 compares the reference data with the first part of the reference musical sound data of each reference music data, identifies the reference musical sound data most similar to the reference data (that is, with the highest similarity), and outputs the music information data corresponding to it to the music data selection unit 101. Instead of identifying only the single most similar one, it may identify several reference musical sound data with high similarity and let the user select one of them. In this case, based on the music information data corresponding to the identified reference musical sound data, the CPU 11 may display the titles of the pieces and the like superimposed on the live video already shown on the display unit 15. The user, while listening to the live performance output from the audio output unit or watching the live video on the display unit 15, operates the operation unit 16 to select one piece from the displayed titles. The music search unit 100 then outputs the music information data corresponding to the selected piece to the music data selection unit 101. In this way, the piece can be identified more accurately in a single pass.

<Modification 3>
In the embodiment, the music search unit 100 performs the search process using the acquired live musical sound data as the reference data, but it may instead use audio data input by the user as the reference data. In this case, as shown in FIG. 5, the video display device 1 may be provided with an audio input unit 21 having sound collection means such as a microphone and audio data generation means that generates audio data from the collected sound. The user operates the operation unit 16 to specify the timing at which audio data is input to the music search unit 100, that is, the timing at which the audio input unit 21 collects sound, and produces sound at that timing. The sound may be the user singing the melody or lyrics of the piece, or playing an instrument. In this way, when the piece is known beforehand, it can be identified in advance.

<Modification 4>
In the embodiment, all processing up to the point where the music search unit 100 outputs the music information data and the piece is identified is performed in the video display device 1, but part of it may be performed by a server or the like that can communicate with the video display device 1 via the communication unit 17. For example, instead of storing the reference music data in the music data storage area 14a, it may be stored in a storage unit of the server, and the server may perform the processing of the search processing unit 1003 and the music data selection unit 101. In this case, the processing of the music search unit 100 is performed as follows.

First, the musical sound data acquisition unit 1001 stores the reference data in the reference data storage area 1002, then communicates with the server, instructs the server's search processing unit to start the search process, and transmits the reference data stored in the reference data storage area 1002 to the server. The server's music processing unit compares the transmitted reference data with the reference musical sound data of the reference music data stored in the server's storage unit and identifies reference musical sound data with high similarity. The server's music data selection unit then transmits the reference music data containing the identified reference musical sound data to the video display device 1. Of the transmitted reference music data, the CPU 11 of the video display device 1 buffers the reference musical sound data in the RAM 13 and then outputs it to the time alignment unit 102, and buffers the lyrics data in the RAM 13 until it is read by the data reading unit 103. In this way, the same effect as in the embodiment is obtained even when part of the processing of the video display device 1 is performed by the server.

In addition to the above, the server may also perform the processing of the musical sound data acquisition unit 1001. In this case, the video display device 1 may have a musical sound data acquisition unit of the server acquire the live musical sound data, with the reference data storage area 1002 allocated in the server's storage unit to store the reference data. The server may obtain the live musical sound data either by having the video display device 1 acquire it and transmit it to the server via the communication unit 17, or by having the video display device 1 send the server information on the source from which it acquired the live musical sound data, so that the server acquires the data from that source.

As described, the processing of the video display device 1 can be shared with the server in various combinations: the server may perform only the processing of the search processing unit 1003, or the processing of both the musical sound data acquisition unit 1001 and the search processing unit 1003. In these cases, the server transmits the music information data to the video display device 1.

<Modification 5>
In the embodiment, the musical sound data acquisition unit 1001 acquires the first predetermined time of the live performance from the live musical sound data read from the live data buffer area 13a by the CPU 11 and stores it in the reference data storage area 1002 as the reference data. Instead of the beginning, a predetermined time may be acquired from partway through. In this case, the user operates the operation unit 16 to indicate the timing at which the musical sound data acquisition unit 1001 acquires the live musical sound data. Because the music search unit 100 outputs no music information data until the user operates the operation unit 16, no lyrics video is displayed on the live video until then, and the user can operate the operation unit 16 to display the lyrics video when desired.

<Modification 6>
In the embodiment, the musical sound data acquisition unit 1001 acquires the first predetermined time of the live performance from the live musical sound data, stores it in the reference data storage area 1002 as the reference data, and instructs the search processing unit 1003 to start the search process once storage is complete. Instead, it may instruct the search processing unit 1003 to start the search process at the same time as it starts acquiring the live musical sound data. In this case, the search processing unit 1003 compares the reference data, as it is progressively stored in the reference data storage area 1002, with the reference musical sound data. As the amount of reference data gradually increases, the number of reference musical sound data with high similarity narrows, so the reference musical sound data can be identified accurately. Once it has been identified, the musical sound data acquisition unit 1001 may stop acquiring the live musical sound data.
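The progressive narrowing described in this modification can be sketched as follows. The patent does not specify the similarity measure or the stopping rule, so the root-mean-square difference, the threshold, and the names `distance` and `narrow` are all assumptions. In the code, `prefix` plays the role of the reference data accumulated so far and `references` maps a title to its reference musical sound data, both reduced to plain number sequences.

```python
import math

def distance(prefix, reference):
    """Root-mean-square difference between prefix and the start of reference."""
    n = min(len(prefix), len(reference))
    if n == 0:
        return float("inf")
    return math.sqrt(sum((prefix[i] - reference[i]) ** 2 for i in range(n)) / n)

def narrow(stream_chunks, references, threshold):
    """Consume chunks of incoming data until at most one candidate remains.

    Returns (sorted remaining titles, number of chunks consumed).
    """
    prefix = []
    candidates = set(references)
    used = 0
    for chunk in stream_chunks:
        prefix.extend(chunk)  # the accumulated data grows with every chunk
        used += 1
        candidates = {title for title in candidates
                      if distance(prefix, references[title]) <= threshold}
        if len(candidates) <= 1:
            break  # identified (or nothing matches): stop acquiring
    return sorted(candidates), used
```

Stopping as soon as one candidate remains corresponds to halting acquisition of the live musical sound data once the reference musical sound data has been identified.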

<Modification 7>
In the embodiment, the search processing unit 1003 identifies reference musical sound data with high similarity by comparing the reference data with the reference musical sound data. It may instead do so by comparing feature quantities of the data represented by the reference data and by the reference musical sound data. In this case, each reference music data stored in the music data storage area 14a includes feature data indicating the feature quantity of its reference musical sound data, and the search processing unit 1003 extracts a feature quantity from the reference data and compares it with the feature data of the reference music data. The feature quantity may be anything that can be extracted from audio data, such as variation in pitch or variation in volume; the search processing unit 1003 extracts from the reference data the same kind of feature quantity as the feature data stored in the music data storage area 14a. The feature data need not be stored in the music data storage area 14a; the search processing unit 1003 may also extract the feature quantity of the reference musical sound data. In this way, the reference musical sound data can be identified more simply.
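A feature-based comparison might look like the sketch below. Frame-to-frame change in RMS level is used as one possible "variation in volume" feature. This choice, the Euclidean comparison, and the function names are assumptions, since the patent leaves the feature quantity open.

```python
import math

def volume_variation(samples, frame_len):
    """Frame-to-frame change in RMS level, as one possible volume feature."""
    rms = [math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
           for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [later - earlier for earlier, later in zip(rms, rms[1:])]

def most_similar(query_feature, stored_features):
    """Return the title whose stored feature data is closest to query_feature."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(stored_features,
               key=lambda title: dist(query_feature, stored_features[title]))
```

Here `stored_features` stands in for the feature data held with each reference music data, and `query_feature` for the feature quantity extracted from the reference data.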

<Modification 8>
In the embodiment, depending on the search time in the music search unit 100, the time alignment unit 102 starts associating the live musical sound data with the reference musical sound data partway through the piece. The association may instead start from the beginning of the piece regardless of the search time. In this case, the time alignment unit 102 acquires the live musical sound data read from the live data buffer area 13a by the CPU 11 and buffers it in the RAM 13, and continues buffering until the reference music data is output from the music data selection unit 101. It then performs the processing of the time alignment unit 102 described in the embodiment. When the live musical sound data is buffered, a time code is attached according to the time of acquisition. Then, when the reference musical sound data is output from the music data selection unit 101, DP matching can be performed from the beginning of each data stream, and the time codes can serve as a guide to the correspondence between the current reproduction position of the live musical sound data and the position in the reference musical sound data. Even without buffering the live musical sound data in the RAM 13, simply providing means for recognizing the time elapsed since reproduction of the live musical sound data started can provide such a guide.

<Modification 9>
In the embodiment, the musical sound data acquisition unit 1001 acquires the live musical sound data after it has been read from the live data buffer area 13a by the CPU 11. It may instead acquire the live musical sound data before the CPU 11 reads it, that is, as soon as the live musical sound data begins to be buffered in the live data buffer area 13a. The music search unit 100 can then start the search process before the CPU 11 reads the live musical sound data, so the time alignment unit 102 can begin associating the live musical sound data with the reference musical sound data at an earlier stage.

<Modification 10>
In the embodiment, the input information includes live video data, but the input data need not include video data. In this case, as shown in FIG. 6, the video synthesis unit 106 is not used and the data processing unit 104 outputs the lyrics video data directly to the display unit 15. Because the input information only needs to include audio data, data input from various devices can be supported, for example audio data output from a portable audio player or a radio broadcast. In this case, the communication unit 17 may be communication means suited to each kind of data input.

<Modification 11>
In the embodiment, the lyrics data corresponding to the reference musical sound data is sequence data containing text data indicating the lyrics of the piece and data indicating the display timing of each text. It may instead be video data (first data) with time codes attached so that the lyrics can be reproduced in time synchronization with the reference musical sound data. In this case, the data reading unit 103 refers to the time codes as described in the embodiment, reads the lyrics data (which is video data), and outputs it sequentially to the data processing unit 104. The lyrics data is thus output to the data processing unit 104 with its time axis stretched or compressed at read time to follow the progression of the music. The data processing unit 104 then generates and outputs this time-adjusted lyrics data as the lyrics video data (first video data). In this way, the effect of the present invention is obtained even when the lyrics data is video data. The video data is not limited to lyrics: any video data that is to be displayed in step with the progression of the music in the input audio data (live musical sound data in the embodiment), such as a musical score, may be used.

<Modification 12>
In the embodiment, lyrics are displayed in step with the progression of the music; in addition, another external device may be controlled in step with the progression of the music. In this case, the following configuration may be used. As shown in FIG. 7, a control signal output unit 19 such as an AUX (Auxiliary) terminal is provided, and the reference music data stored in the music data storage area 14a includes control signal data in addition to the reference musical sound data and lyrics data. The control signal data is sequence data containing signals for controlling the external device connected to the AUX terminal and data indicating the timing of that control. Like the lyrics data, it carries time codes so that the external device can be controlled in time synchronization with the reference musical sound data.

Then, as shown in FIG. 8, in addition to its operation in the embodiment, the music data selection unit 101 reads out the control signal data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time codes attached to the control signal data, reads out the control signal data that the music data selection unit 101 buffered in the RAM 13, and sequentially outputs it to the control signal output unit 19. In this way, the external device connected to the control signal output unit 19 is controlled based on control signals that are output in accordance with the progress of the music, so the video display device 1 having the data reproducing device can operate the external device in step with the music. The external device may be any device that can be controlled by a control signal, such as lighting, audio equipment, or a robot. In this case, the control signal data need only be data suited to the device to be controlled. Furthermore, to control a plurality of external devices simultaneously, a plurality of control signal data may be prepared and the control signal output unit 19 may be configured so that a plurality of devices can be connected to it. This connection may be wired or wireless, as long as signals can be transmitted.
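The time-code matching described for the control signal data can be illustrated with a minimal sketch. It is not taken from the patent: the class name `ControlSignalReader`, the tuple layout `(time_code_seconds, device_id, signal)`, and the use of seconds as the time-code unit are all assumptions made for illustration.

```python
from bisect import bisect_right


class ControlSignalReader:
    """Hypothetical stand-in for the data reading unit 103 handling control signal data.

    Buffered control events are released once the time code supplied by the
    time alignment step reaches the time code attached to each event.
    """

    def __init__(self, events):
        # events: iterable of (time_code_seconds, device_id, signal) tuples (assumed layout)
        self._events = sorted(events, key=lambda e: e[0])
        self._times = [e[0] for e in self._events]
        self._next = 0  # index of the first event that has not been output yet

    def read_until(self, time_code):
        """Return every not-yet-output event whose time code is <= time_code."""
        end = bisect_right(self._times, time_code)
        due = self._events[self._next:end]
        # Never move backwards, so an event is output at most once even if
        # the aligned time code temporarily jumps back.
        self._next = max(self._next, end)
        return due
```

Each call to `read_until` would be made with the latest aligned time code, and the returned events would be forwarded to whatever plays the role of the control signal output unit 19. Several devices can share one reader because every event carries its own device identifier.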

<Modification 13>
In the embodiment, the lyrics are displayed in accordance with the progress of the music; in addition, another musical sound may be reproduced over the live performance in accordance with the progress of the music. In this case, the following configuration may be used. As shown in FIG. 9, the reference music data stored in the music data storage area 14a includes MIDI data in addition to the reference musical sound data and the lyrics data. Like the lyrics data, the MIDI data carries time information so that it can be reproduced in time synchronization with the reference musical sound data.

Then, as shown in FIG. 10, in addition to its operation in the embodiment, the music data selection unit 101 reads out the MIDI data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time information attached to the MIDI data, reads out the MIDI data that the music data selection unit 101 buffered in the RAM 13, and sequentially outputs it to the reproduction unit 107. If the time information is a time code, it can be matched as is. If it is information representing relative time other than a time code, such as a duration, the CPU 11 may generate time codes by referring to, for example, the tempo that is set so that the MIDI data can be reproduced in time synchronization with the reference musical sound data, and match those instead. As a result, the MIDI data is read out so as to follow the progress of the music and is sequentially output from the data reading unit 103 to the reproduction unit 107. The reproduction unit 107 reproduces this MIDI data to generate MIDI musical sound data, which is audio data, and outputs it to the voice synthesis unit 108. Meanwhile, the live musical sound data delayed by the delay unit 105 is also output to the voice synthesis unit 108. The voice synthesis unit 108 then generates synthesized musical sound data by mixing the live performance of the live musical sound data with the MIDI performance of the MIDI musical sound data, and outputs it to the audio output unit 18. The live musical sound data and the MIDI musical sound data may instead be output to the audio output unit 18 separately without being mixed. In that case, the audio output unit 18 may mix the two, or they may be emitted from different sound emitting means without mixing. The reproduction unit 107 can generate the MIDI musical sound data because it has a sound source unit capable of reproducing MIDI data. The data need not be MIDI data, as long as it is sequence data indicating the sound generation content of the musical sounds of the audio data generated by the sound source unit.
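The conversion from relative durations to time codes mentioned above can be sketched as follows. This is only an illustration under stated assumptions: a single fixed tempo, durations expressed as delta ticks, time codes expressed in seconds, and a hypothetical function name `durations_to_time_codes`. The patent does not specify any of these details.

```python
def durations_to_time_codes(events, tempo_bpm, ticks_per_beat=480):
    """Convert (delta_ticks, message) pairs into (time_code_seconds, message) pairs.

    Assumes one constant tempo for the whole sequence; a real sequence
    may contain tempo changes, which this sketch ignores.
    """
    elapsed_ticks = 0
    timed = []
    for delta_ticks, message in events:
        # Each duration is relative to the previous event, so accumulate it.
        elapsed_ticks += delta_ticks
        # ticks -> beats -> seconds at the given tempo
        seconds = elapsed_ticks * 60.0 / (tempo_bpm * ticks_per_beat)
        timed.append((seconds, message))
    return timed
```

Once every event carries an absolute time code, it can be matched against the time codes from the time alignment unit 102 in the same way as the lyrics data.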

In addition, instead of the MIDI data, the reference music data stored in the music data storage area 14a may include additional musical sound data, which is audio data with time codes attached so that it can be reproduced in time synchronization with the reference musical sound data. In this case, processing proceeds as follows. First, the music data selection unit 101 reads out the additional musical sound data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time codes attached to the additional musical sound data, reads out the additional musical sound data that the music data selection unit 101 buffered in the RAM 13, and sequentially outputs it to the reproduction unit 107. As a result, when the additional musical sound data is read out, its time axis is expanded or contracted to follow the progress of the music, and it is sequentially output from the data reading unit 103 to the reproduction unit 107. The reproduction unit 107 then outputs this time-stretched additional musical sound data to the voice synthesis unit 108 as new audio data. Subsequent processing is the same as in the case of MIDI data. In this way, not only video but also various sounds can be output in accordance with the progress of the music.

<Modification 14>
In the embodiment, the communication unit 17 is communication means, such as a tuner, that receives data by wire or wirelessly, and it receives the live video data, live musical sound data, and music information data. However, as indicated by the broken line in FIG. 1, a data input unit 20 may be provided so that these data are input from the data input unit 20. For example, if these data are recorded on a recording medium such as a DVD (Digital Versatile Disc), the data input unit 20 may be an optical drive capable of reading the data recorded on the DVD. The same effects as in the embodiment can be obtained in this way as well.

<Modification 15>
In the embodiment, the delay unit 105 is provided, and the live video data and live musical sound data read from the live data buffer area 13a by the CPU 11 are output after being delayed by the time required for processing from when the time alignment unit 102 acquires the live musical sound data until the data processing unit 104 outputs the lyrics video data (hereinafter referred to as the delay time). Alternatively, as shown in FIG. 11, the time codes from the time alignment unit 102 may be output to the data reading unit 103 via a time code prediction unit 109. In this case, the time code prediction unit 109 refers to the time codes sequentially output from the time alignment unit 102, predicts the time code that the time alignment unit 102 will output after the delay time has elapsed, and sequentially outputs the predicted time codes to the data reading unit 103. The prediction may be made by referring to a predetermined number of the most recent time codes output from the time alignment unit 102 and estimating the time code after the delay time from the amount of change in the positions on the time axis that these time codes indicate. Because the only requirement is to predict the time code that the time alignment unit 102 will output after the delay time, the prediction is not limited to this method and may be based on any predetermined algorithm. In this way, the effects of the embodiment can be obtained without using the delay unit 105 to delay the live musical sound data and live performance data.
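One possible form of the prediction from recent time codes is linear extrapolation, sketched below. The patent leaves the algorithm open, so this is only one assumed choice. The class name `TimeCodePredictor`, the parameter names, the assumption that time codes arrive at a fixed interval, and the use of seconds are all hypothetical.

```python
from collections import deque


class TimeCodePredictor:
    """Hypothetical stand-in for the time code prediction unit 109."""

    def __init__(self, delay_time, interval, history=4):
        self.delay_time = delay_time  # how far ahead to predict, in seconds
        self.interval = interval      # assumed fixed spacing between incoming time codes
        self._recent = deque(maxlen=history)  # the last few time codes received

    def predict(self, time_code):
        """Record the newest time code and return the one expected after delay_time."""
        self._recent.append(time_code)
        if len(self._recent) < 2:
            # No history yet: assume the live performance runs at reference speed.
            rate = 1.0
        else:
            steps = len(self._recent) - 1
            # Average change in reference position per second of live time.
            rate = (self._recent[-1] - self._recent[0]) / (steps * self.interval)
        return time_code + rate * self.delay_time
```

A rate above 1.0 means the live performance is running faster than the reference, so the predicted time code moves further ahead. A longer history smooths the estimate but reacts more slowly to tempo changes.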

<Modification 16>
In the embodiment, the time alignment unit 102 uses DP matching to detect deviations in the progress of the music, but a different technique may be used. For example, an HMM (Hidden Markov Model) may be used, or waveform feature quantities (pitch, volume, and so on) may be extracted from each piece of data being compared and then compared. In other words, any technique may be used as long as it can compare the data and associate similar portions between them. The same effects as in the embodiment can be obtained in this way as well.
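As a rough illustration of comparing extracted feature quantities, the sketch below uses per-frame volume (root mean square) and a sliding-window comparison. It is neither the DP matching of the embodiment nor an HMM; it is a deliberately simple substitute chosen for brevity, and the function names, frame layout, and distance measure are assumptions.

```python
import math


def frame_volumes(samples, frame_size):
    """Root-mean-square volume of each full, non-overlapping frame of samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def best_matching_offset(live_feats, ref_feats):
    """Frame index in ref_feats where the live feature window fits best.

    Uses the sum of absolute differences as the distance; returns None
    when the reference is shorter than the live window.
    """
    window = len(live_feats)
    best_offset, best_cost = None, float("inf")
    for offset in range(len(ref_feats) - window + 1):
        candidate = ref_feats[offset:offset + window]
        cost = sum(abs(a - b) for a, b in zip(live_feats, candidate))
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset
```

The matched frame index can be turned into a position on the reference time axis by multiplying it by the frame size and dividing by the sample rate. Unlike DP matching, this simple comparison does not absorb tempo differences inside the window.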

FIG. 1 is a block diagram showing the hardware configuration of the video display device according to the embodiment.
FIG. 2 is a block diagram showing the software configuration of the video display device according to the embodiment.
FIG. 3 is a block diagram showing the software configuration of the music search unit.
FIG. 4 is an explanatory diagram showing the DP plane used when performing DP matching.
FIG. 5 is a block diagram showing the hardware configuration of the video display device according to Modification 3.
FIG. 6 is a block diagram showing the software configuration of the video display device according to Modification 10.
FIG. 7 is a block diagram showing the hardware configuration of the video display device according to Modification 12.
FIG. 8 is a block diagram showing the software configuration of the video display device according to Modification 12.
FIG. 9 is a block diagram showing the hardware configuration of the video display device according to Modification 13.
FIG. 10 is a block diagram showing the software configuration of the video display device according to Modification 13.
FIG. 11 is a block diagram showing the software configuration of the video display device according to Modification 15.

Explanation of symbols

1 ... video display device, 10 ... bus, 11 ... CPU, 12 ... ROM, 13 ... RAM, 13a ... live data buffer area, 14 ... storage unit, 14a ... music data storage area, 15 ... display unit, 16 ... operation unit, 17 ... communication unit, 18 ... audio output unit, 19 ... control signal output unit, 20 ... data input unit, 21 ... audio input unit, 100 ... music search unit, 1001 ... musical sound data acquisition unit, 1002 ... reference data storage area, 1003 ... search processing unit, 101 ... music data selection unit, 102 ... time alignment unit, 103 ... data reading unit, 104 ... data processing unit, 105 ... delay unit, 106 ... video composition unit, 107 ... reproduction unit, 108 ... voice synthesis unit, 109 ... time code prediction unit

Claims

A data reproducing apparatus comprising:
storage means for storing a plurality of sets, each consisting of first data having synchronization information that defines a time for each part of the data, and first audio data that indicates a sound waveform and for which a time is defined for each part of the data;
specifying means for specifying first audio data from among the plurality of first audio data stored in the storage means, based on second audio data indicating a waveform of a sound supplied from the outside;
time alignment means for sequentially acquiring each part of the second audio data supplied from the outside, comparing each part of the acquired second audio data with each part of the first audio data specified by the specifying means to associate the data parts with each other, and generating time information according to the time defined in the first audio data corresponding to each part of the acquired second audio data;
data reading means for reading out from the storage means, based on the correspondence between the time information and the synchronization information, the first data paired with the first audio data specified by the specifying means;
video data generating means for generating first video data based on the first data read by the data reading means; and
delay means for delaying the second audio data supplied from the outside by a predetermined amount and outputting the delayed data.

The data reproducing apparatus according to claim 1, wherein the specifying means specifies the first audio data from among the plurality of first audio data stored in the storage means, based on a result of comparing the second audio data supplied from the outside with each of the first audio data stored in the storage means.

The data reproducing apparatus according to claim 1, further comprising sound collecting means for generating audio data by collecting sound,
wherein the specifying means specifies the first audio data from among the plurality of first audio data stored in the storage means, based on a result of comparing the audio data generated by the sound collecting means, instead of the second audio data supplied from the outside, with each of the first audio data stored in the storage means.

The data reproducing apparatus according to any one of claims 1 to 3, further comprising re-specification instruction means for causing the specifying means to specify the first audio data again.

A data reproducing apparatus comprising:
instruction means for issuing, to a server connected to a network, an instruction to specify first audio data that indicates a sound waveform and for which a time is defined for each part of the data;
receiving means for receiving the first audio data and first data transmitted from the server in response to the instruction from the instruction means, the first data having synchronization information that defines a time for each part of the data;
storage means for storing the first audio data and the first data received by the receiving means;
time alignment means for sequentially acquiring each part of second audio data indicating a waveform of a sound supplied from the outside, comparing each part of the acquired second audio data with each part of the first audio data stored in the storage means to associate the data parts with each other, and generating time information according to the time defined in the first audio data corresponding to each part of the acquired second audio data;
data reading means for reading out the first data from the storage means based on the correspondence between the time information and the synchronization information;
video data generating means for generating first video data based on the first data read by the data reading means; and
delay means for delaying the second audio data supplied from the outside by a predetermined amount and outputting the delayed data.

A data reproducing apparatus comprising:
instruction means for issuing an instruction that causes a server connected to a network to specify first audio data;
receiving means for receiving, as music information data, data transmitted from the server in response to the instruction from the instruction means;
storage means for storing a plurality of sets, each consisting of first data having synchronization information that defines a time for each part of the data, and first audio data that indicates a sound waveform and for which a time is defined for each part of the data;
selection means for selecting first audio data from among the plurality of first audio data stored in the storage means, based on the music information data;
time alignment means for sequentially acquiring each part of second audio data supplied from the outside, comparing each part of the acquired second audio data with each part of the first audio data selected by the selection means to associate the data parts with each other, and generating time information according to the time defined in the first audio data corresponding to each part of the acquired second audio data;
data reading means for reading out from the storage means, based on the correspondence between the time information and the synchronization information, the first data paired with the first audio data selected by the selection means;
video data generating means for generating first video data based on the first data read by the data reading means; and
delay means for delaying the second audio data supplied from the outside by a predetermined amount and outputting the delayed data.

The data reproducing apparatus according to claim 6, wherein the instruction means issues the instruction that causes the server connected to the network to specify the first audio data and also transmits the second audio data supplied from the outside to the server.

The data reproducing apparatus according to claim 5 or 6, wherein the instruction means issues the instruction that causes the server connected to the network to specify the first audio data and also transmits a part of the second audio data supplied from the outside to the server.

The data reproducing apparatus according to claim 6, further comprising sound collecting means for generating audio data by collecting sound,
wherein the instruction means issues the instruction that causes the server connected to the network to specify the first audio data and also transmits the audio data generated by the sound collecting means to the server.

The data reproducing apparatus further comprising re-specification instruction means for causing the instruction means to issue again the instruction that causes the server connected to the network to specify the first audio data.

The data reproducing apparatus according to claim 1, further comprising video composition means that receives a plurality of video data and superimposes a video related to other video data on a video related to one video data among the plurality of video data,
wherein the delay means delays, by the predetermined amount, second video data that is supplied from the outside and is time-synchronized with the second audio data, and
the video composition means superimposes the video related to the first video data generated by the video data generating means on the video related to the second video data delayed by the delay means.

A data reproducing method comprising:
a specifying process of specifying, based on second audio data indicating a waveform of a sound supplied from the outside, first audio data from among a plurality of first audio data that are stored in storage means, each of which indicates a sound waveform and has a time defined for each part of the data;
a time alignment process of sequentially acquiring each part of the second audio data supplied from the outside, comparing each part of the acquired second audio data with each part of the first audio data specified in the specifying process to associate the data parts with each other, and generating time information according to the time defined in the first audio data corresponding to each part of the acquired second audio data;
a data reading process of reading out, from the storage means, which stores a plurality of sets each consisting of first data having synchronization information that defines a time for each part of the data and the first audio data, the first data paired with the first audio data specified in the specifying process, based on the correspondence between the time information and the synchronization information;
a video data generating process of generating first video data based on the first data read in the data reading process; and
a delay process of delaying the second audio data supplied from the outside by a predetermined amount and outputting the delayed data.

The data reproducing method according to claim 12, further comprising a sound collection process of generating audio data by collecting sound,
wherein the specifying process specifies the first audio data from among the plurality of first audio data stored in the storage means, based on a result of comparing the audio data generated in the sound collection process, instead of the second audio data supplied from the outside, with each of the first audio data stored in the storage means.

A program for causing a computer to realize:
a specifying function of specifying, based on second audio data indicating a waveform of a sound supplied from the outside, first audio data from among a plurality of first audio data that are stored in storage means, each of which indicates a sound waveform and has a time defined for each part of the data;
a time alignment function of sequentially acquiring each part of the second audio data supplied from the outside, comparing each part of the acquired second audio data with each part of the first audio data specified by the specifying function to associate the data parts with each other, and generating time information according to the time defined in the first audio data corresponding to each part of the acquired second audio data;
a data reading function of reading out, from the storage means, which stores a plurality of sets each consisting of first data having synchronization information that defines a time for each part of the data and the first audio data, the first data paired with the first audio data specified by the specifying function, based on the correspondence between the time information and the synchronization information;
a video data generation function of generating first video data based on the first data read by the data reading function; and
a delay function of delaying the second audio data supplied from the outside by a predetermined amount and outputting the delayed data.

The program according to claim 14, further causing the computer to realize a sound collection function of generating audio data by collecting sound,
wherein the specifying function specifies the first audio data from among the plurality of first audio data stored in the storage means, based on a result of comparing the audio data generated by the sound collection function, instead of the second audio data supplied from the outside, with each of the first audio data stored in the storage means.