JP2008197272A

JP2008197272A - Data reproducer, data reproducing method and program

Info

Publication number: JP2008197272A
Application number: JP2007031066A
Authority: JP
Inventors: Takuro Sone; 卓朗曽根; Takahiro Tanaka; 孝浩田中
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-02-09
Filing date: 2007-02-09
Publication date: 2008-08-28

Abstract

PROBLEM TO BE SOLVED: To provide a data reproducer, a data reproducing method and a program for easily reproducing image data in accordance with the flow of a melody even when the tempo of the music changes, as in a live performance.

SOLUTION: In an image display device with the data reproducer, a data read-out part 103 reads out lyric data temporally synchronized with reference image data, based on time information generated by a comparison between live image data and prepared reference image data in a time alignment part 102. The live image data and the lyric data can thus be reproduced with temporal synchronization, and the prepared data can be reproduced in accordance with the flow of the melody even when the tempo of the music piece changes, as in a live performance.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for reproducing other data in synchronization with input data.

In a typical karaoke apparatus, the user enjoys karaoke of a piece of music through synchronized playback of music data made up of, for example, accompaniment data in MIDI (Musical Instrument Digital Interface: registered trademark) format, sequence data for displaying the lyrics telop of the music, and video data. Patent Document 1 discloses a technique for receiving accompaniment data and video data separately from a server and reproducing them in synchronization. Patent Document 2 discloses a technique for providing a karaoke performance with a sense of presence like that of a live performance.

The technique of displaying lyrics is used not only in karaoke apparatuses but also in song programs on television. In television broadcasting, when a lyrics telop is displayed along with a singer's singing in a live song program, an operator displays the lyrics at predetermined timings as the music progresses.

Patent Document 1: JP 2003-15675 A
Patent Document 2: JP 2000-347676 A

However, with the technique disclosed in Patent Document 1, the accompaniment is reproduced from MIDI-format data using a sound source capable of playing such data, so the sound quality can be poor and the performance can have a monotonous tempo. With the technique disclosed in Patent Document 2, a sense of presence like that of a live performance is obtained, but because the music does not progress at a uniform pace, sequence data such as that for displaying the lyrics telop has to be created in advance to match the progress of the music. Moreover, the operator's work in television broadcasting described above allows no mistakes during the actual broadcast and requires practice beforehand to match the lyric display timing to the progress of the music, which is a very heavy burden.

The present invention has been made in view of the above circumstances. Its object is to provide a data reproducing apparatus, a data reproducing method, and a program that can easily reproduce data in accordance with the flow of the music even when the tempo of the music varies, as in a live performance.

To solve the above problems, the present invention provides a data reproducing apparatus comprising: storage means for storing first video data and for storing first data having synchronization information that defines a time for each part of the data; time alignment means for associating second video data supplied from outside with the first video data in units of frames of a predetermined time length, and generating time information indicating the time of the corresponding portion; data reading means for reading the first data from the storage means based on the correspondence between the time information and the synchronization information; and delay means for delaying the second video data supplied from outside by a predetermined amount.

In another preferred aspect, the time alignment means may extract, based on a preset predetermined algorithm, feature amounts indicating the features of each frame of a predetermined time length from both the second video data supplied from outside and the first video data, associate the extracted feature amounts with each other in units of frames, and generate time information indicating the time of the corresponding portion.

In another preferred aspect, the storage means may store, in place of the first video data, feature amounts indicating the features of each frame of a predetermined time length of the first video data, extracted based on a preset predetermined algorithm. The time alignment means may then extract, based on the same predetermined algorithm, feature amounts indicating the features of each frame of a predetermined time length of the second video data supplied from outside, associate these feature amounts with the feature amounts stored in the storage means in units of frames, and generate time information indicating the time of the corresponding portion.

In another preferred aspect, the apparatus may further comprise video data generating means for generating third video data based on the first data read by the data reading means.

In another preferred aspect, the apparatus may further comprise video synthesizing means for superimposing the video of the third video data generated by the video data generating means on the video of the second video data delayed by the delay means.

In another preferred aspect, the delay amount of the delay means may be set as the time from when the second video data is supplied to the time alignment means until the third video data is generated by the video data generating means.

In another preferred aspect, the apparatus may further comprise audio data generating means for generating first audio data based on the first data read by the data reading means.

In another preferred aspect, the apparatus may further comprise audio synthesizing means that receives a plurality of audio data and mixes the musical sounds of those audio data. The delay means may delay, by the predetermined amount, second audio data that is supplied from outside and time-synchronized with the second video data, and the audio synthesizing means may mix the musical sound of the second audio data delayed by the delay means with the musical sound of the first audio data generated by the audio data generating means.

In another preferred aspect, the delay amount of the delay means may be set as the time from when the second video data is supplied to the time alignment means until the first audio data is generated by the audio data generating means.

In another preferred aspect, the first data may be control signal data, that is, sequence data for controlling an externally connected device.

In another preferred aspect, the delay amount of the delay means may be set as the time from when the second video data is supplied to the time alignment means until the first data is read by the data reading means.

The present invention also provides a data reproducing method comprising: a time alignment step of associating second video data supplied from outside with first video data stored in storage means in units of frames of a predetermined time length, and generating time information indicating the time of the corresponding portion; a data reading step of reading first data, based on the correspondence between the time information and synchronization information, from the storage means, which stores the first video data and also stores the first data having the synchronization information that defines a time for each part of the data; and a delay step of delaying the second video data supplied from outside by a predetermined amount.

The present invention also provides a computer-readable program that causes a computer having storage means to realize: a storage function of storing first video data in the storage means and storing first data having synchronization information that defines a time for each part of the data; a time alignment function of associating second video data supplied from outside with the first video data in units of frames of a predetermined time length, and generating time information indicating the time of the corresponding portion; a data reading function of reading the first data from the storage means based on the correspondence between the time information and the synchronization information; and a delay function of delaying the second video data supplied from outside by a predetermined amount.

According to the present invention, it is possible to provide a data reproducing apparatus, a data reproducing method, and a program that can easily reproduce data in accordance with the flow of the music even when the tempo of the music varies, as in a live performance.

An embodiment of the present invention is described below.

<Embodiment>
FIG. 1 is a block diagram showing the hardware configuration of a video display apparatus 1 having a data reproducing apparatus according to the present embodiment.

A CPU (Central Processing Unit) 11 reads a program stored in a ROM (Read Only Memory) 12, loads it into a RAM (Random Access Memory) 13, and executes it, thereby controlling each part of the video display apparatus 1 via a bus 10. The RAM 13 also functions as a work area when the CPU 11 processes stored data.

The RAM 13 further has an internal live data buffer area 13a, which temporarily buffers live video data (second video data) and live musical sound data (second audio data) received by the communication unit 17 for streaming playback. Data carrying information on the music corresponding to these data (hereinafter, music information data) is also received by the communication unit 17 and stored there. The information on the music may be anything that identifies the music, such as a title or an identification number. The CPU 11 reads the live video data and live musical sound data buffered in the RAM 13 and performs streaming playback through the processing described later.

Here, the live video data is video data of footage of a live performance of a piece of music (hereinafter, live video). The live musical sound data is audio data of a recording of that live performance (hereinafter, live performance), which includes the singer's singing voice (hereinafter, live vocal), the accompaniment, and so on.

The storage unit (storage means) 14 is a large-capacity storage means such as a hard disk. In a music data storage area 14a it stores reference music data, which serves as reference data for each piece of music. The reference music data comprises reference video data (first video data) and lyrics data (first data). The reference video data is video data of footage of the singer's mouth movements during a model rendition of the music (hereinafter, reference video), and carries a time code indicating its playback time. The lyrics data is sequence data comprising text data indicating the lyrics of the music and data indicating the display timing of each piece of text, and carries a time code (synchronization information) indicating the time at which the sequence data is read. Reading the reference video data and the lyrics data with the same time code reproduces them in time synchronization, so that the displayed lyrics match the singer's mouth movements in the model rendition.

The display unit 15 is a display device, such as a liquid crystal display, that shows video on a screen based on input video data. It also displays various screens, such as a menu screen for operating the video display apparatus 1. The operation unit 16 is, for example, a keyboard or a mouse; when the user of the video display apparatus 1 operates it, data representing the operation is output to the CPU 11.

The communication unit 17 is a communication means, such as a tuner, that receives data by wire or wirelessly. As described above, in this embodiment it receives the live video data, live musical sound data, and music information data, and buffers them in the live data buffer area 13a of the RAM 13.

The audio output unit 18 has sound emitting means such as a speaker, and emits sound based on input audio data.

Next, the functions realized by the CPU 11 executing the program stored in the ROM 12 are described. FIG. 2 is a block diagram showing the software configuration of the functions realized by the CPU 11.

The music data selection unit 101 reads the music information data from the live data buffer area 13a and identifies the music to which the data being received by the communication unit 17 belongs. From the reference music data stored in the music data storage area 14a, it selects the reference music data corresponding to the identified music and reads its reference video data and lyrics data. The music data selection unit 101 outputs the reference video data to the time alignment unit (time alignment means) 102, and buffers the lyrics data in the RAM 13 until it is read by the data reading unit (data reading means) 103 described later.

The time alignment unit 102 acquires the live video data read from the live data buffer area 13a by the CPU 11 and compares it with the reference video data output from the music data selection unit 101. From this comparison it detects the shift between the live video and the reference video, that is, the shift in the progress of the music, and outputs a time code (time information) based on that shift.

Here, the shift in the progress of the music is detected by dividing each set of data into frames of a predetermined time length, extracting from each frame a feature amount indicating the features of the video by a predetermined algorithm, and associating similar feature amounts between the frames of the two sets of data. In this embodiment, the predetermined algorithm extracts contours from the brightness and color of the video and their changes over time, and identifies the contours in which a person is presumed to be present. It then compares the identified contours with preset patterns to identify the contour presumed to be the singer, and locates the face and then the mouth. The shape of the mouth located in this way (for example, the positions of several points on the lips) is used as the feature amount. This algorithm is only an example; any method may be used as long as it can extract a feature amount indicating the features of the video.

In this embodiment, DP (Dynamic Programming) matching is used to detect the shift in the progress of the music. Specifically, the processing is as follows.

The time alignment unit 102 forms a coordinate plane as shown in FIG. 3 (hereinafter, DP plane) in the RAM 13. The vertical axis of the DP plane lists, in time order, the feature amounts of each frame (hereinafter, parameters) a1, a2, a3, ..., an. These are obtained by detecting the singer's mouth in the video of the live video data, dividing the data into frames of a predetermined time length, and extracting the video features (in this embodiment, the positions of several points on the lips) from each frame. The horizontal axis lists, in time order, the parameters b1, b2, b3, ..., bn obtained in the same way from the reference video data. The spacing of a1, a2, a3, ..., an on the vertical axis and of b1, b2, b3, ..., bn on the horizontal axis both correspond to the time length of a frame. Each grid point of the DP plane is associated with a DP matching score, a value indicating the Euclidean distance between the corresponding parameter among a1, a2, a3, ..., an and the corresponding parameter among b1, b2, b3, ..., bn. For example, the grid point (a1, b1) located by a1 and b1 is associated with the Euclidean distance between the parameter obtained from the first frame of the live video data and the parameter obtained from the first frame of the reference video data.
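The construction of the DP plane can be sketched as follows. This is a minimal illustration, not the implementation in the patent: the function name `dp_plane` and the two-coordinate parameters are assumptions made for the example, whereas the embodiment uses the positions of several lip points per frame.

```python
import math

def dp_plane(live_params, ref_params):
    """Build the grid of DP matching scores.

    plane[i][j] is the Euclidean distance between live parameter a(i+1)
    and reference parameter b(j+1).
    """
    return [[math.dist(a, b) for b in ref_params] for a in live_params]

# Hypothetical parameters: each frame reduced to a single (x, y) lip point.
live = [(0.0, 0.0), (3.0, 4.0)]              # a1, a2
ref = [(0.0, 0.0), (0.0, 4.0), (3.0, 4.0)]   # b1, b2, b3
plane = dp_plane(live, ref)
```

A score of zero at a grid point means the live frame and the reference frame have identical parameters; larger scores mean the mouth shapes differ more.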

After forming the DP plane, the time alignment unit 102 searches all paths from the grid point (a1, b1) at the start to the grid point (an, bn) at the end, and for each path accumulates the DP matching scores of the grid points traversed from start to end to obtain an accumulated value. The start and end here are not the first and last frames of the whole data. The search is performed in units of a predetermined number of frames, from the first to the last frame of each unit, and the units are processed in sequence until the last frame of the data is reached.

The path with the smallest accumulated DP matching score is then identified on the DP plane, and each grid point on that path associates a frame of the reference video data with a frame of the live video data. This correspondence reveals the shift in the progress of the music. For example, the path drawn on the DP plane in FIG. 3 advances from grid point (a1, b1) diagonally up and to the right to grid point (a2, b2). In this case, frame a2 and frame b2 occupy the same position on the time axis, as they did originally. The path then advances from grid point (a2, b2) to grid point (a2, b3) on its right. With no shift in progress, the path would advance to grid point (a3, b3), and the frame corresponding to the time-axis position of frame b3 would be a3. Because the path advances to (a2, b3) instead, frame b3 is associated with the time-axis position of frame a2 rather than a3, which means a shift in progress has occurred. That is, the content that the reference video reaches by frame b3 has already been reached by frame a2 of the live video, so at this point the live video is progressing faster than the reference video. In this way, the shift in the progress of the music can be detected. Reference video frames are associated with all frames of the live video data to detect the shift in the progress of the music. This is how DP matching works.
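The minimum-score path search can be sketched as below. Two points are assumptions of this sketch rather than statements from the patent: instead of enumerating every path, it uses the standard dynamic-programming recurrence, which yields the same minimum accumulated value, and it allows each step to advance one grid point along the live axis, the reference axis, or both. The name `best_path` and the sample scores are hypothetical.

```python
def best_path(plane):
    """Return (accumulated_score, path) for the cheapest path across the plane.

    plane[i][j] is the DP matching score of live frame i and reference
    frame j (0-based). The path runs from (0, 0) to the far corner.
    """
    n, m = len(plane), len(plane[0])
    acc = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    acc[0][0] = plane[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1][j - 1], (i - 1, j - 1)))  # both advance
            if i > 0:
                candidates.append((acc[i - 1][j], (i - 1, j)))  # live advances
            if j > 0:
                candidates.append((acc[i][j - 1], (i, j - 1)))  # reference advances
            prev_score, prev = min(candidates)
            acc[i][j] = prev_score + plane[i][j]
            back[i][j] = prev
    # Walk the back-pointers from the end point to recover the path.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return acc[n - 1][m - 1], path

# Hypothetical scores: live frame a2 matches both reference frames b2 and b3.
plane = [[0, 4, 4],
         [4, 0, 0]]
score, path = best_path(plane)
```

In 0-based indices the recovered path is (0, 0), (1, 1), (1, 2), which corresponds to (a1, b1), (a2, b2), (a2, b3) in the notation of the text: the reference frame b3 is matched to live frame a2.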

Next, the function of sequentially outputting time codes based on the detected shift in the progress of the music is described. As described above, the time alignment unit 102 associates a frame of the reference video data with each frame of the live video data, so it can recognize the position of the incoming live video data on the time axis as a position on the time axis of the reference video data (hereinafter, playback position). It can also recognize the tempo from the change of this playback position over time. At predetermined intervals, the time alignment unit 102 generates a time code based on the recognized playback position and tempo and outputs it. If the reference video data were read and reproduced by referring to the time codes sequentially output from the time alignment unit 102, the time axis of the reference video would be stretched or compressed, and the reference video would play back at the same pace as the live video.
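One way to turn an alignment path into playback positions and a tempo estimate is sketched below. The patent does not specify how the time code is computed from position and tempo, so everything here is an assumption: the function name `time_codes`, the choice of the last reference frame matched to each live frame as its playback position, and the tempo defined as reference time advanced per live frame divided by the frame length.

```python
def time_codes(path, frame_sec):
    """Return one (playback_position_sec, tempo_ratio) pair per live frame.

    path is a list of (live_index, ref_index) grid points in time order.
    A tempo_ratio above 1.0 means the live video runs faster than the reference.
    """
    latest_ref = {}
    for live_idx, ref_idx in path:
        latest_ref[live_idx] = ref_idx  # keep the last reference frame matched
    codes = []
    prev_pos = None
    for live_idx in sorted(latest_ref):
        pos = latest_ref[live_idx] * frame_sec
        # Assumed convention: the first frame starts at the reference tempo.
        tempo = 1.0 if prev_pos is None else (pos - prev_pos) / frame_sec
        codes.append((pos, tempo))
        prev_pos = pos
    return codes

# Path from the example in the text: (a1, b1), (a2, b2), (a2, b3), 0-based.
codes = time_codes([(0, 0), (1, 1), (1, 2)], frame_sec=0.5)
```

For this path the second live frame maps to reference time 1.0 s with a tempo ratio of 2.0, reflecting that the live video covered two reference frames in one frame period.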

Returning to FIG. 2, the description continues. The data reading unit 103 reads the lyrics data buffered in the RAM 13 by the music data selection unit 101, matching the time codes sequentially output from the time alignment unit 102 against the time codes attached to the lyrics data, and outputs the lyrics data to the data processing unit 104 in sequence. The data processing unit (video data generating means) 104 generates lyric video data (third video data) based on the lyrics data sequentially output from the data reading unit 103, and outputs it to the video synthesis unit (video synthesizing means) 106. The lyrics data, which comprises text data indicating the lyrics and data indicating the display timing of the text, is sequence data that the data reading unit 103 reads and outputs by referring to the time codes output from the time alignment unit 102. The lyric video data is therefore generated as video in which the lyrics appear in step with the progress of the live performance.
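The reading step can be sketched as a cursor over time-coded lyric events that advances as aligned time codes arrive. The class name `LyricReader` and the event layout of `(time_code_sec, text)` are assumptions made for this sketch; the patent only states that the lyrics data carries time codes and display timing.

```python
class LyricReader:
    """Release lyric events once the aligned time code reaches their time code."""

    def __init__(self, events):
        # events: iterable of (time_code_sec, text) pairs
        self.events = sorted(events)
        self.next_index = 0

    def read(self, time_code):
        """Return the texts that became due at or before time_code."""
        due = []
        while (self.next_index < len(self.events)
               and self.events[self.next_index][0] <= time_code):
            due.append(self.events[self.next_index][1])
            self.next_index += 1
        return due

reader = LyricReader([(0.0, "la"), (1.0, "li"), (2.5, "lu")])
first = reader.read(0.0)    # time code from the alignment step
second = reader.read(1.2)
third = reader.read(1.3)    # nothing new is due yet
fourth = reader.read(3.0)
```

Because the cursor is driven by the aligned time codes rather than by wall-clock time, lyrics are released earlier when the live performance runs ahead of the reference and later when it lags.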

The delay unit (delay means) 105 delays the live video data and live musical sound data read from the live data buffer area 13a by the CPU 11 by a predetermined time and outputs them. The predetermined time is set to the time required for processing from when the time alignment unit 102 acquires the live video data until the data processing unit 104 outputs the lyric video data. As a result, the live musical sound data and live video data output from the delay unit 105 are time-synchronized with the lyric video data output from the data processing unit 104.
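A fixed delay of this kind can be sketched as a first-in, first-out buffer. In this sketch the delay is expressed as a number of frames, which is an assumption: the patent defines it as the processing time from alignment to lyric video output. The class name `DelayLine` is hypothetical.

```python
from collections import deque

class DelayLine:
    """Emit each pushed frame delay_frames pushes later."""

    def __init__(self, delay_frames):
        # Pre-fill with None so the first outputs are empty placeholders.
        self.buffer = deque([None] * delay_frames)

    def push(self, frame):
        """Store the newest frame and return the frame that is now due."""
        self.buffer.append(frame)
        return self.buffer.popleft()

delay = DelayLine(2)
outputs = [delay.push(f) for f in ["f0", "f1", "f2", "f3"]]
```

With a delay of two frames, frame `f0` leaves the buffer when `f2` arrives, which gives the alignment and lyric generation steps two frame periods to produce the matching lyric video.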

The video synthesis unit 106 generates composite video data in which the video of the lyrics from the lyric video data output by the data processing unit 104 (hereinafter, lyric video) is superimposed on the live video from the live video data output by the delay unit 105, and outputs it to the display unit 15. Because the live video has been delayed by the predetermined time in the delay unit 105, the superimposed lyric video is time-synchronized with it; that is, the lyrics appear in step with the progress of the live performance. The composite video data is thus generated as video in which the lyric video matches the live vocal, the live performance, and the live video.
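Superimposition can be sketched on toy frames as follows. The representation is an assumption for illustration only: frames are lists of rows of pixel values, and `None` in the lyric frame marks a transparent pixel through which the live video shows. The function name `superimpose` is hypothetical.

```python
def superimpose(live_frame, lyric_frame):
    """Overlay the lyric frame on the live frame, pixel by pixel.

    A lyric pixel of None is transparent, so the live pixel is kept.
    """
    return [
        [lyric if lyric is not None else live
         for live, lyric in zip(live_row, lyric_row)]
        for live_row, lyric_row in zip(live_frame, lyric_frame)
    ]

live_frame = [[1, 1], [1, 1]]
lyric_frame = [[None, 9], [None, None]]   # one opaque lyric pixel
composite = superimpose(live_frame, lyric_frame)
```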

In this way, the composite video data is output to the display unit 15 and the live musical sound data is output to the audio output unit 18. The video display apparatus 1 having the data reproducing apparatus according to this embodiment can therefore reproduce the music together with video in which lyric video time-synchronized with the original live video, that is, matched to the progress of the music, has been composited.

The embodiment of the present invention has been described above, but the present invention can also be implemented in various other aspects, as follows.

<Modification 1>
In the embodiment, the lyrics data corresponding to the reference video data is sequence data comprising text data representing the lyrics of the music and data indicating the display timing of each piece of text. Alternatively, it may be video data (first data) with time codes attached so that the lyrics can be reproduced in time synchronization with the reference video data. In this case, the data reading unit 103 refers to the time codes as in the embodiment, reads the lyrics data, which is video data, and sequentially outputs it to the data processing unit 104. Because the lyrics data is read in this way, its time axis is expanded or contracted to follow the progress of the music as it is output to the data processing unit 104. The data processing unit 104 then outputs this time-stretched lyrics data as lyrics video data (third video data). In this way, the effects of the present invention are obtained even when the lyrics data is video data. The video data is not limited to lyrics data; it may be any video data, such as a musical score, that is to be displayed in step with the progress of the music in the input video data (the live video data in the embodiment).
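Reading time-coded video by aligned time code is what stretches or compresses its time axis: a slow live passage repeats stored frames, and a fast one skips them. A minimal sketch, assuming the stored video is a list of frames at a constant frame rate (the function name, the frame list, and the rate are all hypothetical):

```python
def read_frames_by_time_code(frames, frame_rate, aligned_time_codes):
    """Pick one stored frame for each time code reported by the alignment step.

    frames: stored video frames, assumed to be evenly spaced in time.
    frame_rate: stored frames per second (an assumption of this sketch).
    aligned_time_codes: positions in the reference, in seconds, one per live frame.
    """
    last_index = len(frames) - 1
    selected = []
    for time_code in aligned_time_codes:
        index = int(time_code * frame_rate)          # stored frame covering this time
        index = min(max(index, 0), last_index)       # clamp to the stored range
        selected.append(frames[index])
    return selected


stored = ["a", "b", "c", "d"]                        # 4 frames at 2 frames per second
played = read_frames_by_time_code(stored, 2, [0.0, 0.25, 0.5, 1.5, 4.0])
```

Here the first two live frames both map to stored frame `a` (the live performance is slower than the reference), while the jump from 0.5 s to 1.5 s skips frame `c`.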

<Modification 2>
In the embodiment, the lyrics are displayed in step with the progress of the music. In addition, an external device may be controlled in step with the progress of the music. In this case, the following configuration may be used. As shown in FIG. 4, a control signal output unit 19 such as an AUX (Auxiliary) terminal is provided, and the reference music data stored in the music data storage area 14a includes control signal data (second data) in addition to the reference video data and lyrics data. The control signal data is sequence data comprising signals for controlling an external device connected to the AUX terminal and data indicating the timing of that control. Like the lyrics data, it carries time codes (synchronization information) so that the external device can be controlled in time synchronization with the reference video data.

Then, as shown in FIG. 5, the music data selection unit 101, in addition to its operation in the embodiment, reads the control signal data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time codes attached to the control signal data, reads the control signal data buffered in the RAM 13 by the music data selection unit 101, and sequentially outputs it to the control signal output unit 19. The external device connected to the control signal output unit 19 is thereby controlled by control signals output in step with the progress of the music, so the video display device 1 having the data reproducing device can operate the external device in step with the progress of the music. The external device may be anything that can be controlled by a control signal, such as lighting, audio equipment, or a robot; the control signal data need only be prepared to suit the device to be controlled. To control a plurality of external devices simultaneously, a plurality of sets of control signal data may be prepared and the control signal output unit 19 configured so that a plurality of devices can be connected. The connection may be wired or wireless, as long as signals can be transmitted. The lyrics data is not essential in this modification. When lyrics data is not used, the data processing unit 104 and the video composition unit 106 are unnecessary, and the live video data output from the delay unit 105 may be output to the display unit 15 as is. In that case, the predetermined time by which the delay unit 105 delays the live video data and live musical sound data may be set to the time required for processing from when the time alignment unit 102 acquires the live video data until the data reading unit 103 outputs the control signal data to the control signal output unit 19.
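The matching of aligned time codes against the time codes attached to the control signal data can be sketched as a cursor over a time-sorted event list. This is an illustrative sketch only: the function name, the `(time_code, command)` tuple layout, and the assumption that aligned time codes never move backwards are not taken from the specification.

```python
def dispatch_control_events(events, aligned_time_codes):
    """Release each control command once the aligned position reaches its time code.

    events: list of (time_code_seconds, command), sorted by time code.
    aligned_time_codes: positions reported by the alignment step, assumed non-decreasing.
    Returns a list of (aligned_time_code, [commands released at that step]).
    """
    next_index = 0
    released = []
    for current in aligned_time_codes:
        fired = []
        # Release every pending event whose time code has been reached.
        while next_index < len(events) and events[next_index][0] <= current:
            fired.append(events[next_index][1])
            next_index += 1
        released.append((current, fired))
    return released


# Hypothetical lighting cues for an external device.
cues = [(1.0, "lights_on"), (2.5, "strobe"), (4.0, "lights_off")]
log = dispatch_control_events(cues, [0.5, 1.2, 1.2, 3.0, 4.5])
```

Because the cursor only advances, a command is issued once even when the live performance lingers at the same position for several frames.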

<Modification 3>
In the embodiment, the lyrics are displayed in step with the progress of the music. In addition, another musical sound may be reproduced alongside the live performance in step with the progress of the music. In this case, the following configuration may be used. As shown in FIG. 6, the reference music data stored in the music data storage area 14a includes MIDI data (second data) in addition to the reference video data and lyrics data. Like the lyrics data, the MIDI data carries time information (synchronization information) so that it can be reproduced in time synchronization with the reference video data.

Then, as shown in FIG. 7, the music data selection unit 101, in addition to its operation in the embodiment, reads the MIDI data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time information attached to the MIDI data, reads the MIDI data buffered in the RAM 13 by the music data selection unit 101, and sequentially outputs it to the reproduction unit 107. When the time information is a time code, it can be matched directly. When it is information representing relative time other than a time code, such as a duration, the CPU 11 may generate time codes by referring to, for example, the tempo set so that the data can be reproduced in time synchronization with the reference video data, and match those instead. The MIDI data sequentially output from the data reading unit 103 is thus read so as to follow the progress of the music and is output to the reproduction unit 107. The reproduction unit 107 reproduces the MIDI data to generate MIDI musical sound data, which is audio data, and outputs it to the audio synthesis unit 108. Meanwhile, the live musical sound data delayed by the delay unit 105 is also output to the audio synthesis unit (audio synthesis means) 108. The audio synthesis unit 108 mixes the live performance represented by the live musical sound data with the MIDI performance represented by the MIDI musical sound data, generates the result as synthesized musical sound data, and outputs it to the audio output unit 18. The live musical sound data and the MIDI musical sound data may instead be output to the audio output unit 18 separately without mixing. In that case, the two may be mixed in the audio output unit 18, or they may be emitted from different sound emitting means without mixing. The reproduction unit 107 can generate the MIDI musical sound data because it has a sound source unit capable of reproducing MIDI data. The data need not be MIDI data, as long as it is sequence data indicating the sound generation content of the audio data generated by the sound source unit.
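When the MIDI data carries relative times rather than time codes, absolute time codes can be derived from the tempo before matching. A minimal sketch, assuming a single constant tempo and delta times expressed in ticks; the function name and parameters are illustrative, and real sequence data may contain tempo changes that this sketch ignores.

```python
def durations_to_time_codes(delta_ticks, tempo_bpm, ticks_per_beat):
    """Convert relative delta times (in ticks) to absolute time codes in seconds.

    Assumes one constant tempo for the whole sequence (a simplification).
    """
    ticks_per_minute = tempo_bpm * ticks_per_beat
    time_codes = []
    elapsed_ticks = 0
    for delta in delta_ticks:
        elapsed_ticks += delta                       # running position in ticks
        time_codes.append(elapsed_ticks * 60.0 / ticks_per_minute)
    return time_codes


# Three events: at the start, one beat later, and half a beat after that.
codes = durations_to_time_codes([0, 480, 240], tempo_bpm=120, ticks_per_beat=480)
```

At 120 beats per minute one beat lasts 0.5 s, so the three events fall at 0 s, 0.5 s, and 0.75 s in the reference timeline and can then be matched against the aligned time codes like any other time-coded data.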

Instead of the MIDI data, the reference music data stored in the music data storage area 14a may include additional musical sound data (second data), which is audio data with time codes (synchronization information) attached so that it can be reproduced in time synchronization with the reference video data. In this case, processing proceeds as follows. First, the music data selection unit 101 reads the additional musical sound data stored in the music data storage area 14a and buffers it in the RAM 13 until it is read by the data reading unit 103. As when reading the lyrics data, the data reading unit 103 matches the time codes sequentially output from the time alignment unit 102 against the time codes attached to the additional musical sound data, reads the additional musical sound data buffered in the RAM 13 by the music data selection unit 101, and sequentially outputs it to the reproduction unit 107. Because the additional musical sound data is read in this way, its time axis is expanded or contracted to follow the progress of the music as it is output to the reproduction unit 107. The reproduction unit 107 then outputs this time-stretched additional musical sound data to the audio synthesis unit 108 as new audio data. Subsequent processing is the same as for MIDI data. In this way, not only video but also various sounds can be output in step with the progress of the music. In this modification as well, as in Modification 3, the lyrics data is not essential. When lyrics data is not used, the data processing unit 104 and the video composition unit 106 are unnecessary, and the live video data output from the delay unit 105 may be output to the display unit 15 as is. In that case, the predetermined time by which the delay unit 105 delays the live video data and live musical sound data may be set to the time required for processing from when the time alignment unit 102 acquires the live video data until the reproduction unit 107 outputs the audio data.

<Modification 4>
In the embodiment, the input information includes live musical sound data, but the input data need not include audio data. In that case, the audio synthesis unit 108 is unnecessary even when audio data is output as in Modification 3. FIG. 8 shows the software configuration when the variant of Modification 3 that does not use lyrics data is applied to Modification 4. In this way, even if the input information contains no live musical sound data, audio data such as MIDI data or additional musical sound data can be reproduced in step with the live video. Further, as shown in FIG. 9, a video input unit 21 may be provided that generates video data from video captured by imaging means such as a video camera. Video data generated by the video input unit 21 capturing the user singing can then be used in place of the live video data. For example, if the additional musical sound data stored in the music data storage area 14a is a recording of the accompaniment of the music, the accompaniment can be reproduced in step with the user's singing. If the additional musical sound data also includes the singing voice, the user need only mime singing: music including the voice of the model singer is reproduced in step with the user's lip movements, so the user can enjoy the feeling of being that singer. The music information data may be input by the user operating the operation unit 16.

<Modification 5>
In the embodiment, the communication unit 17 is communication means, such as a tuner, that receives data by wire or wirelessly, and it receives the live video data, live musical sound data, and music information data. Alternatively, as indicated by the broken line in FIG. 1, a data input unit 20 may be provided so that these data are input from the data input unit 20. For example, if the data is recorded on a recording medium such as a DVD (Digital Versatile Disc), the data input unit 20 may be an optical drive capable of reading the data recorded on the DVD. The same effects as in the embodiment can be obtained in this way as well.

<Modification 6>
In the embodiment, the delay unit 105 is provided, and the live video data and live musical sound data read from the live data buffer area 13a by the CPU 11 are delayed by the time required for processing from when the time alignment unit 102 acquires the live video data until the data processing unit 104 outputs the lyrics video data (hereinafter, the delay time) before being output. Alternatively, as shown in FIG. 10, the time codes output from the time alignment unit 102 may be passed to the data reading unit 103 via a time code prediction unit 109. In this case, the time code prediction unit 109 refers to the time codes sequentially output from the time alignment unit 102, predicts the time code that the time alignment unit 102 will output after the delay time, and sequentially outputs the predicted time codes to the data reading unit 103. The prediction may be made by referring to a predetermined number of the most recent time codes output from the time alignment unit 102 and estimating the time code after the delay time from the amount of change in the positions on the time axis that those time codes indicate. The prediction is not limited to this method; any predetermined algorithm may be used, as long as it predicts the time code that the time alignment unit 102 will output after the delay time. In this way, the effects of the embodiment can be obtained without using the delay unit 105 to delay the live musical sound data and live performance data.
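The prediction from recent time codes described above can be sketched as a linear extrapolation. This is one possible reading, not the specification's algorithm: the function name, the window size, the use of a constant frame interval, and the fallback for a single sample are all assumptions of the sketch.

```python
def predict_time_code(history, frame_interval, delay_time, window=4):
    """Estimate the time code the alignment step will report `delay_time` seconds from now.

    history: time codes output so far (seconds in the reference), oldest first.
    frame_interval: real time between consecutive outputs, in seconds (assumed constant).
    window: how many of the most recent time codes to use.
    """
    if not history:
        raise ValueError("at least one time code is required")
    recent = history[-window:]
    if len(recent) < 2:
        # Not enough data to measure a rate: assume the live tempo equals the reference.
        return recent[-1] + delay_time
    elapsed_real_time = (len(recent) - 1) * frame_interval
    rate = (recent[-1] - recent[0]) / elapsed_real_time   # reference seconds per real second
    return recent[-1] + rate * delay_time


# The live performance is running at twice the reference speed in this made-up example.
predicted = predict_time_code([0.0, 0.5, 1.0, 1.5], frame_interval=0.25, delay_time=0.5)
```

With a measured rate of 2.0, the position 0.5 s ahead is estimated as 1.5 + 2.0 × 0.5 = 2.5 s, so data for that position can be read early instead of delaying the live signals.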

<Modification 7>
In the embodiment, the time alignment unit 102 uses DP matching to detect deviations in the progress of the music, but a different technique may be used. For example, an HMM (Hidden Markov Model) may be used, feature amounts may be extracted from each set of data being compared and compared with each other, or the degree of similarity between the videos represented by the video data may be evaluated to establish the correspondence. In short, any technique may be used as long as it can compare the data and associate similar portions with each other. The same effects as in the embodiment can be obtained in this way as well.
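For orientation, the kind of frame-by-frame association that DP matching produces can be sketched with a textbook dynamic-programming alignment (often called dynamic time warping). This is a generic illustration, not the procedure of the embodiment: it assumes one scalar feature per frame, an absolute-difference cost, and offline processing of complete sequences, and the function name is hypothetical.

```python
def dp_match(live, reference):
    """Align two feature sequences; return a list of (live_index, reference_index) pairs."""
    n, m = len(live), len(reference)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning the first i live and first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(live[i - 1] - reference[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace back from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# The live sequence holds its first value one frame longer than the reference.
alignment = dp_match([1, 1, 2, 3], [1, 2, 3])
```

Each pair says which reference frame a live frame corresponds to; multiplying the reference index by the frame length would give the time code for that live frame. The alternatives named in this modification would replace the cost table with a different similarity model.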

<Modification 8>
In the embodiment, the reference video data is stored in the music data storage area 14a. Instead of the reference video data, the feature amounts that the time alignment unit 102 would extract from the video represented by the reference video data may be stored. The time alignment unit 102 then does not need to extract the feature amounts of the reference video, which reduces its processing load.

<Modification 9>
In the embodiment, the reference video data stored in the music data storage area 14a is video data capturing the mouth portion of the singer, but it may cover a wider range than the mouth portion, for example the entire scene. When the video is not limited to the mouth portion, the time alignment unit 102 may detect the mouth portion not only from the live video data but also from the reference video data.

<Modification 10>
In the embodiment, the feature amounts extracted by the time alignment unit 102 are taken from the detected mouth portion of the singer in the video represented by the live video data, but they may be extracted from another portion. The other portion may be any portion that moves, such as the movement of an arm or of a musical instrument being played. A position detection marker may also be carried by a performer in the live performance and moved in step with the progress of the live video, so that feature amounts can be extracted from the movement of the marker. The position detection marker may be, for example, a ball or stick with fluorescent paint or a specific color; any marker may be used as long as it is easy to detect in the video represented by the live video data. The reference video data stored in the music data storage area 14a need only be video data corresponding to the portion from which the feature amounts are extracted.

<Modification 11>
In the embodiment, the music information data is received from outside by the communication unit 17, but the received data need not include music information data. In this case, the user may input the music information data by operating the operation unit 16, or a music search unit 100 may be provided as shown in FIG. 11. The music search unit 100 compares part of the input live video data with part of the reference video data stored in the music data storage area 14a, identifies the corresponding music, and outputs the music information data of the identified music. If reference musical sound data, a recording of a reference performance, is also stored in the music data storage area 14a, the music search unit 100 can acquire the live musical sound data, as indicated by the broken-line arrow in the figure, and identify the music from part of the live musical sound data. In this way, the music information data can be generated and the music identified without the user specifying the music.
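Identifying the music by comparing part of the live data with part of each stored reference can be sketched as a nearest-match search over feature excerpts. The sketch is an assumption-laden illustration: the function name, the use of one scalar feature per frame, the mean absolute difference as the distance, and the dictionary layout of the library are all hypothetical, and the excerpts are assumed to be non-empty and to start at the same point in the music.

```python
def identify_song(live_excerpt, reference_library):
    """Return the title whose reference excerpt is closest to the live excerpt.

    live_excerpt: feature values from the start of the live data.
    reference_library: dict mapping title -> feature values from the stored reference.
    """
    def mean_abs_difference(a, b):
        length = min(len(a), len(b))                 # compare only the overlapping part
        return sum(abs(x - y) for x, y in zip(a, b)) / length

    return min(
        reference_library,
        key=lambda title: mean_abs_difference(live_excerpt, reference_library[title]),
    )


library = {
    "song_a": [0.1, 0.9, 0.2],
    "song_b": [0.5, 0.5, 0.5],
}
title = identify_song([0.1, 0.8, 0.3], library)
```

The returned title plays the role of the music information data that would otherwise arrive through the communication unit or the operation unit.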

<Modification 12>
In the embodiment, the time alignment unit 102 detects the mouth portion of the singer in the live video and, using the positions of several points on the lips as feature amounts, compares and associates the live video with the reference video. The association in the time alignment unit 102 need not rely on the mouth portion alone as in the embodiment; a plurality of portions (for example, the movement of an arm, the position of the microphone, or the movement of another performer) may be used. In this case, the reference video data stored in the music data storage area 14a may contain video data for the plurality of portions. For each of the portions, the time alignment unit 102 compares and associates the live video data with the reference video data to generate a time code, and outputs the time code generated for a portion whose association has high reliability. This improves the accuracy of the association, so the lyrics can be displayed more accurately in step with the progress of the music. Instead of outputting the time code generated for a highly reliable portion, the time alignment unit 102 may generate a time code for each portion and output a time code obtained as their average weighted by reliability.
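The two ways of combining per-portion time codes described above can be written down directly. In this sketch each portion contributes a `(time_code, reliability)` pair; how reliability is measured is not specified in the text, so the non-negative scores used here, the function names, and the choice of the single highest score for the first option are assumptions.

```python
def select_most_reliable(estimates):
    """Return the time code from the portion with the highest reliability score."""
    best_time_code, _ = max(estimates, key=lambda estimate: estimate[1])
    return best_time_code


def weighted_time_code(estimates):
    """Return the reliability-weighted average of the per-portion time codes."""
    total_weight = sum(reliability for _, reliability in estimates)
    if total_weight == 0:
        raise ValueError("at least one estimate must have non-zero reliability")
    weighted_sum = sum(time_code * reliability for time_code, reliability in estimates)
    return weighted_sum / total_weight


# Hypothetical estimates from the mouth, an arm, and the microphone position.
estimates = [(10.0, 0.5), (12.0, 0.25), (14.0, 0.25)]
best = select_most_reliable(estimates)
blended = weighted_time_code(estimates)
```

Selecting one portion ignores the weaker estimates entirely, whereas the weighted average lets them pull the result (here from 10.0 s toward 11.5 s) in proportion to their reliability.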

<Modification 13>
When the time alignment unit 102 detects the mouth portion of the singer but cannot identify the singer with the predetermined algorithm, for example because several people appear in the live video, a notice to that effect may be displayed on the display unit 15, and the user may identify the singer by operating the operation unit 16 while referring to the video displayed on the display unit 15. In this way, when a problem occurs in the processing of the time alignment unit 102, the user may assist the processing by operating the operation unit 16.

FIG. 1 is a block diagram showing the hardware configuration of the video display device according to the embodiment.
FIG. 2 is a block diagram showing the software configuration of the video display device according to the embodiment.
FIG. 3 is an explanatory diagram showing the DP plane used when performing DP matching.
FIG. 4 is a block diagram showing the hardware configuration of the video display device according to Modification 2.
FIG. 5 is a block diagram showing the software configuration of the video display device according to Modification 2.
FIG. 6 is a block diagram showing the hardware configuration of the video display device according to Modification 3.
FIG. 7 is a block diagram showing the software configuration of the video display device according to Modification 3.
FIG. 8 is a block diagram showing the software configuration of the video display device according to Modification 4.
FIG. 9 is a block diagram showing the hardware configuration of the video display device according to Modification 4.
FIG. 10 is a block diagram showing the software configuration of the video display device according to Modification 6.
FIG. 11 is a block diagram showing the software configuration of the video display device according to Modification 11.

Explanation of symbols

1 ... video display device, 10 ... bus, 11 ... CPU, 12 ... ROM, 13 ... RAM, 13a ... live data buffer area, 14 ... storage unit, 14a ... music data storage area, 15 ... display unit, 16 ... operation unit, 17 ... communication unit, 18 ... audio output unit, 19 ... control signal output unit, 20 ... data input unit, 21 ... video input unit, 100 ... music search unit, 101 ... music data selection unit, 102 ... time alignment unit, 103 ... data reading unit, 104 ... data processing unit, 105 ... delay unit, 106 ... video composition unit, 107 ... reproduction unit, 108 ... audio synthesis unit, 109 ... time code prediction unit

Claims

A data reproducing apparatus comprising:
storage means for storing first video data and first data having synchronization information that defines the time of each part of the data;
time alignment means for associating second video data supplied from outside with the first video data in units of frames of a predetermined time length, and generating time information indicating the time of each associated part;
data reading means for reading the first data from the storage means based on the correspondence between the time information and the synchronization information; and
delay means for delaying the second video data supplied from outside by a predetermined amount.

The data reproducing apparatus according to claim 1, wherein the time alignment means extracts, based on a predetermined algorithm set in advance, feature amounts indicating features of each frame of a predetermined time length from the second video data supplied from outside and from the first video data, associates the extracted feature amounts with each other in units of frames, and generates time information indicating the time of each associated part.

The data reproducing apparatus according to claim 1, wherein the storage means stores, in place of the first video data, feature amounts indicating features of each frame of a predetermined time length of the first video data, extracted based on a predetermined algorithm set in advance, and
the time alignment means extracts, based on the predetermined algorithm, feature amounts indicating features of each frame of a predetermined time length of the second video data supplied from outside, associates them in units of frames with the feature amounts stored in the storage means, and generates time information indicating the time of each associated part.

The data reproducing apparatus, further comprising video data generating means for generating third video data based on the first data read by the data reading means.

The data reproducing apparatus according to claim 4, further comprising video synthesis means for superimposing the video related to the third video data generated by the video data generating means on the video related to the second video data delayed by the delay means.

The data reproducing apparatus according to claim 4 or 5, wherein the delay amount in the delay means is set to the time from when the second video data is supplied to the time alignment means until the third video data is generated by the video data generating means.

The data reproducing apparatus, further comprising audio data generation means for generating first audio data based on the first data read by the data reading means.

The data reproducing apparatus according to claim 7, further comprising speech synthesis means that receives a plurality of audio data and mixes the musical sounds of the plurality of audio data, wherein
the delay means delays, by the predetermined amount, second audio data supplied from the outside and time-synchronized with the second video data, and
the speech synthesis means mixes the musical sound of the second audio data delayed by the delay means with the musical sound of the first audio data generated by the audio data generation means.
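The mixing step can be sketched as a sample-wise sum of the two signals. This assumes both are sequences of floating-point samples in the range -1.0 to 1.0 at the same sample rate, and it adds gain parameters and clipping that the claim does not mention. The function name `mix` is hypothetical.

```python
from typing import List, Sequence


def mix(
    delayed_live: Sequence[float],
    generated: Sequence[float],
    live_gain: float = 1.0,
    generated_gain: float = 1.0,
) -> List[float]:
    """Sum two sample sequences, padding the shorter with silence and clipping to [-1, 1]."""
    length = max(len(delayed_live), len(generated))
    mixed = []
    for k in range(length):
        a = delayed_live[k] if k < len(delayed_live) else 0.0
        b = generated[k] if k < len(generated) else 0.0
        sample = live_gain * a + generated_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))  # assumed hard clipping
    return mixed


result = mix([0.5, 0.5], [0.25, 0.75, -0.5])
```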

The data reproducing apparatus according to claim 7 or 8, wherein the delay amount in the delay means is set to the time from when the second video data is supplied to the time alignment means until the first audio data is generated by the audio data generation means.

The data reproducing apparatus according to claim 1, wherein the first data is control signal data, which is sequence data for controlling an externally connected device.

The delay amount in the delay means is set to the time from when the second video data is supplied to the time alignment means until the first data is read out by the data reading means. The data reproducing apparatus described in 1.

A data reproducing method comprising:
a time alignment step of associating second video data supplied from the outside with first video data stored in storage means, in units of frames of a predetermined time length, and generating time information indicating the time of the corresponding portion;
a data reading step of reading first data from the storage means, which stores the first video data and the first data having synchronization information defining the time of each part of the data, based on the correspondence between the time information and the synchronization information; and
a delay step of delaying the second video data supplied from the outside by a predetermined amount.

A computer-readable program for causing a computer having storage means to realize:
a storage function of storing, in the storage means, first video data and first data having synchronization information defining the time of each part of the data;
a time alignment function of associating second video data supplied from the outside with the first video data in units of frames of a predetermined time length, and generating time information indicating the time of the corresponding portion;
a data reading function of reading the first data from the storage means based on the correspondence between the time information and the synchronization information; and
a delay function of delaying the second video data supplied from the outside by a predetermined amount.