JP2003099435A

JP2003099435A - Multimedia document preparing method, device and storage medium

Info

Publication number: JP2003099435A
Application number: JP2001292899A
Authority: JP
Inventors: Takashi Hanamoto; 貴志花本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-09-26
Filing date: 2001-09-26
Publication date: 2003-04-04

Abstract

PROBLEM TO BE SOLVED: To make editing audio data easy by using templates that carry search keywords. SOLUTION: The method uses templates, which are models for creating a multimedia document in which the playback order of songs, playback times, and the like can be described, and a theme, which is a set of such templates. The templates are written in a multimedia description language and become multimedia documents once references to audio data are described in them. At least one keyword is set for each template in the theme, and the system uses the keywords to search for audio data corresponding to the selected theme. The multimedia documents are created by describing, in the respective templates, references to the audio data obtained as search results.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a multimedia document creation method, apparatus, and medium that use data retrieval technology.

[0002]

[Prior Art] With the spread of CDs and similar media, audio data has increasingly been digitized, and music editing using MDs, which allow digital recording and playback, has become common. In recent years, the spread of audio compression formats such as MP3 has also made it possible to edit audio using high-quality, highly compressed digital audio data.

[0003] As a result, users now store audio data on memory cards and play it back on lightweight portable audio players. In this environment, users edit the audio data they want, save it to an MD or memory card, and play it on a portable player or car stereo.

[0004]

[Problems to Be Solved by the Invention] In practice, however, editing the desired audio data and saving it to an MD or memory card is tedious for the user, because searching a large personal collection for the desired audio data and saving it to an MD or memory card takes considerable effort. Existing techniques include detecting favorite scenes in a video by attaching metadata by voice, as in Japanese Patent Laid-Open No. 2000-078530, and defining playback of only the songs a user likes from a large amount of song data, as in Japanese Patent Laid-Open No. 2000-057753, but no technique has simplified audio data editing itself.

[0005] The present invention has been made in view of the above problems with audio data editing, and its object is to make audio data editing easy by using templates that carry search keywords. A further object is to automate audio data editing by means of audio data retrieval technology and to create a multimedia document as the result.

[0006]

[Means for Solving the Problems] To achieve the above objects, an audio data editing apparatus according to one aspect of the present invention has the following configuration. That is, it is an audio data editing apparatus that uses audio data editing templates in which playback of audio data can be described, and is characterized by comprising: a template group selection step of selecting, from a theme designated by the user, the group of templates that fits the theme; an audio data search step of searching, based on the keywords set in a template, for the audio data whose playback is to be described in the template; a playback data creation step of describing playback of the retrieved audio data in the template and creating the playback data; and a media recording step of writing the created playback data to a recording medium.

[0007]

[Embodiments of the Invention] This section describes an embodiment in which audio data editing is actually performed using a personal computer or a dedicated device.

[0008] FIG. 1 is a block diagram showing the system configuration of the embodiment. In FIG. 1, the data input/output device 100 inputs and outputs data. The input device 101 receives instructions and data from the user and includes a keyboard and a pointing device such as a mouse. The storage device 102 stores binary data and metadata; a hard disk or the like is normally used. The display device 103 displays the GUI and images; a CRT, liquid crystal display, or the like is generally used.

[0009] Reference numeral 104 denotes a CPU, which takes part in all of the processing of the components described above. The ROM 105 and the RAM 106 provide the CPU 104 with the programs, data, work area, and so on needed for that processing. The control programs needed for the processing in the flowcharts of FIG. 4 onward are assumed to be stored in either the storage device 102 or the ROM 105. When stored in the storage device 102, a program is first loaded into the RAM 106 and then executed.

[0010] The system includes various components other than those described above, but they are not the focus of the present invention and their description is omitted.

[0011] Audio data editing means selecting several pieces of audio data that one likes from a large collection, arranging them in the preferred order, and putting them into a playable state (that is, creating a multimedia document). After editing, the result is often saved to a recording disc medium or the like. In audio data editing according to the present invention, templates for audio data editing are prepared, and these templates can be used to create a multimedia document automatically and to save it to a recording medium.

[0012] FIG. 2 shows an overview of a template. A template is written mainly in a multimedia description language such as SMIL and can contain playback descriptions such as the playback order of audio data and partial playback. The template 201 has audio data search keywords 202. Links to audio data that match the keywords can be described in the template itself.

[0013] Specifically, if the keywords of a template include the word "morning" and the keywords of a piece of audio data include the same word "morning", the audio data is judged to match the template, and a link to it is described in the template. As a result, the template 201 becomes the multimedia document 204.
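
The way a template turns into a multimedia document once audio references are written into it can be sketched as follows. This is only an illustration under stated assumptions: the SMIL-like skeleton, the `meta` element carrying the keyword, the file path, and the function name `fill_template` are all hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: a SMIL-like skeleton whose <seq> has no audio yet.
# Carrying the search keyword in a <meta> element is an assumption.
TEMPLATE = """<smil>
  <head><meta name="keyword" content="morning"/></head>
  <body><seq/></body>
</smil>"""


def fill_template(template_xml, audio_paths):
    """Return the template with one <audio src=...> reference per path."""
    root = ET.fromstring(template_xml)
    seq = root.find("./body/seq")
    for path in audio_paths:
        # Each reference is appended in order, so list order is playback order.
        ET.SubElement(seq, "audio", {"src": path})
    return ET.tostring(root, encoding="unicode")


document = fill_template(TEMPLATE, ["music/sunrise.mp3"])
print(document)
```

Only a reference to the audio file is written, not the audio itself, which mirrors the statement that a link to the matching audio data is described in the template.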

[0014] FIG. 3 shows how a plurality of templates come together to make up one large theme. In practice, a plurality of templates is used in most cases.

[0015] (Example 1) This example describes audio data search based on metadata, the corresponding audio data editing method, and the multimedia document creation method, using an application on a personal computer or a dedicated device. It also describes how to save the result to a recording medium after the audio data has been edited. Metadata means data about data; in this example it refers to information about the audio data such as keywords, the composer, and the singer's name.

[0016] FIG. 4 shows the contents of the metadata attached to audio data. In this example, the metadata is written in XML and contains information such as "title", "composer", "lyricist", "singer", "lyrics", "genre", and "keyword". Needless to say, it may contain more information than this. As shown in FIG. 4, the metadata is appended to the end of the audio data. Alternatively, the metadata may be written in the header portion, or it may be kept in a separate file.
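
As an illustration of XML metadata of this kind, the sketch below parses a small record into a dictionary. The element names (`Metadata`, `Title`, `Keyword`, and so on) and all field values are assumptions made for the example; the patent lists only the kinds of information, not a schema.

```python
import xml.etree.ElementTree as ET

# Assumed element names and invented sample values, for illustration only.
METADATA = """<Metadata>
  <Title>Morning Drive</Title>
  <Composer>A. Composer</Composer>
  <Lyricist>B. Lyricist</Lyricist>
  <Singer>C. Singer</Singer>
  <Lyrics>the morning sun is rising</Lyrics>
  <Genre>Pop</Genre>
  <Keyword>morning</Keyword>
  <Keyword>drive</Keyword>
</Metadata>"""


def parse_metadata(xml_text):
    """Collect each field as a list of strings, keyed by element name."""
    fields = {}
    for element in ET.fromstring(xml_text):
        fields.setdefault(element.tag, []).append((element.text or "").strip())
    return fields


meta = parse_metadata(METADATA)
print(meta["Keyword"])
```

Storing every field as a list lets repeated elements such as several keywords coexist without special handling.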

[0017] As shown in FIG. 5, a single audio data file can also be divided into parts such as "intro" and "phrase 1", each with its own metadata. In this case the audio data file is divided by time, and each piece of metadata records the time range of its part (for example, the metadata 502 contains an entry such as <Time>0:20~3:11</Time>).
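
A time range of the form used in the metadata 502 could be converted to seconds roughly as shown here. The function name `parse_time_range` and the accepted separators are assumptions; the patent gives only the single example value.

```python
import re


def parse_time_range(text):
    """Convert 'm:ss~m:ss' into (start_seconds, end_seconds)."""
    # Accept an ASCII tilde, a wave dash, or a hyphen between the two times.
    match = re.fullmatch(r"\s*(\d+):(\d{2})\s*[~〜-]\s*(\d+):(\d{2})\s*", text)
    if match is None:
        raise ValueError(f"unrecognized time range: {text!r}")
    m1, s1, m2, s2 = (int(group) for group in match.groups())
    start, end = m1 * 60 + s1, m2 * 60 + s2
    if end <= start:
        raise ValueError("segment must end after it starts")
    return start, end


print(parse_time_range("0:20~3:11"))
```

With start and end offsets in seconds, a playback description can reference just that part of the file, which is what partial playback requires.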

[0018] FIG. 6 is a flowchart explaining how audio data editing and multimedia document creation are actually carried out.

[0019] In S601, the user selects the theme to be edited from the themes provided by the application. In S602, based on the selected theme, the application reads a template group such as that of FIG. 3 from the storage device 102 of FIG. 1. In S603, the keywords are read from one template in the loaded template group and stored in the RAM 106.

[0020] In S604, one piece of audio data stored in the storage device 102 is read, and the metadata attached to it is read and stored in the RAM 106. The keywords of the template held in the RAM 106 are then compared with the metadata of the audio data. In this comparison, the template keywords are also compared with the content of the lyrics.
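
The comparison in S604 might look like the following sketch. The dictionary layout of the metadata, the case-insensitive comparison, and the substring test against the lyrics are assumptions chosen for illustration; the patent does not specify how the comparison is carried out.

```python
def matches(template_keywords, metadata):
    """True if any template keyword equals a metadata keyword
    or occurs somewhere in the lyrics."""
    audio_keywords = {k.lower() for k in metadata.get("keywords", [])}
    lyrics = metadata.get("lyrics", "").lower()
    for keyword in template_keywords:
        kw = keyword.lower()
        if not kw:
            continue  # an empty keyword would match every lyric
        if kw in audio_keywords or kw in lyrics:
            return True
    return False


song = {"keywords": ["drive"], "lyrics": "Driving into the morning sun"}
print(matches(["morning"], song))  # matched through the lyrics
print(matches(["night"], song))
```

Checking the lyrics as well as the keyword list means a song can match a template even when nobody tagged it with that keyword.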

[0021] S605 is the step that evaluates the result of the comparison in S604. If the template keywords match the metadata of the audio data, the process proceeds to S606; if they do not match, it proceeds to S607.

[0022] In S606, a playback description (a reference to the audio data) is written into the template so that the matching audio data can be played. S607 determines whether the operation of S604 has been performed for all audio data in the storage device 102. If it has, the process proceeds to S608; if not, the next piece of audio data is read and the process returns to S604.

[0023] S608 determines whether audio data has been incorporated into every template in the theme. If it has, all the templates are combined into one, the result is output as a multimedia document, and audio data editing ends. If a template remains into which audio data has not yet been incorporated, the process returns to S603 and the next template is read.
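
The loop structure of S603 to S608 can be summarized in a short sketch. The dictionary layouts for templates and audio entries, the sample names, and the exact-keyword match are assumptions for illustration, and the merged document is represented as a plain list rather than as SMIL.

```python
def build_document(theme_templates, library):
    """Fill every template of the theme, then return the merged result.

    theme_templates: list of dicts like {"name": ..., "keywords": [...]}
    library: list of dicts like {"path": ..., "keywords": [...]}
    Both layouts are assumptions made for this sketch.
    """
    document = []
    for template in theme_templates:                 # S603, repeated via S608
        wanted = set(template["keywords"])
        refs = []
        for audio in library:                        # S604, repeated via S607
            if wanted & set(audio["keywords"]):      # S605: keyword match
                refs.append(audio["path"])           # S606: write reference
        document.append({"template": template["name"], "audio": refs})
    return document                                  # S608: combined output


theme = [
    {"name": "departure", "keywords": ["morning"]},
    {"name": "highway", "keywords": ["speed"]},
]
library = [
    {"path": "a.mp3", "keywords": ["morning", "calm"]},
    {"path": "b.mp3", "keywords": ["speed"]},
    {"path": "c.mp3", "keywords": ["rain"]},
]
result = build_document(theme, library)
for entry in result:
    print(entry["template"], entry["audio"])
```

The outer loop corresponds to the return from S608 to S603 and the inner loop to the return from S607 to S604.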

[0024] FIG. 7 is a flowchart explaining how the edited multimedia document and the audio data it references are saved to a recording medium.

[0025] In S701, the recording medium on which the edited multimedia document and audio data are to be saved is selected. The recording medium is, for example, an MD or a memory card. For a memory card, its capacity is also specified.

[0026] In S702, the combined size of the edited multimedia document and audio data is compared with the capacity of the recording medium to determine which is larger. If the multimedia document and audio data are larger, the process proceeds to S703. If the capacity of the recording medium is larger, the process proceeds to S705. S703 determines whether the multimedia document and audio data are to be split across a plurality of recording media. If they are to be split, the process proceeds to S705. If they are to be saved on a single medium without splitting, the process proceeds to S704.

[0027] S704 deletes the audio data, and its playback description, that exceeds the capacity of the recording medium. For example, when a 70 MB set of multimedia document and audio data is to be saved on a 64 MB memory card, the audio data whose size is closest to the 6 MB difference is found, and that audio data and its playback description in the multimedia document are deleted. In S705, the multimedia document and audio data are written to the recording medium.
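
The trimming rule of S704 can be illustrated with the 64 MB and 70 MB figures from the text. The track names and individual sizes are invented for the example, and repeating the step until everything fits is an assumption: the patent describes a single deletion and does not say what happens if the closest track is smaller than the overflow.

```python
def fit_to_media(tracks, capacity_mb, document_mb=0.0):
    """Drop tracks until the document plus audio fit within capacity_mb.

    tracks: list of (path, size_mb) tuples. Returns (kept, removed).
    """
    kept = list(tracks)
    removed = []
    while kept:
        overflow = document_mb + sum(size for _, size in kept) - capacity_mb
        if overflow <= 0:
            break  # S702: everything fits, nothing more to delete
        # S704: pick the track whose size is closest to the overflow.
        victim = min(kept, key=lambda track: abs(track[1] - overflow))
        kept.remove(victim)
        removed.append(victim)
    return kept, removed


# Hypothetical 70 MB selection destined for a 64 MB memory card.
tracks = [("a.mp3", 30), ("b.mp3", 25), ("c.mp3", 9), ("d.mp3", 6)]
kept, removed = fit_to_media(tracks, capacity_mb=64)
print(removed)
```

In a full implementation the playback description of each removed track would also be deleted from the multimedia document, as S704 requires.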

[0028] In this way, the user can edit audio data simply by selecting a theme, and can also save the result to a recording medium such as an MD or memory card. Of course, editing may also be performed by specifying a singer's name instead of a theme such as "drive". In this example the audio data was searched for in the storage device, but it may instead be searched for on the Web. In addition, by using metadata attached to parts of the audio data, as in FIG. 5, it is possible to create a digest that collects only a single phrase from each song.

[0029] (Example 2) This example describes a method in which audio data editing runs automatically when a recording medium is simply inserted into a personal computer or dedicated device.

[0030] FIG. 8 shows a theme for selecting templates (a word such as "drive") stored on a memory card or on a disc medium such as an MD. Of course, the templates themselves may be stored in addition to the theme.

[0031] FIG. 9 is a flowchart explaining how, once a memory card is actually inserted, audio data editing and saving to the recording medium are performed automatically.

[0032] In S901, the memory card is inserted into the input/output device 100 of FIG. 1. In S902, the capacity of the memory card is checked and the theme stored on the memory card is read. In S903, the template group for the theme is read. In S904, as in Example 1, audio data is searched for and playback descriptions are written into the templates. In S905, the created multimedia document and the audio data it references are written to the memory card.

[0033] In this way, simply inserting a memory card into a personal computer or dedicated device saves the multimedia document and the audio data it references to the memory card automatically. A memory card was used in this example, but a disc medium such as an MD may of course be used instead.

[0034] (Example 3) This example describes methods of searching for audio data that use speech recognition or musical tone.

[0035] FIG. 10 is a flowchart explaining an audio data search method that uses speech recognition. Everything other than the search method is exactly the same as in Example 1, so the description is abbreviated.

[0036] In S1001, a theme is selected. In S1002, the template group for the theme is read. In S1003, one piece of audio data in the storage device 102 is read. In S1004, the lyrics in the loaded audio data are recognized by speech recognition and converted to text.

[0037] In S1005, the template keywords are compared with the recognized lyrics. S1006 determines whether the comparison in S1005 found a match. If there is a match, the process proceeds to S1007. If there is no match, the process ends. In S1007, the audio data that matches the template is incorporated into the template and a multimedia document is created.
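
The search of S1004 to S1007 can be sketched with the recognizer passed in as a function. No real speech recognition engine is used here: `recognize` is a hypothetical callable that returns lyrics as text, and the lookup table standing in for it is purely illustrative. Looping over several files is also an assumption, since FIG. 10 describes the flow for one piece of audio data.

```python
from typing import Callable, Iterable, List


def search_by_lyrics(keywords: Iterable[str],
                     audio_paths: Iterable[str],
                     recognize: Callable[[str], str]) -> List[str]:
    """Return the paths whose recognized lyrics contain any keyword."""
    wanted = [k.lower() for k in keywords if k]
    hits = []
    for path in audio_paths:
        lyrics = recognize(path).lower()        # S1004: speech to text
        if any(k in lyrics for k in wanted):    # S1005/S1006: compare
            hits.append(path)                   # S1007: goes into the template
    return hits


# Stand-in recognizer: a lookup table instead of a real speech engine.
FAKE_TRANSCRIPTS = {"a.mp3": "Good morning sunshine", "b.mp3": "Midnight rain"}
hits = search_by_lyrics(["morning"], FAKE_TRANSCRIPTS, FAKE_TRANSCRIPTS.__getitem__)
print(hits)
```

Keeping the recognizer behind a plain function boundary means the matching logic can be exercised without any audio processing.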

[0038] In this way, audio data can be searched for by applying speech recognition to it. It is also possible to search for audio data with a similar musical tone, based on the tone of a given piece of audio data.

[0039]

[Effects of the Invention] As described above, with the present invention the user only has to set a theme or insert a recording medium; audio data editing is then performed automatically and a multimedia document is created. As a result, audio data editing, which used to be tedious, can be done easily, and a multimedia document can be created.

[Brief description of drawings]

FIG. 1 is a block diagram showing the system configuration of the multimedia document creation apparatus in the embodiment.

FIG. 2 is a schematic diagram of a template used for creating a multimedia document in the embodiment.

FIG. 3 is a schematic diagram of a theme (a group of templates) used for creating a multimedia document in the embodiment.

FIG. 4 is a diagram explaining the metadata attached to audio data in Example 1 of the embodiment.

FIG. 5 is a diagram explaining how metadata is attached to divided audio data in Example 1 of the embodiment.

FIG. 6 is a flowchart explaining audio data editing in Example 1 of the embodiment.

FIG. 7 is a flowchart explaining saving to a recording medium in Example 1 of the embodiment.

FIG. 8 is a diagram showing a theme stored on a recording medium in Example 2 of the embodiment.

FIG. 9 is a flowchart explaining audio data editing in Example 2 of the embodiment.

FIG. 10 is a flowchart explaining audio data editing in Example 3 of the embodiment.

[Explanation of symbols]

100 data input/output device
101 input device
102 storage device
103 display device
104 CPU
105 ROM
106 RAM

Continuation of front page
(51) Int.Cl.7: G10L 15/00; FI: G10L 3/00 551G
(51) Int.Cl.7: G11B 27/034; FI: G11B 27/02 K
F-terms (reference): 5B058 CA23 KA04 KA08 KA11 YA16 5B075 ND14 ND16 ND36 NK02 NK04 PP13 PQ05 UU36 UU37 5B082 GA08 5D015 KK02 5D110 AA08 AA19 AA27 BB01 BB20 CA06 CA07 CB04 CB06 CC04 CJ06 DA02 DB05 DC05 DE01 EA12

Claims

[Claims]

1. A multimedia document creation method for supporting the creation of a multimedia document to be played back by a music player or the like, characterized by: using templates, which are models for creating a multimedia document and in which document descriptions such as song order and playback time can be written, and a theme, which is a set of templates; each template becoming a multimedia document when a reference to audio data is described in it; one or more keywords being set for each template in the theme; the system using the keywords to search for audio data corresponding to the theme; and creating a multimedia document by describing, in each template, a reference to the audio data obtained as a search result.

2. The multimedia document creating method according to claim 1, wherein a plurality of themes are prepared by the system, and the user can select an arbitrary theme from the plurality of themes.

3. The multimedia document creating method according to claim 1, wherein the template is described in a data description language or in an incomplete data description language.

4. The multimedia document creating method according to claim 3, wherein the data description language is described in XML.

5. The multimedia document creation method according to claim 3, wherein the data description language is described in SMIL.

6. The multimedia document creating method according to claim 3, wherein the data description language is described in XHTML.

7. The multimedia document creating method according to claim 3, wherein the data description language is described in HTML.

8. The multimedia document creation method according to claim 3, wherein the data description language is described in SGML.

9. The multimedia document creation method according to claim 1, wherein one or more keywords to be used when searching for audio data are set in the template.

10. The described multimedia document creation method, wherein the template is stored in advance in a recording medium.

11. The multimedia document creating method according to claim 1, wherein the theme is stored in advance in a recording medium.

12. The multimedia document creating method according to claim 1, wherein all the steps are performed automatically after the recording medium is inserted, by using a template and a theme stored in the recording medium.

13. The multimedia document creation method according to claim 1, wherein metadata is used for the audio data search in the audio data search step.

14. The multimedia document creation method according to claim 13, wherein the metadata is described in a data description language.

15. The multimedia document creating method according to claim 14, wherein the data description language is described in XML.

16. The multimedia document creation method according to claim 14, wherein the data description language is described in HTML.

17. The multimedia document creating method according to claim 14, wherein the data description language is described in SGML.

18. The described multimedia document creation method, wherein the metadata is added to the end of the audio data file.

19. The multimedia document creation method according to claim 13, wherein the metadata is stored in a file separate from the audio data.

20. The multimedia document creation method according to claim 13, wherein the metadata is described in a header portion of audio data.

21. The multimedia document creating method according to claim 13, wherein the metadata is also added to divided data obtained by dividing an audio data file by a time unit.

22. The multimedia document creating method according to claim 1, wherein speech recognition is used for the audio data search in the audio data search step.

23. The multimedia document creating method according to claim 1, wherein the reproduction audio data in the multimedia document output step is audio data.

24. The multimedia document creating method according to claim 1, wherein the reproduction audio data in the multimedia document output step is described in a data description language.

25. The multimedia document creation method according to claim 23, wherein the data description language is described in XML.

26. The multimedia document creating method according to claim 23, wherein the data description language is described in SMIL.

27. The multimedia document creation method according to claim 23, wherein the data description language is described in XHTML.

28. The multimedia document creating method according to claim 23, wherein the data description language is described in HTML.

29. The multimedia document creation method according to claim 23, wherein the data description language is described in SGML.

30. An apparatus using the method for creating a multimedia document according to any one of claims 1 to 29.

31. A storage medium using the multimedia document creation method according to claim 1.