JP3704968B2

JP3704968B2 - Multimedia editing device

Info

Publication number: JP3704968B2
Application number: JP26335598A
Authority: JP
Inventors: 英清立花
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1998-09-17
Filing date: 1998-09-17
Publication date: 2005-10-12
Anticipated expiration: 2018-09-17
Also published as: JP2000099520A

Description

【０００１】
【発明の属する技術分野】
本発明は、マルチメディアを用いたプレゼンテーションの編集、加工を支援するマルチメディア編集装置に関する。
【０００２】
【従来の技術】
従来、マルチメディアを用いたプレゼンテーションを編集、加工する際には、すでに作成したプレゼンテーションを時系列に再生して、編集、加工の必要のある部分を検出し、その部分と同時に並行して再生されているマルチメディア情報が静止画であるか動画であるかなどの属性を考慮しながら編集を行っていた。
マルチメディアに関連する発明として、特開平６−１７６５４２号公報には、台本と音声をテキストに変換したものとの対応付けを行い、台本と映像とを関係づけることにより所定の映像部分を検索する方法が記載されている。
【０００３】
また、特開平５−６７１８４号公報には、セグメントで管理されているマルチメディアを削除する際に、あらかじめユーザがメディア同士の関連付けおよび属性（時刻シフト可否、切断可否、圧縮可否）の設定を行った情報を参照して、削除するマルチメディアに関連するメディアの属性を調べ、その属性に応じた編集処理を行う装置が記載されている。さらに、特開平７−１８２３６５号公報には、音声認識により会議中の会話からキーワードを抽出ておき、重要なシーンをキーワードで検索して会議録を作成する方法が記載されている。
【０００４】
【発明が解決しようとする課題】
マルチメディアを用いたプレゼンテーションを編集、加工するために編集個所を特定する場合には、作成したプレゼンテーションを時系列に再生して、編集、加工の必要のある部分を検出し、その部分の開始点と終了点とをマルチメディアデータの巻き戻し、早送り等を繰り返しながら特定するために、非常に長い時間を要してしまうという問題が生じる。また、編集部分を特定した後の編集作業も編集部分における複数のマルチメディア情報の属性を考慮しながら編集者が編集する必要があるために、編集者に非常に大きな負担をかけるという問題が生じる。
【０００５】
これに対して、特開平６−１７６５４２号公報に記載された方法によると、所定の映像部分を容易に検索できる。しかしながら、この公報には、映像を検索した後の編集処理に関してはまったく記述がなく、編集者の手間を軽減することができないという問題が生じる。また、検索できる映像部分は、台本と対応付けられている部分だけなので、台本に記載されていない発話に関する部分については検索することができないという問題が生じる。
【０００６】
また、特開平５−６７１８４号公報に記載された装置によると、編集段階において編集者に手間はかからないが、この装置により編集を行うためには、編集の前に、ユーザが映像などのメディアを再生しながら他のメディアと関連付けし、さらにメディアの属性を設定するということを行わなければならず現実的ではない。また、特開平７−１８２３６５号公報に記載された方法によると、キーワードによってシーンの検索を容易に行うことができるが、その後の編集には何ら寄与しないので、編集者の手間を軽減することができない。また、キーワードとなった部分以外については、検索することができないという問題が生じる。
【０００７】
本発明の目的は、プレゼンテーションの編集を短時間に、容易に、且つ適切に行うことのできるマルチメディア編集装置を提供することにある。
【０００８】
【課題を解決するための手段】
上記目的は、映像や音声のメディア情報を複数保持する情報保持手段と、プレゼンテーションで提示するメディア情報、メディア情報の提示位置、および提示時間を記述したシナリオファイルを保持するシナリオ保持手段と、シナリオファイルに記述された所定の音声のメディア情報から文字列を認識して、当該文字列のテキストデータを作成する音声認識手段と、文字列が発せられる時間領域を検出する時間領域検出手段と、テキストデータと、時間領域とを対応付けて保持するマルチメディア関連情報保持手段と、テキストデータを表示する表示手段と、編集範囲を特定する前記テキストデータを指示する指示手段と、時間領域において再生される所定の編集評価対象となるメディア情報について時間変化に関する属性情報を判定するマルチメディア属性判定手段と、指示されたテキストデータに対応する時間領域について、当該時間領域に再生される編集評価対象となるメディア情報についての属性情報に基づいてシナリオファイルを編集するシナリオ編集手段とを有することを特徴とするマルチメディア編集装置によって達成される。
【０００９】
【発明の実施の形態】
本発明の一実施の形態によるマルチメディア編集装置について説明する前に、マルチメディアを使用したプレゼンテーションについて簡単に説明する。図１は、マルチメディアを使用したプレゼンテーションの例を示している。
このプレゼンテーションにおいては、ディスプレイ画面１に、プレゼンテーションを行っている人（プレゼンター）の様子を表わしている動画像２や、プレゼンテーションにおいて説明される商品の動画４や、商品に関するグラフの静止画３が表示され、これ共に、スピーカ５からプレゼンターの説明の音声６が出力される。このように、マルチメディアを使用したプレゼンテーションによると、あたかも実際の会議室などでプレゼンテーションを聞いているかのような効果が得られる。
【００１０】
このようなマルチメディアを使用したプレゼンテーションにおいては、再生される映像２、映像３、映像４や音声６等はそれぞれマルチメディアデータ（メディア情報）として記憶され、これらメディア情報は、再生するメディア情報、メディア情報の再生位置、再生するタイミング、再生時の効果などが記述されたシナリオファイルに基づいて再生される。
シナリオファイルはプレゼンターにより予め作成され、たとえば、ネットワークなどを介した通信手段により各視聴者の情報処理装置に配布される。
【００１１】
本発明に係るマルチメディア編集装置は、上記のようなシナリオファイルを編集するための装置である。以下、本発明の一実施の形態によるマルチメディア編集装置を図２乃至図６を用いて説明する。まず、本発明の一実施の形態によるマルチメディア編集装置の概略の構成を図２を用いて説明する。
【００１２】
図２において、マルチメディア情報蓄積部１３は、映像、音声等の複数のメディア情報を蓄積する。シナリオ保持部１１は、プレゼンテーションに用いるメディア情報、そのメディア情報の再生時の位置、再生のタイミング（時刻、時間等）、再生時の効果等が記述されたシナリオファイル１１ａを保持する。
シナリオ解析部６は、シナリオファイル１１ａに記述された内容を解釈して、シナリオファイル１１ａに記述されたメディア情報をマルチメディア情報蓄積部１３から取り出して、記述されたタイミングおよび記述された処理方法等にしたがって、メディア情報が所定の音声のメディア情報であれば音声認識部１５に出力し、メディア情報が前記音声以外の編集評価対象のメディア情報であれば、マルチメディア属性判定部１４に出力する。ここで、所定の音声のメディア情報とは、後述するように編集範囲を特定するための基準となるテキストデータを抽出する音声のメディア情報であり、シナリオファイルに記述されている任意の音声のメディア情報であってもよく、ユーザにより指定された音声のメディア情報であってもよい。
【００１３】
音声認識部１５は、マルチメディア情報蓄積部１３から入力された音声のメディア情報中の音声データから文字列のテキストデータを作成し、当該テキストデータをマルチメディア関連情報保持部１６に出力する。また、音声認識部１５は、作成されたテキストデータに対応する音声データが再生される時間領域を検出して、マルチメディア関連情報保持部１６に出力する。本実施の形態では、音声認識部１５は、音声データの音節の区切り毎、すなわち、連続した有音部分毎にテキストデータを作成しており、各テキストデータを単位としてマルチメディア関連情報保持部１６に出力する。
【００１４】
マルチメディア属性判定部１４は、マルチメディア情報蓄積部１３から入力されたメディア情報に基づいて、音声認識部１５によって作成されたテキストデータに対応する音声データと同時に再生されるメディア情報の中の編集評価対象のメディア情報の属性情報を判定して、マルチメディア関連情報保持部１６に出力する。
ここで、編集評価対象のメディア情報とは、後述するように属性情報に応じて編集を考慮すべきメディア情報のことをいい、前記音声のメディア情報以外のメディア情報であってもよく、ユーザによって直接的に指定されたメディア情報であってもよく、あるいは、特定のメディア情報以外のメディア情報といったようにユーザに間接的に指定されたメディア情報であってもよい。
【００１５】
また、属性情報とは、メディア情報が時間と共に変化するものであるか否かに関する情報であり、たとえば、映像のメディア情報であれば、静止画像であるか、あるいは動画像であるかといった情報である。属性情報を判定する方法としては、たとえば、シナリオファイル１１ａ中にマルチメディア情報の属性情報が記述されている場合には、当該シナリオファイル１１ａの記述により判定する方法や、映像のメディア情報の場合には、一定時間毎に映像を取り出して、前出の映像との差分比較を行うことによって静止画であるか動画であるかを判定する方法等がある。
【００１６】
マルチメディア関連情報保持部１６は、音声認識部１５から入力されたテキストデータと、それに対応する音声データが再生される時間範囲と、マルチメディア属性判定部１４から入力された当該音声データと同時に再生される編集評価対象のメディア情報に関する属性情報と、を対応付けたマルチメディア関連情報を保持する。
表示・指示部１１は、マルチメディア関連情報保持部１６に保持されたテキストデータをディスプレイ装置１７ａに表示して、ペン型ポインタ１７ｂあるいはマウス装置１７ｃ等によりユーザから編集範囲としてテキストデータに対する選択指示を受け付け、シナリオ編集部１８に当該指示されたテキストデータに対応する編集範囲の編集処理を指示する。シナリオ編集部１２は、表示・指示部１１によって指示されたテキストデータに対応するマルチメディア関連情報保持部１６のマルチメディア関連情報に基づいてシナリオファイル１１ａを編集して新しいシナリオファイル１９を作成する。
【００１７】
ここで、特許請求の範囲にいう情報保持手段は、マルチメディア情報蓄積部１３によって構成され、シナリオ保持手段は、シナリオ保持部１１によって構成され、音声認識手段および時間領域検出手段は、音声認識部１５によって構成され、マルチメディア関連情報保持手段は、マルチメディア関連情報保持部１６によって構成され、表示手段および指示手段は、表示・指示部１７によって構成され、マルチメディア属性判定手段は、マルチメディア属性判定部１４によって構成され、シナリオ編集手段は、シナリオ編集部１８によって構成される。
【００１８】
次に、本マルチメディア編集装置によるマルチメディア関連情報を作成する処理動作を図３を用いて説明する。
処理が開始される（ステップＳ１０１）と、シナリオ解析部１２はシナリオ保持部１１からシナリオファイル１１ａを取り出して解析し、複数のメディア情報の再生時の位置、再生タイミング、表示効果などの情報を取り出す（ステップＳ１０２）。次いで、シナリオ解析部１２は取り出した情報に基づいて、所定の音声のメディア情報を音声認識部１５へ出力し、他のメディア情報のうち編集評価対象のメディア情報をマルチメディア属性判定部１４へ出力する（ステップＳ１０３）。
【００１９】
この後、音声認識部１５は音声のメディア情報が最後まで入力されたか否かを判定し（ステップＳ１０４）、最後まで入力されていないときには、現在入力されているメディア情報の部分の音節の開始点の時刻情報：ｔ１を獲得し（ステップＳ１０５）、当該部分が有音部であるか、あるいは無音部であるかを評価する（ステップＳ１０６）。
現在入力されている部分が有音部であれば、まだ次の音節に達していないと判断して、入力されている部分が無音部になるまで上記処理ステップＳ１０３〜Ｓ１０６を繰り返し行う。
【００２０】
一方、現在入力されている部分が無音部であれば、音節に達したと判断して、音声認識部１５は当該部分の終了点の時刻情報：ｔ２を獲得し（ステップＳ１０７）、当該音節によって区切られた有音部分の音声データに関して文字列を認識して、当該文字列のテキストデータを作成し、当該テキストデータと時刻情報ｔ１およびｔ２（時間領域）をマルチメディア関連情報保持部１６へ出力する（ステップＳ１０８）。次いで、マルチメディア属性判定部１４は、時刻ｔ１およびｔ２の間に再生される編集評価対象のメディア情報の属性の判定を行い、判定した結果の属性情報をマルチメディア関連情報保持部１６へ出力する（ステップＳ１０９）。例えば、編集評価対象のメディア情報のすべてが時間変化しないメディア情報（たとえば、静止画）であると判定した場合には、属性情報”０”を出力し、編集評価対象のメディア情報の少なくとも一つが時間変化するメディア情報（たとえば動画像）であると判定した場合には、属性情報”１”を出力する。
【００２１】
次いで、マルチメディア関連情報保持部１６は、音声認識部１５から入力されたテキストデータと、時刻情報ｔ１およびｔ２と、マルチメディア属性判定部１４から入力されたメディア情報の属性情報とを対応付けてマルチメディア関連情報として保持する（ステップＳ１１０）。そして、上記した処理（ステップＳ１０３〜Ｓ１１０）を繰り返し行い、ステップＳ１０４において音声のメディア情報が最後まで入力されていると判断された場合に処理を終了する（ステップＳ１１１）。
【００２２】
図４は、上記のマルチメディア関連情報を作成する処理動作を具体的に説明する図である。同図は、２つの属性評価対象（編集評価対象）となる映像データａおよび映像データｂと、属性評価対象（編集評価対象）とならないプレゼンターの映像が入っている映像データｃおよび編集の基準とするプレゼンターの音声データとから構成されているプレゼンテーションについて処理を行った例を示している。
【００２３】
上記の動作によると、領域１、領域２、領域３・・のように、音声データの有音部Ｐに対応するプレゼンテーションの時間領域が検出され、この領域における音声のテキストデータが作成される。また、これらの時間領域のうち編集評価対象となる映像データａおよび映像データｂが静止画の領域２および領域４は、削除することが可能な領域であることを表わす属性値”０”と判定され、その他の領域１、３、５は削除することができない領域を表わす属性値”１”と判定される。そして、これらの結果は図５に示すように、テキストデータと、テキストデータに対応する時間領域を示す開始時間および終了時間と、その時間領域における編集評価対象のメディア情報の属性情報とが対応付けられてマルチメディア関連情報保持部１６に保持される。
【００２４】
次に、本マルチメディア編集装置によるマルチメディアプレゼンテーションのシナリオファイルを編集する処理動作を図６を用いて説明する。
処理が開始される（ステップＳ２０１）と、表示・指示部１７は、マルチメディア関連情報保持部１６からテキストデータを取り出してディスプレイ１７ａに表示し（ステップＳ２０２、Ｓ２０３）、ペン型ポインタ１７ｂ、マウス１７ｃ等によるユーザから編集処理を終了する指示イベントや、編集する範囲の基準となるテキストデータの選択指示イベント（ステップＳ２０５）を受け付ける（ステップＳ２０４）。
編集を行わない指示イベントが選択された場合には、処理を終了する（ステップＳ２０６）。一方、テキストデータの選択指示イベントが選択された場合には、
シナリオ編集部１８がマルチメディア関連情報保持部１６に保持されたマルチメディア関連情報を参照し、選択されたテキストデータに対応付けられた属性情報をチェックする（ステップＳ２０７）。
【００２５】
属性情報が時間変化しないメディア情報（たとえば、静止画）を表している（本実施の形態では”０”）場合には、これらメディア情報は全て時間変化しないメディア情報であり、これらメディア情報を再生しなくてもプレゼンテーションの自然さは保たれるので、シナリオ編集部１８は、対応する時間領域におけるすべてメディア情報を再生しないようにシナリオファイルを編集する（ステップＳ２０８）。一方、属性情報が時間変化するメディア情報（たとえば、動画）を表している（本実施の形態では”１”）場合には、メディア情報を再生しないと前後のつながりが悪くなりプレゼンテーションが不自然になるので、シナリオ編集部１８は、テキストデータに対応する音声データのみについて再生しないようにシナリオファイルを編集する（ステップＳ２０９）。そして、上記の処理ステップＳ２０２〜Ｓ２０９をユーザにより終了のイベントが選択されるまで実行する。
【００２６】
このように、ユーザはマルチメディアプレゼンテーションを再生することなく、編集したい個所を容易且つ短時間に特定でき、編集したい個所におけるメディア情報を考慮しなくてもプレゼンテーションに支障を与えることがないようにシナリオファイルを編集することができる。
【００２７】
本発明は、上記実施の形態に限らず種々の変形が可能である。
例えば、上記実施の形態では、ユーザにより編集範囲の基準となるテキストデータが指定される前に、予めテキストデータに対応する時間領域の編集評価対象のメディア情報の属性情報を判定するようにして、ユーザによりテキストデータが選択された場合に短時間でシナリオファイルを編集できるようにしていたが、本発明はこれらに限られず、ユーザによりテキストデータが指定された後に、対応する時間領域のメディア情報の属性情報を判定して、シナリオファイルを編集するようにしてもよい。
【００２８】
また、上記実施の形態では、編集評価対象の複数のメディア情報をまとめた属性情報を検出し、当該属性情報に基づいて複数のメディアに関してシナリオファイルを編集するようにしていたが、本発明はこれに限られず、編集評価対象の各メディア情報ごとに属性情報を検出し、各属性情報に基づいて各メディア情報に関してシナリオファイルを編集するようにしてもよい。
【００２９】
【発明の効果】
以上の通り、本発明によれば、プレゼンテーションのシナリオファイルの編集を短時間に、容易に、且つ適切に行うことができる。
【００３０】
【図面の簡単な説明】
【図１】マルチメディアを使用したプレゼンテーションを説明する図である。
【図２】本発明の一実施の形態によるマルチメディア編集装置の構成を示す図である。
【図３】本発明の一実施の形態によるマルチメディア編集装置のマルチメディア関連情報を作成する処理動作のフローチャートである。
【図４】本発明の一実施の形態によるマルチメディア編集装置のマルチメディア関連情報を作成する過程を説明する図である。
【図５】本発明の一実施の形態によるマルチメディア編集装置のマルチメディア関連情報の一例を示す図である。
【図６】本発明の一実施の形態によるマルチメディア編集装置のマルチメディアプレゼンテーションのシナリオファイルを編集する処理動作のフローチャートである。
【符号の説明】
１１シナリオ保持部
１２シナリオ解析部
１３マルチメディア情報蓄積部
１４マルチメディア属性判定部
１５音声認識部
１６マルチメディア関連情報保持部
１７表示・入力部
１８シナリオ編集部

[0001]
[Technical field of the invention]
The present invention relates to a multimedia editing apparatus that supports editing and processing of a presentation using multimedia.
[0002]
[Prior art]
Conventionally, when a presentation using multimedia is edited or processed, the already-created presentation is played back in chronological order to find the part that needs editing, and the editing is performed while taking into account attributes of the multimedia information played back in parallel with that part, such as whether it is a still image or a moving image.
As an invention related to multimedia, Japanese Patent Laid-Open No. 6-176542 describes a method in which a script is associated with speech that has been converted into text, and a predetermined video portion is searched for by relating the script to the video.
[0003]
Japanese Patent Laid-Open No. 5-67184 describes an apparatus that, when multimedia managed in segments is deleted, refers to information in which the user has set in advance the associations between media and their attributes (whether time shifting, cutting, and compression are permitted), checks the attributes of the media related to the multimedia to be deleted, and performs editing according to those attributes. Furthermore, Japanese Patent Laid-Open No. 7-182365 describes a method of extracting keywords from conversation during a conference by voice recognition and searching for important scenes by keyword to create conference minutes.
[0004]
[Problems to be solved by the invention]
When the location to be edited is identified in order to edit or process a presentation using multimedia, the created presentation is played back in chronological order to find the part that needs editing, and the start and end points of that part are identified by repeatedly rewinding and fast-forwarding the multimedia data. This takes a very long time. In addition, even after the part to be edited has been identified, the editor must edit while taking into account the attributes of the multiple pieces of multimedia information in that part, which places a very heavy burden on the editor.
[0005]
In contrast, the method described in Japanese Patent Laid-Open No. 6-176542 makes it easy to search for a predetermined video portion. However, that publication says nothing about the editing process after the video has been found, so the editor's workload cannot be reduced. Furthermore, the only video portions that can be searched are those associated with the script, so portions relating to utterances not written in the script cannot be searched.
[0006]
According to the apparatus described in Japanese Patent Laid-Open No. 5-67184, the editor is spared effort at the editing stage. However, before editing with this apparatus, the user must associate each medium, such as video, with other media while playing it back, and must also set the media attributes, which is impractical. According to the method described in Japanese Patent Laid-Open No. 7-182365, scenes can easily be searched by keyword, but the method contributes nothing to the subsequent editing, so the editor's workload cannot be reduced. In addition, portions other than those that became keywords cannot be searched.
[0007]
An object of the present invention is to provide a multimedia editing apparatus that can easily and appropriately edit a presentation in a short time.
[0008]
[Means for Solving the Problems]
The above object is achieved by a multimedia editing apparatus comprising: information holding means for holding a plurality of pieces of video and audio media information; scenario holding means for holding a scenario file describing the media information to be presented in a presentation, the presentation position of the media information, and the presentation time; voice recognition means for recognizing a character string from predetermined audio media information described in the scenario file and creating text data of the character string; time domain detection means for detecting the time region in which the character string is uttered; multimedia related information holding means for holding the text data and the time region in association with each other; display means for displaying the text data; instruction means for designating the text data that specifies an editing range; multimedia attribute determination means for determining attribute information regarding change over time for predetermined media information subject to editing evaluation that is reproduced in the time region; and scenario editing means for editing the scenario file, for the time region corresponding to the designated text data, based on the attribute information of the media information subject to editing evaluation that is reproduced in that time region.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
Before describing a multimedia editing apparatus according to an embodiment of the present invention, a presentation using multimedia will be briefly described. FIG. 1 shows an example of a presentation using multimedia.
In this presentation, a display screen 1 displays a moving image 2 showing the person giving the presentation (the presenter), a moving image 4 of a product explained in the presentation, and a still image 3 of a graph relating to the product; together with these, the presenter's spoken explanation 6 is output from a speaker 5. A presentation using multimedia in this way gives the effect of listening to the presentation as if in an actual conference room.
[0010]
In such a presentation using multimedia, the reproduced video 2, video 3, video 4, audio 6, and so on are each stored as multimedia data (media information), and this media information is reproduced based on a scenario file that describes the media information to be reproduced, its reproduction position, the reproduction timing, the effects applied at reproduction, and the like.
The scenario file is created in advance by a presenter, and is distributed to each viewer's information processing apparatus by communication means via a network, for example.
[0011]
The multimedia editing apparatus according to the present invention is an apparatus for editing a scenario file such as the one described above. A multimedia editing apparatus according to an embodiment of the present invention is described below with reference to FIGS. 2 to 6. First, its schematic configuration is described with reference to FIG. 2.
[0012]
In FIG. 2, a multimedia information storage unit 13 stores a plurality of pieces of media information such as video and audio. The scenario holding unit 11 holds a scenario file 11a that describes the media information used for the presentation, the position of that media information at reproduction, the reproduction timing (time point, duration, etc.), the effects applied at reproduction, and the like.
The scenario analysis unit 12 interprets the contents of the scenario file 11a, retrieves the media information described in the scenario file 11a from the multimedia information storage unit 13, and, in accordance with the described timing, processing method, and the like, outputs the media information to the voice recognition unit 15 if it is the predetermined audio media information, or to the multimedia attribute determination unit 14 if it is media information subject to editing evaluation other than that audio. Here, the predetermined audio media information is the audio media information from which the text data serving as the reference for specifying the editing range is extracted, as described later. It may be any audio media information described in the scenario file, or audio media information designated by the user.
[0013]
The voice recognition unit 15 creates text data of a character string from the voice data in the audio media information input from the multimedia information storage unit 13, and outputs the text data to the multimedia related information holding unit 16. The voice recognition unit 15 also detects the time region in which the voice data corresponding to the created text data is reproduced, and outputs that time region to the multimedia related information holding unit 16. In the present embodiment, the voice recognition unit 15 creates text data for each syllable break in the voice data, that is, for each continuous voiced portion, and outputs the result to the multimedia related information holding unit 16 in units of one item of text data.
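The detection of continuous voiced portions and their time regions can be illustrated with a minimal Python sketch. It is not the patented implementation: the per-frame energy input, the fixed frame length `frame_sec`, the `silence_threshold` parameter, and the function name `voiced_regions` are all assumptions made for illustration, and the character recognition step itself is omitted.

```python
from typing import List, Sequence, Tuple


def voiced_regions(
    energies: Sequence[float],
    frame_sec: float,
    silence_threshold: float,
) -> List[Tuple[float, float]]:
    """Return (t1, t2) pairs, one per continuous voiced portion.

    Assumption: a frame is "voiced" when its energy exceeds
    silence_threshold; the source does not say how voiced and
    silent portions are told apart.
    """
    regions: List[Tuple[float, float]] = []
    start_index = None  # index of the frame where the current voiced run began
    for index, energy in enumerate(energies):
        voiced = energy > silence_threshold
        if voiced and start_index is None:
            start_index = index  # start point, corresponds to t1
        elif not voiced and start_index is not None:
            # a silent frame closes the run; its start time corresponds to t2
            regions.append((start_index * frame_sec, index * frame_sec))
            start_index = None
    if start_index is not None:
        # the audio ended while still voiced: close the last region at the end
        regions.append((start_index * frame_sec, len(energies) * frame_sec))
    return regions
```

Each returned pair plays the role of one time region; in the apparatus described here, the recognized text for that region would be stored alongside it.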
[0014]
Based on the media information input from the multimedia information storage unit 13, the multimedia attribute determination unit 14 determines the attribute information of the media information subject to editing evaluation among the media information reproduced simultaneously with the audio data corresponding to the text data created by the voice recognition unit 15, and outputs the attribute information to the multimedia related information holding unit 16.
Here, the media information subject to editing evaluation is media information whose editing should be considered according to its attribute information, as described later. It may be the media information other than the audio media information mentioned above, media information directly designated by the user, or media information indirectly designated by the user, for example as "media information other than certain specific media information".
[0015]
The attribute information is information on whether or not the media information changes with time. For video media information, for example, it indicates whether the image is a still image or a moving image. Methods of determining the attribute information include, for example, determining it from the description in the scenario file 11a when the scenario file 11a describes the attribute information of the multimedia information, and, for video media information, extracting the video at regular intervals and comparing each extracted image with the previous one to determine whether the video is a still image or a moving image.
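The second method, sampling the video at regular intervals and comparing each sample with the previous one, might look like the following Python sketch. The representation of a frame as a flat sequence of pixel intensities, the mean absolute difference measure, the `threshold` parameter, and the name `is_time_varying` are assumptions for illustration; the source does not specify how the difference comparison is computed.

```python
from typing import Sequence

# Hypothetical frame representation: a flat sequence of pixel intensities.
Frame = Sequence[int]


def is_time_varying(frames: Sequence[Frame], threshold: float = 0.0) -> bool:
    """Return True if any sampled frame differs from the frame before it.

    frames holds the images extracted at regular intervals. A mean absolute
    pixel difference above threshold counts as a change (an assumed measure).
    """
    for previous, current in zip(frames, frames[1:]):
        if len(previous) != len(current):
            return True  # a change in frame size is treated as a change
        total = sum(abs(a - b) for a, b in zip(previous, current))
        if total / max(len(current), 1) > threshold:
            return True
    return False
```

A result of `False` corresponds to a still image and `True` to a moving image. A nonzero `threshold` would tolerate small differences such as compression noise, which is a design choice outside the source text.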
[0016]
The multimedia related information holding unit 16 holds multimedia related information in which the text data input from the voice recognition unit 15, the time range in which the corresponding audio data is reproduced, and the attribute information, input from the multimedia attribute determination unit 14, of the media information subject to editing evaluation that is reproduced simultaneously with that audio data are associated with one another.
The display / instruction unit 17 displays the text data held in the multimedia related information holding unit 16 on the display device 17a, accepts from the user, via the pen-type pointer 17b, the mouse device 17c, or the like, a selection instruction on the text data as the editing range, and instructs the scenario editing unit 18 to edit the editing range corresponding to the designated text data. The scenario editing unit 18 edits the scenario file 11a based on the multimedia related information in the multimedia related information holding unit 16 that corresponds to the text data designated through the display / instruction unit 17, and creates a new scenario file 19.
[0017]
Here, the information holding means recited in the claims is constituted by the multimedia information storage unit 13; the scenario holding means by the scenario holding unit 11; the voice recognition means and the time domain detection means by the voice recognition unit 15; the multimedia related information holding means by the multimedia related information holding unit 16; the display means and the instruction means by the display / instruction unit 17; the multimedia attribute determination means by the multimedia attribute determination unit 14; and the scenario editing means by the scenario editing unit 18.
[0018]
Next, the processing operation by which the multimedia editing apparatus creates the multimedia related information will be described with reference to FIG. 3.
When the processing is started (step S101), the scenario analysis unit 12 extracts and analyzes the scenario file 11a from the scenario holding unit 11, and extracts information such as the position, playback timing, and display effect at the time of playback of a plurality of media information. (Step S102). Next, the scenario analysis unit 12 outputs predetermined audio media information to the audio recognition unit 15 based on the extracted information, and outputs media information to be edited and evaluated among other media information to the multimedia attribute determination unit 14. (Step S103).
[0019]
Thereafter, the voice recognition unit 15 determines whether the audio media information has been input to the end (step S104). If it has not, the unit acquires time information t1 for the start point of the syllable in the currently input portion of the media information (step S105) and evaluates whether that portion is a voiced portion or a silent portion (step S106).
If the currently input part is a sound part, it is determined that the next syllable has not yet been reached, and the above processing steps S103 to S106 are repeated until the input part becomes a silent part.
[0020]
On the other hand, if the currently input portion is a silent portion, it is determined that a syllable break has been reached, and the voice recognition unit 15 acquires time information t2 for the end point of the portion (step S107), recognizes a character string from the voice data of the voiced portion delimited by the syllable break, creates text data of the character string, and outputs the text data together with the time information t1 and t2 (the time region) to the multimedia related information holding unit 16 (step S108). Next, the multimedia attribute determination unit 14 determines the attribute of the media information subject to editing evaluation that is reproduced between times t1 and t2, and outputs the resulting attribute information to the multimedia related information holding unit 16 (step S109). For example, when all of the media information subject to editing evaluation is determined to be media information that does not change over time (for example, still images), attribute information “0” is output; when at least one piece of the media information subject to editing evaluation is determined to be media information that changes over time (for example, a moving image), attribute information “1” is output.
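The "0" or "1" decision of step S109 can be sketched as follows. This is an illustrative Python fragment, not the patented implementation: the `MediaItem` record, its fields, and the simplification that each item is either still or moving for its whole reproduction interval are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MediaItem:
    """Hypothetical record for one piece of media subject to editing evaluation."""
    name: str
    start: float         # time at which reproduction starts
    end: float           # time at which reproduction ends
    time_varying: bool   # True for a moving image, False for a still image


def region_attribute(items: Iterable[MediaItem], t1: float, t2: float) -> int:
    """Return 0 if every item reproduced between t1 and t2 is unchanging, else 1."""
    overlapping = [m for m in items if m.start < t2 and m.end > t1]
    return 1 if any(m.time_varying for m in overlapping) else 0
```

A region in which no evaluated media is reproduced yields 0 under this sketch, since nothing time-varying overlaps it; the source does not discuss that case.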
[0021]
Next, the multimedia related information holding unit 16 associates the text data input from the voice recognition unit 15, the time information t1 and t2, and the attribute information of the media information input from the multimedia attribute determination unit 14, and holds them as multimedia related information (step S110). The above-described processing (steps S103 to S110) is then repeated, and when it is determined in step S104 that the audio media information has been input to the end, the processing ends (step S111).
[0022]
FIG. 4 is a diagram that specifically explains the processing operation for creating the multimedia related information. The figure shows an example in which the processing is applied to a presentation composed of video data a and video data b, which are the two attribute evaluation targets (editing evaluation targets); video data c, which contains the video of the presenter and is not an attribute evaluation target (editing evaluation target); and the presenter's audio data, which serves as the reference for editing.
[0023]
According to the above operation, the time regions of the presentation that correspond to the voiced portions P of the audio data are detected, as in regions 1, 2, 3, and so on, and text data of the audio in each region is created. Among these time regions, regions 2 and 4, in which the video data a and video data b subject to editing evaluation are still images, are determined to have the attribute value “0”, indicating regions that can be deleted, and the other regions 1, 3, and 5 are determined to have the attribute value “1”, indicating regions that cannot be deleted. As shown in FIG. 5, these results are held in the multimedia related information holding unit 16 with the text data, the start time and end time indicating the time region corresponding to the text data, and the attribute information of the media information subject to editing evaluation in that time region associated with one another.
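One way to picture the table of FIG. 5 is as a list of records, one per time region. The Python sketch below is only an illustration: the `RelatedInfo` type, the placeholder texts, and the start and end times are invented for the example, and only the attribute pattern (regions 2 and 4 are "0", regions 1, 3, and 5 are "1") follows the example in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RelatedInfo:
    """Hypothetical row of the multimedia related information."""
    text: str       # text data recognized for one voiced portion
    start: float    # start time of the time region (t1)
    end: float      # end time of the time region (t2)
    attribute: int  # 0: region can be deleted, 1: region cannot be deleted


def deletable_regions(table: List[RelatedInfo]) -> List[int]:
    """Return the 1-based numbers of the regions whose attribute is 0."""
    return [number for number, row in enumerate(table, start=1) if row.attribute == 0]


# Placeholder texts and times; only the attribute pattern mirrors the example.
table = [
    RelatedInfo("(text of region 1)", 0.0, 4.0, 1),
    RelatedInfo("(text of region 2)", 5.0, 9.0, 0),
    RelatedInfo("(text of region 3)", 10.0, 14.0, 1),
    RelatedInfo("(text of region 4)", 15.0, 19.0, 0),
    RelatedInfo("(text of region 5)", 20.0, 24.0, 1),
]
```

With this layout, selecting a line of text is enough to recover both the time region to edit and the attribute that decides how it is edited.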
[0024]
Next, the processing operation by which the multimedia editing apparatus edits the scenario file of a multimedia presentation will be described with reference to FIG. 6.
When the processing is started (step S201), the display / instruction unit 17 retrieves the text data from the multimedia related information holding unit 16 and displays it on the display 17a (steps S202 and S203), and accepts from the user, via the pen-type pointer 17b, the mouse 17c, or the like, either an instruction event to end the editing process or a selection instruction event for the text data that serves as the reference for the editing range (step S205) (step S204).
If the instruction event to end editing is selected, the process ends (step S206). If, on the other hand, a text data selection instruction event is selected, the scenario editing unit 18 refers to the multimedia related information held in the multimedia related information holding unit 16 and checks the attribute information associated with the selected text data (step S207).
[0025]
If the attribute information indicates media information that does not change over time (for example, a still image; “0” in the present embodiment), all of the media information concerned is unchanging, and the presentation remains natural even if it is not reproduced. The scenario editing unit 18 therefore edits the scenario file so that none of the media information is reproduced in the corresponding time region (step S208). If, on the other hand, the attribute information indicates media information that changes over time (for example, a moving image; “1” in the present embodiment), not reproducing the media information would break the continuity with what comes before and after and make the presentation unnatural. The scenario editing unit 18 therefore edits the scenario file so that only the audio data corresponding to the text data is not reproduced (step S209). The above processing steps S202 to S209 are repeated until the user selects the end event.
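The branch between steps S208 and S209 can be sketched in Python as follows. The scenario file format is not given in the source, so the `Entry` record, the `is_reference_audio` flag, and the choice to cut an interval out of an entry without shifting later entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Entry:
    """Hypothetical scenario entry: one medium reproduced over one interval."""
    media: str
    is_reference_audio: bool  # True for the audio the text data was taken from
    start: float
    end: float


def cut(entry: Entry, t1: float, t2: float) -> List[Entry]:
    """Remove [t1, t2] from an entry, returning zero, one, or two pieces."""
    if entry.end <= t1 or entry.start >= t2:
        return [entry]  # no overlap: the entry is unchanged
    pieces: List[Entry] = []
    if entry.start < t1:
        pieces.append(replace(entry, end=t1))
    if entry.end > t2:
        pieces.append(replace(entry, start=t2))
    return pieces


def edit_scenario(scenario: List[Entry], t1: float, t2: float, attribute: int) -> List[Entry]:
    """Attribute 0: suppress every medium in [t1, t2]. Attribute 1: only the audio."""
    edited: List[Entry] = []
    for entry in scenario:
        if attribute == 0 or entry.is_reference_audio:
            edited.extend(cut(entry, t1, t2))
        else:
            edited.append(entry)
    return edited
```

Whether the remaining material is moved forward to close the gap is left open here, because the text only says the scenario file is edited so that the portion is not reproduced.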
[0026]
In this way, the user can identify the part to be edited easily and quickly without playing back the multimedia presentation, and can edit the scenario file without harming the presentation even without considering the media information in the part to be edited.
[0027]
The present invention is not limited to the above embodiment, and various modifications can be made.
For example, in the above embodiment, the attribute information of the media information subject to editing evaluation in the time region corresponding to each item of text data is determined in advance, before the user designates the text data serving as the reference for the editing range, so that the scenario file can be edited in a short time once the user selects text data. However, the present invention is not limited to this; the attribute information of the media information in the corresponding time region may instead be determined after the user designates the text data, and the scenario file then edited.
[0028]
Further, in the above embodiment, a single item of attribute information summarizing the plural pieces of media information subject to editing evaluation is detected, and the scenario file is edited for those plural media based on that attribute information. However, the present invention is not limited to this; attribute information may be detected for each piece of media information subject to editing evaluation, and the scenario file may be edited for each piece of media information based on its own attribute information.
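A per-media variant could be sketched as below. This is one possible reading rather than something the source spells out: the function name `media_to_suppress`, the dictionary of per-media attribute values, and the rule that the reference audio is always suppressed while each other medium is suppressed only when its own attribute is 0 are assumptions.

```python
from typing import Dict, Set


def media_to_suppress(reference_audio: str, per_media_attribute: Dict[str, int]) -> Set[str]:
    """Return the names of the media not to reproduce in the selected time region.

    Assumed rule: the reference audio is always suppressed; every other medium
    is suppressed only if its own attribute is 0 (it does not change over time).
    """
    unchanging = {name for name, attribute in per_media_attribute.items() if attribute == 0}
    return {reference_audio} | unchanging
```

For example, with video a still and video b moving in the selected region, only the audio and video a would be suppressed, while video b continues to play.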
[0029]
[Effect of the invention]
As described above, according to the present invention, it is possible to easily and appropriately edit a scenario file for a presentation in a short time.
[0030]
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a presentation using multimedia.
FIG. 2 is a diagram showing a configuration of a multimedia editing apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart of a processing operation for creating multimedia related information of the multimedia editing apparatus according to the embodiment of the present invention.
FIG. 4 is a diagram illustrating a process of creating multimedia related information in the multimedia editing apparatus according to the embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of multimedia related information of the multimedia editing apparatus according to the embodiment of the present invention.
FIG. 6 is a flowchart of a processing operation for editing a multimedia presentation scenario file of the multimedia editing apparatus according to the embodiment of the present invention.
[Explanation of symbols]
11 Scenario holding unit
12 Scenario analysis unit
13 Multimedia information storage unit
14 Multimedia attribute determination unit
15 Voice recognition unit
16 Multimedia related information holding unit
17 Display / input unit
18 Scenario editing unit

Claims

A multimedia editing apparatus comprising:
information holding means for holding a plurality of pieces of video and audio media information;
scenario holding means for holding a scenario file describing the media information to be presented in a presentation, the presentation position of the media information, and the presentation time;
voice recognition means for recognizing a character string from predetermined audio media information described in the scenario file and creating text data of the character string;
time domain detection means for detecting the time region in which the character string is uttered;
multimedia related information holding means for holding the text data and the time region in association with each other;
display means for displaying the text data;
instruction means for designating the text data that specifies an editing range;
multimedia attribute determination means for determining attribute information regarding change over time for predetermined media information subject to editing evaluation that is reproduced in the time region; and
scenario editing means for editing the scenario file, for the time region corresponding to the designated text data, based on the attribute information of the media information subject to editing evaluation that is reproduced in that time region.

The multimedia editing apparatus according to claim 1, wherein
the multimedia related information holding means holds the attribute information of the media information subject to editing evaluation that is reproduced in the time region in association with the time region.

The multimedia editing apparatus according to claim 1 or 2, wherein
the multimedia attribute determination means determines, for video media information, attribute information indicating whether the video is a moving image or a still image by detecting changes in the video at predetermined time intervals.

The multimedia editing apparatus according to claim 1 or 2, wherein
the scenario file includes the attribute information of the media information, and
the multimedia attribute determination means determines the attribute information of the media information based on the scenario file.

The multimedia editing apparatus according to any one of claims 1 to 4, wherein
the voice recognition means converts the audio media information into text data in units of continuous voiced portions, and
the multimedia related information holding means associates the text data with the time region in units of text data.

The multimedia editing apparatus according to any one of claims 1 to 5, wherein
the scenario editing means edits the scenario file so that the portion of all the media information in the time region is not reproduced when all of the media information subject to editing evaluation has attribute information indicating no change over time.

The multimedia editing apparatus according to any one of claims 1 to 5, wherein
the scenario editing means edits the scenario file so that only the audio portion corresponding to the text data is not reproduced when at least one piece of the media information subject to editing evaluation has attribute information indicating change over time.