JP2005100415A

JP2005100415A - Multimedia print driver dialogue interface

Info

Publication number: JP2005100415A
Application number: JP2004278355A
Authority: JP
Inventors: Jonathan J Hull; Jamey Graham; Peter E Hart; Kurt Piersol
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-09-25
Filing date: 2004-09-24
Publication date: 2005-04-14

Abstract

PROBLEM TO BE SOLVED: To provide a user interface that allows a user to interact with a media content analysis process and a media representation generation process.

SOLUTION: A media print interface allows the user to interact with a multimedia transformation process and to format multimedia data in order to generate a representation of that data. A user interface allows the user to interact with media content analysis and media representation generation. A media analysis software module receives media content analysis commands from the user through the user interface, and analyzes and recognizes features of the media content such as faces, speech, and text. The media representation can be generated in a paper-based format, a digital format, or any other representation format. The user interface includes various fields in which the user can view the media content and modify the generated media representation.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a system and method for providing a multimedia printing interface. In particular, it relates to a system and method for providing a print driver dialog interface that allows a user to format multimedia data in order to generate a representation of that data.

Printers in today's systems are not designed to generate multimedia documents. There is currently no effective method for producing an easily readable representation of multimedia content in either paper or digital format. Several technologies and tools can be used to access and manipulate multimedia information (e.g., existing multimedia players). However, none of them gives the user the option of generating a multimedia document that the user can easily review and through which the user can gain access to the multimedia content.

Printers in today's systems are also generally not designed to facilitate interaction with multimedia content or with print content. A standard printer dialog box offers the user some general formatting options for a print job, such as the number of pages to print and the number of copies to make. However, printer drivers in modern operating systems are not designed to facilitate interactive information gathering. Because a print job may be redirected to another printer, or because the printing protocol does not allow such interactive sessions, the operating system does not encourage interaction with the user.

Because of these limitations in printer interaction, the user cannot define more detailed printing preferences in standard printing. Moreover, no printing preferences at all can be specified for multimedia content, because such printing capabilities are not currently available. The user therefore cannot use current print dialog boxes to select segments of multimedia content that are of interest for printing. Current print dialog boxes also do not allow the user to preview any multimedia content. Nor is there any way for the user to search through a lengthy multimedia segment for particular features of interest. For example, a user cannot currently search through a news segment for content covering a particular topic, or for particular faces or events in the news segment. Finally, there is no way to define a print format for selected segments of multimedia content, and no way to review or modify the print format directly through the print dialog box.

Accordingly, there is a need for a system and method that overcomes the limitations recognized in the prior art and allows a user to interact with and control the generation of a multimedia representation.

The present invention overcomes the limitations and deficiencies of the prior art with a system and method that provide a user interface allowing a user to interact with a media content analysis process and a media representation generation process. The system of the present invention includes a user interface that allows the user to control media content analysis and media representation generation. A media analysis software module analyzes and recognizes features of the media content. For example, the media analysis software module recognizes features such as faces, speech, and text. In addition, the system can include an output device driver module that receives instructions from the user and facilitates media content analysis and media representation generation. The system can also include an enhanced output device for generating the media representation. Processing logic manages the display of the user interface that allows the user to control the generation of the multimedia representation. The processing logic also controls the generation of a printable multimedia representation. The representation can be generated in a paper-based format, a digital format, or any other representation format. The user interface includes a number of fields in which the user can view the media content and modify the generated media representation.

The method of the present invention includes interacting with a user interface for controlling media data analysis and media representation generation. The method further includes analyzing features of the media data for media representation generation, facilitating the media data analysis, and facilitating media representation generation by receiving instructions and sending instructions regarding media representation parameters.

A system and method are described for providing a graphical user interface, or print driver dialog interface, that allows a user to interact with the generation of a multimedia representation. In accordance with an embodiment of the present invention, a graphical user interface is provided that displays multimedia information that may be stored in a multimedia document. In accordance with the teachings of the present invention, the interface enables a user to navigate through the multimedia information stored in the multimedia document.

For the purposes of the present invention, the terms “media”, “multimedia”, “multimedia content”, “multimedia data”, and “multimedia information” refer to any one of, or a combination of, text information, graphic information, animation information, sound (audio) information, video information, slide information, whiteboard image information, and other types of information. For example, a video recording of a television broadcast may comprise video information and audio information. In certain instances the video recording may also comprise closed-captioned (CC) text information, which comprises material related to the video information and in many cases is an exact representation of the speech contained in the audio portion of the video recording. “Multimedia information” is also used to refer to information comprising one or more objects, where the objects include information of different types. For example, multimedia objects included in multimedia information may comprise text information, graphic information, animation information, audio information, video information, slide information, whiteboard image information, and other types of information.

For the purposes of the present invention, the terms “print” and “printing”, when referring to printing onto some type of medium, are intended to include printing, writing, drawing, imprinting, embossing, generating in digital format, and other types of generation of a data representation. Also for the purposes of the present invention, the output generated by the system is referred to as a “media representation”, a “multimedia document”, a “multimedia representation”, a “document”, a “paper document”, or either “video paper” or “audio paper”. Although the words “document” and “paper” appear in these terms, the output of the system is not limited to a physical medium such as paper. Instead, the terms can refer to any output that is fixed in a tangible medium. In some embodiments, the output of the system can be a representation of multimedia content printed on a physical paper document. In paper format, the multimedia document takes advantage of the high resolution and portability of paper and provides a readable representation of the multimedia information. In accordance with the teachings of the present invention, a multimedia document may also be used to select, retrieve, and access multimedia information. In other embodiments, the system can output to a digital format or to some other tangible medium. Furthermore, the output of the present invention can refer to any storage unit (e.g., a file) that stores multimedia information in digital format. A variety of formats can be used to store the multimedia information. These include various MPEG formats (e.g., MPEG1, MPEG2, MPEG4, MPEG7), MP3 format, SMIL format, HTML+TIME format, WMF (Windows (registered trademark) Media Format), RM (Real Media) format, Quicktime format, Shockwave format, various streaming media formats, formats developed by the engineering community, proprietary and customary formats, and other formats.

In the following description, for purposes of explanation, numerous specific details are set forth to provide an understanding of the invention. It will be apparent to one skilled in the art, however, that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form to avoid obscuring the invention. For example, certain features of the invention are described primarily with reference to video content. However, the features apply to any type of media content, including audio content, even where they are described only in relation to video information.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.

Referring now to FIG. 1, an exemplary system 100 is shown that provides a graphical user interface allowing a user to format multimedia data in order to generate a multimedia representation. In this embodiment, an enhanced output device or printer 102 for generating the multimedia representation is shown. The printer 102 displays multimedia data, such as audio or video data, which the user can manage and edit through a user interface or print driver dialog interface (PDDI) 122. While the term “print driver dialog interface” or “PDDI” is used to refer to the graphical user interface, the graphical user interface is not limited to printers and can be any graphical user interface that provides the functionality described below. The multimedia information displayed in the PDDI 122 may be stored in a multimedia document that is accessible to the system 100. The multimedia information may be stored directly on the system 100, or it may be stored on an external storage device or server (not shown) that the system 100 can access via a connection 140.

In other embodiments, instead of accessing a multimedia document, the system 100 may receive a stream of multimedia information (e.g., a streaming media signal or a cable signal) from a multimedia information source. In accordance with an embodiment of the present invention, the system 100 stores the multimedia information signal in a multimedia document and then generates the interface 122 that displays the multimedia information. Sources that can provide multimedia information to the system 100 include, for example, a television, a television broadcast receiver, a cable receiver, a video recorder, a digital video recorder, and a personal digital assistant (PDA). For example, the source of multimedia information may be embodied as a television configured to receive multimedia broadcast signals and to transmit those signals to the system 100. In this example, the information source may be a television receiver/antenna providing a live television feed to the system 100. The information source may also be a device, such as a video recorder/player, a DVD player, or a CD player, that supplies recorded video and/or audio streams to the system 100. In other embodiments, the source of information may be a presentation or meeting recorder that can provide a stream of captured presentation or meeting information to the system 100. Additionally, the source of multimedia information may be a receiver (e.g., a satellite dish or a cable receiver) configured to capture or receive multimedia information from an external source (e.g., via a wireless link) and then to provide the captured multimedia information to the system 100 for further processing. Multimedia content can also originate from a proprietary or customized multimedia player, such as RealPlayer (registered trademark) or Microsoft Windows (registered trademark) Media Player.

In other embodiments, the system 100 may be configured to intercept multimedia information signals received by a multimedia information source. The system 100 may receive the multimedia information directly from a multimedia information source, or it may receive the information via a communication network.

The enhanced output device or printer 102 comprises a number of components, including a conventional printer 103, a media analysis software module 104, processing logic 106, and a digital media output 108. The conventional printer 103 component of the printer 102 can include all or some of the capabilities of a standard or conventional printing device, such as an inkjet printer, a laser printer, or another printing device. The conventional printer 103 thus has the functionality to print paper documents, and can also include the capabilities of fax machines, copy machines, and other devices for generating physical documents. More information about printing systems is provided in U.S. Patent Application No. 10/814,948, entitled “Networked Printing System Having Embedded Functionality for Printing Time-Based Media”, filed March 30, 2004 by Hart et al., which is incorporated herein by reference.

The media analysis software module 104 includes audio and video content recognition and processing software. The media analysis software module 104 can be located on the printer 102, or it can be located remotely, for example on a personal computer (PC). Examples of such multimedia analysis software include, but are not limited to, video event detection, video foreground/background segmentation, face detection, face image matching, face recognition, face cataloging, video text localization, video optical character recognition (OCR), language translation, frame classification, clip classification, image stitching, audio reformatting, speech recognition, audio event detection, audio waveform matching, audio-caption alignment, and video OCR and caption alignment. Once the user selects “print” in the system 100, the system 100 can analyze the multimedia content using one or more of these techniques and provide the user with analysis results from which the user can generate a document.

In the embodiment shown in FIG. 1, the printer 102 additionally includes processing logic 106 that controls the PDDI 122 and manages the printer's generation of a multimedia document 120 or media representation. For example, the processing logic 106 manages the display of the PDDI 122, which allows the user to control certain printer actions, such as the processing of the multimedia content or the format in which the multimedia content is displayed in the multimedia representation. Alternatively, the functionality of the PDDI 122 can be provided by a web interface that likewise allows the user to manage printer actions such as formatting issues.
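The way the media analysis software module 104 applies "one or more" of the listed techniques can be pictured as a small registry of analyzers from which the user's selection is run. The following Python sketch is illustrative only: the function names, the registry keys, and the empty placeholder results are assumptions made for this example and do not come from the specification.

```python
from typing import Callable, Dict, List

# Hypothetical analyzer signature: raw media bytes in, list of findings out.
Analyzer = Callable[[bytes], List[str]]


def detect_faces(media: bytes) -> List[str]:
    # Placeholder: a real module would run face detection here.
    return []


def recognize_speech(media: bytes) -> List[str]:
    # Placeholder: a real module would run speech recognition here.
    return []


# Hypothetical registry mapping a technique name to its analyzer.
ANALYZERS: Dict[str, Analyzer] = {
    "face_detection": detect_faces,
    "speech_recognition": recognize_speech,
}


def analyze(media: bytes, selected: List[str]) -> Dict[str, List[str]]:
    """Run only the techniques the user selected and collect their results."""
    unknown = [name for name in selected if name not in ANALYZERS]
    if unknown:
        raise ValueError(f"unsupported analysis techniques: {unknown}")
    return {name: ANALYZERS[name](media) for name in selected}
```

Keeping the techniques behind a common signature would let the same selection logic serve an analyzer running on the printer or remotely on a PC, which are the two placements the description mentions.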

In the example shown in FIG. 1, the PDDI 122 displays the user's selections, which include printing a multimedia document 120 that displays video content. In this example, the user has chosen to have the multimedia content printed in video paper format, with the video paper showing one frame per scene. The interface 122 also includes a preview field 124 that displays a preview of the multimedia representation the user is creating. In the example of FIG. 1, the PDDI 122 shows thumbnail pictures 142 of the displayed frames.

In addition, the PDDI 122 allows the user to set formatting preferences for the multimedia document to be generated. In some embodiments, the user can set preferences for the document format and layout, font type and size, the information displayed on each line, the information displayed in the header, the size and position of the schedule columns, font colors, line spacing, number of characters per line, bolding and capitalization techniques, the language in which the document is printed, paper size, paper type, and so on. For example, the user can choose to have a multimedia document with a large, bold header showing the name of the multimedia content being displayed (e.g., a CNN news segment), and the user can choose the arrangement of video frames displayed on each page.

As shown in the embodiment of FIG. 1, a data structure called a Document Format Specification (DFS) 112 is generated by the print driver software. The DFS 112 represents the transformation of the multimedia data. The DFS 112 is used to populate the PDDI 122 and is modified by the system 100. The DFS 112 determines the feature extraction options presented to the user, which can be applied to the multimedia data. The DFS 112 also determines the format guidelines used to produce the output document. The DFS 112 can be supplied by an external application, such as a print driver on a PC, or it can be determined internally within the printer 102.

The DFS 112 can include metadata about the multimedia file, such as the title of the multimedia content and the producer or publisher of the multimedia content. The DFS 112 can also include other information, such as the beginning and ending times of a multimedia segment (e.g., the beginning and ending times of an audio recording) and a specification of a graphical representation of the multimedia data that can be displayed along a timeline (e.g., a waveform showing the amplitude of an audio signal over time). The DFS 112 can further include a specification of timestamp markers, and of the metadata for each timestamp (e.g., text tags or barcodes) that can be displayed along the timeline, as well as layout parameters that determine the appearance of the physical multimedia document 120. More information and examples of the DFS 112 are provided in U.S. Patent Application No. 10/814,844, entitled “Printable Representations for Time-Based Media”, filed March 30, 2004 by Hull et al., which is incorporated herein by reference.
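To make the contents of the DFS 112 more concrete, the fields listed above can be gathered into a single record. The Python sketch below is one possible shape under stated assumptions: the class names, field names, types, and default values are hypothetical and are not taken from the specification, which does not define a concrete encoding here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TimestampMarker:
    """One marker on the timeline (hypothetical structure)."""
    time_s: float                  # position on the timeline, in seconds
    text_tag: str = ""             # textual metadata shown at the marker
    barcode: Optional[str] = None  # barcode payload, if one is printed


@dataclass
class DocumentFormatSpec:
    """Illustrative stand-in for the DFS 112; all field names are assumed."""
    title: str = ""                # metadata: title of the content
    producer: str = ""             # metadata: producer or publisher
    segment_start_s: float = 0.0   # beginning time of the segment
    segment_end_s: float = 0.0     # ending time of the segment
    timeline_graphic: str = "waveform"  # graphic drawn along the timeline
    markers: List[TimestampMarker] = field(default_factory=list)
    layout: Dict[str, str] = field(default_factory=dict)  # layout parameters
    feature_options: List[str] = field(default_factory=list)  # offered to user

    def duration_s(self) -> float:
        """Length of the selected segment in seconds."""
        if self.segment_end_s < self.segment_start_s:
            raise ValueError("segment ends before it starts")
        return self.segment_end_s - self.segment_start_s
```

A print driver on a PC could fill in such a record and hand it to the printer, or the printer could build it internally, matching the two sources of the DFS 112 described earlier.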

The multimedia document 120 generated by the printer 102 can take a variety of formats. For example, the multimedia document 120 can be a paper document, such as video paper of the form shown in FIG. 1. The multimedia document 120 produced by the printer 102 can also be stored on digital media 144. As shown in FIG. 1, this embodiment of the printer 102 includes a digital media output device or interface 108. The digital media writing hardware can include, for example, a network interface card, a DVD (Digital Video Disc) writer, a Secure Digital (SD) writer, a CD (Compact Disc) writer, and the like. The multimedia content can be stored on digital media 144 such as flash media, a DVD, or a CD.

The multimedia document 120 can have many different types of layout and can display various types of information. FIG. 1 shows an example of a video paper document that displays video frames from one or more news segments. In the example of FIG. 1, the video paper document includes thumbnail images or frames 132 extracted from the video information, showing video content that the user can preview. In this embodiment, the user can specify formatting preferences for the video paper document through the PDDI 122. The layout and format information can specify the sampling rate for extracting the multimedia frames 132, the number of frames 132 to be extracted from the video information, the order and placement of the frames 132 on the medium, and other such information. For video information, the printer 102 can extract frames 132 that capture salient features of the video (i.e., frames that are informative) for a particular segment of the multimedia information. In addition, as described above, the printer 102 can include feature recognition capabilities (e.g., face recognition, face detection, OCR) that allow the user to search within a video segment for items of interest, such as particular face images or particular words displayed as text. For example, the printer 102 may use face recognition techniques to extract frames showing the face of a particular person the user is interested in viewing.
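The sampling rate and frame count mentioned in the layout and format information can be illustrated with a fixed-interval sampler. This is a deliberately simple sketch: the function name and parameters are assumptions for this example, and uniform sampling is a stand-in that does not attempt the salient-frame or face-based extraction the description also mentions.

```python
from typing import List, Optional


def sample_frame_times(
    start_s: float,
    end_s: float,
    interval_s: float,
    max_frames: Optional[int] = None,
) -> List[float]:
    """Return timestamps (seconds) at which frames would be extracted.

    Frames are taken every `interval_s` seconds from `start_s` up to and
    including `end_s`, optionally capped at `max_frames`.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if end_s < start_s:
        raise ValueError("end_s must not precede start_s")
    # Index-based computation avoids accumulating floating-point error.
    count = int((end_s - start_s) // interval_s) + 1
    if max_frames is not None:
        count = min(count, max(max_frames, 0))
    return [start_s + i * interval_s for i in range(count)]
```

For instance, a 10-second segment sampled every 5 seconds yields frames at 0, 5, and 10 seconds, and a cap of two frames keeps only the first two.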

In other embodiments of the invention, a user-selectable identifier 134 (e.g., a barcode) is associated with each frame 132. In the example of FIG. 1, the user-selectable identifiers 134 are displayed below each frame 132, but they can also be displayed elsewhere on the page. The user-selectable identifiers 134 act as an interface that allows the user to access or retrieve the multimedia content displayed in the multimedia document 120. The user selects a user-selectable identifier by scanning the appropriate barcode on the printed paper document with any type of device that has a built-in barcode scanner, such as a mobile phone or a PDA. For example, by scanning a barcode of FIG. 1, the user can cause the video clip to be shown on a display device (e.g., a television, a PC monitor, a mobile phone screen, or a PDA) and can then view the content. As another example, the paper multimedia document 120 can also include a numeric identifier with each frame 132, and the user can type those numbers into a keypad or touchpad associated with a device, which causes the system 100 to display the video clip on a display device. Alternatively, if the video paper document shown in FIG. 1 is in digital format, the system 100 can be configured so that the user selects a frame 132 (i.e., by clicking on the frame with a mouse or other selection device) to cause the video content to be shown on a display device.

The printer 102 can retrieve the multimedia information corresponding to a user-selectable identifier 134. The signal communicated to the printer 102 from the selection device (i.e., a device with a barcode scanner or a keypad for entering a numeric identifier) can identify the multimedia content frame 132 selected by the user, the location of the multimedia content to be displayed, the multimedia paper document from which the segment was selected, information relating to preferences and/or to one or more multimedia display devices (e.g., a television set) selected by the user, and other information that facilitates retrieval of the requested multimedia information. For example, the system 100 can access a video file stored on a PC, and the system can play this video content at the user's command.
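The lookup described above can be sketched as a small index from identifiers to stored segments. This is only an illustration under stated assumptions: the class names, the field names, and the default display device are hypothetical and do not come from the specification, which does not say how the printer 102 stores this mapping.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SegmentRef:
    """Where the content behind a printed frame lives (hypothetical fields)."""
    media_path: str      # location of the stored multimedia file
    start_seconds: int   # offset of the segment within that file
    document_id: str     # paper document on which the frame was printed


@dataclass(frozen=True)
class SelectionSignal:
    """What a selection device might send to the printer (hypothetical)."""
    identifier: str                        # scanned barcode value or typed digits
    display_device: Optional[str] = None   # e.g. "television"; None means "use default"


class IdentifierIndex:
    """Maps user-selectable identifiers to the segments they stand for."""

    def __init__(self) -> None:
        self._segments: Dict[str, SegmentRef] = {}

    def register(self, identifier: str, segment: SegmentRef) -> None:
        """Record which segment an identifier refers to (done at print time)."""
        self._segments[identifier] = segment

    def resolve(
        self, signal: SelectionSignal, default_device: str = "pc_monitor"
    ) -> Optional[Tuple[SegmentRef, str]]:
        """Return (segment, display device), or None for an unknown identifier."""
        segment = self._segments.get(signal.identifier)
        if segment is None:
            return None
        return segment, signal.display_device or default_device
```

An unknown identifier returns `None` rather than raising, on the assumption that a mis-scanned barcode is an ordinary event that the caller should handle by asking the user to try again.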

The example of FIG. 1 further shows text information next to each frame in the multimedia document 120. The text information includes a speaker name field 126, or a field that displays the name (e.g., Brit Hume) of the person shown in the video frame 132. The text information further includes a subject field 128 that displays information about the subject of the video segment (e.g., Intro Intel-gate). The text information also includes a time field 130, which displays the length of the video segment (e.g., 3 minutes 52 seconds).

The user can also choose to include in the multimedia document 120 a portion of the audio information for a frame 132, displayed as text. For example, the user can choose to have a portion of the transcript of the multimedia segment (i.e., a transcript of a news program segment) displayed next to the multimedia frame 132. As another example, the user can choose to include in the printed document a summary or text description of the content for each frame 132, such as a summary of a particular television segment or program. The user can use the print driver dialog interface 122 to specify the technique used to convert the audio information into text (i.e., the technique for generating a text transcript of the audio), the format and style for printing the transcript (which can be the same as for printing other text information), the format and style for printing summary text about the multimedia content, and so on. Further information about retrieving multimedia information and annotating multimedia information is provided in the Video Paper applications referenced above.

Referring now to FIG. 2, the architecture of an embodiment of the present invention is shown. In this embodiment, the system 200 includes a printer 102 coupled to a data processing system, which is a PC 230 in the embodiment of FIG. 2 but can also be a portable computer, workstation, computer terminal, network computer, mainframe, kiosk, standard remote control, PDA, game controller, cell phone, or any other data system. The printer 102 can also optionally be coupled to an application server 212 in a network environment.

In the example of FIG. 2, the printer 102 comprises the following components: a conventional printer 103, a processor 214, a multimedia storage 202, and a digital media input/output 108. As described above, the conventional printer 103 has the standard printing capabilities that conventional printers generally have.

The processor 214 processes data signals and can comprise various computing architectures, including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor is shown in FIG. 2, multiple processors can be included. Main memory (not shown) can store instructions and/or data that can be executed by the processor 214, including software and other components of the system 200. The instructions and/or data can comprise code for performing any and/or all of the techniques described herein. The main memory (not shown) can be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, or some other memory device known in the art.

As described above, the printer 102 accesses or receives multimedia information, such as an audio or video file, from some source. In one embodiment, the multimedia file is stored on a data processing system, such as the PC 230, which is coupled to the printer 102 by signal line 248. In the embodiment of FIG. 2, the multimedia file can be stored in the multimedia file storage 264 on the PC 230. The multimedia file can also be accessed from some remote source (not shown). As another example, the multimedia file can be stored on the printer 102 itself, in the printer multimedia storage 202, and accessed from that storage.

As described above, the user can view multimedia content on a display device (not shown) in order to select particular content to print with the printer 102. The display device (not shown) can include a cathode ray tube (CRT), a flat panel display such as a liquid crystal display (LCD), a projection device, and the like. In other embodiments, the printer 102 includes an LCD display panel or other type of display panel, so that the user can view multimedia content on the printer itself.

In the embodiment of FIG. 2, the user can view multimedia files using a multimedia rendering application (MRA) 204 on the PC 230, which allows the user to play, store, index, edit, or manipulate multimedia information. Examples of an MRA 204 include proprietary or customized multimedia players (e.g., RealPlayer® from RealNetworks, Microsoft Windows® Media Player from Microsoft, QuickTime® Player from Apple, the Shockwave multimedia player, and others), video players, televisions, PDAs, and the like. In the embodiment of FIG. 2, the MRA 204 is coupled to the multimedia file storage 264 by bus 244. Stored multimedia content can be accessed by the MRA 204 and transferred to the MRA 204 for viewing by the user. Further information on multimedia visualization is provided in the following U.S. patent applications, each of which is incorporated herein by reference: "Multimedia Visualization and Integration Environment," U.S. Patent Application Publication No. 10/081,129, filed by Graham on February 21, 2001; "Multimedia Visualization and Integration Environment," U.S. Patent Application Publication No. 10/701,966, filed by Graham on November 4, 2003; "Interface For Printing Multimedia Information," U.S. Patent Application Publication No. 10/465,027, filed by Graham et al. on June 18, 2003; "Techniques For Displaying Information Stored In Multiple Multimedia Document," a U.S. patent application filed by Graham et al. on June 18, 2003; "Television-Based Visualization and Navigation Interface," U.S. Patent Application Publication No. 10/174,522, filed by Graham on June 17, 2002; and "Multimedia Visualization and Integration Environment," a U.S. patent application filed by Graham on March 3, 2004.

In the embodiment of FIG. 2, the system 200 also includes an output device driver module, or printer driver software module 208, which can be located on the PC 230 or at another location. The printer driver software module 208 is configured at installation time to perform particular functions. The printer driver software module 208 adds a "print" function to an existing MRA 204, such as Windows® Media Player. An optional application plug-in 206 may be required to add the "print" function. Alternatively, the user can install a separate MRA 204 designated for this purpose. When the printer 102 is invoked (i.e., the user selects the print button in the MRA 204), the printer driver software module 208 receives a print request from the MRA 204, along with the multimedia data and other relevant information, over signal line 246. The printer driver software module 208 sends the multimedia data to the printer 102 over bus 248 and instructs the printer to apply the specified transformation routine (e.g., face recognition). The printer driver software module 208 can further prompt the user, if necessary, to confirm the results and the layout decisions the user has made.

When the printer 102 receives a print request, the request and the associated multimedia data are forwarded to the processor 214. The processor 214 interprets the input and activates the appropriate module. The processor 214 is coupled to, and controls, a multimedia transformation software module (MTS) (not shown) for transforming multimedia content. When the processor 214 receives a print request, it can then activate the MTS (not shown), depending on whether the user has requested a transformation of the multimedia data. Transformations of the multimedia content can be applied at the printer 102, at the PC 230 (i.e., by software installed with the print driver 208), or at some other location. The MTS (not shown) applies the specified transformation function to a given audio or video file. The MTS (not shown) generates the appropriate document-based representation and interacts with the user through the print driver dialog interface to modify the parameters of the transformation and to preview the results. The results and parameters of the multimedia transformation are represented in the Document Format Specification (DFS) described above.
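The dispatch step in this paragraph, where the processor activates the MTS only if a transformation was requested, can be sketched as follows. This is a minimal illustration under stated assumptions: `PrintRequest`, `Processor`, the registry of named transformations, and the shape of the returned record are all hypothetical. The returned dictionary only stands in for the DFS, whose actual format is not given here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# A transformation takes raw media bytes plus parameters and returns some result.
Transformation = Callable[[bytes, Dict[str, Any]], Any]


@dataclass
class PrintRequest:
    """A print request as it might reach the processor (hypothetical shape)."""
    media: bytes
    transformation: Optional[str] = None            # None: no transformation requested
    parameters: Dict[str, Any] = field(default_factory=dict)


class Processor:
    """Interprets a print request and activates a transformation if one is named."""

    def __init__(self) -> None:
        self._transformations: Dict[str, Transformation] = {}

    def register(self, name: str, func: Transformation) -> None:
        self._transformations[name] = func

    def handle(self, request: PrintRequest) -> Dict[str, Any]:
        """Return a record of the parameters and result (a stand-in for the DFS)."""
        if request.transformation is None:
            # Nothing requested, so the transformation module is not activated.
            return {"transformed": False, "parameters": {}, "result": None}
        if request.transformation not in self._transformations:
            raise ValueError(f"unknown transformation: {request.transformation}")
        func = self._transformations[request.transformation]
        result = func(request.media, request.parameters)
        return {
            "transformed": True,
            "transformation": request.transformation,
            "parameters": dict(request.parameters),
            "result": result,
        }
```

Keeping the parameters in the returned record mirrors the statement that both the results and the parameters of a transformation are represented in the DFS, so that a later preview or update can start from the same settings.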

As described above, the printer 102 can include a multimedia storage 202 for storing multimedia data such as video or audio files. The processor 214 is coupled to the multimedia storage 202 and can send multimedia data to it over bus 251. This data can be stored while a print job is being processed. The multimedia storage 202 can include a number of memory types, including a main random access memory (RAM) for storing instructions and data during program execution and a read-only memory (ROM) in which fixed instructions are stored. The multimedia storage 202 can also include persistent (non-volatile) storage for program and data files, such as a hard disk drive, a floppy disk drive, a CD-ROM drive, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, or another similar storage device known in the art. One or more of the drives or devices can be located at remote locations on other connected components.

The processor 214 also controls the digital media input/output 108. The processor 214 transfers information to, and receives information from, the digital media input/output 108 over bus 250. As described above, a generated multimedia document can be converted into some type of digital format. The digital media writing hardware can include, for example, a network interface card, a digital video disc (DVD) writer, a secure digital (SD) writer, a compact disc (CD) writer, and the like. The digital output 260 document can be stored on digital media, including CD, DVD, flash media, and the like. The user can thus generate a digital output 260 version of an input audio or video file and view it on a designated target device, such as a PC, cell phone, or PDA.

The processor 214 also manages the generation of multimedia documents, such as video or audio paper documents. The multimedia information can also be displayed in a paper document, or multimedia document 120, as shown in FIG. 2. The processor 214 communicates with the conventional printer 103 over bus 252 and sends it print job information, and the conventional printer 103 generates the paper output. The generated multimedia document 120 includes a paper representation of the input audio or video file information, as derived by recognition software. The multimedia document embodiment of FIG. 2 can also include user-selectable identifiers, such as barcodes, and other links to multimedia data stored by the printer 102 or stored in a designated online database.

The processor 214 also controls external communication hardware, such as a network interface. The processor 214 can transfer information to, and receive information from, the application server 212 over bus 254. The printer 102 can also communicate with, and obtain information from, an application server 212 (e.g., a "web services" or "grid computing" system).

In one embodiment, the system 200 includes a communication monitoring module, or user interface listener module 210 (UI Listener). In the embodiment of FIG. 2, the UI Listener 210 is located on the PC 230, but it can alternatively be located on the printer 102, on the application server 212, or at some other remote location. The UI Listener 210 is coupled to and communicates with the MRA 204, and can send and receive data over bus 240. Specifically, the UI Listener 210 receives the print requests that the user makes to the MRA, and relays to the user requests from remote components (e.g., the printer 102 or the application server 212). The UI Listener 210 is also coupled to and communicates with the printer 102, and can send and receive data over bus 242. Specifically, the UI Listener 210 sends print requests to the printer and receives from the printer 102 requests for further information from the user. In addition, the UI Listener 210 can be coupled to and communicate with the application server 212 over a network, and can send and receive data over a network connection (not shown). The UI Listener 210 receives information from the application server 212, such as requests for information from the user, and can send back replies. The UI Listener 210 and its functions are described in detail below.

Referring now to FIG. 3, a graphical representation of interactive communication with the printer 102 in the system 200 is shown. Printer drivers typically do not facilitate interactive information gathering. Once the initial printer settings have been captured, further interaction with the printer 102 is generally not permitted. One approach to this problem is to embed metadata in the print stream itself. However, the printer 102 may need to ask the user for further information, depending on the results of computations it performs on the data the user supplied. In addition, the printer 102 may itself delegate some tasks to other application servers 212, which may in turn need further information from the user 302.

An additional mechanism, as shown in FIG. 3, can be built to allow this interaction without modifying the printer driver architecture of the underlying operating system. One solution is the UI Listener 210: a program that listens on a network socket, accepts requests for information, interacts with the user to obtain the requested data, and sends the data back to the requester. Such a program can have a fixed set of possible interactions, or it can accept a flexible command syntax that allows the requester to express many different requests. An example of such a command syntax is the standard web browser's ability to display HTML forms. Such forms are generated by a remote server and displayed by the browser, which then returns the results to the server. In this embodiment, the UI Listener 210 differs from a browser in that the user 302 does not generate the initial request to see a form. Instead, the remote machine generates this request. The UI Listener 210 is therefore a server, not a client.
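A listener of this kind can be sketched with the Python standard library. The following is an illustration under stated assumptions, not the program described in the specification: the JSON message shape (a `fields` mapping of field names to prompts), the function names, and the single-read framing are all hypothetical. The `ask_user` callback stands in for whatever dialog the listener would actually show.

```python
import json
import socket
from typing import Callable, Dict

# A "form" maps field names to prompt text; the user's reply maps names to answers.
AskUser = Callable[[Dict[str, str]], Dict[str, str]]


def handle_request(raw: bytes, ask_user: AskUser) -> bytes:
    """Decode one request, collect the user's answers, and encode the reply."""
    try:
        request = json.loads(raw.decode("utf-8"))
        fields = request["fields"]
        if not isinstance(fields, dict):
            raise TypeError("fields must be a mapping")
    except (ValueError, KeyError, TypeError):
        # Undecodable bytes, invalid JSON, or a request of the wrong shape.
        return json.dumps({"ok": False, "error": "malformed request"}).encode("utf-8")
    answers = ask_user(fields)
    return json.dumps({"ok": True, "answers": answers}).encode("utf-8")


def serve_once(listener: socket.socket, ask_user: AskUser) -> None:
    """Accept one connection on an already-listening socket and answer it.

    The listener plays the server role: the remote machine (printer or
    application server) opens the connection, not the user.
    """
    conn, _addr = listener.accept()
    with conn:
        raw = conn.recv(65536)  # sketch only: assumes the request fits in one read
        conn.sendall(handle_request(raw, ask_user))
```

Separating `handle_request` from the socket code keeps the request/reply logic testable without a network, and a fixed set of dialogs could be substituted for the free-form `fields` mapping without touching `serve_once`.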

Because network transactions of this kind are prone to many complex error conditions, a system of timeouts helps keep operation efficient. Each message sent across the network generally either expects a reply or is a one-way message. A message that expects a reply can have a timeout, that is, a limit on the time allowed for the reply to arrive. In the present invention, the embedded metadata includes metadata about a UI Listener 210 that will accept requests for further information. Such metadata includes at least a network address, a port number, and a timeout period. It can also include authentication information designed to thwart malicious attempts to elicit information from the user 302, since the user cannot tell whether a request came from the printer 102, a delegated server, or a malicious agent. If the printer 102 or a delegated application server 212 needs further information, it can use the above information to request that the UI Listener 210 ask the user for the needed information. The UI Listener 210 program can be located on the interaction device of the user 302 (e.g., a PC, cell phone, or PDA), on the printer 102 (i.e., for user interaction on an LCD panel located on the printer), or at another remote location.
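The embedded metadata and the two checks it enables (timeout and authentication) can be sketched as below. The specification says only that the metadata carries a network address, a port number, a timeout period, and optional authentication information. The use of an HMAC-SHA256 signature over the request body is an assumption chosen for illustration, as are all names in the block.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ListenerMetadata:
    """Metadata about the UI Listener embedded in a print job (hypothetical shape)."""
    address: str            # network address where the listener accepts requests
    port: int               # port number the listener is bound to
    timeout_seconds: float  # how long after the job the listener keeps listening
    auth_key: bytes         # shared secret; the HMAC scheme is an assumption


def sign_request(metadata: ListenerMetadata, body: bytes) -> str:
    """Signature a legitimate requester (printer or delegated server) would attach."""
    return hmac.new(metadata.auth_key, body, hashlib.sha256).hexdigest()


def accept_request(
    metadata: ListenerMetadata, body: bytes, signature: str, elapsed_seconds: float
) -> bool:
    """Accept a request only if it arrives in time and carries a valid signature."""
    if elapsed_seconds > metadata.timeout_seconds:
        return False  # the listening window has closed
    expected = sign_request(metadata, body)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```

Because the key travels inside the print job, only the printer and any server it delegates to can produce a valid signature, which is one way to keep an unrelated agent from prompting the user.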

FIG. 3 shows an example of the interactive communication system described above. In the example of FIG. 3, the user 302 selects a "print" option in the system 200, which can be done by clicking a print button added to the MRA 204 or by some other method of selecting the "print" option. By selecting "print," the user 302 sends a print request 304 to the MRA 204 or other application. For example, the user 302 might request printing of a video paper representation of user-selected video frames from a news segment on CNN. The MRA 204 sends a request notification 306 message to the UI Listener 210, asking the UI Listener 210 to notify the printer 102 of the print request 304. Print job 308 information, such as the user-defined layout and formatting preferences for the video paper, is then sent from the MRA 204 to the printer 102. The print job 308 can contain embedded information, such as the network address of the UI Listener 210, authentication information, and the latest time at which the client will be listening for requests.

In the example of FIG. 3, the printer 102 sends a request for information 310 to the UI Listener 210 program located on the interaction device of the user 302. For example, the printer 102 might request further information about a particular layout preference the user selected for the video paper print job, or might confirm again that the default layout should be used. The UI Listener 210 then presents this request to the user 302: a dialog box is displayed 312 to the user 302, allowing the user 302 to respond to the request by selecting information in the dialog box. The reply 314 of the user 302 is sent to the printer 102 in answer to the printer's request for information 310.

In addition, in the example of FIG. 3, the printer 102 sends a request for information 316 to the application server 212. For example, the printer 102 might request from a database particular data needed for a print option, and that database might need to gather further information from the user. In the example of FIG. 3, the application server 212 sends a request for information 318 to the UI Listener 210, which then forwards the request 318 to the user 302. A dialog box is displayed 320 to the user, allowing the user 302 to respond to the request 318. The UI Listener 210 then forwards the reply 322 of the user 302 to the application server 212, and the application server 212 can then send the printer 102 a reply 324 to the printer's request for information 316.
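The FIG. 3 exchange can be summarized as an ordered message trace keyed by the reference numerals used in the text. This is only a reading aid: the function name and the tuple layout are hypothetical, and the two flags reflect the assumption that the printer's and the server's requests for information are each optional.

```python
from typing import List, Tuple

# (reference numeral, sender, receiver, description)
Message = Tuple[int, str, str, str]


def figure3_trace(
    printer_needs_info: bool = True, server_needs_info: bool = True
) -> List[Message]:
    """List the FIG. 3 messages in the order the text describes them."""
    trace: List[Message] = [
        (304, "user", "MRA", "print request"),
        (306, "MRA", "UI Listener", "request notification"),
        (308, "MRA", "printer", "print job with embedded listener metadata"),
    ]
    if printer_needs_info:
        trace += [
            (310, "printer", "UI Listener", "request for information"),
            (312, "UI Listener", "user", "dialog box displayed"),
            # The text says the reply is sent to the printer; the path is not spelled out.
            (314, "user", "printer", "reply to request 310"),
        ]
    if server_needs_info:
        trace += [
            (316, "printer", "application server", "request for information"),
            (318, "application server", "UI Listener", "request for information"),
            (320, "UI Listener", "user", "dialog box displayed"),
            (322, "UI Listener", "application server", "user's reply forwarded"),
            (324, "application server", "printer", "reply to request 316"),
        ]
    return trace
```

Calling `figure3_trace(server_needs_info=False)` gives the shorter exchange in which the printer asks the user directly and no application server is involved.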

Referring now to FIG. 4, a graphical representation of an MRA 204 with an added "print" button 402 is shown. In this example, the MRA 204 is a Windows® Media Player (WMP) application, but, as described above, other types of MRA 204 can be used. The user can select the print option by clicking the print button 402, which causes the printer to generate a multimedia document. The print option can be added to WMP version 9 by using the plug-in feature supplied by Microsoft. The plug-in feature allows developers to build applications that supplement WMP in some way. Several types of plug-ins can be developed, such as "display," "settings," "metadata," "window and background," and so on. Microsoft provides an explanation of what plug-ins are and how to build them. Using one of the user interface plug-in styles, a button or panel can be added to the WMP screen. Further information about adding a print option to an application is provided in U.S. Patent Application Publication No. 10/813,950, entitled "Printing System with Embedded Audio/Video Content Recognition and Processing," filed by Hull et al. on March 30, 2004, which is incorporated herein by reference.

Optionally, the system 200 provides a method for printing multimedia content. The user selects the print option in the MRA, and an initial print driver dialog interface (PDDI) 122 is presented to the user. The initial PDDI 122 contains information about the capabilities of the printer 102 for transforming multimedia data. The initial PDDI 122 can display the options available to the user for transforming the data, or it can show the result of performing a default transformation with a default set of parameters. The user can choose whichever of these two options the user prefers, and the preference can also be set in the properties of the printer 102. The flow of operations for each of these options is shown in FIGS. 5 and 6, described below. Further information about the different transformations that can be performed, and the options available to the user for those transformations, is provided in U.S. Patent Application Publication No. 10/813,950, entitled "Printer with Embedded Audio/Video Content Recognition and Processing," filed by Hull et al. on March 30, 2004, which is incorporated herein by reference.

Referring now to FIG. 5, there is shown a flowchart illustrating the flow of operations in the system 200 when the PDDI 122 is displayed to the user before any multimedia transformation is performed. In this embodiment, the user enters a “print” command into the system by pressing 502 the print button in the MRA (e.g., FIG. 4). The user can use the initial PDDI 122 to define preferences for the multimedia document to be generated before any transformation is performed. The user selects 506 the parameters for the transformation to be applied to the multimedia content. For example, the user can choose to have the document show a specific number of video frames displayed in a user-defined arrangement.

The system 200 then waits 508 for the user to press the OK button or the update button in the PDDI 122. If the user selects the cancel button, the system 200 exits and the PDDI 122 disappears from view. Once the user selects the update button or the OK button, the system 200 sends 510 the parameters and other user selection information to the printer 102. The system 200 determines whether the multimedia data has already been transferred to the printer 102. As described above, this multimedia data can be located on a PC, a mobile phone, a PDA, or any other device that can hold multimedia content. If the multimedia data has not yet been transferred to the printer 102, the system 200 transfers 512 the multimedia data to the printer 102, and the flow of operations continues. If the multimedia data has already been transferred to the printer 102, the system 200 determines whether the multimedia transformation with the user-defined parameters has already been performed. If not, the printer performs 514 the transformation on the multimedia data. If so, the system 200 determines whether the user pressed the update button after entering the parameters or instead pressed the OK button. If the user pressed the OK button rather than the update button, the printer 102 generates 516 a document, the multimedia data, and control data linking the paper document to the multimedia data. In addition, the system 200 assigns an identifier (e.g., a barcode) to the multimedia data, providing the user with an interface through which the multimedia content can be accessed. If necessary, before the document is generated, the printer 102 may first prompt the user for further information about the print job. Metadata about the multimedia data and the commands entered in the PDDI 122 is represented in the DFS 112.

If the user pressed the update button rather than the OK button, the user has not yet requested that the printer 102 generate the multimedia document. Instead, the user presses the update button after modifying the user selection parameters in the PDDI 122, when the user wants the preview field of the PDDI 122 to be updated. When the user presses the update button, the system 200 interactively returns 518 the results for display in the interactive PDDI 122. This allows the user to preview how the multimedia document will appear with the newly modified parameters. The flow of operations then returns to the point at which the user has the opportunity to select 506 parameters, and the system 200 can cycle through the flow again, with the user continuing to modify parameters in the interactive PDDI 122 until the final document is generated.
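
The decision sequence of FIG. 5 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `PrintSession` and `handle_button` are hypothetical and do not appear in the specification, and the step numbers are recorded in a log purely to make the order of operations visible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrintSession:
    """Hypothetical state carried through the FIG. 5 flow."""
    data_on_printer: bool = False              # multimedia data already transferred?
    transformed_with: Optional[dict] = None    # parameters of the last transformation
    log: list = field(default_factory=list)    # step numbers, in the order performed


def handle_button(session, button, params):
    """Apply one button press; return 'cancelled', 'preview' or 'printed'."""
    if button == "cancel":
        return "cancelled"                     # dialog closes, nothing is sent
    session.log.append(510)                    # send parameters to the printer
    if not session.data_on_printer:
        session.log.append(512)                # transfer the multimedia data
        session.data_on_printer = True
    if session.transformed_with != params:
        session.log.append(514)                # perform the transformation
        session.transformed_with = dict(params)
    if button == "ok":
        session.log.append(516)                # generate the document
        return "printed"
    session.log.append(518)                    # update button: return results for preview
    return "preview"
```

Pressing the update button and then the OK button with unchanged parameters would, in this sketch, transfer and transform the data only once, which mirrors the two "already done?" checks in the flow.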

Referring now to FIG. 6, there is shown a flowchart illustrating the flow of operations in the system 200 when the system transfers the multimedia data, performs a default transformation, and displays the results in the PDDI 122. In this embodiment, the user presses 602 the print button in the multimedia rendering application. The system 200 reads 620 the default transformation and parameters from printer properties information stored on the printer 102, on a PC, or at another location. The system 200 then determines whether the multimedia data has already been transferred to the printer 102. If the multimedia data has not yet been transferred to the printer 102, the system 200 transfers 612 the multimedia data to the printer 102 and the flow of operations continues. If the multimedia data has already been transferred to the printer, the system 200 determines whether the transformation with the defined parameters has already been performed. If not, the printer performs 614 the transformation on the multimedia data. If so, the system 200 displays 604 the PDDI 122 to the user, showing the results of the transformation. The user modifies 606 the parameters for the transformation applied to the multimedia content. The system 200 then waits 608 for the user to press the update button or the OK button in the PDDI 122. If the user selects the cancel button, the system 200 exits and the PDDI 122 disappears from view. If the user presses the OK button, the printer 102 generates 616 a document, the multimedia data, and control data linking the paper document to the multimedia data, and the system 200 assigns an identifier to the multimedia data. If the user presses the update button, the system 200 sends 610 the parameters for the transformation to the printer 102, and the flow of operations cycles again.

FIGS. 7 to 19, described below, show examples of the PDDI 122 displayed to the user in the method of FIG. 6, in which a default transformation is first applied to the multimedia data before the user sees the PDDI 122. The examples of FIGS. 7 to 19 can also be the interactive PDDI 122 of the method of FIG. 5, which allows the user to modify the results of the transformation. Examples of the initial PDDI 122 that is first displayed to the user in the method of FIG. 5 are provided in U.S. Patent Application Publication No. 10/081,129, entitled “Printer with Embedded Audio/Video Content Recognition and Processing”, filed on March 30, 2004 by Hull et al., which is incorporated herein by reference in its entirety.

Audio
FIG. 7 shows a graphical representation of the PDDI 122 for printing user-selected ranges of an audio file. The user can enter information into fields of the PDDI 122 to specify preferences regarding layout, segmentation, and so on. The PDDI 122 shown in the embodiment of FIG. 7 includes some fields found in a standard printer dialog box, such as the printer field 704, the print range field 706, and the copies and adjustments field 708. However, the PDDI 122 also displays fields not found in a standard printer dialog box, such as the advanced options field 710, the preview field 712, and the content selection field 714.

As in a standard printer dialog box, the top of the PDDI 122 includes a file name field 702 that displays the name of the multimedia file being printed (e.g., “locomotion.mp3”). In the printer field 704, the user can select which printer will carry out the print job, along with other options such as whether to print as a print job, an image, or a file, the print order, and so on. The printer field 704 also displays the status of the selected printer, the type of printer, where the printer is located, and so on. The print range field 706 allows the user to select which portions of the document are printed. The copies and adjustments field 708 allows the user to specify the number of copies to be generated in the print job, the size of the print job pages relative to the paper, the positioning of the print job pages on the paper, and so on. Although not shown, this dialog box can also include any of various combinations of other conventional print parameters associated with outputting representations of video, audio, or documents.

In the embodiment of FIG. 7, the advanced options field 710 provides the user with options specific to the layout and formatting of multimedia content. In this embodiment, the user can select the type of segmentation to apply to the multimedia content. The user clicks the arrow in the segmentation type field 716, and a drop-down menu appears listing the segmentation types from which the user can choose. Segmentation types include, but are not limited to, audio event detection, speaker segmentation, speaker recognition, sound source localization, speech recognition, profile analysis, video event detection, color histogram analysis, face detection, clustering, face recognition, optical character recognition (OCR), motion analysis, distance estimation, foreground/background segmentation, scene segmentation, automobile recognition, and license plate recognition. In this example, the user has not selected any segmentation type in the segmentation type field 716, so the segmentation type is shown as “NONE”. Accordingly, in this example, the user manually selects audio ranges on the audio waveform timeline 734 by moving the selector 736 in the content selection field 714.

Each segmentation type can have a confidence level associated with each of the events selected in the segmentation. For example, if the user applies audio event detection that segments the audio data according to the applause events occurring in it, each applause event has a corresponding confidence level that expresses the confidence with which the applause event was correctly detected. In the advanced options field 710, the user can define or adjust a threshold on the confidence values associated with a particular segmentation. The user sets the threshold by typing a value into the threshold field 718. For example, the user can set the threshold to 75%, so that only events above this threshold (i.e., events whose probability of having been correctly detected as applause events is greater than 75%) are displayed. In another embodiment, a threshold slider (not shown) is included in the PDDI 122, and the user moves the slider along a threshold bar running from 0% to 100% to select a particular threshold within that range.
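
The effect of the threshold can be illustrated with a short Python sketch. The `Event` record and the `events_above` function are hypothetical names chosen for illustration; the specification does not define a data structure for detected events, and the strict "greater than" comparison is taken from the wording of the 75% example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one detected event."""
    start_s: float      # start of the event, in seconds
    end_s: float        # end of the event, in seconds
    label: str          # e.g. "applause"
    confidence: float   # detection confidence, from 0.0 to 1.0


def events_above(events, threshold_percent):
    """Keep only events whose confidence is strictly greater than the threshold."""
    limit = threshold_percent / 100.0
    return [e for e in events if e.confidence > limit]
```

With a threshold of 75, an event detected with confidence 0.9 would be kept, while events at 0.75 or 0.5 would be dropped.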

In one embodiment, the user can also make layout selections for the generated multimedia representation. In the “fit on” field 720, the user sets the number of pages over which the audio waveform timeline 734 is displayed. In the timeline number selection field 722, the user selects the number of timelines displayed on each page. In addition, in the orientation field 724, the user selects the orientation (e.g., vertical or horizontal) in which the timelines are displayed in the multimedia representation. For example, as shown in FIG. 7, the user can choose to have one timeline displayed horizontally on one page, which displays the entire audio waveform timeline 734 horizontally on the page. As another example, the user can choose to have the audio waveform timeline 734 divided into four portions displayed vertically over two pages (i.e., two timelines per page).
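
The relationship between the page count, the number of timelines per page, and the portions of the waveform can be sketched as follows. The function name `layout_timelines` is hypothetical, and the sketch assumes that the timeline is cut into portions of equal duration, which the specification implies ("divided in half") but does not state as a rule.

```python
def layout_timelines(duration_s, pages, timelines_per_page):
    """Split a timeline of duration_s seconds into equal portions.

    Returns one list per page; each entry is a (start_s, end_s) pair
    for one printed timeline on that page.
    """
    total = pages * timelines_per_page
    # Boundaries of the equal-length portions, from 0 to duration_s.
    bounds = [duration_s * i / total for i in range(total + 1)]
    pieces = list(zip(bounds[:-1], bounds[1:]))
    return [pieces[p * timelines_per_page:(p + 1) * timelines_per_page]
            for p in range(pages)]
```

For the 7 minute 14 second (434 second) recording of FIG. 7, one page with one timeline yields a single portion covering the whole recording, while two pages with two timelines each yield four portions of 108.5 seconds.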

The embodiment of FIG. 7 also shows various buttons, including an update button 726, a page setup button 728, an OK button 730, and a cancel button 732. As described with reference to FIGS. 5 and 6, the user can select the update button 726 after modifying print job parameters in the PDDI, when the user wishes to see an updated image of how the multimedia representation will appear. The image of the multimedia document shown in the preview field 712 is then updated to reflect any new changes the user has made in the PDDI 122. Alternatively, the system can be configured to update the preview field 712 automatically whenever a change is made in the PDDI 122. In one embodiment, when the user selects the page setup button 728, a different dialog interface box is displayed, offering the user various print formatting options. This is described in more detail below. The embodiment of FIG. 7 also includes the OK button 730; when the user selects this button, the printer prepares to generate the multimedia document according to the current user-defined preferences set in the PDDI 122. If the user selects the cancel button 732 at any point in the process, generation of the print job is terminated and the PDDI 122 disappears.

In the embodiment of FIG. 7, the content selection field 714 shows the audio information as a waveform on a timeline, displaying the audio data selected by the user for transformation and printing. In this example, the top of the audio waveform timeline 734 shows the time “00:00:00”, the start time of this audio content. The bottom of the audio waveform timeline 734 shows the time “00:07:14”, the end time of the audio content. The audio information in this example is therefore 7 minutes and 14 seconds long. The user can slide the selector 736 along the audio waveform timeline 734 to select particular segments of the audio content for which corresponding markers or identifiers will be displayed on the generated multimedia document. For example, the user can use a mouse or other selection device to click and slide the selector 736 to segment 740, which is shown as a selected segment in FIG. 7. In one embodiment, once the selector 736 is positioned on the segment of audio content that the user wishes to select, the user can click or double-click the selector 736 to select the segment. In this example, the user can select a longer segment of audio content by clicking and dragging the selector 736 across the extent of the audio segment that the user wishes to select. The audio waveform timeline 734 can also be displayed in many alternative ways, for example as a horizontal timeline, as two or more timelines side by side, with a different waveform display, or with the waveform colored according to a particular scheme.

In the embodiment shown in FIG. 7, the user has selected three regions of the audio waveform timeline 734 to be marked in the multimedia document: segment 740, segment 742, and segment 744. Each of these selected segments has a separate corresponding marker or identifier 766 (e.g., a barcode) displayed in the print preview of the multimedia document. For example, the preview field 712 of FIG. 7 shows an image of the multimedia document. The document shows one page containing one horizontal timeline, displayed with the start of the audio timeline at the left and the end at the right. In this example, the entire audio waveform timeline 734 is displayed on the multimedia document page. The timeline displayed in the preview field 712 also carries three markers or identifiers 766: one for segment 740, one for segment 742, and one for segment 744. Each marker 766 includes a barcode and a time stamp giving the position of the segment within the audio content. In addition, the example of FIG. 7 shown in the preview field 712 includes a header that can contain information about the audio content (e.g., the title of the audio content, the musician who created it, and its date). The multimedia document further includes a play identifier or play marker 760, which can be placed anywhere in the document (e.g., bottom center).
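
As a rough illustration of how each selected segment could be paired with a marker and a time stamp, consider the Python sketch below. The identifier format (`M001`, `M002`, ...) and the function names are assumptions made for this example only; the specification says that a marker includes a barcode and a time stamp but does not say how the barcode value is formed.

```python
def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS, as on the printed timeline."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_markers(segments):
    """Create one marker per selected segment.

    segments: list of (start_s, end_s) pairs in seconds.
    The 'id' stands in for the value a barcode would encode (hypothetical).
    """
    return [
        {"id": f"M{i:03d}", "timestamp": format_timestamp(start)}
        for i, (start, _end) in enumerate(segments, start=1)
    ]
```

Under these assumptions, a segment starting 65 seconds into the recording would receive the marker `M001` with the time stamp `00:01:05`.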

The user can play the audio content in a number of ways. For example, the user can click the play selector or play arrow 750 on the audio waveform timeline 734 to make a segment begin playing. The system can also be configured so that selecting the play arrow 750 causes the entire audio content of the audio waveform timeline 734 to begin playing. The user can also right-click any one of the selected segments to find the corresponding marker in the multimedia document. The paper multimedia representation can also provide an interface for playing the audio content. The user can select any of the markers for any of the selected segments on the paper document (i.e., scan the barcode). For example, the user can scan the barcode using a mobile phone or PDA device equipped with a barcode scanner. The user can listen to the selected excerpt on the mobile phone or PDA, or can listen to the content through the sound card of the user's PC. The user can also select the play marker 760, which functions as a pause button: if the user has selected one of the markers on the page to play the corresponding audio content, the user can pause that audio content by selecting the play marker 760. The user can resume playing the audio content by selecting the play marker 760 again, or can select another marker on the page to play the corresponding audio content.
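
The pause and resume behavior of the play marker 760 amounts to a small state machine, sketched below in Python. The class name `PaperPlayer`, the reserved identifier `"PLAY"` for the play marker, and the segment marker identifiers are all hypothetical; the sketch tracks state only and does not play audio.

```python
class PaperPlayer:
    """Hypothetical playback state driven by scanned markers."""

    PLAY_MARKER = "PLAY"   # stands in for the play marker 760

    def __init__(self):
        self.current = None    # identifier of the segment last selected
        self.playing = False   # whether that segment is currently playing

    def scan(self, marker_id):
        """Handle one scanned marker; return (current segment, playing?)."""
        if marker_id == self.PLAY_MARKER:
            # The play marker toggles between paused and playing,
            # but only once a segment has been selected.
            if self.current is not None:
                self.playing = not self.playing
        else:
            # Any segment marker starts playback of that segment.
            self.current = marker_id
            self.playing = True
        return (self.current, self.playing)
```

Scanning a segment marker, then the play marker twice, would start, pause, and resume the same segment; scanning a different segment marker at any time switches playback to that segment.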

Referring now to FIG. 8, there is shown a graphical representation of the multimedia document page setup dialog, or page setup dialog interface 800, of the PDDI 122. When the user selects the page setup button 728 as described above, the page setup dialog interface 800 is displayed, and the user can select formatting options. In the paper field 802, the user can select the paper size (e.g., Letter) and the paper source (e.g., auto select) for the multimedia print job. In the orientation field 804, the user can specify whether the document is oriented in portrait or landscape format. In the preferences field 806, the user can set the title, the text font type and size (e.g., Helvetica, size 22), the media type (e.g., video), and the position of the markers on the page (e.g., above the waveform), and can decide whether to print the waveform, a centered title, the timeline barcodes and their frequency, and the time labels. Each of the page setup options shown in the page setup dialog interface 800 can also be incorporated into the main PDDI, such as the PDDI shown in FIG. 7. The page setup options are not limited to those shown in FIG. 8; in other embodiments, a variety of different page setup options are provided to the user.

Referring now to FIG. 9, there is shown a graphical representation of the PDDI 122 generating a two-page summary of an audio file. The PDDI 122 is similar to that shown in FIG. 7, but the user has chosen, in the timeline number selection field 722, to include three timelines printed on each page of the multimedia document. In addition, the user has chosen, in the “fit on” field 720, to print the multimedia document over two pages. In the content selection field, the user has selected four segments of the audio content for which markers are displayed. The selected segments are segment 902, segment 904, segment 906, and segment 908.

The multimedia document shown in the preview field 712 of FIG. 9 displays three timelines on a page and indicates that two pages have been generated. A time stamp 910 at the beginning of each horizontal timeline shows the start time of that timeline. The timelines shown on the top page in the preview field 712 correspond to half of the audio waveform timeline 734 shown in the content selection field 714. More specifically, the multimedia document displays the upper half of the audio waveform, divided into three separate timelines. The markers corresponding to the selected segments 902 and 904 are displayed on the page shown in the preview field 712. The markers corresponding to the selected segments 906 and 908 are displayed on the second page, whose content is not visible in the preview field 712.

The document in the preview field 712 of FIG. 9 additionally includes timeline markers 912 near the beginning and end of each of the three timelines displayed on the page. These timeline markers give the user additional intermediate interface points in the printed document through which the user can access the multimedia content. The timeline markers 912 indicate the positions in the audio content that correspond to the beginning or end of each printed timeline, and the user can access those positions by selecting a marker (i.e., scanning the barcode, as described above), which causes the audio content to begin playing at that position in the audio file. Although the timeline markers 912 in FIG. 9 are displayed below the timelines, they can also be displayed above or near the timelines. These timeline markers 912 are also displayed in the printed document, providing another interface through which the user can access the multimedia content at defined positions.

FIG. 10 shows a graphical representation of the PDDI 122 in which the timeline is divided into two parts. The PDDI 122 is similar to that shown in FIG. 7, but the user has chosen, in the “fit on” field 720, to restrict the output to one page. The timeline number selection field 722 specifies two timelines per page. Accordingly, the audio waveform timeline 734 shown in the content selection field 714 is divided in two, and the halves are displayed in the multimedia document as two horizontal timelines. The user has again selected segments on the audio waveform timeline 734, and the markers corresponding to those segments are displayed in the multimedia document. Specifically, the user has selected segments 1002, 1004, 1006, and 1008.

Referring now to FIG. 11, there is shown a graphical representation of the PDDI 122 in which the timeline is divided into two vertical parts and a segmentation type and threshold level have been applied. In this example, the user has chosen, in the timeline number selection field 722, to include two timelines printed per page of the multimedia document. In addition, the user has chosen, in the “fit on” field 720, to print the multimedia document over two pages. The user has also chosen, in the orientation field 724, to display vertical timelines in the multimedia document. Therefore, the audio waveform timeline 734 shown in the content selection field 714 is divided in half, and the upper half is shown on the page displayed in the preview field 712. The lower half is shown on the second page, whose content is not shown in the preview field.

In the example of FIG. 11, instead of manually selecting segments of the audio waveform timeline 734 with the selector 736, the user has applied a segmentation type to the audio data. In the segmentation type field 716, the user has chosen to perform audio detection of applause events in the audio data. The system 200 searches for all applause events in the audio data. However, the user has also chosen, in the threshold field 718, to apply a threshold of 75%. Therefore, only audio events whose probability of being an applause event is greater than 75% are displayed in the PDDI 122. The applause events are displayed in the segmentation display field 1102. Each event segment 1104 shown in the segmentation display field 1102 corresponds to an event whose probability of being an applause event is greater than 75%.

The event segments 1104 are shown as staggered boxes in FIG. 11. However, lines extending across the segmentation display field 1102 or other visual indicators can also be used. The user can click on any one of the event segments 1104 to detect that event segment 1104. Markers 1120 (i.e., barcodes, RFID tags, URLs, or other indications of the location from which the multimedia data can be retrieved) corresponding to the applause event segments 1104 are shown in the multimedia document displayed in the preview field 712. In this example, a time stamp 1122 also accompanies each marker 1120. The user can click the arrow 750 positioned near each event segment 1104 to play the audio content that may contain applause. In this way, the user can check the indicated event segments 1104 before printing the document to make sure that they really correspond to applause events. Further, the user can select a marker in the printed document corresponding to an applause event to play the applause content.

In addition to the audio detection event example described in FIG. 11, there are many other segmentation types that can be applied to audio content or other types of multimedia content. Each of these segmentation types can be displayed in a menu in the segmentation type field 716, and the user can select from the menu which segmentation type to apply. A summary of examples of the various segmentation types that can be applied follows. Speaker segmentation is one example, in which each segment corresponding to a different speaker is shown in a different color or with a different icon. Segments spoken by the same speaker are shown in the same color or with the same icon. Speaker recognition is another example, in which each speaker's name is accompanied by a confidence that the speaker was detected correctly. The PDDI 122 includes a series of check boxes that allow the user to select which speakers to display. The user can also apply audio source localization, in which the direction from which audio was detected is displayed as a sector of a circle. Each sector is accompanied by a confidence that the audio was detected correctly. The user interface includes a series of check boxes arranged around the circumference of a prototype circle that allow the user to select which directions to display. Speech recognition is another example of a segmentation type, in which the timeline displays the text and, optionally, a confidence value for each word or sentence spoken in the audio content.

Video
FIG. 12 shows a graphical representation of the PDDI 122 for generating a video paper document. As with the PDDI 122 for generating an audio document, the user can enter information into fields of the PDDI 122 to generate a video document. The PDDI 122 shown in the embodiment of FIG. 12 includes several fields found in a standard printer dialog box, such as a printer field 704, a print range field 706, and a copy and adjust field 708. However, the PDDI 122 also displays fields that are not found in a standard printer dialog box, such as an advanced options field 710, a preview field 712, and a content selection field 714.

In the embodiment of FIG. 12, the advanced options field 710 provides the user with options specific to the layout and formatting of multimedia content. In this embodiment, the user selects, in the segmentation type field 1202, the segmentation type that the user wishes to apply to the video content. The segmentation types for generating a video document include at least the segmentation types already described above in connection with generating an audio document in FIG. 7. In this example, the user has not selected any segmentation type in the segmentation type field 1202, so the segmentation type field 1202 shows “none”. Thus, in this example, the user manually selects the start and end times of segments of a given video file by moving the selector 1222 in the content selection field 714 and clicking on the portions of the video timeline display that the user wishes to select.

In the advanced options field 710, the user can define or adjust a threshold on the confidence values associated with a particular segmentation, as described above. The user sets the threshold by typing it into the threshold field 1204. For example, the user can set the threshold to 75%, in which case only frames that exceed this threshold (i.e., frames whose probability of containing a face in the face detection analysis is greater than 75%) are displayed. In other embodiments, a threshold slider is included in the PDDI 122, and the user can move the slider along a threshold bar that runs from 0% to 100% to select a particular threshold within that range. Further, the buttons shown in the embodiment of FIG. 12, including an update button 726, a page setup button 728, an OK button 730, and a cancel button 732, function in a manner similar to the corresponding buttons described in connection with FIG. 7.

In the embodiment of FIG. 12, the content selection field 714 shows text and video frames in a timeline, the frames having been extracted at regular intervals throughout some defined video content. For example, the system saves a video frame of a CNN news segment every second, and the video timeline displays some or all of the saved frames. The extracted frames are displayed starting at the top of the timeline with the frame at time “00:00:00” in the CNN news segment, and frames continue to be displayed along the timeline until the end at time “00:12:19”. In this example, the top of the video timeline shows “00:00:00”, that is, the start time of the video content displayed in the timeline. The bottom of the video timeline shows “00:12:19”, that is, the end time of the video content. In some embodiments, the video frames can be displayed in reverse order along the timeline.
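
The timeline labels in this example follow an hours:minutes:seconds pattern, with one saved frame per second. A minimal sketch of how such sample times and labels could be produced is shown below; the function names `format_timestamp` and `sample_times` are assumptions made for illustration and do not come from the described system.

```python
def format_timestamp(total_seconds):
    """Format a non-negative number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def sample_times(duration_s, interval_s=1):
    """Offsets, in seconds, at which a frame is saved (one per interval)."""
    return list(range(0, duration_s + 1, interval_s))


# The news segment in the example runs from 00:00:00 to 00:12:19.
times = sample_times(12 * 60 + 19)
labels = [format_timestamp(t) for t in times]
```

A timeline that shows only some of the saved frames could simply pass a larger `interval_s`.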

There are also three columns 1250, 1252, and 1254 displayed in the content selection field 714. One column 1250 displays text information, and the other two columns 1252 and 1254 display video frames. The video frames shown in the two columns in FIG. 12 are displayed side by side. For example, the first selected frame is displayed at the upper left of the timeline, and the second selected frame is displayed next to the first frame. The third selected frame is displayed below the first frame, and the fourth selected frame is displayed below the second frame. The video frame display continues along the timeline in this pattern. In other embodiments, the video frames can be displayed in a different pattern, or can be displayed in one column or in two or more columns along the timeline. A copy of the text is also displayed along the timeline in FIG. 12, from top to bottom, generally near the corresponding video frames. In other embodiments, the text is displayed in two or more columns or on the other side of the video frames, or is not displayed in the timeline at all.
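
The side-by-side pattern described here (first frame upper left, second frame next to it, third frame below the first, and so on) amounts to filling a grid row by row. The following sketch illustrates that mapping; `grid_position` is a hypothetical helper, and zero-based indices are an assumption of this sketch.

```python
def grid_position(frame_index, columns=2):
    """Map a zero-based frame index to a (row, column) cell, filling rows left to right."""
    if frame_index < 0 or columns < 1:
        raise ValueError("frame_index must be >= 0 and columns must be >= 1")
    return divmod(frame_index, columns)


# First four selected frames in the two-column layout of the example.
positions = [grid_position(i) for i in range(4)]
```

Passing `columns=1` or a larger column count covers the alternative layouts mentioned for other embodiments.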

The user can slide the selector 1222 along the video timeline to select particular segments of video content, which will be displayed in the generated multimedia document. In one embodiment, once the selector 1222 is positioned at the segment of video content that the user wishes to select, the user can click the selector 1222 to select the segment 1226. The video timeline can also be displayed in various alternative ways, for example as a horizontal timeline, as two or more timelines side by side, or with a different video frame display. As noted above, while the video timeline in the embodiment of FIG. 12 displays both video frames and the corresponding text, in some embodiments the video timeline can display only the video frames, without the associated text. In those embodiments where the timeline displays only video frames, the generated multimedia representation can still include both text and video frames, or can be limited to video frames only.

In the example shown in FIG. 12, the user has selected four regions of the video timeline to be marked in the multimedia document. The user has selected segment 1226, segment 1228, segment 1230, and segment 1232. Each of these selected segments is displayed in the multimedia document as one or more video frames and the corresponding text. The example document of FIG. 12 shows one page that includes five video frames 1206 and associated text 1216. Each displayed video frame 1206 and its associated text correspond to a segment selected in the content selection field 714. For example, the first two video frames 1206, starting at the upper left of the multimedia document, correspond to the selected segment 1226. In the example of FIG. 12, the video frame 1206 shown in the upper left corner of the multimedia document corresponds to the segment 1228 selected in the video timeline. The video frame 1206 in the upper right corner of the document corresponds to the selected segment 1230, and the video frame in the lower right corner corresponds to the selected segment 1232.

Further, the position in the video timeline of each displayed video frame is shown above that video frame as a time marker 1240. In FIG. 12, each time marker 1240 corresponds to a segment within the full “00:00:00” to “00:12:19” length of the video content displayed in the video timeline. For example, the video frame 1206 in the upper left corner of the multimedia document for the CNN news segment has a time marker of “00:04:21”. Thus, the video content corresponding to this video frame 1206 starts 4 minutes 21 seconds into the CNN news segment. Further, the text 1216 corresponding to this video frame 1206 displays the copy of the text associated with the video frame 1206, which starts 4 minutes 21 seconds into the CNN news segment.
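
The relationship between a printed time marker and a playback offset can be made concrete with a short sketch. `parse_time_marker` is a hypothetical helper introduced for illustration; it assumes markers always have the three-part hours:minutes:seconds form used in the example.

```python
def parse_time_marker(marker):
    """Convert an 'HH:MM:SS' time marker into an offset in seconds."""
    parts = marker.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {marker!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"time marker out of range: {marker!r}")
    return hours * 3600 + minutes * 60 + seconds


# The marker above the upper-left frame in the example document.
offset_s = parse_time_marker("00:04:21")  # 4 minutes 21 seconds = 261 seconds
```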

The user can also play the video content in various ways. For example, the user can click the play arrow 1224 next to each selected segment in the video timeline to cause that segment to begin playing. In the embodiment of FIG. 12, each video frame 1206 displayed in the multimedia document has a corresponding marker or identifier 1208 (e.g., a barcode) positioned below the video frame 1206. The user can also select any of these identifiers 1208 (e.g., by scanning the barcode) to play the video content, which causes the selected video segment to play, as described above in connection with audio segments.

When the user selects an identifier 1208, the corresponding video content starts playing at the time shown in the corresponding time marker 1240. In the embodiment of FIG. 12, the dialog associated with a video frame 1206 begins at the start of the associated copy of the text 1216. For example, if the user scans the barcode shown below the video frame in the upper left corner of the multimedia document shown in the preview field 712 of FIG. 12, the video clip of the CNN news segment plays, beginning 4 minutes 21 seconds into the news show.

The multimedia document shown in the embodiment of FIG. 12 further shows markers or identifiers for controlling the display of the video content. FIG. 12 shows a play marker 1210, a fast forward (FF) marker 1212, and a rewind marker 1214. The user can select the play marker in the printed document (e.g., by scanning the barcode with a mobile phone or other device), and the play marker also functions as a pause button. If the user has selected any identifier 1208 on the printed page and the corresponding video content is playing on some type of display device (not shown), such as a mobile phone, the user can pause playback by selecting the play marker 1210. The user can resume playing the video content by selecting the play marker 1210 in the printed document again, or the user can select another identifier 1208 on the page to play the corresponding video content. Furthermore, once the user has selected any identifier 1208 on the printed page, the user can fast forward or rewind through the video clip by selecting the fast forward marker 1212 or the rewind marker 1214, respectively.
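
The interaction between the identifiers and the control markers can be read as a small state machine. The sketch below is one possible interpretation under stated assumptions: the class name `PrintedPagePlayer`, the three state names, and the fixed skip step `skip_s` are all hypothetical, since the text does not say how far fast forward or rewind moves.

```python
class PrintedPagePlayer:
    """Playback state driven by markers scanned from a printed page (illustrative only)."""

    def __init__(self, duration_s, skip_s=10):
        self.duration_s = duration_s  # total length of the video content, in seconds
        self.skip_s = skip_s          # assumed step for fast forward and rewind
        self.position_s = 0
        self.state = "stopped"        # one of "stopped", "playing", "paused"

    def select_identifier(self, start_s):
        """Scanning a frame's identifier starts playback at that frame's time marker."""
        self.position_s = start_s
        self.state = "playing"

    def select_play_marker(self):
        """The play marker toggles between playing and paused."""
        if self.state == "playing":
            self.state = "paused"
        elif self.state == "paused":
            self.state = "playing"

    def select_fast_forward(self):
        if self.state != "stopped":
            self.position_s = min(self.duration_s, self.position_s + self.skip_s)

    def select_rewind(self):
        if self.state != "stopped":
            self.position_s = max(0, self.position_s - self.skip_s)
```

In this sketch the control markers do nothing until an identifier has been selected, which mirrors the text's condition that an identifier 1208 has already been chosen.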

In the example of FIG. 12, the multimedia document shown in the preview field 712 also includes a header, which can contain information about the video content (e.g., the title and date of the video content). For example, the header in FIG. 12 identifies the video content as “CNN News” and indicates that the news segment was played on “September 19, 2001”.

The preview field 712 shown in the embodiment of FIG. 12 further includes a preview content field 1220. The preview content field 1220 indicates whether the user is previewing the paper version of the multimedia document or the video content associated with the video frames 1206 displayed in the multimedia document. By selecting the “Video” radio button, the user can preview the selected video content associated with a video frame in a video player embedded in the PDDI 122.

In the embodiment of FIG. 12, the multimedia document is shown in the preview field 712 according to one particular layout. However, the document can be arranged in a variety of different forms. For example, the document can omit the header, the time markers 1240 can be displayed below the video frames 1206, the identifiers 1208 can be displayed above the video frames, and so on.

FIG. 12 also shows a preview window 1280 displayed over the video timeline in the content selection field 714. In some embodiments, the preview window 1280 appears when the user moves the selector 1222 along the video timeline. The preview window 1302 displays an image of the video frame at which the selector 1222 is positioned. The preview window 1280 can appear directly over the selected segment or, alternatively, next to, around, or below the segment.

Referring now to FIG. 13, a graphical representation of the PDDI 122 of FIG. 12 is shown in which the user previews a video clip. The user can select the play arrow 1224 positioned near each selected segment along the video timeline in the content selection field 714 to cause that clip to play. The system can also be configured so that, when a play arrow 1224 is selected, the entire video content represented by the video timeline begins to play. When video plays in the preview field 712, the corresponding segment along the timeline (e.g., segment 1226) is highlighted, and the play arrow 1224 next to that segment changes to a form (e.g., double lines) indicating that the segment is being played. The system can also be configured so that the user can start playback of a video frame by clicking or double-clicking a particular video frame in the multimedia document in the preview field 712, or by clicking the frame in the video timeline. Further, in some embodiments, if the user right-clicks a segment (e.g., 1224) in the video timeline, a dialog box appears that offers the user an option to play the video, starting at the beginning of the segment. The user can select the play option in the dialog box, and the video starts playing in the preview field 712.

When the user selects a particular video segment for preview, the media player embedded in the PDDI 122 starts playing that video segment in the preview field 712 from the start of the segment. For example, in FIG. 13 the video segment can begin playing 4 minutes 21 seconds into the news segment, which corresponds to the start of the selected clip running from “00:04:21” to “00:06:35”. As described above, the video content can also start playing from “00:00:00” in the video timeline rather than at a particular clip. The system can also be configured so that the media player does not begin playing the video clip until the user selects the play button 1304. In that case, when a video segment is selected for preview, the media player appears with the slider 1308 at the start of the segment, and the user must actually click the play button 1304 for the content to begin playing.

The media player in the preview field 712 also includes the features of many standard multimedia players (e.g., Microsoft Windows® Media Player), such as a pause button 1310 for stopping or pausing the display of the video clip, a rewind button 1312 for rewinding within the video content, a fast forward button 1314 for fast forwarding within the video content, and a volume adjuster 1306 for setting the playback volume. A slider 1308 is also included, which allows the user to move around within the video content. The slider bar 1316 along which the slider 1308 moves can correspond to the length of the entire video content displayed along the timeline, or only to the length of the clip. The user can drag or click the slider 1308 along the slider bar 1316 to move around within the video content. The fast forward button 1314 and the rewind button 1312 can allow the user to move only within the selected segment or, alternatively, within all of the video content associated with the video timeline. The media player can omit any of the control buttons shown in FIG. 13, and it can include other buttons for controlling the display of the video.

FIG. 13 also shows a preview window 1280 displayed over the video timeline in the content selection field 714, similar to the preview window shown in FIG. 12. This allows the user to preview the video content in the content selection field 714.

Referring now to FIG. 14, a graphical representation of the PDDI 122 is shown in which a video clip is displayed in the preview field 712. FIG. 14 illustrates the creation of segments using the start marker button 1402 and the end marker button 1404 included in the media player in this embodiment. The media player can be configured so that the start of the slider bar 1316 corresponds either to the start of the video content associated with the video timeline (e.g., at “00:00:00”) or to the start of the selected clip. While the video content plays, the user can use the start marker button 1402 and the end marker button 1404 to mark segments of interest. For example, if the user is interested in the video content in a news segment that shows a particular actor, the user can play the video content in the media player. When the user reaches a segment showing the actor, the user can click the start marker button 1402 to mark that position. When the segment ends, the user can click the end marker button 1404. The user can continue in this way through the entire news segment, marking segments of interest to be printed or used for other purposes.
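
The start and end marker buttons described above can be modeled as a recorder that pairs each start position with the next end position. This is a minimal sketch under assumptions: the `SegmentMarker` class and its method names are hypothetical, and the rule that an end marker without a preceding start marker is an error is a design choice of the sketch, not something stated in the text.

```python
class SegmentMarker:
    """Collects (start, end) segments marked during playback (illustrative only)."""

    def __init__(self):
        self.segments = []           # completed segments, in seconds
        self._pending_start = None   # position of the most recent unmatched start marker

    def mark_start(self, position_s):
        """Record the current playback position as the start of a segment."""
        self._pending_start = position_s

    def mark_end(self, position_s):
        """Close the pending segment at the current playback position."""
        if self._pending_start is None:
            raise ValueError("no start marker has been set")
        if position_s <= self._pending_start:
            raise ValueError("end marker must come after the start marker")
        self.segments.append((self._pending_start, position_s))
        self._pending_start = None
```

For example, marking a start at 261 seconds and an end at 395 seconds records the single segment `(261, 395)`, and the user can repeat the pair of clicks for each further segment of interest.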

FIG. 15 shows a graphical representation of the PDDI 122 of FIG. 12 in which the user has used a mouse or other pointing device to right-click on the video timeline shown in the content selection field 714. When the user right-clicks a video segment in the video timeline, a dialog box 1502 appears, providing the user with options regarding the video frame. In the dialog box 1502, the user can choose to play the video corresponding to the segment or, if the video is currently playing, to pause it. The user can also choose to edit the segment. The edit option is explained in detail below. The dialog box 1502 appears over the selected segment in the video timeline or anywhere near the video timeline. Besides the options shown in FIG. 15, the dialog box 1502 can include other control options, such as a rewind option or a fast forward option.

Referring now to FIG. 16, a graphical representation of the PDDI 122 of FIG. 15 is shown in which the user has selected the edit option in the dialog box 1502. An edit dialog box 1602 appears and allows the user to make further selections. In the edit dialog box 1602, the user can modify the start time or end time of a segment by changing the start time field 1604 and the end time field 1606. In this way, the user can choose to have the segment include some time before or after the original segment. For example, the user may be interested in seeing the video content that occurs in the CNN news segment 45 seconds before the segment's predetermined start time, because it may contain some introduction or lead-in of interest to the user. Similarly, the user may be interested in seeing the video content that occurs a few seconds after the segment's predetermined end time. The user can also modify the start and end times to shorten the segment and remove extraneous content that is of no interest. After modifying the start time, the end time, or both, the user can select the OK button to apply the modification, or can select the cancel button to abandon the task and dismiss the edit dialog box 1602.
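
Adjusting a segment's boundaries, as the edit dialog allows, can be sketched as shifting the start and end and keeping the result inside the video. The function `edit_segment`, its parameter names, and the clamping behavior are assumptions of this sketch; the example segment boundaries (261 s to 395 s in a 739 s video) are illustrative values.

```python
def edit_segment(start_s, end_s, duration_s, start_delta_s=0, end_delta_s=0):
    """Shift a segment's start and end, clamping the result to [0, duration_s]."""
    new_start = max(0, start_s + start_delta_s)
    new_end = min(duration_s, end_s + end_delta_s)
    if new_start >= new_end:
        raise ValueError("edited segment would be empty")
    return new_start, new_end


# Include 45 seconds of lead-in and 5 seconds after the original end.
widened = edit_segment(261, 395, 739, start_delta_s=-45, end_delta_s=5)
# Trim 10 seconds from each side to drop extraneous content.
shortened = edit_segment(261, 395, 739, start_delta_s=10, end_delta_s=-10)
```

Negative start deltas and positive end deltas lengthen the segment; the opposite signs shorten it.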

In the embodiment of FIG. 17, instead of manually selecting segments of the video timeline with the selector 1222, the user has applied a segmentation type to the video data. The user has selected, in the segmentation type field 1202, to perform face detection, in which the system searches the video content for images of faces. When face detection is selected, the PDDI 122 shows the segments along the timeline that contain face images. Each segment can be assigned a confidence value and an integer representing the number of faces detected in the clip. The user has also chosen to apply an 80% threshold in the threshold field 1204. Thus, only video frames whose probability of containing a face image is greater than 80% are displayed in the PDDI 122. The face detection results are displayed in the segmentation display field 1702. Each event segment 1704 shown in the segmentation display field 1702 corresponds to a single video frame or a set of video frames whose probability of containing a face image is greater than 80%.
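
Since an event segment here can cover either one frame or a set of frames, one way to picture it is as a run of consecutive frames that all exceed the threshold. The sketch below groups per-frame scores into such runs; `group_face_frames` and the sample scores are hypothetical, and grouping by consecutive frames is an assumption of this sketch rather than the method described in the text.

```python
def group_face_frames(frame_confidences, threshold):
    """Group consecutive frames whose face confidence exceeds threshold.

    frame_confidences[i] is the probability that sampled frame i contains a face.
    Returns a list of (first_frame, last_frame) index pairs, both inclusive.
    """
    segments = []
    run_start = None
    for index, confidence in enumerate(frame_confidences):
        if confidence > threshold:
            if run_start is None:
                run_start = index          # a new run begins here
        elif run_start is not None:
            segments.append((run_start, index - 1))
            run_start = None
    if run_start is not None:              # close a run that reaches the last frame
        segments.append((run_start, len(frame_confidences) - 1))
    return segments


# Made-up per-frame scores; 0.80 itself is not "greater than 80%".
scores = [0.10, 0.85, 0.90, 0.50, 0.95, 0.80]
face_segments = group_face_frames(scores, 0.80)
```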

The event segments 1704 are shown as staggered boxes in FIG. 17. However, they can also be shown as lines extending across the segmentation display field 1702, or as other visual indicators. The user can right-click any one of the event segments 1704 to detect that event segment 1704. A marker (e.g., a barcode) corresponding to an event segment 1704 can be shown in the multimedia document displayed in the preview field 712. In addition, when the user moves the selector 1222 along the video timeline, the preview window 1302 appears, giving the user the opportunity to view the video frames in an event segment 1704 and confirm that a face image is present. Furthermore, rather than displaying a separate video frame for each instance of a particular face image, the system can merge the video content showing the same face into a single video frame.

Besides the face detection example of FIG. 17, many other segmentation types can be applied to video content or to other kinds of multimedia content. Each of these segmentation types can be listed in a menu in the segmentation type field 1202, and the user can select from the menu which segmentation type to apply. Examples of the different segmentation types are outlined below. Video event detection is a segmentation type in which the PDDI 122 shows the results of applying a video event detection algorithm along the timeline. Examples of video events include a person standing up during a meeting or a person entering a room. Color histogram analysis is another segmentation type, in which the PDDI 122 shows the results of applying a color histogram analysis algorithm along the timeline. For example, the PDDI 122 can show a hue diagram at 30-second intervals, which allows an experienced user to quickly locate the portions of a video that contain a sunset. In addition, clustering can be applied to face images so that multiple instances of the same face are merged into a single representative face image.

Face recognition is another segmentation type, in which the PDDI 122 shows names along the timeline that result from applying face recognition to the video frames at the corresponding points on the timeline. A series of check boxes is also provided that lets the user select clips by choosing names. Optical character recognition (OCR) is a segmentation type in which OCR is performed on the frames of the video content, with the frames subsampled (i.e., one out of every 30 frames). The results are displayed along the timeline. A text entry dialog box is also provided that lets the user enter words to be searched for in the OCR results. Clips that contain the entered text are shown along the timeline. In addition, clustering can be applied so that similar OCR results from individual frames are merged. Clusters that contain the entered text are displayed along the timeline.
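
The subsampling and text search described for OCR can be illustrated with a short Python sketch. Everything named here is an assumption for the example: the functions `subsample_indices`, `run_ocr` and `search_ocr` are hypothetical, and the OCR engine is passed in as a plain callable because the description does not name one. The stand-in data simply treats each frame as its own caption.

```python
from typing import Callable, Dict, List, Sequence


def subsample_indices(frame_count: int, step: int = 30) -> List[int]:
    """Indices of the frames to process: one frame out of every `step`."""
    return list(range(0, frame_count, step))


def run_ocr(frames: Sequence[object],
            ocr: Callable[[object], str],
            step: int = 30) -> Dict[int, str]:
    """Apply the caller-supplied `ocr` function to the subsampled frames."""
    return {i: ocr(frames[i]) for i in subsample_indices(len(frames), step)}


def search_ocr(results: Dict[int, str], query: str) -> List[int]:
    """Frame indices whose recognized text contains `query` (case-insensitive)."""
    needle = query.casefold()
    return sorted(i for i, text in results.items() if needle in text.casefold())


# Stand-in data: each "frame" is already a caption string, and the stand-in
# OCR engine simply returns it. A real engine would take pixel data.
frames = ["" for _ in range(90)]
frames[0] = "Breaking News"
frames[30] = "Weather Report"
frames[60] = "news at nine"
results = run_ocr(frames, ocr=lambda frame: frame)
print(search_ocr(results, "news"))  # [0, 60]
```

The returned frame indices are what a timeline display would need in order to mark the clips containing the entered text.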

In addition to the segmentation types above, other examples can be applied. Motion analysis is another segmentation type, in which the PDDI 122 shows the results of applying a motion analysis algorithm along the timeline. The results can be shown, for example, as a waveform whose magnitude indicates the amount of motion detected. This allows an experienced user to quickly locate, for example, the portions of a video that contain a person running across the camera's field of view. Distance estimation is another segmentation type, in which the PDDI 122 shows the results of applying a distance estimation algorithm along the timeline. For example, in a surveillance application that uses two cameras a known distance apart, the distance of each point from the camera can be estimated. The user can set a threshold for selecting the portions of a given video file to print based on their distance from the camera. For example, the user may wish to see only objects that are 50 yards or more from the camera. Foreground/background segmentation can also be applied, in which the PDDI 122 shows the results of applying a foreground/background segmentation algorithm along the timeline. At each point, the foreground objects are displayed. Clustering and merging algorithms can be applied across groups of adjacent frames to reduce the number of individual objects displayed. The user can set a threshold for selecting portions of a given video file based on the confidence value of the foreground/background segmentation as well as of the merging algorithm. Scene segmentation is another type the user can apply, in which the PDDI 122 shows the results of applying a shot segmentation algorithm along the timeline. Each segment can be accompanied by a confidence value indicating that the segmentation is correct.
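
One way to turn a per-sample measurement, such as a motion magnitude or an estimated distance, into printable portions of a video is to group consecutive samples that meet a threshold. The Python sketch below shows that grouping under stated assumptions: the function name `runs_at_or_above` and the sample values are hypothetical, and the description does not prescribe this particular algorithm.

```python
from typing import List, Sequence, Tuple


def runs_at_or_above(values: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive samples that meet the threshold into (first, last) index pairs."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(values):
        if value >= threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:          # close a run that reaches the last sample
        runs.append((start, len(values) - 1))
    return runs


# Made-up per-sample motion magnitudes, for illustration only.
motion = [0.1, 0.2, 0.9, 0.8, 0.1, 0.0, 0.7, 0.9]
print(runs_at_or_above(motion, threshold=0.5))  # [(2, 3), (6, 7)]
```

The same function could be applied to a list of estimated distances with a threshold of 50 to find the stretches in which an object is 50 yards or more from the camera.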

Segmentation types for recognizing automobiles or license plates can also be applied. Automobile recognition is useful, for example, to a user who operates a surveillance camera that produces many hours of very tedious video. Such a user often needs to find and print only the portions that contain a particular object, such as a red Cadillac. For this purpose, each frame of the video is input to an automobile recognition technique, and the results are displayed along the timeline. License plate recognition is likewise useful to a user who operates a surveillance camera and needs to search the surveillance video for the portions that contain a particular license plate number. For this purpose, each frame of the video is input to a license plate recognition technique, and the results (plate number, prefecture, plate color, name and address of the plate owner, outstanding arrest warrants, criminal history of the plate owner, and so on) are displayed along the timeline. With either automobile or license plate recognition, the user can set a threshold for selecting the portions of a given video file to print based on the confidence values that accompany the recognition results. A text entry dialog box is also provided that lets the user enter the make, model, color and year of an automobile, or the number, prefecture and year of a license plate. These text entries are searched for within the recognition results. Clips that contain the entered information are displayed along the timeline.

Referring now to FIG. 18, there is shown a graphical representation of the PDDI 122 that includes video content from multiple sources displayed on one or more timelines. For example, the video content can come from two different CNN news segments, or from a CNN news segment and a CSPAN news segment. The system can print the video frames from one news segment on one page of the multimedia document and the video frames from the other news segment on another page. FIG. 18 shows two separate content selection fields 714a and 714b, each with its own video timeline displaying extracted video frames and the corresponding text. Each video timeline includes event segments 1704, which represent frames in which a face image was detected with a probability of 80% or more, and a selector 1222a or 1222b for making selections on that video timeline. Each video timeline also has its own segmentation display field 1102 showing the segments 1704 that result from applying face detection to the video source. The user can therefore move the selectors 1222a and 1222b independently around their timelines, viewing the preview window 1302, until the user decides which video frames to select for display in the multimedia document. While FIG. 18 shows two video timelines, the user can also compare more sources and accordingly generate more timelines in the PDDI 122.

The user can apply different segmentation types to video content with the PDDI. For example, the user may choose to apply both audio detection and speaker recognition to a single 20-minute CNN news show. FIG. 19 shows the PDDI 122 of FIG. 17, in which face detection was applied. FIG. 19, however, shows the result of applying both face detection and video OCR. In some embodiments, the system includes a drop-down menu in the segmentation type field. The menu can list each segmentation type individually. In this embodiment, therefore, the user can click on two or more segmentation types in the menu to have all of the selected segmentation types applied.

In other embodiments, the menu can also include a number of combination options, which let the user select a single menu item that comprises two or more segmentation types. For example, audio detection combined with speaker recognition can be one combination item in the menu. By selecting this option, the user causes both audio detection and speaker recognition to be performed on the multimedia content. Such combination menu items can be preset in the properties of the printer 102 as a default list of segmentation types and segmentation combination types. In addition, the user can define his or her own combination types. When the user creates a user-defined segmentation type, the user can give it a name, and this option then appears in the segmentation type drop-down menu. The segmentation type in FIG. 19 is named “Combo1”, a user-defined combination of individual segmentation types. Furthermore, the threshold field 1204 is disabled, because a combination of two or more segmentation techniques can produce a very large number of adjustable parameters. Each combination of techniques can therefore have a default set of parameter values that has been shown to work well. The user can nevertheless modify these defaults in a dialog box (not shown) that appears when the option button 1906 is clicked.
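
A menu that offers both individual segmentation types and named combinations with default parameter values could be modeled roughly as follows. This is a sketch under assumptions, not the described implementation: the `SegmentationMenu` class, its method names and the parameter name `face_confidence` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SegmentationMenu:
    """Holds single segmentation types plus named combinations of them."""

    def __init__(self, single_types: List[str]) -> None:
        self._single = list(single_types)
        # combination name -> (member types, default parameter values)
        self._combos: Dict[str, Tuple[List[str], Dict[str, float]]] = {}

    def define_combo(self, name: str, members: List[str],
                     defaults: Optional[Dict[str, float]] = None) -> None:
        """Register a named combination of at least two known segmentation types."""
        unknown = [m for m in members if m not in self._single]
        if unknown:
            raise ValueError(f"unknown segmentation types: {unknown}")
        if len(members) < 2:
            raise ValueError("a combination needs at least two segmentation types")
        self._combos[name] = (list(members), dict(defaults or {}))

    def items(self) -> List[str]:
        """Menu entries: the single types followed by the combinations."""
        return self._single + list(self._combos)

    def resolve(self, item: str) -> List[str]:
        """Return the segmentation types that selecting `item` would apply."""
        if item in self._combos:
            return list(self._combos[item][0])
        if item in self._single:
            return [item]
        raise KeyError(item)

    def defaults(self, item: str) -> Dict[str, float]:
        """Default parameter values stored for a combination (empty for single types)."""
        return dict(self._combos[item][1]) if item in self._combos else {}


menu = SegmentationMenu(["face detection", "video OCR", "motion analysis"])
menu.define_combo("Combo1", ["face detection", "video OCR"],
                  defaults={"face_confidence": 0.80})  # parameter name is hypothetical
print(menu.items())
print(menu.resolve("Combo1"))  # ['face detection', 'video OCR']
```

Storing the defaults with the combination, rather than exposing one threshold control, reflects the point that a combination can have too many adjustable parameters for a single field.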

As shown in FIG. 19, the content selection field 714 includes two segmentation display fields 1102, one for each segmentation type applied to the video content. In this example, there is one segmentation display field 1102 for each of the two segmentation types that make up “Combo1”. The segmentation display field 1102 on the left shows the results of applying the face detection shown in FIG. 17. However, the event segments 1704 are not staggered as in FIG. 17; instead, they are lined up in a single line. The segmentation display field 1102 on the right shows the results of applying video OCR to the video content. The event segments 1704 in this segmentation display field 1102 are displayed differently from those in the segmentation display field 1102 on the left. In some embodiments, however, the event segments 1704 in different segmentation display fields 1102 can be displayed in the same way. In some embodiments, the event segments 1704 are arranged in different formats in the two segmentation display fields 1102, or the PDDI 122 includes only a single segmentation display field 1102 that shows the event segments 1704 for all of the applied segmentation types.

Besides the example of FIG. 19 (e.g., Combo1), many other combinations of segmentation types can be implemented. Each of these combinations can be listed in the menu of the segmentation type field 1202, and the user can select from the menu which one to apply. Examples of different combinations are outlined below, although many other combinations not described here are also possible. The user can apply motion analysis in combination with distance estimation, in which the PDDI 122 shows the results of applying a motion analysis algorithm and a distance estimation algorithm along one timeline or two separate timelines. The motion analysis timeline can include a waveform whose magnitude represents the amount of motion detected. The user can set thresholds for selecting the portions of a given video file to print based on the amount of motion detected and the distance of that motion from the camera. Scene segmentation and face detection is another combination the user can apply, in which the PDDI 122 shows the results of applying a shot segmentation algorithm along the timeline. A color or a particular icon, for example, can indicate the segments on the timeline that contain face images. Each segment can be accompanied by a confidence value indicating that the scene segmentation is correct, and by an integer representing the number of faces detected together with its confidence value. Scene segmentation and OCR is another combination that can be applied, in which the PDDI 122 shows the results of the shot segmentation algorithm along the timeline. OCR is also performed on the frames of the video content, with the content subsampled. The results are displayed along the same timeline or a different timeline. The user can also run a text search on the OCR results, and the segments that contain the search terms are displayed along the timeline.

When applying a combination of segmentation types to multimedia content, the user is not limited to combining only two types. The user can apply three or more segmentation types; such combinations can be shown in the segmentation type menu by default, or the user can create them. Scene segmentation, OCR and face recognition can be applied in combination, in which the PDDI 122 shows the results of applying a shot segmentation algorithm along the timeline. OCR is performed on the frames of the video, with subsampling, and the results are displayed along the same timeline or a different timeline. The names that result from applying face recognition to the video frames are likewise shown on the same timeline or a different timeline. A series of check boxes is also provided that lets the user select clips by choosing names. The user can set thresholds on the results, which lets the user select the portions of a given video file to print based on the confidence values that accompany the shot segmentation, OCR and face recognition results. Alternatively, the user can apply face detection with OCR and scene segmentation. The PDDI 122 displays the OCR and scene segmentation results as described above. The same timeline or a different timeline can also include the segments that contain face images. Each segment can be accompanied by an integer representing the number of faces detected in the clip, together with a confidence value.

Automobile recognition and motion analysis is another alternative combination of segmentation types, in which each frame of the video is input to an automobile recognition technique and the results are displayed along the timeline. A motion analysis technique is also applied to the video to estimate the speed of the automobile from one frame to the next. A text entry dialog box also lets the user enter the make, model, color and year of an automobile, or the number, prefecture and year of a license plate. These items are searched for within the automobile recognition and motion analysis results, and the clips that contain the entered information are displayed along the timeline.

While FIG. 19 shows an example of two or more segmentation types (e.g., Combo1) applied to video content, two or more segmentation types can also be applied to audio content or to any other type of multimedia content. Different combinations of segmentation types that can be applied are outlined below, although many other combinations not described here can also be implemented. Audio event detection and classification is one example of a combination. The PDDI 122 shows, along the timeline, the results of applying audio event detection, for events such as applause, cheering or laughter. Each detected event is accompanied by a confidence that it was detected correctly. Speaker segmentation and speaker recognition is another example. Each segment is shown along the timeline in a different color or with a different icon, and segments produced by the same speaker are shown in the same color or with the same icon. The speaker recognition results include text and, optionally, a confidence value for each speaker name. More than one speaker name can be assigned to each segment. Sound source localization and audio event detection can alternatively be applied by the user. The direction from which a sound was detected is displayed as a sector of a circle. Each sector is accompanied by a confidence that the sound was detected correctly. The user interface includes a series of check boxes arranged around the circumference of a prototype circle that let the user select which directions to display. Each detected audio event is accompanied by a confidence that it was detected correctly, and the PDDI 122 includes a series of check boxes that let the user select which events to display. Alternatively, the user can apply speech recognition and profile analysis in combination. The timeline in the PDDI 122 shows text and, optionally, a confidence value for each word or sentence spoken. The speech recognition results are matched against a pre-existing text-based profile that represents the user's interests. The user can adjust a threshold on the confidence values, and can also adjust a threshold on the degree of match between the profile and the speech recognition results. Speech recognition and audio event detection is another example of a combination that can be applied. The timeline includes text and, optionally, a confidence value for each word or sentence spoken, together with the results of the audio event detection.
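
The two thresholds mentioned for speech recognition with profile analysis, one on word confidence and one on the degree of match with the user's profile, can be illustrated as follows. The scoring rule used here (the fraction of profile terms found among confidently recognized words) is an assumption chosen for simplicity, since the description does not define how the degree of match is computed; the function names and sample data are likewise hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def confident_words(recognized: Iterable[Tuple[str, float]],
                    min_confidence: float) -> Set[str]:
    """Lower-cased words whose recognition confidence meets the threshold."""
    return {word.casefold() for word, conf in recognized if conf >= min_confidence}


def profile_match(recognized: List[Tuple[str, float]],
                  profile_terms: Set[str],
                  min_confidence: float) -> float:
    """Fraction of profile terms found among the confidently recognized words.

    This overlap ratio is an assumed scoring rule chosen for illustration.
    """
    if not profile_terms:
        return 0.0
    words = confident_words(recognized, min_confidence)
    hits = {term.casefold() for term in profile_terms} & words
    return len(hits) / len(profile_terms)


# Made-up recognizer output: (word, confidence) pairs.
segment = [("budget", 0.92), ("vote", 0.85), ("senate", 0.40), ("weather", 0.95)]
profile = {"budget", "vote", "senate", "election"}
score = profile_match(segment, profile, min_confidence=0.5)
print(score)           # 0.5
print(score >= 0.5)    # True: the segment passes a match threshold of 0.5
```

Raising `min_confidence` discards uncertain words before matching, while the comparison against the match threshold decides whether the segment is shown at all.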

When applying a combination of segmentation types to multimedia content, the user is not limited to combining only two types. The user can apply three or more segmentation types, and such combinations can be shown in the segmentation type menu by default or can be created by the user. Speech recognition, audio event detection and speaker recognition can be applied in combination. The speech recognition results include text and, optionally, a confidence value for each word or sentence. The detected audio events are shown on the same timeline or a different timeline. The PDDI 122 also displays the name of each detected speaker, accompanied by a confidence that it was detected correctly. The user interface includes a series of check boxes that let the user select which speakers to display. A combination of speech recognition, audio event detection and speaker segmentation can be applied as an alternative. It works as described above, except that speaker segmentation events are shown instead of speaker recognition events. Each speaker segment is shown in a different color or with a different icon, and segments produced by the same speaker are shown in the same color or with the same icon. As another example, speaker recognition, audio event detection and sound localization can be applied in combination. The timeline shows text and, optionally, a confidence value for each word or sentence, together with the detected audio events. The timeline also displays the direction from which a sound was detected as a sector of a circle. Each sector is accompanied by a confidence that the sound was detected correctly. The user interface includes a series of check boxes arranged around the circumference of a prototype circle that let the user select which directions to display.

Referring now to FIG. 20, there is shown a multimedia representation (e.g., a video paper document), which is another embodiment of a multimedia document that the system can generate. This document 2000 shows eight video frames 1206, some of which are accompanied by text 1216. The text, which can be produced with the PDDI of FIG. 19, can be a transcript of the dialog, a summary of the video content, or the like. In this embodiment, dividers 2004 separate the video frames 1206, and each divider 2004 contains a timestamp 2006 giving the start time and end time of the corresponding segment of video content. A header 2002 showing information about the video content is also included. In this example, the header 2002 shows the title (CNN News), the time of the news show (e.g., 10:00 am), the date of the show (e.g., September 19, 2001) and the duration of the show (e.g., 12 minutes 19 seconds).
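
Start and end timestamps like those carried by the timestamp 2006 can be produced from segment times with a few lines of Python. The HH:MM:SS layout, the function names and the sample values below are assumptions for illustration; the description does not specify the printed format.

```python
def format_timestamp(total_seconds: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    if total_seconds < 0:
        raise ValueError("time must be non-negative")
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def divider_label(start_seconds: int, end_seconds: int) -> str:
    """Label giving the start and end time of one segment."""
    return f"{format_timestamp(start_seconds)} - {format_timestamp(end_seconds)}"


print(format_timestamp(12 * 60 + 19))  # 00:12:19
print(divider_label(75, 180))          # 00:01:15 - 00:03:00
```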

An identifier 1208 is shown below each video frame 1206, and the user can use any one of these identifiers 1208 to start playback of the video content associated with that video frame 1206. The video content can begin playing at the point where the speaker starts to speak the associated text 1216. Video frames 1206 that are shown without text, or with the words “no text”, may contain video content in which the person in the clip is not speaking, or may represent cases in which the user chose not to show text.

The multimedia document shown in the embodiment of FIG. 20 also includes control markers, or identifiers, for controlling the display of the video content. FIG. 20 shows a play marker 1210, a fast-forward marker 1212 and a rewind marker 1214. As described above, these markers provide an interface to the multimedia data.

Although the present invention has been described with reference to certain preferred embodiments, those skilled in the art will recognize that various modifications are possible. Variations upon and modifications to the preferred embodiments are provided for by the present invention, which is limited only by the appended claims.

FIG. 1 is a block diagram of a system for generating a print driver dialog interface that allows a user to format multimedia data before a multimedia representation is generated.
FIG. 2 is a block diagram of an exemplary architecture for one embodiment of the system of FIG. 1.
FIG. 3 is a diagram illustrating an example of interactive communication with a printer.
FIG. 4 is a graphical representation of an exemplary multimedia rendering application with a print selection button embedded in the application.
FIG. 5 is a flow diagram of a method for system control by a user before any multimedia conversion is performed.
FIG. 6 is a flow diagram of a method for system control when the system performs a default conversion of the multimedia and displays the results in a graphical user interface.
FIG. 7 is a graphical representation of an exemplary graphical user interface showing generation of a representation that includes a user-selected range of an audio file.
FIG. 8 is a graphical representation of a graphical interface that provides exemplary options for formatting a multimedia representation.
FIG. 9 is a graphical representation of an exemplary graphical user interface showing generation of a two-page summary of an audio file.
FIG. 10 is a graphical representation of an exemplary graphical user interface showing division of an audio file timeline into two parts.
FIG. 11 is a graphical representation of an exemplary graphical user interface showing division of an audio file timeline into two vertical parts.
FIG. 12 is a graphical representation of an exemplary graphical user interface showing generation of a representation that includes a user-selected range of a video file.
FIG. 13 is a graphical representation of an exemplary graphical user interface showing a preview field for previewing multimedia content.
FIG. 14 is a graphical representation of an exemplary graphical user interface showing the use of segment marker buttons in a preview field.
FIG. 15 is a graphical representation of an exemplary graphical user interface showing a dialog box that provides options to a user.
FIG. 16 is a graphical representation of an exemplary graphical user interface showing a dialog box for editing a video segment.
FIG. 17 is a graphical representation of an exemplary graphical user interface showing generation of a video representation to which face detection has been applied.
FIG. 18 is a graphical representation of an exemplary graphical user interface showing generation of a video representation from multiple sources.
FIG. 19 is a graphical representation of an exemplary graphical user interface showing generation of a video representation to which a user-defined combination of analysis techniques has been applied.
FIG. 20 is a graphical representation of an exemplary multimedia representation that includes video frames and text.

Explanation of symbols

100 System
102 Printer
103 Conventional printer
104 Media analysis software module
106 Processing logic
108 Digital media output
112 Document format specification (DFS)
120 Multimedia document
122 PDDI
124 Preview field
126 Speaker name field
128 Subject field
130 Time field
132 Frame
134 User-selectable identifier
140 Connection
142 Thumbnail picture
144 Digital media
200 System
202 Multimedia storage
204 Multimedia rendering application (MRA)
206 Application plug-in
208 Printer driver software module
210 Communication monitoring module or user interface listener module (UI Listener)
212 Application server
214 Processor
230 Personal computer (PC)
240 Bus
242 Bus
244 Bus
246 Multimedia data and signal line
248 Bus
251 Bus
260 Digital output
264 Multimedia file storage
302 User
304 Print request
306 Request notification
308 Print job
310 Information
312 Application server
314 Response sent to printer
316 Information request
318 Information request
320 Dialog box is displayed
322 User response to application server
324 Response to printer 102
402 Print button
702 File name field
704 Printer field
706 Print range field
708 Copy and adjustment field
710 Advanced options field
712 Preview field
714 Content selection field
714a Content selection field
714b Content selection field
716 Segmentation type field
718 Threshold field
720 Fit-on field
722 Timeline number selection field
724 Orientation field
726 Update button
728 Page setup button
730 OK button
732 Cancel button
734 Audio waveform timeline
736 Selector
740 Segment
742 Segment
744 Segment
750 Audio waveform timeline
760 Play identifier or play marker
766 Marker or identifier
800 Multimedia document page or dialog interface
802 Paper field
804 Orientation field
806 Preferences field
902 Segment
904 Segment
906 Segment
908 Segment
910 Timestamp
912 Timeline marker
1002 Segment
1004 Segment
1006 Segment
1008 Segment
1120 Marker
1122 Timestamp
1202 Segmentation type field
1204 Threshold field
1206 Video frame
1208 Marker or identifier
1210 Play marker
1212 Fast-forward (FF) marker
1214 Rewind marker
1216 Text
1220 Preview content field
1222 Selector
1222a Selector
1222b Selector
1224 Play arrow
1226 Segment
1228 Segment
1230 Segment
1232 Segment
1240 Time marker
1250 Column
1252 Column
1254 Column
1280 Preview window
1302 Preview window
1304 Play button
1306 Volume adjuster
1308 Slider
1310 Pause button
1312 Rewind button
1314 Fast-forward button
1316 Slider bar
1402 Start marker button
1404 End marker button
1502 Dialog box
1602 Edit dialog box
1604 Start time field
1606 End time field
1702 Segmentation display field
1704 Event segment
1906 Options button
2000 Document
2002 Header
2004 Divider
2006 Timestamp

Claims

A system that enables interaction with media data analysis and media representation generation, the system comprising:
a user interface that enables a user to control media content analysis and media representation generation; and
a media analysis software module for analyzing features of the media content, the media analysis software module being communicatively coupled to the user interface to receive media content analysis instructions.

The system of claim 1, wherein the media analysis software module further comprises content recognition software for recognizing features in media content.

The system of claim 1, further comprising processing logic for controlling display of a user interface.

The system of claim 1, further comprising processing logic for controlling the generation of a media representation.

The system of claim 1, further comprising hardware for writing a media representation in a digital format.

The system of claim 5, further comprising a storage medium for storing a media representation written in a digital format.

The system of claim 1, wherein the media representation is generated in a paper format.

The system of claim 7, wherein the paper format includes at least one user-selectable identifier that allows a user to access and control media content.

The system of claim 8, wherein the at least one user-selectable identifier further comprises at least one barcode printed on the media representation.

The system of claim 8, wherein the at least one user-selectable identifier further comprises at least one play identifier that can be selected to play the corresponding media content.

The system of claim 1, further comprising a data structure for representing the conversion of media content.

The system of claim 1, further comprising a communication monitoring module for monitoring communication between components of the system, wherein the communication monitoring module forwards requests for information, and responses to those requests, between the system components.

The system of claim 1, wherein the user interface further comprises a selection menu that allows a user to select a feature analysis to be performed on the media content.

The system of claim 1, wherein the user interface further comprises a field for setting a threshold for a confidence value corresponding to a result of the media content analysis.

The system of claim 1, wherein the user interface further comprises at least one field for managing and modifying the display of media information for media representations.

The system of claim 1, wherein the user interface further comprises a preview field for previewing active media frames in home-based media content.

The system of claim 1, wherein the user interface further comprises a preview field for previewing a generated media representation.

The system of claim 1, wherein the user interface comprises at least one content selection field for selecting, from at least one source, a segment of media content to be displayed in a media representation.

The system of claim 18, wherein the content selection field further comprises a selector that a user can slide along the content selection field to select a segment to be displayed in a media representation.

The system of claim 18, wherein the content selection field further comprises a graphical display of media content that allows a user to display the media content and select a segment of the media content.

The system of claim 20, wherein the graphical display of the media content further comprises an audio waveform timeline that displays audio content.

The system of claim 20, wherein the graphical display of the media content further comprises a video timeline that displays video frames extracted from the video content.

The system of claim 20, wherein the graphical display of the media content further comprises a video timeline that displays text extracted from the video content.

The system of claim 18, wherein the content selection field further comprises a field for displaying results of the media content analysis, the results being displayed as segments defined along a timeline.

The system of claim 1, further comprising an output device driver module for facilitating the media content analysis and the media representation generation, the output device driver module being communicatively coupled to the user interface to receive user instructions.

The system of claim 25, further comprising an extended output device for generating the media representation, the extended output device being communicatively coupled to the media analysis software module to receive converted media data and communicatively coupled to the output device driver module to receive instructions for media representation generation.

A method that enables interaction with media data analysis and media representation generation, the method comprising:
interacting with an interface to control the media data analysis and the media representation generation;
analyzing features of the media data for the media representation generation;
facilitating the media data analysis; and
facilitating the media representation generation by receiving and transmitting instructions relating to media representation parameters.

The method of claim 27, further comprising generating a media representation.

The method of claim 27, wherein analyzing the features of the media data further comprises performing speech recognition on the media data.

The method of claim 27, wherein analyzing the features of the media data further comprises performing optical character recognition on the media data.

The method of claim 27, wherein analyzing the features of the media data further comprises performing face recognition on the media data.

The method of claim 27, wherein analyzing the features of the media data further comprises performing speaker detection on the media data.

The method of claim 27, wherein analyzing the features of the media data further comprises performing face detection on the media data.

The method of claim 27, wherein analyzing the features of the media data further comprises performing event detection on the media data.

The method of claim 27, further comprising adding a printing function to a media rendering application for performing media representation.

The method of claim 27, further comprising storing the media content on a storage medium accessible to the extended output device.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises using a user interface that displays media content formatting options to a user.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises selecting an analysis technique to be applied to the media content, the analysis technique recognizing defined features in the media content.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises selecting a threshold value to be applied to a confidence level corresponding to a defined feature recognized in the media content.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises previewing the media representation in a preview field that displays the media representation as it is generated.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises selecting an update field, after modifying content in the user interface, to update the preview field.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises selecting a segment of the media content in a field of the user interface by using a selector along a timeline that displays the media content.

The method of claim 27, wherein interacting with the interface to control the media data analysis and the media representation generation further comprises selecting a play option in the user interface to play the media content.

The method of claim 27, further comprising selecting a print option in a media rendering application, whereupon the user interface appears and the user selects parameters for media content conversion.

The method of claim 27, further comprising selecting a print option in a media rendering application, whereupon a user interface is displayed in which a default media content conversion has been performed and the media representation is shown in a preview field of the user interface.

The method of claim 27, wherein generating the printable multimedia representation further comprises printing the media representation in a paper-based format.

The method of claim 47, further comprising selecting a user-selectable identifier in the paper-based format to play the corresponding media content.
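
By way of non-limiting illustration only, the confidence threshold recited above (the threshold field of the user interface and the threshold selection step of the method) might be applied to analysis results roughly as in the following Python sketch. The `Detection` record, the `select_segments` function, the 0.0 to 1.0 confidence scale, and the sample values are all assumptions made for this illustration; none of them is taken from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One analysis result on the media timeline (hypothetical structure)."""
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    feature: str       # e.g. "face" or "speech"
    confidence: float  # assumed scale: 0.0 (lowest) to 1.0 (highest)


def select_segments(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep detections whose confidence meets the threshold, ordered by start time."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.start)


if __name__ == "__main__":
    # Sample values invented for the illustration.
    results = [
        Detection(30.0, 42.5, "face", 0.91),
        Detection(5.0, 9.0, "face", 0.40),
        Detection(12.0, 20.0, "speech", 0.75),
    ]
    for seg in select_segments(results, threshold=0.7):
        print(f"{seg.start:6.1f}-{seg.end:6.1f}s  {seg.feature}  ({seg.confidence:.2f})")
```

In such a sketch, raising the threshold would leave fewer, higher-confidence segments along the timeline, and lowering it would admit more.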