JP2002518723A

JP2002518723A - Creating animation from video

Info

Publication number: JP2002518723A
Application number: JP2000554125A
Authority: JP
Inventors: チェン，シェンチャン・エリック; ターン，ウェイ−ツウ・ヘレン; ブラント，ジョナサン
Original assignee: プレゼンター．コム
Priority date: 1998-06-11
Filing date: 1999-06-09
Publication date: 2002-06-25
Also published as: CN1305620A; AU4558899A; WO1999065224A3; EP1097568A2; WO1999065224A2; HK1038625A1

Abstract

(57) Abstract: An apparatus and method for creating and storing animation and for linking animation to video. A sequence of video images (10) is examined to identify a first transformation of a scene depicted in the sequence of video images. A first image and a second image are obtained from the sequence of video images, the first image representing the scene before the first transformation and the second image representing the scene after the first transformation. For storing the animation (12), a set of keyframes created from a video is stored in an animation object (30). One or more values (33, 35) indicating a first sequence of keyframes selected from the set of keyframes are stored in the animation object (30), together with information for interpolating between the keyframes of the first sequence. One or more values indicating a second sequence of keyframes selected from the set of keyframes are also stored in the animation object, together with information for interpolating between the keyframes of the second sequence. The number of keyframes in the second sequence is smaller than the number of keyframes in the first sequence. For linking video and animation, a data structure is generated that includes elements corresponding to respective frames of a first video. Information indicating an animation created from a second video is stored in one or more elements of the data structure.

Description

DETAILED DESCRIPTION OF THE INVENTION

FIELD OF THE INVENTION

[0001] The present invention relates to the field of image animation and, more particularly, to automatically creating animation from video.

BACKGROUND OF THE INVENTION

[0002] The Internet has become an increasingly popular medium for delivering full-motion video to end users. Because of bandwidth limitations, however, most users cannot download and view high-quality video on demand. For example, delivering compressed video with a resolution of 640 x 480 pixels at 30 frames per second requires image data to be transmitted at about 8 Mbs (megabits per second), roughly 300 times the 28.8 Kbs (kilobits per second) modem speed available to most Internet users today. Even with industry-standard compression techniques (e.g., MPEG, Moving Picture Expert Group), video on the Internet today is often more like a low-quality slide show than a television experience.
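As a quick check of the figures in the paragraph above, the following sketch divides the stated video bit rate by the stated modem speed. Both numbers are taken from the text; the variable names are illustrative only.

```python
# Figures quoted in the background paragraph.
video_bits_per_second = 8_000_000   # about 8 Mbs for compressed 640 x 480 video at 30 frames/s
modem_bits_per_second = 28_800      # 28.8 Kbs modem

# How many times more bandwidth the video needs than the modem provides.
ratio = video_bits_per_second / modem_bits_per_second
print(f"required / available bandwidth: about {ratio:.0f}x")
```

The quotient is a little under 280, which the text rounds to "about 300 times".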

[0003] Animation that uses keyframes and interpolation to create video effects potentially requires far less bandwidth than transmitting video. With the improved performance of personal computers, television-quality video effects can be synthesized in real time from a relatively small number of keyframes that can be received over a low-bandwidth modem. An animation sequence that requires a keyframe to be transmitted only every few seconds can be delivered with enormous bandwidth savings compared to video while still providing very good image quality.

[0004] In addition to requiring less bandwidth, animation is more scalable than video in both playback image quality and frame rate. Because the video effects are synthesized on the fly during playback, the frame rate and image quality can be adjusted dynamically based on several factors, such as the speed of the playback processor, the network bandwidth, and user preferences.

[0005] Adding features for user interaction and other types of editing is also much easier with animation than with video. For example, adjusting a camera panning path or the speed of an object's motion requires changing only the motion parameters associated with a few keyframes of the animation. Editing a video clip to achieve the same effect requires modifying hundreds of frames. Similarly, attaching a hot spot that tracks a moving object over time is far easier to achieve in an animation than in a video.

[0006] Animation has its drawbacks. Because skilled human animators have traditionally been required to create high-quality animation, the animation process is often costly. Moreover, because human animators often sketch keyframes by hand, animation tends to look cartoonish and usually lacks the lifelike imagery needed to depict real-world scenes. In some cases animation is created from primitive two-dimensional and three-dimensional objects, such as building blocks. This type of animation also tends to look artificial rather than natural and is usually limited to presenting graphical information.

SUMMARY OF THE INVENTION

[0007] A method and apparatus for creating an animation are disclosed. A sequence of video images is examined to identify a first transformation of a scene depicted in the sequence of video images. A first image and a second image are obtained from the sequence of video images. The first image represents the scene before the first transformation, and the second image represents the scene after the first transformation. Information is generated that indicates the first transformation and that can be used to interpolate between the first image and the second image to produce a video effect that approximates display of the sequence of video images.

[0008] A method and apparatus for storing an animation are also disclosed. A set of keyframes created from a video is stored in an animation object. One or more values indicating a first sequence of keyframes selected from the set of keyframes are stored in the animation object together with information for interpolating between the keyframes of the first sequence. One or more values indicating a second sequence of keyframes selected from the set of keyframes are stored in the animation object together with information for interpolating between the keyframes of the second sequence. The number of keyframes in the second sequence is smaller than the number of keyframes in the first sequence.
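The storage scheme in the paragraph above can be pictured with a minimal sketch: one shared set of keyframes, plus two sequences that refer to keyframes by index and carry their own interpolation information. The class names, field names, and the placeholder contents of the interpolation entries are assumptions made for illustration; the text does not prescribe any particular layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeyframeSequence:
    # Indices into the shared keyframe set (the "one or more values").
    keyframe_indices: List[int]
    # One hypothetical interpolation record per consecutive pair of keyframes.
    interpolation_info: List[Dict[str, int]]


@dataclass
class AnimationObject:
    # The set of keyframes created from the video (strings stand in for image data).
    keyframes: List[str]
    sequences: List[KeyframeSequence] = field(default_factory=list)


def make_sequence(indices: List[int]) -> KeyframeSequence:
    """Build a sequence with a placeholder interpolation record for each adjacent pair."""
    info = [{"from": a, "to": b} for a, b in zip(indices, indices[1:])]
    return KeyframeSequence(list(indices), info)


anim = AnimationObject(keyframes=[f"keyframe-{i}" for i in range(5)])
anim.sequences.append(make_sequence([0, 1, 2, 3, 4]))  # first sequence: every keyframe
anim.sequences.append(make_sequence([0, 2, 4]))        # second sequence: fewer keyframes
```

Because both sequences index into the same keyframe set, the second sequence costs only a few extra indices and interpolation records rather than a second copy of the images.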

[0009] A method and apparatus for linking a video and an animation are also disclosed. A data structure is generated that includes elements corresponding to respective frames of a first video, and information indicating an image of an animation created from a second video is stored in one or more elements of the data structure.

[0010] Other features and advantages of the invention will be apparent from the accompanying drawings and from the detailed description that follows.

[0011] The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like references indicate similar elements.

DETAILED DESCRIPTION

[0012] According to embodiments described herein, a video is analyzed to automatically create an animation that includes keyframes and information for interpolating between the keyframes. The keyframes and interpolation information can be used to synthesize images on the fly during animation playback. When displayed, the synthesized images produce a video effect that approximates the original video. Because video effects such as image motion and color changes can usually be represented with far less information in an animation than in a video, an animation consumes very little bandwidth when transmitted over a communication network such as the Internet. For example, using the methods and apparatus described herein, a video containing hundreds of frames of images can be used to create an animation containing only a few keyframes and information for interpolating between them. When the animation is received in a playback system, such as a desktop computer with animation playback capability, the playback system can use the keyframes and interpolation information supplied in the animation to synthesize and display images as the animation is received. It is an intended advantage of the embodiments disclosed herein to create an animation automatically from a video, so that the animation is more compact than the video and can be received and displayed simultaneously even by a playback system that lacks the bandwidth to receive and display the video simultaneously. It is a further intended advantage of the embodiments disclosed herein to cross-link an animation and a video so that a user can switch between viewing the animation and viewing the video during playback. It is a further intended advantage of the embodiments disclosed herein to provide an animation having selectable temporal and spatial resolutions, and to provide a server system that selects an animation having temporal and spatial resolutions suited to the characteristics of a playback system and delivers it to that playback system.
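To make the idea of synthesizing images on the fly from two keyframes concrete, the sketch below uses the simplest possible stand-in for interpolation: a linear cross-fade between two small grayscale images held as nested lists. This is an assumption chosen for brevity; the interpolation information described in the text can also describe motion and other changes, which this sketch does not attempt.

```python
from typing import Iterator, List

Image = List[List[float]]  # rows of pixel intensities


def blend(frame_a: Image, frame_b: Image, t: float) -> Image:
    """Linear cross-fade: t = 0 returns frame_a, t = 1 returns frame_b."""
    return [
        [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def synthesize(key_a: Image, key_b: Image, n_frames: int) -> Iterator[Image]:
    """Yield n_frames images evenly spaced from key_a to key_b, endpoints included."""
    if n_frames < 2:
        raise ValueError("need at least two frames to span both keyframes")
    for i in range(n_frames):
        yield blend(key_a, key_b, i / (n_frames - 1))


# Two 1 x 2 keyframes expanded into five displayed frames.
frames = list(synthesize([[0.0, 100.0]], [[100.0, 0.0]], 5))
```

Only the two keyframes need to be transmitted; the three intermediate frames are computed by the playback side.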

[0013] These and other intended advantages are described below.

Terminology

[0014] As used herein, the term “video” refers to a sequence of images captured by a camera at a predetermined rate, or generated by an image generator for playback at a predetermined rate. Each image in the sequence is contained in a frame of the video, and the real-world subject matter represented in the images is called a scene. Video data is stored so that, for each frame, there is data representing the image of that frame. This data may be in compressed form or may be an uncompressed bitmap. Although any capture rate can be used in theory, the capture rate is usually fast enough (e.g., ten frames per second or more) to capture human-perceptible motion in the scene.

[0015] Video may be provided from any source including, but not limited to, film, NTSC video (National Television Standard Code), or any other live or recorded video format. Video may be displayed on a number of different types of displays, including cathode ray tube (CRT) displays, liquid crystal displays, plasma displays, and so forth.

[0016] As used herein, the term “animation” refers to a data construct that includes keyframes and information for interpolating between the keyframes. A keyframe is an image that describes, or can be used to describe, an incremental transformation of a scene. In one embodiment, a new keyframe is provided for each incremental transformation of the scene, and the criteria that determine what constitutes an incremental transformation can be adjusted according to system requirements and user preferences. The more sensitive the criteria (i.e., the smaller the scene transformation that qualifies), the more keyframes are present in the animation.

[0017] According to one embodiment, two types of keyframes are present in an animation: background frames and object frames. A background frame is a keyframe that results from background motion or a color change. Background motion is usually caused by a change in the disposition of the camera used to record the scene. Typical changes in camera disposition include, but are not limited to, translation, rotation, panning, tilting, or zooming of the camera. Color changes often result from changes in scene lighting (which may in turn result from a change in camera disposition, such as a change in aperture), but may also be caused by color changes over a large area of the scene.

[0018] An object frame is a keyframe that results from motion or color change of an object within the scene, and not from a change in the disposition of the camera used to record the scene. Objects in a scene that move or change color independently of camera motion are referred to herein as dynamic objects. It will be appreciated that whether a given object is a dynamic object or part of the scene background depends in part on how large the object is relative to the rest of the scene. When an object becomes large enough (e.g., because it is physically or optically close to the camera), the dynamic object effectively becomes the background of the scene.

[0019] According to embodiments disclosed herein, a sequence of background frames and information for interpolating between the background frames are stored in a data structure called a background track. Similarly, a sequence of object frames and information for interpolating between the object frames are stored in a data structure called an object track. An animation created using the methods and apparatus disclosed herein includes at least one background track and zero or more object tracks. The background track and object tracks are stored in a data structure called an animation object. An animation may be embodied either in an animation object for storage in memory, or in an animation data stream for serial transmission over a communication network or between subsystems of a device.
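The relationship between tracks and the animation object described above can be sketched as two small record types. The names `Track` and `AnimationObject`, their fields, and the use of strings and dictionaries as placeholders for image data and interpolation information are assumptions for illustration, not a layout given in the text.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Track:
    """A sequence of keyframes plus the information for interpolating between them."""
    frames: List[Any]
    interpolation: List[Any]


@dataclass
class AnimationObject:
    """At least one background track and zero or more object tracks."""
    background_tracks: List[Track]
    object_tracks: List[Track] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "at least one background track" condition stated in the text.
        if not self.background_tracks:
            raise ValueError("an animation needs at least one background track")


background = Track(frames=["bg0", "bg1"], interpolation=[{"pan": 4.0}])
anim = AnimationObject(background_tracks=[background])  # no object tracks is allowed
```

An animation with no dynamic objects is therefore a valid animation object holding a single background track.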

Creation and Distribution of Animation

[0020] FIG. 1 illustrates creation of an animation 14 and delivery of the animation 14 to a playback system 18. The animation is created by an animation production system 12 using a video source 10. After or during creation, the animation 14 is converted into an animation data stream 15 and delivered to the playback system 18 via a communication network 20. Alternatively, the animation 14 is delivered to the playback system 18 on a distributable storage medium 21 that can be read by a subsystem of the playback system 18 in order to display the animation. Examples of distributable storage media include, but are not limited to, magnetic tape, magnetic disk, compact disk read-only memory (CD-ROM), and digital video diskette (DVD). The playback system 18 may be a device designed specifically for animation playback (e.g., a DVD or cassette player), or a general-purpose computer system programmed to obtain the animation 14 (e.g., via a communication network or distributable medium) and to execute animation playback software to display the animation 14. For example, a web browsing application program may be executed on any number of different types of computers to realize an animation playback system (e.g., Apple Macintosh computers, IBM-compatible personal computers, workstations, and so forth). The program code for playing back the animation 14 may be included in the web browsing application program itself, or in an extension of the web browsing application that is loaded into the operating memory of the computer when the web browsing application determines that an animation data stream 15 is to be received.

[0021] As indicated by dotted arrow 19 and dotted transmission path 17, a server system 16 may be used to control delivery of animations to playback systems on the network 20. For example, the server system 16 may be used to give priority to animation download requests from playback systems that belong to a certain class of subscribers, or to restrict access to a menu of available animations based on a service arrangement or other criteria. As a more specific example, consider a World Wide Web site (i.e., a server computer) that is used to provide instructional animations for home improvement projects (e.g., laying tile, decorating a door, installing a ceiling fan, and so forth). The site provider may wish to make at least one animation freely available so that interested visitors can learn about the usefulness of the service. Other animations may be made available for download on a pay-per-view basis. The site provider may also sell subscriptions to the site so that subscribers have full access to download all animations in return for periodic payments. The server system 16 may be used to distinguish between download requests from these different classes of requesters and to respond accordingly.

[0022] Another use of the server system 16 is to provide the animation 14 to the playback system 18 in one of a variety of different animation formats. The particular format used may be determined based on the transmission network bandwidth and the capabilities of the playback system. For example, a given playback system 18 may require that the animation 14 be described in a particular format or language that the playback system 18 can understand (e.g., Java, Dynamic Hypertext Markup Language (D-HTML), Virtual Reality Modeling Language (VRML), Macromedia Flash format, and so forth). Also, the background and object frames of the animation 14 may need to be sent at a particular spatial and temporal resolution to avoid exceeding the bandwidth of the transmission network, which is usually limited by the download rate (e.g., modem speed) of the playback system 18. In one embodiment, to accommodate the many possible permutations of animation language and network bandwidth, the animation 14 is stored in a language-independent and bandwidth-independent format. The server system 16 can then be used to create an animation data stream dynamically according to the format and bandwidth requirements of the playback system 18. This operation of the server system is described in further detail below.
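One way to picture the bandwidth side of this selection is a server that holds several spatial and temporal resolutions of an animation and picks the richest one that fits the playback system's download rate. The sketch below is a hypothetical policy, not the server operation described later in the document; the `Variant` record, its fields, and the bit-rate figures are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    width: int                 # spatial resolution
    height: int
    keyframes_per_minute: int  # temporal resolution
    bits_per_second: int       # bandwidth needed to stream this variant


def choose_variant(variants: List[Variant], download_bps: int) -> Variant:
    """Pick the highest-rate variant that fits; fall back to the smallest one."""
    fitting = [v for v in variants if v.bits_per_second <= download_bps]
    if not fitting:
        return min(variants, key=lambda v: v.bits_per_second)
    return max(fitting, key=lambda v: v.bits_per_second)


variants = [
    Variant(160, 120, 10, 20_000),
    Variant(320, 240, 20, 50_000),
    Variant(640, 480, 30, 200_000),
]
chosen = choose_variant(variants, 28_800)  # a 28.8 Kbs modem gets the 160 x 120 variant
```

Choosing the output language (Java, D-HTML, VRML, and so on) would be a separate step that this sketch leaves out.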

[0023] Still referring to FIG. 1, the playback system 18 may obtain an animation data stream either from the communication network 20 or by reading an animation object stored on a locally accessible storage medium (e.g., DVD, CD-ROM, cassette tape, and so forth). In one embodiment, the playback system 18 is a time-based controller that includes play, pause, fast-forward, rewind, and step functions. In another embodiment, the playback system 18 can be switched between an animation playback mode and a video playback mode to display either animation or video. The playback system 18 may also include an interactive, non-time-based playback mode that allows a user to click on hot spots within the animation, pan and zoom within frames of the animation, or download still frames of the animation. Additional embodiments of the playback system are described below.

Animation Production System

[0024] FIG. 2 is a block diagram of the animation production system 12 according to one embodiment. The animation production system 12 includes a background track generator 25, an object track generator 27, and an animation object generator 29. The video source 10 is initially received in the background track generator 25, which analyzes the sequence of frames in the video source 10 to generate a background track 33. The background track 33 includes a sequence of background frames and information that can be used to interpolate between the background frames. In one embodiment, the background track generator 25 outputs the background track 33 to the object track generator 27 and to the animation object generator 29 after the background track is completed. In an alternative embodiment, the background track generator 25 outputs the background track 33 to the object track generator 27 and to the animation object generator 29 each time a new background frame in the background track 33 is completed.

[0025] As shown in FIG. 2, the object track generator 27 receives both the background track 33 from the background track generator 25 and the video source 10. The object track generator 27 generates zero or more object tracks 35 based on the background track 33 and the video source 10 and forwards the object tracks 35 to the animation object generator 29. Each object track 35 includes a sequence of object frames and information that can be used to interpolate between the object frames.

[0026] The animation object generator 29 receives the background track 33 from the background track generator 25 and the zero or more object tracks 35 from the object track generator 27, and writes the tracks into an animation object 30. As described below, the animation object 30 can be formatted to include multiple temporal and spatial resolutions of the background track and the object tracks.

[0027] FIG. 3 is a block diagram of the background track generator 25 according to one embodiment. The background track generator 25 includes a scene change estimator 41, a background frame constructor 43, a background motion estimator 45, and a background blending estimator 47.

[0028] The scene change estimator 41 compares successive frames of the video source 10 with one another to determine when the transformation of the scene in the video frames exceeds a threshold. When applied to the entire video source 10, the effect of the scene change estimator 41 is to divide the sequence of frames in the video source 10 into one or more subsequences of video frames (i.e., video segments), each of which exhibits a scene transformation that is less than the predetermined threshold. The background motion estimator 45, the background frame constructor 43, and the background blending estimator 47 process each video segment identified by the scene change estimator 41 to generate a background frame and interpolation information for the video segment. Thus, the predetermined threshold applied by the scene change estimator 41 defines the incremental transformation of a scene that results in construction of a new background frame. In one embodiment, background frames correspond approximately to the start and end of each video segment, and the background frame that corresponds to the end of one video segment corresponds to the start of the next video segment. Consequently, each video segment is delimited by background frames and, except for the first video segment (for which both a starting and an ending background frame are constructed), one background frame is constructed for each video segment in the video source 10.
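The segmentation described above can be sketched with a single scalar measure of change per pair of adjacent frames. In this simplified illustration a segment is closed at the frame where the running total of change first passes the threshold, and the next segment starts at that same frame, so adjacent segments share a boundary frame. The scalar change values, the running-total rule, and the function name are assumptions for the sketch; the estimator in the text works on richer per-frame information.

```python
from typing import List, Tuple


def split_into_segments(changes: List[float], threshold: float) -> List[Tuple[int, int]]:
    """changes[i] is the amount of scene change between frame i and frame i + 1.

    Returns (start_frame, end_frame) pairs; the end frame of one segment
    is the start frame of the next.
    """
    segments = []
    start = 0
    accumulated = 0.0
    for i, change in enumerate(changes):
        accumulated += change
        if accumulated > threshold:
            segments.append((start, i + 1))  # close the segment at frame i + 1
            start = i + 1
            accumulated = 0.0
    last_frame = len(changes)
    if start < last_frame:
        segments.append((start, last_frame))  # trailing frames form a final segment
    return segments


changes = [0.2, 0.3, 0.6, 0.1, 0.1, 0.9, 0.2]  # 8 frames, 7 adjacent pairs
segments = split_into_segments(changes, threshold=1.0)

# One background frame at the start of the first segment, then one per segment end.
background_frames = [segments[0][0]] + [end for _, end in segments]
```

With shared boundaries, the number of background frames is the number of segments plus one, which matches the statement that only the first segment needs both a starting and an ending background frame.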

[0029] FIG. 4A illustrates a video segment 54 identified by the scene change estimator 41 of FIG. 3. According to one embodiment, the scene change estimator 41 operates by determining a transformation vector for each pair of adjacent video frames in the video segment 54. As used herein, a first frame is considered to be adjacent to a second frame if the first frame immediately precedes or immediately follows the second frame in a temporal sequence of frames.

【００３０】The deformation vector for each pair of adjacent video frames is denoted in FIG. 4A by a delta (i.e., the symbol "Δ"). According to one embodiment, the deformation vector includes a plurality of scalar elements, each indicating the degree of a particular change in the scene from one video frame to the next in the video segment 54. For example, the scalar elements of the deformation vector may indicate the degree of the following changes in the scene: translation, scaling, rotation, pan, tilt, skew, color change, and elapsed time.

【００３１】According to one embodiment, the scene change estimator 41 applies a spatial low-pass filter to the video segment 54 to increase the blockiness of its images before computing the deformation deltas between adjacent frames. After low-pass filtering, each image of the video segment 54 contains less information than before filtering, so fewer computations are required to determine the deformation deltas. In one embodiment, the deformation delta computed for each pair of adjacent frames in the video segment 54 is added to the deformation deltas computed for the preceding pairs of adjacent frames to accumulate a sum of deformation deltas. In effect, the sum of the deformation deltas represents the deformation between the first video frame 54A of the video segment 54 and the most recently compared video frame of the video segment 54. In one embodiment, the sum of the deformation deltas is compared against a predetermined deformation threshold to determine whether the most recently compared video frame has exceeded the deformation threshold. Note that the deformation threshold is a vector quantity comprising multiple scalar thresholds, including thresholds for color change, translation, scaling, rotation, panning, tilt, and skew of the scene, and for elapsed time. In an alternative embodiment, the deformation threshold is adjusted dynamically to achieve a desired ratio of video segments to frames in the video source 10. In another alternative embodiment, the deformation threshold is adjusted dynamically to achieve a desired average video segment size (i.e., a desired number of video frames per video segment). In yet another alternative embodiment, the deformation threshold is adjusted dynamically to achieve a desired average elapsed time per video segment. In general, any technique for dynamically adjusting the deformation threshold may be used without departing from the spirit and scope of the present invention.
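The accumulate-and-compare procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Deformation`, `make_threshold`, and `segment_video` are hypothetical, the per-pair deltas are taken as given, and the sketch assumes that a new segment starts at the frame that caused the threshold to be exceeded.

```python
import math
from dataclasses import dataclass, fields


@dataclass
class Deformation:
    """Hypothetical deformation vector: one scalar per kind of scene change."""
    translation: float = 0.0
    scaling: float = 0.0
    rotation: float = 0.0
    pan: float = 0.0
    tilt: float = 0.0
    skew: float = 0.0
    color_change: float = 0.0
    elapsed_time: float = 0.0

    def __add__(self, other):
        # Element-wise sum, used to accumulate deltas across frame pairs.
        return Deformation(*(getattr(self, f.name) + getattr(other, f.name)
                             for f in fields(self)))

    def exceeds(self, threshold):
        # True if any scalar element exceeds its own scalar threshold.
        return any(abs(getattr(self, f.name)) > getattr(threshold, f.name)
                   for f in fields(self))


def make_threshold(**limits):
    """Build a threshold vector; elements not named are unconstrained."""
    values = {f.name: math.inf for f in fields(Deformation)}
    values.update(limits)
    return Deformation(**values)


def segment_video(deltas, threshold):
    """Split frames 0..len(deltas) into segments of (first, last) frame indices.

    deltas[k] is the deformation between frame k and frame k + 1.
    Assumption: the frame that pushes the sum over the threshold starts
    the next segment, and the accumulated sum is then cleared.
    """
    segments = []
    start = 0
    total = Deformation()
    for k, delta in enumerate(deltas):
        total = total + delta
        if total.exceeds(threshold):
            segments.append((start, k))   # frame k is the ending frame
            start = k + 1
            total = Deformation()         # clear the accumulated sum
    segments.append((start, len(deltas)))  # last frame ends the last segment
    return segments
```

For example, six frame pairs that each translate the scene by 1.0, checked against a translation threshold of 2.5, yield the segments `(0, 2)`, `(3, 5)`, and `(6, 6)`.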

【００３２】In one embodiment, if the most recently compared video frame 54C exceeds the threshold, the scene is considered to have changed, and the video frame 54B that precedes the most recently compared video frame 54C is designated the ending frame of the video segment 54. Consequently, if a predetermined deformation threshold is used, each video segment of the video source 10 is guaranteed to have an overall deformation below the deformation threshold. If a variable deformation threshold is used, the overall deformation deltas of the respective video segments may vary considerably, and the scene change estimator may need to be applied iteratively to reduce that variation.

【００３３】According to the embodiment shown in FIG. 3, the background track generator 25 invokes the background motion estimator 45, the background frame constructor 43, and the background blend estimator 47 each time a new video segment is defined (i.e., each time a new scene change is detected). In an alternative embodiment, the scene change estimator 41 is used to resolve the video completely into subsequences before any subsequence is processed by the background frame constructor 43, the background motion estimator 45, or the background blend estimator 47.

【００３４】As shown in FIG. 4A and described above, the video frames within a given video segment are successively selected and compared until the accumulated deformation deltas exceed the deformation threshold. In one embodiment, when the last frame of the video is reached, that frame is automatically considered to be the end of the video segment. In addition, the sum of the deformation deltas is cleared after each new video segment has been processed by the background frame constructor 43. In an embodiment in which the entire video is analyzed by the scene change estimator 41 before any video segment is processed, the deformation deltas associated with each video segment are recorded for later use by the background motion estimator 45 and the background frame constructor 43.

【００３５】FIG. 4B is a flowchart 57 illustrating the operation of the background motion estimator 45, the background frame constructor 43, and the background blend estimator 47 shown in FIG. 3. Beginning at block 59, the background motion estimator examines the video segment 54 indicated by the scene change estimator (i.e., the subsequence of video frames 54 bounded by BF_i and BF_{i+1} in FIG. 4A) to identify the dominant motion of the scene depicted in those frames. This dominant motion is considered to be the background motion.

【００３６】Several techniques can be used to identify the background motion in a video segment. One technique, called feature tracking, involves identifying features in the video frames (e.g., using edge detection techniques) and tracking the movement of those features from one video frame to the next. Features that exhibit motion that is statistically aberrant relative to the other features are considered to be dynamic objects and are temporarily ignored. Motion that is shared by a large number of features (or by large features) is typically caused by a change in the disposition of the camera used to record the video and is considered to be the background motion.
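As a rough illustration of the feature-tracking idea, the sketch below takes per-feature displacements between two frames, uses the per-axis median as the shared (background) motion, and flags features that deviate from it as candidate dynamic objects. The median and the fixed `tolerance` are assumptions chosen for simplicity; the text only requires that statistically aberrant motion be set aside, and the function name `estimate_background_motion` is hypothetical.

```python
from statistics import median


def estimate_background_motion(displacements, tolerance=2.0):
    """Estimate the motion shared by most tracked features.

    displacements: list of (dx, dy), one per feature tracked from one
    video frame to the next.
    Returns (background_motion, dynamic_indices), where dynamic_indices
    lists the features whose motion deviates from the shared motion by
    more than `tolerance` pixels on either axis.
    """
    bg_dx = median(dx for dx, _ in displacements)
    bg_dy = median(dy for _, dy in displacements)
    dynamic = [i for i, (dx, dy) in enumerate(displacements)
               if abs(dx - bg_dx) > tolerance or abs(dy - bg_dy) > tolerance]
    return (bg_dx, bg_dy), dynamic
```

With displacements `[(5, 0), (5, 1), (4, 0), (5, 0), (-3, 7)]`, the shared motion is `(5, 0)` and only the last feature is flagged as a candidate dynamic object.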

【００３７】Another technique for identifying background motion in a video segment is to correlate the frames of the video segment with one another based on common regions and then determine the frame-to-frame offset of those regions. The frame-to-frame offset can then be used to determine the background motion of the video segment.
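One simple way to find a frame-to-frame offset from common regions is an exhaustive search over small integer shifts, scoring each candidate by the mean absolute difference over the overlapping area. The sketch below assumes grayscale frames stored as lists of rows; the function name `estimate_shift` and the search radius are hypothetical, and practical systems would typically use a faster correlation method.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Return (dx, dy) such that curr[y + dy][x + dx] best matches prev[y][x].

    prev, curr: equally sized 2D lists of pixel intensities.
    Each candidate shift is scored by the mean absolute difference over
    the region the two frames have in common under that shift.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Visit only pixels whose shifted position stays inside curr.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(curr[y + dy][x + dx] - prev[y][x])
                    count += 1
            if count == 0:
                continue
            cost = total / count
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

The returned offset describes how the common region moved between the two frames; accumulating these offsets over a segment gives one estimate of its background motion.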

【００３８】Still other techniques contemplated for identifying the background motion of a video segment include, but are not limited to: coarse-to-fine search methods that use spatially hierarchical decompositions of the frames in the video segment; measurement of changes in video frame histogram characteristics over time to identify scene changes; filtering to accentuate features of the video segment that can be used for motion identification; optical flow measurement and analysis; pixel format conversion to alternative color representations (including grayscale) to achieve greater processing speed, greater reliability, or both; and robust estimation techniques, such as M-estimation, that eliminate elements of the video frames that do not conform to the estimated dominant motion.

【００３９】Still referring to flowchart 57 of FIG. 4B, at block 61 the background frame constructor receives the background motion information from the background motion estimator and uses it to register the frames of the video segment relative to one another. Registration refers to correlating the video frames in a manner that accounts for the changes caused by the background motion. When the video frames are registered based on the background motion information, regions of the frames that exhibit motion different from the background motion (i.e., dynamic objects) appear at a fixed location in only a small number of the registered video frames. That is, such regions move from frame to frame relative to a static background. These regions are dynamic objects. At block 63, the background frame constructor removes the dynamic objects from the video segment to produce a processed sequence of video frames. At block 65, the background frame constructor generates a background frame based on the processed sequence of video frames and the background motion information. Depending on the nature of the deformation, construction of the background frame involves either compositing two or more processed video frames into a single background image or selecting one of the processed video frames as the background frame. In one embodiment, the composite background frame is a panoramic image or a high-resolution still image. A panoramic image is created by stitching two or more processed video frames together and can be used to represent a background scene captured by panning, tilting, or translating the camera. A high-resolution still image is appropriate when the subject of the processed sequence of video frames is a relatively static background scene (i.e., when the position of the camera used to record the video source did not change significantly). One technique for creating a high-resolution still image is to analyze the processed sequence of video frames to identify sub-pixel motion between frames. Sub-pixel motion is caused by slight camera motion and can be used to create a composite image with a higher resolution than any individual frame captured by the camera. As described below, high-resolution still images are particularly useful because they can be displayed to provide detail that is not present in the video source 10. Also, when multiple high-resolution still images of the same subject are constructed, they can be combined to form a still image having regions of varying resolution. Such an image is referred to herein as a multi-resolution still image. As described below, a user can pause playback of the animation to zoom in and out on different regions of such a still image. Similarly, a user can pause playback of the animation to pan about a panoramic image. Combinations of panning and zooming are also possible. Furthermore, the animation can be cross-linked with the video source so that, during playback of the video source, the user can be prompted to pause video playback to view a high-resolution still image, a panoramic still image, or a zoomable still image. Cross-linking is described in more detail below.
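The observation that a dynamic object occupies any fixed location in only a few registered frames suggests one simple way to obtain a clean background: take the per-pixel median across the registered frames. This is a swapped-in technique offered as a sketch, not the method the text prescribes; it assumes the frames are already registered, equally sized, and stored as lists of rows, and the name `build_background` is hypothetical.

```python
from statistics import median


def build_background(registered_frames):
    """Per-pixel temporal median over frames already registered to each other.

    A moving object covers a given pixel in only a minority of the
    frames, so the median at that pixel is a background value.
    """
    height = len(registered_frames[0])
    width = len(registered_frames[0][0])
    return [[median(frame[y][x] for frame in registered_frames)
             for x in range(width)]
            for y in range(height)]
```

The median only suppresses an object at pixels it covers in fewer than half of the frames, so a slow-moving or large object would need a longer segment or a different removal step.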

【００４０】FIG. 5 shows a background image set 70 generated by the background frame constructor 43 shown in FIG. 3. Background frame BF_i refers to a background image 71 that is a processed video frame rather than a composite image. A background image of this type typically results from scaling (i.e., zooming in or out) or from an abrupt cut between successive video frames. Background frame BF_{i+1} refers to a high-resolution still image 73 composited from multiple processed video frames of essentially the same scene. As noted above, this type of image is particularly useful for providing detail that is not perceptible in the video source. Background frames BF_{i+2}, BF_{i+3}, and BF_{i+4} each refer to a different region of a panoramic background image 75. As indicated, the panoramic image frame 75 has been generated by stitching a portion 76 of one or more processed video frames onto another processed video frame. In this example, the camera may have been translated downward and to the right, or panned to the right and tilted downward, to capture incrementally more of the scene. Other shapes of composite background image result from different types of camera motion.

【００４１】Returning to the last block of flowchart 57 of FIG. 4B, at block 67 the background blend estimator (e.g., element 47 of FIG. 3) generates background blend information based on the background motion information and the newly constructed background frame. The operation of the blend estimator is described in more detail below.

【００４２】FIG. 6 is a block diagram of the object track generator 27 according to one embodiment. The object track generator 27 receives as inputs the video source 10 and the background track 33 generated by the background track generator (e.g., element 25 of FIG. 2). The object track generator 27 identifies dynamic objects in the scene based on differences between the background track 33 and the video source 10, and records object frames (OF) containing the dynamic objects, together with object motion (OM) and object blend (OB) information, in an object track 35.

【００４３】In one embodiment, the object track generator 27 includes an object frame constructor 81, an object motion estimator 83, and an object blend estimator 85. The object frame constructor 81 compares video frames of the video source 10 against the background frames of the background track 33 to construct object frames (OF). As described below, each object frame constructed by the object frame constructor 81 contains a dynamic object. In one embodiment, at least one object frame is generated per dynamic object detected in a given video segment (i.e., per dynamic object detected in a sequence of video frames identified by the scene change estimator 41 of FIG. 3). The object motion estimator 83 tracks the motion of the dynamic objects in the video segment to generate object motion information (OM), and the object blend estimator 85 generates object blend information (OB) based on the object frames generated by the object frame constructor 81 and the object motion information generated by the object motion estimator 83.

【００４４】FIGS. 7A and 7B illustrate the operation of the object track generator 27 of FIG. 6 in detail. FIG. 7A shows a video segment 54 identified by the scene change estimator 41 of FIG. 3. The video segment 54 is bounded by background frames BF_i and BF_{i+1} and contains a dynamic object 56. FIG. 7B is a flowchart 100 of the operation of the object track generator 27.

【００４５】Beginning at block 101 of flowchart 100, the object frame constructor (e.g., element 81 of FIG. 6) compares the background frame BF_i with a video frame VF_j of the video segment 54 to generate a difference frame 91. As shown in FIG. 7A, small differences between BF_i and VF_j produce somewhat random differences (noise) in the difference frame 91. However, a relatively concentrated region of differences 92 between BF_i and VF_j occurs where a dynamic object was removed from the background frame BF_i by the background frame constructor (e.g., element 43 of FIG. 3). At block 103 of flowchart 100, a spatial low-pass filter is applied to the difference frame 91 to produce a filtered difference frame 93. In the filtered difference frame 93, the random differences (i.e., high-frequency components) have disappeared, and the concentrated region of differences 92 exhibits increased blockiness. As a result, the outline of the concentrated region of differences 92 can be identified more easily. Accordingly, at block 105 of flowchart 100, the object frame constructor performs a feature search (e.g., using edge detection techniques) to identify the concentrated region of differences 92 in the filtered difference frame 93. At block 107, the object frame constructor selects, as the object frame 56, the region within the video frame VF_j that corresponds to the concentrated region of differences 92 in the filtered difference frame 93. In one embodiment, the object frame constructor selects the object frame 56 to be a rectangular region that corresponds to (e.g., has the same x, y offset as) a rectangular region of the filtered difference frame 93 enclosing the concentrated region of differences 92. Alternative object frame shapes may be used. Note that if the filtered difference frame 93 contains no concentrated region of differences, the object frame constructor selects no object frame. Conversely, if the filtered difference frame 93 contains multiple concentrated regions of differences, multiple object frames may be selected. Each concentrated region of differences in the filtered difference frame 93 is considered to correspond to a dynamic object in the subsequence of video frames 54.
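Blocks 101 through 107 can be sketched as a small pipeline: difference frame, spatial low-pass filter, region search, and rectangular crop. The sketch below makes several simplifying assumptions that are not taken from the text: a 3x3 mean filter stands in for the low-pass filter, a fixed intensity threshold stands in for the edge-detection feature search, and a single bounding rectangle is returned even if several regions are present. All function names are hypothetical.

```python
def difference_frame(background, frame):
    """Absolute per-pixel difference between a background frame and a video frame."""
    return [[abs(f - b) for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def box_filter(image):
    """3x3 mean filter (a simple spatial low-pass); edges use available neighbours."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def bounding_box(filtered, threshold):
    """Rectangle (x0, y0, x1, y1) enclosing all pixels above threshold, or None."""
    pts = [(x, y) for y, row in enumerate(filtered)
           for x, v in enumerate(row) if v > threshold]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def extract_object_frame(background, frame, threshold):
    """Return (box, pixels) cropped from the video frame, or None if no region."""
    box = bounding_box(box_filter(difference_frame(background, frame)), threshold)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    # Crop the same x, y offsets out of the video frame itself.
    return box, [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
```

Isolated noise pixels are averaged down by the filter and fall below the threshold, while a compact block of large differences survives, which mirrors the role the filtered difference frame 93 plays in the text.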

【００４６】After a dynamic object has been identified and framed in the object frame 56 by the object frame constructor, the motion of the dynamic object is determined by tracking changes in the position of the object frame 56 through the sequence of frames of the video segment 54. Accordingly, at block 109 of flowchart 100, the object motion estimator (e.g., element 83 of FIG. 6) tracks the motion of the dynamic object identified and framed by the object frame constructor from one video frame of the video segment 54 to the next. According to one embodiment, object motion is tracked by performing a feature search within each successive video frame of the video segment 54 to determine the new position of the dynamic object of interest. Using the frame-to-frame motion of the dynamic object, the motion estimator generates motion information that can be used to interpolate between successive object frames so as to approximate the motion of the dynamic object. At block 111 of flowchart 100, the object blend estimator (e.g., element 85 of FIG. 6) generates object blend information based on the object motion information and the object frames. In one embodiment, the operation of the object blend estimator is the same as that of the background blend estimator. However, alternative techniques for generating information for blending successive object frames may be used without departing from the spirit and scope of the present invention.
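To make the interpolation step concrete, the sketch below linearly interpolates an object's position between two successive object frames and returns a cross-fade weight that could drive blending. Linear motion and a linear blend weight are assumptions made for illustration; the text does not specify the form of the motion or blend information, and `interpolate_object` is a hypothetical name.

```python
def interpolate_object(p0, p1, t0, t1, t):
    """Interpolate an object's position between two object frames.

    p0, p1: (x, y) positions of the object frame at times t0 and t1.
    Returns ((x, y), alpha), where alpha in [0, 1] is the weight of the
    later object frame in a cross-fade with the earlier one.
    """
    if t0 >= t1 or not t0 <= t <= t1:
        raise ValueError("require t0 < t1 and t0 <= t <= t1")
    alpha = (t - t0) / (t1 - t0)
    x = p0[0] + alpha * (p1[0] - p0[0])
    y = p0[1] + alpha * (p1[1] - p0[1])
    return (x, y), alpha
```

A motion that is poorly approximated by such a straight-line path is exactly the case where additional object frames would be needed between the two endpoints.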

【００４７】As noted above, in one embodiment of the object track generator 27 of FIG. 6, at least one object frame is generated for each dynamic object identified within a video segment by the object frame constructor 81. If the object motion estimator 83 determines that the motion of a dynamic object in a video segment is too complex to be adequately represented by interpolating between the object frames that bound the video segment, the object motion estimator 83 indicates the need to construct one or more additional object frames for the video segment. The object frame constructor then generates the additional object frames, using the techniques described above, at the points within the video segment indicated by the object motion estimator. As described above in connection with background frame construction, an object frame may contain image data drawn from a region of a composite image. If one or more additional object frames are constructed to represent a dynamic object that undergoes complex motion, the additional frames may be organized within the animation object so that the dynamic object overlays other features of the scene during playback of the animation.

【００４８】Dynamic objects may occasionally overlap one another within a scene. According to one embodiment of the object track generator 27, when dynamic objects represented by separate object tracks overlap so that one occludes another, the object track of the occluded dynamic object is ended, and a new object track is generated if the occluded object reappears. Consequently, if dynamic objects repeatedly occlude one another, a large number of discrete object tracks may be produced. In an alternative embodiment of the object track generator, information may be associated with an object track to indicate which of two dynamic objects is to be displayed on top of the other if their screen positions converge.

【００４９】As with background images, images of dynamic objects (i.e., object images) can be composited from multiple video frames. Composite object images include, but are not limited to, panoramic object images, high-resolution still object images, and multi-resolution still object images. In general, any form of image compositing that can be used to produce a composite background image can also be used to produce a composite object image.

【００５０】FIG. 8 is a diagram of an animation object 30 according to one embodiment. The animation object 30 includes a background track 33 and a plurality of object tracks 35A, 35B, and 35C. As noted above, the number of object tracks depends on the number of dynamic objects identified in the scene depicted in the video source; if no dynamic objects are identified, the animation object 30 contains no object tracks.

【００５１】In one embodiment, the animation object 30 is implemented as a linked list 121 of a background track and object tracks. The background track is itself implemented as a linked list of a background track element BT and a sequence of background frames BF_1 through BF_N. Each object track is likewise implemented as a linked list of an object track element (OT_1, OT_2, OT_R) and a respective sequence of object frames (OF1_1 through OF1_M, OF2_1 through OF2_K, and OFR_1 through OFR_J). In one embodiment, the background track element BT and the object track elements OT_1, OT_2, and OT_R also contain the pointers that implement the animation object linked list 121. That is, the background track element BT contains a pointer to the first object track element OT_1, the first object track element OT_1 contains a pointer to the next object track element OT_2, and so on until the object track element OT_R is reached. In one embodiment, the end of the animation object linked list 121 and the ends of the individual background and object track linked lists are each indicated by a null pointer in the final element. Other techniques for indicating the end of a linked list may be used in alternative embodiments. For example, the animation object 30 may include a data structure containing a head pointer that points to the background track 33 and a tail pointer that points to the final object track 35C of the animation object linked list 121. Similarly, the background track element BT and each object track element OT_1, OT_2, and OT_R may contain a respective tail pointer indicating the end of its linked list. In yet another embodiment, a flag in an element of the linked list may be used to indicate the end of the list.
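The null-terminated arrangement of list 121 can be sketched with two small node types. This is an illustrative model only: the class names `Frame` and `Track`, the field names, and the helper functions are hypothetical, and Python's `None` plays the role of the null pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str
    next: Optional["Frame"] = None          # None marks the end of a track's frames


@dataclass
class Track:
    name: str                               # e.g. "BT", "OT1", "OT2"
    first_frame: Optional[Frame] = None
    next_track: Optional["Track"] = None    # None marks the end of list 121


def build_track(name, frame_names):
    """Create a track element followed by a singly linked list of frames."""
    track = Track(name)
    prev = None
    for frame_name in frame_names:
        frame = Frame(frame_name)
        if prev is None:
            track.first_frame = frame
        else:
            prev.next = frame
        prev = frame
    return track


def link_tracks(tracks):
    """Chain track elements: background track first, then each object track."""
    for current, following in zip(tracks, tracks[1:]):
        current.next_track = following
    return tracks[0]


def walk(head):
    """Traverse the animation object; stop at the null pointers."""
    result = []
    track = head
    while track is not None:
        names = []
        frame = track.first_frame
        while frame is not None:
            names.append(frame.name)
            frame = frame.next
        result.append((track.name, names))
        track = track.next_track
    return result
```

The head-and-tail-pointer and end-of-list-flag variants mentioned in the text would replace the `None` checks in `walk` with a comparison against the tail or a test of the flag.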

【００５２】Still referring to FIG. 8, data structure 123 is used to implement a background frame according to one embodiment. The members of the background frame data structure 123 include a next pointer (NEXT PTR) that points to the next background frame in the background track 33, a previous pointer (PREV PTR) that points to the preceding background frame in the background track 33, an image pointer (IMAGE PTR) that points to the location of the image data for the background frame, an interpolation pointer (INTERP PTR) that points to an interpolation data structure, and a timestamp (TIMESTAMP) that indicates the relative playback time of the background frame. As discussed below, the background frame data structure 123 may further include one or more members for cross-linking with frames of a video source.
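The frame structure described above can be sketched as a doubly linked list node. This is only an illustration under stated assumptions: the Python class and function names (`BackgroundFrame`, `build_track`, `iter_track`) are hypothetical, the image and interpolation members are left as opaque references, and `None` plays the role of the null pointer that ends a list.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional


@dataclass(eq=False)
class BackgroundFrame:
    """Hypothetical node mirroring the members named for data structure 123."""
    timestamp: float                           # TIMESTAMP: relative playback time
    image: Any = None                          # IMAGE PTR: reference to image data
    interp: Any = None                         # INTERP PTR: reference to interpolation data
    next: Optional["BackgroundFrame"] = None   # NEXT PTR
    prev: Optional["BackgroundFrame"] = None   # PREV PTR


def build_track(timestamps: List[float]) -> Optional[BackgroundFrame]:
    """Link frames in order; the list ends where `next` is None."""
    head = tail = None
    for ts in timestamps:
        node = BackgroundFrame(timestamp=ts)
        if tail is None:
            head = node
        else:
            tail.next, node.prev = node, tail
        tail = node
    return head


def iter_track(head: Optional[BackgroundFrame]) -> Iterator[BackgroundFrame]:
    """Walk the track forward until the null pointer is reached."""
    node = head
    while node is not None:
        yield node
        node = node.next
```

An object frame (data structure 127) could reuse the same shape, with the image and interpolation members referring to object data instead of background data.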

【００５３】Recall that the image to be displayed for a given background frame may be obtained from either a non-composite or a composite background image. Accordingly, the image pointer in the background frame data structure 123 may itself be a data structure that indicates the location of the background image in memory, the offset (for example, row and column) within the background image from which to obtain the image data for the background frame, and a pointer to the video segment used to generate the background frame. As discussed below, the pointer to the video segment is used to link the animation to the video source. In one embodiment, the pointer to the video segment is a pointer to at least the first video frame of the video segment. Other techniques for linking a background frame to a video segment can be used without departing from the spirit and scope of the present invention.

【００５４】In one embodiment, the background interpolation data structure 125 includes data for interpolating between a given background frame and its adjacent background frames. The information for interpolating between a background frame and the adjacent succeeding background frame (that is, the next background frame) includes forward background motion information (BM FORWARD) and forward background blending information (BB FORWARD). Similarly, the information for interpolating between a background frame and the adjacent preceding background frame includes reverse background motion information (BM REVERSE) and reverse background blending information (BB REVERSE). The background motion information for a given direction (forward or reverse) may itself be a data structure containing several members. In the exemplary embodiment shown in FIG. 8, the forward background motion information (BM FORWARD) includes members indicating the translation of the background scene in the X and Y directions (that is, horizontally and vertically in the image plane) needed to reach the next background frame, scale factors in the X and Y directions (indicating camera zoom in or out and changes in aspect ratio), a rotation factor, a pan factor, a tilt factor, and a skew factor. It will be appreciated that more or fewer parameters may be used in alternative embodiments. The reverse background motion information (BM REVERSE) can be indicated by a similar set of motion parameters.
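To make the role of the motion parameters concrete, the following sketch applies a subset of them (translation, scale, rotation, skew) to a single point. It rests on several assumptions that the text does not state: the pan and tilt factors are omitted, a partial motion is obtained by scaling each parameter linearly with the normalized time `t`, and the order of operations (scale, skew, rotation, translation) is chosen arbitrarily. The names `Motion`, `partial`, and `warp_point` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Motion:
    """Hypothetical subset of the BM FORWARD / BM REVERSE members."""
    tx: float = 0.0        # translation in X
    ty: float = 0.0        # translation in Y
    sx: float = 1.0        # scale factor in X
    sy: float = 1.0        # scale factor in Y
    rotation: float = 0.0  # rotation factor, in radians
    skew: float = 0.0      # skew factor (horizontal shear)


def partial(m: Motion, t: float) -> Motion:
    """Scale each parameter linearly from identity (t=0) to the full motion (t=1)."""
    return Motion(
        tx=m.tx * t,
        ty=m.ty * t,
        sx=1.0 + (m.sx - 1.0) * t,
        sy=1.0 + (m.sy - 1.0) * t,
        rotation=m.rotation * t,
        skew=m.skew * t,
    )


def warp_point(m: Motion, x: float, y: float):
    """Apply scale, then skew, then rotation, then translation to one point."""
    x, y = x * m.sx, y * m.sy
    x = x + m.skew * y
    c, s = math.cos(m.rotation), math.sin(m.rotation)
    x, y = c * x - s * y, s * x + c * y
    return (x + m.tx, y + m.ty)
```

For instance, `warp_point(partial(Motion(tx=10.0, sx=2.0), 0.5), 2.0, 0.0)` moves the point halfway through a motion that doubles the horizontal scale and shifts the scene 10 units to the right.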

【００５５】In one embodiment, each individual object frame is implemented by an object frame data structure 127 similar to the background frame data structure 123 described above. For example, the object frame data structure 127 includes a pointer to the next object frame in the object track (NEXT PTR), a pointer to the preceding object frame in the object track (PREV PTR), an image pointer (IMAGE PTR), an interpolation pointer (INTERP PTR), and a timestamp (TIMESTAMP), each of which performs a function similar to that of the same-named member of the background frame data structure 123. Of course, the image pointer in the object frame data structure 127 points to object image data instead of background image data, and the interpolation pointer points to object interpolation data instead of background interpolation data. As shown in FIG. 8, an exemplary object interpolation data structure includes members indicating both forward and reverse object motion (OM FORWARD and OM REVERSE, respectively) and both forward and reverse object blending information (OB FORWARD and OB REVERSE, respectively).

【００５６】FIG. 9A shows an exemplary embodiment of background frame blending data structures 135A, 137A that can be used to perform background blending. It will be appreciated that object blending data can be organized similarly. In one embodiment, each blending data structure 135A, 137A includes a blending operator in the form of the coefficients of a polynomial (A, B, C, D), an interval fraction (INTV) that indicates the portion of the interval between two successive background frames over which the blending operator is applied, and a pointer to a next blending data structure, which allows the interval between successive background frames to be represented by multiple blending operators.

【００５７】In FIG. 9A, the forward background blending data 135A for background frame BF_i and the reverse background blending data 137A for background frame BF_{i+1} are shown together with a graph 139 that illustrates how the blending data is applied to blend background frames BF_i and BF_{i+1}. The blending operation shown in the graph is known as a cross-dissolve because, during the blending interval (that is, the time between the background frames), background frame BF_i is effectively dissolved into background frame BF_{i+1}. To generate an interpolated frame at time t_INT, background frame BF_i is transformed in the forward direction based on the forward background motion information for frame BF_i, and background frame BF_{i+1} is transformed in the reverse direction based on the reverse background motion information for frame BF_{i+1}. A respective weight (that is, a multiplier) is calculated for each of frames BF_i and BF_{i+1} using the blending information for that frame: the weight for frame BF_i is based on the forward blending information for BF_i, and the weight for frame BF_{i+1} is based on the reverse blending information for BF_{i+1}. The weights are then applied to the transformed versions of background frames BF_i and BF_{i+1}, respectively, and the resulting transformed and weighted images are combined (for example, by pixel-by-pixel addition) to generate the interpolated frame.
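The final weighting and pixel-by-pixel addition step can be sketched as follows. The sketch assumes that both frames have already been transformed (warped) and have the same dimensions, and it represents an image as a list of rows of grayscale values; the function name `cross_dissolve` is hypothetical.

```python
def cross_dissolve(warped_a, warped_b, weight_a, weight_b):
    """Weight two already-transformed frames and add them pixel by pixel."""
    return [
        [weight_a * pa + weight_b * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(warped_a, warped_b)
    ]


# Three quarters of the way from the second frame back to the first:
# the first frame contributes 75% and the second 25%.
frame_i = [[100, 200]]
frame_next = [[0, 100]]
blended = cross_dissolve(frame_i, frame_next, 0.75, 0.25)
```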

【００５８】As described above, in one embodiment a blending operator is implemented by storing the coefficients of a polynomial and the portion of the blending interval over which the polynomial is applied. For example, the forward blending data 135A for frame BF_i includes an interval fraction of one (INTV = 1), indicating that the blending operator given by coefficients A, B, C, D of blending data 135A is applied over the entire blending interval (in this case, the interval between t_BF_i and t_BF_{i+1}). In general, interval fractions of less than one are used when the overall blending function includes discontinuities that cannot be adequately represented by a polynomial of limited order. The blending operation shown in graph 139, however, is a continuous, first-order blending operation. Accordingly, applying the coefficients A, B, C, and D specified in blending data structure 135A to the polynomial weight(T) = AT³ + BT² + CT + D yields weight_BF_i(T) = 1 − T. According to one embodiment, the value of T is normalized to range from 0 to 1 over the applicable fraction of the blending interval, so that the blending operator A = 0, B = 0, C = −1, D = 1 yields a multiplier that decreases linearly with time over the blending interval. The multiplier for BF_i starts at 1 and decreases linearly to 0 at the end of the blending interval. Turning to the blending operator for frame BF_{i+1}, applying the coefficients A = 0, B = 0, C = 1, D = 0 specified in blending data structure 137A yields weight_BF_{i+1}(T) = T. The multiplier for frame BF_{i+1} therefore starts at 0 and increases linearly to 1 over the blending interval.
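The polynomial blending operator can be checked with a few lines of code. The sketch below evaluates weight(T) = AT³ + BT² + CT + D for the two coefficient sets given in the text, with T normalized to the range 0 to 1; the function and constant names are hypothetical.

```python
def blend_weight(coeffs, t):
    """Evaluate A*t**3 + B*t**2 + C*t + D using Horner's rule."""
    a, b, c, d = coeffs
    return ((a * t + b) * t + c) * t + d


FORWARD_135A = (0, 0, -1, 1)  # weight for BF_i: 1 - T
REVERSE_137A = (0, 0, 1, 0)   # weight for BF_{i+1}: T

# One quarter of the way through the blending interval.
w_i = blend_weight(FORWARD_135A, 0.25)
w_next = blend_weight(REVERSE_137A, 0.25)
```

With these coefficients the two weights sum to 1 at every point in the interval, which is what keeps overall brightness constant during a linear cross-dissolve.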

【００５９】FIG. 9B illustrates a discontinuous blending function 141. In this case, the blending interval between background frames BF_i and BF_{i+1} is divided into three interval fractions 146, 147, and 148. During the first fraction 146 of the blending interval, the weight applied to background frame BF_i is held constant at 1 and the weight applied to background frame BF_{i+1} is held constant at 0. During the second fraction 147, a linear cross-dissolve takes place, and during the third fraction 148 the multipliers for frames BF_i and BF_{i+1} are again held constant, but at values opposite to those of the first fraction 146. In one embodiment, the discontinuous blending function 141 is represented by a linked list of blending data structures 135B, 135C, 135D, each of which indicates, in its INTV parameter, the fraction of the blending interval over which it applies. Thus, the first forward blending data structure 135B for background frame BF_i contains the interval fraction INTV = 0.25 and the blending operator weight_BF_i(T) = 1, indicating that a unity multiplier is applied to the transformed version of frame BF_i over the first 25% of the blending interval (interval 146). The second blending data structure 135C for background frame BF_i contains the interval fraction INTV = 0.5 and the blending operator weight_BF_i(T) = 1 − T, indicating that the weight applied to frame BF_i decreases linearly from 1 to 0 during the middle 50% of the blending interval (interval 147). Note that, for ease of explanation, the value of T is assumed to be normalized to range from 0 to 1 within each interval fraction. Other representations are of course possible and are considered to be within the scope of the present invention. The third blending data structure 135D for background frame BF_i contains the interval fraction INTV = 0.25 and the blending operator weight_BF_i(T) = 0, indicating that frame BF_i makes no contribution to the interpolated background frame during the last 25% of the blending interval (interval 148).

【００６０】Still referring to FIG. 9B, the linked list of blending data structures 137B, 137C, 137D for background frame BF_{i+1} represents the inverse of the blending function shown for background frame BF_i. That is, during the first 25% of the blending interval, a weight of 0 is applied to the transformed version of frame BF_{i+1} (indicating that frame BF_{i+1} makes no contribution to the interpolated background frame during that time); during the middle 50% of the blending interval, the weight applied to the transformed version of frame BF_{i+1} increases linearly from 0 to 1; and during the last 25% of the blending interval, a unity multiplier (weight = 1) is applied to the transformed version of frame BF_{i+1} to create the interpolated background frame.
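The piecewise blending function of FIG. 9B can be sketched by walking a sequence of (INTV, coefficients) segments. This is an illustration only: a plain Python list stands in for the linked list of blending data structures, T is normalized to 0 to 1 within each interval fraction as the text assumes, and the names used here are hypothetical.

```python
# Each segment is (INTV, (A, B, C, D)); the INTV values sum to 1.
FORWARD_BF_I = [          # stands in for 135B, 135C, 135D
    (0.25, (0, 0, 0, 1)),   # hold at 1
    (0.50, (0, 0, -1, 1)),  # 1 - T
    (0.25, (0, 0, 0, 0)),   # hold at 0
]
REVERSE_BF_NEXT = [       # stands in for 137B, 137C, 137D
    (0.25, (0, 0, 0, 0)),   # hold at 0
    (0.50, (0, 0, 1, 0)),   # T
    (0.25, (0, 0, 0, 1)),   # hold at 1
]


def piecewise_weight(segments, t):
    """Return the weight at position t (0..1) of the whole blending interval."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    start = 0.0
    for intv, (a, b, c, d) in segments:
        end = start + intv
        if t <= end:
            local = (t - start) / intv  # renormalize T within this fraction
            return ((a * local + b) * local + c) * local + d
        start = end
    raise ValueError("interval fractions do not cover t")
```

At the midpoint of the blending interval both frames contribute equally, while in the first and last quarters only one frame contributes.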

【００６１】One reason for applying a discontinuous blending function of the type shown in FIG. 9B is to reduce the distortion associated with blending successive keyframes. By holding the contribution of a given keyframe constant for a fraction of the blending interval, distortion caused by differences between the forward and reverse transformations of frames BF_i and BF_{i+1} can be reduced. In one embodiment, operator input is received in the animation authoring system (for example, element 12 of FIG. 1) to select the fraction of the blending interval over which the contribution of a given keyframe is held constant. In an alternative embodiment, a measure of image sharpness (for example, image gradient) can be determined for blended and unblended images in order to automatically determine the interval fractions over which the contribution of one image or the other should be held constant. Also, although a linear cross-dissolve has been described above, other types of cross-dissolve operations can be defined by different polynomials. Moreover, instead of using polynomial coefficients to indicate the type of blending operation, other indicators can be used. For example, a value indicating whether to apply a linear, quadratic, transcendental, logarithmic, or other blending operation can be stored in the blending data structure. Although background blending has been described primarily in terms of cross-dissolve operations, other blending effects can also be used to transition from one background frame to another, including, but not limited to, fades and various screen wipes.

【００６２】FIG. 10 illustrates the manner in which the background track 33 and object tracks 35A, 35B of an exemplary animation object 30 are used to synthesize an interpolated frame IF_t during animation playback.

【００６３】At a given time t, the interpolated frame IF_t is generated from respective pairs of adjacent frames in the background track 33 and the object tracks 35A, 35B. The pair of adjacent background frames BF_i, BF_{i+1} are each transformed and weighted using the background motion information and background blending information associated with those frames. Background frame BF_i is transformed according to the forward background motion information (BM) associated with frame BF_i and then weighted according to the forward background blending information (BB) associated with frame BF_i. The effect is to move the pixels of background frame BF_i to their respective positions based on the forward motion information (for example, translation, rotation, scaling, pan, tilt, or skew) and then to reduce the intensity level of each pixel value by weighting the pixel values according to the blending operator. The pixels of background frame BF_{i+1} are similarly transformed and weighted according to the reverse motion information and reverse blending information (BM, BB) for frame BF_{i+1}. The resulting transformed images are then combined to create an interpolated background frame 151A that represents the background scene at time t. Object frames OF1_i and OF1_{i+1} are likewise transformed using forward and reverse object motion information (OM), respectively, weighted using forward and reverse object blending information (OB), respectively, and then combined. The resulting interpolated object frame is then overlaid on the interpolated background frame 151A to produce an interpolated frame 151B that contains the interpolated background and the interpolated dynamic object. Object frames OF2_i and OF2_{i+1} are also transformed, weighted, and combined using the object motion information and object blending information (OM, OB) associated with those object frames, and then overlaid on the interpolated background. The result is the completed interpolated frame 151C. Successive interpolated frames are created in the same way, using different values of the time-varying blending operator and progressively transforming the background and object frames based on the motion information. The net effect of animation playback is to produce a video effect that approximates the original video used to create the animation object 30. A sound track obtained from the original video can also be played back along with the animation.
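The layering order described above (interpolate the background, then overlay each interpolated object in turn) can be sketched as follows. The sketch makes simplifying assumptions that are not in the text: the transformation step is omitted, a linear cross-dissolve is used for every track, images are lists of rows of grayscale values, and `None` marks a transparent object pixel. All function names are hypothetical.

```python
def dissolve(frame_a, frame_b, t):
    """Linear cross-dissolve of two equally sized frames at position t (0..1)."""
    return [
        [None if a is None or b is None else (1 - t) * a + t * b
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def overlay(base, layer):
    """Place every non-transparent pixel of `layer` on top of `base`."""
    return [
        [b if p is None else p for b, p in zip(row_base, row_layer)]
        for row_base, row_layer in zip(base, layer)
    ]


def compose_frame(background_pair, object_pairs, t):
    """Build one interpolated frame: background first, then each object track."""
    frame = dissolve(background_pair[0], background_pair[1], t)
    for first, second in object_pairs:
        frame = overlay(frame, dissolve(first, second, t))
    return frame


background = ([[0, 0]], [[100, 100]])
obj = ([[None, 200]], [[None, 100]])
result = compose_frame(background, [obj], 0.5)
```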

【００６４】FIGS. 11 and 12 illustrate techniques for providing multiple resolutions of an animation in an animation object. FIG. 11 illustrates a technique for providing multiple temporal resolutions of animation keyframes, and FIG. 12 illustrates a technique for providing multiple spatial resolutions of animation keyframes. In one embodiment, an animation object is constructed to provide both types of multiple playback resolution, spatial and temporal. This gives the user of a playback system the option of increasing or decreasing the resolution of the animation sequence in the spatial dimension, the temporal dimension, or both. If the playback system has sufficient download bandwidth and processing power, the maximum temporal and spatial resolutions can be selected to present the highest-resolution animation playback. If the playback system does not have sufficient download bandwidth or processing power to handle the maximum spatial and temporal resolutions, the playback system can automatically reduce the spatial or temporal resolution of the animation being played back based on user-selected criteria. For example, if the user has indicated a desire to view images of maximum spatial resolution (that is, larger, more detailed images), then a keyframe track having fewer keyframes per unit time (a background track or an object track) can be selected, even though this means fewer keyframes and more interpolated frames, while maximum or near-maximum spatial resolution keyframes are selected for display. Conversely, if the user desires higher temporal resolution (that is, more keyframes per unit time), a maximum or near-maximum temporal resolution keyframe track can be selected even if spatial resolution must be reduced, in which case each keyframe is displayed at reduced spatial resolution.
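The trade-off between temporal and spatial resolution can be illustrated with a small selection routine. Everything in this sketch is an assumption made for illustration: the cost model (keyframes per second times bytes per keyframe, ignoring motion and blending data), the preference values "spatial" and "temporal", and the function name `choose_levels` are all hypothetical.

```python
def choose_levels(keyframe_rates, keyframe_sizes, bandwidth, prefer):
    """Pick (keyframes per second, bytes per keyframe) that fits the bandwidth.

    keyframe_rates: available temporal levels, in keyframes per second
    keyframe_sizes: available spatial levels, in bytes per keyframe
    bandwidth:      download budget in bytes per second
    prefer:         "spatial" or "temporal", the dimension to maximize first
    """
    feasible = [
        (rate, size)
        for rate in keyframe_rates
        for size in keyframe_sizes
        if rate * size <= bandwidth
    ]
    if not feasible:
        # Nothing fits: fall back to the coarsest level in both dimensions.
        return (min(keyframe_rates), min(keyframe_sizes))
    if prefer == "spatial":
        return max(feasible, key=lambda pair: (pair[1], pair[0]))
    return max(feasible, key=lambda pair: (pair[0], pair[1]))


rates = [8, 4, 2, 1]
sizes = [40000, 10000, 2500]
```

With a budget of 50,000 bytes per second, a spatial preference selects one large keyframe per second, while a temporal preference selects eight small keyframes per second.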

【００６５】Another use for reduced temporal resolution is rapid scanning, both forward and backward, within an animation. During animation playback, the user can signal a temporal multiplier (for example, 2x, 5x, 10x) to request that the animation be shown at a faster rate. In one embodiment, the request for rapid scanning is satisfied by using the temporal multiplier, together with the bandwidth capability of the playback system, to select an appropriate temporal resolution of the animation. At very fast playback rates, the spatial resolution of the animation can also be reduced. A temporal multiplier can similarly be used to slow animation playback to a rate below the natural rate to achieve a slow-motion effect.
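One way to picture how a temporal multiplier and the playback system's capacity could jointly select a temporal level is sketched below. The model is an assumption: each level is described by its keyframes per second of animation time, playing at a multiplier scales the keyframes needed per second of wall-clock time, and `keyframe_budget` is a hypothetical limit on how many keyframes the playback system can obtain per second.

```python
def pick_temporal_level(level_rates, multiplier, keyframe_budget):
    """Return the index of the finest level that fits the budget.

    level_rates:     keyframes per second of animation time, finest level first
    multiplier:      playback speed (2 for 2x scanning, 0.5 for slow motion)
    keyframe_budget: keyframes the playback system can obtain per second
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    for level, rate in enumerate(level_rates):
        if rate * multiplier <= keyframe_budget:
            return level
    return len(level_rates) - 1  # nothing fits: use the coarsest level


levels = [8, 4, 2, 1]
```

Under this model, normal-speed playback with a budget of 8 keyframes per second uses the finest level, 2x scanning drops to the second level, and 5x scanning drops to the coarsest.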

【００６６】FIG. 11 depicts a multi-temporal-level background track 161. Object tracks can be arranged similarly. In the first-level background track 35A, the maximum number of background frames (each labeled "BF") is provided, together with background motion information and background blending information for interpolating between each successive pair of background frames. The number of background frames per unit time may range from the video frame rate (in which case the motion and blending information would indicate no information, simply a cut to the next frame) down to a small fraction of the video frame rate. The second-level background track 35B has fewer background frames than the first-level background track 35A, the third-level background track 35C has fewer background frames than the second-level background track 35B, and so on up to the Nth-level background track 35D. Although FIG. 11 shows the second-level background track 35B with half as many background frames as the first-level background track 35A, other ratios can be used. The blending information and motion information (BM₂, BB₂) for interpolating between each successive pair of background frames in the second-level background track 35B differ from the blending information and motion information (BM₁, BB₁) for the first-level background track 35A, because the transformation from one background frame to the next is different in tracks of different levels. Likewise, the third-level background track 35C has fewer background frames than the second-level background track 35B and therefore has different frame-to-frame motion information and blending information (BM₃, BB₃). As the level of the background track increases, the temporal resolution becomes progressively coarser until the lowest-resolution background track is reached at level N.

【００６７】In one embodiment, the background track levels above the first level 35A do not actually contain separate sequences of background frames. Instead, pointers to background frames within the first-level background track 35A are provided. For example, the first background frame 62B of the second-level background track 35B can be represented by a pointer to the first background frame 62A of the first-level background track 35A, the second background frame 63B of the second-level background track 35B can be represented by a pointer to the third background frame 63A of the first-level background track 35A, and so on. Each pointer to a background frame in the first-level background track 35A can be combined, in a data structure, with motion and blending information indicating the transformation to the next background frame and motion and blending information indicating the transformation to the preceding background frame. A linked list of such data structures can then be used to represent a sequence of background frames. Other data structures and techniques for representing a sequence of background frames can be used without departing from the spirit and scope of the present invention.

【００６８】In an alternative embodiment, each background track level 35A, 35B, 35C, 35D is formed by a number of reference values that select background frames from a set (or pool) of background frames. In this embodiment, the reference values used to form a given level of background track effectively define a sequence of keyframes whose temporal resolution is determined by the number of reference values. A reference value used to select a background frame may be a pointer to the background frame, an index indicating the position of the background frame in a table, or any other value that can be used to identify a background frame.
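The idea of forming each track level from reference values into a shared pool of frames can be sketched with index lists. The sketch assumes that each level keeps every second frame of the level below it (the ratio shown in FIG. 11, though the text notes other ratios are possible) and uses table indices as the reference values; `build_levels` is a hypothetical name.

```python
def build_levels(num_frames, num_levels, ratio=2):
    """Return one list of frame indices per level, finest level first."""
    levels = [list(range(num_frames))]
    for _ in range(1, num_levels):
        levels.append(levels[-1][::ratio])  # keep every `ratio`-th reference
    return levels


levels = build_levels(num_frames=8, num_levels=3)
# levels[1][0] refers to pool frame 0 and levels[1][1] to pool frame 2,
# that is, the first and third frames of the first level.
```

Because the higher levels hold only references, the frame images themselves are stored once, in the pool.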

【００６９】In one embodiment, the blending and motion information for a higher-level background track in the multi-level background track 161 can be obtained by combining multiple sets of motion and blending information from a lower-level background track. For example, the background motion information and background blending information (BM₂, BB₂) used to transition between background frames 62B and 63B in background track level two can be created by combining the background motion and blending information used to transition between background frames 62A and 64 with the background motion and blending information used to transition between background frames 64 and 63A. In an alternative embodiment, the background motion and blending information for a higher-level background track can be created from the background frames of that track, without using the blending and motion information from a lower-level background track.
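One plausible way to combine two sets of motion information is to express each as a matrix and multiply them, as sketched below. This is an assumption about how the combination might be done, not a method stated in the text: motion is represented as a 3x3 homogeneous matrix, only translation and scaling are shown, and the combination of blending information is not addressed. All names are hypothetical.

```python
def translation(tx, ty):
    """3x3 homogeneous matrix for a translation."""
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scaling(sx, sy):
    """3x3 homogeneous matrix for a scale change."""
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def combine(first, second):
    """Return the matrix that applies `first` and then `second`."""
    return [
        [sum(second[i][k] * first[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply(matrix, x, y):
    """Transform one point with a 3x3 homogeneous matrix."""
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Hypothetical level-one motions: frame 62A to 64, then frame 64 to 63A.
motion_62a_to_64 = translation(3, 1)
motion_64_to_63a = scaling(2, 2)
motion_62b_to_63b = combine(motion_62a_to_64, motion_64_to_63a)
```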

[0070] FIG. 12 shows a multi-spatial-resolution background frame. Each background frame in a background track of the multi-temporal-resolution background track can include background frames of various resolutions, BF₁, BF₂ through BF_N. Background frame BF₁ is the background frame with the highest spatial resolution and, in one embodiment, includes the same number of pixels as the original video frame. Background frame BF₂ has a lower spatial resolution than BF₁, which means that BF₂ either has fewer pixels than BF₁ (i.e., is a smaller image) or has a larger block size. Block size refers to the size, usually in pixels, of the elemental unit of visual information used to depict an image. A smaller block size yields an image with higher spatial resolution because finer elemental units are used to depict the features of the image. A larger block size yields an image with lower spatial resolution but requires less overall information, because a single pixel value is applied to the whole group of pixels in a pixel block.
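
The effect of block size can be illustrated with a minimal Python sketch. It is an illustration only, not the method of the specification: the function names `apply_block_size` and `values_needed` are hypothetical, the image is assumed to be a grayscale grid, and the single value chosen for each block is assumed to be the integer mean of the block.

```python
def apply_block_size(image, block):
    """Return a copy of `image` in which every block x block tile holds one value.

    `image` is a list of rows of grayscale intensities. Each tile is replaced by
    its integer mean (an assumption), so one value describes the whole block.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[r][c] for r in rows for c in cols]
            mean = sum(tile) // len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def values_needed(height, width, block):
    """Count the pixel values that must be stored for a given block size."""
    tiles_down = -(-height // block)    # ceiling division
    tiles_across = -(-width // block)
    return tiles_down * tiles_across
```

Under these assumptions a 4 × 4 image needs 16 stored values at block size 1 but only 4 at block size 2, which is the trade between spatial resolution and data volume that the paragraph describes.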

[0071] FIG. 13 illustrates the use of a server system 16 to control the content of the animation data streams delivered to playback systems 18A, 18B, 18C. According to one embodiment, the server system 16 receives requests to download various animation objects 14A, 14B, 14C stored in a computer-readable storage device 170. Before a playback system receives an animation object, the server system 16 first queries the playback system 18A, 18B, 18C to determine its capabilities. For example, in response to a request from playback system 18A to download animation object 30C, the server system 16 asks playback system 18A to provide a set of playback system characteristics that the server system 16 can use to generate an appropriate animation data stream. As shown in FIG. 13, the set of playback system characteristics associated with a given playback system 18A, 18B, 18C may include, without limitation, the download bandwidth of the playback system or its network access medium, the processing capability of the playback system (e.g., number of processors, processor speed), the graphics display capability of the playback system, the software application used by the playback system (e.g., type of web browser), the operating system on which the software application runs, and a set of user preferences. User preferences include a preference for spatial resolution at the expense of temporal resolution, and vice versa. User preferences can also be adjusted dynamically by the user of the playback system while the animation is being downloaded and displayed.

[0072] In one embodiment, animation objects 14A, 14B, 14C are stored in a multi-temporal-resolution and multi-spatial-resolution format, and the server system 16 selects, from an animation object (e.g., animation object 30C), the background tracks and object tracks whose temporal and spatial resolutions best suit the characteristics provided by the target playback system. Thus, as shown in graph 172, the server system 16 can select different temporal/spatial resolution versions 174A, 174B, 174C of the same animation object 30C for download to playback systems 18A, 18B, 18C, based on their respective characteristics. In addition, the server system can dynamically adjust the temporal/spatial resolution of the animation provided to a given playback system 18A, 18B, 18C based on changes in the characteristics of that playback system.
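
One way to picture this selection is the short Python sketch below. Everything in it is assumed for illustration: the `Version` record, the `select_version` function, the version names, and the bit rates are hypothetical, and the specification does not state a particular selection rule. The sketch simply keeps the versions that fit the reported download bandwidth and then ranks them by the user's stated preference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    frames_per_second: int   # temporal resolution
    pixels_per_frame: int    # spatial resolution
    bits_per_second: int     # bandwidth needed to stream this version


def select_version(versions, bandwidth_bps, prefer_spatial):
    """Pick the best version that fits the client's bandwidth.

    Falls back to the cheapest version when nothing fits (an assumption).
    """
    fitting = [v for v in versions if v.bits_per_second <= bandwidth_bps]
    if not fitting:
        return min(versions, key=lambda v: v.bits_per_second)
    if prefer_spatial:
        return max(fitting, key=lambda v: (v.pixels_per_frame, v.frames_per_second))
    return max(fitting, key=lambda v: (v.frames_per_second, v.pixels_per_frame))


# Hypothetical versions of one animation object; all numbers are invented.
VERSIONS = [
    Version("full", 30, 640 * 480, 4_000_000),
    Version("small_fast", 30, 320 * 240, 1_000_000),
    Version("large_slow", 10, 640 * 480, 1_200_000),
]
```

Calling `select_version` again whenever the reported bandwidth or the user preference changes would give the dynamic adjustment mentioned above.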

[0073] Although FIG. 13 illustrates the use of a server system to control the content of an animation data stream over a communication network, similar techniques can be applied within a playback system to select dynamically among multiple temporal- and spatial-resolution animation tracks. For example, selection logic within the playback system can supply display logic within the playback system with an animation data stream whose temporal/spatial resolution suits the characteristics of the playback system. For example, a DVD player may be designed to reduce the temporal or spatial resolution of animation playback depending on whether one or more other videos or animations are also being displayed (e.g., in other regions of the display).

[0074] As noted above, it is an intended advantage of embodiments of the present invention to associate keyframes of an animation with video frames of a video source so that a user can switch between viewing the animation and viewing the video source during playback. This association of keyframes with video frames is called "cross-linking" and is particularly useful when one representation, animation or video, offers advantages over the other. For example, in one embodiment of the animation playback system described below, the user is notified during video playback when a sequence of video frames is linked to a still image that forms part of an animation. As explained below, the still image may have a higher or more variable resolution, a wider field of view (e.g., a panoramic image), a higher dynamic range, or a different aspect ratio than the video frames. The still image may also include stereo parallax information or other depth information that permits a stereoscopic three-dimensional (3D) view. When notified that a still image is available, the user can provide input to switch on the fly from the video display to the animation display in order to obtain the benefits of the animation (e.g., a higher-resolution image). Alternatively, the user can stop the video display to navigate within a panoramic image of the animation, or to zoom in or out on a still image of the animation. In other embodiments, the user can play the animation and the video in picture-in-picture mode, or switch from a display of the animation to the cross-linked video.

[0075] In one embodiment, cross-linking involves generating still images from a video and then creating cross-links between the still images and frames of the video. In an alternative embodiment, the still images may be generated from a video other than the video to which they are cross-linked. Techniques for creating still images from video are described below. It should be understood that other similar techniques may be used to create still images without departing from the spirit and scope of the present invention.

[0076] A still image with a higher spatial resolution than the frames of the video source can be obtained by integrating multiple video frames over time. Images in video frames that are close together in time typically exhibit small positional shifts (e.g., sub-pixel motion) as a result of camera panning, zooming, or other movement. These shifts allow multiple video frames to be spatially registered to create a higher-resolution image. A high-resolution still image can then be created by interpolating between neighboring pixels of the spatially registered video frames.
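
The registration-then-interpolation idea can be sketched in one dimension. The following Python is a simplified illustration under several assumptions that are not in the specification: the function name `register_and_interpolate` is hypothetical, each frame is a single scanline, the sub-pixel shift of every frame is already known (estimating the shifts is the registration step and is not shown), and gaps are filled by linear interpolation.

```python
def register_and_interpolate(frames, shifts, scale):
    """Merge low-resolution 1-D scanlines into one scanline `scale` times finer.

    frames: non-empty list of equal-length lists of samples.
    shifts: known sub-pixel offset of each frame, in low-resolution pixels.
    Samples are placed on the fine grid at round((i + shift) * scale); cells hit
    more than once are averaged, and empty cells are linearly interpolated
    between their nearest filled neighbours.
    """
    length = len(frames[0]) * scale
    total = [0.0] * length
    count = [0] * length
    for frame, shift in zip(frames, shifts):
        for i, sample in enumerate(frame):
            pos = round((i + shift) * scale)
            if 0 <= pos < length:
                total[pos] += sample
                count[pos] += 1
    fine = [t / c if c else None for t, c in zip(total, count)]
    filled = [i for i, v in enumerate(fine) if v is not None]
    for i in range(length):
        if fine[i] is None:
            left = max((j for j in filled if j < i), default=None)
            right = min((j for j in filled if j > i), default=None)
            if left is None:
                fine[i] = fine[right]        # extend the first known sample
            elif right is None:
                fine[i] = fine[left]         # extend the last known sample
            else:
                w = (i - left) / (right - left)
                fine[i] = (1 - w) * fine[left] + w * fine[right]
    return fine
```

With two scanlines offset by half a pixel, the samples interleave on the finer grid and no interpolation is needed; with a single scanline, every other fine-grid cell is interpolated, so no real detail is gained.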

[0077] Alternatively, a still image can be derived from a second video source that has a higher resolution than the video to which the still image is linked. For example, motion pictures are typically recorded on film, which has a resolution many times higher than the NTSC video format commonly used for video tape.

[0078] A still image can also have a wider dynamic range than the video frames to which it is cross-linked. Dynamic range refers to the range of discernible intensity levels for each color component of a pixel in an image. Because a camera's exposure setting may change from frame to frame to adapt to varying lighting conditions (e.g., auto iris), a sequence of video frames may exhibit slight variations in color that can be integrated into a still image having a greater dynamic range than the individual video frames. A still image can also be created from a video source with a wide dynamic range (e.g., film) and then cross-linked to a video with a narrower dynamic range.

[0079] A still image can also have a different aspect ratio than the video frames to which it is cross-linked. Aspect ratio refers to the ratio of the width of an image to its height. For example, a still image can be created from a video source with a relatively wide aspect ratio, such as film, and then cross-linked to a different video source with a narrower aspect ratio, such as NTSC video. A typical aspect ratio for film is 2.2 × 1. In contrast, NTSC video has an aspect ratio of 4 × 3.

[0080] Video frames that result from panning a camera can be registered and composited to create a panorama. Video frames that result from zooming a camera can be registered and composited to create a large still image that has different resolutions in different regions (i.e., a multi-resolution image). The region of a multi-resolution image where the camera zoom occurred will contain a higher resolution than the other regions of the image. Panoramas and multi-resolution still images are classes of images referred to herein as navigable images. In general, a navigable image is any image that can be panned or zoomed to provide different views, or any image that contains a usable three-dimensional image. Although panoramic images and multi-resolution still images can be represented by a composite image, they can also be represented by discrete still images that are spatially registered.

[0081] A stereo image pair can be obtained from a sequence of video frames that exhibits horizontal camera tracking motion. Two video frames recorded from separate viewpoints (e.g., viewpoints separated by an interpupillary distance) can be selected from the video sequence as a stereo image pair. Stereo images can be presented using a number of different stereo viewing devices, such as stereo 3D displays and stereo glasses.

[0082] In addition, stereo images can be analyzed, for example using image correction or feature matching techniques, to identify corresponding pixels or image features in a given stereo image pair. The corresponding pixels or image features can then be used to establish the depth of the pixels and thereby create a 3D range image. Range images can be used in a number of applications, including constructing 3D models, creating novel views or scenes from images, and interpolating between images to create novel views.
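
The specification does not say how depth is computed from corresponding pixels. A common textbook choice, shown here only as an assumed illustration, is the pinhole relation for a rectified stereo pair: depth equals focal length times baseline divided by disparity. The function names `depth_from_disparity` and `range_image` are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Depth of a point seen in both images of a rectified stereo pair.

    Uses the pinhole relation depth = focal_length * baseline / disparity.
    Returns None for zero disparity (a point at infinity).
    """
    if disparity_px == 0:
        return None
    return focal_length_px * baseline / disparity_px


def range_image(disparities, focal_length_px, baseline):
    """Convert a 2-D grid of disparities into a grid of depths."""
    return [[depth_from_disparity(d, focal_length_px, baseline) for d in row]
            for row in disparities]
```

The depth comes out in the same unit as `baseline`; larger disparities correspond to nearer points.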

[0083] FIG. 14A illustrates the use of a cross-link generator 203 to establish cross-links between a video source 10 and an animation 14 created from the video source 10 by the animation production system 12. The video source may be compressed by a video encoder 201 (e.g., a vector quantizer) before being received by the cross-link generator 203. According to one embodiment, the cross-link generator 203 generates a cross-link data structure that contains respective pointers to the keyframes in the animation to which frames of the video source correspond.

[0084] FIG. 14B illustrates the use of the cross-link generator 203 to establish cross-links between the video source 10 and an animation 205 created from a separate video source 204. The separate video source 204 may have been used to produce the video source 10, or the two video sources 10, 204 may be unrelated. If the two video sources 10, 204 are unrelated, operator assistance may be required to identify which images of the animation 205 are to be cross-linked with which frames of video source 10. If the two video sources 10, 204 are related (e.g., one is film and the other is NTSC-format video), the cross-link generator can use temporal correlation or scene correlation to cross-link the images of the animation 205 with the frames of video source 10 automatically.

[0085] FIG. 15 illustrates a cross-link data structure 212 according to one embodiment. Each data element of the cross-link data structure 212 is called a video frame element (VFE) and corresponds to a respective frame of the video source. Thus, elements VFE₁, VFE₂, VFE₃, VFE_i, and VFE_{i+1} correspond to frames VF₁, VF₂, VF₃, VF_i, and VF_{i+1} (not shown) of the video source. As shown, the cross-link data structure 212 is implemented as a linked list in which each video frame element contains a pointer to the next video frame element and a pointer to a background frame 215, 216 of the animation. In an alternative embodiment, the cross-link data structure 212 may be implemented as an array of video frame elements rather than as a linked list. In yet another embodiment, the cross-link data structure 212 may be implemented as a tree data structure instead of a linked list. A tree data structure is useful for establishing associations between non-adjacent video segments and for searching to find a particular video frame. In general, the cross-link data structure may be represented by any type of data structure without departing from the spirit and scope of the present invention.

[0086] In one embodiment, the background frames of the animation are represented by background frame data structures 215, 216, each of which includes a pointer to the next background frame data structure (NEXT PTR), a pointer to the previous background frame data structure (PREV PTR), an image pointer (IMAGE PTR), a pointer to interpolation information (INTERP PTR), a timestamp, and a pointer (VF PTR) to one or more elements in the cross-link data structure 212. NEXT PTR, PREV PTR, IMAGE PTR, and INTERP PTR are described above in connection with FIG. 8.

[0087] The VF PTR of a particular background frame data structure 215, 216, together with the pointer to that background frame data structure held in the corresponding element of the cross-link data structure 212, forms a cross-link 217. That is, the background frame data structure and the video frame element each contain a reference to the other. The reference may be a uniform resource locator, a memory address, an array index, or any other value that associates the background frame data structure with the video frame element.
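
The linked-list form of the cross-link data structure and the two-way cross-link can be sketched in Python as follows. This is a minimal illustration, not the layout used in the specification: the class and function names are hypothetical, object references stand in for memory addresses, VF PTR is modeled as a list of references, and INTERP PTR is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through mutual links
class BackgroundFrame:
    timestamp: float
    image_ptr: Optional[str] = None                  # stands in for IMAGE PTR
    next_ptr: Optional["BackgroundFrame"] = None     # NEXT PTR
    prev_ptr: Optional["BackgroundFrame"] = None     # PREV PTR
    vf_ptr: List["VideoFrameElement"] = field(default_factory=list)  # VF PTR


@dataclass(eq=False)
class VideoFrameElement:
    frame_number: int
    next_element: Optional["VideoFrameElement"] = None
    background: Optional[BackgroundFrame] = None


def build_cross_link_list(frame_count):
    """Create a linked list with one element per video frame; return its head."""
    head = previous = None
    for number in range(1, frame_count + 1):
        element = VideoFrameElement(number)
        if previous is None:
            head = element
        else:
            previous.next_element = element
        previous = element
    return head


def cross_link(element, background):
    """Record the association in both directions, forming one cross-link."""
    element.background = background
    background.vf_ptr.append(element)
```

Storing the reference on both sides is what lets playback move from a video frame to its background frame and back again without searching.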

[0088] Referring to background frame data structure 215, although VF PTR is shown in FIG. 15 as pointing to only one video frame element (VFE₁) in the cross-link data structure 212, VF PTR may include a separate pointer to each video frame element that points back to it. For example, VF PTR may be a data structure that includes separate pointers to each of the video frame elements VFE₁, VFE₂, VFE₃. Alternatively, VF PTR may be a data structure that includes a pointer to a video frame element (e.g., VFE₁) and a value indicating the total number of video frame elements to which the background frame data structure 215 is linked. Other data constructs for cross-linking a background frame data structure with a sequence of video frame elements may be used in alternative embodiments.
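
The second form of VF PTR, a pointer plus a count, can be sketched as below. The names `VfPtr` and `linked_elements` are hypothetical, and the sketch assumes that the linked video frame elements are consecutive in the list, which the paragraph suggests but does not state outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class VideoFrameElement:
    frame_number: int
    next_element: Optional["VideoFrameElement"] = None


@dataclass(eq=False)
class VfPtr:
    first: VideoFrameElement   # first linked video frame element
    count: int                 # total number of linked elements


def linked_elements(vf_ptr):
    """Yield the `count` consecutive elements that start at `first`."""
    element = vf_ptr.first
    for _ in range(vf_ptr.count):
        if element is None:
            break
        yield element
        element = element.next_element
```

Compared with one pointer per element, this form uses constant space per background frame regardless of how many video frames it spans.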

[0089] In one embodiment, the image pointer (IMAGE PTR) of each background frame data structure 215, 216 includes an image-type member that indicates whether the background image from which the image data for the background frame is obtained is, for example, a non-composite still image (i.e., a video frame from which dynamic objects, if any, have been removed), a high-resolution still image, a panorama, or another composite image. The image pointer also includes members that indicate the location of the background image in memory and the offset within the background image at which the image data for the background frame is located.

[0090] A text descriptor (TEXT DESCR) may also be included as part of the background frame data structures 215, 216. In one embodiment, the text descriptor is a pointer to a text description (e.g., a character string) that describes the portion of the animation spanned by the background frame. The text description may be displayed as an overlay on the animation or elsewhere on the display (e.g., on a control bar). During cross-linking, an appropriate default value is assigned to each text description based on the type of motion identified. Referring to FIG. 16, for example, the default text descriptions for the three depicted animation segments 221, 223, 225 are "Camera Still", "Camera Pan", and "Camera Zoom", respectively. These default values can be edited by the user during cross-linking or later, during playback of the video or animation. In an alternative embodiment, the text descriptor (TEXT DESCR) in the background frame data structures 215, 216 is not a pointer but an index that can be used to select a text description from a table of text descriptions.
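
The index-based alternative can be sketched in a few lines of Python. The three default strings come from the paragraph above; the motion-type keys, the table layout, and the function names are assumptions made for illustration.

```python
# Table of text descriptions; a descriptor is an index into this table.
TEXT_DESCRIPTIONS = ["Camera Still", "Camera Pan", "Camera Zoom"]

# Hypothetical motion-type labels mapped to default descriptor indices.
DEFAULT_DESCRIPTOR = {"still": 0, "pan": 1, "zoom": 2}


def default_text_descriptor(motion_type):
    """Return the default descriptor index for an identified motion type."""
    return DEFAULT_DESCRIPTOR[motion_type]


def describe(descriptor, table):
    """Look up the text description selected by a descriptor index."""
    return table[descriptor]


def edit_description(table, text):
    """Append a user-edited description and return its descriptor index."""
    table.append(text)
    return len(table) - 1
```

In this sketch a user edit adds a new table entry rather than overwriting a default, so other background frames that share the default keep their original text.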

[0091] With the cross-link arrangement described above, when a video frame is being displayed, the corresponding video frame element of the cross-link data structure 212 can be referenced to identify the cross-linked background frame data structure 215, 216 of the animation. The image pointer in the background frame data structure 215, 216 can then be referenced to determine whether the background frame is derived from a composite or a non-composite image. In the case of a composite image, the user can be notified during video playback (e.g., by a visual or audible prompt) that the composite image is available. The user may then choose to play the animation or to view and navigate within the background image. For example, in the case of a panorama, the user can view the panorama using a panorama viewing tool (i.e., a software program, executable on a general-purpose computer, that displays a user-selected portion of a composite image). Similarly, in the case of a high-resolution still image, the user may wish to view the image as a still frame in order to discern details that were unavailable or difficult to discern in the video source. In the case of a zoomable still image, the user may wish to zoom in and out on the still frame. Other activities made possible by the animation can also be performed, such as selecting designated hot spots within the animation, isolating dynamic objects within the animation, and directing object motion or background motion.

[0092] FIG. 16 is a diagram of the cross-link relationship between a sequence of video frames 230 in a video source and background images 231 from an animation created using the animation production techniques described above. As shown, the sequence of video frames includes four video segments 222, 224, 226, 228, which are associated via cross-links 217 with background images 221, 223, 225, 227, respectively. Video segment 222 depicts a stationary scene (i.e., stationary within some motion threshold) and is cross-linked to a corresponding still background image 221. Video segment 224 depicts a scene produced by a camera pan and is cross-linked to a corresponding panorama 223 created by processing or stitching two or more frames from video segment 224. Video segment 226 depicts a scene produced by a camera zoom and is cross-linked to a high-resolution, zoomable still image 225. Video segment 228 depicts a scene produced by moving the camera around one or more 3D objects and is cross-linked to a 3D object image. As described above, high-resolution still images and 3D object images are created by processing and compositing frames from video segments (e.g., video segments 222, 224, 226, 228).

[0093] FIG. 17 illustrates a display 241 generated by a playback system. According to one embodiment, the playback system can render either video or animation on the display 241. As shown in FIG. 17, the playback system is rendering video on the display 241. At the bottom of the display 241, a control bar 242 is presented that includes a rewind button, a play button, a pause button, and a stop button. According to one embodiment, as each video frame is rendered, the cross-link between the corresponding video frame element and a background frame of the animation is followed to determine whether the background frame is derived from a high-resolution still image, a panoramic image, or a zoomable image. For example, if the background frame is derived from a panoramic image, the icon labeled PAN in FIG. 17 is displayed, highlighted, or otherwise shown to be active. An audible tone may also be generated to indicate that a panoramic image is available. In response to the indication that a panoramic image is available, the user can click or otherwise select the PAN icon (e.g., using a mouse or other handheld control device) to stop the video display or to display the panoramic image. When the panoramic image is displayed, program code for navigating the panoramic image is automatically loaded into the operating memory of the playback system, if it is not already resident, and executed so that the user can pan, tilt, and zoom the perspective view of the panorama. As with the PAN icon, when the STILL or ZOOM icon becomes active, the user can click the appropriate STILL or ZOOM icon to view the high-resolution still image or the zoomable image.
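
The per-frame check that activates the PAN, STILL, or ZOOM icon can be sketched as follows. The icon labels are taken from the paragraph above; the `ImageType` enumeration, the `active_icon` function, and the use of a dictionary keyed by frame number (in place of the linked list of video frame elements) are assumptions made to keep the sketch short.

```python
from enum import Enum


class ImageType(Enum):
    NON_COMPOSITE_STILL = "non_composite_still"
    HIGH_RES_STILL = "high_res_still"
    PANORAMA = "panorama"
    ZOOMABLE = "zoomable"


# Image types that offer something beyond the video, and the icon for each.
ICON_FOR_TYPE = {
    ImageType.HIGH_RES_STILL: "STILL",
    ImageType.PANORAMA: "PAN",
    ImageType.ZOOMABLE: "ZOOM",
}


def active_icon(frame_number, cross_links):
    """Return the icon to activate while `frame_number` is shown, or None.

    `cross_links` maps a video frame number to the image type of the
    background frame it is cross-linked to.
    """
    image_type = cross_links.get(frame_number)
    return ICON_FOR_TYPE.get(image_type)
```

A playback loop would call `active_icon` once per rendered frame and highlight the returned icon, or clear all highlights when the result is `None`.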

[0094] A video can also be linked to one or more three-dimensional objects or scenes associated with the video. When a link to a three-dimensional object is invoked during video playback, a particular view of the three-dimensional object is displayed in a manner similar to that described above. Program code is executed that allows the user to change the orientation and position of a virtual camera in a three-dimensional coordinate system in order to generate different perspective views of the object.

[0095] In one embodiment, the control bar 242 also includes an icon, ANIM/VIDEO, that can be used to switch between displaying the video and displaying the animation cross-linked to the video. When the user clicks the ANIM/VIDEO button, the video frame element corresponding to the currently displayed video frame is examined to identify the cross-linked frame in the animation. The timestamp of the cross-linked frame in the animation is used to determine a relative start time within the background track and object tracks of the animation, and the playback system begins rendering the animation accordingly. If the user clicks the ANIM/VIDEO icon again during playback of the animation, the current background track data structure is examined to identify the cross-linked frame in the video. Video playback then resumes at the cross-linked frame.
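
The ANIM/VIDEO toggle can be sketched with plain lookup tables. This is an assumed, array-based illustration: the class name `CrossLinkedPlayer`, its method names, and the use of index arrays in place of pointer-linked structures are all hypothetical, and keyframe timestamps are assumed to be sorted in ascending order.

```python
import bisect


class CrossLinkedPlayer:
    def __init__(self, frame_to_keyframe, keyframe_timestamps, keyframe_to_frame):
        self.frame_to_keyframe = frame_to_keyframe      # video frame -> keyframe index
        self.keyframe_timestamps = keyframe_timestamps  # ascending, in seconds
        self.keyframe_to_frame = keyframe_to_frame      # keyframe index -> video frame
        self.mode = "video"

    def switch_to_animation(self, current_frame):
        """Return the animation start time for the frame now on screen."""
        keyframe = self.frame_to_keyframe[current_frame]
        self.mode = "animation"
        return self.keyframe_timestamps[keyframe]

    def switch_to_video(self, animation_time):
        """Return the video frame at which playback should resume."""
        # The current keyframe is the last one whose timestamp has been reached.
        keyframe = bisect.bisect_right(self.keyframe_timestamps, animation_time) - 1
        keyframe = max(keyframe, 0)
        self.mode = "video"
        return self.keyframe_to_frame[keyframe]
```

Each direction of the switch uses one half of the cross-link: video frame to keyframe timestamp when entering the animation, and current keyframe to video frame when returning.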

[0096] FIG. 18 depicts an alternative display 261 generated by playback of an animation in a playback system. In one embodiment, control bar 262 within display 261 includes icons for rewinding, playing, pausing, and stopping playback of the animation (i.e., icons REWIND, PLAY, PAUSE, and STOP). Control bar 262 also includes a resolution selector in the form of a slide bar 264 that allows the user of the playback system to indicate a relative preference for temporal versus spatial resolution in playback of the animation. By selecting slide 265 with a cursor control device and moving slide 265 left or right within slide bar 264, the user can adjust the preference between spatial and temporal resolution. For example, with slide 265 at the leftmost position within slide bar 264, a preference for maximum spatial resolution is indicated; when slide 265 is moved to the rightmost position within slide bar 264, a preference for maximum temporal resolution is indicated.
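One way to turn the slide position into a concrete selection is sketched below. The sketch assumes that the animation offers a fixed number of spatial resolution levels and temporal resolution levels, indexed from 0 (coarsest) upward, and that the slide position is normalized to the range 0.0 (leftmost) to 1.0 (rightmost). The linear mapping and the function name are illustrative assumptions; the specification does not prescribe a particular mapping.

```python
from typing import Tuple


def resolution_preference(position: float,
                          spatial_levels: int,
                          temporal_levels: int) -> Tuple[int, int]:
    """Map a slide position to (spatial level, temporal level).

    position 0.0 (leftmost) selects the finest spatial level and the coarsest
    temporal level; position 1.0 (rightmost) selects the opposite.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    if spatial_levels < 1 or temporal_levels < 1:
        raise ValueError("at least one level of each kind is required")
    spatial = round((1.0 - position) * (spatial_levels - 1))
    temporal = round(position * (temporal_levels - 1))
    return spatial, temporal
```

A playback system could pass the returned level indices to a server that controls the content of the animation data stream, so that only keyframes at the preferred resolutions are delivered.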

[0097] An ANIM/VIDEO icon is present in control bar 262 to allow the user to toggle between display of the cross-linked video and display of the animation. According to the embodiment shown in FIG. 18, when the animation is selected for display, the cross-linked video is displayed concurrently in a sub-window 268 in a picture-in-picture format. When the user clicks the ANIM/VIDEO icon, the video is displayed in the primary viewing area of display 261 and the animation is presented in sub-window 268. The picture-in-picture capability can be enabled or disabled from a menu (not shown) presented on display 261.

[0098] Cross-links between animation and video can be used to provide several useful effects. For example, by cross-linking a steerable image of a marketplace to the frames of a video that include a storefront, a user viewing the video can be prompted to switch to the panoramic image in order to shop for the goods and services depicted in the marketplace. Transactions for the goods and services can be carried out electronically over a communication network. Cross-linking steerable images with video is particularly effective when the steerable image is a panorama or other composite image of a location in a scene of the video, for example when the video includes a steerable environment (e.g., an airplane, spaceship, submarine, cruise ship, or building). Imagine, for instance, a scene in which a character on a cruise ship walks past a souvenir shop. The viewer can spontaneously and intuitively stop the video and browse the souvenir shop.

[0099] Another useful application of cross-linking allows a user to configure a video. The user can cross-link an animation sequence to the video so that the animation sequence is invoked automatically when the cross-linked frame of the video is reached. When the end of the animation sequence is reached, display of the video can resume at another cross-linked video frame. In this way, the user can selectively add outtakes to scenes of the video or replace portions of the video with animation sequences.
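The configured playback described above can be sketched as a generator that walks the video and detours into an animation sequence whenever a cross-linked frame is reached. The mapping `links`, the event tuples, and the rule that the resume frame must lie after the cross-linked frame are assumptions made to keep the sketch finite; they are not taken from the specification.

```python
from typing import Dict, Iterator, Tuple


def composed_playback(num_frames: int,
                      links: Dict[int, Tuple[str, int]]) -> Iterator[Tuple[str, object]]:
    """Yield ('video', frame) and ('animation', name) events in playback order.

    links maps a cross-linked video frame index to (animation name, resume frame).
    """
    frame = 0
    while frame < num_frames:
        yield ("video", frame)
        if frame in links:
            name, resume = links[frame]
            if resume <= frame:
                # Assumption of this sketch: resuming at or before the
                # cross-linked frame would loop forever, so it is rejected.
                raise ValueError("resume frame must follow the cross-linked frame")
            yield ("animation", name)   # the animation sequence plays to its end
            frame = resume              # the video resumes at the other cross-linked frame
        else:
            frame += 1
```

Setting the resume frame to the frame immediately after the cross-linked frame inserts the animation as an outtake; setting it further ahead replaces the skipped portion of the video with the animation sequence.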

[0100] The invention has been described above with reference to specific exemplary embodiments. It will be evident, however, that various modifications and changes may be made to the specific exemplary embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

[Brief description of the drawings]

FIG. 1 is a diagram showing the creation and distribution of an animation.

FIG. 2 is a block diagram of an animation production system according to one embodiment.

FIG. 3 is a block diagram of a background track generator according to one embodiment.

FIG. 4A is a diagram showing video segments identified by a scene change estimator within the background track generator.

FIG. 4B is a flowchart illustrating the operation of the background motion estimator, the background frame constructor, and the background mixing estimator depicted in FIG. 3.

FIG. 5 is a diagram showing a set of background images generated by the background frame constructor depicted in FIG. 3.

FIG. 6 is a block diagram of an object track generator according to one embodiment.

FIG. 7A is a diagram depicting video segments identified by the scene change estimator 41 of FIG. 3.

FIG. 7B is a flowchart 100 of the operation of the object track generator according to one embodiment.

FIG. 8 is a diagram of an animation object according to one embodiment.

FIG. 9A is a diagram showing an exemplary embodiment of a background frame mixing data structure that can be used to implement background mixing.

FIG. 9B is a diagram showing a discontinuous mixing function.

FIG. 10 is a diagram showing how the background track and the object tracks of an exemplary animation object can be used to synthesize interpolated frames during animation playback.

FIG. 11 is a diagram showing a technique for providing multiple temporal resolutions of animation keyframes.

FIG. 12 is a diagram showing a technique for providing multiple spatial resolutions of animation keyframes.

FIG. 13 is a diagram showing the use of a server system to control the content of an animation data stream delivered to a playback system.

FIG. 14A is a diagram showing the use of a cross-link generator to establish cross-links between a video source and an animation created from that video source.

FIG. 14B is a diagram showing the use of a cross-link generator to establish cross-links between a first video and an animation created from a second video.

FIG. 15 is a diagram showing a cross-link data structure according to one embodiment.

FIG. 16 is a flow diagram of the cross-link relationships between a sequence of video frames of a video source and background images from an animation.

FIG. 17 is a diagram depicting a display generated by the playback system.

FIG. 18 is a diagram depicting an alternative display generated by playback of an animation in the playback system.

Continuation of front page

(31) Priority claim number: 09/096,487
(32) Priority date: June 11, 1998
(33) Priority claim country: United States (US)
(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SL, SZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZA, ZW
(72) Inventor: Jonathan Brandt, 377 Rider Ridge Road, Santa Cruz, California 95065, United States
F-term (reference): 5B050 AA08 BA06 BA08 BA09 BA10 BA11 BA13 CA05 CA06 CA08 DA04 DA07 EA03 EA06 EA09 EA13 EA18 EA19 EA24 EA27 EA28 FA02 FA05 FA06 GA08

[Continuation of abstract] ... is less than the number of keyframes in the first sequence. For linking video and animation, a data structure is generated that includes elements corresponding to respective frames of a first video. Information indicating an animation created from a second video is stored in one or more elements of the data structure.

Claims


1. A computer-implemented method of creating an animation, the method comprising: examining a sequence of video images to identify a first transformation of a scene depicted in the sequence of video images; obtaining a first image and a second image from the sequence of video images, the first image representing the scene before the first transformation and the second image representing the scene after the first transformation; and generating information that indicates the first transformation and that can be used to interpolate between the first image and the second image to create a video effect that approximates display of the sequence of video images.

2. The method of claim 1, wherein examining the sequence of video images to identify the first transformation of the scene comprises determining when a difference between a selected one of the video images and a next one of the video images exceeds a threshold, the selected one of the video images and the next one of the video images indicating a start image and an end image, respectively, of a segment of the video images.

3. The method of claim 2, wherein the start image of the segment of the video images indicates an end image of a preceding segment of the video images.

4. The method of claim 2, wherein determining when the difference between the selected one of the video images and the next one of the video images exceeds the threshold comprises: selecting a video image that follows the start image from the sequence of video images; comparing the video image that follows the start image with an adjacent preceding video image from the sequence of video images to generate an incremental difference value; adding the incremental difference value to a sum of incremental difference values; and repeating the selecting, comparing, and adding until the sum of incremental difference values exceeds the threshold.

5. The method of claim 4, wherein the next one of the video images is the video image used to generate the incremental difference value that, when added to the sum of incremental difference values, causes the sum of incremental difference values to exceed the threshold.

6. The method of claim 5, wherein the end image of the segment of the video images is adjacent to the next one of the video images.

7. The method of claim 2, wherein the difference between the selected one of the video images and the next one of the video images includes a difference caused by a change in the disposition of a camera used to record the sequence of video images.

8. The method of claim 2, wherein the difference between the selected one of the video images and the next one of the video images includes a color difference.

9. The method of claim 2, wherein the difference between the selected one of the video images and the next one of the video images includes a difference in elapsed time between the selected one of the video images and the next one of the video images.

10. The method of claim 2, wherein obtaining the first image and the second image from the sequence of video images comprises selecting the start image and the end image of the segment of the video images as the first image and the second image, respectively.

11. The method of claim 2, wherein obtaining the second image from the sequence of video images comprises: identifying one or more dynamic objects in the end image; and removing the one or more dynamic objects from the end image to create the second image.

12. The method of claim 11, wherein identifying the one or more dynamic objects in the end image comprises identifying one or more features in the segment of the video images that undergo a second transformation that is not indicated by the first transformation.

13. The method of claim 12, wherein the second transformation comprises a change in the disposition of the one or more dynamic objects that is not caused by a change in the disposition of a camera used to record the sequence of video images.

14. The method of claim 1, wherein generating the information that indicates the first transformation and that can be used to interpolate between the first image and the second image comprises: generating a value indicating a degree of change; and generating a value indicating the time elapsed between display of the first image and display of the second image.

15. The method of claim 14, wherein generating the value indicating a degree of change comprises generating a value indicating a degree of change caused by a change in the disposition of a camera used to record the sequence of video images.

16. The method of claim 14, wherein generating a value indicative of a degree of change comprises generating a value indicative of a degree of color change.

17. A computer-implemented method of creating an animation, the method comprising: identifying a first transformation of a scene depicted in a sequence of video images, the first transformation indicating a change in the disposition of a camera used to record the sequence of video images; identifying a second transformation of the scene depicted in the sequence of video images, the second transformation indicating a change in the position of an object in the scene; removing respective regions that include the object from first and second images of the sequence of video images to generate first and second background images; and generating background information that indicates the first transformation and that can be used to interpolate between the first and second background images to create an interpolated background image that is displayable to approximate the first transformation of the scene.

18. The method of claim 17, further comprising: generating first and second object images that include the respective regions removed from the first and second images of the sequence of video images, the first object image representing the object before the second transformation and the second object image representing the object after the second transformation; and generating object information that indicates the second transformation and that can be used to interpolate between the first and second object images to create an interpolated object image that is displayable to approximate the change in the position of the object in the scene.

19. The method of claim 18, further comprising: storing the first and second background images and the background information in a background track of an animation object; and storing the first and second object images and the object information in an object track of the animation object.

20. The method of claim 19, further comprising transmitting the animation object to a computer network in response to a request from an animation playback device.

21. An animation production system comprising: a background track generator that examines a sequence of video images and generates a background track therefrom, the background track including a sequence of background frames and transformation information that can be used to interpolate between the background frames to synthesize additional background images; and an object track generator that examines the sequence of video images and generates an object track therefrom, the object track including a sequence of object frames and transformation information that can be used to interpolate between the object frames to synthesize additional object images.

22. The animation production system according to claim 21, further comprising an animation object generator for storing a background track and an object track in the animation object for later recall.

23. An animation distribution system comprising: the animation production system of claim 22; and a communication device that receives requests to download the animation object from one or more client devices and, in response, transmits the animation object to the one or more client devices.

24. The animation production system according to claim 22, wherein reproduction time information is stored in the animation object to indicate a relative reproduction time with respect to the object track and the background track.

25. The animation production system of claim 21, wherein at least one of the background track generator and the object track generator is implemented by a programmed processor.

26. The animation production system of claim 21, wherein the background track generator comprises: a scene change estimator that resolves the sequence of video images into one or more video segments; a background motion estimator that generates the transformation information based on respective transformations in the one or more video segments; and a background frame constructor that generates the sequence of background frames based on the respective transformations in the one or more video segments.

27. The animation production system according to claim 26, wherein the background track generator further comprises a mixing estimator for generating mixing information for combining background frames in a sequence of background frames.

28. The animation production system according to claim 27, wherein the mixing information indicates a cross-dissolve operation.

29. The animation production system, wherein the background frame constructor generates at least one background frame of the sequence of background frames by combining one or more images from the one or more video segments.

30. The animation production system according to claim 29, wherein the background frame constructor combines one or more images into a panoramic image by stitching the one or more images.

31. The animation production system according to claim 29, wherein the background frame constructor combines one or more images into a high resolution image.

32. A computer-readable medium having instructions stored thereon which, when executed by a processor, cause the processor to: examine a sequence of video images to identify a first transformation of a scene depicted in the sequence of video images; obtain a first image and a second image from the sequence of video images, the first image representing the scene before the first transformation and the second image representing the scene after the first transformation; and generate information that indicates the first transformation and that can be used to interpolate between the first image and the second image to create a video effect that approximates display of the sequence of video images.

33. The computer readable medium of claim 32, wherein the computer readable medium includes one or more mass storage disks.

34. The computer readable medium according to claim 33, wherein the computer readable medium is a computer data signal encoded on a carrier wave.

35. The computer-readable medium of claim 33, wherein the instructions that cause the processor to examine the sequence of video images to identify the first transformation of the scene include instructions which, when executed, cause the processor to determine when a difference between a selected one of the video images and a next one of the video images exceeds a threshold, the selected one of the video images and the next one of the video images indicating a start image and an end image, respectively, of a segment of the video images.

36. The computer-readable medium of claim 35, wherein the instructions that cause the processor to determine when the difference between the selected one of the video images and the next one of the video images exceeds the threshold include instructions which, when executed, cause the processor to: select a video image that follows the start image from the sequence of video images; compare the video image that follows the start image with an adjacent preceding video image from the sequence of video images to generate an incremental difference value; add the incremental difference value to a sum of incremental difference values; and repeat the selecting, comparing, and adding until the sum of incremental difference values exceeds the threshold.

37. A computer-readable medium having stored thereon data for displaying a sequence of images from an animation, the animation having been created by: examining a sequence of video images to identify a first transformation of a scene depicted in the sequence of video images; obtaining, from the sequence of video images, a first image representing the scene before the first transformation and a second image representing the scene after the first transformation; and generating information that indicates the first transformation and that can be used to interpolate between the first image and the second image to create a video effect that approximates display of the sequence of video images.

38. A method of linking a video and an animation, the method comprising: generating a data structure that includes elements corresponding to respective frames of a first video; and storing, in one or more of the elements of the data structure, information indicating an image of an animation created from a second video.

39. The method of claim 38, wherein generating a data structure including the element comprises generating a data structure including a respective element for each frame of the first video.

40. The method of claim 38, wherein storing information indicative of an image of the animation comprises storing a reference value indicative of a key frame of the animation.

41. The method of claim 40, wherein storing the reference value indicating a key frame of the animation comprises storing a reference value indicating a background frame of an animation object.

42. The method of claim 41, wherein storing the reference value indicating a background frame comprises storing an address of a background frame data structure, the background frame data structure including information indicating a background image and information indicating whether the background image is a composite image.

43. The method of claim 42, wherein the information indicating whether the background image is a composite image includes information indicating whether the background image is a panoramic image.

44. The method of claim 38, wherein the data structure is an array of elements.

45. The method, wherein the data structure is a linked list of elements.

46. The method of claim 38, wherein the first video and the second video are the same video.

47. The method of claim 38, wherein the first video is generated using the second video.

48. The method of claim 38, wherein the animation comprises a high resolution still image.

49. The method, wherein the animation comprises a multi-resolution still image having first and second regions, the first region having a higher pixel resolution than the second region.

50. The method of claim 38, wherein the animation comprises a still image having a wider field of view than the frames of the first video.

51. The method of claim 38, wherein the animation comprises a still image having a wider dynamic range than the frames of the first video.

52. The method of claim 38, wherein the animation comprises a still image having an aspect ratio different from an aspect ratio of a frame of the first video.

53. The method of claim 38, wherein the animation comprises a pair of still images forming a stereo image pair.

54. The method of claim 38, wherein the animation comprises an image that includes depth information.

55. The method of claim 38, wherein the animation comprises an object having three-dimensional geometric properties.

56. The method of claim 38, wherein the text description is associated with at least one image of the animation.

57. The method of claim 38, wherein the animation comprises an animation object having a plurality of elements corresponding to images of the animation, the method further comprising storing, in one or more of the plurality of elements of the animation object, information indicating one or more frames of the first video.

58. The method of claim 38, wherein the animation comprises an animation object having a plurality of elements corresponding to images of the animation, the method further comprising storing, in one or more of the plurality of elements of the animation object, information indicating a sequence of frames.

59. A method of displaying a video on a playback system, the method comprising: displaying a frame of the video on a display of the playback system; examining a data element associated with the frame of the video to identify an animation keyframe corresponding to the frame of the video, the animation keyframe having been automatically generated using the frame of the video; and prompting a user of the playback system to initiate display of an image associated with the animation keyframe.

60. The method of claim 59, further comprising: determining whether the image associated with the animation keyframe is a composite image; and, if the image associated with the animation keyframe is a composite image, signaling the user that the composite image can be viewed.

61. The method of claim 60, wherein determining whether the image associated with the animation keyframe is a composite image comprises determining whether the image associated with the animation keyframe is a panoramic image.

62. The method of claim 61, further comprising: receiving a request from the user to view the panoramic image; and executing, in response to the request from the user, program code that displays the panoramic image in response to steering input from the user.

63. The method of claim 62, wherein the steering input from the user includes a command to pan a perspective view of the scene depicted in the panoramic image in the horizontal direction.

64. The method of claim 62, wherein the steering input from the user includes a command to tilt a perspective view of the scene depicted in the panoramic image.

65. The method of claim 59, wherein determining whether the image associated with the animation keyframe is a composite image comprises determining whether the image associated with the animation keyframe is a high-resolution still image.

66. The method of claim 65, further comprising: receiving a request from the user to view the high-resolution still image; and executing, in response to the request from the user, program code that scales the view of the high-resolution still image in response to zoom input from the user.

67. The method of claim 59, wherein prompting the user of the playback system to initiate display of the image associated with the animation keyframe comprises displaying an indicator on the display of the playback system to signal the user that the image associated with the animation keyframe can be viewed.

68. The method of claim 59, wherein prompting the user of the playback system to initiate display of the image associated with the animation keyframe comprises activating an indicator on the display of the playback system to signal the user that the image associated with the animation keyframe can be viewed.

69. The method of claim 68, wherein activating the indicator on the playback system comprises activating an indicator on a handheld control of the playback system.

70. A method of displaying a video on a playback system, the method comprising: displaying a frame of the video on a display of the playback system; examining a data element associated with the frame of the video to identify an animation keyframe corresponding to the frame of the video, the animation keyframe having been automatically generated using the frame of the video; and displaying an image associated with the animation keyframe in a window on the display concurrently with display of the frame of the video.

71. A playback system comprising: a processor; a display coupled to the processor; a media reader coupled to the processor; and a memory coupled to the processor, the memory having stored therein program code which, when executed, causes the processor to: signal the media reader to supply, from a machine-readable medium, video data that includes a sequence of video frames and data structure elements associated with the video frames; display the sequence of video frames on the display; examine the data structure elements associated with the video frames to identify an animation keyframe corresponding to one or more of the video frames, the animation keyframe having been automatically generated using one or more of the video frames; and prompt a user of the playback system to initiate display of an image associated with the animation keyframe.

72. A method comprising: displaying a frame of a video on a display of a playback system; receiving input from a user requesting a switch from display of the video to display of a steerable image of a three-dimensional object associated with the frame of the video; and displaying the steerable image.

73. The method of claim 72, further comprising panning a perspective view of the steerable image in response to input from a user.

74. The method of claim 72, further comprising processing a sale of the item depicted in the steerable image in response to input from a user.

75. The method of claim 72, further comprising processing an arrangement to perform a service indicated by one or more characteristics of the steerable image in response to input from a user.

76. The method of claim 72, further comprising zooming a perspective view of the steerable image in response to input from a user.

77. The method of claim 72, wherein the steerable image is a panoramic image of a market that includes merchandise that can be purchased electronically.

78. The method according to claim 72, wherein the steerable image includes one or more three-dimensional objects.

79. A method comprising: displaying a frame of a video on a display of a playback system; and receiving input from a user requesting a switch from display of the video to display of a three-dimensional object associated with the frame of the video.

80. The method of claim 79, further comprising changing, in response to input from the user, the viewpoint from which the three-dimensional object is displayed.

81. A computer-readable medium having stored thereon data for displaying a sequence of images from an animation, the animation having been linked to a video by: generating a data structure that includes elements corresponding to respective frames of a first video; and storing, in one or more of the elements of the data structure, information indicating an image of the animation created from a second video.

82. A method comprising: storing, in an animation object, a set of keyframes created from a video; storing, in the animation object, one or more values indicating a first sequence of keyframes selected from the set of keyframes, together with information for interpolating between the keyframes of the first sequence; and storing, in the animation object, one or more values indicating a second sequence of keyframes selected from the set of keyframes, together with information for interpolating between the keyframes of the second sequence, the number of keyframes in the second sequence being less than the number of keyframes in the first sequence.

83. The method of claim 82, wherein storing the set of keyframes comprises storing first and second subsets of keyframes in the animation object, the second subset of keyframes comprising reduced-resolution versions of images included in the first subset of keyframes.

84. The method of claim 82, wherein each of the one or more values indicating the first sequence of selected keyframes is a reference value that identifies a respective keyframe in the set of keyframes.
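
By way of illustration only, and without limiting the claims, the accumulate-and-compare procedure recited in claims 2 through 5 could be sketched as follows. The function name, the caller-supplied `difference` measure, and the choice to return the last frame when the threshold is never exceeded are assumptions made for this sketch.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def find_threshold_frame(frames: Sequence[Frame],
                         start: int,
                         threshold: float,
                         difference: Callable[[Frame, Frame], float]) -> int:
    """Return the index of the frame whose incremental difference pushes the
    running sum over the threshold, measured from the start frame.

    difference(previous, current) is a hypothetical measure of change between
    adjacent frames (for example camera motion, color change, or elapsed time).
    If the threshold is never exceeded, the index of the last frame is returned.
    """
    total = 0.0
    for i in range(start + 1, len(frames)):
        # Compare each frame with its adjacent preceding frame and accumulate.
        total += difference(frames[i - 1], frames[i])
        if total > threshold:
            return i
    return len(frames) - 1
```

Calling the function again with the returned index as the new start frame divides the sequence into consecutive segments, so that the start of each segment coincides with the end of the preceding one.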