WO2009151755A2 - Video processing - Google Patents

Video processing

Info

Publication number
WO2009151755A2
WO2009151755A2
Authority
WO
WIPO (PCT)
Prior art keywords
mosaic
tracks
frame
video
frames
Prior art date
Application number
PCT/US2009/039728
Other languages
English (en)
Other versions
WO2009151755A3 (fr)
Inventor
Andrew Fitzgibbon
Alexander Rav-Acha
Pushmeet Kohli
Carsten Rother
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation filed Critical Microsoft Corporation
Priority to CN2009801282731A priority Critical patent/CN102100063B/zh
Publication of WO2009151755A2 publication Critical patent/WO2009151755A2/fr
Publication of WO2009151755A3 publication Critical patent/WO2009151755A3/fr

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/144Movement detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/147Scene change detection

Definitions

  • a method and apparatus for processing video is disclosed.
  • image features of an object within a frame of video footage are identified and the movement of each of these features is tracked throughout the video footage to determine its trajectory (track).
  • the tracks are analyzed, and the maximum separation of the tracks is determined and used to derive a texture map, which is in turn interpolated to provide an unwrap mosaic for the object.
  • the process may be iterated to provide an improved mosaic. Effects or artwork can be overlaid on this mosaic and the edited mosaic can be warped via the mapping, and combined with layers of the original footage. The effect or artwork may move with the object's surface.
  • FIG. 1 shows four example frames from a portion of video footage
  • FIG. 2 shows a foreground object identified from the footage of FIG.1
  • FIG. 3 shows a plurality of tracks generated from the video footage
  • FIG. 4 shows a 2D map generated from the tracks shown in FIG. 3;
  • FIG. 5 shows a mosaic generated from the 2D map shown in FIG.4;
  • FIG. 6 shows an edited mosaic;
  • FIG. 7 shows four edited example frames from a portion of video footage
  • FIG. 8 is a flow diagram of a method for carrying out video editing
  • FIG. 9 is a flow diagram showing a step of the method in FIG. 8 in greater detail
  • FIG. 10 is a flow diagram showing a method of allowing user interaction in the editing process.
  • FIG. 11 illustrates an exemplary computing-based device in which embodiments of video editing may be implemented.
  • FIG. 1 shows a representation of four example frames 100, 102, 104, 106 taken from a portion of video footage.
  • the footage comprises a man 108 standing in front of a mountain vista 110, who turns from right to left with a gradually broadening smile.
  • the background mountain vista 110 also changes over time as a cloud 112 passes in front of the sun 114.
  • in the first frame 100 the man 108 is looking to his right. His left ear can be seen, as can a mole to the left of his nose.
  • in the second example frame 102, taken from the video footage a few seconds later, the man 108 has turned to face the camera. His left ear is no longer visible.
  • the cloud 112 has drifted over the sun 114.
  • in the third example frame 104 the man 108 remains facing the camera and has begun to smile broadly. His mole is now partially obscured by his nose.
  • the cloud 112 has drifted further over the sun 114.
  • in the fourth example frame 106 the man 108 has turned to look to his left. The mole is now completely obscured by his nose and his right ear can be seen.
  • the video comprises a total of T frames.
  • the real-life 3D surface currently in view is captured as a 2D projection.
  • the 3D surface is designated S(u) herein, and the 2D projection is designated I(x), where u is a 2D vector with coordinates (u,v) and x is a 2D vector in image space with coordinates (x,y).
  • an 'occlusion mask' or 'object space visibility map' b(u) is defined, which has the value 1 if S(u) is visible and 0 otherwise.
  • each point u on the surface has a color and this can be used to build a texture map C(u) of the surface.
  • Each point in S(u) can be mapped to a point in C(u) using a 2D to 2D mapping, herein designated w(u).
  • Each pixel in I(x) also has a color. As the light incident on the camera is being focused, the color of the pixel will depend on a function of all the colors in the area of the surface which is focused onto that pixel. The area which is focused is dependent on the 'point-spread function' p of the camera.
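  • purely as an illustrative sketch (not from the patent), a discretised version of this image-formation model might be written as follows; the function names, the Gaussian point-spread function and the array layout are assumptions:

```python
# Illustrative sketch only: each surface sample u with colour C(u) and
# visibility b(u) is placed in the image at w(u) and blurred by a Gaussian
# point-spread function p. All names here are assumptions.
import numpy as np

def render(colours, visibility, positions, image_shape, psf_sigma=1.0):
    """colours: (N, 3) C(u); visibility: (N,) b(u) in {0, 1};
    positions: (N, 2) image-space coordinates w(u) as (x, y)."""
    H, W = image_shape
    image = np.zeros((H, W, 3))
    weight = np.zeros((H, W, 1))
    ys, xs = np.mgrid[0:H, 0:W]
    for c, b, (x, y) in zip(colours, visibility, positions):
        if b == 0:
            continue  # occluded surface points contribute nothing
        # Gaussian point-spread function centred on the projected position.
        psf = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * psf_sigma ** 2))
        image += psf[..., None] * c
        weight += psf[..., None]
    return image / np.maximum(weight, 1e-8)  # normalised I(x)
```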
  • a further modification can be made to consider multiple objects in the sequence. If the number of surfaces (including the background) is L (in the example of the figures, L is 4: the man 108, the background 110, the cloud 112 and the sun 114), each object can be represented by the tuple of functions (C_l, w_l, b_l).
  • the images can then be defined as a composition of the L object layers.
  • the visibility masks b are now encoding inter-object occlusions (such as when the cloud 112 obscures the sun 114) as well as self-occlusions.
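  • purely as an illustration (not part of the patent text), a back-to-front composition of L such layers might be sketched as follows; the function and variable names are assumptions:

```python
import numpy as np

def compose_layers(layer_colours, layer_masks):
    """layer_colours: list of (H, W, 3) rendered layers, ordered back-to-front;
    layer_masks: list of (H, W) visibility masks b_l in [0, 1]."""
    out = np.zeros_like(layer_colours[0], dtype=float)
    for colour_l, mask_l in zip(layer_colours, layer_masks):
        # A nearer layer overwrites whatever lies behind it where it is
        # visible, so the masks encode inter-object as well as self-occlusion.
        out = mask_l[..., None] * colour_l + (1.0 - mask_l[..., None]) * out
    return out
```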
  • the mosaic is made up of a sampling of the set of frames, and the processing in this example encourages the sampling such that each pixel in the mosaic is taken from the input frame t in which it is most fronto-parallel (i.e. forward facing).
  • the video footage is uploaded onto a computing device such as the device 1000 shown in FIG. 11 (block 802).
  • the footage is separated into frames and each frame is segmented into independent objects (block 804).
  • the objects are the man 108, the background 110, the cloud 112 and the sun 114.
  • the computing device 1000 is able to compute segmentation maps automatically.
  • the method may include allowing or requiring user interaction to indicate the different objects. As will be familiar to the skilled person, this may comprise utilizing an input device to indicate which is a background object and which is a foreground object. This may comprise applying a 'brushstroke' to the image, for example using a mouse of the computing device 1000.
  • the user could in some examples provide an input in relation to several frames and the segmentation can be propagated using optical flow.
  • Manual input for segmentation may be appropriate for footage with many foreground objects with similar color statistics or background object(s) which exhibit similar movement to the foreground object(s), or any other footage that may confuse automatic algorithms.
  • FIG. 2 shows the man 200 once isolated from the background 110 following segmentation of the first example frame 100, and this segmentation is propagated through the frames.
  • FIG. 2 also shows a number of points 202, 204, 206, 208, 210 which have been determined (block 806) and which are associated with features of the image.
  • the points which are determined are selected on the basis that they can be tracked throughout the video footage. This is carried out by identifying easily recognizable features at points of high contrast or change within the image. In this example, the five points identified are the corner of the right ear hole 202, the mole 204, the right hand corner of the mouth 206, the right edge of the right eyebrow 208 and the right edge of the left eyebrow 210. It will be understood that in practical applications of the method there may be many more points (perhaps many thousands) identified, but the number of points shown herein is limited for reasons of clarity. The number of points identified may depend on the particular video and other examples may have, for example, between 100 and 5000 points.
  • the next stage is to track each of these points through the video footage.
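  • by way of illustration only, such a detect-and-track step could be sketched with OpenCV as below; the patent does not mandate a particular detector or tracker, so the use of goodFeaturesToTrack and pyramidal Lucas-Kanade here is an assumption:

```python
import cv2
import numpy as np

def track_points(frames, max_points=2000):
    """frames: list of BGR images. Returns an array of tracks (N, T, 2)."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    # Detect corner-like, high-contrast points that are easy to follow.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_points,
                                  qualityLevel=0.01, minDistance=7)
    tracks = [pts.reshape(-1, 2)]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade optical flow from the previous frame.
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        tracks.append(nxt.reshape(-1, 2))
        pts, prev_gray = nxt, gray
    # Note: lost points (status == 0) would need masking in a real pipeline.
    return np.stack(tracks, axis=1)
```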
  • the point tracks 302, 304, 306, 308, 310 can be viewed as a multi-dimensional projection of the 2D surface parameters. As is discussed in greater detail below, in this example, each point is usually represented in the mosaic at the position along its trajectory where the track is furthest from all other tracks (i.e. where the tracks are optimally spaced).
  • the surface's parameter space can be recovered by computing an embedding of the point tracks into 2D, yielding coordinates for each track.
  • Such embeddings are known from the field of visualization, and have been used to create texture coordinates for 3D models, so will be familiar to the skilled person.
  • in some examples the difference between tracks may be a 2-dimensional vector, and in other examples the difference may be a vector norm (a scalar).
  • the separation of points in parameter space is commensurate with distances in the image frames at which pairs of tracks are maximally separated. This is also likely to be the point at which the underlying point is 'face on' in the image.
  • a 'softmax' function could be used to determine the difference.
  • a softmax function is a neural transfer function which calculates an output from a net input, and will also encourage the selection of fronto-parallel points.
  • this allows a map 400 to be created (block 810), such as is shown schematically in FIG. 4.
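  • one way such an embedding could be computed is sketched below (an assumption; the patent does not name a specific embedding algorithm): the maximum separation of each pair of tracks is used as their dissimilarity, and multidimensional scaling recovers 2D coordinates commensurate with those distances:

```python
import numpy as np
from sklearn.manifold import MDS

def embed_tracks(tracks):
    """tracks: (N, T, 2) point positions over T frames. Returns (N, 2) coords.
    (This naive sketch assumes every point is visible in every frame.)"""
    diffs = tracks[:, None, :, :] - tracks[None, :, :, :]   # (N, N, T, 2)
    per_frame = np.linalg.norm(diffs, axis=-1)               # (N, N, T)
    max_separation = per_frame.max(axis=-1)                   # (N, N), symmetric
    # Metric MDS on the precomputed dissimilarities yields u-coordinates.
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
    return mds.fit_transform(max_separation)
```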
  • Interpolating this map 400 allows each image to be warped into a common frame (block 812), which can be represented as a mosaic such as the mosaic 500 shown in FIG. 5.
  • Each point in the map 400 is replaced with the pixel corresponding to the feature in the video at its selected coordinates along its trajectory (which will correspond to a particular frame of the video footage).
  • the interpolation uses a variation of the standard mosaic stitching technique, which has the property that it often chooses a good texture map, even when some per-frame mappings have large errors. This is discussed in greater detail below.
  • the result is a mosaic such as the mosaic 500 shown in FIG. 5. This has the appearance of the object having been 'unwrapped', as if the surface has been peeled off the underlying object.
  • the initial mosaic may not be the best representation of the footage, but will generally be good enough to create a reference template to match against the original frames. Because this matching is to a single reference, it reduces any 'drift' that may have been present after the original tracking phase. 'Drift' might occur if, for example, a shadow moves over an object and the shadow, rather than the feature, is tracked or if two similar features are close together and the track mistakenly follows first one, then the other.
  • Regularization of the mapping defines a dense interpolation, so that track information propagates to occluded areas of the scene, giving a complete description of the object motion.
  • the mapping is defined only at the feature points, and is reliable only there.
  • regularization "smooths" the mapping, it extends the mapping's definition from feature points to all uv space, although it is less reliable the farther a point is from a feature point.
  • the term 'energy minimization' is used to refer to an optimization procedure which tries to obtain a solution which is the best choice under a cost or "energy" function.
  • the energy function associated with a solution measures its consistency with regard to observed data and prior knowledge.
  • a first step in the energy minimization procedure described herein comprises track refinement (block 902).
  • Track refinement allows an improved estimate of inter-track distances (i.e. track differences), so an improved embedding and mosaic can be achieved by iterating these processes. Because, as is described in greater detail below, each stage minimizes the same energy function, a consistent global convergence measure can be defined, and the end-point for iteration can be determined. For example, the iteration may be carried out until the change in E is less than 1%.
  • a set of tracks can be assessed and refined as follows.
  • the first term in the energy is the data cost, which encourages the solution to be consistent with the observed data. It encourages the model to predict the input sequence, and in particular to explain every input image pixel and therefore identify the best tracks. The input frames are denoted I(·, t).
  • the basic form of the data cost is the sum, over pixels and frames, of a robust norm of the difference between each observed pixel and the corresponding model prediction,
  • where the robust norm is min(|| · ||, ε) and
  • the robust kernel width ε is set to match an estimate of image noise. For relatively noise-free video sequences, this can be set low, e.g. 10/255 gray-levels. For relatively noisy video sequences, this can be set higher, e.g. 50/255 gray-levels.
  • This cost is a discrete sum over the point samples, but contains a continuous integral in the evaluation of I(x, t). Evaluating the integral yields the discrete model.
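  • as an illustrative sketch only (the names, array layout and the omission of the visibility masks are assumptions), the robust data cost described above might look like:

```python
import numpy as np

def robust_data_cost(observed, predicted, eps=10.0 / 255.0):
    """observed, predicted: (T, H, W, 3) image stacks in [0, 1].
    Returns the sum over pixels and frames of the truncated (robust) norm."""
    residual = np.linalg.norm(observed - predicted, axis=-1)  # (T, H, W)
    return np.minimum(residual, eps).sum()                    # min(||r||, eps)
```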
  • the mosaic uses the early frames of the video footage, e.g. the first example frame 100, for the left hand side of the face, as this portion was 'face on' in these early frames.
  • the latter frames characterized by the broad smile, e.g. the fourth example frame 106, are used in creating the right hand side of the face. This can also be optimized by ensuring that w is smoothly mapped.
  • mapping w is a proxy for the projection of a 3D surface, which is assumed to be undergoing smooth deformations over time. A relatively smooth camera motion is also assumed. The mapping is encouraged to be fronto-parallel in at least one frame. Without camera roll or zoom this could be expressed as an energy requiring that
  • the mapping Jacobian should be close to the identity in at least one frame.
  • an overall 2D affine transformation for each frame, H_t, is estimated (as will be familiar to the skilled person, an 'affine transformation' is a transformation between two vector spaces which comprises a linear transformation followed by a translation) and the temporal warp regularizer E_w,temporal is minimized.
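  • for illustration only, such a per-frame affine transformation could be estimated from the track correspondences by linear least squares, roughly as below; the fitting procedure and the names are assumptions, not the patent's prescription:

```python
import numpy as np

def fit_affine(u, x):
    """u: (N, 2) parameter-space coordinates of the tracks;
    x: (N, 2) their observed image positions in frame t.
    Returns a 2x3 affine matrix H_t such that x ~ H_t @ [u, 1]."""
    U = np.hstack([u, np.ones((u.shape[0], 1))])   # (N, 3) homogeneous coords
    H_t, *_ = np.linalg.lstsq(U, x, rcond=None)    # least-squares solution
    return H_t.T                                   # (2, 3)
```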
  • This regularizer leads to a way to initialize the parameterization.
  • a "weak membrane” model uses Markov Random Fields within the Bayesian inference framework for image reconstruction and segmentation problems.
  • the temporal visibility term takes the form E_b,temporal = Σ_{u,t} Potts( b(u, t), b(u + Δu(u, t), t + 1) ),
  • where Δu(u, t) = J(u, t)^{-1} ( w(u, t + 1) - w(u, t) ), using the Jacobian to convert local displacements in the image into displacements on the mosaic.
  • a final regularizing term encourages the texture map C to have the same texture statistics as the input sequence. Neighboring pixels in the texture map are encouraged to come from the same input image by adding the texture prior described below.
  • A linear combination of the above terms yields the overall energy.
  • The energy is written as a function of the discrete variables C, w, b:
  • E(C, w, b) = E_data(C, w, b) + λ_1 E_w(w) + λ_2 E_w,temporal(w) + λ_3 E_b(b) + λ_4 E_b,temporal(b)
  • the scale parameter in the embedding distance calculation is set higher for less detailed images (or portions of images) (e.g. about 40 pixels) and lower for more detailed images (or portions of images) (e.g. about 10 pixels). This parameter can also be utilized as a convergence control, by starting from a high value and reducing it.
  • the texture map can be initialized by choosing, for each mosaic point u, the source frame s* = argmin_s Σ_t ρ( I_w(u, t) - I_w(u, s) ), where ρ is the robust norm and I_w(·, t) denotes frame t warped into the mosaic coordinate frame, and setting C(u) = I_w(u, s*).
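  • a minimal sketch of this initialization, assuming the frames have already been warped into the mosaic frame and using the truncated robust norm from the data cost as ρ (all names are illustrative):

```python
import numpy as np

def init_texture_map(warped_frames, eps=10.0 / 255.0):
    """warped_frames: (T, H, W, 3) frames warped into the mosaic frame.
    Returns the (H, W, 3) texture map C(u)."""
    T, H, W, _ = warped_frames.shape
    cost = np.zeros((T, H, W))
    for s in range(T):
        # Robust residual of candidate source frame s against every frame t.
        residual = np.linalg.norm(warped_frames - warped_frames[s:s + 1], axis=-1)
        cost[s] = np.minimum(residual, eps).sum(axis=0)   # sum over t
    s_star = cost.argmin(axis=0)                          # per-pixel source frame
    ys, xs = np.mgrid[0:H, 0:W]
    return warped_frames[s_star, ys, xs]                  # C(u) = I_w(u, s*)
```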
  • a texture prior may be added to the original energy, which encourages adjacent pixels in the texture map to be taken from the same input frame, yielding an energy with an additional texture term.
  • One variable which does not appear as an explicit parameter of the energy functional relates to the parameterization of u space.
  • the data cost E_data is, by construction, invariant to reparametrization, but the regularizers are not. Specifically, the value of the regularization terms of the energy function is dependent on the parameterization of the u space (because derivatives with respect to u appear in these terms). This is not the case for the data cost, which is independent of how the u space is represented or encoded.
  • the initialization of the overall algorithm consists in obtaining sparse point tracks using standard computer vision techniques as was described above.
  • the i-th track is the set { x(u_i, t) : t ∈ T_i }, where
  • T_i is the set of frame indices in which the point is tracked and
  • u_i is the unknown pre-image of the track in parameter space. Finding these u_i will anchor all other computations.
  • the input x may be viewed as samples of w at some randomly spaced points whose locations are not given, but can be discovered; u must be found for each value of x, using the regularizer which involves derivatives with respect to u.
  • Finding the optimal parameterization then consists of assigning the u_i values such that the warp regularizer terms E_w(w) + E_w,temporal(w) are minimized. For a given pair of tracks, with coordinates u_i and u_j, the energy of the mapping which minimizes the regularizer (subject to the mapping being consistent with the tracks) must be determined.
  • Each sub-problem is a quadratic form of size 1000 x 1000, which can be computed in a few minutes on a typical home computer.
  • the mosaic size is naturally selected by this process: because distances in (u, v) space are measured in pixels, and because pairs of points are encouraged to be as far apart as their greatest separation in the input sequence, a bounding box of the recovered coordinates is sized to store the model without loss of resolution.
  • a dense mapping w may be obtained given the tracks and their embedding coordinates. In this case, min_w E_w(w) + E_w,temporal(w) is solved.
  • occlusion may be taken into account, which requires minimizing over w and b.
  • the energy for the update is implemented as a variant of robust optical flow, alternating search for Δw and b on a multiresolution pyramid.
  • linearizing the data cost as E_data(Δw) = Σ_u b(u) || I_w(u) - C(u + Δw(u)) ||² gives a linear system in Δw which is readily solved.
  • Temporal smoothness is imposed via a forward/backward implementation where the mapping w and the mask b of the previous frame are transformed to the coordinate system of the current frame using the image-domain optic flow between frames, and added as a prior to the current estimate, as follows:
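  • as a hedged sketch of this forward/backward step (assuming the mapping and mask are stored as dense per-pixel float32 arrays and that a dense image-domain optical flow is available), the previous-frame estimate could be warped into the current frame roughly as follows:

```python
import cv2
import numpy as np

def warp_previous_estimate(prev_w, prev_b, flow):
    """prev_w: (H, W, 2) float32 previous-frame mapping w; prev_b: (H, W)
    float32 previous-frame mask b; flow: (H, W, 2) image-domain optical flow
    taking previous-frame pixels to current-frame pixels."""
    H, W = prev_b.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # For each current-frame pixel, look up the previous estimate at the
    # position it came from (backward warp, assuming small displacements).
    map_x = (xs - flow[..., 0]).astype(np.float32)
    map_y = (ys - flow[..., 1]).astype(np.float32)
    w_prior = cv2.remap(prev_w, map_x, map_y, interpolation=cv2.INTER_LINEAR)
    b_prior = cv2.remap(prev_b, map_x, map_y, interpolation=cv2.INTER_LINEAR)
    return w_prior, b_prior   # used as a prior for the current-frame estimate
```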
  • 'SIFT' refers to the Scale Invariant Feature Transform.
  • the interaction now described deals with improving the mosaic coverage. For example, not all frames of a video will be represented in the final mosaic.
  • the mosaic is presented to a user (block 950), who realizes that one or more frames which contain a unique portion of information are not included (block 952) (this may result because of the rule that adjacent pixels come from the same frame), for example by observing that a feature such as the mole is missing from the mosaic. If this is observed, a user can force a particular frame to be included in the mosaic by applying a brush stroke to the feature in that frame (block 954). This results in the stitching variable s(u) being given fixed values for some set of u. These can be incorporated as hard constraints in the optimization.
  • the edit in this example is represented as an overlay on the texture map, which is warped by the 2D-2D mapping, masked by the occlusion masks, and alpha-blended with the original image.
  • 'alpha blending' is a combination of two colors allowing for transparency effects in an image.
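  • a hedged sketch of this compositing step, assuming the recovered mapping has been resampled into per-pixel lookup maps suitable for cv2.remap (all names here are illustrative):

```python
import cv2
import numpy as np

def composite_edit(frame, overlay_rgba, map_x, map_y, visibility):
    """frame: (H, W, 3) uint8 original frame; overlay_rgba: (Hm, Wm, 4) float
    edit painted on the texture map, values in [0, 1]; map_x, map_y: (H, W)
    float32 mosaic coordinates for each frame pixel; visibility: (H, W)
    occlusion mask b in [0, 1]."""
    # Warp the edit from mosaic space into this frame via the 2D-2D mapping.
    warped = cv2.remap(overlay_rgba.astype(np.float32), map_x, map_y,
                       interpolation=cv2.INTER_LINEAR)
    alpha = warped[..., 3:] * visibility[..., None]       # masked alpha channel
    edited = alpha * (warped[..., :3] * 255.0) + (1.0 - alpha) * frame
    return edited.astype(np.uint8)
```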
  • Another possible edit is to remove layers or portions of layers: a removed area can be filled in because the mapping is defined even in occluded areas. This will allow the cloud 112 to be removed: the background 110 and the sun 114 can be filled in from other frames.
  • FIG. 11 illustrates various components of an exemplary computing-based device 1000 which may be implemented as any form of a computing and/or electronic device, and in which embodiments of the invention may be implemented.
  • the computing-based device 1000 comprises one or more inputs 1004 which are of any suitable type for receiving inputs such as an input from a digital video camera.
  • the device 1000 also comprises a communication interface 1008 for communicating with other entities such as servers, other computing devices, and the like.
  • Computing-based device 1000 also comprises one or more processors 1001 which may be microprocessors, controllers or any other suitable type of processors for processing computing executable instructions to control the operation of the device in order to carry out the functions required to process a video sequence.
  • Platform software comprising an operating system 1002 or any other suitable platform software may be provided at the computing-based device to enable application software 1005 to be executed on the device 1000.
  • the computer executable instructions may be provided using any computer- readable media, such as memory 1003.
  • the memory is of any suitable type such as random access memory (RAM), a disk storage device of any type such as a magnetic or optical storage device, a hard disk drive, or a CD, DVD or other disc drive. Flash memory, EPROM or EEPROM may also be used.
  • An output 1007 may also be provided such as an audio and/or video output to a display system integral with or in communication with the computing-based device.
  • the display system may provide a graphical user interface, or other user interface of any suitable type although this is not essential.
  • 'computer' includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
  • the methods described herein may be performed by software in machine readable form on a tangible storage medium.
  • the software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
  • the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network).
  • alternatively, some or all of the method may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.

Abstract

The invention relates to a method and apparatus for video processing. In one embodiment, image features of an object within a frame of video footage are identified and the movement of each of these features is tracked throughout the video footage to determine its trajectory (track). The tracks are analyzed, and the maximum separation of the tracks is determined and used to determine a texture map, which is in turn interpolated to provide an unwrap mosaic for the object. The process may be iterated to provide an improved mosaic. Effects or artwork can be overlaid on this mosaic and the edited mosaic can be warped via the mapping and combined with layers of the original footage. The effect or artwork may move with the object's surface.
PCT/US2009/039728 2008-05-16 2009-04-07 Traitement vidéo WO2009151755A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009801282731A CN102100063B (zh) 2008-05-16 2009-04-07 视频处理方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/122,129 2008-05-16
US12/122,129 US8824801B2 (en) 2008-05-16 2008-05-16 Video processing

Publications (2)

Publication Number Publication Date
WO2009151755A2 true WO2009151755A2 (fr) 2009-12-17
WO2009151755A3 WO2009151755A3 (fr) 2010-02-25

Family

ID=41316264

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/039728 WO2009151755A2 (fr) 2008-05-16 2009-04-07 Traitement vidéo

Country Status (3)

Country Link
US (1) US8824801B2 (fr)
CN (1) CN102100063B (fr)
WO (1) WO2009151755A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103155538A (zh) * 2010-10-05 2013-06-12 索尼电脑娱乐公司 Image display device and image display method

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101041178B1 (ko) * 2008-05-21 2011-06-13 삼성전자주식회사 Method and apparatus for recording video in an electronic device
US9013634B2 (en) * 2010-09-14 2015-04-21 Adobe Systems Incorporated Methods and apparatus for video completion
US8872928B2 (en) 2010-09-14 2014-10-28 Adobe Systems Incorporated Methods and apparatus for subspace video stabilization
US8810626B2 (en) * 2010-12-20 2014-08-19 Nokia Corporation Method, apparatus and computer program product for generating panorama images
US9153031B2 (en) 2011-06-22 2015-10-06 Microsoft Technology Licensing, Llc Modifying video regions using mobile device input
US20130335635A1 (en) * 2012-03-22 2013-12-19 Bernard Ghanem Video Analysis Based on Sparse Registration and Multiple Domain Tracking
WO2013160533A2 (fr) * 2012-04-25 2013-10-31 Nokia Corporation Method, apparatus and computer program product for generating panorama images
US9299160B2 (en) 2012-06-25 2016-03-29 Adobe Systems Incorporated Camera tracker target user interface for plane detection and object creation
US9984300B2 (en) * 2012-09-19 2018-05-29 Nec Corporation Image processing system, image processing method, and program
CN102930569B (zh) * 2012-09-28 2015-06-17 清华大学 Method for generating irregular-scale mosaic pictures
EP2790152B1 (fr) * 2013-04-12 2015-12-02 Alcatel Lucent Method and device for automatically detecting and tracking one or more objects of interest in a video
CN105469379B (zh) * 2014-09-04 2020-07-28 广东中星微电子有限公司 Method and apparatus for masking a target region in video
EP4285363A1 (fr) * 2021-01-28 2023-12-06 InterDigital CE Patent Holdings, SAS Method and apparatus for editing multiple video sequences

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050022306A (ko) * 2003-08-29 2005-03-07 삼성전자주식회사 Method and apparatus for image-based realistic 3D face modeling
WO2006055512A2 (fr) * 2004-11-17 2006-05-26 Euclid Discoveries, Llc Appareil et procede de traitement de donnees video
US20060244757A1 (en) * 2004-07-26 2006-11-02 The Board Of Trustees Of The University Of Illinois Methods and systems for image modification

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999173A (en) * 1992-04-03 1999-12-07 Adobe Systems Incorporated Method and apparatus for video editing with video clip representations displayed along a time line
US5850352A (en) * 1995-03-31 1998-12-15 The Regents Of The University Of California Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized, including panoramic, scene interactive and stereoscopic images
US5907626A (en) * 1996-08-02 1999-05-25 Eastman Kodak Company Method for object tracking and mosaicing in an image sequence using a two-dimensional mesh
US6956573B1 (en) * 1996-11-15 2005-10-18 Sarnoff Corporation Method and apparatus for efficiently representing storing and accessing video information
US6587156B1 (en) * 1999-04-16 2003-07-01 Eastman Kodak Company Method for detecting mosaic fades in digitized video
US6819318B1 (en) * 1999-07-23 2004-11-16 Z. Jason Geng Method and apparatus for modeling via a three-dimensional image mosaic system
US6788333B1 (en) * 2000-07-07 2004-09-07 Microsoft Corporation Panoramic video
AU2003226140A1 (en) 2002-03-27 2003-10-13 The Trustees Of Columbia University In The City Of New York Methods for summarizing video through mosaic-based shot and scene clustering
US7006706B2 (en) * 2002-04-12 2006-02-28 Hewlett-Packard Development Company, L.P. Imaging apparatuses, mosaic image compositing methods, video stitching methods and edgemap generation methods
US7289662B2 (en) * 2002-12-07 2007-10-30 Hrl Laboratories, Llc Method and apparatus for apparatus for generating three-dimensional models from uncalibrated views
RU2358319C2 (ru) * 2003-08-29 2009-06-10 Самсунг Электроникс Ко., Лтд. Method and device for photorealistic three-dimensional face modeling based on an image
US7382897B2 (en) * 2004-04-27 2008-06-03 Microsoft Corporation Multi-image feature matching using multi-scale oriented patches
US7460730B2 (en) * 2005-08-04 2008-12-02 Microsoft Corporation Video registration and image sequence stitching

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050022306A (ko) * 2003-08-29 2005-03-07 삼성전자주식회사 Method and apparatus for image-based realistic 3D face modeling
US20060244757A1 (en) * 2004-07-26 2006-11-02 The Board Of Trustees Of The University Of Illinois Methods and systems for image modification
WO2006055512A2 (fr) * 2004-11-17 2006-05-26 Euclid Discoveries, Llc Appareil et procede de traitement de donnees video

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103155538A (zh) * 2010-10-05 2013-06-12 索尼电脑娱乐公司 Image display device and image display method
US9124867B2 (en) 2010-10-05 2015-09-01 Sony Corporation Apparatus and method for displaying images
CN103155538B (zh) * 2010-10-05 2015-11-25 索尼电脑娱乐公司 Image display device and image display method
US9497391B2 (en) 2010-10-05 2016-11-15 Sony Corporation Apparatus and method for displaying images

Also Published As

Publication number Publication date
US20090285544A1 (en) 2009-11-19
CN102100063B (zh) 2013-07-10
WO2009151755A3 (fr) 2010-02-25
CN102100063A (zh) 2011-06-15
US8824801B2 (en) 2014-09-02

Similar Documents

Publication Publication Date Title
US8824801B2 (en) Video processing
Habermann et al. Livecap: Real-time human performance capture from monocular video
Hasson et al. Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction
Dou et al. Fusion4d: Real-time performance capture of challenging scenes
Wu et al. Real-time shading-based refinement for consumer depth cameras
Thies et al. Real-time expression transfer for facial reenactment.
Valgaerts et al. Lightweight binocular facial performance capture under uncontrolled lighting.
Wu et al. On-set performance capture of multiple actors with a stereo camera
Zhang et al. Spacetime stereo: Shape recovery for dynamic scenes
Shi et al. Automatic acquisition of high-fidelity facial performances using monocular videos
Gall et al. Optimization and filtering for human motion capture: A multi-layer framework
Chang et al. Global registration of dynamic range scans for articulated model reconstruction
Beeler et al. High-quality passive facial performance capture using anchor frames.
US9036898B1 (en) High-quality passive performance capture using anchor frames
Wu et al. Full body performance capture under uncontrolled and varying illumination: A shading-based approach
Rav-Acha et al. Unwrap mosaics: A new representation for video editing
Pressigout et al. Hybrid tracking algorithms for planar and non-planar structures subject to illumination changes
Wang et al. Flow supervision for deformable nerf
Tsoli et al. Patch-based reconstruction of a textureless deformable 3d surface from a single rgb image
Chen et al. Kinect depth recovery using a color-guided, region-adaptive, and depth-selective framework
Li et al. Three-dimensional motion estimation via matrix completion
US11080861B2 (en) Scene segmentation using model subtraction
JP6806160B2 (ja) 3次元運動評価装置、3次元運動評価方法、及びプログラム
Kim et al. Multi-view object extraction with fractional boundaries
Park et al. Virtual object placement in video for augmented reality

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200980128273.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09763047

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09763047

Country of ref document: EP

Kind code of ref document: A2