EP1097432A1 - Balayage automatique de scenes en 3d provenant d'images mobiles - Google Patents

Balayage automatique de scenes en 3d provenant d'images mobiles

Info

Publication number
EP1097432A1
Authority
EP
European Patent Office
Prior art keywords
image
feature
features
images
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99935733A
Other languages
German (de)
English (en)
Inventor
Arthur Zwern
Sandor Fejes
Jinlong Chen
Roman Waupotitsch
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geometrix Inc
Original Assignee
Geometrix Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geometrix Inc filed Critical Geometrix Inc
Publication of EP1097432A1 publication Critical patent/EP1097432A1/fr
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/12Acquisition of 3D measurements of objects

Definitions

  • the present invention generally relates to image-based motion estimation and detection systems and more particularly relates to methods and systems for modeling a 3D scene from motion images produced by a video imaging system.
  • Factorization requires that every visible feature be corresponded between every frame in the video sequence in order to completely fill the factorization matrix. This typically limits the maximum camera motion severely, and may make it difficult to use the approach for generating a single 3D model containing all sides of an object. It also limits the density of the extracted polygonal mesh, since any visual feature which cannot be corresponded between all frames of the image sequence must be ignored.
  • Factorization is highly sensitive to feature tracking errors, as even a single mis-tracked feature dramatically modifies the entire extracted 3D structure. This limits factorization to use only with the most salient features in an image sequence, resulting in sparse 3D point clouds.
  • Factorization involves significant camera model approximations and assumptions (such as orthographic, weak perspective, or paraperspective projection), which can introduce significant error outside of controlled laboratory demonstrations. Since factorization using true-perspective projection is non-linear and often fails to converge, most factorization approaches use weak-perspective. Weak perspective only yields the correct shape of an object when the object has a very small depth compared to its distance to the camera - a situation which can only be approximated for real objects when they are ideally infinitely distant.
  • the factorization process is an interesting approach and provides one of the solutions in automated photogrammetry. Nevertheless, the assumptions and conditions are too restrictive and unrealistic in view of many practical applications. Thus, it would be a significant advancement if the factorization process could be made to provide practical solutions by overcoming the above limitations.
  • the present invention relates to techniques that provide for automatically generating fully-textured 3D models of objects from a sequence of motion images.
  • a 3D modeling system employing the invention disclosed herein can be used to model 3D objects or targets over a wide range, from a simple man-made part to a natural scene.
  • motion images are generated using a video camera or still photo camera that is moved gradually around or relatively against an object.
  • a salient feature operator is applied to only an initial image or those images that appear to have lost some of the features being tracked.
  • a tracking of these salient features is carried out using multi-resolution feature structures generated for each of the salient features.
  • a features tracking map is used to construct feature blocks, each of which is then provided as input to a factorization process that is used in a feedback correction system.
  • results from the factorization process are used recursively to adjust image positions of the features to emulate the orthographic projections so as to derive valid camera motion segments that are then assembled to obtain the complete motion.
  • the use of the orthographic factorization embedded in the recursive feedback framework provides a mechanism to obtain the accurate camera motion and 3D points from a true perspective camera.
  • a global optimization technique, such as a non-linear optimizing methodology, is used to refine the 3D coordinates of the 3D points in accordance with the obtained camera motion so as to minimize their back-projection errors with respect to their original locations.
  • a plurality of dense points are detected and then tracked using the constraints by epipolar lines in conjunction with the knowledge of the camera motion to avoid extensive detection and reduce false matches of these dense points.
  • the 3D positions of the dense points are then estimated by triangulation.
  • a mesh model is finally built upon the dense points by computing the 3D Delaunay triangulation.
  • a mechanism for generating texture mapping for the mesh model is provided to export the patches assembling the mesh model in a commonly used image file format.
  • the patches can be subsequently modified independently with an image processing application.
  • the texture mapping process described herein can be implemented to take advantage of the graphics accelerator architecture commonly found in most computer systems. Redirecting the graphics accelerator to draw into a buffer in memory rather than the buffer for the monitor can yield a much more efficient mapping of the textures, hence a high performance of the overall system.
  • the invention can be implemented in numerous ways, including a method, a system and a computer readable medium containing program code for automatically generating a fully-textured 3D model of an object without extensive knowledge, intensive labor and expensive equipment.
  • the advantages of the invention are numerous. Different embodiments or implementations may yield one or more of the following unique advantages and benefits.
  • the feature extraction mechanism uses a salient feature operator to accurately and unbiasedly locate salient features based on a 3D interpretation of the image intensity/color.
  • the tracking mechanism uses multi-resolution feature structures that provide an effectively large search area yet precise location of all salient features being tracked.
  • the tracking mechanism is capable of handling perspective distortions or other view changes of the features, reacquiring lost features when needed and fully adaptively decimating high rate video frames to reduce redundant input data while still maintaining sufficient feature correspondence.
  • Another one of the important advantages and benefits of the present invention is the use of a factorization approach under orthography.
  • a feedback system emulates the orthographic camera model by iteratively "correcting" the perspective camera model so that the factorization approach provides practical and accurate solutions.
  • Figure 1 demonstrates a system in which the present invention may be practiced
  • Figure 2 shows a block diagram of a preferred internal construction of computer system that may be used in the system of Figure 1 ;
  • Figure 3 illustrates a 3D drawing of an intensity image that includes a white area and a dark area
  • Figure 4A shows two exemplary consecutive images successively received from an imager;
  • Figure 4B shows an exemplary multi-resolution hierarchical feature structure for extracting a feature in one of the images in Figure 4A;
  • Figure 4C shows K image structures from a single image, each of the image structures being for one feature;
  • Figure 4D shows, as an example, what is called herein a features tracking map, or simply a features map;
  • Figure 4E shows a flowchart of the feature extraction process
  • Figure 4F shows a series of images being received from the imager, in which images at every L-th frame (the L-th, 2L-th, ... frames) are regularly used for feature extraction;
  • Figure 4E illustrates a template update in feature tracking among a set of consecutive images
  • Figure 5 shows a flowchart of a camera motion estimation process
  • Figure 6A shows a features map being divided into individual feature blocks, with each pair of adjacent feature blocks overlapping;
  • Figure 6B shows displacement Tfq of a scene point Pfq projected onto two adjacent images
  • Figure 6C shows an implementation of the camera motion estimation process using the factorization method
  • Figure 6D illustrates how a cube is projected under an orthographic and a perspective projection, respectively;
  • Figures 7A-7C show, respectively, a process of combining the camera motion from a number of camera motion segments concatenated over an overlapping portion and an exemplary resultant camera motion;
  • Figure 8 shows a flowchart of the depth mapping process disclosed herein;
  • Figure 9A illustrates a house image being detected for the line features;
  • Figure 9B shows a point P in the object being projected on two adjacent image planes
  • Figure 9C illustrates a flowchart of generating a self-constraint and interconnected triangular mesh model based on the Delaunay triangulation
  • Figure 10A shows a process flowchart of applying the texture patterns to a mesh model
  • Figure 10B shows a flowchart of the textured patch generation process according to one embodiment of the present invention.
  • Figure 11A shows a group of triangles being assigned to respective side view images;
  • Figure 11B illustrates that a patch is growing with every newly added triangle.
  • Figure 1 demonstrates a system 100 in which the present invention may be practiced.
  • An object 102 is typically large, and it may not be feasible to place it on a turntable to be rotated while being imaged.
  • the object may include, but may not be limited to, a natural scene, terrain, man-made architecture and parts.
  • a user or operator will carry a camera or an imager and produce a sequence of images in a format of video frames or a sequence of pictures by gradually moving the imager around or relatively against the object.
  • an imager is attached to a flying vehicle if a particular area of urban terrain needs to be modeled. A sequence of images of the particular area is thus generated when the flying vehicle flies over the urban terrain.
  • the object 102 is assumed to be a building (e.g. a tower in the figure).
  • Imager 104 may be a video camera whose focal length is, preferably, set to a fixed known position when the surrounding imagery is generated. Imager 104 is coupled to computer system 106.
  • the frame grabber digitizes each of the video frames received from imager 104 to produce a sequence of digital images C1, C2, ..., CN, typically in a commonly used color format, coordinates or space.
  • While the R, G, and B color image data representation is not necessarily the best color space for certain desired computations, there are many other color spaces that may be particularly useful for one purpose or another.
  • HIS (hue, intensity, and saturation)
  • Other possible coordinates that may possess similar characteristics to HIS may include LuV and La*b*.
  • computer system 106 receives color images in the format of the RGB space.
  • Computer system 106 may be a computing system that may include, but not be limited to, a desktop computer, a laptop computer or a portable device.
  • Figure 2 shows a block diagram showing an exemplary internal construction of computer system 106.
  • computer system 106 includes a central processing unit (CPU) 122 interfaced to a data bus 120 and a device interface 124.
  • CPU 122 executes certain instructions to manage all devices and interfaces coupled to data bus 120 for synchronized operations, and device interface 124 may be coupled to an external device such as imaging system 108, hence image data therefrom are received into a memory or storage through data bus 120.
  • Also interfaced to data bus 120 is a display interface 126, network interface 128, printer interface 130 and floppy disk drive interface 138.
  • a compiled and linked version of one embodiment of the present invention is loaded into storage 136 through floppy disk drive interface 138, network interface 128, device interface 124 or other interfaces coupled to data bus 120.
  • Main memory 132 such as random access memory (RAM) is also interfaced to data bus 120 to provide CPU 122 with the instructions and access to memory storage 136 for data and other instructions.
  • when executing stored application program instructions, such as the compiled and linked version of the present invention, CPU 122 is caused to manipulate the image data to achieve desired results.
  • ROM (read only memory) 134 is provided for storing invariant instruction sequences such as a basic input/output operation system (BIOS) for operation of keyboard 140, display 126 and pointing device 142 if there are any.
  • One of the features in the present invention is to provide an automatic mechanism that extracts and tracks only the most salient features in the image sequence, and uses them to automatically generate the motion of the imager.
  • the features used in the present invention are those that are characterized as least altered from one frame to an adjacent frame and can be most accurately located in the image, for example, salient corner-like features in each of the image frames.
  • the present invention uses a salient feature operator to detect the features only in an initial image or those images that appear to have lost some of the features being tracked.
  • the present invention utilizes multi-resolution hierarchical feature tracking to establish features correspondence to the features detected by the salient feature operator.
  • the salient features to be extracted are typically those corner-like features in the images.
  • FIG. 3 illustrates a 3D drawing 202 of an intensity image 200 that includes a white area 204 and a dark area 206.
  • Drawing 202 shows a raised stage 208 corresponding to white area 204 and a flat plane 210 corresponding to dark area 206.
  • Corner 212 is the salient feature of interest whose location change can be the most accurately determined and is typically least affected from one frame to the next.
  • a salient feature detection process is designed to detect all the salient features in an image.
  • the salient feature detection processing is to apply a feature detection operator to an image to detect the salient features therein.
  • the feature detection operator, or feature operator O(I), on an image I is a function of the Hessian matrix of a local area of the image that is based on the Laplacian operator performed on the area.
  • Det(H) is the determinant of the matrix H and λ is a controllable scaling constant.
  • the image I is an intensity image that may be an intensity component in the HIS color space or a luminance component derived from the original color image.
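Since the exact expression of the operator is not reproduced above, the following is a minimal sketch of a Hessian-based saliency operator of the kind described: it combines Det(H) with a Laplacian (trace) term weighted by a controllable constant. The specific combination, smoothing scale and parameter names (`sigma`, `lam`, `num_features`) are illustrative assumptions, not the patented formula.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def salient_feature_operator(intensity, sigma=1.5, lam=0.04, num_features=200):
    """Corner-like saliency from the Hessian of a smoothed intensity image I.

    Illustrative sketch only: the operator combines Det(H) with a Laplacian
    (trace) term scaled by `lam`; the exact expression in the patent may differ.
    """
    I = gaussian_filter(intensity.astype(np.float64), sigma)
    # Second derivatives forming the Hessian H = [[Ixx, Ixy], [Ixy, Iyy]]
    Ixx = gaussian_filter(I, sigma, order=(0, 2))
    Iyy = gaussian_filter(I, sigma, order=(2, 0))
    Ixy = gaussian_filter(I, sigma, order=(1, 1))
    response = (Ixx * Iyy - Ixy ** 2) - lam * (Ixx + Iyy) ** 2
    # Keep positive local maxima only, then the strongest responses
    is_peak = (response == maximum_filter(response, size=7)) & (response > 0)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(response[ys, xs])[::-1][:num_features]
    # Each entry mirrors a feature template record: location (i, j) and strength
    return [(int(xs[k]), int(ys[k]), float(response[ys[k], xs[k]])) for k in order]
```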
  • each of the salient features is presented as a template, such as an 11-by-11 or 13-by-13 image template.
  • the characteristics or attributes of a salient feature template may comprise the location of the feature in the image, color information and strength thereof.
  • the location indicates where the detected salient feature or the template is located within the image, commonly expressed in coordinates (i, j).
  • the color information may carry color information of the template centered at (i, j).
  • the strength may include information on how strongly the salient feature is extracted or computed as I_f(i, j).
  • as each color image is received, it is first transformed to a color space in which the luminance or intensity component may be separated from the chrominance components.
  • the color image conversion is only needed when the original color image is presented in a format that is not suitable for the feature extraction process.
  • many color images are in the RGB color space and therefore may be preferably transformed to a color space in which the luminance component may be consolidated into an image.
  • the above feature operator is then applied to the luminance component to produce a plurality of the salient features that preferably are indexed and kept in a table as a plurality of templates.
  • Each of the templates may record the characteristics or attributes of each feature.
  • for N images, there are N corresponding feature tables, each comprising a plurality of the salient features.
  • the tables can then be organized as a map, referred to herein as a features tracking map, that can be used to detect how each of the features is moving from one image frame to another.
  • a multi-resolution hierarchical feature structure is used to extract the features for tracking.
  • Figure 4A shows two consecutive images 402 and 404 that are successively received from imager 104. After the salient feature operator is applied to image 402, it is assumed that one feature 406 is detected and the characteristics thereof are recorded. When second image 404 comes in, a multi-resolution hierarchical image pyramid is generated from the image.
  • Figure 4B shows an exemplary multi-resolution hierarchical feature structure 408 for extracting feature 406 in image 404.
  • image layers 410 (e.g. L layers)
  • Each of the image layers 410 is successively generated from the original image 404 by a decimation process around the feature location.
  • layer 410-L is generated by decimating layer 410-(L-1).
  • the decimation factor is typically a constant, preferably equal to 2.
  • an approximate search area for the feature can be defined in the second image and centered at the original location of the feature. More specifically, if feature 406 is located at coordinates (152, 234) in image 402, the window to search for the same feature may be defined as a square centered at (152, 234) in image 404.
  • since the window size is predefined but the motion of the imager is unknown, there can be situations in which the feature may fall out of the predefined search window, resulting in the loss of the feature.
  • One intuitive approach is to enlarge the search window so that the feature can be detected within the window.
  • the processing time increases quadratically.
  • Multi-resolution hierarchical feature structure 408 shows that a sought feature can be extracted even if it happens to fall out of a predefined search window without increasing the processing time.
  • the resolution of each of layers 410 decreases.
  • search area is essentially enlarged.
  • search window 412 covers a relatively larger area in layer 410-L than in layer 410-(L-1).
  • layer 410-L is first used to find an approximated location of the feature within search window 412.
  • One of the available methods for finding the location of the corresponding feature in the consecutive images is to use a template matching process.
  • the template is defined as typically a square image region (11-by-11 to 15-by-15) centered at the location of the original feature extracted by the salient feature operator. Then the corresponding subpixel-accurate location of the match can be found at the position where the normalized cross-correlation of the two corresponding image regions is the largest (ideally "1" for a complete match).
  • Layer 410-(L-1) is then used to refine the approximated location of the feature within the closest area in the same window size, and finally layer 410 is used to precisely determine the exact location (x, y) of the feature. It can be appreciated that the use of the feature structure has many advantages over prior art feature extraction approaches. In essence, an effectively larger representation of the feature template can be achieved, which makes it possible to track a feature effectively and precisely and is directly suitable to the hierarchical tracking mechanism.
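A minimal sketch of this coarse-to-fine localization over the pyramid using normalized cross-correlation template matching. It assumes OpenCV and NumPy; the template size, search radius and border handling are illustrative choices rather than values from the patent.

```python
import cv2
import numpy as np

def build_pyramid(image, levels=3):
    """Image pyramid with a decimation factor of 2 per level (level 0 = full resolution)."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

def track_feature(ref_pyr, new_pyr, loc, template_size=11, search_radius=8):
    """Locate one feature of the reference frame in a new frame, coarse to fine.

    `loc` is the (x, y) position of the feature in the reference frame; images
    are expected as uint8 or float32 arrays. Border handling is simplified.
    """
    fx, fy = loc          # feature location in the reference frame (fixed)
    px, py = loc          # predicted location in the new frame (refined per level)
    half = template_size // 2
    for level in range(len(ref_pyr) - 1, -1, -1):     # coarsest level first
        scale = 2 ** level
        tx, ty = int(round(fx / scale)), int(round(fy / scale))
        cx, cy = int(round(px / scale)), int(round(py / scale))
        template = ref_pyr[level][ty - half:ty + half + 1, tx - half:tx + half + 1]
        sx0, sy0 = max(cx - search_radius - half, 0), max(cy - search_radius - half, 0)
        search = new_pyr[level][sy0:cy + search_radius + half + 1,
                                sx0:cx + search_radius + half + 1]
        if template.size == 0 or search.shape[0] < template.shape[0] \
                or search.shape[1] < template.shape[1]:
            continue
        # Normalized cross-correlation; the best peak gives the match at this level
        ncc = cv2.matchTemplate(search, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, best = cv2.minMaxLoc(ncc)
        px = (sx0 + best[0] + half) * scale
        py = (sy0 + best[1] + half) * scale
    return px, py
```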
  • FIG. 4C shows K feature structures 420 from a single image, each of the feature structures 420 being for one feature.
  • a set of attributes describing each of the K features is produced and may comprise information on the location, strength and color of the feature.
  • Figure 4D shows what is called herein a "features tracking map", or simply, a features map that illustrates collectively all the features found for N images and is used for tracking the features so as to estimate the motion of the imager.
  • Figure 4E shows a flowchart of the feature extraction process. Both of Figures 4D-4E are described conjointly to fully understand the feature detection and tracking process in the present invention.
  • color images are successively received from the imager.
  • a dominant component, preferably the luminance or intensity component, is extracted from the color images at 454.
  • the color images are simply transformed to another color space that provides a separate luminance component.
  • the process looks up, for example, a memory area for any features or feature templates stored there. If there is a sufficient number of feature templates in the memory area, the process needs to proceed with feature tracking in the next image; otherwise, the process needs to check if new features must be extracted at 458.
  • the first received image always invokes the feature extraction operation with the salient feature operator as there are no stored features or feature templates to perform the feature tracking process. So the process now goes to 460.
  • the feature extraction process generates K features in the received image (e.g. frame #1). As illustrated in Figure 4D, there are K features in the received image frame #1.
  • the attributes of the K features, as feature templates, are stored in a memory space for subsequent feature extraction process.
  • the process goes to 464 to generate the multiple-resolution hierarchical image pyramid preferably having the newly arrived image as the base.
  • the tracking process searches for locations in the image pyramid which demonstrate most similarity to the respective layers of the feature templates stored in the feature structures.
  • K or fewer corresponding features are localized from each corresponding layer in the image pyramid at 466, and the K feature locations are then collected and appended to the features map for frame 2.
  • the process goes to 462 via 456 repeatedly to extract K features from each of the n1 frames.
  • the imager may have been moved around the object considerably with respect to the initial position from which the first image is captured.
  • Some of the K features may not necessarily be found in those later generated images. Because of the perspective changes and motion of the imager, those features may be either out of the view or completely changed so that they can no longer be tracked. For example, a corner of a roof of a house may be out of the view or lose its salient feature when viewed from a particular perspective. Therefore, the representation 430 of the K features for n1 images in Figure 4D shows the dropping of the number of the features.
  • the generation of features is invoked when the number of dropped features exceeds a predefined threshold (T).
  • the process goes to 458 to determine if it is necessary to extract new features to make up the K features.
  • new features may have to be extracted and added to maintain a sufficient number of features to be tracked in an image.
  • the process restarts the feature detection at 460, namely applying the salient feature operator to the image to generate a set of salient features to make up for those that have been lost.
  • the process is shown, as an example, to restart the feature detection at frame n1 in Figure 4D.
  • the process makes an attempt to reduce inter-frame decimation by what is called "back tracking" and loads a preceding image frame successively at 471 until a sufficient number of correspondences is recovered.
  • the imager produces 30 image frames per second.
  • a number of consecutive images possess high correlation with each other and provide mostly redundant information as to how the features move from one frame to another. Therefore, to eliminate redundant input data, the incoming images are sampled at a predefined rate that can be an integer starting from 1.
  • Figure 4F shows a series of images 480 being received from the imager. Generally, only images at every L-th frame (the L-th, 2L-th, ... frames) are actually used.
  • the feature tracking process in Figure 4E performs a back tracking at 471 before applying the salient feature operator to image 482 to generate additional feature templates at 460. Specifically, skipped images before image 482 may be backtracked to determine exactly from which image the features actually disappeared.
  • an immediately preceding image 484 or a middle preceding image 483 is now retrieved for feature tracking. If the features sought are still not found in image 483 or 484, the process goes repeatedly from 470 to 462 and back to 471 via 456 and 458 to sequentially retrieve images for feature tracking until an image frame is found between 484 and 485 with a sufficient number of feature correspondences.
  • the backtracking provides the benefit of automatically determining the lowest frame sub-sampling rate at which a sufficient number of feature correspondences can still be maintained.
  • the feature templates to be matched with consecutive images remain as the original set in tracking the features in subsequent images and do not change from one frame to another.
  • establishing feature correspondence between consecutive image frames can be accomplished in two ways. One is to achieve this in directly consecutive image pairs; the other is to fix the first frame as reference and find the corresponding locations in all other frames with respect to this reference frame.
  • the second approach is used since it minimizes possible bias or drifts in finding the accurate feature locations, as opposed to the first approach where significant drifts can be accumulated over several image frames.
  • the second approach permits only short-lived feature persistence over a few frames as the scene viewed by the camera undergoes large changes of view when the camera covers large displacements, which ultimately causes the tracking process to proceed to 472.
  • a feature template update mechanism is incorporated in 474.
  • the templates of the lost features are replaced by the ones located in the most recent frame 492 in which they have been successfully tracked, i.e. at 494.
  • the template update at 474 of Figure 4E provides the benefits that features can be successfully tracked even if they may have had a significant perspective view change by minimizing accumulative drift typical for the first approach.
  • Figure 4D shows, respectively, feature sets 432-436 for images at frame numbers n1, n2, n3, n4, n5 and n6.
  • the frame numbers n1, n2, n3, n4, n5 and n6 may not necessarily have an identical number of frames in between.
  • some of the features may reappear in some of the subsequent images, as shown at 438-440, and may be reused depending on the implementation preference.
  • the process ensures that all the frames are processed and features thereof are obtained. As a result, a features map, as an example in Figure 4D, is obtained.
  • Estimation of camera motion as disclosed herein is an automatic process to detect from a sequence of images the actual motion parameters (translation and rotation) of the camera or imager that has traveled to produce the sequence of images.
  • the estimation of the camera motion has many applications, such as to combine computer graphics with live video footage, also known as match movie application in movie production or indoor/outdoor robot navigation. Those skilled in the art will appreciate that the process described below can be used independently.
  • Figure 5 shows a flowchart of the camera motion estimation process 500 and should be understood with Figures 6A-6D.
  • a features map having characteristics similar to the example in Figure 4D is used in Figure 6A to describe process 500.
  • features are grouped respectively.
  • features extracted from a number of successive images are grouped into a respective feature block.
  • groups of features 430 and 432 are respectively collected as feature blocks 602 and 604.
  • a feature block of K features and n frames is expressed as a 2K-by-n feature matrix:

    $$W = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & & & \vdots \\ x_{K1} & x_{K2} & \cdots & x_{Kn} \\ y_{11} & y_{12} & \cdots & y_{1n} \\ \vdots & & & \vdots \\ y_{K1} & y_{K2} & \cdots & y_{Kn} \end{bmatrix}$$
  • (xij, yij) are the coordinates of the i-th feature in the j-th frame.
  • the size of the overlapping may be, for example, 10 to 30 features versus 3 to 10 frames.
  • the first and last few columns of the above feature matrix are generally for features in the overlapping.
  • the overlapping provides information to concatenate camera motion segments derived respectively from each feature block.
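A small sketch of assembling a feature block into a measurement matrix from the features tracking map. `track_map` is a hypothetical lookup of feature locations per frame; the text writes the block as a 2K-by-n matrix (features along rows, frames along columns), while this sketch assembles the equivalent data in the frames-by-points layout of the cited factorization paper so that it can feed an SVD-based factorization directly.

```python
import numpy as np

def measurement_matrix(track_map, feature_ids, frame_ids):
    """Assemble a measurement matrix for one feature block.

    `track_map[(i, j)]` is assumed to hold the (x, y) image location of
    feature i in frame j, taken from the features tracking map.
    """
    F, K = len(frame_ids), len(feature_ids)
    W = np.zeros((2 * F, K))
    for r, j in enumerate(frame_ids):
        for c, i in enumerate(feature_ids):
            x, y = track_map[(i, j)]
            W[r, c] = x          # x coordinates: one row per frame, top half
            W[F + r, c] = y      # y coordinates: bottom half
    return W
```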
  • a complete camera motion comprises a number of small motion segments that are each respectively derived from one of the feature blocks. As shown in the figure, there is an overlapping between each pair of two adjacent feature blocks, such as overlapping 606 between feature blocks 610 and 612 to provide information to concatenate two motion segments from feature blocks 610 and 612, so as to form a concatenated motion of the camera.
  • process 500 adjusts the positions of the features in the feature block with feedback information from 515.
  • the positions of the features as well as the detailed feedback information will be described below.
  • process 500 does not adjust the positions of the features in the feature block but instead transfers the features directly to factorization process 510 to compute initial solutions.
  • the factorization process at 510 is an intermediate process that can recover shape and motion from a series of images under orthography.
  • the detailed description is provided by Tomasi and Kanade, "Shape and Motion from Image Streams under Orthography: a Factorization Method," International Journal of Computer Vision, Volume 9(2), 1992, pp. 137-154, which is hereby incorporated by reference in its entirety.
  • the factorization process takes as an input a feature matrix and outputs a camera rotation and object shape.
  • a feature matrix for a feature block having a size of K-by n is a 2K-by-n matrix.
  • the 2K-by-n matrix is then factored, under certain constraints, into a product of two matrices R and S, where R is a 2K-by-3 matrix that represents the camera rotation and S is a 3-by-n matrix that represents shape in a coordinate system in which the object is positioned.
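A minimal SVD sketch of the orthographic factorization step referenced above, in the frames-by-points layout assembled by `measurement_matrix`. The metric upgrade that enforces orthonormal camera axes on the motion factor is omitted for brevity, so this shows only the rank-3 decomposition.

```python
import numpy as np

def factorize_orthographic(W):
    """Rank-3 factorization of a measurement matrix under orthography, W ~ M @ S + t.

    W is 2F-by-K (x rows then y rows per frame, one column per feature); M is
    2F-by-3 (motion) and S is 3-by-K (shape). Sketch only: no metric upgrade.
    """
    t = W.mean(axis=1, keepdims=True)            # per-frame centroid = translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    sqrt_s = np.sqrt(s[:3])
    M = U[:, :3] * sqrt_s                        # motion factor
    S = sqrt_s[:, None] * Vt[:3, :]              # shape factor
    return M, S, t
```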
  • Tfq represents the displacement of a scene point located at Pfq projected onto the image.
  • the initial estimate of the average absolute distance Zo between the image and the object is computed. It should be pointed out that both the focal length and the average absolute distance Zo need not be accurate and will be refined to a predefined accuracy through an iterative feedback adjustment, which is described in detail below.
  • an iterative feedback mechanism is used to adjust the coordinates of the features as the input to factorization process at 510.
  • the underlying feedback mechanism may be better understood in accordance with Figure 6C which is implemented as one embodiment of the present invention.
  • normal perspective camera 630 means a regular video camera that is used to generate the sequence of images.
  • normal perspective camera 630 goes through an orthographic correction 632 of the feature locations.
  • the outputs from factorization 634 are further refined by least square (LS) triangulation using the average depth obtained through the focal length characterizing camera 630.
  • the refined outputs are then fed back to correction 632 that further adjusts perspective data of normal perspective camera 630 to orthographic data until the unique conditions in which the factorization method 634 works right are closely approximated, hence resulting in correct outputs.
  • the adjustment of the perspective data is done by extending the positions of the features outward from the image center as a function of their distances from the camera.
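One plausible reading of that correction is sketched below: a point at depth Z projects perspectively to roughly f·X/Z, so scaling its image-centered position by Z/Zo moves it toward the scaled-orthographic position f·X/Zo, i.e. outward for points farther than the average depth. The exact form used in the patent may differ; the names and the formula here are assumptions.

```python
import numpy as np

def orthographic_correction(points_xy, depths, z_avg, principal_point):
    """Push perspective feature positions toward their orthographic projections.

    points_xy: (N, 2) feature locations; depths: (N,) current depth estimates;
    z_avg: average absolute distance Zo; principal_point: image center (cx, cy).
    """
    centered = points_xy - np.asarray(principal_point)
    corrected = centered * (np.asarray(depths) / z_avg)[:, None]
    return corrected + np.asarray(principal_point)
```

In the feedback loop of Figure 6C, such a correction would alternate with the factorization step, the depths and Zo being re-estimated from the least-square triangulation after each pass.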
  • Figure 6D illustrates how a cube 650 is projected and corrected through perspective correction 632.
  • a perspectively distorted cube 652 will be produced in an image due to the different distances of the points on cube 650 from the camera.
  • since the unique condition for the factorization method 634 to work right is the orthographic projection, noticeably large errors will inevitably result when a regular image (e.g. cube 652) is provided.
  • the errors are used to adjust the image so that an adjusted image gets closer to an image obtained under the orthographic projection.
  • the adjusted image is then provided to factorization method 634, smaller errors will be produced, the errors are then again used to further adjust the image.
  • the feedback processing keeps going on (e.g. 10 loops) until the adjusted image gets very close to an image that would be otherwise produced under the orthographic projection.
  • the outputs from factorization 634 include Rfq, Tfq, Cfq and Pfq for each of the image frames.
  • Rfq and Pfq, rotation information and scene coordinates of a feature are corresponding elements in the rotation matrix R and shape matrix S.
  • Tfq and Cfq are the corresponding translation and scaling factor.
  • Pfq and the approximated focal length are used to refine the average distance Zo, which is provided to the least-square estimation 636 of the 3D points Pe. With the refined 3D points Pe and the averaged distance Zo, the camera translation is refined to Te using the least-square triangulation.
  • the rotation Rfq, or the rotation matrix R, is not further refined as it has been produced with the refined Te and Pe.
  • all the refined values are iteratively provided as feedback signals to correction 632 so that subsequent refined values become accurate enough to derive the camera motion.
  • process 500 checks if the feature map is complete at 516. In other words, a feature block including the last image frame shall be processed and the refined rotation matrix R and shape matrix S thereof are derived.
  • each feature block produces a set of accurate rotation Re and 3D points Pe.
  • each of the feature blocks 702 produces a camera motion segment for the corresponding frame interval, which includes the camera positions and orientations in the particular image frame.
  • feature block 702-1 includes a number of image frames, and each of the frames corresponds to a vertex in the camera motion segment 704-1. It is understood that, except for the first and the last ones, each of the camera motion segments 704 has a certain number of overlaps with the neighboring ones by construction. For example, given an overlapping of 3 frames, the last 3 vertices of motion segment 704-1 should coincide with the first 3 vertices of motion segment 704-2, and the last 3 vertices of motion segment 704-2 should coincide with the first 3 vertices of motion segment 704-3.
  • FIG. 7B shows how two motion segments 704-1 and 704-2 are stitched to form a concatenated motion segment 710.
  • the overlapping vertices 706 and 708 are used as constraints to determine the common motion. Since vertices 706 and 708 are from the overlapping and hence coincidental, motion segment 704-2 is rotated, translated and scaled to coincide with end vertices 706 of motion segment 704-1, resulting in a concatenated motion segment 710 with vertices 706 and 708 coincided at 712. With all the motion segments stitched together as described above, the whole camera motion can be obtained.
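The text only states that the next segment is rotated, translated and scaled onto the overlap; a standard least-squares similarity alignment (Umeyama-style) over the overlapping vertices, as sketched below, is one conventional way to compute that transform.

```python
import numpy as np

def similarity_from_overlap(src, dst):
    """Rotation R, scale s and translation t mapping overlap vertices src -> dst.

    src, dst: (n, 3) arrays of the coincident camera-position vertices of two
    adjacent motion segments (e.g. the last/first 3 vertices of each segment).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    S, D = src - mu_s, dst - mu_d
    U, sig, Vt = np.linalg.svd(D.T @ S / len(src))     # cross-covariance SVD
    C = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:
        C[2, 2] = -1.0                                  # guard against reflections
    R = U @ C @ Vt
    scale = np.trace(np.diag(sig) @ C) / ((S ** 2).sum() / len(src))
    t = mu_d - scale * R @ mu_s
    return R, scale, t

# Every vertex v of the second segment is then mapped with: scale * (R @ v) + t
```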
  • Figure 7C shows an exemplary camera motion.
  • motion segment 704-2 has been rotated, translated and scaled to be stitched with motion segment 704-1.
  • the derived 3D points as well as the camera motion segments are placed in a common coordinate system by rotation, translation, and scale.
  • a global nonlinear optimization 522 is employed to refine the parameters, which reduces the difference between the extracted feature locations and their corresponding backprojected 3D coordinates. This process provides the final, globally optimized rotation and translation of the camera motion and the 3D locations of the features.
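A sketch of the back-projection residual that such a global optimization would minimize, using SciPy's `least_squares`. The axis-angle-plus-translation parameterization and the assumption that observations are referred to the image center are illustrative choices, not details taken from the patent.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def backprojection_residuals(params, n_cams, n_pts, observations, focal):
    """Differences between observed feature locations and back-projected 3D points.

    params: 6 values per camera (rotation vector, translation) followed by 3
    values per 3D point. observations: iterable of (cam_idx, pt_idx, u, v).
    """
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for cam_idx, pt_idx, u, v in observations:
        R = Rotation.from_rotvec(cams[cam_idx, :3]).as_matrix()
        Xc = R @ pts[pt_idx] + cams[cam_idx, 3:]         # point in camera coordinates
        res.append(focal * Xc[0] / Xc[2] - u)            # back-projection error in x
        res.append(focal * Xc[1] / Xc[2] - v)            # back-projection error in y
    return np.asarray(res)

# result = least_squares(backprojection_residuals, x0,
#                        args=(n_cams, n_pts, observations, focal))
```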
  • the errors are examined to determine whether they are within the predefined range; otherwise, the optimization and adjustment process is repeated until the errors are within the predefined range.
  • Depth Mapping Process
  • the depth mapping process in the present invention is an automatic process to generate high density surface points for a subsequent mesh model generation.
  • Each of the surface points are represented in the scene space.
  • the results from the camera motion process include the rotation Re, translation Te parameters, and 3D coordinates for each of the salient feature points.
  • those salient feature points represent only a small portion of points that would not be sufficient by far to generate a surface wire frame or mesh model of the object.
  • dense points may be expected, which typically include all boundaries and edge points located in high contrast areas.
  • the number of such dense points can be in the range from 1000 to 100,000 for a regular object in the scene space.
  • using feature tracking techniques, one can establish feature correspondence for these dense features as well, and recover their 3D locations as described in more detail below.
  • Figure 8 shows a flowchart of the depth mapping process and should be understood with Figures 6A-6C and 9A-9C.
  • the results of the camera motion are received. They can be used to provide a constrained search and assist in determining the geometric location of the dense points by triangulation.
  • a first image frame is retrieved for extracting dense points.
  • the number of currently tracked points is examined. If this number is below a threshold (a particular case for the first frame), the process moves to 808 and 810 to detect any available dense points.
  • First, straight line segments are extracted as they are the most accurate and stable features for generating a mesh model. Other advantages of detecting line features include a significant reduction of computation and persistent accuracy, as will be appreciated in the following description.
  • FIG. 9A illustrates a house image 900 being detected for the line features, of which a line is shown in image 902 and points 906 and 908 represent the line.
  • the line detection does not result in any points around non-line shaped parts, for example an ellipse 904 in Figure 9A.
  • the dense points other than the line type need to be detected next at 810.
  • a mask is preferably placed around the line area, as shown at 916 in image 912.
  • an operator is applied to detect those non-line type dense points.
  • the operator may be one of those edge detectors that essentially detects all the significant points located around high contrast areas.
  • Ellipse 904 in Figure 9A is detected as edge points 914 in image 912 when the edge detector is applied, wherein a dotted block 916 shows an exemplary mask around line 910 that has been detected.
  • in FIG. 9B there is shown an object point P being projected onto two adjacent image planes 920 and 922 at (x1, y1) and (x2, y2), respectively.
  • the camera projection centers are at 925 and 927. Together with the object point P, the two points 925 and 927 form an epipolar plane 928 that intersects both image planes 920 and 922.
  • the intersecting lines 921 and 923 are the epipolar lines for image points 924 and 926.
  • once the process in Figure 8 has detected a dense point characterizing the projected point 924 at (x1, y1) in image plane 920, the coordinates (x2, y2) of the projected point 926 in image plane 922 must lie on the epipolar line.
  • the 3D point must be projected onto this epipolar line in the second consecutive image as it is illustrated in Figure 9B.
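A small sketch of the epipolar constraint used here: given a fundamental matrix F implied by the recovered camera motion and intrinsics (how F is assembled is outside this sketch), the matching location in the second image is searched only along the line returned below.

```python
import numpy as np

def epipolar_line(F, x1):
    """Epipolar line in the second image for a point x1 = (u, v) in the first.

    Returns (a, b, c) with a*u' + b*v' + c = 0 for the matching point (u', v'),
    normalized so that point-to-line distances are measured in pixels.
    """
    l = F @ np.array([x1[0], x1[1], 1.0])
    return l / np.linalg.norm(l[:2])
```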
  • the problem of tracking lines segments between frames is reduced to tracking sparsely subsampled points of the line segments and then robustly detecting line segments on the basis of tracked points in the second image.
  • a next image is obtained at 812 for tracking the dense points along with respective epipolar lines at 814.
  • the match along an epipolar line can be found by performing sub-pixel accurate template matching using, for example, normalized correlation computation.
  • the 3D coordinates of the dense line points and those non-line points extracted above are respectively reconstructed using LS triangulation.
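A standard linear (DLT) sketch of the LS triangulation step: P1 and P2 are 3x4 projection matrices built from the recovered camera motion, and x1, x2 are the matched image locations of one dense point. This is a conventional formulation, not code from the patent.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Least-squares triangulation of one 3D point from two views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # solution = right singular vector of smallest value
    X = Vt[-1]
    return X[:3] / X[3]              # dehomogenize
```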
  • the process needs to check if all the images have been processed for the dense points. If there are still one or more images to be processed, the process goes back to 806. As long as there are sufficient detected line and non-line feature points, the process will proceed from 812 to 820.
  • a description of the surface of the object is needed.
  • a mesh model of the object is a desired description of the surface, as it provides the information on how each localized area is oriented and positioned in a scene space so that corresponding texture information may be applied thereto to subsequently generate a fully textured 3D model.
  • a mesh model may be used as a basis to create a display or reproduction of the real world object and generate other displays such as "morphs", fantasy or special effects.
  • the generation of a 3D mesh model is a process that generates a mesh model of an object by dividing its 3D surface into a set of small triangular (or quadrilateral) elements.
  • the input to the process is a list of the dense points obtained from the depth mapping described above, and the output of the process is a list of facets of the convex hull of the points with vertices defined at the point locations.
  • the process is based on computing the 3D Delaunay triangulation that is a well known methodology to compute a triangular mesh based on a set of surface dense points.
  • Figure 9C shows a flowchart of generating a mesh model based on the 3D Delaunay triangulation. Since the 3D Delaunay triangulation is defined only on 3D points, each line segment needs to be subsampled into points. Provided the subsampling density is sufficiently high, the triangulation will likely have a set of edges connecting the points of the subsampled line segments with each other, which coincide with the original underlying line segment as preferred.
  • the dense points including the line and non-line type feature points are obtained from the above depth mapping process.
  • those line type feature points are identified and sub-sampled into sparse feature points, before coming to 948 to compute the 3D Delaunay triangulation.
  • the 3D Delaunay triangles are computed based on the individual feature points.
  • the triangle facets computed by the Delaunay triangulation that are based on the supplied dense points may include invalid triangles that do not correspond to the true physical surface of the object in a scene, in addition to unusual triangles, such as very elongated or disproportional triangles.
  • the 3D Delaunay triangulation generates a set of tetrahedrons which occupies the convex hull of the set of dense 3D feature points. Therefore, it usually contains many triangle facets which do not correspond to true physical surfaces of the scene being observed.
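A minimal sketch of obtaining candidate surface triangles from the 3D Delaunay step with SciPy: the tetrahedra fill the convex hull, and faces belonging to exactly one tetrahedron form its boundary. The visibility and texture-consistency pruning described next would still have to be applied to these candidates.

```python
import numpy as np
from collections import Counter
from scipy.spatial import Delaunay

def delaunay_surface_candidates(points_3d):
    """Boundary triangle facets of the 3D Delaunay tetrahedralization."""
    tetrahedra = Delaunay(np.asarray(points_3d)).simplices   # (m, 4) vertex indices
    face_count = Counter()
    for a, b, c, d in tetrahedra:
        for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
            face_count[tuple(sorted(face))] += 1
    # Faces shared by exactly one tetrahedron lie on the convex-hull boundary
    return [face for face, count in face_count.items() if count == 1]
```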
  • a sequence of post-processing steps has to be performed on the 3D triangular mesh which is based on using various geometric constraints.
  • three steps in the postprocessing are applied.
  • constraints based upon the visibility of image features are used to eliminate those triangles that cannot be valid surfaces. Specifically, each feature point visible in the input sequence has to be also visible in the generated 3D mesh from the same viewpoints. No triangle facet generated in the mesh is allowed to occlude the point from any of those camera viewpoints where the point was visible in that image. If such a visible point is occluded by any triangles, the triangles have to be removed from the mesh.
  • texture consistency of the triangles across several views is applied at 954 to check the consistency of the texture.
  • if a triangle facet of the surface mesh corresponds to a true physical surface patch, the texture defined by projecting the images onto the triangle from all the views where the triangle is visible has to be consistent.
  • otherwise, the image projections from the visible views may define different texture maps on that triangle. For instance, if a triangle facet has been defined through a point on a roof of a house, a point on the ground and a point on a tree, then one image frame may project the side wall of the house as texture on the triangle, whereas another view may project the sky. This inconsistency indicates that this triangle cannot be a true physical surface and therefore it should not be included in the 3D mesh.
  • the check of texture inconsistency can be performed by, e.g., computing the normalized cross-correlation of the corresponding texture maps.
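A minimal sketch of that consistency check, assuming the two texture maps of a triangle have already been resampled to the same shape; the rejection threshold is an implementation choice.

```python
import numpy as np

def texture_consistency(tex_a, tex_b):
    """Normalized cross-correlation of two texture maps of the same triangle.

    Values close to 1 support the triangle being a true physical surface patch.
    """
    a = tex_a.astype(np.float64).ravel()
    b = tex_b.astype(np.float64).ravel()
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))
```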
  • the final surface mesh can be obtained which corresponds to the preferably true physical surface estimated from the given sequences of image frames.
  • the three steps at 950, 952 and 954 are exemplary steps to post-process the triangles computed from the Delaunay triangulation. There can be other approaches known to those skilled in the art to further refine the mesh model to a predefined degree of refinement.
  • the next step at 956 is to add texture patterns to the mesh model to enhance the realism of the model.
  • the process itself is called texture mapping, an image synthesis technique in which a 2D image, also known as a texture image, is mapped onto a surface of a 3D mesh model.
  • FIG 10A shows a process flowchart of applying the texture patterns to the mesh model.
  • a mesh model is received and preferably described in triangles.
  • these polygons are triangular; in other modes, they may be rectangular, hexagonal or the like.
  • special steps may be required to ensure that all of the vertices lie within a common plane.
  • higher order polygons can be reduced to triangles (polygons of order 3) for convenience in processing.
  • the mesh model is assumed to be of triangles and those skilled in the art will appreciate that the description herein is equally applied to a mesh model with polygons of order greater than three.
  • the mesh model may be modified at 1004, depending on a desired resolution or a degree of refinement.
  • the approach used at 1004 may include a decimation process which according to a set of rules reduces the number of triangles to facilitate an efficient and effective texture mapping process to be followed.
  • the rules may include a normal comparison between two or more neighboring triangles. If a normal of one triangle is similar to a neighboring triangle within a predefined degree of refinement, the corresponding triangle may be merged together with the neighboring triangle.
  • a user may subdivide the mesh model into one or more logic parts for texture mapping at 1004 either within the current process or using a commercially available tool, such as 3D Studio MAX in which the mesh model can be displayed and interacted with.
  • each of the triangles is assigned to a side view image Ci.
  • Figure 11A shows a group of triangles being assigned to respective side view images.
  • a surrounding view of the object has been captured in a number of side view images C1, C2, ..., CN, each taken at a known position relative to the object.
  • each of the triangles can be respectively assigned to one of the side view images C1, C2, ..., CN.
  • a visibility test is applied for every triangle and a side view in order to ensure that the triangle is visible from the chosen side view. If the triangle is not visible from the chosen side view, an alternative side needs to be selected.
  • each triangle assigned to a side view image is mapped to/with the side view image for texturing, namely with the patch corresponding to the portion of texture information for the triangle.
  • a local blending process is applied to smooth those texture discontinuities. Additional information on processes 1006, 1008 and 1010 is provided by W. Niem, et al., "Mapping Texture From Multiple Camera Views Onto 3D-Object Models for Computer Animation," in the proceedings of the International Workshop on Stereoscopic and Three Dimensional Imaging.
  • a patch is a collection of triangles of the mesh with the property that every triangle in the patch shares at least one edge with some other triangle in the same patch.
  • all patches have the properties that the union of all the patches contains all the triangles of the mesh, and that no two patches contain the same triangle.
  • Exporting such patches in image files makes it possible for a user to alter or modify the texture mapping for a particular patch in a desirable way.
  • a 3D modeling system typically, is not designed to model the bottom of a 3D object that is often assumed black or a color extended from what is on the bottom portion of the object. Consequently, the final 3D model loses its realism when its bottom is caused to be displayed.
  • a procedure is provided to generate one or more patches, alternatively, it is to subdivide the mesh into a patch or patches.
  • the detail of 1012 is provided in Figure 10B.
  • an empty patch is created (i.e. a memory space is initiated) and indexed.
  • one of the triangles in the mesh model is chosen as a seed triangle.
  • the seed triangle may be chosen randomly from the triangles that are not included in a patch yet or from a group of local triangles that demonstrate a similar normal.
  • neighboring triangles to the seed triangle are sequentially checked if they have been tested for suitability to be included in the patch that is to be described below. If the neighboring triangles are all tested, that means the patch is finished. Otherwise, the triangles are further respectively tested at 1026 to see if any of the triangles can be added to the patch.
  • Figure 11B illustrates that a patch is growing with every newly added triangle.
  • triangle 1110 is a seed triangle that begins the patch initiated at 1020.
  • triangle 1112 will be tested to see if it shares at least one edge with the seed triangle. If it does not, the triangle does not belong to the patch, or it may be added to the patch later in the process.
  • neighboring triangle 1114 does not belong to the patch and will thus be discarded for the time being.
  • if triangle 1112 shares one edge with triangle 1110, a mapping is therefore created at 1028 of Figure 10B. It should be emphasized that the particular mapping in the current embodiment is based on the orthographic projection from the 3D model to the texture image. For a particular patch, the projection is along the direction of the face normal of the seed triangle. Alternatively, the perspective projection or any other suitable projection may be used.
  • the accepted triangle is further tested to see if it intersects the patch. If it does, the triangle is labeled "tested", and the process goes to 1024 to test another triangle. If the triangle does not intersect the patch, it is now added to the patch at 1034 so that the patch grows one triangle bigger.
  • the patch generation process permits to generate multiple patches.
  • it checks if the entire mesh model has been processed, namely expressed now in a number of patches. If there are still some triangles that have not been put into a patch, then the process goes to 1020 to generate a new patch.
  • the patch generation process in Figure 10B can be implemented by a recursive programming and subsequently produces a number of mutually exclusive patches, each comprising a plurality of triangles that share at least one edge with other triangles in the patch.
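A sketch of that patch generation written iteratively rather than recursively. `shared_edge(a, b)` is a hypothetical predicate reporting whether two triangles share an edge, and the test for intersection with the already-grown patch in the texture plane is omitted here.

```python
def grow_patches(triangle_ids, shared_edge):
    """Partition mesh triangles into mutually exclusive, edge-connected patches."""
    remaining = set(triangle_ids)
    patches = []
    while remaining:
        seed = remaining.pop()                 # seed triangle starts a new patch
        patch, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            neighbors = [t for t in remaining if shared_edge(current, t)]
            for t in neighbors:
                remaining.discard(t)           # each triangle joins exactly one patch
                patch.append(t)
                frontier.append(t)
        patches.append(patch)
    return patches
```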
  • the process then creates the texture image or images. These are the images that store the actual texture. The creation of this image requires that the textures stored for every triangle be projected into the image. In the current embodiment, we accelerate the process by using the graphics accelerator architecture. If such architecture is not available, it is emulated by software.
  • the shape of patch 1118 is formed and the textured triangles therein provide a textured patch that can be saved or exported at 1016 in a commonly used image format, such as TIFF (Tag Image File Format) or JPEG (Joint Photographic Experts Group), that can be opened by an image processing application such as PhotoShop.
  • a user can repaint or modify any portion of the textured patch using PhotoShop, which provides a sufficient graphic user interface to modify the patch at the pixel level.
  • the process described above shows a method for creating contiguous texture patches. Rather than mapping texture to each of the triangles of the mesh model, the process chooses to map the texture from every triangle into a respective portion of the texture image.
  • the texture mapping process described herein can be implemented to take advantage of the graphics accelerator architecture commonly found in most computer systems. Redirecting the graphics accelerator to draw into a buffer in memory rather than the buffer for the monitor can yield a much more efficient mapping of the textures.
  • One of the advantages is an economical and efficient 3D modeling system that is low in cost and easy to operate, virtually anywhere within minutes.
  • the modeling system employing the present invention can be used and operated by an ordinary skilled person to generate fully-textured models of 3D objects within a limited time for many applications including Internet commerce and product designs.
  • Another advantage is the MAE scheme that encodes all mask images to make the space carving process nearly independent of the size of images.
  • Still another advantage is the process of generating a mesh model using neighborhood configuration that produces only valid triangles.
  • Still another advantage is the texture mapping process that provides a mechanism to generate exportable patches comprising triangles that can be provided contiguous texture mapping without user intervention.
  • Yet another advantage is the possible implementation of the texture mapping processing on graphics accelerator architecture to redirect the graphics accelerator to draw into a buffer in memory rather than the buffer for a monitor, yielding a much more efficient mapping of the textures. As a result of the texture mapping, a fully-textured 3D model of an object is created.
  • the advantages of the invention are numerous. Several advantages that embodiments of the invention may include are as follows.
  • One of the advantages is the use of efficient feature extraction and tracking mechanisms to track salient features in a sequence of images.
  • the feature extraction mechanism uses a salient feature operator to locate salient features accurately and without bias, based on a 3D representation of the image intensity/color (an illustrative corner-detection sketch follows this list).
  • Another advantage is the use of a factorization approach under orthography.
  • a feedback system emulates the orthographic camera model by iteratively "correcting" the perspective camera model so that the factorization approach provides practical and accurate solutions.
  • the present invention has been described in sufficient detail with a certain degree of particularity.
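
For illustration only, below is a minimal C++ sketch of one way the edge-sharing patch growing described in the list above could be implemented: it groups the mesh triangles into mutually exclusive, edge-connected patches. The data structures are simplified placeholders, and the patent's additional test that a candidate triangle must not intersect the existing patch is omitted here (it would be one extra predicate before a triangle is accepted); this is a sketch of the general technique, not the patent's implementation.

```cpp
#include <array>
#include <map>
#include <utility>
#include <vector>

// A triangle is three vertex indices into the mesh vertex list.
using Triangle = std::array<int, 3>;

// Canonical (sorted) edge key so (a,b) and (b,a) map to the same entry.
static std::pair<int, int> edgeKey(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
}

// Groups the mesh triangles into mutually exclusive patches of
// edge-connected triangles, using an explicit stack instead of
// recursion to avoid deep call chains on large meshes.
std::vector<std::vector<int>> buildPatches(const std::vector<Triangle>& tris) {
    // Map every edge to the triangles that share it.
    std::map<std::pair<int, int>, std::vector<int>> edgeToTris;
    for (int t = 0; t < (int)tris.size(); ++t)
        for (int e = 0; e < 3; ++e)
            edgeToTris[edgeKey(tris[t][e], tris[t][(e + 1) % 3])].push_back(t);

    std::vector<bool> assigned(tris.size(), false);
    std::vector<std::vector<int>> patches;

    for (int seed = 0; seed < (int)tris.size(); ++seed) {
        if (assigned[seed]) continue;          // already belongs to a patch
        std::vector<int> patch;
        std::vector<int> stack{seed};
        assigned[seed] = true;
        while (!stack.empty()) {
            int t = stack.back();
            stack.pop_back();
            patch.push_back(t);
            // Visit every triangle sharing an edge with triangle t.
            for (int e = 0; e < 3; ++e) {
                for (int n : edgeToTris[edgeKey(tris[t][e], tris[t][(e + 1) % 3])]) {
                    if (!assigned[n]) {
                        assigned[n] = true;    // the patch grows by one triangle
                        stack.push_back(n);
                    }
                }
            }
        }
        patches.push_back(std::move(patch));   // one more mutually exclusive patch
    }
    return patches;
}
```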
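
As a rough modern-OpenGL sketch of the idea of redirecting the graphics accelerator to draw into a buffer in memory instead of the monitor's buffer, the function below renders into an off-screen framebuffer object and reads the pixels back into main memory. It assumes a current OpenGL 3.0+ context and an already-initialized loader such as GLEW; the patent's embodiment predates this API, so this is only an illustration of the off-screen rendering concept, not the invention's implementation.

```cpp
#include <GL/glew.h>   // assumes glewInit() was called with a current GL context
#include <vector>

// Renders into an off-screen framebuffer and reads the pixels back into
// main memory, instead of drawing into the monitor's default framebuffer.
std::vector<unsigned char> renderToMemory(int width, int height) {
    GLuint fbo = 0, colorTex = 0;
    glGenFramebuffers(1, &fbo);
    glGenTextures(1, &colorTex);

    // Create a texture that will receive the rendered texture-image pixels.
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // Redirect rendering from the screen to the off-screen buffer.
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTex, 0);
    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        return {};                              // off-screen buffer not usable

    glViewport(0, 0, width, height);
    glClear(GL_COLOR_BUFFER_BIT);
    // ... issue the textured-triangle draw calls for one patch here ...

    // Copy the result into a CPU-side buffer (the "texture image").
    std::vector<unsigned char> pixels(4u * width * height);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());

    // Restore the default framebuffer and release the temporaries.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteTextures(1, &colorTex);
    glDeleteFramebuffers(1, &fbo);
    return pixels;
}
```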
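
The salient feature operator itself is not detailed in this excerpt. As a stand-in illustration of salient-feature detection in general, the sketch below computes a standard Harris-style corner response over a grayscale image; it is deliberately simplified (no smoothing or non-maximum suppression) and should not be read as the operator of the invention, which works on a 3D representation of the image intensity/color.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Grayscale image stored row-major as floats in [0, 1].
struct GrayImage {
    int width = 0, height = 0;
    std::vector<float> pixels;
    float at(int x, int y) const { return pixels[(size_t)y * width + x]; }
};

// Computes a Harris-style corner response for every interior pixel and
// reports the pixels whose response exceeds `threshold` as salient points.
std::vector<std::pair<int, int>> harrisCorners(const GrayImage& img,
                                               float k = 0.04f,
                                               float threshold = 1e-4f) {
    std::vector<std::pair<int, int>> corners;
    for (int y = 2; y < img.height - 2; ++y) {
        for (int x = 2; x < img.width - 2; ++x) {
            // Accumulate the structure tensor over a 3x3 window of
            // central-difference gradients.
            float sxx = 0.f, syy = 0.f, sxy = 0.f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    float gx = 0.5f * (img.at(x + dx + 1, y + dy) -
                                       img.at(x + dx - 1, y + dy));
                    float gy = 0.5f * (img.at(x + dx, y + dy + 1) -
                                       img.at(x + dx, y + dy - 1));
                    sxx += gx * gx;
                    syy += gy * gy;
                    sxy += gx * gy;
                }
            }
            // Harris response: det(M) - k * trace(M)^2.
            float response = sxx * syy - sxy * sxy - k * (sxx + syy) * (sxx + syy);
            if (response > threshold) corners.emplace_back(x, y);
        }
    }
    return corners;
}
```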

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention concerns a system for automatically generating a fully-textured 3D model of an object appearing in motion images. The system tracks salient features in the motion images on the basis of salient features detected by means of a salient feature operator. A feature tracking map is used to build feature blocks that comprise the tracked salient features, each feature block then being supplied as input to a camera motion estimation process, said process being controlled so as to provide solutions for a perspective camera model. Once the estimated camera motions have been derived from the solutions provided by the camera motion estimation process, dense points are extracted for use in generating a mesh model. Textures are finally applied to the mesh model so as to produce a fully-textured 3D model.
EP99935733A 1998-07-20 1999-07-20 Balayage automatique de scenes en 3d provenant d'images mobiles Withdrawn EP1097432A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US9349298P 1998-07-20 1998-07-20
US93492P 1998-07-20
PCT/US1999/016395 WO2000004508A1 (fr) 1998-07-20 1999-07-20 Balayage automatique de scenes en 3d provenant d'images mobiles

Publications (1)

Publication Number Publication Date
EP1097432A1 true EP1097432A1 (fr) 2001-05-09

Family

ID=22239262

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99935733A Withdrawn EP1097432A1 (fr) 1998-07-20 1999-07-20 Balayage automatique de scenes en 3d provenant d'images mobiles

Country Status (3)

Country Link
EP (1) EP1097432A1 (fr)
JP (1) JP2002520969A (fr)
WO (1) WO2000004508A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7625335B2 (en) 2000-08-25 2009-12-01 3Shape Aps Method and apparatus for three-dimensional optical scanning of interior surfaces
US7016824B2 (en) 2001-02-06 2006-03-21 Geometrix, Inc. Interactive try-on platform for eyeglasses
US8032337B2 (en) 2001-03-02 2011-10-04 3Shape A/S Method for modeling customized earpieces
AU2002318862B2 (en) * 2001-12-19 2005-02-10 Canon Kabushiki Kaisha A Method for Video Object Detection and Tracking Using a Dense Motion or Range Field
US9279602B2 (en) 2007-10-04 2016-03-08 Sungevity Inc. System and method for provisioning energy systems
JP4852591B2 (ja) * 2008-11-27 2012-01-11 富士フイルム株式会社 立体画像処理装置、方法及び記録媒体並びに立体撮像装置
MX2013003853A (es) * 2010-10-07 2013-09-26 Sungevity Modelado tridimensional rápido.
DE102012009688B4 (de) * 2012-05-09 2016-01-07 Db Systel Gmbh Verfahren, Signalfolge sowie Rechneranlage zum Erstellen, Verwalten, Komprimieren und Auswerten von 3D-Daten eines dreidimensionalen Geländemodells und ein Computerprogramm mit Programmcode zur Durchführung des Verfahrens auf einem Computer
WO2015031593A1 (fr) 2013-08-29 2015-03-05 Sungevity, Inc. Amélioration d'estimation de conception et d'installation pour des systèmes d'énergie solaire
US9965861B2 (en) 2014-12-29 2018-05-08 Intel Corporation Method and system of feature matching for multiple images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5511153A (en) * 1994-01-18 1996-04-23 Massachusetts Institute Of Technology Method and apparatus for three-dimensional, textured models from plural video images
US5864640A (en) * 1996-10-25 1999-01-26 Wavework, Inc. Method and apparatus for optically scanning three dimensional objects using color information in trackable patches

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0004508A1 *

Also Published As

Publication number Publication date
JP2002520969A (ja) 2002-07-09
WO2000004508A1 (fr) 2000-01-27

Similar Documents

Publication Publication Date Title
US11721067B2 (en) System and method for virtual modeling of indoor scenes from imagery
KR101195942B1 (ko) 카메라 보정 방법 및 이를 이용한 3차원 물체 재구성 방법
US6831643B2 (en) Method and system for reconstructing 3D interactive walkthroughs of real-world environments
EP2272050B1 (fr) Utilisation de collections de photos pour la modélisation tridimensionnelle
Johnson et al. Registration and integration of textured 3D data
CN110135455A (zh) 影像匹配方法、装置及计算机可读存储介质
Dick et al. Automatic 3D Modelling of Architecture.
Pan et al. Rapid scene reconstruction on mobile phones from panoramic images
EP1097432A1 (fr) Balayage automatique de scenes en 3d provenant d'images mobiles
Saxena et al. 3-d reconstruction from sparse views using monocular vision
Koppel et al. Image-based rendering and modeling in video-endoscopy
Grzeszczuk et al. Creating compact architectural models by geo-registering image collections
Kumar et al. 3D manipulation of motion imagery
Cheng et al. Texture mapping 3d planar models of indoor environments with noisy camera poses
KR100490885B1 (ko) 직각 교차 실린더를 이용한 영상기반 렌더링 방법
Schindler et al. Fast on-site reconstruction and visualization of archaeological finds
Becker Vision-assisted modeling for model-based video representations
Laycock et al. Rapid generation of urban models
Cheng et al. Texture mapping 3D models of indoor environments with noisy camera poses.
Li A Geometry Reconstruction And Motion Tracking System Using Multiple Commodity RGB-D Cameras
Siddiqui et al. Surface reconstruction from multiple views using rational B-splines and knot insertion
Malleson Dynamic scene modelling and representation from video and depth
DiVerdi Towards anywhere augmentation
Komodakis et al. 3D visual reconstruction of large scale natural sites and their fauna
Pan et al. Towards Rapid 3d Reconstruction on Mobile Phones from Wide-Field-of-View Images

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010120

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RIN1 Information on inventor provided before grant (corrected)

Inventor name: WAUPOTITSCH, ROMAN

Inventor name: CHEN, JINLONG

Inventor name: FEJES, SANDOR

Inventor name: ZWERN, ARTHUR

17Q First examination report despatched

Effective date: 20010705

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030201