WO2022172053A1

WO2022172053A1 - Automatic low level-of-detail (lod) model generation based on geoinformation

Info

Publication number: WO2022172053A1
Application number: PCT/IB2021/051048
Authority: WO
Inventors: Wisdom CHAN; Chuen Kit LUK; Wai Pang Yau
Original assignee: Chain Technology Development Co., Ltd.
Priority date: 2021-02-10
Filing date: 2021-02-10
Publication date: 2022-08-18
Also published as: CN116964636A

Abstract

Provided are a system and method for reconstructing a low Level-of-detail (LOD) model automatically based on geoinformation of any 3D object. The method replaces the MVS reconstruction in the process so that the information needed for the 3D mesh is reduced significantly. This reduction may facilitate the subsequent processing of the 3D object on the portable device.

Description

AUTOMATIC LOW LEVEL-OF-DETAIL (LOD) MODEL GENERATION BASED ON GEOINFORMATION

TECHNICAL SPECIFICATION

[0001] This presentation relates to a system and method of three-dimensional (3-D) reconstruction. In particular, aspects of the invention attempt to reduce the hardware requirement of the computer device for displaying and manipulating the reconstructed 3D scene.

BACKGROUND

[0002] Reconstructing the shape or appearance of an object or a scene as a three-dimensional (3D) computer model is one of the important technological advances in computer science, computer-generated graphics, and computer programming. In recent decades, reconstructing and rendering objects or scenes has been widely used in construction and infrastructure projects, where engineers design and make operational decisions throughout the project, as well as in other applications such as the arts and motion pictures. However, the reconstructed scenes may not be easily viewed on currently available portable devices without pre- or post-processing work, because displaying or rendering these reconstructed objects requires both adequate computational power and adequate storage access speed to give viewers an enjoyable viewing experience.

[0003] An existing 3D reconstruction process usually consists of two steps. The first step is a Structure-from-Motion (SFM) pipeline, which takes in a set of images and returns the camera parameters and a sparse point cloud. In one embodiment, the SFM may include a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. The second step is the multi-view stereo (MVS) reconstruction, which takes in the output from the first step and reconstructs a dense point cloud or a 3D mesh with texture coloring.

[0004] In one aspect, the 3D reconstruction in the second step above often results in a 3D mesh with millions or even tens of millions of polygons, which requires a high-power CPU and GPU with a large amount of on-chip memory in order to be displayed, rendered, or manipulated. Such a mesh may therefore not be easily viewed on a portable device, such as a smartphone, that has limited memory, GPU capacity, or battery power for running a CPU and a display for a sustained duration.

[0005] Therefore, a technical approach to overcome the shortcomings of the existing approach is desirable for enabling mobile devices to efficiently display, render or manipulate 3D objects.

SUMMARY

[0006] Aspects of the invention may create a system and method to overcome the above challenges by reconstructing a low Level-of-detail (LOD) model automatically based on geoinformation of any 3D object. Aspects of the invention attempt to replace the MVS reconstruction in the process so that the information needed for the 3D mesh is reduced significantly. This reduction may facilitate the subsequent processing of the 3D object on the portable device.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Persons of ordinary skill in the art may appreciate that elements in the figures are illustrated for simplicity and clarity so not all connections and options have been shown. For example, common but well-understood elements that are useful or necessary in a commercially feasible embodiment may often not be depicted in order to facilitate a less obstructed view of these various embodiments of the present disclosure. It may be further appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art may understand that such specificity with respect to sequence is not actually required.

It may also be understood that the terms and expressions used herein may be defined with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.

[0008] FIG. 1 is a flow diagram to illustrate a method of creating a 3D object based on an existing approach.

[0009] FIG. 2 is a flow diagram to illustrate a method of creating a 3D object according to some embodiments.

[0010] FIG. 3 is a diagram illustrating a section of an area on a map having geoinformation according to some embodiments.

[0011] FIG. 4 is an illustration of a LOD 3D image or model according to some embodiments.

[0012] FIG. 5 is a diagram illustrating a portable computing device according to one embodiment.

[0013] FIG. 6 is a diagram illustrating a computing device according to one embodiment.

DESCRIPTION

[0014] Embodiments may now be described more fully with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific exemplary embodiments which may be practiced. These illustrations and exemplary embodiments may be presented with the understanding that the present disclosure is an exemplification of the principles of one or more embodiments and may not be intended to limit any one of the embodiments illustrated. Embodiments may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure may be thorough and complete, and may fully convey the scope of embodiments to those skilled in the art. Among other things, the present invention may be embodied as methods, systems, computer readable media, apparatuses, or devices. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.

[0015] Aspects of the invention provide an improved approach that moves away from reliance on the better-equipped computer hardware found in desktops, which is a luxury or inconceivable for mobile devices such as smartphones or tablets. One aspect of the invention may replace the traditional multi-view stereo (MVS) reconstruction, because the traditional MVS reconstruction requires substantial computational power and a long time to reconstruct the 3D model. The MVS reconstruction typically generates a 3D model with a 3D mesh having millions or even tens of millions of polygons, which requires a lot of memory to store, read and write the data. In addition, the 3D mesh is not the end product; it still needs further processing. That processing would in turn require a high-power CPU and GPU with a large amount of on-chip memory.
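By way of a non-limiting illustration, the memory difference described above may be estimated with simple arithmetic. The sketch below assumes an indexed triangle mesh that stores only float32 vertex positions and uint32 triangle indices; the vertex-to-triangle ratio of the dense mesh and the count of 1,000 buildings are hypothetical figures chosen for the example and are not taken from this disclosure.

```python
# Rough memory estimate for an indexed triangle mesh.
# Assumptions (not from the source): float32 positions (3 x 4 bytes per vertex),
# uint32 indices (3 x 4 bytes per triangle), no normals, UVs, or textures.
BYTES_PER_VERTEX = 3 * 4
BYTES_PER_TRIANGLE = 3 * 4


def mesh_bytes(num_vertices: int, num_triangles: int) -> int:
    """Return the bytes needed for vertex positions plus triangle indices."""
    return num_vertices * BYTES_PER_VERTEX + num_triangles * BYTES_PER_TRIANGLE


# Hypothetical dense mesh: 10 million triangles, about half as many vertices.
dense = mesh_bytes(num_vertices=5_000_000, num_triangles=10_000_000)

# Hypothetical low LOD scene: 1,000 buildings, each a cuboid
# (8 vertices, 12 triangles).
low_lod = mesh_bytes(num_vertices=1_000 * 8, num_triangles=1_000 * 12)

print(f"dense mesh:   {dense / 1e6:.1f} MB")     # 180.0 MB
print(f"low LOD mesh: {low_lod / 1e6:.3f} MB")   # 0.240 MB
```

Under these assumptions the geometry alone differs by roughly three orders of magnitude, before any texture data is counted.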

[0016] Referring now to FIG. 1, a flow chart of providing a 3D object using a traditional MVS reconstruction approach is illustrated. For example, if the three dimensional (3D) image or model is needed of a city area, a collection of aerial photos (in digital format) of the city area is obtained at 102. The obtained photos are processed via SFM at 104 and then MVS at 106. The mesh is then subjected to a texture coloring at 108 and then a high LOD city area 3D object is provided and rendered at 110.

[0017] Aspects of the invention move away from such an approach by reducing the number of polygons generated. Referring now to FIG. 2, a flow diagram illustrates a method of creating a 3D object according to some embodiments. In one example, if the three dimensional (3D) image or model is needed of a city area, a collection of aerial photos (in digital format) of the city area is obtained at 202. In another embodiment, the area may include any defined space or area, such as a shopping mall or an airport.

[0018] In one aspect, based on the aerial photos, which may include geoinformation, a 3D model is generated based on the geoinformation at 204. In one example, the generated 3D model consists of basic three-dimensional shapes, such as cuboids or prisms, or a combination of them. For example, a house may be represented with a cuboid as the ground floor or lower levels and a prism on top as the roof. In one embodiment, a generic representation of a structure is used without elaborate information added to the 3D shapes. Of course, once the geoinformation is provided, the 3D model may be improved and be more accurate.
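As a minimal sketch of the house example above, a footprint outline and a height may be turned into a cuboid body and a prism roof as follows. The function names `extrude_footprint` and `gable_roof`, the fan triangulation (which assumes a convex, counter-clockwise footprint), and the sample dimensions are all hypothetical; the disclosure does not prescribe a particular meshing routine.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]
Mesh = Tuple[List[Vertex], List[Triangle]]


def extrude_footprint(outline: List[Point2D], height: float,
                      base_z: float = 0.0) -> Mesh:
    """Extrude a convex, counter-clockwise footprint into a closed prism."""
    n = len(outline)
    if n < 3:
        raise ValueError("a footprint needs at least three corners")
    bottom = [(x, y, base_z) for x, y in outline]
    top = [(x, y, base_z + height) for x, y in outline]
    vertices = bottom + top  # indices 0..n-1 are bottom, n..2n-1 are top
    triangles: List[Triangle] = []
    for i in range(1, n - 1):
        triangles.append((0, i + 1, i))          # bottom cap, facing down
        triangles.append((n, n + i, n + i + 1))  # top cap, facing up
    for i in range(n):
        j = (i + 1) % n
        triangles.append((i, j, n + j))          # wall, first half
        triangles.append((i, n + j, n + i))      # wall, second half
    return vertices, triangles


def gable_roof(rect: List[Point2D], eave_z: float, ridge_z: float) -> Mesh:
    """Build an open-bottomed triangular prism over a counter-clockwise rectangle."""
    if len(rect) != 4:
        raise ValueError("the roof sketch expects exactly four corners")
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = rect
    vertices: List[Vertex] = [(x, y, eave_z) for x, y in rect]
    vertices.append(((x0 + x1) / 2, (y0 + y1) / 2, ridge_z))  # ridge end, index 4
    vertices.append(((x2 + x3) / 2, (y2 + y3) / 2, ridge_z))  # ridge end, index 5
    triangles: List[Triangle] = [
        (0, 1, 4), (2, 3, 5),  # two gable ends
        (1, 2, 5), (1, 5, 4),  # first slope
        (3, 0, 4), (3, 4, 5),  # second slope
    ]
    return vertices, triangles


# Hypothetical 10 m x 8 m house: 6 m cuboid body, ridge at 9 m.
footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
body_vertices, body_triangles = extrude_footprint(footprint, height=6.0)
roof_vertices, roof_triangles = gable_roof(footprint, eave_z=6.0, ridge_z=9.0)
print(len(body_triangles) + len(roof_triangles), "triangles for the whole house")
```

An extruded footprint with n corners yields 4n - 4 triangles, so a rectangular body needs 12 and the roof adds 6 more.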

[0019] In contrast, the traditional MVS approach employs various algorithms in an attempt to reconstruct every detail from the photos taken. This approach usually produces a very complex polygon mesh to describe the features of the structures. For example, hundreds of triangles may be used to reconstruct a window under the MVS approach.

[0020] In one embodiment, geoinformation may include a position, name, outline, height, a size, a volume, and shape of the buildings. In another embodiment, the geoinformation may also include roads or other non-building structures such as parks. In another embodiment, the geoinformation may enable aspects of the invention to reconstruct objects such as buildings or structures in a cubic shape.
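One possible way to hold the geoinformation items listed above is a small record type, sketched below. The class name `BuildingGeoinfo`, the choice of units, and the reading of "size" as footprint area are assumptions made for illustration only; the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BuildingGeoinfo:
    """Hypothetical record holding the geoinformation items for one building."""
    name: str
    position: Tuple[float, float]       # (latitude, longitude) of a reference point
    outline: List[Tuple[float, float]]  # footprint corners in metres, in order
    height: float                       # metres
    shape: str = "cuboid"               # e.g. "cuboid" or "prism"

    @property
    def size(self) -> float:
        """Footprint area in square metres, computed with the shoelace formula."""
        total = 0.0
        n = len(self.outline)
        for i in range(n):
            x0, y0 = self.outline[i]
            x1, y1 = self.outline[(i + 1) % n]
            total += x0 * y1 - x1 * y0
        return abs(total) / 2.0

    @property
    def volume(self) -> float:
        """Volume of the extruded footprint in cubic metres."""
        return self.size * self.height


# Placeholder values, not taken from any real map.
tower = BuildingGeoinfo(
    name="Building 302",
    position=(0.0, 0.0),
    outline=[(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)],
    height=30.0,
)
print(tower.size, tower.volume)  # 200.0 6000.0
```

Deriving the size and volume from the outline and height keeps the stored record small, which fits the aim of reducing the information needed per object.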

[0021] In one embodiment, most of the geoinformation may be retrieved from the spatial or geographic data of a third party. For example, the geoinformation may be obtained from commercially available maps such as Google® Maps, Bing® Maps or Apple® Maps, or from a freely editable community map such as OpenStreetMap. Aspects of the invention do not need the data in the aerial photos during the 3D model reconstruction, thus reducing the amount of processing time and resources in the 3D model generation. In one aspect, embodiments of the invention may enhance the 3D model, such as with texture or coloring as described below, based on the geoinformation or the like.
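As one hedged example of using third-party map data, the sketch below reads a building outline and height from a GeoJSON-style feature carrying OpenStreetMap-style tags. The `height` and `building:levels` keys are existing OpenStreetMap tagging conventions; the 3 m storey height, the 6 m fallback, the function names, and the sample feature are assumptions for illustration, and no network request is made.

```python
from typing import Any, Dict

METRES_PER_LEVEL = 3.0  # assumption: typical storey height, not from the source
DEFAULT_HEIGHT = 6.0    # assumption: fallback when no height data is tagged


def building_height(tags: Dict[str, Any]) -> float:
    """Estimate a building height in metres from OpenStreetMap-style tags."""
    raw = tags.get("height")
    if raw is not None:
        try:
            # Accept values such as "12", "12.5", or "12 m".
            return float(str(raw).replace("m", "").strip())
        except ValueError:
            pass  # unparseable value, fall through to the level count
    levels = tags.get("building:levels")
    if levels is not None:
        try:
            return float(levels) * METRES_PER_LEVEL
        except (TypeError, ValueError):
            pass
    return DEFAULT_HEIGHT


def feature_to_geoinfo(feature: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a GeoJSON Polygon feature into a minimal geoinformation dict."""
    props = feature.get("properties", {})
    ring = feature["geometry"]["coordinates"][0]  # exterior ring of the Polygon
    # GeoJSON rings repeat the first position at the end; drop the duplicate.
    outline = [(p[0], p[1]) for p in ring[:-1]]   # (longitude, latitude) pairs
    return {
        "name": props.get("name", ""),
        "outline": outline,
        "height": building_height(props),
    }


# Hypothetical feature; the coordinates are placeholders.
sample = {
    "type": "Feature",
    "properties": {"name": "Example Hall", "building:levels": "5"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [0.001, 0.0], [0.001, 0.001],
                         [0.0, 0.001], [0.0, 0.0]]],
    },
}
geo = feature_to_geoinfo(sample)
print(geo["name"], geo["height"], len(geo["outline"]))
```

Falling back from an explicit height to a level count and then to a default reflects the generic representation described above, where a plausible shape is preferred over a precise one.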

[0022] At the same time, the structure-from-motion (SFM) process 206 may include a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. In one aspect, the steps 204 and 206 may be conducted in parallel. After the texture coloring is applied to the aerial photos in 208, a low LOD city area 3D model is provided.
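The flow of FIG. 2, with steps 204 and 206 conducted in parallel and step 208 following, may be sketched as below. All three step functions are placeholders with hypothetical names and return values; a real SFM stage would return camera parameters and a sparse point cloud, and the thread pool is only one of several ways to run the two steps concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def generate_shapes_from_geoinfo(geoinfo: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Step 204 placeholder: one basic shape per geoinformation record."""
    return [{"name": g["name"], "shape": "cuboid", "height": g["height"]}
            for g in geoinfo]


def run_sfm(photos: List[str]) -> Dict[str, Any]:
    """Step 206 placeholder: stands in for a real SFM pipeline."""
    return {"cameras": {p: None for p in photos}, "sparse_points": []}


def apply_texture(shapes: List[Dict[str, Any]], sfm_result: Dict[str, Any],
                  photos: List[str]) -> List[Dict[str, Any]]:
    """Step 208 placeholder: mark every shape as textured."""
    return [{**s, "textured": True} for s in shapes]


def build_low_lod_model(photos: List[str],
                        geoinfo: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run steps 204 and 206 concurrently, then apply step 208."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        shapes_future = pool.submit(generate_shapes_from_geoinfo, geoinfo)
        sfm_future = pool.submit(run_sfm, photos)
        shapes = shapes_future.result()
        sfm_result = sfm_future.result()
    return apply_texture(shapes, sfm_result, photos)


model = build_low_lod_model(
    photos=["a.jpg", "b.jpg"],                   # hypothetical file names
    geoinfo=[{"name": "302", "height": 30.0}],   # hypothetical record
)
print(model)
```

For CPU-bound work in Python, a process pool may be a better fit than threads; the structure of the sketch would stay the same.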

[0023] In one embodiment, the low LOD 3D model generated by aspects of the invention maintains or provides the basic structures or architectures of the buildings or objects without identifying details such as the number or location of windows, types or elements of façade, types or location of doors, or any other exterior features of a structure or architecture.

[0024] In one aspect, the low LOD model may include basic geometry information without precise measurement, scale, shape, size, or the like. This approach is meant to be intentional as the 3D model may further reduce the generation time and resources needed.

[0025] Referring now to FIG. 3, a basic illustration of a two-dimensional image 300 of a monitored space is shown according to some embodiments. In one embodiment, the image 300 in this example may include a number of buildings (302, 304, and 306) and streets (308, 310, and 312) surrounding them. In one embodiment, the shapes of the buildings are also provided, such as rectangular for 302 and 304 and triangular for 306. The image 300 may also include additional geoinformation, such as the building height, building direction, latitude, and longitude. In another embodiment, the image 300 may include GPS information or data.
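Where building outlines are given as latitude and longitude, they may first be converted to local metric coordinates before being turned into 3D shapes. The sketch below uses an equirectangular approximation on a spherical Earth, which is a common simplification for areas the size of a city block and is not a method stated in this disclosure; the function name and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def to_local_metres(lat: float, lon: float,
                    origin_lat: float, origin_lon: float) -> Tuple[float, float]:
    """Return (east, north) offsets in metres from the origin point."""
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north


def outline_to_local(outline: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert (lat, lon) corners to metres, using the first corner as origin."""
    origin_lat, origin_lon = outline[0]
    return [to_local_metres(lat, lon, origin_lat, origin_lon)
            for lat, lon in outline]


# Placeholder triangular outline, in the spirit of building 306.
triangle_latlon = [(60.0, 10.0), (60.0, 10.001), (60.001, 10.0)]
for east, north in outline_to_local(triangle_latlon):
    print(f"east={east:8.2f} m  north={north:8.2f} m")
```

The approximation degrades over large extents or near the poles, where a proper map projection would be more appropriate.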

[0026] With the image 300, referring now to FIG. 4, aspects of the invention may provide a 3D model or rendering without requiring the robust hardware. In FIG. 4, the image 300 has been transformed to a 3D model 400 where the 2D illustrations of the buildings are now in three dimensions: object 402 for building 302, object 404 for building 304, and object 406 for building 306. The geoinformation from the image 300 has been incorporated into the 3D objects and they are being rendered as 3D objects, whether they are cubes, prisms, etc. The illustrations for streets 308, 310, and 312 continue to be as streets 308, 310, and 312.

[0027] With this approach, rendering of the 3D model 400 is less burdensome for mobile devices, such as a portable computing device 801. In another embodiment, the low LOD 3D models created enable faster loading from a remote server to a local device. It is of course understood that the rendered 3D images may be downloaded onto a desktop as well without departing from the scope or spirit of the invention.

[0028] As illustrated in FIG. 2, SFM processing 206 may still be performed, as well as adding texture or coloring to the 3D model 400. In one embodiment, the source of the information of the texture or coloring may come from the 2D image, such as the image 300.

[0029] FIG. 5 may be a high level illustration of a portable computing device 801 communicating with a remote computing device 841 in FIG. 6 but the application may be stored and accessed in a variety of ways. In addition, the application may be obtained in a variety of ways such as from an app store, from a web site, from a store Wi-Fi system, etc. There may be various versions of the application to take advantage of the benefits of different computing devices, different languages and different API platforms.

[0030] In one embodiment, a portable computing device 801 may be a mobile device 108 that operates using a portable power source 855 such as a battery. The portable computing device 801 may also have a display 802 which may or may not be a touch sensitive display. More specifically, the display 802 may have a capacitance sensor, for example, that may be used to provide input data to the portable computing device 801. In other embodiments, an input pad 804 such as arrows, scroll wheels, keyboards, etc., may be used to provide inputs to the portable computing device 801. In addition, the portable computing device 801 may have a microphone 806 which may accept and store verbal data, a camera 808 to accept images and a speaker 810 to communicate sounds.

[0031] The portable computing device 801 may be able to communicate with a computing device 841 or a plurality of computing devices 841 that make up a cloud of computing devices 811. The portable computing device 801 may be able to communicate in a variety of ways. In some embodiments, the communication may be wired such as through an Ethernet cable, a USB cable or RJ6 cable. In other embodiments, the communication may be wireless such as through Wi-Fi® (802.11 standard), BLUETOOTH, cellular communication or near field communication devices. The communication may be direct to the computing device 841 or may be through a communication network such as cellular service, through the Internet, through a private network, through BLUETOOTH, etc., via a network or communication module 880.

[0032] FIG. 5 may be a sample portable computing device 801 that is physically configured to be part of the system. The portable computing device 801 may have a processor 850 that is physically configured according to computer executable instructions. It may have a portable power supply 855 such as a battery which may be rechargeable. It may also have a sound and video module 860 which assists in displaying video and sound and may turn off when not in use to conserve power and battery life. The portable computing device 801 may also have non-volatile memory 870 and volatile memory 865. The network or communication module 880 may have GPS, BLUETOOTH, NFC, cellular or other communication capabilities. In one embodiment, some or all of the network or communication capabilities may be separate circuits or may be part of the processor 850. There also may be an input/output bus 875 that shuttles data to and from the various user input devices such as the microphone 806, the camera 808 and other inputs, such as the input pad 804, the display 802, and the speakers 810, etc. It also may control communicating with the networks, either through wireless or wired devices. Of course, this is just one embodiment of the portable computing device 801 and the number and types of portable computing devices 801 are limited only by the imagination.

[0033] The physical elements that make up the remote computing device 841 may be further illustrated in FIG. 6. At a high level, the computing device 841 may include a digital storage such as a magnetic disk, an optical disk, flash storage, non-volatile storage, etc. Structured data may be stored in the digital storage such as in a database. The server 841 may have a processor 1000 that is physically configured according to computer executable instructions. It may also have a sound and video module 1005 which assists in displaying video and sound and may turn off when not in use to conserve power and battery life. The server 841 may also have volatile memory 1010 and non-volatile memory 1015.

[0034] The database 1025 may be stored in the memory 1010 or 1015 or may be separate. The database 1025 may also be part of a cloud of computing device 841 and may be stored in a distributed manner across a plurality of computing devices 841. There also may be an input/output bus 1020 that shuttles data to and from the various user input devices such as the microphone 806, the camera 808, the inputs such as the input pad 804, the display 802, and the speakers 810, etc. The input/output bus 1020 may also connect to similar devices of the microphone 806, the camera 808, the inputs such as the input pad 804, the display 802, and the speakers 810, or other peripheral devices, etc. The input/output bus 1020 also may interface with a network or communication module 1030 to control communicating with other devices or computer networks, either through wireless or wired devices. In some embodiments, the application may be on the local computing device 801 and in other embodiments, the application may be remote 841. Of course, this is just one embodiment of the server 841 and the number and types of portable computing devices 841 is limited only by the imagination.

[0035] The user devices, computers and servers described herein (e.g., 801 or 841) may be computers that may have, among other elements, a microprocessor (such as from the Intel® Corporation, AMD®, ARM®, Qualcomm®, or MediaTek®); volatile and non-volatile memory; one or more mass storage devices (e.g., a hard drive); various user input devices, such as a mouse, a keyboard, or a microphone; and a video display system. The user devices, computers and servers described herein may be running on any one of many operating systems including, but not limited to WINDOWS®, UNIX®, LINUX®, MAC® OS®, iOS®, or Android®. It is contemplated, however, that any suitable operating system may be used for the present invention. The servers may be a cluster of web servers, which may each be LINUX® based and supported by a load balancer that decides which of the cluster of web servers should process a request based upon the current request-load of the available server(s).

[0036] The user devices, computers and servers described herein may communicate via networks, including the Internet, wide area network (WAN), local area network (LAN), Wi-Fi®, other computer networks (now known or invented in the future), and/or any combination of the foregoing. It should be understood by those of ordinary skill in the art having the present specification, drawings, and claims before them that networks may connect the various components over any combination of wired and wireless conduits, including copper, fiber optic, microwaves, and other forms of radio frequency, electrical and/or optical communication techniques. It should also be understood that any network may be connected to any other network in a different manner. The interconnections between computers and servers in system are examples. Any device described herein may communicate with any other device via one or more networks.

[0037] The example embodiments may include additional devices and networks beyond those shown. Further, the functionality described as being performed by one device may be distributed and performed by two or more devices. Multiple devices may also be combined into a single device, which may perform the functionality of the combined devices.

[0038] The various participants and elements described herein may operate one or more computer apparatuses to facilitate the functions described herein. Any of the elements in the above-described Figures, including any servers, user devices, or databases, may use any suitable number of subsystems to facilitate the functions described herein.

[0039] Any of the software components or functions described in this application, may be implemented as software code or computer readable instructions that may be executed by at least one processor using any suitable computer language such as, for example, Java, C++, or Perl using, for example, conventional or object-oriented techniques.

[0040] The software code may be stored as a series of instructions or commands on a non-transitory computer readable medium, such as a random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM. Any such computer readable medium may reside on or within a single computational apparatus and may be present on or within different computational apparatuses within a system or network.

[0041] It may be understood that the present invention as described above may be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art may know and appreciate other ways and/or methods to implement the present invention using hardware, software, or a combination of hardware and software.

[0042] The above description is illustrative and is not restrictive. Many variations of embodiments may become apparent to those skilled in the art upon review of the disclosure. The scope of the embodiments should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the pending claims along with their full scope or equivalents.

[0043] One or more features from any embodiment may be combined with one or more features of any other embodiment without departing from the scope of the embodiments. A recitation of "a", "an" or "the" is intended to mean "one or more" unless specifically indicated to the contrary. Recitation of "and/or" is intended to represent the most inclusive sense of the term unless specifically indicated to the contrary.

[0044] One or more of the elements of the present system may be claimed as means for accomplishing a particular function. Where such means-plus-function elements are used to describe certain elements of a claimed system, it may be understood by those of ordinary skill in the art having the present specification, figures and claims before them that the corresponding structure includes a computer, processor, or microprocessor (as the case may be) programmed to perform the particularly recited function using functionality found in a computer after special programming and/or by implementing one or more algorithms to achieve the recited functionality as recited in the claims or steps described above. As would be understood by those of ordinary skill in the art, that algorithm may be expressed within this disclosure as a mathematical formula, a flow chart, a narrative, and/or in any other manner that provides sufficient structure for those of ordinary skill in the art to implement the recited process and its equivalents.

[0045] While the present disclosure may be embodied in many different forms, the drawings and discussion are presented with the understanding that the present disclosure is an exemplification of the principles of one or more inventions and is not intended to limit any one embodiment to the embodiments illustrated.

[0046] The present disclosure provides a solution to the long-felt need described above. In particular, aspects of the invention enable faster loading and rendering of 3D models on devices that may not have the processing power of a desktop or other devices that may process and create 3D models from 2D images.

[0047] Further advantages and modifications of the above described system and method may readily occur to those skilled in the art.

[0048] The disclosure, in its broader aspects, is therefore not limited to the specific details, representative system and methods, and illustrative examples shown and described above. Various modifications and variations may be made to the above specification without departing from the scope or spirit of the present disclosure, and it is intended that the present disclosure covers all such modifications and variations provided they come within the scope of the following claims and their equivalents.

Claims

CLAIMS What is claimed is:

1. A low level-of-detail (LOD) 3D model image generating method comprising: receiving one or more two dimensional (2D) images, wherein the one or more 2D images include one or more views of one or more objects in the one or more 2D images; without geoinformation from the one or more 2D images or external sources, generating a 3D object for each of the one or more objects, wherein generating comprises reconstructing the 3D object for each of the one or more objects absent subjecting the 3D object to a multi view stereo reconstruction process; and refining the 3D object based on the geoinformation from the one or more 2D images.

2. The LOD 3D model image generating method of claim 1, wherein the geoinformation comprises at least the following information of each of one or more objects: a position, a name, an outline, a height, a size, a volume, and a shape.

3. The LOD 3D model image generating method of claim 1, wherein refining comprises refining the 3D object in response to retrieving the geoinformation from a third-party, wherein the third-party comprises a database.

4. The LOD 3D model image generating method of claim 1, further comprising processing the one or more 2D images via a structure-from-motion (SFM) algorithm.

5. The LOD 3D model image generating method of claim 1, further comprising applying a texture to surfaces of the 3D object in response to refining.

6. The LOD 3D model image generating method of claim 5, further comprising applying a color to the surfaces of the 3D object in response to refining.

7. The LOD 3D model image generating method of claim 5 or claim 6, wherein applying comprising receiving the geoinformation of the texture or color from the one or more 2D images or the external sources.

8. A tangible computer-readable medium having stored thereon computer-executable instructions for a level-of-detail (LOD) 3D model image generating method, said computer-executable instructions configured to be executed by a processor, wherein the computer-executable instructions comprising: receiving one or more two dimensional (2D) images, wherein the one or more 2D images include one or more views of one or more objects in the one or more 2D images; without geoinformation from the one or more 2D images or external sources, generating a 3D object for each of the one or more objects, wherein generating comprises reconstructing the 3D object for each of the one or more objects absent subjecting the 3D object to a multi view stereo reconstruction process; and refining the 3D object based on the geoinformation from the one or more 2D images.

9. The tangible computer-readable medium of claim 8, wherein the geoinformation comprises at least the following information of each of one or more objects: a position, a name, an outline, a height, a size, a volume, and a shape.

10. The tangible computer-readable medium of claim 8, wherein refining comprises refining the 3D object in response to retrieving the geoinformation from a third-party, wherein the third-party comprises a database.

11. The tangible computer-readable medium of claim 8, further comprising processing the one or more 2D images via a structure-from-motion (SFM) algorithm.

12. The tangible computer-readable medium of claim 8, further comprising applying a texture to surfaces of the 3D object in response to refining.

13. The tangible computer-readable medium of claim 12, further comprising applying a color to the surfaces of the 3D object in response to refining.

14. The tangible computer-readable medium of claim 12 or claim 13, wherein applying comprising receiving the geoinformation of the texture or color from the one or more 2D images.