WO2024037822A1 - 3d reconstruction of a target - Google Patents

3d reconstruction of a target

Info

Publication number
WO2024037822A1
WO2024037822A1 (PCT/EP2023/070218)
Authority
WO
WIPO (PCT)
Prior art keywords
reconstruction
global
local
target
local feature
Prior art date
Application number
PCT/EP2023/070218
Other languages
French (fr)
Inventor
Zi-chuan ZHAO
Anasol PENA-RIOS
Adrian Clark
Anthony Conway
Original Assignee
British Telecommunications Public Limited Company
Priority date
Filing date
Publication date
Priority claimed from GBGB2211987.9A external-priority patent/GB202211987D0/en
Application filed by British Telecommunications Public Limited Company filed Critical British Telecommunications Public Limited Company
Publication of WO2024037822A1 publication Critical patent/WO2024037822A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2021 Shape modification

Definitions

  • the present invention relates to a computer-implemented method and system for interactive 3D reconstruction of a target.
  • a digital twin system may be used for the creation of a 3D model of a target such as an object.
  • Augmented Reality (AR) and Virtual Reality (VR) technologies and their applications with digital twin systems often rely upon 3D virtual representations of complex physical objects.
  • the improvements in the accuracy of the virtual representation and level of detail that can be reproduced can assist in the optimization of AR and VR applications and provide an improved user experience.
  • Fast, automatic creation of virtual representations from physical objects provides a challenge to the AR and VR industry.
  • 3D reconstruction is a technique which attempts to recover the original 3D shape of an object or scene from input data such as, for example, one or more images, or from a point cloud acquired from a scanning device.
  • One technique for 3D reconstruction is implicit field reconstruction in which the output target is represented as a scalar field in the 3D space.
  • Deep learning based implicit reconstruction systems can be classified into two categories: the forward class and the converging class.
  • in forward-class algorithms, the input data is first encoded to a latent code by an encoder neural network, and then decoded into the implicit field by a decoder neural network using the learned parameters.
  • This category of architecture is capable of reconstructing 3D shapes from learned priors which reduces noise and prevents missing parts in the reconstruction. However, it performs poorly when reproducing targets not encountered in the training set and tends to over-smooth the output.
  • the converging-class tries to learn a neural network that represents the entire implicit field for each individual object. This class performs better at reproducing details but is less reliable at reproducing the shape of an object and takes longer for individual objects. It is desirable therefore to overcome the shortcomings of the two classes to be able to produce a 3D reconstruction in a manner which is able to accurately reproduce the shape of the target while also being able to efficiently recreate complex details.
  • a method for 3D reconstruction of a target comprising: obtaining an initial global reconstruction of the target in a 3D space, inferred by a global machine learning model; providing, to a user, an initial visualisation of the target based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the target based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one 3D reconstruction of the target, wherein the local feature machine learning model has been trained to output a target reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction.
  • the steps of the method are performed iteratively using the merged reconstruction and the further visualisation based on the merged reconstruction as the initial global reconstruction and initial visualisation in the next iteration until receiving, from the user, an indication to stop.
  • the global model is an encoder-decoder network comprising a global encoder trained to infer a global latent code from the target data.
  • the local feature machine learning model comprises an encoder-decoder network, the local feature encoder-decoder network comprising: a local feature encoder trained to infer a local feature latent code from the resampled local data and spatial information; and a local feature decoder trained to infer a representation of the target in the 3D space from a combination of the local feature latent code and the global latent code.
  • the local 3D reconstruction comprises at least one reconstruction of at least one additional subsection of the target that is inferred by the local feature decoder based on the local data of the at least one first subsection and at least one property of the target learned by the global encoder.
  • obtaining a global reconstruction of the target in a 3D space comprises inputting the global latent code to the local feature decoder wherein the local feature decoder is also trained to infer, from the global latent code, a global representation of the target in the 3D space.
  • merging the global 3D reconstruction and the local 3D reconstruction comprises: receiving, from the local feature decoder, a local reconstruction corresponding to each point of interest; receiving, from the local feature decoder, weight information comprising a weight value for each point in space for each local reconstruction and the global reconstruction; and merging the global reconstruction and the at least one local reconstruction based on the weight information.
  • resampling the target based on the at least one point of interest comprises resampling a subspace in the global 3D space centred on the at least one point of interest.
  • the local and global 3D reconstructions are each a scalar field representing the occupation probability of a point in space and wherein merging the global 3D reconstruction with the local 3D reconstruction comprises: combining the scalar field values of the global 3D reconstruction with the scalar field values of the local 3D reconstruction; and extracting a probability iso-surface from the combined scalar field to represent the shape of the target for visualisation.
  • a computer system comprising a processor and a memory storing computer program code for performing the steps of the method set out above.
  • a computer program or readable medium comprising computer program code which, when loaded on and executed by a computer, causes the computer to carry out the steps of the method set out above.
  • Figure 1 is a flowchart showing steps of a method in accordance with an embodiment of the present invention.
  • Figure 2 is a flow diagram showing additional optional steps of the method of figure 1.
  • Figure 3 is a flowchart showing additional optional steps of the method of figure 1.
  • Figure 4 is a flow diagram showing additional optional steps of the method of figure 1.
  • Figure 5 is a component diagram of a computer system suitable for the operation of embodiments of the present invention.
  • FIG. 1 is a flowchart showing steps for carrying out a method 100 in accordance with embodiments of the present invention.
  • an initial global reconstruction of a target in a 3D space is obtained.
  • the target may be any three dimensional object.
  • the initial global reconstruction may be generated by a global model that is trained to reconstruct a 3D representation of the entirety of a target in 3D space from data that has been sampled from the target.
  • the global model may be provided with the sampled target data to obtain the global reconstruction.
  • the global reconstruction may be received from an external source; the external source may itself have generated the global reconstruction using a global model.
  • a sampling device may be used to acquire data on the target with spatial information that indicates a position in 3D space associated with the acquired data points.
  • a scanner may be used to scan an object surface to provide a point cloud that associates data captured by the scanner with a point in 3D space that corresponds to the position and angle of the scanner when acquiring the data.
  • the model may be provided with an image or multiple images captured by a camera along with information that indicates the position and angle of the camera view relative to the target.
  • the data may be acquired directly from the sampling device or may be provided to the model from a database.
  • the global model is an encoder-decoder network comprising an encoder and a decoder wherein the encoder is trained to output a global latent code from the acquired target data and spatial information and wherein the decoder is trained to output, from the global latent code, a 3D reconstruction of the target.
  • the global reconstruction is an implicit reconstruction.
  • the global reconstruction may be an occupation probability scalar field wherein the decoder of the global encoder-decoder network is provided with a set of query points, comprising a set of points in the 3D space, together with the global latent code, and outputs a scalar value between 0 and 1 representing the probability of each query point being occupied by the object.
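The query mechanism described above can be sketched with a toy stand-in for the trained decoder. The sphere-based field below is an assumption for illustration only (a real decoder is a neural network conditioned on the global latent code), but the interface is the same: a batch of 3D query points in, an occupancy probability in (0, 1) per point out.

```python
import numpy as np

def occupancy_field(query_points, centre=(0.0, 0.0, 0.0), radius=1.0, sharpness=10.0):
    """Toy implicit field: probability that each query point lies inside a sphere.

    Stands in for the decoder network described in the text, which maps
    (query point, global latent code) -> occupancy probability in (0, 1).
    """
    pts = np.asarray(query_points, dtype=float)
    signed_dist = radius - np.linalg.norm(pts - np.asarray(centre), axis=-1)
    return 1.0 / (1.0 + np.exp(-sharpness * signed_dist))  # sigmoid keeps output in (0, 1)

# Query two points: one deep inside the shape, one well outside it.
queries = np.array([[0.0, 0.0, 0.0],
                    [0.0, 0.0, 2.0]])
probs = occupancy_field(queries)
```

Extracting an iso-surface from such a field (as described later) then amounts to finding where these probabilities cross a chosen threshold.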
  • Other techniques for 3D global reconstruction of a target are known to the skilled person and may be used in place of implicit reconstruction.
  • an initial visualisation of the target is provided to a user, for example, via a computer display.
  • the visualisation may be a 2D or 3D image representation of the reconstruction. This allows the user to observe the visualisation and determine a point in the visualisation that does not satisfy the user’s expectations or in which the user is otherwise interested.
  • at least one indication of at least one point of interest in the visualisation is received from the user. This spatial point that the user has indicated may be referred to as a “seed”. The user may indicate more than one seed.
  • at least one first subsection of the target is resampled based on the point of interest, or “seed”, to obtain local data associated with the subsection.
  • the seed is set at a point in the 3D space of the global representation and the subsection to be resampled is defined based on its relative position to the seed in space.
  • a subspace in the global 3D space centred on the at least one point of interest may be resampled; this may be a sphere or cube or other defined shape around the seed.
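A minimal sketch of this resampling step, assuming the original data is a point cloud and the subspace is a sphere of a chosen radius around the seed (the function name and the radius value are illustrative, not from the source):

```python
import numpy as np

def resample_around_seed(point_cloud, seed, radius):
    """Select the local subset of a point cloud lying within a sphere around the seed.

    The subsection is defined purely by position relative to the user-indicated
    seed point, as described in the text.
    """
    pts = np.asarray(point_cloud, dtype=float)
    mask = np.linalg.norm(pts - np.asarray(seed, dtype=float), axis=1) <= radius
    return pts[mask]

cloud = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0]])
local = resample_around_seed(cloud, seed=(0.0, 0.0, 0.0), radius=1.0)
```

A cube or other defined shape would simply change the mask condition.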
  • the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space.
  • the local data may be retrieved from the original data used to infer the initial global reconstruction or it may be resampled directly from the target using the same type of sampling device as the original target data.
  • the resampled data may likewise be acquired by a device that produces a point cloud with points associated with a 3D coordinate.
  • the resampled data may come from camera images which are associated with a position, angle and field of view relative to the target.
  • the local data may then be obtained based on the 3D coordinate associated with the data being within the corresponding coordinates of the 3D subsection in the 3D space. If the 3D coordinate of the original data lies within the subsection marked by the seed, that data may be retrieved to compose the local data. If the target is being resampled directly, the sampling device may associate the acquired data with a 3D coordinate.
  • the local data may be sampled at a different resolution than the original data.
  • the local data and corresponding spatial information is input into a local feature machine learning model to obtain a 3D reconstruction of the target.
  • the local feature model is trained to output a target reconstruction from local data of resampled subsections.
  • the 3D coordinates of the local reconstruction may align with the 3D coordinates of the global reconstruction.
  • the same point relative to the target may be set as an origin point in each coordinate system when sampling both the original global data and the local data.
  • the global 3D reconstruction is merged with the local 3D reconstruction.
  • the merging may be performed differently depending on the architectures of the models and the modes of reconstruction.
  • both the local and global reconstructions may comprise shapes which are to be aligned and combined; in another example, the local reconstruction may comprise a set of values representing differences from the global reconstruction.
  • because the local model is trained on subsections of the target, the knowledge representation of the local features will be different from that of the target globally.
  • the global model will reproduce closed shapes representing an entire physical object while the local feature model does not.
  • the shapes of local features may occur in different distributions to global features.
  • the local reconstruction may more accurately represent the details of the shape of the target where the user is most interested, around the seed.
  • Figure 2 is a flow diagram showing steps for carrying out method 100 comprising additional steps 210 and 220.
  • Step 150 has already been described in relation to figure 1.
  • the local and global 3D reconstructions are each a scalar field representing the occupation probability of a point in space. Implementations of such implicit reconstructions will be known to the person skilled in the art.
  • a model may be trained to infer a function over a 3D space giving the probability of a point in space being occupied; the model is then provided with a set of discrete points in 3D space, the query points, which are assigned values by the function.
  • the step 160 may comprise a step 210 wherein the scalar field values of the global 3D reconstruction are combined with the scalar field values of the local 3D reconstruction.
  • both the local and global reconstruction field values are probabilities between 0 and 1, and the two are combined as a weighted sum.
  • Weight information comprising a weight value for each point in space for each local reconstruction and the global reconstruction may be calculated.
  • the scalar field value for each point in space in the combined scalar field may then be the sum of the products of the weight value with the field value of the point of space in each reconstruction.
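The weighted merge described above can be sketched directly. The field values and weights below are made-up numbers for illustration; the assumption is that each point's weights across reconstructions sum to 1, so the merged value remains a valid probability.

```python
import numpy as np

def merge_fields(fields, weights):
    """Merge per-point occupancy fields by a weighted sum.

    fields  : (n_reconstructions, n_points) scalar field values in [0, 1]
    weights : (n_reconstructions, n_points) per-point weights, assumed
              normalised so that each point's weights sum to 1
    """
    fields = np.asarray(fields, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sum of products of weight value and field value at each point in space.
    return (weights * fields).sum(axis=0)

global_field = np.array([0.9, 0.5, 0.1])
local_field  = np.array([0.7, 0.8, 0.2])
w_local      = np.array([0.8, 0.5, 0.0])   # local reconstruction trusted near the seed
w_global     = 1.0 - w_local
merged = merge_fields([global_field, local_field], [w_global, w_local])
```

With more than one seed the same formula extends to one weight array per local reconstruction plus one for the global reconstruction.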
  • step 160 may further comprise step 220 in which a probability iso-surface is extracted from the combined scalar field to represent the shape of the target for visualisation.
  • a value, or range of values, may be set or calculated to be the occupation probability value that represents the start of the surface of the target.
  • the coordinates of the areas in space that contain points, or lie within points, that have this occupation probability value are extracted such that a visualisation of the target with a surface at those coordinates can be built.
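The iso-surface test reduces to finding where the sampled field crosses the chosen probability value. The sketch below is a 1D simplification (a full pipeline would typically run marching cubes over a 3D grid), but the crossing test is the core idea:

```python
import numpy as np

def surface_crossings(field_1d, iso=0.5):
    """Find where a sampled occupancy field crosses the iso-value.

    Returns indices i such that the surface lies between sample i and i+1.
    A 1D stand-in for full iso-surface extraction over a 3D grid.
    """
    f = np.asarray(field_1d, dtype=float)
    inside = f >= iso
    # A crossing occurs wherever consecutive samples disagree about "inside".
    return np.nonzero(inside[:-1] != inside[1:])[0]

field = np.array([0.9, 0.8, 0.6, 0.4, 0.1])  # occupied -> empty along a ray
idx = surface_crossings(field)
```

In 3D, the same inside/outside disagreement between neighbouring grid cells is what marching cubes triangulates into a surface mesh.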
  • Figure 3 is a flowchart showing steps for carrying out method 300 in accordance with embodiments of the present invention.
  • Method 300 comprises the steps of method 100 with the same steps indicated with the same reference numerals as in figure 1 and the description of the same steps not repeated here.
  • the steps of the method are performed iteratively using the merged reconstruction and the further visualisation based on the merged reconstruction as the initial global reconstruction and initial visualisation in the next iteration until receiving, from the user, an indication to stop.
  • Method 300 may comprise step 310, after the user is provided an initial visualisation of the target, in which the method checks whether it has received from the user an indication to stop. If it has, the method stops; if not, the method continues with step 130.
  • Each iteration of the steps may individually further improve the quality of the visual reconstruction at the target around the seeds.
  • Outputs from the global model and the local feature model, such as the 3D reconstructions corresponding to each seed and their corresponding weightings, may be stored in a memory after the iteration in which they are generated, to allow the system to avoid recalculating the entire output at each iteration.
  • the user may observe the level of improvement and then halt the process when they are satisfied with the quality of the reconstruction or when the rate of improvement has peaked.
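The iterative loop of methods 100 and 300 can be summarised as control flow. The four callables below (`initial`, `refine`, `show`, `wants_stop`) are hypothetical stand-ins for the models and user interface described in the text; the numeric "reconstruction" is a toy value used only to demonstrate the loop.

```python
def interactive_loop(initial, refine, show, wants_stop):
    """Control-flow sketch of methods 100/300: refine the reconstruction
    around user-indicated seeds until the user asks to stop."""
    reconstruction = initial()                   # step 110: global reconstruction
    while True:
        show(reconstruction)                     # step 120: visualise for the user
        if wants_stop():                         # step 310: user indication to stop?
            return reconstruction
        reconstruction = refine(reconstruction)  # steps 130-160: seed, resample,
                                                 # local reconstruction, merge

# Toy demonstration: each "refinement" halves the value; the user stops
# after viewing three iterations.
views = []
stops = iter([False, False, True])
result = interactive_loop(
    initial=lambda: 8.0,
    refine=lambda r: r / 2,
    show=views.append,
    wants_stop=lambda: next(stops),
)
```

The merged reconstruction of one iteration becomes the initial reconstruction of the next, exactly as the text describes.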
  • the global model may be an encoder-decoder network comprising a global encoder trained to infer a global latent code from the target data.
  • the local feature machine learning model may comprise an encoder-decoder network comprising a local feature encoder and a local feature decoder.
  • the local feature encoder may be trained to infer a local feature latent code from the resampled local data and spatial information and the local feature decoder may be trained to infer a representation of the target in the 3D space from a combination of the local feature latent code and the global latent code.
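The combination of the two latent codes can be as simple as concatenation before decoding. The single-layer "decoder" below is a deliberately minimal stand-in (the latent dimensions and weights are arbitrary assumptions); a real local feature decoder would be a deep network also conditioned on a query point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent dimensions, chosen only for illustration.
GLOBAL_DIM, LOCAL_DIM = 4, 3

def decode(global_code, local_code, weights):
    """One-layer stand-in for the local feature decoder: it consumes the
    concatenation of the global and local feature latent codes and emits
    an occupancy-style value in (0, 1)."""
    z = np.concatenate([global_code, local_code])   # combine the two codes
    return 1.0 / (1.0 + np.exp(-(weights @ z)))     # sigmoid output

g = rng.normal(size=GLOBAL_DIM)                 # global latent code
l = rng.normal(size=LOCAL_DIM)                  # local feature latent code
W = rng.normal(size=(1, GLOBAL_DIM + LOCAL_DIM))
out = decode(g, l, W)
```

Because the decoder sees both codes, it can exploit globally learned structure (e.g. symmetry) while reconstructing local detail.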
  • the latent codes may be stored in a memory after the iteration in which they are generated.
  • the encoder-decoder networks may be deep-learning networks comprising multiple layers.
  • the encoder can be considered to be the layers before the latent code layer that is input into the local feature decoder, and the decoder can be considered to be the layers after the point at which the latent code from either the local feature model or the global model is input into the local feature model.
  • the global latent code layer is a latent representation of the encoded information that implicitly represents the information of the structure and shape of the target learned by the global model.
  • the local feature decoder can infer a representation of the target across all of the global 3D space, with greater accuracy at the local subsections of the 3D space corresponding to the resampled local data.
  • the local feature decoder may use the properties of the target that have been learned by the global encoder to reconstruct additional subsections of the target, outside of the resampled subsection, with greater accuracy.
  • the global encoder may have inferred that the target has a symmetry in its shape. Prior information about that symmetry may be encoded in the global latent code.
  • the local feature decoder may then infer from that prior information that there are subsections of the target that correspond under symmetry to the subsection resampled based on the seed; for example if the seed was placed at the foot of a table leg of a rectangular table and the global encoder has inferred the rectangular symmetry of the table, the local feature decoder may infer that the other three feet of the table will match the foot where the seed is placed.
  • the subsections that correspond to the resampled subsection under symmetry, such as the other three table feet, may then also be inferred by the local feature decoder, within the total 3D reconstruction, with greater accuracy, given the local feature latent code.
  • the global encoder-decoder network comprises a separate global decoder that is trained to infer the initial global 3D reconstruction from the global latent code.
  • the same decoder may be used for both the global network and the local feature network and the global latent code is input to the local feature decoder to obtain the initial global reconstruction, and the local feature decoder is also trained to infer a global representation of the target in the 3D space from the global latent code. While the parameters of the global network and the local feature network will be different due to the different set of inputs on which they will be trained, because they both relate to reconstruction of the same set of targets, some parameters will be shared. Thus sharing the same decoder for both the global network and the local feature network is more space efficient.
  • Figure 4 is a flowchart showing steps for carrying out a method 100 comprising additional steps 410, 420 and 430. Steps 110-150 are carried out as described above.
  • the local feature decoder may also be trained to infer weight information comprising a weight value for each point in space for each local reconstruction based on the combined local feature latent code and the global latent code.
  • step 160 of merging the global 3D reconstruction and the local 3D reconstruction may comprise: step 410 of receiving from the local feature decoder, each local reconstruction that corresponds to each seed; step 420 of receiving the weight information from the local feature decoder; and step 430 of merging the global reconstruction and the one or more local reconstructions based on the weight information.
  • step 430 can be carried out using steps 210 and 220 as described in relation to figure 2 above.
  • Use of an encoder-decoder architecture for occupational probability field global reconstruction aids in eliminating noise and the risk of missing parts at the cost of possible over-smoothing in the final reconstruction that eliminates surface details of the target.
  • the local reconstructions help compensate by more accurately reflecting surface details and by allowing the local network to reprocess the area of interest with an additional sample without having to reprocess the entire target data.
  • the weight values will correspond to the subsections of the total 3D reconstruction of the target that will be reconstructed more accurately.
  • the parts of the local feature reconstruction that have been reconstructed based on the resampled subsections of the target, or based on the resampled subsections of the target and a target property learned by the global encoder, may be weighted higher.
  • the resampled data may correspond to a subsection of the target in a 3D space at a fixed distance from the seed.
  • the weight value for points in space that are within that fixed distance may be higher than those for points in space outside of it, and the weight values may decline as the points in space get further from the seed.
  • the weight values of the points in space of the global reconstruction may be higher outside of the fixed distance from the seed, as the global reconstruction may necessarily be relied on or may produce a more accurate representation outside of the resampled subsection. Because the weight information is output from the local feature decoder, the local feature decoder can determine the weight values based on properties of the target inferred by the global encoder. For example, the weight values for a local reconstruction can be based on the symmetry or shape of the target inferred by the global encoder. Points in space that are part of additional subsections of the target in the reconstruction that correspond, under symmetry, to the first subsection that was resampled based on the seed, may be weighted higher. The local feature decoder may also determine a confidence level of the values of the scalar field output and the weight values may reflect the confidence level so that a local reconstruction or global reconstruction which has lower confidence values for its reconstruction at a particular point in space is weighted lower.
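A distance-based weighting profile of this kind might be sketched as follows. The Gaussian falloff and the specific radii are assumptions for illustration; the text leaves the exact profile to the trained decoder:

```python
import numpy as np

def local_weight(points, seed, inner_radius, falloff):
    """Weight for the local reconstruction: 1 inside the resampled radius,
    decaying smoothly with distance beyond it.

    The Gaussian decay is an assumed profile; the source only requires that
    weights decline as points get further from the seed.
    """
    d = np.linalg.norm(np.asarray(points, dtype=float) - np.asarray(seed, dtype=float), axis=1)
    excess = np.maximum(d - inner_radius, 0.0)   # zero inside the resampled sphere
    return np.exp(-(excess / falloff) ** 2)

pts = np.array([[0.0, 0.0, 0.0],   # at the seed
                [0.5, 0.0, 0.0],   # inside the resampled radius
                [3.0, 0.0, 0.0]])  # far from the seed
w = local_weight(pts, seed=(0, 0, 0), inner_radius=1.0, falloff=1.0)
```

The complementary global weight at each point could then be taken as `1 - w`, so the global reconstruction dominates far from the seed.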
  • FIG. 5 is a component diagram of a computer system 500 suitable for the operation of embodiments of the present invention.
  • System 500 may perform any of the methods described herein that correspond to embodiments of the present invention.
  • System 500 comprises a processor 510 and a memory 520.
  • Memory 520 may store computer program code or computer executable instructions that, when executed by the processor 510, cause the processor to carry out any of the methods described herein.
  • the processor may comprise one or more processors, each of which may perform steps of the method as described above.
  • the methods described herein may be implemented by a software-controlled programmable processing device such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system.
  • a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention.
  • the computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
  • the computer program is stored on a non-transitory carrier medium in machine or device readable form, such as in the form of a computer readable medium.
  • the medium can be one or more of a solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation.
  • the computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A computer-implemented method for 3D reconstruction of a target is provided, comprising obtaining an initial global reconstruction of the target in a 3D space, inferred by a global machine learning model; providing, to a user, an initial visualisation of the target based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the target based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one 3D reconstruction of the target, wherein the local feature machine learning model has been trained to output a target reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction. A corresponding computer system and computer readable medium may also be provided.

Description

3D Reconstruction of a Target
The present invention relates to a computer-implemented method and system for interactive 3D reconstruction of a target.
Background
A digital twin system may be used for the creation of a 3D model of a target such as an object. Augmented Reality (AR) and Virtual Reality (VR) technologies and their applications with digital twin systems often rely upon 3D virtual representations of complex physical objects. The improvements in the accuracy of the virtual representation and level of detail that can be reproduced can assist in the optimization of AR and VR applications and provide an improved user experience. Fast, automatic creation of virtual representations from physical objects provides a challenge to the AR and VR industry.
3D reconstruction is a technique which attempts to recover the original 3D shape of an object or scene from input data such as, for example, one or more images, or from a point cloud acquired from a scanning device. One technique for 3D reconstruction is implicit field reconstruction in which the output target is represented as a scalar field in the 3D space.
Deep learning based implicit reconstruction systems can be classified into two categories: the forward class and the converging class. In forward-class algorithms, the input data is first encoded to a latent code by an encoder neural network, and then decoded into the implicit field by a decoder neural network using the learned parameters. This category of architecture is capable of reconstructing 3D shapes from learned priors, which reduces noise and prevents missing parts in the reconstruction. However, it performs poorly when reproducing targets not encountered in the training set and tends to over-smooth the output. The converging class tries to learn a neural network that represents the entire implicit field for each individual object. This class performs better at reproducing details but is less reliable at reproducing the shape of an object and takes longer for individual objects. It is desirable therefore to overcome the shortcomings of the two classes to be able to produce a 3D reconstruction in a manner which is able to accurately reproduce the shape of the target while also being able to efficiently recreate complex details.
Summary of the invention
According to a first aspect of the present invention, there is provided a method for 3D reconstruction of a target, the method comprising: obtaining an initial global reconstruction of the target in a 3D space, inferred by a global machine learning model; providing, to a user, an initial visualisation of the target based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the target based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one 3D reconstruction of the target, wherein the local feature machine learning model has been trained to output a target reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction.
Preferably, the steps of the method are performed iteratively, using the merged reconstruction and the further visualisation based on the merged reconstruction as the initial global reconstruction and initial visualisation in the next iteration, until receiving, from the user, an indication to stop.
Preferably, the global model is an encoder-decoder network comprising a global encoder trained to infer a global latent code from the target data, and the local feature machine learning model comprises an encoder-decoder network, the local feature encoder-decoder network comprising: a local feature encoder trained to infer a local feature latent code from the resampled local data and spatial information; and a local feature decoder trained to infer a representation of the target in the 3D space from a combination of the local feature latent code and the global latent code.
Preferably, the local 3D reconstruction comprises at least one reconstruction of at least one additional subsection of the target that is inferred by the local feature decoder based on the local data of the at least one first subsection and at least one property of the target learned by the global encoder.
Preferably, obtaining a global reconstruction of the target in a 3D space comprises inputting the global latent code to the local feature decoder wherein the local feature decoder is also trained to infer, from the global latent code, a global representation of the target in the 3D space.
Preferably, merging the global 3D reconstruction and the local 3D reconstruction comprises: receiving, from the local feature decoder, a local reconstruction corresponding to each point of interest; receiving, from the local feature decoder, weight information comprising a weight value for each point in space for each local reconstruction and the global reconstruction; and merging the global reconstruction and the at least one local reconstruction based on the weight information. Preferably, resampling the target based on the at least one point of interest comprises resampling a subspace in the global 3D space centred on the at least one point of interest.
Preferably, the local and global 3D reconstructions are each a scalar field representing the occupation probability of a point in space and wherein merging the global 3D reconstruction with the local 3D reconstruction comprises: combining the scalar field values of the global 3D reconstruction with the scalar field values of the local 3D reconstruction; and extracting a probability iso-surface from the combined scalar field to represent the shape of the target for visualisation.
According to a second aspect of the present invention, there is provided a computer system comprising a processor and a memory storing computer program code for performing the steps of the method set out above.
According to a third aspect of the present invention, there is provided a computer program or readable medium comprising computer program code which, when loaded on and executed by a computer, causes the computer to carry out the steps of the method set out above.
Brief Description of the Drawings
Embodiments of the present invention are now described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a flowchart showing steps of a method in accordance with an embodiment of the present invention.
Figure 2 is a flow diagram showing additional optional steps of the method of figure 1.
Figure 3 is a flowchart showing additional optional steps of the method of figure 1.
Figure 4 is a flowchart showing additional optional steps of the method of figure 1.
Figure 5 is a component diagram of a computer system suitable for the operation of embodiments of the present invention.
Detailed Description of the Invention
Figure 1 is a flowchart showing steps for carrying out a method 100 in accordance with embodiments of the present invention. At step 110, an initial global reconstruction of a target in a 3D space is obtained. The target may be any three-dimensional object. The initial global reconstruction may be generated by a global model that is trained to reconstruct a 3D representation of the entirety of a target in 3D space from data that has been sampled from the target. In some examples, the global model may be provided with the sampled target data to obtain the global reconstruction. In other examples, the global reconstruction may be received from an external source; the external source may itself have generated the global reconstruction using a global model.
A sampling device may be used to acquire data on the target with spatial information that indicates a position in 3D space associated with the acquired data points. For example, a scanner may be used to scan an object surface to provide a point cloud that associates data captured by the scanner with a point in 3D space that corresponds to the position and angle of the scanner when acquiring the data. Alternatively, the model may be provided with one or more images captured by a camera along with information that indicates the position and angle of the camera view relative to the target. The data may be acquired directly from the sampling device or may be provided to the model from a database.
In some examples, the global model is an encoder-decoder network comprising an encoder and a decoder wherein the encoder is trained to output a global latent code from the acquired target data and spatial information and wherein the decoder is trained to output, from the global latent code, a 3D reconstruction of the target.
In one example, the global reconstruction is an implicit reconstruction. In particular, the global reconstruction may be an occupation probability scalar field wherein the decoder of the global encoder-decoder network is provided with a set of query points, comprising a set of points in the 3D space, together with the global latent code, and outputs a scalar value between 0 and 1 representing the probability of each query point being occupied by the object. Other techniques for 3D global reconstruction of a target are known to the skilled person and may be used in place of implicit reconstruction.
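As a minimal illustrative sketch of this query mechanism (not the patented model), the toy field below stands in for a trained decoder: it assigns each query point a probability in [0, 1] of lying inside a sphere. The sigmoid form, the sphere, and all parameters are assumptions made for illustration only.

```python
import numpy as np

def occupancy(query_points, centre=np.zeros(3), radius=1.0):
    """Toy occupancy field: probability near 1 inside a sphere, near 0 outside.
    A trained decoder would instead compute this from the global latent code."""
    d = np.linalg.norm(query_points - centre, axis=-1)
    # smooth transition across the surface, always in (0, 1)
    return 1.0 / (1.0 + np.exp(10.0 * (d - radius)))

pts = np.array([[0.0, 0.0, 0.0],   # inside the sphere
                [2.0, 0.0, 0.0]])  # well outside
probs = occupancy(pts)
# probs[0] is close to 1, probs[1] is close to 0
```

In a real system the query points would cover the reconstruction volume on a grid, and the scalar values would feed the iso-surface extraction described later.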
At step 120, an initial visualisation of the target, based on the initial global reconstruction, is provided to a user, for example, via a computer display. The visualisation may be a 2D or 3D image representation of the reconstruction. This allows the user to observe the visualisation and identify a point in the visualisation that does not satisfy the user's expectations or in which the user is otherwise interested. At step 130, at least one indication of at least one point of interest in the visualisation is received from the user. This spatial point that the user has indicated may be referred to as a "seed". The user may indicate more than one seed. At step 140, at least one first subsection of the target is resampled based on the point of interest, or "seed", to obtain local data associated with the subsection. For example, the seed is set at a point in the 3D space of the global representation and the subsection to be resampled is defined based on its relative position to the seed in space. In one example, a subspace in the global 3D space centred on the at least one point of interest may be resampled; this may be a sphere, cube or other defined shape around the seed. The local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space. The local data may be retrieved from the original data used to infer the initial global reconstruction or it may be resampled directly from the target using the same type of sampling device as was used for the original target data. For example, if the global reconstruction was inferred from a point cloud where the points are associated with a 3D coordinate, the resampled data is also scanned by a device that produces a point cloud with points associated with a 3D coordinate.
In another example, if the global reconstruction was inferred from multiple camera images, the resampled data may come from camera images which are associated with a position, angle and field of view relative to the target. The local data may then be obtained based on the 3D coordinate associated with the data being within the corresponding coordinates of the 3D subsection in the 3D space. If the 3D coordinate of the original data lies within the subsection marked by the seed, that data may be retrieved to compose the local data. If the target is being resampled directly, the sampling device may associate the acquired data with a 3D coordinate. In some examples, the local data may be sampled at a different resolution than the original data.
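A minimal sketch of retrieving local data from an existing point cloud, for the case where the resampled subspace is a sphere centred on the seed. The function name, the radius, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def resample_local(point_cloud, seed, radius=0.2):
    """Retrieve the points of the original cloud that fall inside a
    spherical subspace centred on the user-indicated seed."""
    mask = np.linalg.norm(point_cloud - seed, axis=1) <= radius
    return point_cloud[mask]

rng = np.random.default_rng(0)
cloud = rng.random((1000, 3))          # stand-in for scanned target data
seed = np.array([0.5, 0.5, 0.5])       # point of interest placed by the user
local = resample_local(cloud, seed)
# every retained point lies within `radius` of the seed
```

Resampling the target directly with a scanner would replace the `mask` lookup with a fresh acquisition restricted to the same subspace.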
At step 150, the local data and corresponding spatial information is input into a local feature machine learning model to obtain a 3D reconstruction of the target. The local feature model is trained to output a target reconstruction from local data of resampled subsections. By providing the spatial information such that the 3D coordinates of the local data can be aligned with the coordinates of the reconstruction space, the 3D coordinates of the local reconstruction may align with the 3D coordinates of the global reconstruction. For example, the same point relative to the target may be set as an origin point in each coordinate system when sampling both the original global data and the local data.
At step 160, the global 3D reconstruction is merged with the local 3D reconstruction. The merging may be performed differently depending on the architectures of the models and the modes of reconstruction. In one example, both the local and global reconstructions may comprise shapes which are to be aligned and combined; in another example, the local reconstruction may comprise a set of values representing differences from the global reconstruction. Because the local model is trained on subsections of the target, the knowledge representation of the local features will be different from that of the target globally. For example, the global model will reproduce closed shapes representing an entire physical object while the local feature model does not. The shapes of local features may occur in different distributions to global features. Beneficially, the local reconstruction may more accurately represent the details of the shape of the target where the user is most interested, around the seed.

Figure 2 is a flow diagram showing steps for carrying out method 100 comprising additional steps 210 and 220. Step 150 has already been described in relation to figure 1. In some examples, the local and global 3D reconstructions are each a scalar field representing the occupation probability of a point in space. Implementations of such implicit reconstructions will be known to the person skilled in the art. A model may be trained to infer a function over a 3D space of the probability of a point in space being occupied and is then provided with a set of discrete points in 3D space, the query points, which are then assigned values by the function. In such examples, step 160 may comprise a step 210 wherein the scalar field values of the global 3D reconstruction are combined with the scalar field values of the local 3D reconstruction.
For example, both the local and global reconstruction field values are probabilities between 0 and 1, and the two are added together based on a weighted sum.
Weight information comprising a weight value for each point in space for each local reconstruction and the global reconstruction may be calculated. The scalar field value for each point in space in the combined scalar field may then be the sum of the products of the weight value with the field value of the point of space in each reconstruction.
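The weighted combination described above could be sketched as follows. The per-point normalisation of the weights is an assumption made so the result stays a valid probability; the text only requires a weighted sum.

```python
import numpy as np

def merge_fields(global_field, local_fields, weights):
    """Merge occupancy scalar fields as a per-point weighted sum.
    weights[0] belongs to the global field, weights[1:] to the local ones."""
    fields = np.stack([global_field] + list(local_fields))  # shape (k+1, N)
    w = np.stack(weights)
    w = w / w.sum(axis=0, keepdims=True)  # normalise the weights at each query point
    return (w * fields).sum(axis=0)

g = np.array([0.2, 0.8])          # global occupancy at two query points
l1 = np.array([0.9, 0.1])         # one local reconstruction at the same points
w = [np.array([1.0, 1.0]),        # global weight per point
     np.array([3.0, 1.0])]        # local weight: dominates near the seed
merged = merge_fields(g, [l1], w)
# merged == [0.725, 0.45]: the first point leans toward the local field
```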
After step 210, step 160 may further comprise step 220 in which a probability iso-surface is extracted from the combined scalar field to represent the shape of the target for visualisation. For example, a value, or range of values, may be set or calculated to be the occupation probability value that represents the start of the surface of the target. The coordinates of the areas in space that contain points, or lie within points, that have this occupational probability value are extracted such that a visualisation of the target with a surface at those coordinates could be built.
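Step 220 could be sketched, in a deliberately simplified form, by keeping only the query points whose occupancy lies within a tolerance band of the iso-value; a practical system would typically run an algorithm such as marching cubes over the voxel grid to produce a mesh instead. The function name, iso-value and tolerance are assumptions.

```python
import numpy as np

def isosurface_points(field, coords, iso=0.5, tol=0.05):
    """Simplified iso-surface extraction: keep the coordinates whose
    occupancy probability lies in a band around the iso-value."""
    band = np.abs(field - iso) <= tol
    return coords[band]

field = np.array([0.10, 0.48, 0.52, 0.90])   # occupancy at four query points
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
surface = isosurface_points(field, coords)
# only the two points straddling the 0.5 surface are kept
```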
Figure 3 is a flowchart showing steps for carrying out method 300 in accordance with embodiments of the present invention. Method 300 comprises the steps of method 100, with the same steps indicated with the same reference numerals as in figure 1 and the description of those steps not repeated here. The steps of the method are performed iteratively, using the merged reconstruction and the further visualisation based on the merged reconstruction as the initial global reconstruction and initial visualisation in the next iteration, until receiving, from the user, an indication to stop. Method 300 may comprise step 310, after the user is provided an initial visualisation of the target, in which the method checks if it has received from the user an indication to stop. If it has, the method stops; if not, the method continues with step 130. This allows the user to observe the merged reconstruction and to place a seed at further points of interest, or at the same point if the reconstruction is still not to their satisfaction. Each iteration of the steps may individually further improve the quality of the visual reconstruction of the target around the seeds. Outputs from the global model and the local feature model, such as the 3D reconstructions corresponding to each seed and their corresponding weightings, may be stored in a memory after the iteration in which they are generated, to allow the system to avoid recalculating the entire output at each iteration. The user may observe the level of improvement and then halt the process when they are satisfied with the quality of the reconstruction or when the rate of improvement has peaked.
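The iterative loop of method 300 could be sketched as follows, with stub callables standing in for the user interaction, resampling, local model inference and merge steps. All names, the `None`-as-stop convention and the `max_iters` safety guard are assumptions for illustration.

```python
def refine_interactively(initial_recon, get_seed, resample, local_model, merge,
                         max_iters=20):
    """Repeat seed placement, local reconstruction and merging until the
    user signals to stop (here modelled as get_seed returning None)."""
    recon = initial_recon
    for _ in range(max_iters):
        seed = get_seed(recon)      # user inspects the visualisation, places a seed
        if seed is None:            # indication to stop (step 310)
            break
        local = local_model(resample(seed))   # steps 140-150
        recon = merge(recon, local)           # step 160
    return recon

# toy run: the user places two seeds and then stops;
# the "merge" just counts refinements so the flow is visible
seeds = iter([(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), None])
result = refine_interactively(
    0,
    get_seed=lambda r: next(seeds),
    resample=lambda s: s,
    local_model=lambda d: 1,
    merge=lambda r, l: r + l,
)
# result == 2: one refinement per seed before the stop indication
```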
In the examples described above, the global model may be an encoder-decoder network comprising a global encoder trained to infer a global latent code from the target data, and the local feature machine learning model may comprise an encoder-decoder network comprising a local feature encoder and a local feature decoder. The local feature encoder may be trained to infer a local feature latent code from the resampled local data and spatial information and the local feature decoder may be trained to infer a representation of the target in the 3D space from a combination of the local feature latent code and the global latent code. In examples where the method is applied iteratively, the latent codes may be stored in a memory after the iteration in which they are generated. The encoder-decoder networks may be deep-learning networks comprising multiple layers. In this case, the encoder can be considered to refer to the layers before the global latent code layer that is input into the local feature decoder, and the decoder can be considered to be the layers after which the latent code layer from either the local feature model or the global model can be input into the local feature model. The global latent code layer is a latent representation of the encoded information that implicitly represents the information of the structure and shape of the target learned by the global model. The local feature decoder can infer a representation of the target across all of the global 3D space, with greater accuracy at the local subsections of the 3D space corresponding to the resampled local data. 
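Purely as an illustration of the data flow, combining the local feature latent code with the global latent code might look like the following, using concatenation into a toy linear "decoder". The code sizes, the concatenation scheme and the single linear layer are all assumptions; the text leaves the exact combination and network architecture open.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for trained components (shapes chosen arbitrarily)
global_code = rng.standard_normal(32)   # inferred once by the global encoder
local_code = rng.standard_normal(16)    # inferred by the local feature encoder
W = rng.standard_normal((8, 16 + 32))   # toy linear layer replacing the decoder

def local_feature_decode(local_code, global_code):
    """Combine both latent codes and decode; concatenation is one plausible
    combination, not necessarily the one used in a trained system."""
    combined = np.concatenate([local_code, global_code])
    return W @ combined  # 8 values standing in for occupancy samples

out = local_feature_decode(local_code, global_code)
```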
Furthermore, because the local feature decoder has been trained to infer a representation of the target in the total 3D space from both the local feature latent code and the global latent code, the local feature decoder may use the properties of the target that have been learned by the global encoder to reconstruct additional subsections of the target, outside of the resampled subsection, with greater accuracy. For example, the global encoder may have inferred that the target has a symmetry in its shape. Prior information about that symmetry may be encoded in the global latent code. The local feature decoder may then infer from that prior information that there are subsections of the target that correspond under symmetry to the subsection resampled based on the seed; for example, if the seed was placed at the foot of a table leg of a rectangular table and the global encoder has inferred the rectangular symmetry of the table, the local feature decoder may infer that the other three feet of the table will match the foot where the seed is placed. The subsections that correspond to the resampled subsection under symmetry, such as the other three table feet, may then also be inferred by the local feature decoder, within the total 3D reconstruction, with greater accuracy, given the local feature latent code. In some examples, the global encoder-decoder network comprises a separate global decoder that is trained to infer the initial global 3D reconstruction from the global latent code. In other examples, the same decoder may be used for both the global network and the local feature network: the global latent code is input to the local feature decoder to obtain the initial global reconstruction, and the local feature decoder is also trained to infer a global representation of the target in the 3D space from the global latent code.
While the parameters of the global network and the local feature network will differ due to the different sets of inputs on which they are trained, some parameters will be shared because both networks relate to reconstruction of the same set of targets. Sharing the same decoder for both the global network and the local feature network is therefore more space-efficient.
Figure 4 is a flowchart showing steps for carrying out a method 100 comprising additional steps 410, 420 and 430. Steps 110-150 are carried out as described above. In one example, the local feature decoder may also be trained to infer weight information comprising a weight value for each point in space for each local reconstruction based on the combined local feature latent code and the global latent code. In this case, step 160 of merging the global 3D reconstruction and the local 3D reconstruction may comprise: step 410 of receiving from the local feature decoder, each local reconstruction that corresponds to each seed; step 420 of receiving the weight information from the local feature decoder; and step 430 of merging the global reconstruction and the one or more local reconstructions based on the weight information.
When the global and local reconstructions are occupational probability scalar field reconstructions, step 430 can be carried out using steps 210 and 220 as described in relation to figure 2 above. Use of an encoder-decoder architecture for occupational probability field global reconstruction aids in eliminating noise and the risk of missing parts at the cost of possible over-smoothing in the final reconstruction that eliminates surface details of the target. The local reconstructions help compensate by more accurately reflecting surface details and by allowing the local network to reprocess the area of interest with an additional sample without having to reprocess the entire target data.
By training the local feature network to infer the weight information using a loss function based on the merged reconstruction, the weight values will correspond to the subsections of the total 3D reconstruction of the target that will be reconstructed more accurately. The parts of the local feature reconstruction that have been reconstructed based on the resampled subsections of the target, or based on the resampled subsections of the target and a target property learned by the global encoder, may be weighted higher. For example, the resampled data may correspond to a subsection of the target in a 3D space at a fixed distance from the seed. The weight value for points in space that are within that fixed distance may be higher than those for points in space outside of it, and the weight values may decline as the points in space get further from the seed. The weight values of the points in space of the global reconstruction may be higher outside of the fixed distance from the seed, as the global reconstruction may necessarily be relied on, or may produce a more accurate representation, outside of the resampled subsection. Because the weight information is output from the local feature decoder, the local feature decoder can determine the weight values based on properties of the target inferred by the global encoder. For example, the weight values for a local reconstruction can be based on the symmetry or shape of the target inferred by the global encoder. Points in space that are part of additional subsections of the target in the reconstruction that correspond, under symmetry, to the first subsection that was resampled based on the seed, may be weighted higher.
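The distance-based weighting described above could be sketched as a smooth falloff around the seed: high inside the resampled subspace, decaying with distance outside it. In a trained system the decoder would output these weights; the sigmoid falloff and its parameters here are illustrative assumptions.

```python
import numpy as np

def local_weight(points, seed, radius=0.2, falloff=20.0):
    """Weight for a local reconstruction: close to 1 inside the resampled
    subspace around the seed, decaying smoothly toward 0 outside it."""
    d = np.linalg.norm(points - seed, axis=-1)
    return 1.0 / (1.0 + np.exp(falloff * (d - radius)))

seed = np.array([0.0, 0.0, 0.0])
pts = np.array([[0.05, 0.0, 0.0],   # inside the resampled subspace
                [1.00, 0.0, 0.0]])  # far from the seed
w = local_weight(pts, seed)
# the near point receives a weight close to 1, the distant point close to 0,
# so the global field dominates away from the seed after normalisation
```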
The local feature decoder may also determine a confidence level of the values of the scalar field output and the weight values may reflect the confidence level so that a local reconstruction or global reconstruction which has lower confidence values for its reconstruction at a particular point in space is weighted lower.
Figure 5 is a component diagram of a computer system 500 suitable for the operation of embodiments of the present invention. System 500 may perform any of the methods described herein that correspond to embodiments of the present invention. System 500 comprises a processor 510 and a memory 520. Memory 520 may store computer program code or computer executable instructions that, when executed by the processor 510, cause the processor to carry out any of the methods described herein. In some examples the processor may comprise one or more processors, each of which may perform steps of the methods as described above.
Insofar as embodiments described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a non-transitory carrier medium in machine or device readable form, such as in the form of a computer readable medium. In examples, the medium can be one or more of a solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged.

Claims
1. A computer-implemented method for 3D reconstruction of a target, the method comprising: obtaining an initial global reconstruction of the target in a 3D space, inferred by a global machine learning model; providing, to a user, an initial visualisation of the target based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the target based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one 3D reconstruction of the target, wherein the local feature machine learning model has been trained to output a target reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction.
2. The method of claim 1, wherein the steps of the method are performed iteratively using the merged reconstruction as the initial global reconstruction in the next iteration until receiving, from the user, an indication to stop.
3. The method of claim 1 or 2, wherein the global model is an encoder-decoder network comprising a global encoder trained to infer a global latent code from the target data, and wherein the local feature machine learning model comprises a local feature encoder-decoder network, the local feature encoder-decoder network comprising: a local feature encoder trained to infer a local feature latent code from the resampled local data and spatial information; and a local feature decoder trained to infer a representation of the target in the 3D space from a combination of the local feature latent code and the global latent code.
4. The method of claim 3 wherein the local 3D reconstruction comprises at least one reconstruction of at least one additional subsection of the target that is inferred by the local feature decoder based on the local data of the at least one first subsection and at least one property of the target learned by the global encoder.
5. The method of claim 3 or 4 wherein obtaining a global reconstruction of the target in a 3D space comprises: inputting the global latent code to the local feature decoder wherein the local feature decoder is trained to infer, from the global latent code, a global representation of the target in the 3D space.
6. The method of claims 3, 4, or 5 wherein merging the global 3D reconstruction and the local 3D reconstruction comprises: receiving, from the local feature decoder, a local reconstruction corresponding to each point of interest; receiving, from the local feature decoder, weight information comprising a weight value for each point in space for each local reconstruction; and merging the global reconstruction and the at least one local reconstruction based on the weight information; wherein the local feature decoder is trained to infer the weight information based on the combined local feature latent code and the global latent code.
7. The method of any preceding claim wherein resampling the target based on the at least one point of interest comprises resampling a subspace in the global 3D space centred on the at least one point of interest.
8. The method of any preceding claim, wherein the local and global 3D reconstructions are each a scalar field representing an occupation probability of a point in space and wherein merging the global 3D reconstruction with the local 3D reconstruction comprises: combining the scalar field values of the global 3D reconstruction with the scalar field values of the local 3D reconstruction; and, optionally, extracting a probability iso-surface from the combined scalar field to represent the shape of the target for visualisation.
9. A computer system comprising a processor and a memory storing instructions executable by the processor to cause the processor to perform the method of any preceding claim.
10. A computer readable medium comprising computer program code which, when loaded on and executed by a computer, causes the computer to carry out the method of any of claims 1 to 8.
PCT/EP2023/070218 2022-08-17 2023-07-20 3d reconstruction of a target WO2024037822A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP22190722 2022-08-17
GBGB2211987.9A GB202211987D0 (en) 2022-08-17 2022-08-17 3d reconstruction of a target
EP22190722.3 2022-08-17
GB2211987.9 2022-08-17

Publications (1)

Publication Number Publication Date
WO2024037822A1 true WO2024037822A1 (en) 2024-02-22

Family

ID=87312103



