WO2024127310A1 - Autoencoders for the validation of 3D oral care representations - Google Patents

Autoencoders for the validation of 3D oral care representations

Info

Publication number
WO2024127310A1
Authority
WO
WIPO (PCT)
Prior art keywords
oral care
representation
mesh
tooth
setups
Application number
PCT/IB2023/062704
Other languages
English (en)
Inventor
Jonathan D. Gandrud
Michael Starr
Original Assignee
3M Innovative Properties Company
Application filed by 3M Innovative Properties Company filed Critical 3M Innovative Properties Company
Publication of WO2024127310A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 - Validation; Performance evaluation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 - Indexing scheme for image generation or computer graphics
    • G06T2210/41 - Medical
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 - Recognition of patterns in medical or anatomical images

Definitions

  • The following Patent Applications are incorporated herein by reference: 63/432,627; 63/366,492; 63/366,495; 63/352,850; 63/366,490; 63/366,494; 63/370,160; 63/366,507; 63/352,877; 63/366,514; 63/366,498; 63/366,514; and 63/264,914.
  • This disclosure relates to configurations and training of machine learning models to improve the accuracy of automatically validating 3D oral care representations using one or more autoencoders, to determine whether those 3D oral care representations are suitable for use in creating oral care appliances, such as dental restoration appliances or orthodontic treatment appliances (e.g., clear tray aligners or indirect bonding trays).
  • the present disclosure describes systems and techniques for training and using one or more machine learning models, such as neural networks, to validate a 3D oral care representation as acceptable (or not) for use in the creation of an oral care appliance.
  • techniques of this disclosure may train an encoder-decoder structure, such as a reconstruction autoencoder (e.g., a variational autoencoder optionally utilizing normalizing flows), to reconstruct a particular type of 3D oral care representation.
  • the reconstruction autoencoder may be trained on ground truth examples of the 3D oral care representation, until that reconstruction autoencoder becomes capable of reconstructing trial examples of that type of 3D oral care representation.
  • the reconstruction autoencoder may yield a low reconstruction error when reconstructing a trial 3D oral care representation that falls within the distribution of the examples of the training dataset.
  • a high reconstruction error may indicate that a trial 3D oral care representation does not fall within such a training distribution, which may result in a failing validation output.
  • the validation techniques described herein may issue output indicating that the trial 3D oral care representation is suitable for use in creating an oral care appliance (e.g., a clear tray aligner, an indirect bracket bonding tray, a dental restoration appliance, or the like).
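  • As a non-limiting illustration of the validation logic described above, the following Python sketch assumes a trained reconstruction autoencoder exposing hypothetical encode() and decode() methods and a one-to-one correspondence between input and reconstructed points; a trial 3D oral care representation passes validation when its reconstruction error falls below a threshold estimated from the ground truth training distribution:

      import numpy as np

      def validate_representation(autoencoder, trial_points, threshold):
          """Return True if the trial 3D oral care representation is 'acceptable'.

          autoencoder: object with encode()/decode() methods (assumed API).
          trial_points: (N, 3) array of mesh vertices or point-cloud points.
          threshold: reconstruction-error cutoff estimated from ground truth examples.
          """
          latent = autoencoder.encode(trial_points)      # latent vector (or capsule)
          reconstructed = autoencoder.decode(latent)     # facsimile of the input
          # Per-point Euclidean distance between input and reconstruction,
          # assuming a one-to-one correspondence between points.
          error = np.mean(np.linalg.norm(trial_points - reconstructed, axis=1))
          return error < threshold

      # A threshold might be chosen, e.g., from the distribution of reconstruction
      # errors over held-out ground truth examples (here, a 95th percentile):
      # threshold = np.percentile(training_reconstruction_errors, 95)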
  • An encoder-decoder structure may comprise at least one encoder or at least one decoder.
  • Nonlimiting examples of an encoder-decoder structure include a 3D U-Net, a transformer, a pyramid encoder-decoder or an autoencoder, among others.
  • the validation techniques described herein may contain aspects derived from a denoising diffusion model (e.g., a neural network which may be trained to iteratively denoise one or more 3D oral care representations - such as 3D oral care representations which are initialized stochastically or using Gaussian noise).
  • the validation techniques described herein may use one or more neural networks which are trained to use mathematical operations associated with continuous normalizing flows (e.g., the use of a neural network which may be trained in one form and then be inverted for use in inference).
  • autoencoders include variational autoencoders, regularized autoencoders, masked autoencoders or capsule autoencoders.
  • a machine learning (ML) model, such as an autoencoder, may be trained on examples of 3D oral care representations where ground truth data are provided to the ML model, and loss functions are used to quantify the difference between predicted and ground truth examples.
  • Loss values may then be used to update the validation ML model (e.g., to update the weights of a neural network).
  • Such validation techniques may determine whether a trial 3D oral care representation is acceptable or suitable for use in creating an oral care appliance.
  • "Acceptable” may, in some instances, indicate that a trial 3D oral care representation conforms with the distribution of the ground tmth examples that were used in training the ML validation model.
  • "Acceptable” may, in some instances, indicate that the trial 3D oral care representation is correctly shaped (or structured) or is correctly positioned relative to one or more aspects of dental anatomy.
  • the autoencoder-based validation techniques may determine whether the component intersects with the correct landmarks or other portions of dental anatomy (e.g., the incisal edges and cusp tips - for the mold parting surface). Determinations that the validation techniques of this disclosure may output include one or more of: whether a CTA trimline intersects the gums in a manner that reflects the distribution of the ground truth, and whether a library component is placed correctly with relation to one or more target teeth (e.g., snap clamps placed in relation to the posterior teeth or a center clip in relation to the incisors) or with relation to one or more landmarks on a target tooth.
  • Nonlimiting examples include determinations of whether a hardware element is placed on the face of a tooth with margins which reflect the distribution of ground truth examples, whether the mesh element labeling for a segmentation (or mesh cleanup) operation conforms to the distribution of the labels in the ground truth examples, and/or whether the shape and/or structure of a dental restoration tooth design conforms with the distribution of tooth designs amongst the ground truth training examples.
  • FIG. 1 shows a method of augmenting training data for use in training machine learning (ML) models of this disclosure.
  • FIG. 2 shows a method of training a capsule autoencoder.
  • FIG. 3 shows a method of training a tooth reconstruction autoencoder.
  • FIG. 4 shows a method of using a deployed fully trained tooth reconstruction autoencoder.
  • FIG. 5 shows a reconstructed tooth mesh, which has been reconstructed using a reconstruction autoencoder, according to techniques of this disclosure.
  • FIG. 6 shows a reconstructed tooth mesh, which has been reconstructed using a reconstruction autoencoder, according to techniques of this disclosure.
  • FIG. 7 shows a visualization of reconstruction error for a tooth.
  • FIG. 8 shows reconstruction error values for several tooth reconstructions.
  • FIG. 9 shows a method of training a reconstruction autoencoder.
  • FIG. 10 shows non-limiting example code for a reconstruction autoencoder.
  • FIG. 14 shows a method of validating a 3D oral care representation using a fully trained reconstruction autoencoder.
  • a first module (e.g., an autoencoder neural network) may generate a representation of a 3D oral care representation (e.g., the module may be trained to reconstruct a tooth mesh - comprising crown, root and/or attached articles).
  • a 3D encoder may be trained to encode an oral care mesh into a latent form
  • a 3D decoder may be trained to reconstruct that latent form into a facsimile of the received oral care mesh, where techniques disclosed herein may be used to measure the resulting reconstruction error.
  • the first module may create a representation.
  • a second module may use that representation for prediction. There may be one or more instances of the first module, and there may be one or more instances of the second module.
  • Described herein are techniques which may make use of an autoencoder which has been trained for oral care mesh reconstruction, which provides the advantage of encoding a potentially complex oral care mesh into a latent form (e.g., such as a latent vector or latent capsule) which may have reduced dimensionality and may be ingested by an instance of the second module (e.g., a predictive model for mesh cleanup, setups prediction, tooth restoration design generation, classification of 3D representations, validation of 3D representations, or setups comparison) for prediction purposes. While the dimensionality of the latent form may be reduced relative to the received oral care mesh, information about the reconstruction characteristics of the received oral care mesh may be retained.
  • the first module may also be trained to produce other kinds of representations, such as those generated by neural networks performing convolution and/or pooling operations (e.g., a network with a size 5 convolution kernel which also performs average pooling, or a network such as a U-Net).
  • Either or both of the first and/or second modules may receive a variety of input data, as described herein, including tooth meshes for one or both arches of the patient.
  • the tooth data may be presented in the form of 3D representations, such as meshes or point clouds. These data may be preprocessed, for example, by arranging the constituent mesh elements into lists and computing an optional mesh element feature vector for each mesh element.
  • Such feature vectors may provide valuable information about the shape and/or structure of an oral care mesh to either or both of the first and/or second modules.
  • the first module which generates the representations, may receive the vertices of a 3D mesh (or of a 3D point cloud) and compute a mesh element feature vector for each vertex.
  • the metric value may be received as input to either or both of the first and second modules, as a way of training the underlying model of that particular module to encode a distribution of such a metric over the several examples of the training dataset.
  • the network may then receive this metric value as an input, to assist in training the network to link that inputted metric value to the physical aspects of the ground truth oral care mesh which is used in loss calculation.
  • Such a loss calculation may quantify the difference between a prediction and a ground truth example (e.g., between a predicted oral care mesh and a ground truth oral care mesh).
  • the techniques of this disclosure may, through the course of loss calculation and subsequent backpropagation, train the network to encode a distribution of a given metric.
  • one or more oral care parameters may be defined to specify one or more aspects of an intended oral care mesh, which is to be generated using either or both of the first and/or second modules which has been trained for that purpose.
  • an oral care parameter may be defined which corresponds to an oral care metric, which may be received as input to either or both of a deployed first module and/or a deployed second module, and be taken as an instruction to that module to generate an oral care mesh with the specified customization. This interplay between oral care metrics and oral care parameters may also apply to the training and deployment of other predictive models in oral care as well.
  • the systems of this disclosure may enable orthodontic treatment planning, which may involve setups prediction as at least one operation.
  • Systems of this disclosure may also enable restoration design generation, where one or more restored tooth designs are generated and processed in the course of creating oral care appliances.
  • Systems of this disclosure may enable either or both of orthodontic or dental treatment planning, or may enable automation steps in the generation of either or both of orthodontic or dental appliances. Some appliances may enable both of dental and orthodontic treatment, while other appliances may enable one or the other.
  • a typical example of a cohort patient case may contain up to 32 crown meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), up to 32 root meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), multiple gingiva meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces) or one or more JSON files which may each contain tens of thousands of values (e.g., objects, arrays, strings, real values, Boolean values or Null values).
  • This disclosure pertains to digital oral care, which encompasses the fields of digital dentistry and digital orthodontics.
  • This disclosure generally describes methods of processing three-dimensional (3D) representations of oral care data.
  • A 3D representation is a 3D geometry.
  • a 3D representation may include, be, or be part of one or more of a 3D polygon mesh, a 3D point cloud (e.g., such as derived from a 3D mesh), a 3D voxelized representation (e.g., a collection of voxels - for sparse processing), or 3D representations which are described by mathematical equations.
  • A 3D representation may describe elements of the 3D geometry and/or 3D structure of an object.
  • a third arch S3 includes the same meshes as S1 and S2, which are arranged (e.g., using transforms) in their positions in the mouth, where the teeth are in the predicted final setup poses (e.g., as predicted by one or more of the techniques of this disclosure).
  • S4 is a counterpart to S3, where the teeth are in the poses corresponding to one of the several intermediate stages of orthodontic treatment with clear tray aligners.
  • a patient’s dentition may include one or more 3D representations of the patient’s teeth (e.g., and/or associated transforms), gums and/or other oral anatomy.
  • An orthodontic metric may, in some implementations, quantify the relative positions and/or orientations of at least one 3D representation of a tooth relative to at least one other 3D representation of a tooth.
  • a restoration design metric may, in some implementations, quantify at least one aspect of the structure and/or shape of a 3D representation of a tooth.
  • An orthodontic landmark (OL) may, in some implementations, locate one or more points or other structural regions of interest on a 3D representation of a tooth.
  • Doctor Restoration Design Preferences may, in some implementations, specify at least one typical value for an RDP, which may, in some instances, be derived from past cases which have been treated by one or more oral care practitioners.
  • 3D oral care representations may include, but are not limited to: 1) a set of mesh element labels which may be applied to the 3D mesh elements of teeth/gums/hardware/appliance meshes (or point clouds) in the course of mesh segmentation or mesh cleanup; 2) 3D representation(s) for one or more teeth/gums/hardware/appliances for which shapes have been modified (e.g., trimmed, distorted, or filled-in) in the course of mesh segmentation or mesh cleanup; 3) one or more coordinate systems (e.g., describing one, two, three or more coordinate axes) for a single tooth or a group of teeth (such as a full arch - as with the LDE coordinate system); 4) 3D representation(s) for one or more teeth for which shapes have been modified or otherwise made suitable for use in
  • one or more vectors S of the orthodontic metrics described elsewhere in this disclosure may be provided to a neural network for setups predictions.
  • the advantage is an improved capacity for the network to become trained to understand the state of a maloccluded setup and therefore be able to predict a more accurate final setup or intermediate stage.
  • the neural networks may take as input one or more indications of interproximal reduction (IPR) U, which may indicate the amount of enamel that is to be removed from a tooth during the course of orthodontic treatment (either mesially or distally).
  • IPR information may include, e.g., the quantity of IPR that is to be performed on one or more teeth, as measured in millimeters, or one or more binary flags to indicate whether or not IPR is to be performed on each tooth identified by flagging.
  • the vector(s) and/or capsule(s) resulting from such a concatenation may be provided to one or more of the neural networks of the present disclosure, with the technical improvement or added advantage of enabling that predictive neural network to account for IPR.
  • IPR is especially relevant to setups prediction methods, which may determine the positions and poses of teeth at the end of treatment or during one or more stages during treatment. It is important to account for the amount of enamel that is to be removed ahead of predicted tooth movements.
  • one or more procedure parameters K and/or doctor preferences vectors L may be introduced to a setups prediction model.
  • one or more optional vectors or values of tooth position N (e.g., XYZ coordinates, in either tooth local or global coordinates), tooth orientation O (e.g., pose, such as in transformation matrices or quaternions, Euler angles or other forms described herein), dimensions of teeth P (e.g., length, width, height, circumference, diameter, diagonal measure, volume - any of which dimensions may be normalized in comparison to another tooth or teeth), and/or distance between adjacent teeth Q may be used to describe the intended dimensions of a tooth for dental restoration design generation.
  • tooth dimensions P may be measured inside a plane, such as the plane that intersects the centroid of the tooth, or the plane that intersects a center point that is located midway between the centroid and either the incisal-most extent or the gingival-most extent of the tooth.
  • the tooth dimension of height may be measured as the distance from gums to incisal edge.
  • the tooth dimension of width may be measured as the distance from the mesial extent to the distal extent of the tooth.
  • the circularity or roundness of the tooth cross-section may be measured and included in the vector P.
  • Circularity or roundness may be defined as the ratio of the radii of inscribed and circumscribed circles.
  • the distance Q between adjacent teeth can be implemented in different ways (and computed using different distance definitions, such as Euclidean or geodesic).
  • a distance Q1 may be measured as an averaged distance between the mesh elements of two adjacent teeth.
  • a distance Q2 may be measured as the distance between the centers or centroids of two adjacent teeth.
  • a distance Q3 may be measured between the mesh elements of closest approach between two adjacent teeth.
  • a distance Q4 may be measured between the cusp tips of two adjacent teeth. Teeth may, in some implementations, be considered adjacent within an arch.
  • Teeth may, in some implementations, also be considered adjacent between opposing arches.
  • any of Q1, Q2, Q3 and Q4 may be divided by a term for the purpose of normalizing the resulting value of Q.
  • the normalizing term may involve one or more of: the volume of a tooth, the count of mesh elements in a tooth, the surface area of a tooth, the cross-sectional area of a tooth (e.g., as projected into the XY plane), or some other term related to tooth size.
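  • The following sketch illustrates one possible computation of the inter-tooth distance Q3 (closest approach between mesh elements, here vertices) and its normalization by a size-related term; it is a brute-force example for illustration only, and the function names are hypothetical:

      import numpy as np

      def distance_q3(tooth_a, tooth_b):
          """Q3: distance of closest approach between the mesh elements (here,
          vertices) of two adjacent teeth, given as (N, 3) and (M, 3) arrays."""
          # Pairwise Euclidean distances between all vertex pairs (brute force).
          diffs = tooth_a[:, None, :] - tooth_b[None, :, :]
          return np.min(np.linalg.norm(diffs, axis=2))

      def normalized_q3(tooth_a, tooth_b, normalizer):
          """Divide Q3 by a size-related term (e.g., a tooth's surface area,
          volume, or mesh element count) to normalize across tooth sizes."""
          return distance_q3(tooth_a, tooth_b) / normalizer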
  • Other information about the patient’s dentition or treatment needs may be concatenated with the other input vectors to one or more of MLP, GAN, generator, encoder structure, decoder structure, transformer, VAE, conditional VAE, regularized VAE, 3D U-Net, capsule autoencoder, diffusion model, and/or any of the neural network models listed elsewhere in this disclosure.
  • the vector M may contain flags which apply to one or more teeth.
  • M contains at least one flag for each tooth to indicate whether the tooth is pinned.
  • M contains at least one flag for each tooth to indicate whether the tooth is fixed.
  • M contains at least one flag for each tooth to indicate whether the tooth is pontic.
  • Other and additional flags are possible for teeth, as are combinations of fixed, pinned and pontic flags.
  • a flag that is set to a value that indicates that a tooth should be fixed is a signal to the network that the tooth should not move over the course of treatment.
  • the neural network loss function may be designed to be penalized for any movement in the indicated teeth (and in some particular cases, may be heavily penalized).
  • a flag to indicate that a tooth is pontic informs the network that the tooth gap is to be maintained, although that gap is allowed to move.
  • M may contain a flag indicating that a tooth is missing.
  • the presence of one or more fixed teeth in an arch may aid in setups prediction, because the one or more fixed teeth may provide an anchor for the poses of the other teeth in the arch (i.e., may provide a fixed reference for the pose transformations of one or more of the other teeth in the arch).
  • one or more teeth may be intentionally fixed, so as to provide an anchor against which the other teeth may be positioned.
  • a 3D representation (such as a mesh) which corresponds to the gums may be introduced, to provide a reference point against which teeth can be moved.
  • one or more of the optional input vectors K, L, M, N, O, P, Q, R, S, U and V described elsewhere in this disclosure may also be introduced to the input or into an intermediate layer of one or more of the predictive models of this disclosure.
  • these optional vectors may be introduced to the MLP Setups, GDL Setups, RL Setups, VAE Setups, Capsule Setups and/or Diffusion Setups, with the advantage of enabling the respective model to output setups which better meet the orthodontic treatment needs of the patient.
  • such inputs may be introduced, for example, by being concatenated with one or more latent vectors A which are also provided to one or more of the predictive models of this disclosure.
  • such inputs may be introduced, for example, by being concatenated with one or more latent capsules T which are also provided to one or more of the predictive models of this disclosure.
  • a setups prediction model (such as GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups and Diffusion Setups) may take as input one or more latent capsules T which correspond to one or more input oral care meshes (e.g., such as tooth meshes).
  • a setups prediction method may take as input both of A and T.
  • Various loss calculation techniques are generally applicable to the techniques of this disclosure (e.g., GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups, Setups Classification, Tooth Classification, VAE Mesh Element Labelling, MAE Mesh In-Filling and the imputation of procedure parameters).
  • Losses may also be used to train encoder structures and decoder structures.
  • a KL-Divergence loss may be used, at least in part, to train one or more of the neural networks of the present disclosure, such as a mesh reconstruction autoencoder or the generator of GDL Setups, with the advantage of imparting Gaussian behavior to the optimization space.
  • This Gaussian behavior may enable a reconstruction autoencoder to produce a better reconstruction (e.g., when a latent vector representation is modified and that modified latent vector is reconstructed using a decoder, the resulting reconstruction is more likely to be a valid instance of the inputted representation).
  • There are other techniques for computing losses which may be described elsewhere in this disclosure. Such losses may be based on quantifying the difference between two or more 3D representations.
  • MSE loss calculation may involve the calculation of an average squared distance between two sets, vectors or datasets. MSE may be generally minimized. MSE may be applicable to a regression problem, where the prediction generated by the neural network or other machine learning model may be a real number.
  • a neural network may be equipped with one or more linear activation units on the output to generate an MSE prediction.
  • Mean absolute error (MAE) loss and mean absolute percentage error (MAPE) loss can also be used in accordance with the techniques of this disclosure.
  • Cross entropy may, in some implementations, be used to quantify the difference between two or more distributions.
  • Cross entropy loss may, in some implementations, be used to train the neural networks of the present disclosure.
  • Cross entropy loss may, in some implementations, involve comparing a predicted probability to a ground truth probability.
  • Other names of cross entropy loss include “logarithmic loss,” “logistic loss,” and “log loss”.
  • a small cross entropy loss may indicate a better (e.g., more accurate) model.
  • Cross entropy loss may be logarithmic.
  • Cross entropy loss may, in some implementations, be applied to binary classification problems.
  • a neural network may be equipped with a sigmoid activation unit at the output to generate a probability prediction.
  • cross entropy may also be used.
  • a neural network trained to make multi-class predictions may, in some implementations, be equipped with one or more softmax activation functions at the output (e.g., where there is one output node for each class that is to be predicted).
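  • A minimal PyTorch sketch of the output-activation and cross entropy pairings described above (illustrative only; layer sizes and batch contents are placeholders): a single sigmoid-activated output for binary classification, and one output node per class with a softmax-style loss for multi-class classification:

      import torch
      import torch.nn as nn

      # Binary classification: one output logit, sigmoid + binary cross entropy.
      binary_head = nn.Linear(128, 1)
      binary_loss_fn = nn.BCEWithLogitsLoss()    # combines sigmoid and cross entropy
      features = torch.randn(8, 128)             # a batch of 8 latent feature vectors
      binary_labels = torch.randint(0, 2, (8, 1)).float()
      binary_loss = binary_loss_fn(binary_head(features), binary_labels)

      # Multi-class classification: one output node per class, softmax + cross entropy.
      num_classes = 5
      multiclass_head = nn.Linear(128, num_classes)
      multiclass_loss_fn = nn.CrossEntropyLoss() # applies log-softmax internally
      class_labels = torch.randint(0, num_classes, (8,))
      multiclass_loss = multiclass_loss_fn(multiclass_head(features), class_labels)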
  • Other loss calculation techniques which may be applied in the training of the neural networks of this disclosure include one or more of: Huber loss, Hinge loss, Categorical hinge loss, cosine similarity, Poisson loss, Logcosh loss, or mean squared logarithmic error loss (MSLE). Other loss calculation methods are described herein and may be applied to the training of any of the neural networks described in the present disclosure.
  • One or more of the neural networks of the present disclosure may, in some implementations, be trained, at least in part by a loss which is based on at least one of: a Point-wise Mesh Euclidean Distance (PMD) and an Earth Mover’s Distance (EMD).
  • Some implementations may incorporate a Hausdorff Distance (HD) calculation into the loss calculation.
  • Computing the Hausdorff distance between two or more 3D representations may provide one or more technical improvements, in that the HD not only accounts for the distances between two meshes, but also accounts for the way that those meshes are oriented, and the relationship between the mesh shapes in those orientations (or positions or poses).
  • Hausdorff distance may improve the comparison of two or more tooth meshes, such as two or more instances of a tooth mesh which are in different poses (e.g., such as the comparison of predicted setup to ground truth setup which may be performed in the course of computing a loss value for training a setups prediction neural network).
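  • A simple brute-force sketch of a symmetric Hausdorff distance between two 3D point sets (e.g., vertices sampled from two tooth meshes) follows; it is illustrative and not necessarily the computation used in this disclosure:

      import numpy as np

      def hausdorff_distance(points_a, points_b):
          """Symmetric Hausdorff distance between two 3D point sets,
          given as (N, 3) and (M, 3) arrays."""
          diffs = points_a[:, None, :] - points_b[None, :, :]
          dists = np.linalg.norm(diffs, axis=2)     # (N, M) pairwise distances
          d_ab = np.max(np.min(dists, axis=1))      # farthest point of A from its nearest point in B
          d_ba = np.max(np.min(dists, axis=0))      # farthest point of B from its nearest point in A
          return max(d_ab, d_ba)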
  • Reconstruction loss may compare a predicted output to a ground truth (or reference) output.
  • all_points_target is a 3D representation (e.g., a 3D mesh or point cloud) corresponding to ground truth data (e.g., a ground truth tooth restoration design, or a ground truth example of some other 3D oral care representation).
  • all_points_predicted is a 3D representation (e.g., a 3D mesh or point cloud) corresponding to generated or predicted data (e.g., a generated tooth restoration design, or a generated example of some other kind of 3D oral care representation).
  • reconstruction loss may additionally (or alternatively) involve L2 loss, mean absolute error (MAE) loss or Huber loss terms.
  • Reconstruction error may compare reconstructed output data (e.g., as generated by a reconstruction autoencoder, such as a tooth design which has been generated for use in generating a dental restoration appliance) to the original input data (e.g., the data which were provided to the input of the reconstruction autoencoder, such as a pre-restoration tooth).
  • all_points_input is a 3D representation (e.g., a 3D mesh or point cloud) corresponding to input data (e.g., the pre-restoration tooth design which was provided to a reconstruction autoencoder, or another 3D oral care representation which is provided to the input of an ML model).
  • all_points_reconstructed is a 3D representation (e.g., 3D mesh or point cloud) corresponding to reconstructed (or generated) data (e.g., a reconstructed tooth restoration design, or another example of a generated 3D oral care representation).
  • reconstruction loss is concerned with computing a difference between a predicted output and a reference output
  • reconstruction error is concerned with computing a difference between a reconstructed output and an original input from which the reconstructed data are derived.
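  • The sketch below illustrates the distinction drawn above, using the names all_points_target, all_points_predicted, all_points_input and all_points_reconstructed from this disclosure; it assumes matching array shapes with a one-to-one point correspondence and uses an L1 (mean absolute error) term, which is only one of the loss terms mentioned above:

      import numpy as np

      def reconstruction_loss(all_points_predicted, all_points_target):
          """Compare a predicted 3D representation to a ground truth (reference)
          representation, assuming matching (N, 3) arrays with a one-to-one
          point correspondence; here an L1 (mean absolute error) term is used."""
          return np.mean(np.abs(all_points_predicted - all_points_target))

      def reconstruction_error(all_points_reconstructed, all_points_input):
          """Compare a reconstructed output (e.g., from a reconstruction
          autoencoder) to the original input from which it was derived."""
          return np.mean(np.abs(all_points_reconstructed - all_points_input))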
  • the techniques of this disclosure may include operations such as 3D convolution, 3D pooling, 3D unconvolution and 3D unpooling.
  • 3D convolution may aid segmentation processing, for example in down sampling a 3D mesh.
  • 3D un-convolution undoes 3D convolution, for example, in a U-Net.
  • 3D pooling may aid the segmentation processing, for example in summarizing neural network feature maps.
  • 3D un-pooling undoes 3D pooling, for example, in a U-Net.
  • These operations may be implemented by way of one or more layers in the predictive or generative neural networks described herein. These operations may be applied directly on mesh elements, such as mesh edges or mesh faces.
  • neural networks may be trained to operate on 2D representations (such as images). In some implementations of the techniques of this disclosure, neural networks may be trained to operate on 3D representations (such as meshes or point clouds).
  • An intraoral scanner may capture 2D images of the patient's dentition from various views. An intraoral scanner may also (or alternatively) capture 3D mesh or 3D point cloud data which describes the patient's dentition.
  • autoencoders or other neural networks described herein may be trained to operate on either or both of 2D representations and 3D representations.
  • a 2D autoencoder (comprising a 2D encoder and a 2D decoder) may be trained on 2D image data to encode an input 2D image into a latent form (such as a latent vector or a latent capsule) using the 2D encoder, and then reconstruct a facsimile of the input 2D image using the 2D decoder.
  • 2D images may be readily captured using one or more of the onboard cameras.
  • 2D images may be captured using an intraoral scanner which is configured for such a function.
  • 2D image convolution may involve the "sliding" of a kernel across a 2D image and the calculation of elementwise multiplications and the summing of those elementwise multiplications into an output pixel.
  • the output pixel that results from each new position of the kernel is saved into an output 2D feature matrix.
  • neighboring elements (e.g., pixels) may be in well-defined locations (e.g., above, below, left and right).
  • a 2D pooling layer may be used to down sample a feature map and summarize the presence of certain features in that feature map.
  • 2D reconstruction error may be computed between the pixels of the input and reconstructed images.
  • the mapping between pixels may be well understood (e.g., the upper pixel [23, 134] of the input image is directly compared to pixel [23,134] of the reconstructed image, assuming both images have the same dimensions).
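  • A minimal sketch of pixel-wise 2D reconstruction error (here, mean squared error over pixels), assuming the input and reconstructed images share the same dimensions:

      import numpy as np

      def reconstruction_error_2d(input_image, reconstructed_image):
          """Pixel-wise 2D reconstruction error: pixel [i, j] of the input image
          is compared directly to pixel [i, j] of the reconstructed image."""
          assert input_image.shape == reconstructed_image.shape
          diff = input_image.astype(float) - reconstructed_image.astype(float)
          return np.mean(diff ** 2)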
  • Modern mobile devices may also have the capability of generating 3D data (e.g., using multiple cameras and stereophotogrammetry, or one camera which is moved around the subject to capture multiple images from different views, or both), which in some implementations, may be arranged into 3D representations such as 3D meshes, 3D point clouds and/or 3D voxelized representations.
  • the analysis of a 3D representation of the subject may in some instances provide technical improvements over 2D analysis of the same subject.
  • a 3D representation may describe the geometry and/or structure of the subject with less ambiguity than a 2D representation (which may contain shadows and other artifacts which complicate the depiction of depth from the subject and texture of the subject).
  • 3D processing may enable technical improvements because of the inverse optics problem which may, in some instances, affect 2D representations.
  • the inverse optics problem refers to the phenomenon where, in some instances, the size of a subject, the orientation of the subject and the distance between the subject and the imaging device may be conflated in a 2D image of that subject. Any given projection of the subject on the imaging sensor could map to an infinite count of {size, orientation, distance} combinations.
  • 3D representations enable the technical improvement in that 3D representations remove the ambiguities introduced by the inverse optics problem.
  • a device that is configured with the dedicated purpose of 3D scanning, such as a 3D intraoral scanner (or a CT scanner or MRI scanner), may generate 3D representations of the subject (e.g., the patient's dentition) which have significantly higher fidelity and precision than is possible with a handheld device.
  • the use of a 3D autoencoder offers technical improvements (such as increased data precision), to extract the best possible signal out of those 3D data (i.e., to get the signal out of the 3D crown meshes used in tooth classification or setups classification).
  • a 3D autoencoder (comprising a 3D encoder and a 3D decoder) may be trained on 3D data representations to encode an input 3D representation into a latent form (such as a latent vector or a latent capsule) using the 3D encoder, and then reconstruct a facsimile of the input 3D representation using the 3D decoder.
  • Training and using a 3D encoder and 3D decoder may involve operations such as 3D convolution, 3D pooling and 3D reconstruction error calculation.
  • a 3D convolution may be performed to aggregate local features from nearby mesh elements. Processing may be performed above and beyond the techniques for 2D convolution, to account for the differing count and locations of neighboring mesh elements (relative to a particular mesh element).
  • a particular 3D mesh element may have a variable count of neighbors and those neighbors may not be found in expected locations (as opposed to a pixel in 2D convolution which may have a fixed count of neighboring pixels which may be found in known or expected locations).
  • the order of neighboring mesh elements may be relevant to 3D convolution.
  • a 3D pooling operation may enable the combining of features from a 3D mesh (or other 3D representation) at multiple scales.
  • 3D pooling may iteratively reduce a 3D mesh into mesh elements which are most highly relevant to a given application (e.g., for which a neural network has been trained).
  • 3D pooling may benefit from special processing beyond that entailed in 2D convolution, to account for the differing count and locations of neighboring mesh elements (relative to a particular mesh element).
  • the order of neighboring mesh elements may be less relevant to 3D pooling than to 3D convolution.
  • 3D reconstruction error may be computed using one or more of the techniques described herein, such as computing Euclidean distances between corresponding mesh elements, between the two meshes. Other techniques are possible in accordance with aspects of this disclosure. 3D reconstruction error may generally be computed on 3D mesh elements, rather than the 2D pixels of 2D reconstruction error. 3D reconstruction error may enable technical improvements over 2D reconstruction error, because a 3D representation may, in some instances, have less ambiguity than a 2D representation (i.e., have less ambiguity in form, shape and/or structure).
  • Additional processing may, in some implementations, be entailed for 3D reconstruction which is above and beyond that of 2D reconstruction, because of the complexity of mapping between the input and reconstructed mesh elements (i.e., the input and reconstructed meshes may have different mesh element counts, and there may be a less clear mapping between mesh elements than there is for the mapping between pixels in 2D reconstruction).
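  • The following sketch illustrates one way that 3D reconstruction error might be computed when the input and reconstructed representations have different mesh element counts: each element is mapped to its nearest neighbor in the other representation (a chamfer-style correspondence) and the Euclidean distances are averaged. This is an illustrative assumption, not necessarily the mapping used in this disclosure:

      import numpy as np

      def reconstruction_error_3d(input_points, reconstructed_points):
          """3D reconstruction error between an input mesh/point cloud (N, 3) and
          its reconstruction (M, 3) when N and M may differ: each element is
          mapped to its nearest neighbor in the other representation."""
          d = np.linalg.norm(input_points[:, None, :] - reconstructed_points[None, :, :], axis=2)
          input_to_recon = np.mean(np.min(d, axis=1))  # each input point to nearest reconstructed point
          recon_to_input = np.mean(np.min(d, axis=0))  # each reconstructed point to nearest input point
          return 0.5 * (input_to_recon + recon_to_input)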
  • the technical improvements of 3D reconstruction error calculation include data precision improvement.
  • a 3D representation may be produced using a 3D scanner, such as an intraoral scanner, a computerized tomography (CT) scanner, ultrasound scanner, a magnetic resonance imaging (MRI) machine or a mobile device which is enabled to perform stereophotogrammetry.
  • a 3D representation may describe the shape and/or structure of a subject.
  • a 3D representation may include one or more of a 3D mesh, a 3D point cloud, and/or a 3D voxelized representation, among others.
  • a 3D mesh includes edges, vertices, or faces. Though interrelated in some instances, these three types of data are distinct. The vertices are the points in 3D space that define the boundaries of the mesh.
  • An edge is described by two points and can also be referred to as a line segment.
  • a face is described by a number of edges and vertices. For instance, in the case of a triangle mesh, a face comprises three vertices, where the vertices are interconnected to form three contiguous edges.
  • Some meshes may contain degenerate elements, such as non-manifold mesh elements, which may be removed, to the benefit of later processing. Other mesh pre-processing operations are possible in accordance with aspects of this disclosure.
  • 3D meshes are commonly formed using triangles, but may in other implementations be formed using quadrilaterals, pentagons, or some other n-sided polygon.
  • a 3D mesh may be converted to one or more voxelized geometries (i.e., comprising voxels), such as in the case that sparse processing is performed.
  • the techniques of this disclosure which operate on 3D meshes may receive as input one or more tooth meshes (e.g., arranged in one or more dental arches). Each of these meshes may undergo pre-processing before being input to the predictive architecture (e.g., including at least one of an encoder, decoder, pyramid encoder-decoder and U-Net).
  • This pre-processing may include the conversion of the mesh into lists of mesh elements, such as vertices, edges, faces, or, in the case of sparse processing, voxels.
  • feature vectors may be generated. In some examples, one feature vector is generated per vertex of the mesh.
  • Each feature vector may contain a combination of spatial and/or structural features, as specified in Table 1.
  • Table 1 discloses non-limiting examples of mesh element features, such as color or other visual cues/identifiers.
  • a point differs from a vertex in that a point is part of a 3D point cloud, whereas a vertex is part of a 3D mesh and may have incident faces or edges.
  • a dihedral angle (which may be expressed in either radians or degrees) may be computed as the angle (e.g., a signed angle) between two connected faces (e.g., two faces which are connected along an edge).
  • a sign on a dihedral angle may reveal information about the convexity or concavity of a mesh surface.
  • a positively signed angle may, in some implementations, indicate a convex surface.
  • a negatively signed angle may, in some implementations, indicate a concave surface.
  • directional curvatures may first be calculated to each adjacent vertex around the vertex. These directional curvatures may be sorted in circular order (e.g., 0, 49, 127, 210, 305 degrees) in proximity to the vertex normal vector and may comprise a subsampled version of the complete curvature tensor. Circular order means: sorted by angle around an axis.
  • the sorted directional curvatures may contribute to a linear system of equations amenable to a closed form solution which may estimate the two principal curvatures and directions, which may characterize the complete curvature tensor.
  • a voxel may also have features which are computed as the aggregates of the other mesh elements (e.g., vertices, edges and faces) which either intersect the voxel or, in some implementations, are predominantly or fully contained within the voxel. Rotating the mesh may not change structural features but may change spatial features.
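  • As a hedged illustration of per-vertex mesh element feature vectors, the sketch below computes only a small subset of possible features (vertex position as spatial features and an area-weighted vertex normal as a structural feature); the function name is hypothetical and the full feature set of Table 1 is not reproduced:

      import numpy as np

      def vertex_feature_vectors(vertices, faces):
          """Compute a simple per-vertex mesh element feature vector: the vertex
          position (spatial features) concatenated with an area-weighted vertex
          normal (a structural feature). vertices: (V, 3); faces: (F, 3) indices."""
          v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
          face_normals = np.cross(v1 - v0, v2 - v0)       # length proportional to face area
          vertex_normals = np.zeros_like(vertices)
          for i in range(3):                              # accumulate onto incident vertices
              np.add.at(vertex_normals, faces[:, i], face_normals)
          norms = np.linalg.norm(vertex_normals, axis=1, keepdims=True)
          vertex_normals = vertex_normals / np.maximum(norms, 1e-12)
          return np.concatenate([vertices, vertex_normals], axis=1)   # (V, 6) feature vectors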
  • the term “mesh” should be considered in a nonlimiting sense to be inclusive of 3D mesh, 3D point cloud and 3D voxelized representation.
  • Apart from mesh element features, there are alternative methods of describing the geometry of a mesh, such as 3D keypoints and 3D descriptors. Examples of such 3D keypoints and 3D descriptors are found in TONIONI A, et al., "Learning to detect good 3D keypoints," Int. J. Comput. Vis., 2018, Vol. 126, pages 1-20. 3D keypoints and 3D descriptors may, in some implementations, describe extrema (either minima or maxima) of the surface of a 3D representation.
  • one or more mesh element features may be computed, at least in part, via deep feature synthesis (DFS), e.g. as described in: J. M. Kanter and K. Veeramachaneni, "Deep feature synthesis: Towards automating data science endeavors," 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015, pp. 1-10, doi: 10.1109/DSAA.2015.7344858.
  • mesh element features may convey aspects of a 3D representation’s surface shape and/or structure to the neural network models of this disclosure.
  • Each mesh element feature describes distinct information about the 3D representation that may not be redundantly present in other input data that are provided to the neural network. For example, a vertex curvature may quantify aspects of the concavity or convexity of the surface of a 3D representation which would not otherwise be understood by the network.
  • mesh element features may provide a processed version of the structure and/or shape of the 3D representation; data that would not otherwise be available to the neural network. This processed information is often more accessible, or more amenable to encoding by the neural network.
  • a system implementing the techniques disclosed herein has been utilized to run a number of experiments on 3D representations of teeth. For example, mesh element features have been provided to a representation generation neural network which is based on a U-Net model, and also to a representation generation model based on a variational autoencoder with continuous normalizing flows.
  • Predictive models which may operate on feature vectors of the aforementioned features include but are not limited to: GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups, Tooth Classification, Setups Classification, Setups Comparison, VAE Mesh Element Labeling, MAE Mesh In-filling, Mesh Reconstruction Autoencoder, Validation Using Autoencoders, Mesh Segmentation, Coordinate System Prediction, Mesh Cleanup, Restoration Design Generation, Appliance Component Generation and/or Placement, and Archform Prediction.
  • Such feature vectors may be presented to the input of a predictive model. In some implementations, such feature vectors may be presented to one or more internal layers of a neural network which is part of one or more of those predictive models.
  • the neural networks of this disclosure may exploit one or more benefits of the operation of parameter tuning, whereby the inputs and parameters of a neural network are optimized to produce more data-precise results.
  • One parameter which may be tuned is neural network learning rate (e.g., which may have values such as 0.1, 0.01, 0.001, etc.).
  • Data augmentation schemes may also be tuned or optimized, such as schemes where “shiver” is added to the tooth meshes before being input to the neural network (i.e., small random rotations, translations and/or scaling may be applied to vary the dataset and make the neural network robust to variations in data).
  • a subset of the neural network model parameters available for tuning are as follows:
    o Learning rate (LR): the floating-point value (e.g., 0.001) that is used by the optimizer.
    o LR decay rate (e.g., how much the LR decays during a training run).
    o LR schedule (e.g., cosine annealing, step, exponential).
    o LR decay step size (e.g., decay every 10, 20 or 30 epochs).
    o Dropout % (e.g., dropout which may be performed in a linear encoder).
    o Voxel size, for cases with sparse mesh processing operations.
    o Model scaling, which may increase or decrease the count of layers and/or the count of parameters per layer.
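  • For illustration, such tunable parameters might be collected into a configuration such as the following (the values shown are placeholders, not values prescribed by this disclosure):

      # Illustrative hyperparameter configuration; all values are placeholders.
      tuning_config = {
          "learning_rate": 0.001,             # floating-point value used by the optimizer
          "lr_schedule": "cosine_annealing",  # e.g., cosine annealing, step, exponential
          "lr_decay_rate": 0.5,               # how much the LR decays during a training run
          "lr_decay_step_size": 20,           # e.g., decay every 10, 20 or 30 epochs
          "dropout": 0.1,                     # e.g., dropout in a linear encoder
          "voxel_size": 0.5,                  # for sparse mesh processing operations
          "model_scaling": 1.0,               # scales layer count / parameters per layer
      }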
  • Parameter tuning may be advantageously applied to the training of a neural network for the prediction of final setups or intermediate staging to provide data precision-oriented technical improvements. Parameter tuning may also be advantageously applied to the training of a neural network for mesh element labeling or a neural network for mesh in-filling. In some examples, parameter tuning may be advantageously applied to the training of a neural network for tooth reconstruction. In terms of classifier models of this disclosure, parameter tuning may be advantageously applied to a neural network for the classification of one or more setups (i.e., classification of one or more arrangements of teeth). The advantage of parameter tuning is to improve the data precision of the output of a predictive model or a classification model.
  • Parameter tuning may, in some instances, provide the advantage of obtaining the last remaining few percentage points of validation accuracy out of a predictive or classification model.
  • Various neural network models of this disclosure may draw benefits from data augmentation. Examples include models of this disclosure which are trained on 3D meshes, such as GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups, FDG Setups, Setups Classification, Setups Comparison, VAE Mesh Element Labeling, MAE Mesh In-filling, Mesh Reconstruction VAE, and Validation Using Autoencoders.
  • Data augmentation such as by way of the method shown in FIG. 1, may increase the size of the training dataset of dental arches.
  • Data augmentation can provide additional training examples by adding random rotations, translations, and/or rescaling to copies of existing dental arches.
  • data augmentation may be carried out by perturbing or jittering the vertices of the mesh, in a manner similar to that described in (“Equidistant and Uniform Data Augmentation for 3D Objects”, IEEE Access, Digital Object Identifier 10.1109/ ACCESS.2021.3138162).
  • the position of a vertex may be perturbed through the addition of Gaussian noise, for example with zero mean, and 0.1 standard deviation. Other mean and standard deviation values are possible in accordance with the techniques of this disclosure.
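  • A minimal sketch of the "shiver" augmentation described above: Gaussian jitter of the vertices (e.g., zero mean, 0.1 standard deviation) combined with a small random rotation, translation and rescaling. The parameter values and function name are illustrative assumptions:

      import numpy as np

      def augment_mesh_vertices(vertices, rng, sigma=0.1, max_angle_deg=5.0,
                                max_translation=0.5, scale_range=(0.95, 1.05)):
          """Apply 'shiver' augmentation to (V, 3) mesh vertices: Gaussian jitter
          plus a small random rotation, translation and rescaling."""
          # Small random rotation about a random axis (Rodrigues' formula).
          axis = rng.normal(size=3)
          axis /= np.linalg.norm(axis)
          angle = np.radians(rng.uniform(-max_angle_deg, max_angle_deg))
          k = np.array([[0, -axis[2], axis[1]],
                        [axis[2], 0, -axis[0]],
                        [-axis[1], axis[0], 0]])
          rotation = np.eye(3) + np.sin(angle) * k + (1 - np.cos(angle)) * (k @ k)
          scale = rng.uniform(*scale_range)
          translation = rng.uniform(-max_translation, max_translation, size=3)
          jitter = rng.normal(0.0, sigma, size=vertices.shape)
          return (scale * (vertices @ rotation.T)) + translation + jitter

      # Example use: rng = np.random.default_rng(0); augmented = augment_mesh_vertices(tooth_vertices, rng)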
  • generator networks of this disclosure can be implemented as one or more neural networks
  • the generator may contain an activation function.
  • an activation function When executed, an activation function outputs a determination of whether or not a neuron in a neural network will fire (e.g., send output to the next layer).
  • Some activation functions may include: binary step functions, or linear activation functions.
  • Other activation functions impart non-linear behavior to the network, including: sigmoid/logistic activation functions, Tanh (hyperbolic tangent) functions, rectified linear units (ReLU), leaky ReLU functions, parametric ReLU functions, exponential linear units (ELU), softmax function, swish function, Gaussian error linear unit (GELU), or scaled exponential linear unit (SELU).
  • a linear activation function may be well suited to some regression applications (among other applications), in an output layer.
  • a sigmoid/logistic activation function may be well suited to some binary classification applications (among other applications), in an output layer.
  • a softmax activation function may be well suited to some multiclass classification applications (among other applications), in an output layer.
  • a sigmoid activation function may be well suited to some multilabel classification applications (among other applications), in an output layer.
  • a ReLU activation function may be well suited in some convolutional neural network (CNN) applications (among other applications), in a hidden layer.
  • a Tanh and/or sigmoid activation function may be well suited in some recurrent neural network (RNN) applications (among other applications), for example, in a hidden layer.
  • Weight-update (optimization) techniques that may be used in training the neural networks of this disclosure include: gradient descent, which determines a training gradient using first-order derivatives and is commonly used in the training of neural networks; and Newton's method, which may make use of second derivatives in loss calculation to find better training directions than gradient descent, but may require calculations involving Hessian matrices.
  • additional methods may be employed to update weights, in addition to or in place of the techniques described above. These additional methods include the Levenberg-Marquardt method and/or simulated annealing.
  • the backpropagation algorithm is used to transfer the results of loss calculation back into the network so that network weights can be adjusted, and learning can progress.
  • Neural networks contribute to the functioning of many of the applications of the present disclosure, including but not limited to: GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups, Tooth Classification, Setups Classification, Setups Comparison, VAE Mesh Element Labeling, MAE Mesh In-filling, Mesh Reconstruction Autoencoder, Validation Using Autoencoders, imputation of oral care parameters, 3D mesh segmentation (3D representation segmentation), Coordinate System Prediction, Mesh Cleanup, Restoration Design Generation, Appliance Component Generation and/or Placement, or Archform Prediction.
  • the neural networks of the present disclosure may embody part or all of a variety of different neural network models. Examples include the U-Net architecture, multi-layer perceptron (MLP), transformer, pyramid architecture, recurrent neural network (RNN), autoencoder, variational autoencoder, regularized autoencoder, conditional autoencoder, capsule network, capsule autoencoder, stacked capsule autoencoder, denoising autoencoder, sparse autoencoder, long/short term memory (LSTM), gated recurrent unit (GRU), deep belief network (DBN), deep convolutional network (DCN), deep convolutional inverse graphics network (DCIGN), liquid state machine (LSM), extreme learning machine (ELM), echo state network (ESN), deep residual network (DRN), Kohonen network (KN), neural Turing machine (NTM), or generative adversarial network (GAN).
  • an encoder structure or a decoder structure may be used.
  • Each of these models provides one or more of its own particular advantages.
  • a particular neural network architecture may be especially well suited to a particular ML technique.
  • autoencoders are particularly suited to the classification of 3D oral care representations, due to the ability to encode the 3D oral care representation into a form which is more easily classifiable.
  • the neural networks of this disclosure can be adapted to operate on 3D point cloud data (alternatively on 3D meshes or 3D voxelized representation).
  • Numerous neural network implementations may be applied to the processing of 3D representations and may be applied to training predictive and/or generative models for oral care applications, including: PointNet, PointNet++, SO-Net, spherical convolutions, Monte Carlo convolutions and dynamic graph networks, PointCNN, ResNet, MeshNet, DGCNN, VoxNet, 3D-ShapeNets, Kd-Net, Point GCN, Grid-GCN, KCNet, PD-Flow, PU-Flow, MeshCNN and DSG-Net.
  • Oral care applications include, but are not limited to: setups prediction (e.g., using VAE, RL, MLP, GDL, Capsule, Diffusion, etc. which have been trained for setups prediction), 3D representation segmentation, 3D representation coordinate system prediction, element labeling for 3D representation clean-up (VAE for Mesh Element labeling), in-filling of missing elements in 3D representation (MAE for Mesh In-Filling), dental restoration design generation, setups classification, appliance component generation and/or placement, archform prediction, imputation of oral care parameters, setups validation, or other validation applications and tooth 3D representation classification.
  • Autoencoders that can be used in accordance with aspects of this disclosure include but are not limited to: AtlasNet, FoldingNet and 3D-PointCapsNet. Some autoencoders may be implemented based on PointNet.
  • Representation learning may be applied to setups prediction techniques of this disclosure by training a neural network to learn a representation of the teeth, and then using another neural network to generate transforms for the teeth.
  • Some implementations may use a VAE or a Capsule Autoencoder to generate a representation of the reconstruction characteristics of the one or more meshes related to the oral care domain (including, in some instances, information about the structures of the tooth meshes).
  • that representation (either a latent vector or a latent capsule) may be used as input to a module which generates the one or more transforms for the one or more teeth.
  • These transforms may in some implementations place the teeth into final setups poses.
  • These transforms may in some implementations place the teeth into intermediate staging poses.
  • a transform may be described by a 9x1 transformation vector (e.g., that specifies a translation vector and a quaternion).
  • a transform may be described by a transformation matrix (e.g., a 4x4 affine transformation matrix).
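• For illustration only (not the disclosed implementation), the following Python sketch shows how a transform given as a translation vector plus a quaternion might be assembled into a 4x4 affine transformation matrix; the exact layout of the transformation vector is an assumption:
    import numpy as np
    from scipy.spatial.transform import Rotation

    def transform_to_matrix(translation, quaternion_xyzw):
        # Build a 4x4 affine matrix from a 3-element translation and a 4-element quaternion.
        matrix = np.eye(4)
        matrix[:3, :3] = Rotation.from_quat(quaternion_xyzw).as_matrix()  # 3x3 rotation block
        matrix[:3, 3] = translation                                       # translation column
        return matrix

    # Example: translate a tooth 1.5 mm along x with a 5 degree rotation about z.
    t = np.array([1.5, 0.0, 0.0])
    q = Rotation.from_euler("z", 5.0, degrees=True).as_quat()  # (x, y, z, w)
    print(transform_to_matrix(t, q))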
  • systems of this disclosure may implement a principal components analysis (PCA) on an oral care mesh, and use the resulting principal components as at least a portion of the representation of the oral care mesh in subsequent machine learning and/or other predictive or generative processing.
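• As a minimal sketch (assuming the oral care mesh is available as an N x 3 array of vertex coordinates), the principal components of the vertex cloud could be computed as follows and used as part of the mesh representation:
    import numpy as np

    def mesh_principal_components(vertices, n_components=3):
        centered = vertices - vertices.mean(axis=0)
        # Singular value decomposition of the centered vertex cloud.
        _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
        explained_variance = singular_values ** 2 / (len(vertices) - 1)
        return components[:n_components], explained_variance[:n_components]

    vertices = np.random.rand(5000, 3)  # stand-in for a segmented tooth mesh
    components, variance = mesh_principal_components(vertices)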
  • An autoencoder may be trained to generate a latent form of a 3D oral care representation.
• An autoencoder may contain a 3D encoder (which encodes a 3D oral care representation into a latent form), and/or a 3D decoder (which reconstructs that latent form into a facsimile of the inputted 3D oral care representation).
• with respect to 3D encoders and 3D decoders, the term 3D should be interpreted in a non-limiting fashion to encompass multi-dimensional modes of operation.
  • systems of this disclosure may train multi-dimensional encoders and/or multi-dimensional decoders.
  • Systems of this disclosure may implement end-to-end training.
  • End-to-end training-based techniques of this disclosure may involve two or more neural networks, where the two or more neural networks are trained together (i.e., the weights are updated concurrently during the processing of each batch of input oral care data).
• End-to-end training may, in some implementations, be applied to setups prediction by concurrently training a neural network which learns a representation of the teeth, along with a neural network which generates the tooth transforms.
  • a neural network (e.g., a U-Net) may be trained on a first task (e.g., such as coordinate system prediction).
  • the neural network trained on the first task may be executed to provide one or more of the starting neural network weights for the training of another neural network that is trained to perform a second task (e.g., setups prediction).
  • the first network may learn the low-level neural network features of oral care meshes and be shown to work well at the first task.
  • the second network may exhibit faster training and/or improved performance by using the first network as a starting point in training.
  • Certain layers may be trained to encode neural network features for the oral care meshes that were in the training dataset.
  • These layers may thereafter be fixed (or be subjected to minor changes over the course of training) and be combined with other neural network components, such as additional layers, which are trained for one or more oral care tasks (such as setups prediction).
  • additional layers which are trained for one or more oral care tasks (such as setups prediction).
  • a portion of a neural network for one or more of the techniques of the present disclosure may receive initial training on another task, which may yield important learning in the trained network layers. This encoded learning may then be built upon with further task-specific training of another network.
  • transfer learning may be used for setups prediction, as well as for other oral care applications, such as mesh classification (e.g., tooth or setups classification), mesh element labeling, mesh element in-filling, procedure parameter imputation, mesh segmentation, coordinate system prediction, restoration design generation, mesh validation (for any of the applications disclosed herein).
• a neural network trained to output predictions based on oral care meshes may first be partially trained on one of the following publicly available datasets, before being further trained on oral care data: Google PartNet dataset, ShapeNet dataset, ShapeNetCore dataset, Princeton Shape Benchmark dataset, ModelNet dataset, ObjectNet3D dataset, Thingi10K dataset (which is especially relevant to 3D printed parts validation), ABC: A Big CAD Model Dataset For Geometric Deep Learning, ScanObjectNN, VOCASET, 3D-FUTURE, MCB: Mechanical Components Benchmark, PoseNet dataset, PointCNN dataset, MeshNet dataset, MeshCNN dataset, PointNet++ dataset, or PointNet dataset.
  • a neural network which was previously trained on a first dataset may subsequently receive further training on oral care data and be applied to oral care applications (such as setups prediction).
  • Transfer learning may be employed to further train any of the following networks: GCN (Graph Convolutional Networks), PointNet, ResNet or any of the other neural networks from the published literature which are listed above.
  • a first neural network may be trained to predict coordinate systems for teeth (such as by using the techniques described in WO2022123402A1 or US Provisional Application No. US63/366492).
  • a second neural network may be trained for setups prediction, according to any of the setups prediction techniques of the present disclosure (or a combination of any two or more of the techniques described herein).
  • Transfer learning may transfer at least a portion of the knowledge or capability of the first neural network to the second neural network. As such, transfer learning may provide the second neural network an accelerated training phase to reach convergence.
  • the training of the second network may, after being augmented with the transferred learning, then be completed using one or more of the techniques of this disclosure.
• Systems of this disclosure may train ML models with representation learning, in which a representation generation model learns a representation of the input data and a generative network (e.g., a neural network that predicts a transform for use in setups prediction) consumes that representation.
• the representation generation model extracts hierarchical neural network features and/or reconstruction characteristics of an inputted representation (e.g., a mesh or point cloud) through loss calculations or network architectures chosen for that purpose.
• Reconstruction characteristics may comprise values of a latent representation (e.g., a latent vector) that describe aspects of the shape and/or structure of the 3D representation that was provided to the representation generation module that generated the latent representation.
• the weights of the encoder module of a reconstruction autoencoder may be trained to encode a 3D representation (e.g., a 3D mesh, or others described herein) into a latent representation (e.g., a latent vector).
  • the capability to encode a large set (e.g., hundreds, thousands or millions) of mesh elements into a latent vector may be learned by the weights of the encoder.
  • Each dimension of that latent vector may contain a real number which describes some aspect of the shape and/or structure of the original 3D representation.
  • the weights of the decoder module of the reconstruction autoencoder may be trained to reconstruct the latent vector into a close facsimile of the original 3D representation.
  • the capability to interpret the dimensions of the latent vector, and to decode the values within those dimensions may be learned by the decoder.
  • the encoder and decoder neural network modules are trained to perform the mapping of a 3D representation into a latent vector, which may then be mapped back (or otherwise reconstructed) into a 3D representation that is substantially similar to an original 3D representation for which the latent vector was generated.
  • examples of loss calculation may include KL-divergence loss, reconstruction loss or other losses disclosed herein.
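• The following PyTorch sketch is one possible (assumed, non-limiting) way of combining a reconstruction loss with a KL-divergence loss when training a reconstruction autoencoder on mesh element data; the relative weighting kl_weight is a hypothetical hyperparameter:
    import torch

    def kl_divergence(mu, log_var):
        # KL divergence between N(mu, sigma^2) and a standard normal prior.
        return -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())

    def reconstruction_loss(reconstructed, original):
        # Per-element L2 error; assumes element-to-element correspondences are known.
        return torch.mean((reconstructed - original) ** 2)

    def autoencoder_loss(reconstructed, original, mu, log_var, kl_weight=1e-3):
        return reconstruction_loss(reconstructed, original) + kl_weight * kl_divergence(mu, log_var)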
  • Representation learning may reduce the size of the dataset required for training a model, because the representation model learns the representation, enabling the generative network to focus on learning the generative task.
  • the result may be improved model generalization because meaningful neural network features of the input data (e.g., local and/or global features) are made available to the generative network.
  • a first network may learn the representation, and a second network may make the predictive decision.
  • each of the networks may generate more accurate results for their respective tasks than with a single network which is trained to both learn a representation and make a decision.
  • transfer learning may first train a representation generation model. That representation generation model (in whole or in part) may then be used to pre-train a subsequent model, such as a generative model (e.g., that generates transform predictions).
  • a representation generation model may benefit from taking mesh element features as input, to improve the capability of a second ML module to encode the structure and/or shape of the inputted 3D oral care representations in the training dataset.
• One or more of the neural network models of this disclosure may have attention gates integrated within. Attention gate integration enables the associated neural network architecture to focus resources on one or more input values.
• an attention gate may be integrated with a U-Net architecture, with the advantage of enabling the U-Net to focus on certain inputs, such as input flags which correspond to teeth which are meant to be fixed (e.g., prevented from moving) during orthodontic treatment (or which require other special handling).
  • An attention gate may also be integrated with an encoder or with an autoencoder (such as VAE or capsule autoencoder) to improve predictive accuracy, in accordance with aspects of this disclosure.
  • attention gates can be used to configure a machine learning model to give higher weight to aspects of the data which are more likely to be relevant to correctly generated outputs.
  • the quality and makeup of the training dataset for a neural network can impact the performance of the neural network in its execution phase.
  • Dataset filtering and outlier removal can be advantageously applied to the training of the neural networks for the various techniques of the present disclosure (e.g., for the prediction of final setups or intermediate staging, for mesh element labeling or a neural network for mesh in-filling, for tooth reconstruction, for 3D mesh classification, etc.), because dataset filtering and outlier removal may remove noise from the dataset.
• although the mechanism for realizing an improvement is different than with attention gates, the ultimate outcome is that this approach allows the machine learning model to focus on relevant aspects of the dataset, and may lead to improvements in accuracy similar to those realized via attention gates.
• a patient case may contain at least one of a set of segmented tooth meshes for that patient, a mal transform for each tooth, and/or a ground truth setup transform for each tooth.
  • a patient case may contain at least one of a set of segmented tooth meshes for that patient, a mal transform for each tooth, and/or a set of ground truth intermediate stage transforms for each tooth.
• a training dataset may exclude patient cases which contain passive stages (i.e., stages where the teeth of an arch do not move).
  • the dataset may exclude cases where passive stages exist at the end of treatment.
• a dataset may exclude cases where overcrowding is present at the end of treatment (i.e., where the oral care provider, such as an orthodontist or dentist, has chosen a final setup where the tooth meshes overlap to some degree).
  • the dataset may exclude cases of a certain level (or levels) of difficulty (e.g., easy, medium and hard).
  • the dataset may include cases with zero pinned teeth (or may include cases where at least one tooth is pinned).
  • a pinned tooth may be designated by a technician as they design the treatment to stop the various tools from moving that particular tooth.
  • a dataset may exclude cases without any fixed teeth (conversely, where at least one tooth is fixed).
  • a fixed tooth may be defined as a tooth that shall not move in the course of treatment.
  • a dataset may exclude cases without any pontic teeth (conversely, cases in which at least one tooth is pontic).
• a pontic tooth may be described as a “ghost” tooth that is represented in the digital model of the arch but is either not actually present in the patient’s dentition or where there may be a small or partial tooth that may benefit from future work (such as the addition of composite material through a dental restoration appliance).
  • the advantage of including a pontic tooth in a patient case is to leave space in the arch as a part of a plan for the movements of other teeth, in the course of orthodontic treatment.
  • a pontic tooth may save space in the patient’s dentition for future dental or orthodontic work, such as the installation of an implant or crown, or the application of a dental restoration appliance, such as to add composite material to an existing tooth that is too small or has an undesired shape.
  • the dataset may exclude cases where the patient does not meet an age requirement (e.g., younger than 12). In some implementations, the dataset may exclude cases with interproximal reduction (IPR) beyond a certain threshold amount (e.g., more than 1.0 mm).
  • the dataset to train a neural network to predict setups for clear tray aligners (CTA) may exclude patient cases which are not related to CTA treatment.
  • the dataset to train a neural network to predict setups for an indirect bonding tray product may exclude cases which are not related to indirect bonding tray treatment.
• the dataset may exclude cases where only certain teeth are treated. In such implementations, a dataset may comprise only cases where at least one of the following are treated: anterior teeth, posterior teeth, bicuspids, molars, incisors, and/or cuspids.
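• A hedged sketch of dataset filtering along the lines described above follows; the field names on each patient case (e.g., "has_passive_stages", "patient_age", "max_ipr_mm", "product") are hypothetical and depend on how patient cases are stored:
    def filter_training_cases(cases, min_age=12, max_ipr_mm=1.0, product="CTA"):
        kept = []
        for case in cases:
            if case.get("has_passive_stages"):
                continue  # exclude cases that contain passive stages
            if case.get("patient_age", 0) < min_age:
                continue  # exclude cases that do not meet the age requirement
            if case.get("max_ipr_mm", 0.0) > max_ipr_mm:
                continue  # exclude cases with IPR beyond the threshold
            if product is not None and case.get("product") != product:
                continue  # keep only cases related to the target treatment product
            kept.append(case)
        return kept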
  • the Setups Comparison tool may be used to compare the output of the GDL Setups model against ground truth data, compare the output of the RL Setups model against ground truth data, compare the output of the VAE Setups model against ground truth data and compare the output of the MLP Setups model against ground truth data.
  • the Metrics Visualization tool can enable a global view of the final setups and intermediate stages produced by one or more of the setups prediction models, with the advantage of enabling the selection of the best setups prediction model.
• the Metrics Visualization tool, furthermore, enables the computation of metrics which have a global scope over a set of intermediate stages. These global metrics may, in some implementations, be consumed as inputs to the neural networks for predicting setups (e.g., GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups, among others). The global metrics may also be provided to FDG Setups.
  • the local metrics from this disclosure may, in some implementations, be consumed by the neural networks herein for predicting setups, with the advantage of improving predictive results.
  • the metrics described in this disclosure may, in some implementations, be visualized using the Metric Visualization tool.
  • the VAE and MAE models for mesh element labelling and mesh in-filling can be advantageously combined with the setups prediction neural networks, for the purpose of mesh cleanup ahead of or during the prediction process.
  • the VAE for mesh element labelling may be used to flag mesh elements for further processing, such as metrics calculation, removal or modification.
  • flagged mesh elements may be provided as inputs to a setups prediction neural network, to inform that neural network about important mesh features, attributes or geometries, with the advantage of improving the performance of the resulting setups prediction model.
• mesh in-filling may cause the geometry of a tooth to become more nearly complete, enabling the better functioning of a setups prediction model (i.e., improved correctness of prediction on account of better-formed geometry).
• a neural network to classify a setup (i.e., the Setups Classifier) may be used alongside a setups prediction neural network; the Setups Classifier tells that setups prediction neural network when the predicted setup is acceptable for use and can be provided to a method for aligner tray generation.
  • a Setups Classifier may aid in the generation of final setups and also in the generation of intermediate stages.
  • a Setups Classifier neural network may be combined with the Metrics Visualization tool.
  • a Setups Classification neural network may be combined with the Setups Comparison tool (e.g., the Setup Comparison tool may output an indication of how a setup produced in part by the Setups Classifier compares to a setup produced by another setups prediction method).
  • the VAE for mesh element labelling may identify one or more mesh elements for use in a metrics calculation.
  • the resulting metrics outputs may be visualized by the Metrics Visualization tool.
  • the Setups Classifier neural network may aid in the setups prediction technique described in U.S. Patent Application No. US20210259808A1 (which is incorporated herein by reference in its entirety) or the setups prediction technique described in PCT Application with Publication No. WO2021245480A1 (which is incorporated herein by reference in its entirety) or in PCT Application No. PCT/IB2022/057373 (which is incorporated herein by reference in its entirety).
  • the Setups Classifier would help one or more of those techniques to know when the predicted final setup is most nearly correct.
  • the Setups Classifier neural network may output an indication of how far away from final setup a given setup is (i.e., a progress indicator).
  • the latent space embedding vector(s) from the reconstruction VAE can be concatenated with the inputs to the setups prediction neural network described in WO2021245480A1.
  • the latent space vectors can also be incorporated as inputs to the other setups prediction models: GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups and Diffusion Setups, among others.
  • the advantage is to impart the reconstruction characteristics (e.g., latent vector dimensions of a tooth mesh) to that neural network, hence improving the generated setups prediction.
  • the various setups prediction neural networks of this disclosure may work together to produce the setups required for orthodontic treatment.
  • the GDL Setups model may produce a final setup, and the RL Setups model may use that final setup as input to produce a series of intermediate stages setups.
  • the VAE Setups model (or the MLP Setups model) may create a final setup which may be used by a RL Setups model to produce a series of intermediate stages setups.
  • a setup prediction may be produced by one setups prediction neural network, and then taken as input to another setups prediction neural network for further improvements and adjustments to be made. In some implementations, such improvements may be performed in iterative fashion.
  • a setups validation model such as the model disclosed in US Provisional Application No. US63/366495, may be involved in this iterative setups prediction loop.
  • a setup may be generated (e.g., using a model trained for setups prediction, such as GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups, Diffusion Setups, PT Setups, Similarity Setups and FDG Setups, among others), then the setup undergoes validation. If the setup passes validation, the setup may be outputted for use. If the setup fails validation, the setup may be sent back to one or more of the setups prediction models for corrections, improvements and/or adjustments.
• the setups validation model may output an indication of what is wrong with the setup, enabling the setups generation model to make an improved version upon the next iteration. The process iterates until the setup passes validation (or another stopping criterion is met).
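• One possible (assumed) realization of this iterative loop is sketched below; predict_setup and validate_setup are hypothetical callables standing in for a setups prediction model and a setups validation model:
    def generate_validated_setup(mal_arch, predict_setup, validate_setup, max_iterations=5):
        feedback = None
        setup = predict_setup(mal_arch, feedback)
        for _ in range(max_iterations):
            passed, feedback = validate_setup(setup)
            if passed:
                return setup  # suitable for use (e.g., aligner tray generation)
            # Feed the validator's indication of what is wrong back to the predictor.
            setup = predict_setup(mal_arch, feedback)
        return None  # escalate for manual review if validation never passes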
  • two or more of the following techniques of the present disclosure may be combined in the course of orthodontic and/or dental treatment: GDL Setups, Setups Classification, Reinforcement Learning (RL) Setups, Setups Comparison, Autoencoder Setups (VAE Setups or Capsule Setups), VAE Mesh Element Labeling, Masked Autoencoder (MAE) Mesh Infilling, Multi-Layer Perceptron (MLP) Setups, Metrics Visualization, Imputation of Missing Oral Care Parameters Values, Tooth Classification Using Latent Vector, FDG Setups, Pose Transfer Setups, Restoration Design Metrics Calculation, Neural Network Techniques for Dental Restoration and/or Orthodontics (e.g., 3D Oral Care Representation Generation or Modification Using Transformers), Landmark-based (LB) Setups, Diffusion Setups, Imputation of Tooth Movement Procedures, Capsule Autoencoder Segmentation
  • Some autoencoder-based implementations of this disclosure use capsule autoencoders to automate processing steps in the creation of oral care appliances (e.g., for orthodontic treatment or dental restoration).
• an advantage of capsule autoencoders which have been trained on oral care data is the ability to leverage latent space techniques which reduce the dimensionality of oral care mesh data and thereby refine those data, making the signal in the data stronger and more readily usable by downstream processing modules, whether those downstream modules are other autoencoder(s), decoder(s), other neural networks, or other types of ML models (such as the supervised and unsupervised models described elsewhere in this disclosure).
  • Capsule autoencoders were originally applied in the 2D domain to perform object recognition in 2D images, where capsules were trained to create a model of the object that was to be recognized. Such an approach enabled an object to be recognized in the 2D image, even if the object was imaged from a new view that was not present in the training dataset. Later research extended capsule autoencoders to the domain of 3D point clouds, such as in “3D Point Capsule Networks” in the proceedings of CVPR 2019, which is incorporated herein by reference in its entirety.
  • a 3D autoencoder may encode one or more 3D geometries (point clouds or meshes) into latent capsules which encode the reconstruction characteristics of the input 3D representation. These latent capsules exist in two or more dimensions and describe features of the input mesh (or point cloud) and the likelihoods of those features.
• a set of latent capsules stands in contrast to the latent vector which may be produced by a variational autoencoder (VAE), which may be encoded as a 1D vector.
  • Particular examples of applications include segmentation of 3D oral care geometries, setups prediction (both final setups and intermediate stages), mesh cleanup of 3D oral care geometries (e.g., both for the labeling of mesh elements and the filling-in of missing mesh elements), tooth classification (e.g., according to standard dental notation schemes), setups classification (e.g., as mal, staging and final setup) and automated dental restoration design generation.
  • the one or more latent capsules describing an input 3D representation can be provided to a capsule decoder, to reconstruct a facsimile of the input 3D representation.
  • This facsimile can be compared to the input 3D representation through the calculation of a reconstruction error, thereby demonstrating the information-rich nature of the latent capsule (i.e., that the latent capsule describes sufficient reconstruction characteristics of the input mesh, such that the mesh can be reconstructed from that latent capsule).
  • a low reconstruction error indicates that the reconstruction was a success.
• Some of the applications disclosed herein use this information-rich latent capsule for further processing (e.g., such as setups prediction, mesh segmentation, coordinate system prediction, mesh element labelling for mesh cleanup, in-filling of missing mesh elements or of holes in meshes, classification of setups, classification of oral care meshes, validation of setups, and other validation applications).
• Some of the applications disclosed herein make one or more changes to the latent capsule, such as to effectuate changes in the reconstructed mesh, which may then be outputted for further use (e.g., to create a dental restoration appliance).
  • FIG. 2 illustrates a training method for a capsule autoencoder for reconstructing oral care meshes (or point clouds).
  • FIG. 2 shows a capsule autoencoder pipeline for mesh reconstruction, which are primarily applied to oral care meshes in the non-limiting examples described herein, but which may also be applied to other healthcare meshes, or to personal safety meshes, such as meshes pertaining to the design, shape, function, and/or use of personal protective equipment, such as disposable respirators.
  • the deployment method omits the two modules on the bottom.
  • the training method encompasses the whole diagram.
  • the latent capsule T may be a reduced dimensionality form of the inputted oral care mesh and may be used as an input to other processing.
  • an input point cloud or mesh (such as containing oral care data) may be rearranged into one or more vectors of mesh elements.
• Such a vector may be Nx3 (in the case of representing the XYZ coordinates of points or vertices).
  • Such a vector may be Nx3 (in the case of representing mesh faces, each of which may be defined by 3 indices, each of which indexes into a list of vertices/points).
• Such a vector may be Nx2 (in the case of representing mesh edges, each of which may be defined by 2 indices, each of which indexes into a list of vertices/points).
  • Such a vector may be Nx3 (in the case of representing voxels, each of which has an XYZ location, such as a centroid, where the Length x Width x Height of each voxel is known).
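• A minimal sketch of rearranging a mesh into such mesh element vectors is shown below, assuming the mesh is available as vertex and face arrays (e.g., as loaded by a mesh library):
    import numpy as np

    def mesh_element_vectors(vertices, faces):
        points = np.asarray(vertices, dtype=np.float32)    # N x 3 (XYZ per point/vertex)
        face_indices = np.asarray(faces, dtype=np.int64)   # N x 3 (vertex indices per face)
        # N x 2 edge list: each face (a, b, c) contributes edges (a, b), (b, c), (c, a).
        edges = np.vstack([face_indices[:, [0, 1]],
                           face_indices[:, [1, 2]],
                           face_indices[:, [2, 0]]])
        edges = np.unique(np.sort(edges, axis=1), axis=0)  # de-duplicate undirected edges
        return points, face_indices, edges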
• a neural network (such as an MLP) may be used to extract features from the Nx3 mesh element input list, yielding an Nx128 list of feature vectors, one feature vector per mesh element.
• a vector of one or more computed mesh element features may be computed for one or more of the N inputted mesh elements.
  • these mesh element features may be used in place of the MLP-generated features.
• each mesh element may be given a feature which is a hybrid of MLP-generated features and the computed mesh element features, in which case the layer dimension may be augmented to be Nx(128+aug_len), where aug_len is the length of the augmentation vector, consisting of the computed mesh element features.
• this layer will simply be referred to as Nx128 hereafter.
• the length ‘aug_len’ may vary from implementation to implementation, depending on which mesh elements are analyzed and which mesh element features are chosen for use.
• information from more than one type of mesh element may be introduced with the Nx128 vector (e.g., point/vertex information may be combined with face information, point/vertex information may be combined with edge information, or point/vertex information may be combined with voxel information).
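• The feature-extraction step described above could be sketched in PyTorch as follows (layer sizes are assumptions); a shared MLP maps each mesh element to a 128-dimensional feature vector, optionally concatenated with computed mesh element features of length aug_len:
    import torch
    import torch.nn as nn

    class MeshElementFeatures(nn.Module):
        def __init__(self, feature_dim=128):
            super().__init__()
            # Shared MLP applied independently to each N x 3 mesh element.
            self.mlp = nn.Sequential(
                nn.Linear(3, 64), nn.ReLU(),
                nn.Linear(64, feature_dim), nn.ReLU(),
            )
        def forward(self, elements, computed_features=None):
            features = self.mlp(elements)                  # N x 128
            if computed_features is not None:
                # N x (128 + aug_len) hybrid of learned and computed features.
                features = torch.cat([features, computed_features], dim=-1)
            return features

    elements = torch.rand(4000, 3)       # e.g., points/vertices of a tooth mesh
    curvature = torch.rand(4000, 2)      # hypothetical computed mesh element features
    features = MeshElementFeatures()(elements, curvature)  # 4000 x 130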
  • the analysis of different kinds of oral care meshes may call for one mesh element type or another, or for a particular set of mesh features, according to various applications.
• the Nx128 layer may be passed to a set of subsequent convolution layers, each of which has been trained to have its own parameter values.
• the purpose of each of these independent convolution layers may be to encode the individual mesh element capsules.
  • the output of each of the convolution layers may be maxpooled to a size of 1024 elements.
  • the count of these convolution layers may be a power of two (e.g., 8, 16, 32, 64).
  • These 32 maxpooling output vectors may be concatenated, forming a layer that may be 1024x32, called the Primary Mesh Element Capsules (PMEC).
  • a dynamic routing module encodes these PMECs into one or more latent capsules, each of which may have square dimensions (e.g., 16x16, 32x32, 64x64, or 128x128). Non-square dimensions are also possible.
  • a dynamic routing module may enable the output of a latent capsule to be routed to a suitable neural network layer in a subsequent processing module of the capsule autoencoder.
  • the dynamic routing module uses unsupervised techniques (e.g., clustering and/or other unsupervised techniques) to arrange the output of the set of max-pooled feature maps into one or more stacked latent capsules.
  • These latent capsules summarize feature information from the input 3D representation (e.g., one or more tooth meshes or point clouds) and also the likelihood information associated with each capsule. These stacked capsules contain sufficient information about the input 3D representation to reconstruct that 3D representation via the Capsule-Decoder module.
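• A simplified PyTorch sketch of forming the Primary Mesh Element Capsules is given below (a non-limiting assumption about layer shapes): a set of independent convolution branches over the N x 128 element features, each max-pooled to 1024 values, then stacked into a 1024 x 32 map; the dynamic routing into latent capsules is left as a placeholder comment:
    import torch
    import torch.nn as nn

    class PrimaryMeshElementCapsules(nn.Module):
        def __init__(self, in_dim=128, out_dim=1024, num_branches=32):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv1d(in_dim, out_dim, kernel_size=1) for _ in range(num_branches)]
            )
        def forward(self, element_features):
            # element_features: batch x N x 128  ->  batch x 128 x N for Conv1d.
            x = element_features.transpose(1, 2)
            pooled = [branch(x).max(dim=2).values for branch in self.branches]  # each: batch x 1024
            return torch.stack(pooled, dim=2)                                   # batch x 1024 x 32

    features = torch.rand(1, 4000, 128)
    pmec = PrimaryMeshElementCapsules()(features)  # 1 x 1024 x 32
    # A dynamic routing module (e.g., unsupervised clustering over the pooled
    # feature maps) would convert the PMEC into stacked latent capsules here.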
• a grid of mesh elements may be generated by a Grid Patches module; points are used as the mesh element in this example.
• this grid may comprise randomly arranged points. In other implementations, this grid may reflect a regular and/or rectilinear arrangement of points. The points in each of these grid patches are the "raw material" from which the reconstructed 3D representation may be formed.
• the latent capsule (e.g., with dimension 128x128) may be replicated p times, and each of those p latent capsules may be appended with each of the grid patches of randomly generated mesh elements (e.g., points/vertices) in turn, before being input to one or more MLPs.
• such an MLP may comprise fully connected layers with the following dimensions: {64 - 64 - 32 - 16 - 3}.
• the goal of such an operation is to tailor the mesh elements to a specific local area of the 3D representation which is to be reconstructed.
  • the decoder iterates, generating additional random grid patches and outputting more random portions of the reconstructed 3D representation (i.e., as point cloud patches). These point cloud patches are accumulated until a reconstruction loss drops below a target threshold.
• this loss may be computed using one or more of reconstruction loss (as defined herein) and KL-divergence loss.
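• The decoding step could be sketched as follows (an assumed, simplified PyTorch rendering): each replicated latent capsule is concatenated with a random grid patch of 2D points and passed through an MLP with layer sizes {64 - 64 - 32 - 16 - 3}, producing one patch of the reconstructed point cloud per iteration:
    import torch
    import torch.nn as nn

    class CapsulePatchDecoder(nn.Module):
        def __init__(self, capsule_dim=128, grid_dim=2):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(capsule_dim + grid_dim, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, 32), nn.ReLU(),
                nn.Linear(32, 16), nn.ReLU(),
                nn.Linear(16, 3),
            )
        def forward(self, latent_capsule, points_per_patch=64):
            grid = torch.rand(points_per_patch, 2)                  # random grid patch
            capsules = latent_capsule.unsqueeze(0).expand(points_per_patch, -1)
            return self.mlp(torch.cat([capsules, grid], dim=1))     # points_per_patch x 3

    decoder = CapsulePatchDecoder()
    patches = [decoder(torch.rand(128)) for _ in range(8)]          # accumulate patches
    reconstruction = torch.cat(patches, dim=0)                      # partial reconstructed point cloud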
  • An autoencoder such as a variational autoencoder (VAE) may be trained to encode 3D mesh data in a latent space vector A, which may exist in an information-rich low-dimensional latent space.
  • This latent space vector A may be particularly suitable for later processing by digital oral care applications (e.g., such as mesh cleanup, mesh segmentation, mesh validation, mesh classification, setups classification, setups prediction and restoration design generation, among others), because A enables high-dimensional tooth mesh data to be efficiently manipulated.
  • Such a VAE may be trained to reconstruct the latent space vector A back into a facsimile of the input mesh (or transform or other data structure describing a 3D oral care representation).
  • the latent space vector A may be strategically modified, so as to result in changes to the reconstructed mesh (or other data structure).
  • the reconstructed mesh may be a tooth mesh with an altered and/or improved shape, such as would be suitable for use in the design of a dental restoration appliance, such as a 3M FILTEK Matrix or a veneer.
  • the term mesh should be considered in a non-limiting sense to be inclusive of 3D mesh, 3D point cloud and 3D voxelized representation.
  • the tooth reconstruction VAE may advantageously make use of loss functions, nonlinearities (aka neural network activation functions) and/or solvers which are not mentioned by existing techniques.
  • loss functions may include: mean absolute error (MAE), mean squared error (MSE), Ll- loss, L2-loss, KL-divergence, entropy, and reconstruction loss.
• Such loss functions enable each generated prediction to be compared against the corresponding ground truth value in a quantified manner, leading to one or more loss values which can be used to train, at least in part, one or more of the neural networks.
  • solvers may include: dopri5, bdf, rk4, midpoint, adams, explicit adams, and fixed adams.
  • the solvers may enable the neural networks to solve systems of equations and corresponding unknown variables.
• nonlinearities may include: tanh, relu, softplus, elu, swish, square, and identity.
  • the activation functions may be used to introduce nonlinear behavior to the neural networks in a manner that enables the neural networks to better represent the training data.
  • Losses may be computed through the process of training the neural networks via backpropagation. Neural network layers such as the following may be used: ignore, concat, concat_v2, squash, concatsquash, scale and concatscale.
  • the tooth reconstruction VAE model may be trained on patient cases of teeth in mal occlusion, or alternatively in local coordinates.
  • FIG. 3 shows a method of training such a VAE.
• a 3D oral care representation F may be provided to the encoder E1 (along with optional tooth type information R), which may generate latent vector A.
• Latent vector A may be reconstructed into reconstructed 3D oral care representation G.
• Loss may be computed between the reconstructed 3D oral care representation G and ground truth 3D oral care representation GT (e.g., using the VAE loss calculation methods or other loss calculation methods described herein). Backpropagation may be used to train E1 and D1 with such loss.
  • FIG. 4 shows the trained mesh reconstruction VAE in deployment.
  • the mesh reconstruction VAE is shown reconstructing a tooth mesh in deployment.
  • R is an optional input, particularly in the case of tooth mesh classification, when such information R is not yet available (due to the tooth mesh classification neural network being trained to generate tooth type information R as an output, according to particular implementations).
  • R may, in some implementations, be used to improve other techniques such as mesh element labelling techniques, mesh reconstruction techniques, oral care mesh classification techniques (e.g., such as tooth classification or setups classification), among others.
  • FIGS. 5 and 6 show reconstructed tooth meshes.
  • FIG. 5 illustrates an example of an input tooth mesh on the left and the outputted reconstructed tooth mesh on the right.
  • FIG. 6 illustrates another example of an input tooth mesh on the left and the corresponding outputted reconstructed tooth mesh on the right.
  • the use case of FIG. 5 is different from the use case of FIG. 6.
  • FIG. 7 shows a depiction of the reconstruction error from the reconstructed tooth shown in FIG. 6, called a reconstruction error plot.
• FIG. 8 is a bar chart in which each bar represents an individual tooth and represents the mean absolute distance of all vertices involved in the reconstruction of that tooth, in a dataset that was used to evaluate a mesh reconstruction model.
• the tooth mesh reconstruction autoencoder, of which a variational autoencoder (VAE) is an example, may be trained to encode a tooth as a reduced-dimensionality form, called a latent space vector.
  • the reconstruction VAE may be trained on example tooth meshes.
  • the tooth mesh may be received by the VAE, deconstructed into a latent space vector using a 3D encoder and then reconstructed into a facsimile of the input mesh using a 3D decoder.
  • Existing techniques for setups prediction lack such a deconstruction/reconstruction method.
• the encoder E1 may become trained to encode a tooth mesh (or mesh of a dental appliance, gums, or other body part or anatomy) into a reduced-dimension form that can be used in the training and deployment of any of a suite of powerful setups prediction methods (e.g., GDL Setups, RL Setups, VAE Setups, Capsule Setups, MLP Setups and Diffusion Setups, among others).
  • This reduced-dimensionality form of the tooth may enable the setups prediction neural network to more efficiently encode the reconstruction characteristics of the tooth, and better learn to place the tooth into a pose suitable for either final setups or intermediate stages, thereby providing technical improvements in terms of both data precision and resource footprint.
  • the reconstructed mesh may be compared to the input mesh, for example using a reconstruction error (as described elsewhere in this disclosure), which quantifies the differences between the meshes.
  • This reconstruction error may be computed using Euclidean distances between corresponding mesh elements between the two meshes. There are other methods of computing this error too which may be derived from material described elsewhere in this disclosure.
  • FIGS 7 and 8 show example reconstruction errors, in accordance with the techniques described herein.
• the mesh or meshes which are provided to the mesh reconstruction VAE may first be converted to vertex lists (or point clouds) before being provided to the encoder E1. This manner of handling the input to E1 may be conducive to either a single mesh input (such as in a tooth mesh classification task) or a set of multiple teeth (such as in the setups classification task). The input meshes do not need to be connected.
• the encoder E1 may be trained to encode a tooth mesh into a latent space vector A (or “tooth representation vector”).
• encoder E1 may arrange an input tooth mesh into a mesh element vector F, and encode it into a latent space vector A.
  • This latent space vector A may be a reduced dimensionality representation of F that describes the important geometrical attributes of F.
• Latent space vector A may be provided to the decoder D1 to be restored to full resolution or near full resolution, along with the desired geometrical changes.
  • the restored full resolution mesh or near-full resolution mesh may be described by G, which may then be arranged into the output mesh.
  • the tooth name, the tooth designation and/or tooth type R may be concatenated with the latent vector A, as a means of conditioning the VAE on such information, to improve the ability of the VAE to respond to specific tooth types or designations.
• reconstruction error may be computed as element-to-element distances between two meshes, for example using Euclidean distances.
  • Other distance measures are possible in accordance with various implementations of the techniques of this disclosure, such as Cosine distance, Manhattan distance, Minkowski distance, Chebyshev distance, Jaccard distance (e.g. intersection over union of meshes), Haversine distance (e.g., distance across a surface), and Sorensen- Dice distance.
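• A short sketch of computing reconstruction error as per-element Euclidean distances (with an alternative distance measure shown for illustration) follows; it assumes element-to-element correspondences are already established:
    import numpy as np
    from scipy.spatial.distance import cityblock  # Manhattan distance

    def reconstruction_error(original_vertices, reconstructed_vertices):
        # Row i of each array is assumed to correspond to the same mesh element.
        per_vertex = np.linalg.norm(original_vertices - reconstructed_vertices, axis=1)
        return per_vertex.mean(), per_vertex

    original = np.random.rand(1000, 3)
    reconstructed = original + 0.01 * np.random.randn(1000, 3)
    mean_error, per_vertex_error = reconstruction_error(original, reconstructed)
    manhattan_example = cityblock(original[0], reconstructed[0])  # alternative measure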
  • the performance of a mesh reconstruction VAE may, in some implementations, be verified via reconstruction error plots and/or other key performance indicators.
  • the latent space vectors for one or more input tooth meshes may be plotted (e.g., in 2D) using UMAP or t-SNE dimensionality reduction techniques and compared, to select the best available separability between classes of tooth (molar, premolar, incisor, etc.), indicating that the model has an awareness of the strong geometric variation between classes, and a strong similarity within a class. This would be illustrated by clear, nonoverlapping clusters in the resulting UMAP / t-SNE plots.
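• A hedged sketch of such a plot follows, using t-SNE from scikit-learn (UMAP could be substituted via the umap-learn package); the latent vectors and class labels here are stand-ins:
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    latent_vectors = np.random.rand(300, 128)      # stand-in for encoder outputs
    tooth_classes = np.random.randint(0, 3, 300)   # e.g., 0=molar, 1=premolar, 2=incisor
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(latent_vectors)
    plt.scatter(embedding[:, 0], embedding[:, 1], c=tooth_classes, cmap="tab10", s=8)
    plt.title("Latent vectors per tooth class")
    plt.show()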
• the latent vector corresponding to a mesh may be used as a part of a classifier to classify that mesh. For example, classification may be performed to identify a tooth type, or to detect errors in the mesh (or an arrangement of meshes), such as in a validation operation.
  • the latent vector and/or computed mesh element features may be provided to a supervised machine learning model to classify the mesh. A non-exhaustive list of possible supervised ML models is found elsewhere in this disclosure.
• a reconstruction VAE may be trained to reconstruct any arbitrary tooth type. In other implementations, a reconstruction VAE may be trained to reconstruct a specific tooth type (e.g., a 1st molar, or a central incisor).
  • FIG. 9 describes the training of a mesh reconstruction VAE which, in some implementations, may be used to encode a tooth mesh (or other 3D oral care representation) into a latent representation (e.g., a latent vector) A.
  • the encoder portion of the VAE may encode the 3D oral care representation into a latent representation.
  • a VAE may also be trained to encode other kinds of 3D representations (e.g., setups transforms, mesh element labels, or meshes that describe gums, fixture model components, oral care hardware such as brackets and/or attachments, dental restoration appliance components, other portions of anatomy, or the like) into a latent vector A.
• the latent representation may be reconstructed (e.g., using an autoencoder decoder, or other decoders described herein) into a reconstructed form of the original 3D oral care representation (e.g., a reconstructed tooth).
  • a reconstruction error may be computed between the original and the reconstructed versions of the 3D oral care representation.
  • Validation may be performed by comparing the reconstruction error to one or more thresholds. When the measured reconstruction error is beyond a threshold, the validation method may yield a “failing” result (e.g., the 3D oral care representation fails validation). Otherwise, the validation method may yield a “passing” result (e.g., the 3D oral care representation is suitable for use in generating an oral care appliance).
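• A minimal sketch of this thresholding form of validation follows; encoder, decoder and error_fn are hypothetical callables for the trained autoencoder modules and the reconstruction error calculation:
    def validate_representation(representation, encoder, decoder, error_fn, threshold):
        latent = encoder(representation)
        reconstructed = decoder(latent)
        error = error_fn(representation, reconstructed)
        # Beyond the threshold -> "failing"; otherwise "passing".
        return ("pass" if error <= threshold else "fail"), error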
• the latent representation(s) may be provided to a second ML module (e.g., a Gaussian process, an SVM, a neural network, or another discriminative machine learning model).
  • the validation may generate a determination regarding whether the 3D oral care representation (e.g., a tooth mesh, one or more transforms, one or more mesh element labels, etc.) is suitable for use in generating an oral care appliance.
  • FIG. 9 shows a method that systems of this disclosure may implement to train a reconstruction autoencoder for reconstructing a 3D representation of the patient’s dentition.
  • FIG. 9 provides further details on training a tooth crown reconstruction VAE of this disclosure.
  • the particular example of FIG. 9 illustrates training of a variational autoencoder (VAE) for reconstructing a tooth mesh 900.
  • the systems of this disclosure may generate a watertight mesh by merging the tooth’s crown mesh with the corresponding root mesh such that the vertices on the open edge of the crown mesh match up with the vertices on the open edge of the root mesh (902).
  • the systems of this disclosure may perform a registration step (904) to align a tooth mesh with a template tooth mesh (e.g., using the iterative closest point technique or by applying the inverse mal transform for that tooth), with the technical enhancement of improving the accuracy and data precision of the mesh correspondence computation at 906.
  • the systems of this disclosure may compute correspondences between a tooth mesh and the corresponding template tooth mesh, with the technical improvement of conditioning the tooth mesh to be ready to be provided to the reconstruction autoencoder.
• the dataset of prepared tooth meshes is split into train, validation and holdout test sets (910), which are then used to train a reconstruction autoencoder (912), described herein as a tooth VAE, tooth reconstruction VAE or more generally as a reconstruction autoencoder.
  • the tooth VAE may comprise a 3D encoder which encodes a tooth mesh into a latent form (e.g., a latent vector A), and a subsequent 3D decoder reconstructs that tooth into a facsimile of the inputted tooth mesh.
  • the tooth VAE of this disclosure may be trained using a combination of reconstruction loss and KL-Divergence loss, and optionally other of the loss functions described herein. The output of this method is a trained tooth VAE 914.
  • FIG. 10 shows non-limiting code implementing an example 3D encoder and an example 3D decoder for a mesh reconstruction VAE.
  • the code is source code for the encoder and decoder, in Python.
  • These implementations may include: convolution operations, batch norm operations, linear neural network layers, Gaussian operations, and continuous normalizing flows (CNF), among others.
  • One of the steps which may take place in the VAE training data pre-processing is the calculation of mesh correspondences.
• Correspondences may be computed between the mesh elements of the input mesh and the mesh elements of a reference or template mesh with known structure.
  • the goal of mesh correspondence calculation may be to find matching points between the surfaces of an input mesh and of a template (reference) mesh.
  • Mesh correspondence may generate point to point correspondences between input and template meshes by mapping each vertex from the input mesh to at least one vertex in the template mesh.
  • a range of entries in the vector may correspond to the mesial lingual cusp tip; another range of elements may correspond to the distal lingual cusp tip; another range of elements may correspond to the mesial surface of that tooth; another range of elements may correspond to the lingual surface of that tooth, and so on.
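• A simplified sketch of computing point-to-point correspondences is shown below: after the input tooth mesh has been registered to the template (e.g., via ICP), each input vertex is mapped to its nearest template vertex with a k-d tree:
    import numpy as np
    from scipy.spatial import cKDTree

    def compute_correspondences(input_vertices, template_vertices):
        tree = cKDTree(template_vertices)
        distances, template_indices = tree.query(input_vertices)  # one match per input vertex
        return template_indices, distances

    input_vertices = np.random.rand(4000, 3)      # registered input tooth mesh vertices
    template_vertices = np.random.rand(3500, 3)   # template tooth mesh vertices
    correspondence, residuals = compute_correspondences(input_vertices, template_vertices)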
  • the autoencoder may be trained on just a subset of teeth (e.g., only molars or only upper left first molars). In other implementations, the autoencoder may be trained on a larger subset or all of the teeth in the mouth.
  • an input vector may be provided to the autoencoder (e.g., a vector of flags) which may define or otherwise influence the autoencoder as to which type of tooth mesh may have been received by the autoencoder as input.
• a data precision improvement of this approach is the use of mesh correspondences in mesh reconstruction to reduce sampling error, improve alignment, and improve mesh generation quality. Further details on the use of mesh correspondences with the autoencoder models of this disclosure are found elsewhere in this disclosure.
• an iterative closest point (ICP) algorithm may be run between the input tooth mesh and a template tooth mesh, during the computation of mesh correspondences.
  • the correspondences may be computed to establish vertex-to-vertex relationships (between the input tooth mesh and the reconstructed tooth mesh), for use in computing reconstruction error.
  • training data may be generalized to one or more arches of teeth (e.g., among other 3D or larger oral care representations) or may be more specific to particular teeth within an arch (e.g., among other 3D oral care representations).
  • the specific training data can be presented as a tooth template.
  • a tooth template may be specific to one or more tooth types (e.g., lower right central incisor).
  • a tooth template may be generated which is an average of many examples of a certain type of tooth (such as an average of lower first molars). In some implementations, a tooth template may be generated which is an average of many examples of more than one tooth type (such as an average of first and second bicuspids from both upper and lower arches).
  • the pre-processing procedure may involve one or more of the following steps: generation of watertight meshes (e.g. making sure that the boundary of the root mesh seals cleanly against the boundary of the crown mesh), registration to align the tooth mesh with a template mesh (e.g., using either ICP or the inverse mal transform), and the computation of mesh correspondences (i.e., to generate mesh element-to-mesh element correspondences between the input tooth mesh and a template tooth mesh).
• in FIG. 11, the left side (labelled as "Training Data (ICP)") shows a tooth mesh (in the form of a 3D point cloud) after the completion of the pre-processing steps, where pre-processing used ICP to do the registration.
  • the right side shows two things: the output of the tooth reconstruction VAE (in the left column) and the corresponding ground truth tooth 3D representation. In this instance as well, the 3D representation of each tooth is represented by a point cloud. This output shown in FIG. 11 was generated at epoch 849 of the reconstruction VAE training.
• a reconstruction autoencoder trained based on the above material is also relevant to validation operations, such as segmentation validation, coordinate system validation, mesh cleanup validation, restoration design validation, fixture model validation, clear tray aligner (CTA) trimline validation, setups validation, oral care appliance component validation (either or both of placement and generation), and hardware (bracket, attachment, etc.) placement validation, to name some examples.
  • Autoencoders of this disclosure may process other types of oral care data, such as text data, categorical data, spatiotemporal data, real-time data and/or vectors of real numbers, such as may be found among the procedure parameters.
  • Data may be qualitative or quantitative.
  • Data may be nominal or ordinal.
  • Data may be discrete or continuous.
  • Data may be structured, unstructured or semi-structured.
  • the autoencoders of this disclosure may also encode such data into latent space vectors (or latent capsules) for later reconstruction. Those latent vectors/latent capsules may be used for prediction and/or classification.
  • the reconstructions may be used for model verification, and for validation applications, for example, through the calculation of reconstruction error and/or the labeling of data elements.
• a latent vector A, which may be generated by the encoder E1 in a fully trained mesh reconstruction autoencoder (e.g., for tooth meshes), may be a reduced-dimensionality representation of the input mesh (e.g., a tooth mesh).
  • the latent vector A may be a vector of 128 real numbers (or some other size, such as 256 or 512).
• the decoder D1 of the fully trained mesh reconstruction autoencoder may be capable of taking the latent vector A as input and reconstructing a close facsimile of the input tooth mesh, with low reconstruction error.
  • modifications may be made to the latent vector A, so as to effect changes in the shape of the reconstructed mesh that is generated from the decoder D2.
• Such modifications may be made after first mapping-out the latent space, to gain insight into the effects of making particular changes.
• loss functions which may be used in the training of E1 and D1 may involve terms related to reconstruction loss and/or KL-divergence between distributions (e.g., in some instances to minimize the distance between the latent space distribution and a multidimensional Gaussian distribution).
  • One purpose of the reconstruction loss term is to compare the predicted reconstructed tooth 3D representation to the corresponding ground truth reconstructed tooth 3D representation.
• one purpose of the KL-divergence term is to make the latent space more Gaussian, and therefore improve the quality of reconstructed meshes (i.e., especially in the case where the latent space vector may be modified, to change the shape of the outputted mesh, for example to segment a 3D mesh, or to perform tooth design generation for use in generating a dental restoration appliance).
  • modifications may be made to the latent vector A so as to change the characteristics of the reconstructed mesh (such as with the generation of a dental restoration tooth design mesh).
  • the loss L is computed using only reconstruction loss, and changes are made to the latent vector A, then in some use case scenarios, the reconstructed mesh may reflect the expected form of output (e.g., be a recognizable tooth). In other use case scenarios however, the output of the reconstructed mesh may not conform to the expected form of output (e.g., not be a recognizable tooth).
• FIG. 12 illustrates a latent space in which the loss incorporates reconstruction loss but does not incorporate KL-divergence loss. In FIG. 12, point P1 corresponds to the original form of a latent space vector A.
  • Point P2 corresponds to a different location in the latent space, which may be sampled as a result of making modifications to the latent vector A, but where the mesh which is reconstructed from P2 may not give good output (e.g., does not look like a recognizable or otherwise suitable tooth).
  • Point P3 corresponds to still a different location in the latent space, which may be sampled as a result of making a different set of modifications to the latent vector A, but where the mesh which is reconstructed from P3 may give good output (e.g., has the appearance of a tooth design which is suitable for use in generating a dental restoration appliance).
• when loss involves only reconstruction loss, the subset of the latent space which can be sampled to produce a latent space vector (such as P3) yielding a valid reconstructed mesh may be irregular or hard to predict.
  • a loss calculation may, in some implementations, incorporate normalizing flows, for example, by the incorporation of a KL-divergence term.
  • incorporating the KL-divergence term as described herein enables training that involves flow normalization. If the loss is improved by incorporating a KL-divergence term, the quality of the latent space may improve significantly.
  • the latent space may become more Gaussian under this new scenario (as shown in FIG. 13), where a latent supervector A corresponds to point P4 near the center of a multidimensional Gaussian curve.
  • the loss in the latent space includes both reconstruction loss and KL-divergence loss.
  • Changes may be made to the latent supervector A, yielding point P5 nearby P4, where the resulting reconstructed mesh is highly likely to reflect desired attributes (e.g., is highly likely to be a valid tooth).
  • the introduction of the KL-divergence term to loss may make the process of modifying the latent space vector A and getting a valid reconstructed mesh more reliable.
  • the latent vector may be replaced with a latent capsule, which may undergo modification and subsequently be reconstructed.
  • This autoencoder framework may, in some implementations, be adapted to the segmentation of tooth meshes. Additionally, this autoencoder framework may, in some implementations, be adapted to the task of tooth coordinate system prediction.
  • a mesh reconstruction autoencoder for coordinate system prediction may compress the tooth data into latent vector form, and then provide the latent vector as input to a second ML module (e.g., an MLP) which may have been trained for coordinate system prediction (e.g., for coordinate system prediction on a mesh, with the goal of defining a local coordinate system for that mesh, such as a tooth mesh).
  • the latent space can be mapped-out, so that changes to the latent space vector A may lead to reasonably well reconstructed meshes.
  • the latent space may be systematically mapped by generating latent vectors with carefully chosen variations in value (e.g., by experimenting with different combinations of 128 values in an example latent vector). In some instances, a grid search of values may be performed, with the advantage of efficiently exploring the latent space.
  • the shape of a mesh may be modified by nudging one or more elements of the latent vector towards the portion of the mapped-out latent space which has been found to correspond to the desired tooth characteristics (see the latent-space mapping sketch following this list).
  • KL-divergence in the loss calculation increases the likelihood that the modified latent vector gets reconstructed into a valid example of the inputted 3D oral care representation (e.g., 3D tooth mesh).
  • the mesh may correspond to at least some portion of a tooth. Changes may be made to a latent vector A, such that the resulting reconstructed tooth mesh may have characteristics which meet the specification set by the restoration design parameters.
  • a neural network for tooth restoration design generation is described in US Provisional Application No. US63/366514, the entire disclosure of which is incorporated herein by reference.
  • a tooth setup may be designed at least in part, by modifying a latent vector that corresponds to one or more teeth (e.g., each described as 3D point clouds, voxels or meshes) of an arch or arches which are to be placed in a setup configuration.
  • This mesh (or meshes) may be encoded into a latent vector A, which then undergoes modification to adjust the resulting tooth poses.
  • the modified latent vector A’ may then be reconstructed into the mesh or meshes which describe the setup.
  • Such a technique may be used to design a final setup configuration or an intermediate stage configuration, or the like.
  • the modifications to a latent vector may, in some implementations, be carried out via an ML model, such as one of the neural network models or other ML models disclosed elsewhere in this disclosure.
  • a neural network may be trained to operate within the latent space of such vectors A of setups meshes.
  • the mapping of the latent space of A may have been previously generated by making controlled adjustments to trial latent vectors and observing the resulting changes to a setups configuration (i.e., after the modified A has been reconstructed back into a full mesh or meshes of the dental arch).
  • the mapping of the latent space may, in some instances, follow methodical search patterns, such as in a grid search.
  • a tooth reconstruction VAE may take a single input of tooth name/type/designation R, which may command the VAE to output a tooth mesh of the designated type. This can be accomplished by generating a latent vector A' for use in reconstructing a suitable tooth mesh. In some implementations, this latent vector A' may be sampled or generated "on the fly", out of a prior mapping of the latent vector space. Such a mapping may have been performed to understand which portions of the latent vector space correspond to different shapes, structures and/or geometries of tooth.
  • certain elements may have been determined to correspond to a certain type/name/designation of tooth and/or a tooth with a certain shape or other intended characteristics.
  • This model for tooth mesh generation may also apply to the generation of oral care hardware, appliances and appliance components (such as to be used for orthodontic treatment).
  • This model may also be trained for the generation of other types of anatomy.
  • This model may also be trained for the generation of other types of non-oral care meshes as well.
  • the mesh comparison module may compare two or more meshes, for example for the computation of a loss function or of a reconstruction error. Some implementations may involve a comparison of the volume and/or area of the two meshes. Some implementations may involve the computation of a minimum distance between corresponding vertices/faces/edges/voxels of two meshes. For a point in one mesh (a vertex, the mid-point of an edge, or a triangle center, for example), the minimum distance between that point and the corresponding point in the other mesh may be computed. In the case that the other mesh has a different number of elements, or there is otherwise no clear mapping between corresponding points of the two meshes, different approaches can be considered (a distance-computation sketch follows this list).
  • the open-source software packages CloudCompare and MeshLab each have mesh comparison tools which may play a role in the mesh comparison module for the present disclosure.
  • a Hausdorff Distance may be computed to quantify the difference in shape between two meshes.
  • the open-source software tool Metro developed by the Visual Computing Lab, can also play a role in quantifying the difference between two meshes.
  • the following paper describes the approach taken by Metro, which may be adapted by the neural networks applications of the present disclosure for use in mesh comparison and difference quantification: "Metro: measuring error on simplified surfaces" by P. Cignoni, C. Rocchini and R. Scopigno, Computer Graphics Forum, Blackwell Publishers, vol. 17(2), June 1998, pp 167-174.
  • Some techniques of this disclosure may incorporate the operation of, for one or more points on the first mesh, shooting a ray normal to the mesh surface and calculating the distance before that ray is incident upon the second mesh.
  • the lengths of the resulting line segments may be used to quantify the distance between the meshes.
  • the distance may be assigned a color based on the magnitude of that distance and that color may be applied to the first mesh, by way of visualization.
  • an autoencoder such as a variational autoencoder (VAE) or capsule autoencoder, may be used to validate the correctness of an oral care mesh.
  • the mesh may be produced by scanning a 3D printed object.
  • the mesh may correspond to a patient’s dentition (e.g., containing teeth, gums, hardware and the like) and be used for the creation of a dental or orthodontic appliance.
  • validation of a 3D mesh is meant to encompass the validation of other forms of 3D representations, such as 3D point clouds and 3D voxelized representations (such as may be used in sparse processing).
  • the input mesh may comprise thousands, tens of thousands, or millions of mesh elements, and become compressed into a small data structure (e.g., a vector of m floating point values, where m may equal 128 or some other number).
  • This latent space vector may then be provided to a decoder portion of the autoencoder, which may have been trained to reconstruct that vector into a mesh.
  • Such an autoencoder may be trained on numerous examples of correctly formed 3D representations (e.g., such as a 3D mesh of teeth, a dental arch or a full fixture model including a base attached to the dentition). Through training, the autoencoder may become effective and efficient at deconstructing and subsequently reconstructing the type of mesh represented by the training examples. When training the autoencoder, providing training examples that are reasonably cohesive in form, structure, shape, layout, features, attributes and/or other characteristics may provide one or more benefits in terms of future performance in the execution phase. If an introduced mesh deviates significantly from the characteristics of the meshes on which the autoencoder was trained, then the autoencoder may not reconstruct the mesh accurately.
  • the VAE validation engine may, in some implementations, output an indication of which error occurred, for example what may be wrong with the received 3D representation (e.g., in the case of a tooth crown mesh, hardware may be present).
  • a notification may be generated that the mesh should be corrected or modified to meet expectations for use in designing and/or fabricating a dental and/or orthodontic appliance.
  • the generated notification can be displayed to a user of the system to initiate remedial measures.
  • the generated notification can be used by the system to automatically remediate the identified error. For instance, in the case where the 3D mesh was produced by scanning a 3D-printed object, the 3D-printed part may need to be reprinted. In some instances, corrections may need to be made to the 3D part before reprinting.
  • the autoencoder-based validation techniques of this disclosure may be advantageously applied to any of the following non-exhaustive list of techniques: segmentation validation, coordinate system validation, mesh cleanup validation, chairside intraoral dental scan validation, clear tray aligner (CTA) setups validation, bracket/attachment placement validation, the validation of custom oral care appliances (e.g., such as validating the shape or placement of a dental restoration appliance component), restoration design generation validation, fixture model validation and CTA trimline validation.
  • Other meshes associated with digital oral care can also be validated using such techniques (e.g., involving autoencoders, VAEs, calculation of latent space vectors from 3D meshes, reconstruction of latent space vectors into 3D meshes and the calculation of reconstruction error).
  • one or more teeth comprising a setup may be run through the deconstruction/latent vector/reconstruction phases of an autoencoder.
  • an autoencoder may be trained on exemplary setups (e.g., where the tooth poses are suitable for use in creating an orthodontic appliance).
  • the reconstruction error of the output may be measured and used to determine whether the inputted arch represents a well-formed setup and is suitable for use in designing and fabricating one or more orthodontic appliances, such as clear tray aligners.
  • An encoder-decoder structure may comprise at least one encoder or at least one decoder.
  • Non-limiting examples of an encoder-decoder structure include a U-Net, a transformer or an autoencoder, among others.
  • an oral care appliance or appliance component may be validated (e.g., such as for a dental restoration appliance).
  • An example of such a component is a mold parting surface.
  • a mold parting surface may be combined with one or more teeth via a Boolean operation to divide that tooth.
  • a tooth which has been cut by a parting surface may be encoded into a latent space form (e.g., using an autoencoder which has been trained to reconstruct that type of mesh).
  • This latent space form (latent vector or latent capsule) may be classified by an ML classifier which has been trained to perform such a task.
  • Each of the aforementioned validation applications of this section may, in some implementations, use a capsule autoencoder, such as the capsule autoencoder shown in FIG. 2.
  • the input 3D representation (such as described in Table 1) is provided to the capsule-encoder portion of the capsule autoencoder and may be encoded into one or more latent capsules.
  • the one or more latent capsules may be reconstructed by the capsule-decoder portion of the capsule autoencoder, yielding a facsimile of the input 3D representation.
  • a reconstruction error may then be computed between the input 3D representation and the reconstructed 3D representation.
  • a low reconstruction error (e.g., of a portion of the 3D representation, such as a subset of mesh elements, or of the whole 3D representation) may indicate that the input 3D representation is of the general nature, characters, geometrical attributes, class and/or category of the 3D representation that was used to train the capsule autoencoder.
  • a high reconstruction error (e.g., of a portion of the 3D representation, such as a subset of mesh elements, or of the whole 3D representation) indicates that the input 3D representation is not of the general nature, characters, geometrical attributes, class and/or category of the 3D representation that was used to train the capsule autoencoder.
  • Reconstruction error may be computed using reconstruction loss, KL-divergence loss and/or a combination of the two. Other possible losses include L1 loss, L2 loss, MSE loss or any of the losses described elsewhere in this disclosure.
  • one or more of the optional inputs may be provided to an autoencoder for mesh validation, including but not limited to: tooth dimension info P, tooth gap info Q, tooth position info N, tooth orientation info O, tooth name/flag for each available tooth R and/or orthodontic metrics S.
  • FIG. 14 shows these inputs to an autoencoder for mesh validation (or the validation of other 3D representations described herein). As such, FIG. 14 shows a deployment method for validation of oral care meshes using an autoencoder.
  • a capsule autoencoder may be used in place of the autoencoder.
  • Archform information may, in some implementations, be provided to the autoencoder.
  • Other 3D oral care representations may also be validated by comparable methods, using reconstruction autoencoders which have been trained to reconstruct that particular type of 3D oral care representation (e.g., labels on mesh elements, or transformation matrices, etc.).
  • the encoder-decoder structure comprising encoder E1 1402 and decoder D1 1406 may be trained as a reconstruction autoencoder to reconstruct a particular type of 3D oral care representation (e.g., reconstruct meshes, sets of mesh element labels, transforms, or others described herein).
  • the method shown in FIG. 14 may perform validation of a trial 3D oral care representation 1400 (e.g., a tooth mesh, an appliance component mesh, a fixture model mesh, a set of mesh element labels, a set of tooth transforms, a set of transforms for appliance components, or others described herein).
  • The trial 3D oral care representation 1400 may be provided to encoder E1 1402, which may generate latent representation 1404.
  • the latent representation 1404 may be provided to decoder D1 1406, which may reconstruct the latent representation 1404 into a reconstructed 3D oral care representation 1408, which is a close facsimile of the trial 3D oral care representation 1400.
  • a reconstruction error may be computed (1410) between the reconstructed 3D oral care representation 1408 and the trial 3D oral care representation 1400.
  • when the reconstruction error is above a threshold (1412) for one or more portions of the reconstructed 3D oral care representation 1408, the method outputs (1416) an indication that the trial 3D oral care representation 1400 does not pass validation.
  • when the reconstruction error is below the threshold (1412) for all portions of the reconstructed 3D oral care representation 1408, the method outputs an indication (1414) that the trial 3D oral care representation 1400 passes validation (a minimal validation sketch follows this list).
  • the validation applications described herein may in some implementations be incorporated into source code testing frameworks, as described in WO2022123402A1. The entire disclosure of PCT Patent Application WO2022123402A1 is incorporated herein by reference.
  • Techniques of this disclosure may, in some implementations, use PointNet, PointNet++, or derivative neural networks (e.g., networks trained via transfer learning using either PointNet or PointNet++ as a basis for training) to extract local or global neural network features from a 3D point cloud or other 3D representation (e.g., a 3D point cloud describing aspects of the patient’s dentition - such as teeth or gums).
  • Techniques of this disclosure may, in some implementations, use U-Nets to extract local or global neural network features from a 3D point cloud or other 3D representation.
  • 3D oral care representations are described herein as such because 3-dimensional representations are currently state of the art. Nevertheless, 3D oral care representations are intended, in a non-limiting fashion, to encompass any representations of 3 dimensions or higher orders of dimensionality (e.g., 4D, 5D, etc.), and it should be appreciated that machine learning models can be trained using the techniques disclosed herein to operate on representations of higher orders of dimensionality.
  • input data may comprise 3D mesh data, 3D point cloud data, 3D surface data, 3D polyline data, 3D voxel data, or data pertaining to a spline (e.g., control points).
  • An encoder-decoder structure may comprise one or more encoders, or one or more decoders.
  • the encoder may take as input mesh element feature vectors for one or more of the inputted mesh elements. By processing mesh element feature vectors, the encoder is trained in a manner to generate more accurate representations of the input data.
  • the mesh element feature vectors may provide the encoder with more information about the shape and/or structure of the mesh, and therefore the additional information provided allows the encoder to make better-informed decisions and/or generate more-accurate latent representations of the mesh.
  • encoder-decoder structures include U-Nets, autoencoders or transformers (among others).
  • a representation generation module may comprise one or more encoder-decoder structures (or portions of encoder-decoder structures, such as individual encoders or individual decoders).
  • a representation generation module may generate an information-rich (optionally reduced-dimensionality) representation of the input data, which may be more easily consumed by other generative or discriminative machine learning models.
  • a U-Net may comprise an encoder, followed by a decoder.
  • the architecture of a U-Net may resemble a U shape.
  • the encoder may extract one or more global neural network features from the input 3D representation, zero or more intermediate-level neural network features, or one or more local neural network features (at the most local level as contrasted with the most global level).
  • the output from each level of the encoder may be passed along to the input of corresponding levels of a decoder (e.g., by way of skip connections).
  • the decoder may operate on multiple levels of global-to-local neural network features. For instance, the decoder may output a representation of the input data which may contain global, intermediate or local information about the input data.
  • the U-Net may, in some implementations, generate an information-rich (optionally reduced-dimensionality) representation of the input data, which may be more easily consumed by other generative or discriminative machine learning models.
  • An autoencoder may be configured to encode the input data into a latent form.
  • An autoencoder may train an encoder to reformat the input data into a reduced-dimensionality latent form in between the encoder and the decoder, and then train a decoder to reconstruct the input data from that latent form of the data.
  • a reconstruction error may be computed to quantify the extent to which the reconstructed form of the data differs from the input data.
  • the latent form may, in some implementations, be used as an information-rich reduced-dimensionality representation of the input data which may be more easily consumed by other generative or discriminative machine learning models.
  • an autoencoder may be trained to input a 3D representation, encode that 3D representation into a latent form (e.g., a latent embedding), and then reconstruct a close facsimile of that input 3D representation as the output.
  • a transformer may be trained to use self-attention to generate, at least in part, representations of its input.
  • a transformer may encode long-range dependencies (e.g., encode relationships between a large number of inputs).
  • a transformer may comprise an encoder or a decoder. Such an encoder may, in some implementations, operate in a bi-directional fashion or may operate a self-attention mechanism.
  • Such a decoder may, in some implementations, operate a masked self-attention mechanism, may operate a cross-attention mechanism, or may operate in an auto-regressive manner.
  • the self-attention operations of the transformers described herein may, in some implementations, relate different positions or aspects of an individual 3D oral care representation in order to compute a reduced-dimensionality representation of that 3D oral care representation.
  • the cross-attention operations of the transformers described herein may, in some implementations, mix or combine aspects of two (or more) different 3D oral care representations.
  • the auto-regressive operations of the transformers described herein may, in some implementations, consume previously generated aspects of 3D oral care representations (e.g., previously generated points, point clouds, transforms, etc.) as additional input when generating a new or modified 3D oral care representation.
  • the transformer may, in some implementations, generate a latent form of the input data, which may serve as an information-rich reduced-dimensionality representation that is more easily consumed by other generative or discriminative machine learning models (see the encoder sketch following this list).
  • an encoder-decoder structure may first be trained as an autoencoder. In deployment, one or more modifications may be made to the latent form of the input data. This modified latent form may then proceed to be reconstructed by the decoder, yielding a reconstructed form of the input data which differs from the input data in one or more intended aspects. Oral care arguments, such as oral care parameters or oral care metrics may be supplied to the encoder, the decoder, or may be used in the modification of the latent form, to influence the encoder-decoder structure in generating a reconstructed form that has desired characteristics (e.g., characteristics which may differ from that of the input data).
  • Federated learning may enable multiple remote clinicians to iteratively improve a machine learning model (e.g., validation of 3D oral care representations, mesh segmentation, mesh cleanup, other techniques which involve labeling mesh elements, coordinate system prediction, non-organic object placement on teeth, appliance component generation, tooth restoration design generation, techniques for placing 3D oral care representations, setups prediction, generation or modification of 3D oral care representations using autoencoders, generation or modification of 3D oral care representations using transformers, generation or modification of 3D oral care representations using diffusion models, 3D oral care representation classification, imputation of missing values), while protecting data privacy (e.g., the clinical data may not need to be sent “over the wire” to a third party).
  • a clinician may receive a copy of a machine learning model, use a local machine learning program to further train that ML model using locally available data from the local clinic, and then send the updated ML model back to the central hub or third party.
  • the central hub or third party may integrate the updated ML models from multiple clinicians into a single updated ML model which benefits from the learnings of recently collected patient data at the various clinical sites. In this way, a new ML model may be trained which benefits from additional and updated patient data (possibly from multiple clinical sites), while those patient data are never actually sent to the third party (a federated-averaging sketch follows this list).
  • Training on a local in-clinic device may, in some instances, be performed when the device is idle or otherwise be performed during off-hours (e.g., when patients are not being treated in the clinic).
  • Devices in the clinical environment for the collection of data and/or the training of ML models for techniques described herein may include intra-oral scanners, CT scanners, X- ray machines, laptop computers, servers, desktop computers or handheld devices (such as smart phones with image collection capability).
  • contrastive learning may be used to train, at least in part, the ML models described herein. Contrastive learning may, in some instances, augment samples in a training dataset to accentuate the differences in samples from different classes and/or increase the similarity of samples of the same class.
  • a local coordinate system for a 3D oral care representation (such as a tooth) may be defined by one or more predicted transforms (e.g., an affine transformation matrix, translation vector or quaternion).
  • Systems of this disclosure may be trained for coordinate system prediction using past cohort patient case data.
  • the past patient data may include at least: one or more tooth meshes or one or more ground truth tooth coordinate systems.
  • Machine learning models such as: U-Nets, encoders, autoencoders, pyramid encoder-decoders, transformers, or convolution and/or pooling layers, may be trained for coordinate system prediction.
  • Representation learning may determine a representation of a tooth (e.g., encoding a mesh or point cloud into a latent representation, for example, using a U-Net, encoder, transformer, convolution and/or pooling layers or the like), and then predict a transform for that representation (e.g., using a trained multilayer perceptron, transformer, encoder, or the like) that defines a local coordinate system for that representation (e.g., comprising one or more coordinate axes). A coordinate-system prediction sketch follows this list.
  • the mesh convolutional techniques described herein can leverage invariance to rotations, translations, and/or scaling of that tooth mesh to generate predictions that techniques which are not invariant to those rotations, translations, and/or scaling cannot generate.
  • Pose transfer techniques may be trained for coordinate system prediction, in the form of predicting a transform for a tooth.
  • Reinforcement learning techniques may be trained for coordinate system prediction, in the form of predicting a transform for a tooth.
  • Machine learning models such as: U-Nets, encoders, autoencoders, pyramid encoder-decoders, transformers, or convolution and/or pooling layers, may be trained as a part of a method for hardware (or appliance component) placement.
  • Representation learning may train a first module to determine an embedded representation of a 3D oral care representation (e.g., encoding a mesh or point cloud into a latent form using an autoencoder, or using a U-Net, encoder, transformer, block of convolution and/or pooling layers or the like). That representation may comprise a reduced-dimensionality and/or information-rich version of the inputted 3D oral care representation.
  • a representation may be aided by the calculation of a mesh element feature vector for one or more mesh elements (e.g., each mesh element).
  • a representation may be computed for a hardware element (or appliance component).
  • Such representations are suitable to be provided to a second module, which may perform a generative task, such as transform prediction (e.g., a transform to place a 3D oral care representation relative to another 3D oral care representation, such as to place a hardware element or appliance component relative to one or more teeth) or 3D point cloud generation.
  • Such a transform may comprise an affine transformation matrix, translation vector or quaternion.
  • Machine learning models which may be trained to predict a transform to place a hardware element (or appliance component) relative to elements of patient dentition include: MLP, transformer, encoder, or the like.
  • Systems of this disclosure may be trained for 3D oral care appliance placement using past cohort patient case data.
  • the past patient data may include at least: one or more ground truth transforms and one or more 3D oral care representations (such as tooth meshes, or other elements of patient dentition).
  • the mesh convolution and/or mesh pooling techniques described herein leverage invariance to rotations, translations, and/or scaling of that tooth mesh to generate predictions that techniques which are not invariant to those rotations, translations, and/or scaling cannot generate.
  • Pose transfer techniques may be trained for hardware or appliance component placement.
  • Reinforcement learning techniques may be trained for hardware or appliance component placement.
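The training objective discussed earlier in this list (a reconstruction term combined with a KL-divergence term that pulls the latent distribution toward a multidimensional Gaussian) can be illustrated with a short sketch. This is a minimal illustration assuming a PyTorch-style encoder that outputs a per-dimension mean and log-variance; names such as recon_x, mu and logvar are placeholders for this sketch and are not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, kl_weight=1.0):
    """Reconstruction + KL-divergence loss for a mesh-reconstruction VAE (illustrative).

    recon_x : reconstructed mesh elements (e.g., vertex coordinates), shape (B, N, 3)
    x       : ground-truth input mesh elements, same shape
    mu, logvar : per-sample latent mean / log-variance, shape (B, latent_dim)
    """
    # Reconstruction term: how closely the decoder output matches the input mesh.
    recon_loss = F.mse_loss(recon_x, x, reduction="mean")

    # KL-divergence between the approximate posterior N(mu, sigma^2) and N(0, I);
    # this is the term that pushes the latent space toward a Gaussian, making
    # latent-vector modifications more likely to decode into valid meshes.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    return recon_loss + kl_weight * kl

def reparameterize(mu, logvar):
    """Sample a latent vector A from N(mu, sigma^2) via the reparameterization trick."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)
```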
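The latent-space mapping and nudging steps referenced in the list above can be read as a coarse search over a few latent dimensions followed by a small step toward a desirable region. The sketch below assumes a trained decoder exposed as a callable (decode) and a user-supplied scoring function (score_fn); these names are hypothetical placeholders, not APIs from the disclosure.

```python
import itertools
import numpy as np

def map_latent_dimensions(decode, base_latent, dims, offsets, score_fn):
    """Coarse grid search over selected latent dimensions (illustrative).

    decode      : callable mapping a latent vector to a reconstructed mesh
    base_latent : starting latent vector A (e.g., 128 values) as a numpy array
    dims        : indices of the latent dimensions to explore
    offsets     : candidate offsets to try in each explored dimension
    score_fn    : callable scoring how well the decoded mesh matches desired characteristics
    Returns a list of (offset_combination, score) pairs describing the mapped region.
    """
    results = []
    for combo in itertools.product(offsets, repeat=len(dims)):
        candidate = base_latent.copy()
        for dim, off in zip(dims, combo):
            candidate[dim] += off          # controlled adjustment to a trial latent vector
        results.append((combo, score_fn(decode(candidate))))
    return results

def nudge_latent(base_latent, target_latent, step=0.1):
    """Move the latent vector a small step toward a mapped region of the latent space
    that was found to correspond to the desired tooth characteristics."""
    return np.asarray(base_latent) + step * (np.asarray(target_latent) - np.asarray(base_latent))
```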
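For the mesh comparison module, the distance computations named in the list above (nearest-point distances when there is no one-to-one mapping between mesh elements, and a Hausdorff distance to summarize overall shape difference) can be approximated on points sampled from each surface. The sketch below uses SciPy and operates on point samples rather than full meshes; the sampling itself is assumed to be done elsewhere.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def per_point_distances(points_a, points_b):
    """For each point sampled from mesh A, distance to the nearest point sampled
    from mesh B (useful when the two meshes have different element counts)."""
    tree = cKDTree(points_b)
    distances, _ = tree.query(points_a)
    return distances  # may be mapped to colors and applied to mesh A for visualization

def symmetric_hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two sampled surfaces."""
    d_ab = directed_hausdorff(points_a, points_b)[0]
    d_ba = directed_hausdorff(points_b, points_a)[0]
    return max(d_ab, d_ba)

if __name__ == "__main__":
    # Two random point samples standing in for sampled tooth surfaces.
    a = np.random.rand(1000, 3)
    b = np.random.rand(1200, 3)
    print(symmetric_hausdorff(a, b), per_point_distances(a, b).mean())
```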
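The deployment flow of FIG. 14 (encode the trial representation, reconstruct it, compute a reconstruction error, and compare that error to a threshold, optionally per portion) reduces to a few lines. This is a minimal sketch assuming the trained encoder/decoder halves are available as callables; the threshold value and the per-portion index arrays are illustrative.

```python
import numpy as np

def validate_oral_care_representation(encode, decode, trial, portions=None, threshold=0.05):
    """Autoencoder-based validation of a trial 3D oral care representation (illustrative).

    encode, decode : trained reconstruction-autoencoder halves (E1 and D1)
    trial          : (N, 3) array of mesh elements (e.g., vertex positions)
    portions       : optional list of index arrays, one per portion to check separately
    threshold      : maximum acceptable per-portion reconstruction error (illustrative value)
    """
    reconstructed = decode(encode(trial))
    per_element_error = np.linalg.norm(reconstructed - trial, axis=-1)

    if portions is None:
        portions = [np.arange(len(trial))]  # treat the whole representation as one portion

    failing = [i for i, idx in enumerate(portions)
               if per_element_error[idx].mean() > threshold]

    if failing:
        return {"passes": False, "failing_portions": failing}  # does not pass validation (1416)
    return {"passes": True, "failing_portions": []}             # passes validation (1414)
```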
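As one concrete reading of how an encoder operating on mesh element feature vectors can produce an information-rich, reduced-dimensionality latent representation, the sketch below embeds per-element feature vectors, applies self-attention, and mean-pools to a single latent vector. The layer sizes, head counts and feature dimension are illustrative assumptions, not values from the disclosure.

```python
import torch
import torch.nn as nn

class MeshLatentEncoder(nn.Module):
    """Self-attention encoder: per-mesh-element feature vectors -> one latent vector."""

    def __init__(self, feature_dim=16, d_model=128, nhead=4, num_layers=2, latent_dim=128):
        super().__init__()
        self.embed = nn.Linear(feature_dim, d_model)  # lift mesh element features
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, element_features):
        # element_features: (batch, num_mesh_elements, feature_dim)
        tokens = self.embed(element_features)
        attended = self.encoder(tokens)   # self-attention relates mesh elements to one another
        pooled = attended.mean(dim=1)     # aggregate to a global representation
        return self.to_latent(pooled)     # latent vector A

# Example: 2 meshes, 500 mesh elements each, 16 features per element.
latent = MeshLatentEncoder()(torch.randn(2, 500, 16))
print(latent.shape)  # torch.Size([2, 128])
```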
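The federated-learning workflow described in the list above (each clinic fine-tunes a copy of the model locally and only the updated weights travel back to the central hub, where they are combined) is commonly realized with federated averaging. The sketch below shows only the aggregation step on PyTorch state dicts; weighting the average by each clinic's local sample count is an assumption of this illustration, not a requirement of the disclosure.

```python
import copy
import torch

def federated_average(client_state_dicts, client_sample_counts):
    """Combine locally fine-tuned model weights into a single updated model (illustrative).

    client_state_dicts   : list of state_dicts returned by the participating clinics
    client_sample_counts : number of local training samples behind each update,
                           used here to weight the average
    """
    total = float(sum(client_sample_counts))
    averaged = copy.deepcopy(client_state_dicts[0])
    for key in averaged:
        # Weighted sum of each clinic's parameter tensor for this key.
        averaged[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_sample_counts)
        )
    return averaged  # load into the hub model with model.load_state_dict(averaged)
```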
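For coordinate system prediction, the two-module pattern named in the list above (a representation module that compresses the tooth into a latent vector, followed by a second ML module such as an MLP that predicts a transform defining the local coordinate system) can be sketched as follows. Parameterizing the output as a translation plus a quaternion is one of the transform options mentioned in this disclosure; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class CoordinateSystemHead(nn.Module):
    """Second ML module: latent tooth representation -> local coordinate system transform."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),  # 3 translation values + 4 quaternion values
        )

    def forward(self, latent):
        out = self.mlp(latent)
        translation = out[:, :3]
        quaternion = nn.functional.normalize(out[:, 3:], dim=-1)  # unit quaternion (rotation)
        return translation, quaternion

# Example: predict a local coordinate system from a batch of 4 latent tooth vectors.
t, q = CoordinateSystemHead()(torch.randn(4, 128))
print(t.shape, q.shape)  # torch.Size([4, 3]) torch.Size([4, 4])
```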

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Dental Tools And Instruments Or Auxiliary Dental Instruments (AREA)

Abstract

Systems and techniques are described for validating three-dimensional (3D) oral care representations used in digital oral care treatment. The method includes receiving, using processing circuitry of a computing device, a 3D oral care representation associated with the oral care treatment of a patient. The 3D oral care representation is then provided as input to a trained autoencoder network during the execution phase. The trained autoencoder network generates a reconstructed 3D oral care representation that closely resembles the original representation. In addition, the autoencoder network outputs an indication of the suitability of the 3D oral care representation for generating an oral care appliance for the patient. These systems and techniques help ensure the accuracy and reliability of 3D oral care representations, enabling efficient and effective treatment planning and appliance generation in the field of oral health care.
PCT/IB2023/062704 2022-12-14 2023-12-14 Autocodeurs pour la validation de représentations de soins buccodentaires 3d WO2024127310A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263432627P 2022-12-14 2022-12-14
US63/432,627 2022-12-14
US202363461505P 2023-04-24 2023-04-24
US63/461,505 2023-04-24

Publications (1)

Publication Number Publication Date
WO2024127310A1 true WO2024127310A1 (fr) 2024-06-20

Family

ID=89378497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/062704 WO2024127310A1 (fr) 2022-12-14 2023-12-14 Autocodeurs pour la validation de représentations de soins buccodentaires 3d

Country Status (1)

Country Link
WO (1) WO2024127310A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020026117A1 (fr) 2018-07-31 2020-02-06 3M Innovative Properties Company Procédé de génération automatisée de configurations finales de traitement orthodontique
WO2020181975A1 (fr) * 2019-03-14 2020-09-17 杭州朝厚信息科技有限公司 Procédé de génération, au moyen d'un réseau neuronal à apprentissage profond basé sur un auto-codeur variationnel, d'un ensemble de données numériques représentant une disposition de dent cible
WO2021245480A1 (fr) 2020-06-03 2021-12-09 3M Innovative Properties Company Système pour générer un traitement d'aligneur orthodontique par étapes
WO2022123402A1 (fr) 2020-12-11 2022-06-16 3M Innovative Properties Company Traitement automatisé de balayages dentaires à l'aide d'un apprentissage profond géométrique
US20220222896A1 (en) * 2019-02-27 2022-07-14 3Shape A/S Method for manipulating 3d objects by flattened mesh

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020026117A1 (fr) 2018-07-31 2020-02-06 3M Innovative Properties Company Procédé de génération automatisée de configurations finales de traitement orthodontique
US20210259808A1 (en) 2018-07-31 2021-08-26 3M Innovative Properties Company Method for automated generation of orthodontic treatment final setups
US20220222896A1 (en) * 2019-02-27 2022-07-14 3Shape A/S Method for manipulating 3d objects by flattened mesh
WO2020181975A1 (fr) * 2019-03-14 2020-09-17 杭州朝厚信息科技有限公司 Procédé de génération, au moyen d'un réseau neuronal à apprentissage profond basé sur un auto-codeur variationnel, d'un ensemble de données numériques représentant une disposition de dent cible
WO2021245480A1 (fr) 2020-06-03 2021-12-09 3M Innovative Properties Company Système pour générer un traitement d'aligneur orthodontique par étapes
WO2022123402A1 (fr) 2020-12-11 2022-06-16 3M Innovative Properties Company Traitement automatisé de balayages dentaires à l'aide d'un apprentissage profond géométrique

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Equidistant and Uniform Data Augmentation for 3D Objects", IEEE ACCESS, DIGITAL OBJECT IDENTIFIER
J. M. KANTER, K. VEERAMACHANENI: "Deep feature synthesis: Towards automating data science endeavors", IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2015, pages 1 - 10, XP032826310, DOI: 10.1109/DSAA.2015.7344858
P. CIGNONI, C. ROCCHINI, R. SCOPIGNO: "Metro: measuring error on simplified surfaces", Computer Graphics Forum, vol. 17, no. 2, BLACKWELL PUBLISHERS, June 1998, pages 167 - 174
TONIONI A ET AL.: "Learning to detect good 3D keypoints.", INT J COMPUT. VIS., vol. 126, 2018, pages 1 - 20, XP036405732, DOI: 10.1007/s11263-017-1037-3
YANG LINGCHEN ET AL: "iOrthoPredictor", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 39, no. 6, 26 November 2020 (2020-11-26), pages 1 - 15, XP059134801, ISSN: 0730-0301, DOI: 10.1145/3414685.3417771 *

Similar Documents

Publication Publication Date Title
JP7493464B2 (ja) 3dオブジェクトの正準ポーズの自動化判定、および深層学習を使った3dオブジェクトの重ね合わせ
JP7489964B2 (ja) 深層学習を使用した自動化矯正治療計画
JP2023552589A (ja) 幾何学的深層学習を使用する歯科スキャンの自動処理
WO2023242757A1 (fr) Génération de géométrie pour des appareils de restauration dentaire et validation de cette géométrie
Brahmi et al. Automatic tooth instance segmentation and identification from panoramic X-Ray images using deep CNN
WO2024127310A1 (fr) Autocodeurs pour la validation de représentations de soins buccodentaires 3d
WO2024127316A1 (fr) Autocodeurs pour le traitement de représentations 3d dans des soins buccodentaires numériques
WO2024127308A1 (fr) Classification de représentations 3d de soins bucco-dentaires
WO2024127309A1 (fr) Autoencodeurs pour configurations finales et étapes intermédiaires d'aligneurs transparents
WO2024127311A1 (fr) Modèles d'apprentissage automatique pour génération de conception de restauration dentaire
WO2024127304A1 (fr) Transformateurs pour configurations finales et stadification intermédiaire dans des aligneurs de plateaux transparents
WO2024127302A1 (fr) Apprentissage profond géométrique pour configurations finales et séquençage intermédiaire dans le domaine des aligneurs transparents
WO2024127306A1 (fr) Techniques de transfert de pose pour des représentations de soins bucco-dentaires en 3d
WO2024127303A1 (fr) Apprentissage par renforcement pour configurations finales et organisation intermédiaire dans des aligneurs de plateaux transparents
WO2024127313A1 (fr) Calcul et visualisation de métriques dans des soins buccaux numériques
WO2024127315A1 (fr) Techniques de réseau neuronal pour la création d'appareils dans des soins buccodentaires numériques
WO2023242771A1 (fr) Validation de configurations de dents pour des aligneurs en orthodontie numérique
WO2023242776A1 (fr) Placement de boîtier et fixation en orthodontie numérique, et validation de ces placements
WO2023242763A1 (fr) Segmentation de maillage et validation de segmentation de maillage en dentisterie numérique
WO2023242767A1 (fr) Prédiction de système de coordonnées en odontologie numérique et orthodontie numérique et validation de ladite prédiction
WO2023242774A1 (fr) Validation pour des parties de prototypage rapide en dentisterie
WO2023242761A1 (fr) Validation pour la mise en place et la génération de composants pour des appareils de restauration dentaire
WO2024127314A1 (fr) Imputation de valeurs de paramètres ou de valeurs métriques dans des soins buccaux numériques
WO2024127318A1 (fr) Débruitage de modèles de diffusion pour soins buccaux numériques
WO2023242765A1 (fr) Validation de modèle d'appareil pour des aligneurs en orthodontie numérique