WO2022194258A1

WO2022194258A1 - Method and apparatus for training dental cast deformation model

Info

Publication number: WO2022194258A1
Application number: PCT/CN2022/081543
Authority: WO
Inventors: 刘娜丽; 田彦; 江腾飞; 赵晓波
Original assignee: 先临三维科技股份有限公司
Priority date: 2021-03-17
Filing date: 2022-03-17
Publication date: 2022-09-22
Also published as: CN112884885A

Abstract

Provided in the embodiments of the present disclosure are a method and apparatus for training a dental cast deformation model, which method and apparatus relate to the technical field of three-dimensional deformation. The method comprises: acquiring sample data, wherein the sample data comprises a plurality of initial dental casts acquired by means of scanning an oral cavity, and corresponding target deformation models obtained by means of artificially processing the initial dental casts; acquiring a feature tensor corresponding to each initial dental cast, wherein each element of the feature tensor corresponding to each initial dental cast is a TSDF value of each voxel in a cube space where each initial dental cast is located; inputting the feature tensor corresponding to each initial dental cast into a preset network model, so as to acquire a predicted deformation model corresponding to each initial dental cast; and according to a target deformation model and the predicted deformation model corresponding to each initial dental cast, optimizing the preset network model to acquire a dental cast deformation model. The embodiments of the present disclosure are used for acquiring a dental cast deformation model which can automatically convert an initial dental cast into a dental cast meeting specific product requirements.

Description

A method and device for training a dental model deformation model

This disclosure claims the priority of the Chinese patent application filed on March 17, 2021 with the application number 202110287715.X and titled "A method and device for training a dental model deformation model", the entire contents of which are approved by References are incorporated in this disclosure.

technical field

The present disclosure relates to the technical field of three-dimensional deformation, and in particular, to a method and device for training a deformation model of a dental mold.

Background technique

Tooth digitization technology aims at 3D modeling of teeth to obtain digital tooth models, so as to realize subsequent processing and personalized customization.

In general, the final actual tooth model is not the initial tooth model obtained by scanning the oral cavity and performing 3D reconstruction, but the initial tooth model is further processed based on the specific product requirements, and then the tooth model that meets the specific product requirements is obtained. Among them, the process of processing an initial tooth model based on specific product requirements to obtain a tooth model that meets specific product requirements is called 3D dental model deformation. At present, the deformation of 3D dental models is generally done manually. That is, a person manually processes the initial tooth model based on the specific product requirements so that the initial tooth model meets the specific product requirements. However, manually completing the 3D dental model deformation has many disadvantages, such as low efficiency, high cost, and unreliable quality. Therefore, how to automatically convert the initial tooth model into a tooth model that meets the requirements of a specific product has become an urgent problem in the field to be solved.

SUMMARY OF THE INVENTION

(1) Technical problems to be solved

Manually completing the 3D dental model deformation has problems such as low efficiency, high cost, and unreliable quality.

(2) Technical solutions

In order to solve the above problems, the technical solutions provided by the embodiments of the present disclosure are as follows:

In a first aspect, an embodiment of the present disclosure provides a method for training a dental model deformation model, including:

Obtaining sample data, the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;

Obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function TSDF value of each voxel in the cubic space where each initial tooth model is located;

Input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;

According to the target deformation model and the predicted deformation model corresponding to each initial tooth model, the preset network model is optimized to obtain the tooth model deformation model.

As an optional implementation manner of the embodiment of the present disclosure, the preset network model includes: an encoder component composed of multiple encoders with a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of a plurality of decoders in a series structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the The output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component, and the output of the multi-scale analysis component is the input of the decoder component, so The output of the decoder component is the output of the preset network model;

Wherein, the self-attention component is used to extract non-local information on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component, and transmit the processing result to the multi-scale analysis component; the multi-scale analysis component is used for extracting the feature tensors in multiple scales of the feature tensor output by the feature transfer component.

As an optional implementation manner of the embodiment of the present disclosure, the encoder component includes three encoders with a serial structure, each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to pass The three convolutional layers of concatenated structure perform a convolution operation on the input of the residual unit and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the downsampling unit of each encoder is used to The input feature tensor downsampling is the output feature tensor whose number of channels is twice the number of channels of the input feature tensor, and the length, width and height are half of the length, width and height of the input feature tensor;

The input of each residual unit is the input of the encoder to which it belongs, the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs, and the output of each down-sampling unit is the output of the encoder to which it belongs. The input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the input of the second encoder and the third encoder are the first encoder and the second encoder respectively. output of the device.

As an optional implementation manner of the embodiment of the present disclosure, the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a Five convolution layers, sixth convolution layer, first dot product unit, second dot product unit, first sum unit, and second sum unit; the residual unit is used for convolution through three concatenated structures The layer performs a convolution operation on the input of the residual unit and performs a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the first dot product unit and the second dot product unit are used for the input. The feature tensor performs a dot product operation, and the first summation unit and the second summation unit are used to perform a summation operation on the input feature tensor;

The input of the residual unit is the output of the encoder component, the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the second convolutional layer layer, the third convolution layer, the input of the fourth convolution layer; the input of the first dot product unit is the output of the second convolution layer and the output of the third convolution layer; The input of the second dot product unit is the output of the first dot product unit and the output of the fourth convolution layer; the input of the fifth convolution layer is the output of the second dot product unit; The input of the first summing unit is the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summing unit; The input of the second summing unit is the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summing unit is the output of the self-attention component.

As an optional implementation manner of the embodiment of the present disclosure, the feature transfer component includes a downsampling unit and a residual unit;

The downsampling unit is used for downsampling the input feature tensor to a number of channels that is twice the number of channels of the input feature tensor, and the length, width and height are half of the length, width and height of the input feature tensor. The output feature tensor of add operation;

The input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.

As an optional implementation manner of the embodiment of the present disclosure, the multi-scale analysis component includes: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, and an eleventh convolution layer layer, the twelfth convolution layer and the splicing unit; the expansion rates of the seventh convolution layer, the eighth convolution layer, the ninth convolution layer, and the tenth convolution layer are different; The splicing unit is used to perform a splicing operation on the input feature tensor;

The inputs of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer and the tenth convolutional layer are the outputs of the feature transfer component, and the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer, the The input of the eleventh convolutional layer is the output of the splicing unit, the input of the twelfth convolutional layer is the output of the eleventh convolutional layer; the output of the twelfth convolutional layer is the output of the Multiscale Analysis component described above.

As an optional implementation manner of the embodiment of the present disclosure, the decoder component includes four decoders in a serial structure; each decoder includes: an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder It is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a concatenated structure and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the fusion units of each decoder are all It is used to fuse the input feature tensor; the upsampling unit of each decoder is used to upsample the input feature tensor to a channel number that is half the number of channels of the input feature tensor, and the length, width, The output feature tensor whose height is twice the length, width and height of the input feature tensor;

The input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component, and the input of the fusion unit of the first decoder is the upsampling unit of the first decoder. output and the output of the self-attention module; the input of the residual unit of the first decoder is the output of the fusion unit of the first decoder and the output of the self-attention module; the decoding The inputs of the second decoder, the third decoder, and the up-sampling unit of the fourth decoder of the decoder component are the outputs of the previous decoder, and the second decoder and third decoder of the decoder component , the input of the fusion unit of the fourth decoder corresponds to the output of the residual unit of the encoder and the output of the upsampling unit of the corresponding decoder respectively, the second decoder of the decoder component, the third decoding, The input of the residual unit of the fourth decoder is the output of the corresponding residual unit of the encoder and the output of the corresponding fusion unit of the decoder.

As an optional implementation of the embodiment of the present disclosure, the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, and a fourth summation unit Units a third dot product unit and a fourth dot product unit; the third and fourth summation units are used to perform an addition operation on the input, and the third and fourth dot product units are used to perform an addition operation on the input perform a dot product operation;

The inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are the output of the upsampling unit of the first decoder and the self-attention, respectively The output of the component, the input of the fusion unit of the second decoder, the third decoder, and the fourth decoder of the decoder component is the output of the upsampling unit of the corresponding decoder and the residual of the corresponding encoder. The output of the difference unit, the input of the third summation unit is the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer, and the input of the fifteenth convolutional layer is the third summation output of the unit, the input of the third dot product unit is the output of the thirteenth convolution layer and the output of the fifteenth convolution layer, and the input of the fourth dot product unit is the tenth The output of the fourth convolutional layer and the output of the fifteenth convolutional layer, the input of the fourth summing unit is the output of the third dot product unit and the output of the fourth dot product unit, the The output of the fourth summing unit is the output of the associated fusion unit.

As an optional implementation manner of the embodiment of the present disclosure, according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, the preset network model is optimized to obtain the tooth model deformation model, including:

Constructing a loss function, and optimizing the preset network model to obtain a tooth model deformation model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model;

Wherein, the loss function includes:

Wherein, alpha is a constant, out _i is the data obtained by sequentially processing the output of the multi-scale analysis component and the output of each decoder of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function .

In a second aspect, an embodiment of the present disclosure provides a device for establishing a dental mold deformation model, including:

a sample acquisition unit, used for acquiring sample data, the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;

The preprocessing unit is used to obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function value TSDF value of each voxel in the cubic space where each initial tooth model is located;

The prediction unit is used to input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;

The optimization unit is configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the tooth model deformation model.

As an optional implementation manner of the embodiment of the present disclosure, the multi-scale analysis component includes: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, and an eleventh convolution layer layer, the twelfth convolution layer and the splicing unit; the expansion rates of the seventh convolution layer, the eighth convolution layer, the ninth convolution layer, and the tenth convolution layer are all different; the The stitching unit is used to perform a stitching operation on the input feature tensor;

As an optional implementation manner of the embodiment of the present disclosure, the optimization unit is specifically configured to construct a loss function, and according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, perform a The network model is optimized to obtain the dental model deformation model;

Wherein, the loss function includes:

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a memory and a processor, where the memory is used to store a computer program; the processor is used to execute the first aspect or any optional option of the first aspect when the computer program is invoked The method for training a dental model deformation model according to the embodiment.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the first aspect or any optional implementation manner of the first aspect is implemented. A method for training a dental model deformation model.

In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program/instruction, when the computer program/instruction is executed by a processor, the first aspect or any optional implementation manner of the first aspect is implemented. A method for training a dental model deformation model

Three (beneficial effect)

The method for training a dental model deformation model provided by the embodiment of the present disclosure first obtains sample data including multiple initial tooth models and target deformation models corresponding to each initial tooth model, and then obtains the TSDF of each voxel in the cubic space where each initial tooth model is located The value is used as the element of the feature tensor corresponding to each initial tooth model, and then the feature tensor corresponding to each initial tooth model is input into the preset network model, and the predicted deformation model corresponding to each initial tooth model is obtained. The target deformation model and the predicted deformation model are optimized to obtain the dental model deformation model by optimizing the preset network model. Since the method for training a dental model deformation model provided by the embodiment of the present disclosure can obtain a dental model deformation model, and obtain a dental model that meets the requirements of a specific product according to the initial model that needs to be deformed according to the dental model deformation model, the embodiment of the present disclosure is adopted. The dental model deformation model obtained by the provided method for training a dental model deformation model can automatically convert the initial tooth model into a tooth model that meets specific product requirements.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the accompanying drawings that are required to be used in the description of the embodiments or the prior art will be briefly introduced below. In other words, on the premise of no creative labor, other drawings can also be obtained from these drawings.

1 is a flowchart of a method for training a dental model deformation model provided by one or more embodiments of the present disclosure;

2 is an architectural diagram of a preset network model provided by one or more embodiments of the present disclosure;

3 is a schematic structural diagram of an encoder assembly provided by one or more embodiments of the present disclosure;

4 is a schematic structural diagram of a self-attention component provided by one or more embodiments of the present disclosure;

5 is a schematic structural diagram of a feature transfer assembly provided by one or more embodiments of the present disclosure;

6 is a schematic structural diagram of a multi-scale analysis component provided by one or more embodiments of the present disclosure;

7 is a schematic structural diagram of a decoder component provided by one or more embodiments of the present disclosure;

8 is a schematic structural diagram of a fusion unit provided by one or more embodiments of the present disclosure;

9 is a structural diagram of an apparatus for training a dental mold deformation model provided by one or more embodiments of the present disclosure;

FIG. 10 is a schematic diagram of a hardware structure of an electronic device provided by one or more embodiments of the present disclosure.

Detailed ways

In order to more clearly understand the above objects, features and advantages of the present disclosure, the solutions of the present disclosure will be further described below. It should be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other under the condition of no conflict.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure can also be implemented in other ways different from those described herein; obviously, the embodiments in the specification are only a part of the embodiments of the present disclosure, and Not all examples.

In the embodiments of the present disclosure, words such as "exemplary" or "such as" are used to mean serving as an example, illustration, or illustration. Any embodiments or designs described in the embodiments of the present disclosure as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present the related concepts in a specific manner. In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, the meaning of "plurality" refers to two or more.

The execution subject of the method for training a dental model deformation model provided by the embodiment of the present disclosure may be a device for establishing a dental model deformation model. The device for establishing a dental model deformation model can be a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart watch, a smart hand A terminal device such as a ring, or the terminal device may also be another type of terminal device, and the embodiment of the present disclosure does not limit the type of the terminal device.

An embodiment of the present disclosure provides a method for training a dental model deformation model. Referring to FIG. 1 , the method for training a dental model deformation model includes the following steps S11 to S14:

S11. Obtain sample data.

The sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model.

Specifically, the oral cavity can be scanned and 3D reconstructed for multiple users to obtain the initial tooth model of each user, and then some gingival areas in the dental models of various chambers can be manually removed based on specific requirements, and scanned and remodeled. Thereby, the target deformation model corresponding to each initial tooth model is obtained.

S12: Obtain feature tensors corresponding to each initial tooth model.

The feature tensor corresponding to each initial tooth model is the Truncated Signed Distance Function (TSDF) value of each voxel in the cubic space where each initial tooth model is located.

Specifically, acquiring the feature tensor corresponding to each initial tooth model may include: firstly establishing a square outer bounding box of the tooth model as the cubic space where each initial tooth model is located, then voxelizing the cubic space where each initial tooth model is located, and finally using Truncated Signed Distance Function (TSDF) calculates the distance of each voxel to the surface of the original tooth model as the TSDF value of each voxel. Wherein, the distance value TSDF(x _i , y _i , z _i )=0 indicates that the voxel is on the surface of the tooth model, and TSDF(x _i , y _i , z _i )>0 indicates that the voxel is outside the tooth model , TSDF(x _i , y _i , z _i )<0 indicates that the voxel is inside the tooth model.

S13: Input the feature tensor corresponding to each initial tooth model into a preset network model, and obtain a predicted deformation model corresponding to each initial tooth model.

That is, a preset network model for generating a dental model deformation model is established in advance, and the feature tensor corresponding to each initial tooth model in the sample data is input into the preset network model, and the corresponding output is obtained as the prediction corresponding to the initial tooth model. Deformation model.

S14. According to the target deformation model and the predicted deformation model corresponding to each initial tooth model, optimize the preset network model to obtain a tooth model deformation model.

The method for training a dental model deformation model provided by the embodiment of the present disclosure first obtains sample data including multiple initial tooth models and target deformation models corresponding to each initial tooth model, and then obtains the TSDF of each voxel in the cubic space where each initial tooth model is located The value is used as the element of the feature tensor corresponding to each initial tooth model, and then the feature tensor corresponding to each initial tooth model is input into the preset network model, and the predicted deformation model corresponding to each initial tooth model is obtained. The target deformation model and the predicted deformation model are optimized to obtain the dental model deformation model by optimizing the preset network model. Since the method for training a dental model deformation model provided by the embodiment of the present disclosure can obtain a dental model deformation model, and obtain a dental model that meets the requirements of a specific product according to the initial model that needs to be deformed according to the dental model deformation model, the embodiment of the present disclosure is adopted. The dental model deformation model obtained by the provided method for training a dental model deformation model can automatically convert the initial tooth model into a tooth model that meets specific product requirements. The preset network model in the above embodiment will be described in detail below.

Referring to FIG. 2 , the preset network model in the embodiment of the present disclosure includes:

An encoder component 21, a self-attention component 22, a feature transfer component 23, a multi-scale analysis component 24, and a decoder component 25 consisting of a plurality of concatenated decoders are composed of multiple encoders in concatenated structures.

The input of the encoder component 21 is the input of the preset network model, and the output of the encoder component 21 is the input of the self-attention component 22; the output of the self-attention component 22 is the feature The input of the transfer component 23; the output of the feature transfer component 23 is the input of the multi-scale analysis component 24, and the output of the multi-scale analysis component 24 is the input of the decoder component 25, and the decoder component 25 The output of is the output of the preset network model.

Wherein, the self-attention component 22 is used for non-local information extraction on the feature tensor output by the encoder component 21 to obtain the environmental feature tensor; the feature transfer component performs the output of the self-attention component process, and transmit the processing result to the multi-scale analysis component; the multi-scale analysis component 24 is configured to extract the feature tensors of the feature tensor output by the feature transfer component 23 in multiple scales.

Because the self-attention component can perform non-local information extraction on the feature tensor output by the encoder component to obtain the dependencies between non-local features, and the multi-scale analysis component can extract the feature tensor output by the feature transfer component Feature tensors at multiple scales, so as to mine the correlation between feature tensors at different scales, and obtain context information including multi-scale analysis results. Therefore, the method for training a dental model deformation model provided by the embodiments of the present disclosure The obtained tooth model deformation model can more accurately deform the tooth model, and more accurately obtain a tooth model that meets specific requirements.

Further, as shown in FIG. 3 , the encoder component 21 includes three encoders (encoder 211 , encoder 212 , encoder 213 ) in series structure, each encoder (encoder 211 , encoder 212 , encoder device 213) includes a residual unit (residual unit E1, residual unit E2, residual unit E3) and a down-sampling unit (down-sampling unit Do1, down-sampling unit Do2, down-sampling unit Do3); The difference unit (residual unit E1, residual unit E2, residual unit E3) is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a series structure and the convolution result of the convolution operation The sum operation is performed with the input of the residual unit, and the downsampling units of each encoder (downsampling unit Do1, downsampling unit Do2, downsampling unit Do3) are used to downsample the input feature tensor into the number of channels as the input feature tensor The output feature tensor whose length, width and height are half of the length, width and height of the input feature tensor.

The input of each residual unit is the input of the corresponding encoder (the input of the residual unit E1 is the input of the encoder 211, the input of the residual unit E2 is the input of the encoder 212, and the input of the residual unit E3 is the input of the encoder. 213 input). The input of each downsampling unit is the output of the residual unit of the corresponding encoder (the input of the downsampling unit Do1 is the output of the residual unit E1 of the encoder 211, and the input of the downsampling unit Do2 is the residual unit of the encoder 212. The output of E2 and the input of the down-sampling unit Do3 are the output of the residual unit E3 of the encoder 213), and the output of each down-sampling unit is the output of the corresponding encoder (the output of the down-sampling unit Do1 is the output of the encoder 211). , the output of the down-sampling unit Do2 is the output of the encoder 212, the output of the down-sampling unit Do3 is the output of the encoder 213), the input of the first encoder 211 is the input of the encoder component 21, the third encoder 213 The output of the encoder component 21 is the output of the encoder component 21, and the inputs of the second encoder 212 and the third encoder 213 are the outputs of the first encoder 211 and the second encoder 212, respectively.

That is, the input of the encoder component 21, the input of the first encoder 211, and the input of the residual unit E1 are the same input, and the output of the residual unit E1 is the input of the down-sampling unit Do1. The output of the down-sampling unit Do1 is the output of the first encoder 211 . The input of the second encoder 212 and the input of the residual unit E2 are the same input, and both are outputs of the first encoder 211 . The output of the residual unit E2 is the input of the down-sampling unit Do2. The output of the downsampling unit Do2 is the output of the second encoder 212. The input of the third encoder 213 and the input of the residual unit E3 are the same input, and both are the output of the second encoder 212 . The output of the residual unit E3 is the input of the down-sampling unit Do3. The output of the down-sampling unit Do3 is the output of the third encoder 212 , the output of the encoder component 21 .

Optionally, the convolution kernels of the three convolutional layers of the residual unit of each decoder are all 3×3×3, and the length of the feature tensor output by the three convolutional layers of the residual unit of each decoder is , width and height are the same as the length, width and height of the input feature tensor. The number of channels of the feature tensor output by the residual unit of the first encoder is 16 times the number of channels of the input feature tensor. The number of channels of the feature tensor output by the residual unit of the encoder and the residual unit of the third encoder is the same as the number of channels of the input feature tensor. Each down-sampling unit is a convolutional layer with a stride of 2 and a convolution kernel of 2×2×2.

Further, referring to FIG. 4 , the self-attention component 22 includes: a residual unit E4, a first convolutional layer Co1, a second convolutional layer Co2, a third convolutional layer Co3, and a fourth convolutional layer Co4 , the fifth convolutional layer Co5, the sixth convolutional layer Co6, the first dot product unit Pro1, the second dot product unit Pro2, the first summation unit Add1, and the second summation unit Add2.

The residual unit E4 is used to perform a convolution operation on the input of the residual unit through three convolutional layers of a concatenated structure, and perform a sum operation on the convolution result of the convolution operation and the input of the residual unit, so The first dot product unit Pro1 and the second dot product unit Pro2 are used to perform a dot product operation on the input feature tensor, and the first addition unit Add1 and the second addition unit Add2 perform a sum operation on the input feature tensor .

The input of the residual unit E4 is the output of the encoder component 21 (the output of the downsampling unit Do3 of the third encoder 213 of the encoder component 21), and the output of the residual unit E4 is the An input to the convolutional layer Co1. The output of the first convolutional layer Co1 is the input of the second convolutional layer Co2, the third convolutional layer Co3 and the fourth convolutional layer Co4. The input of the first dot product unit Pro1 is the output of the second convolutional layer Co2 and the output of the third convolutional layer Co3. The input of the second dot product unit Pro2 is the output of the first dot product unit Pro1 and the output of the fourth convolution layer Co4. The input of the fifth convolutional layer Co5 is the output of the second dot product unit Pro2. The input of the first summing unit Add1 is the output of the fifth convolutional layer Co5 and the output of the first convolutional layer Co1. The input of the sixth convolutional layer Co6 is the output of the first summing unit Add1. The inputs of the second summing unit Add2 are the output of the sixth convolutional layer Co6 and the output of the residual unit E4. The output of the second summing unit Add2 is the output of the self-attention component 22 .

Optionally, the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, and the sixth convolutional layer The convolution kernels of the layers are all 1×1×1. The length, width and height of the output feature tensor of the first convolution layer are the same as the length, width and height of the input feature tensor. The number of channels of the output feature tensor of the first convolution layer is one eighth of the number of channels of the input feature tensor. The number of channels of the output feature tensor of the second convolution layer, the third convolution layer, and the fourth convolution layer is half of the channel number of the feature tensor of the input feature tensor. The number of channels of the feature tensor of the output of the fifth convolution layer is twice the number of channels of the feature tensor of the input feature tensor. The number of channels of the feature tensor of the output of the sixth convolution layer is eight times the number of channels of the feature tensor of the input feature tensor.

Let: the output feature tensor X∈R ^C×H×W×L of the residual unit E4, where C is the number of channels of the output feature tensor of the residual unit E4, H, W, L are the residuals respectively The length, width and height of the feature tensor output by the unit E4, then the feature tensor X ₁ ∈ R ^C1×H×W×L output by the first convolutional layer Co1, C1=C/8, the second convolutional layer The feature tensor X ₂ ∈R ^C2×H×W×L output by Co2, C2=C1/2, the feature tensor X ₃ ∈ R ^C3×H×W×L output by the third convolution layer Co3, C3=C1 /2.

The labels of the i-th voxel in X_2 and the j-th voxel in X ₃ are represented by x _i ∈ R ¹ and x _j ∈ R ¹ respectively, and X ₂ ( _xi )∈R ^C2 represents the feature of the x _i -th voxel in X ₂ Vector, X ₃ (x _j )∈R ^C3 represents the feature vector of the x _jth voxel in X ₃ , then the attention distribution is:

The feature tensor X ₄ ∈ R ^C4×H×W×L output by the fourth convolution layer Co4, C4=C1/2. Performing the dot product operation on _X4 and S, you can get the environmental characteristics that describe the non-local dependencies:

Then the environmental feature Con2∈R ^C5×H×W×L , C5=C1/2 can be obtained.

The feature tensor X ₅ output by the fifth convolution layer Co5, the first summation unit Add1 performs the sum operation on X ₅ and X ₁ , and the feature tensor res1∈R ^C1×H×W output by the first summation unit Add1 ^×L , the feature tensor X ₆ ∈ R ^C×H×W×L output by the sixth convolution layer Co6, the second summation unit Add2 performs the sum operation on X and X ₆ to obtain the final environmental feature tensor res2 ∈ R ^C×W×H×L .

Further, as shown in FIG. 5 , the feature transfer component 23 includes a downsampling unit Do4 and a residual unit E5;

The downsampling unit Do4 is used for downsampling the input feature tensor to a channel number that is twice the number of channels of the input feature tensor, and the length, width, and height are half of the length, width, and height of the input feature tensor. An output feature tensor of . The residual unit E5 is configured to perform a convolution operation on the input of the residual unit through three convolutional layers in a concatenated structure, and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit.

The input of the downsampling unit Do4 is the output of the self-attention component 22, the output of the downsampling unit Do4 is the input of the residual unit E5, and the output of the residual unit E5 is the feature transfer Output of component 23.

Optionally, the downsampling unit Do4 is a convolution layer with a stride of 2 and a convolution kernel of 2×2×2. The convolution kernels of the three convolutional layers of the residual unit E5 are all 3×3×3, and the length, width and height of the output feature tensor of the three convolutional layers of each residual unit and the input features The length, width and height of the tensor are the same, and the number of channels of the output feature tensor of the residual unit E5 is the same as the number of channels of the input feature tensor.

Further, as shown in FIG. 6, the multi-scale analysis component 24 includes: the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, the tenth convolutional layer Co10, the eleventh volume The stacking layer Co11, the twelfth convolutional layer Co12, and the splicing unit MON.

Wherein, the expansion rates of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are all different; the splicing unit MON is used to The feature tensor of the splicing operation is performed.

The inputs of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are the outputs of the feature transfer component 23, and the The input of the splicing unit MON is the output of the feature transfer component 23, the output of the seventh convolutional layer Co7, the output of the eighth convolutional layer Co8, the output of the ninth convolutional layer Co9 and the The output of the tenth convolutional layer Co10, the input of the eleventh convolutional layer Co11 is the output of the splicing unit MON, and the input of the twelfth convolutional layer Co12 is the eleventh convolutional layer Co11 The output of the twelfth convolutional layer Co12 is the output of the multi-scale analysis component 24 .

Optionally, the convolution kernel of the seventh convolution layer is 1×1×1, the number of channels of the output feature tensor is the same as the number of channels of the input feature tensor, and the expansion rate is 1. The convolution kernels of the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all 3×3×3, and the number of channels of the output feature tensor is the same as the input feature tensor. The number of channels is the same, and the expansion rates are 2, 3, and 4, respectively. The convolution kernel of the eleventh convolution layer is 3×3×3, and the number of channels of the output feature tensor is one-fifth of the number of channels of the input feature tensor; the twelfth convolution layer The convolution kernel is 3×3×3, and the number of channels of the output feature tensor is the same as the number of channels of the input feature tensor.

Suppose: the feature tensor A∈R ^{C×H×W×L output by the feature transfer component 23, then the feature tensor A1∈R C×H×W×L} ^output by the seventh convolutional layer Co7, the eighth convolutional layer The feature tensor A2∈R ^{C×H×W×L output by Co8, the feature tensor A3∈R C×H×W×L} ^output by the ninth convolutional layer Co9, the feature tensor output by the tenth convolutional layer Co10 A4∈R ^C×H×W×L , the feature tensor Cat∈R ^{5×C×H×W×L} output by the splicing module MON, the feature tensor Cat1∈R C ^× H output by the eleventh convolutional layer Co11 ^×W×L , the feature tensor Cat2∈R ^C×H×W×L output by the twelfth convolutional layer Co12.

Further, as shown in FIG. 7 , the decoder component 25 includes four decoders (decoder 251 , decoder 252 , decoder 253 , and decoder 254 ) in a serial structure; each decoder includes: an upsampling unit ( The upsampling unit Up1 of the decoder 251, the upsampling unit Up2 of the decoder 252, the upsampling unit Up3 of the decoder 253, the upsampling unit Up4 of the decoder 254), the fusion unit (the fusion unit F1 of the decoder 251, the decoder 252 fusion unit F2, decoder 253 fusion unit F3, decoder 254 fusion unit F4) and residual units (decoder 251 residual unit E6, decoder 252 residual unit E7, decoder 253 residual unit difference unit E8, residual unit E9 of decoder 254).

The residual unit of each decoder is used to perform convolution operation on the input of the residual unit through three convolutional layers of concatenated structure, and perform addition of the convolution result of the convolution operation and the input of the residual unit. and operation, the fusion unit of each decoder is used to perform fusion operation on the input feature tensor; the upsampling unit of each decoder is used to upsample the input feature tensor to a number of channels equal to the number of channels of the input feature tensor. One-half, the output feature tensor whose length, width and height are twice the length, width and height of the input feature tensor.

The input of the upsampling unit Up1 of the first decoder 251 of the decoder component is the output of the multi-scale analysis component 24, and the input of the fusion unit F1 of the first decoder 251 is the first decoder The output of the upsampling unit Up1 of 251 and the output of the self-attention module 22; the input of the residual unit E6 of the first decoder 251 is the output of the fusion unit F1 of the first decoder 251 and The output of the self-attention module 22; the up-sampling units (up-sampling unit Up2, up-sampling unit Up3, The input of the upsampling unit Up4) is the output of the previous decoder, and the fusion units of the second decoder 252, the third decoding 253, and the fourth decoder 254 of the decoder component (fusion unit F2, fusion The outputs of the residual units (residual unit E3, residual unit E2, residual unit E1) of the corresponding encoders (encoder 213, encoder 212, and encoder 211) respectively correspond to the inputs of the unit F3 and the fusion unit F4) and belong to The output of the upsampling unit of the decoder, the residual unit of the second decoder 252, the third decoder 253, the fourth decoder 254 of the decoder component (residual unit E7, residual unit E8 residual The input of the unit E9) is the output of the residual unit (residual unit E3, residual unit E2, residual unit E1) of the corresponding encoder (encoder 213, encoder 212, encoder 211) and the corresponding decoder. The output of the fusion unit.

Optionally, the convolution kernels of the three convolutional layers of the residual unit of each encoder are all 3×3×3, and the length of the feature tensor output by the three convolutional layers of the residual unit of each encoder is , width and height are the same as the length, width and height of the input feature tensor, and the number of channels of the feature tensor output by the residual unit of each decoder is the same as the number of channels of the input feature tensor. Each upsampling unit is a deconvolution layer with a stride of 2 and a convolution kernel of 2×2×2.

Further, as shown in FIG. 8, the fusion unit of each decoder includes: the thirteenth convolutional layer Co13, the fourteenth convolutional layer Co14, the fifteenth convolutional layer Co15, the third summation unit Add3, and the fourth summation Unit Add4, third dot product unit Pro3, and fourth dot product unit Pro4.

The third adding unit Add3 and the fourth adding unit Add4 are used to perform an adding operation on the input, and the third dot product unit Pro3 and the fourth dot product unit Pro4 are used to perform a dot product operation on the input.

The inputs of the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14 of the fusion unit F1 of the first decoder 251 of the decoder component 25 are respectively the inputs of the upsampling unit Up1 of the first decoder 251 . The output and the output of the self-attention component 22, the fusion units of the second decoder 252, the third decoder 253, and the fourth decoder 254 of the decoder component (fusion unit F2, fusion unit F3, The input of the fusion unit F4) is the output of the upsampling unit of the corresponding decoder and the output of the corresponding residual unit of the encoder (the input of the fusion unit F2 is the output of the residual unit E3 of the encoder 213 and the output of the decoder 252. The output of the upsampling unit Up2, the input of the fusion unit F3 is the output of the residual unit E2 of the encoder 212 and the output of the upsampling unit Up3 of the decoder 253, and the input of the fusion unit F4 is the residual unit E1 of the encoder 211. output and the output of the upsampling unit Up4 of the decoder 254), the input of the third summing unit Add3 is the output of the thirteenth convolutional layer Co13 and the output of the fourteenth convolutional layer Co14, the The input of the fifteenth convolutional layer Co15 is the output of the third summing unit Add3, and the input of the third dot product unit Pro3 is the output of the thirteenth convolutional layer Co13 and the fifteenth convolutional layer Co15 , the input of the fourth dot product unit Pro4 is the output of the fourteenth convolutional layer Co14 and the output of the fifteenth convolutional layer Co15, and the input of the fourth summing unit Add4 is the The output of the third dot product unit Pro3 and the output of the fourth dot product unit Pro4, the output of the fourth summing unit Add4 is the output of the fusion unit to which it belongs.

That is, as shown in Figure 8, the fusion unit performs convolution operations on the input two feature tensors Ai and Bi through the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14, respectively, to obtain the features after dimension reduction. Tensors Ci and Di; then use the third summing unit Add3 to perform the sum operation on Ci and Di, fuse Ci and Di, and then send the fusion result of the third summing unit Add3 to the fifteenth convolutional layer Co15 to obtain the encoder weights respectively The value coefficient tensor Ei and the decoder weight coefficient tensor Fi; and then through the third dot product unit Pro3, the weight coefficient tensor Ci and Ei are dot-multiplied to obtain the result Gi, and the fourth dot product unit Pro4 is used to compare Di and Fi The result Hi is obtained by dot multiplication; finally, the addition operation is performed on Gi and Hi through the fourth adding unit Add4, and the fusion feature Zi containing the output feature of the encoder and the output feature of the decoder is obtained as the residual unit of the i-th module decoder. Input feature tensor.

Optionally, the convolution kernels of the thirteenth convolutional layer, the fourteenth convolutional layer, and the fifteenth convolutional layer are all 1×1×1, and the output features of the third convolutional layer and the fourth convolutional layer are The number of channels of the tensor is half of the number of channels of the input feature tensor, and the number of channels of the output feature tensor of the fifteenth convolutional layer is 1.

Let: Ai∈R ^C×W×H×L and Bi∈R ^C×W×H×L , then Ci∈R ^{1/2C×W×H×L} and Di∈R ^{1/2C×W×H×L} , Ei∈R ^1×W×H×L , Fi∈R ^1×W×H×L .

As an optional implementation manner of the embodiment of the present disclosure, the above step S104 (according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, optimize the preset network model to obtain the tooth model deformation model), including :

Wherein, the loss function includes:

Wherein, alpha is a constant, out _i is data obtained by sequentially processing the output of the multi-scale analysis component and the output of each decoder of the decoder component, and seg is an intermediate supervision signal.

Optionally, out ₁ is to perform a convolution operation on the output of the multi-scale analysis component through a convolution layer with a convolution kernel of 1 × 1 × 1 and the number of channels of the output feature tensor is 1, and then through trilinear interpolation (Trilinear interpolation). ) Expand the length, width and height of the feature tensor obtained by the convolution operation by 16 times, and then perform the sigmoid operation on the trilinear interpolation result to obtain the result;

out ₂ is to perform a convolution operation on the output of the first decoder of the decoder component through a convolution layer with a convolution kernel of 1 × 1 × 1 and the number of channels of the output feature tensor is 1, and then through trilinear interpolation (Trilinear interpolation). ) Expand the length, width and height of the feature tensor obtained by the convolution operation by 8 times, and then perform the Sigmoid operation on the trilinear interpolation result to obtain the result;

out ₃ is to perform a convolution operation on the output of the second decoder of the decoder component through a convolution layer with a convolution kernel of 1 × 1 × 1 and the number of channels of the output feature tensor is 1, and then through trilinear interpolation (Trilinear interpolation). ) Expand the length, width and height of the feature tensor obtained by the convolution operation by 4 times, and then perform the Sigmoid operation on the trilinear interpolation result to obtain the result;

out ₄ is to perform a convolution operation on the output of the third decoder of the decoder component through a convolution layer with a convolution kernel of 1 × 1 × 1 and the number of channels of the output feature tensor is 1, and then through trilinear interpolation (Trilinear interpolation) ) Expand the length, width and height of the feature tensor obtained by the convolution operation by 2 times, and then perform the Sigmoid operation on the trilinear interpolation result to obtain the result;

out ₅ is to perform a convolution operation on the output of the fourth decoder of the decoder component through a convolution layer with a convolution kernel of 1 × 1 × 1 and the number of channels of the output feature tensor is 1, and then the convolution operation is obtained. The feature tensor performs a sigmoid operation to get the result.

Optionally, alpha is 0.25.

Based on the same inventive concept, as an implementation of the above method, an embodiment of the present disclosure further provides an apparatus for establishing a dental model deformation model, and the apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, this apparatus embodiment does not The details in the foregoing method embodiments are described one by one, but it should be clear that the apparatus for establishing a dental model deformation model in this embodiment can correspondingly implement all the foregoing method embodiments.

FIG. 9 is a schematic structural diagram of an apparatus for establishing a dental mold deformation model provided by an embodiment of the present disclosure. As shown in FIG. 9 , the apparatus 900 for establishing a dental mold deformation model provided in this embodiment includes:

The sample acquisition unit 91 is configured to acquire sample data, the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;

The preprocessing unit 92 is used to obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated symbol distance function value TSDF value of each voxel in the cubic space where each initial tooth model is located ;

The prediction unit 93 is used to input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;

The optimization unit 94 is configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the tooth model deformation model.

As an optional implementation manner of the embodiment of the present disclosure, the multi-scale analysis component includes: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, and an eleventh convolution layer layer, the twelfth convolution layer and the splicing unit; the expansion rates of the seventh convolution layer, the eighth convolution layer, the ninth convolution layer, and the tenth convolution layer are different; the splicing The unit is used to perform a concatenation operation on the input feature tensor;

As an optional implementation manner of the embodiment of the present disclosure, the optimization unit 94 is specifically configured to construct a loss function, and according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, the The preset network model is optimized to obtain the dental model deformation model;

Wherein, the loss function includes:

The apparatus for establishing a dental model deformation model provided in this embodiment can execute the method for training a dental model deformation model provided by the above method embodiments, and the implementation principle and technical effect thereof are similar, and are not repeated here.

Based on the same inventive concept, an embodiment of the present disclosure also provides an electronic device. FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 10 , the electronic device provided by this embodiment includes: a memory 101 and a processor 102. The memory 101 is used for storing computer programs; the processor 102 is used for When the computer program is invoked, each step in the method for training a dental model deformation model provided by the above method embodiments is executed.

Specifically, the memory 101 can be used to store software programs and various data. The memory 101 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; Data created by the use of the mobile phone (such as audio data, phone book, etc.), etc. In addition, memory 101 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

The processor 102 is the control center of the electronic device, using various interfaces and lines to connect various parts of the entire electronic device, by running or executing the software programs and/or modules stored in the memory 101, and calling the data stored in the memory 101. , perform various functions of electronic equipment and process data, so as to monitor electronic equipment as a whole. Processor 102 may include one or more processing units.

In addition, it should be understood that the electronic device provided by the embodiments of the present disclosure may further include components such as a radio frequency unit, a network module, an audio output unit, a sensor, a signal receiving unit, a display, a user receiving unit, an interface unit, and a power supply. Those skilled in the art can understand that the structure of the electronic device described above does not constitute a limitation on the electronic device, and the electronic device may include more or less components, or combine some components, or arrange different components. In the embodiments of the present disclosure, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, handheld computers, vehicle-mounted terminals, wearable devices, and pedometers.

The radio frequency unit can be used for receiving and sending signals during sending and receiving of information or during a call. Specifically, after receiving downlink data from the base station, it is processed by the processor 102; in addition, the uplink data is sent to the base station. Typically, a radio frequency unit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit can also communicate with the network and other devices through the wireless communication system.

Electronic devices provide users with wireless broadband Internet access through network modules, such as helping users send and receive e-mails, browse web pages, and access streaming media.

The audio output unit may convert audio data received by the radio frequency unit or the network module or stored in the memory 101 into audio signals and output as sound. Also, the audio output unit may also provide audio output related to a specific function performed by the electronic device (eg, call signal reception sound, message reception sound, etc.). The audio output unit includes speakers, buzzers, and receivers.

The signal receiving unit is used to receive audio or video signals. The receiving unit may include a graphics processor (Graphics Processing Unit, GPU) and a microphone, and the graphics processor convolves image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode . The processed image frames can be displayed on the display unit. The image frames processed by the graphics processor may be stored in memory (or other storage medium) or transmitted via a radio frequency unit or a network module. The microphone can receive sound and can process such sound into audio data. The processed audio data can be converted into a format that can be transmitted to a mobile communication base station via a radio frequency unit for output in the case of a telephone call mode.

The electronic device also includes at least one sensor, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor can turn off the display panel and/or the backlight when the electronic device is moved to the ear . As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), and can detect the magnitude and direction of gravity when stationary, which can be used to identify the posture of electronic devices (such as horizontal and vertical screen switching, related games , magnetometer attitude calibration), vibration recognition related functions (such as pedometer, tapping), etc.; sensors can also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared Sensors, etc., will not be repeated here.

The display unit is used to display information input by the user or information provided to the user. The display unit may include a display panel, and the display panel may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), and the like.

The user receiving unit can be used for receiving inputted numerical or character information, and generating key signal input related to user setting and function control of the electronic device. Specifically, the user receiving unit includes a touch panel and other input devices. A touch panel, also known as a touch screen, collects user touch operations on or near it (such as a user's operations on or near the touch panel using a finger, stylus, or any suitable object or accessory). The touch panel may include two parts, a touch detection device and a touch controller. Among them, the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends it to the touch controller. To the processor 102, the command sent by the processor 102 is received and executed. In addition, the touch panel can be realized by various types of resistive, capacitive, infrared, and surface acoustic waves. Besides the touch panel, the user receiving unit may also include other input devices. Specifically, other input devices may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be described herein again.

Further, the touch panel can be overlaid on the display panel, and when the touch panel detects a touch operation on or near it, it transmits it to the processor 102 to determine the type of the touch event, and then the processor 102 determines the type of the touch event according to the type of the touch event. Provide the corresponding visual output on the display panel. In general, the touch panel and the display panel are used as two independent components to realize the input and output functions of the electronic device, but in some embodiments, the touch panel and the display panel can be integrated to realize the input and output functions of the electronic device and output functions, which are not specifically limited here.

The interface unit is an interface for connecting an external device with an electronic device. For example, external devices may include wired or wireless headset ports, external power (or battery charger) ports, wired or wireless data ports, memory card ports, ports for connecting devices with identification modules, audio input/output (I/O) ports, video I/O ports, headphone ports, and more. The interface unit may be used to receive input (eg, data information, power, etc.) from an external device and transmit the received input to one or more elements in the electronic device or may be used to communicate between the electronic device and the external device transfer data.

The electronic device may also include a power supply (such as a battery) for supplying power to various components. Optionally, the power supply may be logically connected to the processor 102 through a power management system, so as to manage charging, discharging, and power consumption management functions through the power management system. .

Embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for training a dental model deformation model provided by the foregoing method embodiments is implemented.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.

Computer readable media includes both persistent and non-permanent, removable and non-removable storage media. A storage medium can be implemented by any method or technology for storing information, and the information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or apparatus that includes the element.

The above descriptions are only specific embodiments of the present disclosure, so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Industrial Applicability

The method for training a dental model deformation model provided by the embodiment of the present disclosure can obtain a dental model deformation model, and obtain a tooth model that meets the requirements of a specific product by performing an initial model to be deformed according to the dental model deformation model. Therefore, the embodiments of the present disclosure provide The method for training a dental model deformation model and the obtained dental model deformation model can automatically convert the initial tooth model into a tooth model that meets specific product requirements, which has strong industrial practicability.

Claims

A method for training a dental model deformation model, comprising:

Obtaining sample data, the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;

Obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function TSDF value of each voxel in the cubic space where each initial tooth model is located;

Input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;

According to the target deformation model and the predicted deformation model corresponding to each initial tooth model, the preset network model is optimized to obtain the tooth model deformation model.
The method according to claim 1, wherein the preset network model comprises: an encoder component composed of multiple encoders in a series structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of a plurality of decoders in a series structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the The output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component, and the output of the multi-scale analysis component is the input of the decoder component, so The output of the decoder component is the output of the preset network model;

Wherein, the self-attention component is used to extract non-local information on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component, and transmit the processing result to the multi-scale analysis component; the multi-scale analysis component is used for extracting the feature tensors in multiple scales of the feature tensor output by the feature transfer component.
The method of claim 2, wherein:

The encoder component includes three encoders with a serial structure, and each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to pass the three convolutional layers of the serial structure to the residual unit. Perform a convolution operation on the input and perform a sum operation on the convolution result of the convolution operation and the input of the residual unit, and the downsampling unit of each encoder is used to downsample the input feature tensor into the number of channels as the input feature tensor The output feature tensor whose length, width and height are half of the length, width and height of the input feature tensor;

The input of each residual unit is the input of the encoder to which it belongs, the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs, and the output of each down-sampling unit is the output of the encoder to which it belongs. The input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the input of the second encoder and the third encoder are the first encoder and the second encoder respectively. output of the device.
The method according to claim 2, wherein the self-attention component comprises: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a Five convolution layers, sixth convolution layer, first dot product unit, second dot product unit, first sum unit, and second sum unit; the residual unit is used for convolution through three concatenated structures The layer performs a convolution operation on the input of the residual unit and performs a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the first dot product unit and the second dot product unit are used for the input. The feature tensor performs a dot product operation, and the first summation unit and the second summation unit are used to perform a summation operation on the input feature tensor;

The input of the residual unit is the output of the encoder component, the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the second convolutional layer layer, the third convolution layer, the input of the fourth convolution layer; the input of the first dot product unit is the output of the second convolution layer and the output of the third convolution layer; The input of the second dot product unit is the output of the first dot product unit and the output of the fourth convolution layer; the input of the fifth convolution layer is the output of the second dot product unit; The input of the first summing unit is the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summing unit; The input of the second summing unit is the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summing unit is the output of the self-attention component.
The method according to claim 2, wherein the feature transfer component comprises a downsampling unit and a residual unit;

The downsampling unit is used for downsampling the input feature tensor to a number of channels that is twice the number of channels of the input feature tensor, and the length, width and height are half of the length, width and height of the input feature tensor. The output feature tensor of add operation;

The input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.
The method according to claim 2, wherein the multi-scale analysis component comprises: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, and an eleventh convolution layer layer, the twelfth convolution layer and the splicing unit; the expansion rates of the seventh convolution layer, the eighth convolution layer, the ninth convolution layer, and the tenth convolution layer are different; the splicing The unit is used to perform a concatenation operation on the input feature tensor;

The inputs of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer and the tenth convolutional layer are the outputs of the feature transfer component, and the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer, the The input of the eleventh convolutional layer is the output of the splicing unit, the input of the twelfth convolutional layer is the output of the eleventh convolutional layer; the output of the twelfth convolutional layer is the output of the Multiscale Analysis component described above.
The method according to claim 2, wherein the decoder component comprises four decoders in a serial structure; each decoder comprises: an upsampling unit, a fusion unit and a residual unit; the residual unit of each decoder It is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a concatenated structure and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the fusion units of each decoder are all It is used to fuse the input feature tensor; the upsampling unit of each decoder is used to upsample the input feature tensor to a channel number that is half the number of channels of the input feature tensor, and the length, width, The output feature tensor whose height is twice the length, width and height of the input feature tensor;

The input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component, and the input of the fusion unit of the first decoder is the upsampling unit of the first decoder. output and the output of the self-attention module; the input of the residual unit of the first decoder is the output of the fusion unit of the first decoder and the output of the self-attention module; the decoding The inputs of the second decoder, the third decoder, and the up-sampling unit of the fourth decoder of the decoder component are the outputs of the previous decoder, and the second decoder and third decoder of the decoder component , the input of the fusion unit of the fourth decoder corresponds to the output of the residual unit of the encoder and the output of the upsampling unit of the corresponding decoder respectively, the second decoder of the decoder component, the third decoding, The input of the residual unit of the fourth decoder is the output of the corresponding residual unit of the encoder and the output of the corresponding fusion unit of the decoder.
The method according to claim 7, wherein the fusion unit of each decoder comprises: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, and a fourth summation unit Units a third dot product unit and a fourth dot product unit; the third and fourth summation units are used to perform an addition operation on the input, and the third and fourth dot product units are used to perform an addition operation on the input perform a dot product operation;

The inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are the output of the upsampling unit of the first decoder and the self-attention, respectively The output of the component, the input of the fusion unit of the second decoder, the third decoder, and the fourth decoder of the decoder component is the output of the upsampling unit of the corresponding decoder and the residual of the corresponding encoder. The output of the difference unit, the input of the third summation unit is the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer, and the input of the fifteenth convolutional layer is the third summation output of the unit, the input of the third dot product unit is the output of the thirteenth convolution layer and the output of the fifteenth convolution layer, and the input of the fourth dot product unit is the tenth The output of the fourth convolutional layer and the output of the fifteenth convolutional layer, the input of the fourth summing unit is the output of the third dot product unit and the output of the fourth dot product unit, the The output of the fourth summing unit is the output of the associated fusion unit.
The method according to any one of claims 1-8, characterized in that, according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, the preset network model is optimized to obtain the tooth model deformation model, include:

Constructing a loss function, and optimizing the preset network model to obtain a tooth model deformation model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model;

Wherein, the loss function includes:

Wherein, alpha is a constant, out i is the data obtained by sequentially processing the output of the multi-scale analysis component and the output of each decoder of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function .
A device for establishing a dental model deformation model, comprising:

a sample acquisition unit, configured to acquire sample data, the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;

The preprocessing unit is used to obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function value TSDF value of each voxel in the cubic space where each initial tooth model is located;

The prediction unit is used to input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;

The optimization unit is configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the tooth model deformation model.