WO2022194258A1 - Method and apparatus for training dental cast deformation model - Google Patents

Method and apparatus for training dental cast deformation model Download PDF

Info

Publication number
WO2022194258A1
WO2022194258A1 · PCT/CN2022/081543 · CN2022081543W
Authority
WO
WIPO (PCT)
Prior art keywords
unit
output
input
decoder
component
Prior art date
Application number
PCT/CN2022/081543
Other languages
French (fr)
Chinese (zh)
Inventor
刘娜丽
田彦
江腾飞
赵晓波
Original Assignee
先临三维科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 先临三维科技股份有限公司 filed Critical 先临三维科技股份有限公司
Publication of WO2022194258A1 publication Critical patent/WO2022194258A1/en

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06T 2210/00 — Indexing scheme for image generation or computer graphics
    • G06T 2210/12 — Bounding box
    • G06T 2210/41 — Medical
    • G06T 2210/44 — Morphing

Definitions

  • the present disclosure relates to the technical field of three-dimensional deformation, and in particular, to a method and device for training a deformation model of a dental mold.
  • Tooth digitization technology aims at 3D modeling of teeth to obtain digital tooth models, so as to realize subsequent processing and personalized customization.
  • the final actual tooth model is not the initial tooth model obtained by scanning the oral cavity and performing 3D reconstruction; rather, the initial tooth model is further processed based on the specific product requirements to obtain a tooth model that meets those requirements.
  • the process of processing an initial tooth model based on specific product requirements to obtain a tooth model that meets specific product requirements is called 3D dental model deformation.
  • the deformation of 3D dental models is generally done manually. That is, a person manually processes the initial tooth model based on the specific product requirements so that the initial tooth model meets the specific product requirements.
  • manually completing the 3D dental model deformation has many disadvantages, such as low efficiency, high cost, and unreliable quality. Therefore, how to automatically convert the initial tooth model into a tooth model that meets the requirements of a specific product has become an urgent problem to be solved in the field.
  • an embodiment of the present disclosure provides a method for training a dental model deformation model, including:
  • the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;
  • each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function (TSDF) value of each voxel in the cubic space where each initial tooth model is located;
  • the preset network model is optimized to obtain the tooth model deformation model.
  • the preset network model includes: an encoder component composed of multiple encoders in a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of multiple decoders in a serial structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component; the output of the multi-scale analysis component is the input of the decoder component; and the output of the decoder component is the output of the preset network model;
  • the self-attention component is used to extract non-local information from the feature tensor output by the encoder component to obtain an environment feature tensor;
  • the feature transfer component processes the output of the self-attention component and transmits the processing result to the multi-scale analysis component;
  • the multi-scale analysis component is used for extracting feature tensors at multiple scales from the feature tensor output by the feature transfer component.
  • the encoder component includes three encoders in a serial structure, and each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit, and the down-sampling unit of each encoder is used to down-sample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor;
  • the input of each residual unit is the input of the encoder to which it belongs
  • the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs
  • the output of each down-sampling unit is the output of the encoder to which it belongs.
  • the input of the first encoder is the input of the encoder component
  • the output of the third encoder is the output of the encoder component
  • the inputs of the second encoder and the third encoder are the outputs of the first encoder and the second encoder, respectively.
  • the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit;
  • the residual unit is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the first dot product unit and the second dot product unit are used to perform a dot product operation on the input feature tensors, and the first summation unit and the second summation unit are used to perform a sum operation on the input feature tensors;
  • the input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the input of the second convolutional layer, the third convolutional layer, and the fourth convolutional layer; the inputs of the first dot product unit are the output of the second convolutional layer and the output of the third convolutional layer; the inputs of the second dot product unit are the output of the first dot product unit and the output of the fourth convolutional layer; the input of the fifth convolutional layer is the output of the second dot product unit; the inputs of the first summation unit are the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summation unit; the inputs of the second summation unit are the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summation unit is the output of the self-attention component.
  • the feature transfer component includes a downsampling unit and a residual unit
  • the downsampling unit is used for down-sampling the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor.
  • the input of the downsampling unit is the output of the self-attention component
  • the output of the downsampling unit is the input of the residual unit
  • the output of the residual unit is the output of the feature transfer component
  • the multi-scale analysis component includes: a seventh convolutional layer, an eighth convolutional layer, a ninth convolutional layer, a tenth convolutional layer, an eleventh convolutional layer, a twelfth convolutional layer, and a splicing unit; the expansion rates of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all different;
  • the splicing unit is used to perform a splicing operation on the input feature tensor;
  • the inputs of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are the outputs of the feature transfer component, and the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer;
  • the input of the eleventh convolutional layer is the output of the splicing unit
  • the input of the twelfth convolutional layer is the output of the eleventh convolutional layer
  • the output of the twelfth convolutional layer is the output of the multi-scale analysis component.
  • the decoder component includes four decoders in a serial structure; each decoder includes an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the fusion unit of each decoder is used to fuse the input feature tensors; and the upsampling unit of each decoder is used to up-sample the input feature tensor into an output feature tensor whose number of channels is half the number of channels of the input feature tensor and whose length, width, and height are twice the length, width, and height of the input feature tensor;
  • the input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component; the inputs of the fusion unit of the first decoder are the output of the upsampling unit of the first decoder and the output of the self-attention component; and the inputs of the residual unit of the first decoder are the output of the fusion unit of the first decoder and the output of the self-attention component;
  • the inputs of the up-sampling units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the outputs of the previous decoder; the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the corresponding decoder, respectively; and the input of the residual unit of each of the second, third, and fourth decoders is the output of the fusion unit of the corresponding decoder;
  • the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit; the third and fourth summation units are used to perform an addition operation on their inputs, and the third and fourth dot product units are used to perform a dot product operation on their inputs;
  • the inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are, respectively, the output of the upsampling unit of the first decoder and the output of the self-attention component, and the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the upsampling unit of the corresponding decoder and the output of the residual unit of the corresponding encoder;
  • the inputs of the third summation unit are the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer, and the input of the fifteenth convolutional layer is the output of the third summation unit;
  • the inputs of the third dot product unit are the output of the thirteenth convolutional layer and the output of the fifteenth convolutional layer;
  • the inputs of the fourth dot product unit are the output of the fourteenth convolutional layer and the output of the fifteenth convolutional layer; the inputs of the fourth summation unit are the output of the third dot product unit and the output of the fourth dot product unit, and the output of the fourth summation unit is the output of the associated fusion unit.
  • the preset network model is optimized to obtain the tooth model deformation model, including:
  • the loss function includes:
  • alpha is a constant;
  • out_i is the data obtained by sequentially processing the output of the multi-scale analysis component and the output of each decoder of the decoder component;
  • seg is the intermediate supervision signal;
  • mean() is the averaging function.
  • an embodiment of the present disclosure provides a device for establishing a dental model deformation model, including:
  • the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model;
  • the preprocessing unit is used to obtain the feature tensor corresponding to each initial tooth model, and each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function (TSDF) value of each voxel in the cubic space where each initial tooth model is located;
  • the prediction unit is used to input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;
  • the optimization unit is configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the tooth model deformation model.
  • the preset network model includes: an encoder component composed of multiple encoders in a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of multiple decoders in a serial structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component; the output of the multi-scale analysis component is the input of the decoder component; and the output of the decoder component is the output of the preset network model;
  • the self-attention component is used to extract non-local information from the feature tensor output by the encoder component to obtain an environment feature tensor;
  • the feature transfer component processes the output of the self-attention component and transmits the processing result to the multi-scale analysis component;
  • the multi-scale analysis component is used for extracting feature tensors at multiple scales from the feature tensor output by the feature transfer component.
  • the encoder component includes three encoders in a serial structure, and each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit, and the down-sampling unit of each encoder is used to down-sample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor;
  • the input of each residual unit is the input of the encoder to which it belongs
  • the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs
  • the output of each down-sampling unit is the output of the encoder to which it belongs.
  • the input of the first encoder is the input of the encoder component
  • the output of the third encoder is the output of the encoder component
  • the inputs of the second encoder and the third encoder are the outputs of the first encoder and the second encoder, respectively.
  • the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit;
  • the residual unit is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the first dot product unit and the second dot product unit are used to perform a dot product operation on the input feature tensors, and the first summation unit and the second summation unit are used to perform a sum operation on the input feature tensors;
  • the input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the input of the second convolutional layer, the third convolutional layer, and the fourth convolutional layer; the inputs of the first dot product unit are the output of the second convolutional layer and the output of the third convolutional layer; the inputs of the second dot product unit are the output of the first dot product unit and the output of the fourth convolutional layer; the input of the fifth convolutional layer is the output of the second dot product unit; the inputs of the first summation unit are the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summation unit; the inputs of the second summation unit are the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summation unit is the output of the self-attention component.
  • the feature transfer component includes a downsampling unit and a residual unit
  • the downsampling unit is used for down-sampling the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor.
  • the input of the downsampling unit is the output of the self-attention component
  • the output of the downsampling unit is the input of the residual unit
  • the output of the residual unit is the output of the feature transfer component
  • the multi-scale analysis component includes: a seventh convolutional layer, an eighth convolutional layer, a ninth convolutional layer, a tenth convolutional layer, an eleventh convolutional layer, a twelfth convolutional layer, and a splicing unit; the expansion rates of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all different; the splicing unit is used to perform a splicing operation on the input feature tensors;
  • the inputs of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are the outputs of the feature transfer component, and the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer;
  • the input of the eleventh convolutional layer is the output of the splicing unit
  • the input of the twelfth convolutional layer is the output of the eleventh convolutional layer
  • the output of the twelfth convolutional layer is the output of the multi-scale analysis component.
  • the decoder component includes four decoders in a serial structure; each decoder includes an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the fusion unit of each decoder is used to fuse the input feature tensors; and the upsampling unit of each decoder is used to up-sample the input feature tensor into an output feature tensor whose number of channels is half the number of channels of the input feature tensor and whose length, width, and height are twice the length, width, and height of the input feature tensor;
  • the input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component; the inputs of the fusion unit of the first decoder are the output of the upsampling unit of the first decoder and the output of the self-attention component; and the inputs of the residual unit of the first decoder are the output of the fusion unit of the first decoder and the output of the self-attention component;
  • the inputs of the up-sampling units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the outputs of the previous decoder; the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the corresponding decoder, respectively; and the input of the residual unit of each of the second, third, and fourth decoders is the output of the fusion unit of the corresponding decoder;
  • the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit; the third and fourth summation units are used to perform an addition operation on their inputs, and the third and fourth dot product units are used to perform a dot product operation on their inputs;
  • the inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are, respectively, the output of the upsampling unit of the first decoder and the output of the self-attention component, and the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the upsampling unit of the corresponding decoder and the output of the residual unit of the corresponding encoder;
  • the inputs of the third summation unit are the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer, and the input of the fifteenth convolutional layer is the output of the third summation unit;
  • the inputs of the third dot product unit are the output of the thirteenth convolutional layer and the output of the fifteenth convolutional layer;
  • the inputs of the fourth dot product unit are the output of the fourteenth convolutional layer and the output of the fifteenth convolutional layer; the inputs of the fourth summation unit are the output of the third dot product unit and the output of the fourth dot product unit, and the output of the fourth summation unit is the output of the associated fusion unit.
  • the optimization unit is specifically configured to construct a loss function and, according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, optimize the preset network model to obtain the dental model deformation model;
  • the loss function includes:
  • alpha is a constant;
  • out_i is the data obtained by sequentially processing the output of the multi-scale analysis component and the output of each decoder of the decoder component;
  • seg is the intermediate supervision signal;
  • mean() is the averaging function.
  • an embodiment of the present disclosure provides an electronic device, including: a memory and a processor, where the memory is used to store a computer program, and the processor is used to execute, when the computer program is invoked, the method for training a dental model deformation model according to the first aspect or any optional implementation manner of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method for training a dental model deformation model according to the first aspect or any optional implementation manner of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the method for training a dental model deformation model according to the first aspect or any optional implementation manner of the first aspect.
  • the method for training a dental model deformation model provided by the embodiments of the present disclosure first obtains sample data including multiple initial tooth models and the target deformation model corresponding to each initial tooth model, then obtains the TSDF value of each voxel in the cubic space where each initial tooth model is located as the elements of the feature tensor corresponding to that initial tooth model, and then inputs the feature tensor corresponding to each initial tooth model into the preset network model to obtain the predicted deformation model corresponding to each initial tooth model.
  • finally, the preset network model is optimized according to the target deformation models and the predicted deformation models to obtain the dental model deformation model.
  • because the method for training a dental model deformation model provided by the embodiments of the present disclosure can obtain a dental model deformation model, and an initial tooth model that needs to be deformed can be converted by this model into a dental model that meets the requirements of a specific product, the dental model deformation model obtained by the provided method can automatically convert the initial tooth model into a tooth model that meets specific product requirements.
  • FIG. 1 is a flowchart of a method for training a dental model deformation model provided by one or more embodiments of the present disclosure
  • FIG. 2 is an architectural diagram of a preset network model provided by one or more embodiments of the present disclosure
  • FIG. 3 is a schematic structural diagram of an encoder component provided by one or more embodiments of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a self-attention component provided by one or more embodiments of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a feature transfer component provided by one or more embodiments of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a multi-scale analysis component provided by one or more embodiments of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a decoder component provided by one or more embodiments of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a fusion unit provided by one or more embodiments of the present disclosure.
  • FIG. 9 is a structural diagram of an apparatus for training a dental model deformation model provided by one or more embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device provided by one or more embodiments of the present disclosure.
  • words such as “exemplary” or “such as” are used to mean serving as an example, instance, or illustration. Any embodiment or design described in the embodiments of the present disclosure as “exemplary” or “such as” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present the related concepts in a concrete manner.
  • the meaning of “plurality” refers to two or more.
  • the execution subject of the method for training a dental model deformation model provided by the embodiment of the present disclosure may be a device for establishing a dental model deformation model.
  • the device for establishing a dental model deformation model can be a terminal device such as a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart watch, or a smart bracelet, or the terminal device may also be another type of terminal device; the embodiment of the present disclosure does not limit the type of the terminal device.
  • An embodiment of the present disclosure provides a method for training a dental model deformation model.
  • the method for training a dental model deformation model includes the following steps S11 to S14:
  • the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and a target deformation model corresponding to each initial tooth model obtained by manually processing each initial tooth model.
  • the oral cavities of multiple users can be scanned and 3D-reconstructed to obtain the initial tooth model of each user, and then some gingival areas in each user's dental model can be manually removed based on specific requirements and the model scanned and rebuilt, thereby obtaining the target deformation model corresponding to each initial tooth model.
  • each element of the feature tensor corresponding to each initial tooth model is the Truncated Signed Distance Function (TSDF) value of each voxel in the cubic space where each initial tooth model is located.
  • acquiring the feature tensor corresponding to each initial tooth model may include: firstly, establishing a cubic outer bounding box of the tooth model as the cubic space where each initial tooth model is located; then voxelizing the cubic space where each initial tooth model is located; and finally calculating, using the truncated signed distance function, the distance from each voxel to the surface of the initial tooth model as the TSDF value of that voxel.
  • TSDF(x_i, y_i, z_i) > 0 indicates that the voxel is outside the tooth model;
  • TSDF(x_i, y_i, z_i) < 0 indicates that the voxel is inside the tooth model.
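  • As a concrete illustration, the following is a minimal sketch of this preprocessing step (Python/NumPy). The resolution res, the truncation threshold trunc, and the helper signed_distance (which returns the signed distance from query points to the mesh surface and could be backed by a mesh-processing library) are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def tsdf_feature_tensor(vertices, signed_distance, res=128, trunc=2.0):
    # Cubic (equal-sided) outer bounding box of the tooth model.
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    center, half = (lo + hi) / 2.0, float((hi - lo).max()) / 2.0
    # Voxelize the cube: one query point at the center of each voxel.
    ticks = np.linspace(-half, half, res)
    grid = np.stack(np.meshgrid(center[0] + ticks,
                                center[1] + ticks,
                                center[2] + ticks, indexing="ij"), axis=-1)
    # Signed distance of every voxel to the model surface, truncated;
    # TSDF > 0 means the voxel is outside the model, TSDF < 0 inside.
    d = signed_distance(grid.reshape(-1, 3)).reshape(res, res, res)
    return np.clip(d, -trunc, trunc).astype(np.float32)
```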
  • a preset network model for generating a dental model deformation model is established in advance; the feature tensor corresponding to each initial tooth model in the sample data is input into the preset network model, and the corresponding output is obtained as the predicted deformation model corresponding to that initial tooth model.
  • the method for training a dental model deformation model provided by the embodiments of the present disclosure first obtains sample data including multiple initial tooth models and the target deformation model corresponding to each initial tooth model, then obtains the TSDF value of each voxel in the cubic space where each initial tooth model is located as the elements of the feature tensor corresponding to that initial tooth model, and then inputs the feature tensor corresponding to each initial tooth model into the preset network model to obtain the predicted deformation model corresponding to each initial tooth model.
  • finally, the preset network model is optimized according to the target deformation models and the predicted deformation models to obtain the dental model deformation model.
  • because the method for training a dental model deformation model provided by the embodiments of the present disclosure can obtain a dental model deformation model, and an initial tooth model that needs to be deformed can be converted by this model into a dental model that meets the requirements of a specific product, the dental model deformation model obtained by the provided method can automatically convert the initial tooth model into a tooth model that meets specific product requirements.
  • the preset network model in the above embodiment will be described in detail below.
  • the preset network model in the embodiment of the present disclosure includes:
  • an encoder component 21 composed of multiple encoders in a serial structure, a self-attention component 22, a feature transfer component 23, a multi-scale analysis component 24, and a decoder component 25 composed of multiple decoders in a serial structure.
  • the input of the encoder component 21 is the input of the preset network model, and the output of the encoder component 21 is the input of the self-attention component 22; the output of the self-attention component 22 is the input of the feature transfer component 23; the output of the feature transfer component 23 is the input of the multi-scale analysis component 24; the output of the multi-scale analysis component 24 is the input of the decoder component 25; and the output of the decoder component 25 is the output of the preset network model.
  • the self-attention component 22 is used for non-local information extraction on the feature tensor output by the encoder component 21 to obtain the environment feature tensor; the feature transfer component 23 processes the output of the self-attention component 22 and transmits the processing result to the multi-scale analysis component 24; the multi-scale analysis component 24 is configured to extract feature tensors at multiple scales from the feature tensor output by the feature transfer component 23.
  • the self-attention component can perform non-local information extraction on the feature tensor output by the encoder component to obtain the dependencies between non-local features, and the multi-scale analysis component can extract, from the feature tensor output by the feature transfer component, feature tensors at multiple scales, so as to mine the correlation between feature tensors at different scales and obtain context information including the multi-scale analysis results. Therefore, the tooth model deformation model obtained by the method for training a dental model deformation model provided by the embodiments of the present disclosure can deform the tooth model more accurately and obtain a tooth model that meets specific requirements more accurately.
  • the encoder component 21 includes three encoders in a serial structure (encoder 211, encoder 212, encoder 213); each encoder includes a residual unit (residual unit E1, residual unit E2, residual unit E3) and a down-sampling unit (down-sampling unit Do1, down-sampling unit Do2, down-sampling unit Do3); the residual unit of each encoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit, and the down-sampling unit of each encoder (down-sampling unit Do1, down-sampling unit Do2, down-sampling unit Do3) is used to down-sample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor.
  • the input of each residual unit is the input of the corresponding encoder (the input of the residual unit E1 is the input of the encoder 211, the input of the residual unit E2 is the input of the encoder 212, and the input of the residual unit E3 is the input of the encoder 213).
  • the input of each down-sampling unit is the output of the residual unit of the corresponding encoder (the input of the down-sampling unit Do1 is the output of the residual unit E1 of the encoder 211, the input of the down-sampling unit Do2 is the output of the residual unit E2 of the encoder 212, and the input of the down-sampling unit Do3 is the output of the residual unit E3 of the encoder 213), and the output of each down-sampling unit is the output of the corresponding encoder (the output of the down-sampling unit Do1 is the output of the encoder 211, the output of the down-sampling unit Do2 is the output of the encoder 212, and the output of the down-sampling unit Do3 is the output of the encoder 213).
  • the input of the first encoder 211 is the input of the encoder component 21, the output of the third encoder 213 is the output of the encoder component 21, and the inputs of the second encoder 212 and the third encoder 213 are the outputs of the first encoder 211 and the second encoder 212, respectively.
  • the input of the encoder component 21, the input of the first encoder 211, and the input of the residual unit E1 are the same input, and the output of the residual unit E1 is the input of the down-sampling unit Do1.
  • the output of the down-sampling unit Do1 is the output of the first encoder 211 .
  • the input of the second encoder 212 and the input of the residual unit E2 are the same input, and both are outputs of the first encoder 211 .
  • the output of the residual unit E2 is the input of the down-sampling unit Do2.
  • the output of the downsampling unit Do2 is the output of the second encoder 212.
  • the input of the third encoder 213 and the input of the residual unit E3 are the same input, and both are the output of the second encoder 212 .
  • the output of the residual unit E3 is the input of the down-sampling unit Do3.
  • the output of the down-sampling unit Do3 is the output of the third encoder 213, which is the output of the encoder component 21.
  • the convolution kernels of the three convolutional layers of the residual unit of each encoder are all 3×3×3, and the length, width, and height of the feature tensor output by the three convolutional layers of the residual unit of each encoder are the same as the length, width, and height of the input feature tensor.
  • the number of channels of the feature tensor output by the residual unit of the first encoder is 16 times the number of channels of the input feature tensor.
  • the number of channels of the feature tensor output by the residual unit of the second encoder and the residual unit of the third encoder is the same as the number of channels of the input feature tensor.
  • Each down-sampling unit is a convolutional layer with a stride of 2 and a convolution kernel of 2 ⁇ 2 ⁇ 2.
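  • The encoder structure described above maps directly onto 3-D convolutions. The PyTorch sketch below follows the kernel sizes and strides given above; the ReLU activations are an assumption (the patent does not name activations or normalization), and the 16-fold channel expansion of the first encoder's residual unit is omitted for brevity.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Three serial 3x3x3 convolutions plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # Sum of the convolution result and the unit's own input.
        return x + self.convs(x)

class Encoder(nn.Module):
    """Residual unit, then a stride-2 / kernel-2x2x2 down-sampling convolution
    that doubles the channels and halves length, width and height."""
    def __init__(self, channels):
        super().__init__()
        self.res = ResidualUnit(channels)
        self.down = nn.Conv3d(channels, 2 * channels, kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.res(x)  # also fed to the matching decoder's fusion unit
        return self.down(skip), skip
```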
  • the self-attention component 22 includes: a residual unit E4, a first convolutional layer Co1, a second convolutional layer Co2, a third convolutional layer Co3, and a fourth convolutional layer Co4 , the fifth convolutional layer Co5, the sixth convolutional layer Co6, the first dot product unit Pro1, the second dot product unit Pro2, the first summation unit Add1, and the second summation unit Add2.
  • the residual unit E4 is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the first dot product unit Pro1 and the second dot product unit Pro2 are used to perform a dot product operation on the input feature tensors, and the first summation unit Add1 and the second summation unit Add2 perform a sum operation on the input feature tensors.
  • the input of the residual unit E4 is the output of the encoder component 21 (the output of the downsampling unit Do3 of the third encoder 213 of the encoder component 21), and the output of the residual unit E4 is the An input to the convolutional layer Co1.
  • the output of the first convolutional layer Co1 is the input of the second convolutional layer Co2, the third convolutional layer Co3 and the fourth convolutional layer Co4.
  • the input of the first dot product unit Pro1 is the output of the second convolutional layer Co2 and the output of the third convolutional layer Co3.
  • the input of the second dot product unit Pro2 is the output of the first dot product unit Pro1 and the output of the fourth convolution layer Co4.
  • the input of the fifth convolutional layer Co5 is the output of the second dot product unit Pro2.
  • the input of the first summing unit Add1 is the output of the fifth convolutional layer Co5 and the output of the first convolutional layer Co1.
  • the input of the sixth convolutional layer Co6 is the output of the first summing unit Add1.
  • the inputs of the second summing unit Add2 are the output of the sixth convolutional layer Co6 and the output of the residual unit E4.
  • the output of the second summing unit Add2 is the output of the self-attention component 22 .
  • the convolution kernels of the first, second, third, fourth, fifth, and sixth convolutional layers are all 1×1×1.
  • the length, width and height of the output feature tensor of the first convolution layer are the same as the length, width and height of the input feature tensor.
  • the number of channels of the output feature tensor of the first convolution layer is one eighth of the number of channels of the input feature tensor.
  • the number of channels of the output feature tensor of the second convolutional layer, the third convolutional layer, and the fourth convolutional layer is half of the number of channels of the respective input feature tensor.
  • the number of channels of the output feature tensor of the fifth convolutional layer is twice the number of channels of its input feature tensor.
  • the number of channels of the output feature tensor of the sixth convolutional layer is eight times the number of channels of its input feature tensor.
  • denote the feature tensor output by the residual unit E4 as X ∈ R^(C×H×W×L), where C is the number of channels of the output feature tensor of the residual unit E4 and H, W, L are its height, width, and length, respectively; the feature tensor output by the first convolutional layer Co1 is X1 ∈ R^(C1×H×W×L) with C1 = C/8; the feature tensor output by the second convolutional layer Co2 is X2 ∈ R^(C2×H×W×L) with C2 = C1/2; and the feature tensor output by the third convolutional layer Co3 is X3 ∈ R^(C3×H×W×L) with C3 = C1/2.
  • denote the labels of the i-th voxel in X2 and the j-th voxel in X3 by x_i ∈ R^1 and x_j ∈ R^1, respectively, and let X2(x_i) ∈ R^(C2) denote the feature vector of the x_i-th voxel in X2 and X3(x_j) ∈ R^(C3) the feature vector of the x_j-th voxel in X3; the attention distribution is then computed over the pairwise dot products X2(x_i)·X3(x_j).
  • the fifth convolutional layer Co5 outputs the feature tensor X5; the first summation unit Add1 performs a sum operation on X5 and X1 and outputs the feature tensor res1 ∈ R^(C1×H×W×L); the sixth convolutional layer Co6 outputs the feature tensor X6 ∈ R^(C×H×W×L); and the second summation unit Add2 performs a sum operation on X and X6 to obtain the final environment feature tensor res2 ∈ R^(C×H×W×L).
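  • Read end to end, the component is a 3-D non-local attention block. The sketch below (reusing ResidualUnit from the encoder sketch) follows the Co1-Co6, Pro1/Pro2, and Add1/Add2 wiring above; the softmax normalization of the attention distribution is an assumption, since the formula itself is not reproduced in this text, and the full (H·W·L)² attention matrix is only practical at the small resolutions reached after three down-samplings.

```python
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    def __init__(self, c):
        super().__init__()
        c1 = c // 8
        self.co1 = nn.Conv3d(c, c1, 1)        # C -> C/8
        self.co2 = nn.Conv3d(c1, c1 // 2, 1)  # "query" branch
        self.co3 = nn.Conv3d(c1, c1 // 2, 1)  # "key" branch
        self.co4 = nn.Conv3d(c1, c1 // 2, 1)  # "value" branch
        self.co5 = nn.Conv3d(c1 // 2, c1, 1)  # twice its input channels
        self.co6 = nn.Conv3d(c1, c, 1)        # eight times its input channels
        self.res = ResidualUnit(c)            # residual unit E4

    def forward(self, x):
        x = self.res(x)                       # E4 output, shape (B, C, H, W, L)
        b, _, h, w, l = x.shape
        x1 = self.co1(x)
        q = self.co2(x1).flatten(2)           # (B, C1/2, N) with N = H*W*L
        k = self.co3(x1).flatten(2)
        v = self.co4(x1).flatten(2)
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)           # Pro1
        out = (v @ attn.transpose(1, 2)).reshape(b, -1, h, w, l)  # Pro2
        res1 = self.co5(out) + x1             # Add1
        return self.co6(res1) + x             # Add2: environment feature tensor
```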
  • the feature transfer component 23 includes a downsampling unit Do4 and a residual unit E5;
  • the downsampling unit Do4 is used for down-sampling the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor.
  • the residual unit E5 is configured to perform a convolution operation on the input of the residual unit through three convolutional layers in a concatenated structure, and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit.
  • the input of the downsampling unit Do4 is the output of the self-attention component 22
  • the output of the downsampling unit Do4 is the input of the residual unit E5
  • the output of the residual unit E5 is the feature transfer Output of component 23.
  • the downsampling unit Do4 is a convolution layer with a stride of 2 and a convolution kernel of 2 ⁇ 2 ⁇ 2.
  • the convolution kernels of the three convolutional layers of the residual unit E5 are all 3×3×3; the length, width, and height of the output feature tensor of each of the three convolutional layers are the same as those of the input feature tensor, and the number of channels of the output feature tensor of the residual unit E5 is the same as the number of channels of the input feature tensor.
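  • In code, the feature transfer component is simply a composition of the two block types already sketched for the encoder:

```python
import torch.nn as nn

class FeatureTransfer(nn.Module):
    """Down-sampling unit Do4 followed by residual unit E5."""
    def __init__(self, c):
        super().__init__()
        # Stride-2, kernel-2x2x2 convolution: doubles channels, halves H/W/L.
        self.down = nn.Conv3d(c, 2 * c, kernel_size=2, stride=2)
        self.res = ResidualUnit(2 * c)

    def forward(self, x):
        return self.res(self.down(x))
```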
  • the multi-scale analysis component 24 includes: the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, the tenth convolutional layer Co10, the eleventh convolutional layer Co11, the twelfth convolutional layer Co12, and the splicing unit MON.
  • the expansion rates of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are all different; the splicing unit MON is used to perform a splicing operation on the input feature tensors.
  • the inputs of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are the outputs of the feature transfer component 23; the inputs of the splicing unit MON are the output of the feature transfer component 23, the output of the seventh convolutional layer Co7, the output of the eighth convolutional layer Co8, the output of the ninth convolutional layer Co9, and the output of the tenth convolutional layer Co10; the input of the eleventh convolutional layer Co11 is the output of the splicing unit MON; the input of the twelfth convolutional layer Co12 is the output of the eleventh convolutional layer Co11; and the output of the twelfth convolutional layer Co12 is the output of the multi-scale analysis component 24.
  • the convolution kernel of the seventh convolution layer is 1 ⁇ 1 ⁇ 1
  • the number of channels of the output feature tensor is the same as the number of channels of the input feature tensor
  • the expansion rate is 1.
  • the convolution kernels of the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all 3×3×3; the number of channels of the output feature tensor is the same as that of the input feature tensor, and the expansion rates are 2, 3, and 4, respectively.
  • the convolution kernel of the eleventh convolutional layer is 3×3×3, and the number of channels of its output feature tensor is one fifth of the number of channels of its input feature tensor; the convolution kernel of the twelfth convolutional layer is 3×3×3, and the number of channels of its output feature tensor is the same as the number of channels of its input feature tensor.
  • denote the feature tensor output by the feature transfer component 23 as A ∈ R^(C×H×W×L); then the seventh convolutional layer Co7 outputs the feature tensor A1 ∈ R^(C×H×W×L), the eighth convolutional layer Co8 outputs A2 ∈ R^(C×H×W×L), the ninth convolutional layer Co9 outputs A3 ∈ R^(C×H×W×L), and the tenth convolutional layer Co10 outputs A4 ∈ R^(C×H×W×L); the splicing unit MON outputs the feature tensor Cat ∈ R^(5C×H×W×L); the eleventh convolutional layer Co11 outputs Cat1 ∈ R^(C×H×W×L); and the twelfth convolutional layer Co12 outputs Cat2 ∈ R^(C×H×W×L).
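  • This is an ASPP-style block of parallel dilated 3-D convolutions. The sketch below follows the kernel sizes, expansion (dilation) rates, and channel counts listed above; the padding values are inferred so that every branch keeps the input's spatial size, which the concatenation requires.

```python
import torch
import torch.nn as nn

class MultiScaleAnalysis(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.co7 = nn.Conv3d(c, c, 1)                          # 1x1x1, rate 1
        self.co8 = nn.Conv3d(c, c, 3, padding=2, dilation=2)   # 3x3x3, rate 2
        self.co9 = nn.Conv3d(c, c, 3, padding=3, dilation=3)   # 3x3x3, rate 3
        self.co10 = nn.Conv3d(c, c, 3, padding=4, dilation=4)  # 3x3x3, rate 4
        self.co11 = nn.Conv3d(5 * c, c, 3, padding=1)          # 5C -> C (one fifth)
        self.co12 = nn.Conv3d(c, c, 3, padding=1)

    def forward(self, a):
        # Splicing unit MON: concatenate the input with all four branches.
        cat = torch.cat([a, self.co7(a), self.co8(a),
                         self.co9(a), self.co10(a)], dim=1)
        return self.co12(self.co11(cat))
```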
  • the decoder component 25 includes four decoders in a serial structure (decoder 251, decoder 252, decoder 253, and decoder 254); each decoder includes an upsampling unit (upsampling unit Up1 of decoder 251, upsampling unit Up2 of decoder 252, upsampling unit Up3 of decoder 253, upsampling unit Up4 of decoder 254), a fusion unit (fusion unit F1 of decoder 251, fusion unit F2 of decoder 252, fusion unit F3 of decoder 253, fusion unit F4 of decoder 254), and a residual unit (residual unit E6 of decoder 251, residual unit E7 of decoder 252, residual unit E8 of decoder 253, residual unit E9 of decoder 254).
  • the residual unit of each decoder is used to perform a convolution operation on the input of the residual unit through three convolutional layers in a serial structure and to perform a sum operation on the convolution result and the input of the residual unit; the fusion unit of each decoder is used to perform a fusion operation on the input feature tensors; the upsampling unit of each decoder is used to up-sample the input feature tensor into an output feature tensor whose number of channels is half the number of channels of the input feature tensor and whose length, width, and height are twice the length, width, and height of the input feature tensor.
  • the input of the upsampling unit Up1 of the first decoder 251 of the decoder component is the output of the multi-scale analysis component 24; the inputs of the fusion unit F1 of the first decoder 251 are the output of the upsampling unit Up1 of the first decoder 251 and the output of the self-attention component 22; and the inputs of the residual unit E6 of the first decoder 251 are the output of the fusion unit F1 of the first decoder 251 and the output of the self-attention component 22.
  • the inputs of the up-sampling units of the second decoder 252, the third decoder 253, and the fourth decoder 254 of the decoder component (up-sampling unit Up2, up-sampling unit Up3, up-sampling unit Up4) are the outputs of the previous decoder; the inputs of the fusion units of the second decoder 252, the third decoder 253, and the fourth decoder 254 (fusion unit F2, fusion unit F3, fusion unit F4) are the output of the up-sampling unit of the corresponding decoder and the output of the residual unit of the corresponding encoder; and the input of the residual unit of each of these decoders is the output of the fusion unit of the corresponding decoder.
  • the convolution kernels of the three convolutional layers of the residual unit of each decoder are all 3×3×3; the length, width, and height of the feature tensor output by the three convolutional layers of the residual unit of each decoder are the same as the length, width, and height of the input feature tensor, and the number of channels of the feature tensor output by the residual unit of each decoder is the same as the number of channels of the input feature tensor.
  • Each upsampling unit is a deconvolution layer with a stride of 2 and a convolution kernel of 2 ⁇ 2 ⁇ 2.
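  • Putting the three units together, one decoder stage can be sketched as below; FusionUnit is sketched after the fusion-unit description that follows, and the channel bookkeeping (fusion halving the channels again) follows the Co13/Co14 counts given there.

```python
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, c_in):
        super().__init__()
        # Deconvolution: halves the channels, doubles length, width and height.
        self.up = nn.ConvTranspose3d(c_in, c_in // 2, kernel_size=2, stride=2)
        self.fuse = FusionUnit(c_in // 2)   # sketched below
        self.res = ResidualUnit(c_in // 4)  # fusion halves the channels again

    def forward(self, x, skip):
        # skip: the matching encoder residual-unit output (or, for the first
        # decoder, the output of the self-attention component).
        return self.res(self.fuse(self.up(x), skip))
```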
  • the fusion unit of each decoder includes: the thirteenth convolutional layer Co13, the fourteenth convolutional layer Co14, the fifteenth convolutional layer Co15, the third summation unit Add3, and the fourth summation Unit Add4, third dot product unit Pro3, and fourth dot product unit Pro4.
  • the third adding unit Add3 and the fourth adding unit Add4 are used to perform an adding operation on the input, and the third dot product unit Pro3 and the fourth dot product unit Pro4 are used to perform a dot product operation on the input.
  • the inputs of the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14 of the fusion unit F1 of the first decoder 251 of the decoder component 25 are, respectively, the output of the upsampling unit Up1 of the first decoder 251 and the output of the self-attention component 22; the inputs of the fusion units of the second decoder 252, the third decoder 253, and the fourth decoder 254 (fusion unit F2, fusion unit F3, fusion unit F4) are the output of the upsampling unit of the corresponding decoder and the output of the residual unit of the corresponding encoder (the inputs of the fusion unit F2 are the output of the residual unit E3 of the encoder 213 and the output of the upsampling unit Up2 of the decoder 252; the inputs of the fusion unit F3 are the output of the residual unit E2 of the encoder 212 and the output of the upsampling unit Up3 of the decoder 253; and the inputs of the fusion unit F4 are the output of the residual unit E1 of the encoder 211 and the output of the upsampling unit Up4 of the decoder 254).
  • the input of the third summing unit Add3 is the output of the thirteenth convolutional layer Co13 and the output of the fourteenth convolutional layer Co14
  • the The input of the fifteenth convolutional layer Co15 is the output of the third summing unit Add3
  • the input of the third dot product unit Pro3 is the output of the thirteenth convolutional layer Co13 and the fifteenth convolutional layer Co15
  • the input of the fourth dot product unit Pro4 is the output of the fourteenth convolutional layer Co14 and the output of the fifteenth convolutional layer Co15
  • the fusion unit performs convolution operations on the two input feature tensors Ai and Bi through the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14, respectively, to obtain the dimension-reduced feature tensors Ci and Di; the third summing unit Add3 then performs a sum operation on Ci and Di to fuse them, and the fusion result is sent to the fifteenth convolutional layer Co15 to obtain a weight map; the third dot product unit Pro3 performs a dot multiplication of the output of Co13 and the weight map to obtain the result Gi, and the fourth dot product unit Pro4 performs a dot multiplication of the output of Co14 and the weight map to obtain the result Hi; finally, the fourth summing unit Add4 performs a sum operation on Gi and Hi to obtain the fused feature Zi containing the output feature of the encoder and the output feature of the decoder.
  • the convolution kernels of the thirteenth convolutional layer, the fourteenth convolutional layer, and the fifteenth convolutional layer are all 1×1×1; the number of channels of the output feature tensors of the thirteenth and fourteenth convolutional layers is half the number of channels of their input feature tensors, and the number of channels of the output feature tensor of the fifteenth convolutional layer is 1.
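To make the wiring above concrete, the following is a minimal sketch of the fusion unit written in PyTorch; the original does not name a framework, so the framework choice, the module and variable names, and the reading of the dot product units as elementwise multiplication are all assumptions.

```python
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    """Fuses a decoder feature tensor A_i with an encoder feature tensor B_i.

    Per the description above: Co13 and Co14 are 1x1x1 convolutions that
    halve the channel count (A_i -> C_i, B_i -> D_i); Add3 sums C_i and D_i;
    Co15 maps the sum to a single-channel weight tensor; Pro3 and Pro4
    multiply C_i and D_i by that weight tensor to give G_i and H_i; and
    Add4 sums G_i and H_i into the fused feature Z_i.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.co13 = nn.Conv3d(channels, channels // 2, kernel_size=1)
        self.co14 = nn.Conv3d(channels, channels // 2, kernel_size=1)
        self.co15 = nn.Conv3d(channels // 2, 1, kernel_size=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        c = self.co13(a)            # C_i, half the channels of A_i
        d = self.co14(b)            # D_i, half the channels of B_i
        w = self.co15(c + d)        # Add3 followed by Co15: weight tensor
        g = c * w                   # Pro3 (broadcast over the weight map)
        h = d * w                   # Pro4
        return g + h                # Add4: fused feature Z_i
```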
  • the above step S104 (optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental model deformation model) includes:
  • the loss function includes:
  • alpha is a constant
  • out_i is the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the decoders of the decoder component
  • seg is an intermediate supervision signal
  • out_1 is obtained by performing a convolution operation on the output of the multi-scale analysis component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, expanding the length, width, and height of the resulting feature tensor by a factor of 16 through trilinear interpolation, and then performing the sigmoid operation on the interpolation result;
  • out_2 is obtained in the same way from the output of the first decoder of the decoder component, with the trilinear interpolation expanding the length, width, and height by a factor of 8;
  • out_3 is obtained in the same way from the output of the second decoder of the decoder component, with the trilinear interpolation expanding the length, width, and height by a factor of 4;
  • out_4 is obtained in the same way from the output of the third decoder of the decoder component, with the trilinear interpolation expanding the length, width, and height by a factor of 2;
  • out_5 is obtained by performing a convolution operation on the output of the fourth decoder of the decoder component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, and then performing the sigmoid operation on the resulting feature tensor (no interpolation is applied).
  • alpha is 0.25.
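The five supervision outputs are fully determined by the bullets above: a 1×1×1 convolution to a single channel, trilinear upsampling by a factor of 16, 8, 4, 2, or 1, then a sigmoid. Below is a minimal PyTorch sketch; the framework and all names are assumed, and the loss itself, which compares each out_i against seg with weight alpha, appears only as equation images in the original and is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisionHead(nn.Module):
    """1x1x1 conv to one channel, optional trilinear upsampling, sigmoid."""
    def __init__(self, in_channels: int, scale: int):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, 1, kernel_size=1)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(x)
        if self.scale > 1:
            y = F.interpolate(y, scale_factor=self.scale,
                              mode="trilinear", align_corners=False)
        return torch.sigmoid(y)

# out_1..out_5 use upsampling factors 16, 8, 4, 2, and 1 (none), applied to
# the multi-scale analysis output and the four decoder outputs, respectively.
```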
  • an embodiment of the present disclosure further provides an apparatus for establishing a dental model deformation model; this apparatus embodiment corresponds to the foregoing method embodiment.
  • this apparatus embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the apparatus for establishing a dental model deformation model in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 9 is a schematic structural diagram of an apparatus for establishing a dental mold deformation model provided by an embodiment of the present disclosure.
  • the apparatus 900 for establishing a dental mold deformation model provided in this embodiment includes:
  • the sample acquisition unit 91 is configured to acquire sample data, the sample data including a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;
  • the preprocessing unit 92 is used to obtain the feature tensor corresponding to each initial tooth model, where each element of the feature tensor corresponding to an initial tooth model is the truncated signed distance function (TSDF) value of a voxel in the cubic space where that initial tooth model is located (a sketch of such a TSDF computation is given after this unit list);
  • the prediction unit 93 is used to input the feature tensor corresponding to each initial tooth model into the preset network model, and obtain the predicted deformation model corresponding to each initial tooth model;
  • the optimization unit 94 is configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the tooth model deformation model.
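As an illustration of the preprocessing unit's job, the following sketch derives a TSDF volume from a binary occupancy grid using two Euclidean distance transforms. The truncation distance, the normalization to [-1, 1], and the starting point of an occupancy grid (rather than the scanned mesh itself) are assumptions; the text only states that each element is a TSDF value.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def occupancy_to_tsdf(occ: np.ndarray, trunc: float = 3.0) -> np.ndarray:
    """Converts a binary occupancy grid (1 = inside the tooth model) into a
    truncated signed distance volume, positive outside and negative inside."""
    dist_outside = distance_transform_edt(occ == 0)  # distance to the model
    dist_inside = distance_transform_edt(occ == 1)   # distance to free space
    sdf = dist_outside - dist_inside
    return np.clip(sdf, -trunc, trunc) / trunc       # truncate and normalize
```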
  • the preset network model includes: an encoder component composed of multiple encoders in a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of multiple decoders in a serial structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component, the output of the multi-scale analysis component is the input of the decoder component, and the output of the decoder component is the output of the preset network model;
  • the self-attention component is used to extract non-local information on the feature tensor output by the encoder component to obtain an environment feature tensor;
  • the feature transfer component processes the output of the self-attention component and transmits the processing result to the multi-scale analysis component;
  • the multi-scale analysis component is used to extract, at multiple scales, feature tensors from the feature tensor output by the feature transfer component.
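The component chain in the bullets above can be summarized as a single forward pass. The sketch below shows only the top-level wiring, with the components injected as opaque modules; PyTorch and all names are assumptions, and the skip connections between encoders and decoders are omitted for brevity.

```python
import torch.nn as nn

class PresetNetwork(nn.Module):
    """Encoder -> self-attention -> feature transfer -> multi-scale
    analysis -> decoders, as described above."""
    def __init__(self, encoder, attention, transfer, multiscale, decoder):
        super().__init__()
        self.encoder = encoder
        self.attention = attention
        self.transfer = transfer
        self.multiscale = multiscale
        self.decoder = decoder

    def forward(self, x):
        e = self.encoder(x)                   # encoder component output
        a = self.attention(e)                 # environment feature tensor
        m = self.multiscale(self.transfer(a))
        return self.decoder(m)                # decoders also consume skips
```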
  • the encoder component includes three encoders in a serial structure, and each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit, and the down-sampling unit of each encoder is used to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor;
  • the input of each residual unit is the input of the encoder to which it belongs
  • the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs
  • the output of each down-sampling unit is the output of the encoder to which it belongs.
  • the input of the first encoder is the input of the encoder component
  • the output of the third encoder is the output of the encoder component
  • the inputs of the second encoder and the third encoder are, respectively, the outputs of the first encoder and the second encoder.
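A sketch of one encoder stage as just described: a residual unit of three 3×3×3 convolutions whose result is summed with the input, followed by a downsampling unit that doubles the channels and halves each spatial dimension. The text does not fix how downsampling is realized or which activations are used, so the stride-2 convolution and the ReLUs below are assumptions.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Three concatenated 3x3x3 convolutions; the result is summed with the
    unit's input, so output shape and channel count match the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class Encoder(nn.Module):
    """Residual unit, then downsampling that doubles the channel count and
    halves the length, width, and height (stride-2 conv assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.res = ResidualUnit(channels)
        self.down = nn.Conv3d(channels, channels * 2, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.res(x))
```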
  • the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit;
  • the residual unit is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit; the first dot product unit and the second dot product unit are used to perform a dot product operation on the input feature tensors, and the first summation unit and the second summation unit are used to perform a summation operation on the input feature tensors;
  • the input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the input of the second, third, and fourth convolutional layers; the input of the first dot product unit is the output of the second convolutional layer and the output of the third convolutional layer; the input of the second dot product unit is the output of the first dot product unit and the output of the fourth convolutional layer; the input of the fifth convolutional layer is the output of the second dot product unit; the input of the first summation unit is the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summation unit; the input of the second summation unit is the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summation unit is the output of the self-attention component.
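The wiring above is that of a non-local (self-attention) block. The sketch below realizes the dot product units as batched matrix products over flattened voxel positions with a softmax normalization; the softmax, the channel reduction to C/2, and all names are assumptions, since the text only names the convolutions, dot products, and summations.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):  # same as in the encoder sketch above
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class SelfAttention3D(nn.Module):
    """Residual unit, then Co1; Co2/Co3/Co4 act as query/key/value, Pro1 and
    Pro2 are the two dot products, and Add1/Co6/Add2 close the block."""
    def __init__(self, channels: int):
        super().__init__()
        self.res = ResidualUnit(channels)
        self.co1 = nn.Conv3d(channels, channels, kernel_size=1)
        self.co2 = nn.Conv3d(channels, channels // 2, kernel_size=1)
        self.co3 = nn.Conv3d(channels, channels // 2, kernel_size=1)
        self.co4 = nn.Conv3d(channels, channels // 2, kernel_size=1)
        self.co5 = nn.Conv3d(channels // 2, channels, kernel_size=1)
        self.co6 = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):
        r = self.res(x)
        f = self.co1(r)
        b, c, d, h, w = f.shape
        n = d * h * w
        q = self.co2(f).reshape(b, -1, n)
        k = self.co3(f).reshape(b, -1, n)
        v = self.co4(f).reshape(b, -1, n)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)           # Pro1
        out = (v @ attn.transpose(1, 2)).reshape(b, c // 2, d, h, w)  # Pro2
        y = self.co6(self.co5(out) + f)                               # Add1, Co6
        return y + r                                                  # Add2
```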
  • the feature transfer component includes a downsampling unit and a residual unit
  • the downsampling unit is used to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor;
  • the input of the downsampling unit is the output of the self-attention component
  • the output of the downsampling unit is the input of the residual unit
  • the output of the residual unit is the output of the feature transfer component
  • the multi-scale analysis component includes: a seventh convolutional layer, an eighth convolutional layer, a ninth convolutional layer, a tenth convolutional layer, an eleventh convolutional layer, a twelfth convolutional layer, and a splicing unit; the dilation rates of the seventh, eighth, ninth, and tenth convolutional layers are all different; the splicing unit is used to perform a concatenation operation on the input feature tensors;
  • the inputs of the seventh, eighth, ninth, and tenth convolutional layers are all the output of the feature transfer component, and the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer;
  • the input of the eleventh convolutional layer is the output of the splicing unit
  • the input of the twelfth convolutional layer is the output of the eleventh convolutional layer
  • the output of the twelfth convolutional layer is the output of the multi-scale analysis component.
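This component is an atrous-spatial-pyramid-style block: four parallel dilated convolutions over the same input, concatenated with that input and fused by two further convolutions. A sketch follows; the specific dilation rates and kernel sizes are assumptions, as the text only requires the four rates to differ.

```python
import torch
import torch.nn as nn

class MultiScaleAnalysis(nn.Module):
    """Co7-Co10: parallel dilated convolutions with distinct rates; the
    splicing unit concatenates their outputs with the input; Co11 and Co12
    fuse the concatenation into the component's output."""
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        ])
        self.co11 = nn.Conv3d(channels * (len(rates) + 1), channels,
                              kernel_size=1)
        self.co12 = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x] + [branch(x) for branch in self.branches]
        return self.co12(self.co11(torch.cat(feats, dim=1)))
```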
  • the decoder component includes four decoders in a serial structure; each decoder includes an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit; the fusion unit of each decoder is used to perform a fusion operation on the input feature tensors; and the upsampling unit of each decoder is used to upsample the input feature tensor into an output feature tensor whose number of channels is half that of the input feature tensor and whose length, width, and height are twice those of the input feature tensor;
  • the input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component, and the input of the fusion unit of the first decoder is the output of the upsampling unit of the first decoder and the output of the self-attention component; the input of the residual unit of the first decoder is the output of the fusion unit of the first decoder and the output of the self-attention component;
  • the inputs of the upsampling units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the outputs of the respective previous decoders; the inputs of the fusion units of the second, third, and fourth decoders are, respectively, the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the decoder to which they belong; and the inputs of the residual units of the second, third, and fourth decoders are, respectively, the output of the residual unit of the corresponding encoder and the output of the fusion unit of the decoder to which they belong;
  • the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit; the third summation unit and the fourth summation unit are used to perform a summation operation on the input, and the third dot product unit and the fourth dot product unit are used to perform a dot product operation on the input;
  • the inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are, respectively, the output of the upsampling unit of the first decoder and the output of the self-attention component; the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the upsampling unit of the decoder to which they belong and the output of the residual unit of the corresponding encoder; the input of the third summation unit is the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer; the input of the fifteenth convolutional layer is the output of the third summation unit; the input of the third dot product unit is the output of the thirteenth convolutional layer and the output of the fifteenth convolutional layer; the input of the fourth dot product unit is the output of the fourteenth convolutional layer and the output of the fifteenth convolutional layer; the input of the fourth summation unit is the output of the third dot product unit and the output of the fourth dot product unit; and the output of the fourth summation unit is the output of the fusion unit to which it belongs.
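Putting the pieces together, one decoder stage chains the deconvolution upsampler (stride 2, 2×2×2 kernel, as stated earlier), the fusion unit, and a residual unit. The sketch below reuses the FusionUnit and ResidualUnit classes from the sketches above; since the text lists both the fused feature and the skip feature as inputs to the decoder's residual unit without saying how they are combined, the concatenation plus 1×1×1 projection here is an assumption.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """Upsampling (deconvolution), fusion with the encoder skip feature,
    then a residual unit, per the bullets above."""
    def __init__(self, in_channels: int):
        super().__init__()
        out_c = in_channels // 2
        self.up = nn.ConvTranspose3d(in_channels, out_c,
                                     kernel_size=2, stride=2)
        self.fuse = FusionUnit(out_c)        # defined in the earlier sketch
        # The fused feature has out_c // 2 channels; combine it with the
        # skip feature (out_c channels) and project back to out_c (assumed).
        self.proj = nn.Conv3d(out_c // 2 + out_c, out_c, kernel_size=1)
        self.res = ResidualUnit(out_c)       # defined in the earlier sketch

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        u = self.up(x)             # half the channels, double each dimension
        z = self.fuse(u, skip)     # fused feature Z_i
        return self.res(self.proj(torch.cat([z, skip], dim=1)))
```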
  • the optimization unit 94 is specifically configured to construct a loss function, and to optimize the preset network model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, so as to obtain the dental model deformation model;
  • the loss function includes:
  • alpha is a constant
  • out_i is the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the decoders of the decoder component
  • seg is the intermediate supervision signal
  • mean() is the averaging function.
  • the apparatus for establishing a dental model deformation model provided in this embodiment can execute the method for training a dental model deformation model provided by the above method embodiments, and the implementation principle and technical effect thereof are similar, and are not repeated here.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device provided by this embodiment includes: a memory 101 and a processor 102.
  • the memory 101 is used to store a computer program; the processor 102 is used to execute, when the computer program is invoked, each step in the method for training a dental model deformation model provided by the above method embodiments.
  • the memory 101 can be used to store software programs and various data.
  • the memory 101 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system and application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the stored data area may store data created according to the use of the device (such as audio data, a phone book, etc.).
  • the memory 101 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the processor 102 is the control center of the electronic device; it connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 101 and calling the data stored in the memory 101, thereby monitoring the electronic device as a whole.
  • Processor 102 may include one or more processing units.
  • the electronic device may further include components such as a radio frequency unit, a network module, an audio output unit, a sensor, a signal receiving unit, a display, a user receiving unit, an interface unit, and a power supply.
  • the structure of the electronic device described above does not constitute a limitation on the electronic device, and the electronic device may include more or less components, or combine some components, or arrange different components.
  • electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, handheld computers, vehicle-mounted terminals, wearable devices, and pedometers.
  • the radio frequency unit can be used for receiving and sending signals during the sending and receiving of information or during a call; specifically, downlink data received from the base station is processed by the processor 102, and uplink data is sent to the base station.
  • a radio frequency unit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit can also communicate with the network and other devices through the wireless communication system.
  • Electronic devices provide users with wireless broadband Internet access through network modules, such as helping users send and receive e-mails, browse web pages, and access streaming media.
  • the audio output unit may convert audio data received by the radio frequency unit or the network module or stored in the memory 101 into audio signals and output as sound. Also, the audio output unit may also provide audio output related to a specific function performed by the electronic device (eg, call signal reception sound, message reception sound, etc.).
  • the audio output unit includes speakers, buzzers, and receivers.
  • the signal receiving unit is used to receive audio or video signals.
  • the signal receiving unit may include a graphics processing unit (GPU) and a microphone; the graphics processing unit processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the processed image frames can be displayed on the display unit.
  • the image frames processed by the graphics processor may be stored in memory (or other storage medium) or transmitted via a radio frequency unit or a network module.
  • the microphone can receive sound and can process such sound into audio data.
  • the processed audio data can be converted into a format that can be transmitted to a mobile communication base station via a radio frequency unit for output in the case of a telephone call mode.
  • the electronic device also includes at least one sensor, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor can turn off the display panel and/or the backlight when the electronic device is moved to the ear .
  • the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes) and can detect the magnitude and direction of gravity when stationary, which can be used to identify the posture of the electronic device (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) and for vibration recognition related functions (such as pedometer, tapping); the sensors can also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be repeated here.
  • the display unit is used to display information input by the user or information provided to the user.
  • the display unit may include a display panel, and the display panel may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), and the like.
  • the user receiving unit can be used for receiving inputted numerical or character information, and generating key signal input related to user setting and function control of the electronic device.
  • the user receiving unit includes a touch panel and other input devices.
  • a touch panel also known as a touch screen, collects user touch operations on or near it (such as a user's operations on or near the touch panel using a finger, stylus, or any suitable object or accessory).
  • the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 102, and receives and executes the commands sent by the processor 102.
  • the touch panel can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types.
  • the user receiving unit may also include other input devices.
  • other input devices may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be described herein again.
  • the touch panel can be overlaid on the display panel; when the touch panel detects a touch operation on or near it, it transmits the operation to the processor 102 to determine the type of the touch event, and the processor 102 then provides a corresponding output on the display panel according to the type of the touch event.
  • the touch panel and the display panel are described here as two independent components that realize the input and output functions of the electronic device, but in some embodiments the touch panel and the display panel can be integrated to realize the input and output functions of the electronic device, which is not specifically limited here.
  • the interface unit is an interface for connecting an external device with an electronic device.
  • external devices may include wired or wireless headset ports, external power (or battery charger) ports, wired or wireless data ports, memory card ports, ports for connecting devices with identification modules, audio input/output (I/O) ports, video I/O ports, headphone ports, and more.
  • the interface unit may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements in the electronic device, or may be used to transfer data between the electronic device and the external device.
  • the electronic device may also include a power supply (such as a battery) for supplying power to the various components; the power supply may be logically connected to the processor 102 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for training a dental model deformation model provided by the foregoing method embodiments is implemented.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
  • computer-readable media include both permanent and non-permanent, removable and non-removable storage media.
  • a storage medium can be implemented by any method or technology for storing information, and the information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the method for training a dental model deformation model provided by the embodiments of the present disclosure can obtain a dental model deformation model, and a tooth model that meets the requirements of a specific product can be obtained by deforming an initial tooth model to be deformed according to the dental model deformation model. Therefore, the method for training a dental model deformation model provided by the embodiments of the present disclosure and the dental model deformation model obtained thereby can automatically convert an initial tooth model into a tooth model that meets specific product requirements, and thus have strong industrial practicability.


Abstract

Provided in the embodiments of the present disclosure are a method and apparatus for training a dental cast deformation model, which method and apparatus relate to the technical field of three-dimensional deformation. The method comprises: acquiring sample data, wherein the sample data comprises a plurality of initial dental casts acquired by means of scanning an oral cavity, and corresponding target deformation models obtained by means of artificially processing the initial dental casts; acquiring a feature tensor corresponding to each initial dental cast, wherein each element of the feature tensor corresponding to each initial dental cast is a TSDF value of each voxel in a cube space where each initial dental cast is located; inputting the feature tensor corresponding to each initial dental cast into a preset network model, so as to acquire a predicted deformation model corresponding to each initial dental cast; and according to a target deformation model and the predicted deformation model corresponding to each initial dental cast, optimizing the preset network model to acquire a dental cast deformation model. The embodiments of the present disclosure are used for acquiring a dental cast deformation model which can automatically convert an initial dental cast into a dental cast meeting specific product requirements.

Description

A method and device for training a dental model deformation model

The present disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on March 17, 2021, with application number 202110287715.X and entitled "A method and device for training a dental model deformation model", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field

The present disclosure relates to the technical field of three-dimensional deformation, and in particular, to a method and device for training a deformation model of a dental mold.

Background

Tooth digitization technology aims at 3D modeling of teeth to obtain digital tooth models, so as to realize subsequent processing and personalized customization.

In general, the final actual tooth model is not the initial tooth model obtained by scanning the oral cavity and performing 3D reconstruction; rather, the initial tooth model is further processed based on specific product requirements to obtain a tooth model that meets those requirements. The process of processing an initial tooth model based on specific product requirements to obtain a tooth model that meets specific product requirements is called 3D dental model deformation. At present, 3D dental model deformation is generally done manually: a person manually processes the initial tooth model based on the specific product requirements so that the initial tooth model satisfies them. However, manually completing 3D dental model deformation has many disadvantages, such as low efficiency, high cost, and unreliable quality; therefore, how to automatically convert an initial tooth model into a tooth model that meets specific product requirements has become an urgent problem to be solved in this field.
SUMMARY OF THE INVENTION

(1) Technical problems to be solved

Manually completing 3D dental model deformation has problems such as low efficiency, high cost, and unreliable quality.

(2) Technical solutions

In order to solve the above problems, the embodiments of the present disclosure provide the following technical solutions:

In a first aspect, an embodiment of the present disclosure provides a method for training a dental model deformation model, including:

acquiring sample data, the sample data including a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;

obtaining the feature tensor corresponding to each initial tooth model, where each element of the feature tensor corresponding to an initial tooth model is the truncated signed distance function (TSDF) value of a voxel in the cubic space where that initial tooth model is located;

inputting the feature tensor corresponding to each initial tooth model into a preset network model to obtain the predicted deformation model corresponding to each initial tooth model; and

optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental model deformation model.
As an optional implementation manner of the embodiments of the present disclosure, the preset network model includes: an encoder component composed of multiple encoders in a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of multiple decoders in a serial structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component, the output of the multi-scale analysis component is the input of the decoder component, and the output of the decoder component is the output of the preset network model;

wherein the self-attention component is used to perform non-local information extraction on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component and transmits the processing result to the multi-scale analysis component; and the multi-scale analysis component is used to extract, at multiple scales, feature tensors from the feature tensor output by the feature transfer component.

As an optional implementation manner of the embodiments of the present disclosure, the encoder component includes three encoders in a serial structure, and each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit, and the down-sampling unit of each encoder is used to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor;

the input of each residual unit is the input of the encoder to which it belongs, the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs, and the output of each down-sampling unit is the output of the encoder to which it belongs; the input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the inputs of the second encoder and the third encoder are, respectively, the outputs of the first encoder and the second encoder.

As an optional implementation manner of the embodiments of the present disclosure, the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit; the residual unit is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit; the first dot product unit and the second dot product unit are used to perform a dot product operation on the input feature tensors, and the first summation unit and the second summation unit are used to perform a summation operation on the input feature tensors;

the input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the input of the second, third, and fourth convolutional layers; the input of the first dot product unit is the output of the second convolutional layer and the output of the third convolutional layer; the input of the second dot product unit is the output of the first dot product unit and the output of the fourth convolutional layer; the input of the fifth convolutional layer is the output of the second dot product unit; the input of the first summation unit is the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summation unit; the input of the second summation unit is the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summation unit is the output of the self-attention component.

As an optional implementation manner of the embodiments of the present disclosure, the feature transfer component includes a downsampling unit and a residual unit; the downsampling unit is used to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor, and the residual unit is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit; the input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.

As an optional implementation manner of the embodiments of the present disclosure, the multi-scale analysis component includes: a seventh convolutional layer, an eighth convolutional layer, a ninth convolutional layer, a tenth convolutional layer, an eleventh convolutional layer, a twelfth convolutional layer, and a splicing unit; the dilation rates of the seventh, eighth, ninth, and tenth convolutional layers are all different, and the splicing unit is used to perform a concatenation operation on the input feature tensors; the inputs of the seventh, eighth, ninth, and tenth convolutional layers are all the output of the feature transfer component; the inputs of the splicing unit are the output of the feature transfer component and the outputs of the seventh, eighth, ninth, and tenth convolutional layers; the input of the eleventh convolutional layer is the output of the splicing unit, the input of the twelfth convolutional layer is the output of the eleventh convolutional layer, and the output of the twelfth convolutional layer is the output of the multi-scale analysis component.

As an optional implementation manner of the embodiments of the present disclosure, the decoder component includes four decoders in a serial structure; each decoder includes an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder is used to perform a convolution operation on the input of the residual unit through three concatenated convolutional layers and to perform a summation operation on the convolution result and the input of the residual unit; the fusion unit of each decoder is used to perform a fusion operation on the input feature tensors; and the upsampling unit of each decoder is used to upsample the input feature tensor into an output feature tensor whose number of channels is half that of the input feature tensor and whose length, width, and height are twice those of the input feature tensor;

the input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component, and the input of the fusion unit of the first decoder is the output of the upsampling unit of the first decoder and the output of the self-attention component; the input of the residual unit of the first decoder is the output of the fusion unit of the first decoder and the output of the self-attention component; the inputs of the upsampling units of the second, third, and fourth decoders of the decoder component are the outputs of the respective previous decoders; the inputs of the fusion units of the second, third, and fourth decoders are, respectively, the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the decoder to which they belong; and the inputs of the residual units of the second, third, and fourth decoders are, respectively, the output of the residual unit of the corresponding encoder and the output of the fusion unit of the decoder to which they belong.

As an optional implementation manner of the embodiments of the present disclosure, the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit; the third summation unit and the fourth summation unit are used to perform a summation operation on the input, and the third dot product unit and the fourth dot product unit are used to perform a dot product operation on the input;

the inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are, respectively, the output of the upsampling unit of the first decoder and the output of the self-attention component; the inputs of the fusion units of the second, third, and fourth decoders of the decoder component are the output of the upsampling unit of the decoder to which they belong and the output of the residual unit of the corresponding encoder; the input of the third summation unit is the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer; the input of the fifteenth convolutional layer is the output of the third summation unit; the input of the third dot product unit is the output of the thirteenth convolutional layer and the output of the fifteenth convolutional layer; the input of the fourth dot product unit is the output of the fourteenth convolutional layer and the output of the fifteenth convolutional layer; the input of the fourth summation unit is the output of the third dot product unit and the output of the fourth dot product unit; and the output of the fourth summation unit is the output of the fusion unit to which it belongs.
As an optional implementation manner of the embodiments of the present disclosure, optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental model deformation model includes:

constructing a loss function, and optimizing the preset network model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, to obtain the dental model deformation model;
wherein the loss function includes three equations that appear as images in the original document (PCTCN2022081543-appb-000001, PCTCN2022081543-appb-000002, and PCTCN2022081543-appb-000003) and are not reproduced here;
where alpha is a constant, out_i is the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the decoders of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function.
In a second aspect, an embodiment of the present disclosure provides an apparatus for establishing a dental model deformation model, including:

a sample acquisition unit, configured to acquire sample data, the sample data including a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;

a preprocessing unit, configured to obtain the feature tensor corresponding to each initial tooth model, where each element of the feature tensor corresponding to an initial tooth model is the truncated signed distance function (TSDF) value of a voxel in the cubic space where that initial tooth model is located;

a prediction unit, configured to input the feature tensor corresponding to each initial tooth model into a preset network model to obtain the predicted deformation model corresponding to each initial tooth model; and

an optimization unit, configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental model deformation model.
作为本公开实施例一种可选的实施方式,所述预设网络模型,包括:由多个串联结构的编码器组成的编码器组件、自注意力组件、特征传递组件、多尺度分析组件以及由多个串联结构的解码器组成的解码器组件;所述编码器组件的输入为所述预设网络模型的输入,所述编码器组件的输出为所述自注意力组件的输入;所述自注意力组件的输出为所述特征传递组件的输入;所述特征传递组件的输出为所述多尺度分析组件的输入,所述多尺度分析组件的输出为所述解码器组件的输入,所述解码器组件的输出为所述预设网络模型的输出;As an optional implementation manner of the embodiment of the present disclosure, the preset network model includes: an encoder component composed of multiple encoders with a serial structure, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of a plurality of decoders in a series structure; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the The output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component, and the output of the multi-scale analysis component is the input of the decoder component, so The output of the decoder component is the output of the preset network model;
其中,所述自注意力组件用于对所述编码器组件输出的特征张量进行非局部信息提取,获取环境特征张量;所述特征传递组件对所述自注意力组件的输出进行处理,并将处理结果传递至所述多尺度分析组件;所述多尺度分析组件用于提取所述特征传递组件输出的特征张量在多个尺度下的特征张量。Wherein, the self-attention component is used to extract non-local information on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component, and transmit the processing result to the multi-scale analysis component; the multi-scale analysis component is used for extracting the feature tensors in multiple scales of the feature tensor output by the feature transfer component.
作为本公开实施例一种可选的实施方式,所述编码器组件包括三个串联结构的编码器,各编码器包括一残差单元和一下采样单元;各编码器的残差单元用于通过三个串联结构的卷积层对残差单元的输入进行卷积操作并对所述卷积操作的卷积结果和残差单元的输入执行加和操作,各编码器的下采样单元用于将输入特征张量下采样为通道数为输入特征张量的通道数的两倍,长、宽、高为输入特征张量的长、宽、高的二分之一的输出特征张量;As an optional implementation manner of the embodiment of the present disclosure, the encoder component includes three encoders with a serial structure, each encoder includes a residual unit and a down-sampling unit; the residual unit of each encoder is used to pass The three convolutional layers of concatenated structure perform a convolution operation on the input of the residual unit and perform a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the downsampling unit of each encoder is used to The input feature tensor downsampling is the output feature tensor whose number of channels is twice the number of channels of the input feature tensor, and the length, width and height are half of the length, width and height of the input feature tensor;
各残差单元的输入均为所属的编码器的输入,各下采样单元的输入均为所属的编码器的残差单元输出,各下采样单元的输出均为所属的编码器的输出,第一个编码器的输入为编码器组件的输入,第三个编码器的输出为编码器组件的输出,第二编码器和第三个编码器的输入分别为第一个编码器和第二个编码器的输出。The input of each residual unit is the input of the encoder to which it belongs, the input of each down-sampling unit is the output of the residual unit of the encoder to which it belongs, and the output of each down-sampling unit is the output of the encoder to which it belongs. The input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the input of the second encoder and the third encoder are the first encoder and the second encoder respectively. output of the device.
作为本公开实施例一种可选的实施方式,所述自注意力组件包括:残差单元、第一卷积层、第二卷积层、第三卷积层、第四卷积层、第五卷积层、第六卷积层、第一点积单元、第二点积单元、第一加和单元以及第二加和单元;所述残差单元用于通过三个串联结构的卷积层对残差单元的输入进行卷积操作并对所述卷积操作的卷积结果和残差单元的输入执行加和操作,所述第一点积单元和第二点积单元用于对输入特征张量执行点积操作,所述第一加和单元和第二加和单元用于对输入特征张量执行加和操作;As an optional implementation manner of the embodiment of the present disclosure, the self-attention component includes: a residual unit, a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a Five convolution layers, sixth convolution layer, first dot product unit, second dot product unit, first sum unit, and second sum unit; the residual unit is used for convolution through three concatenated structures The layer performs a convolution operation on the input of the residual unit and performs a summation operation on the convolution result of the convolution operation and the input of the residual unit, and the first dot product unit and the second dot product unit are used for the input. The feature tensor performs a dot product operation, and the first summation unit and the second summation unit are used to perform a summation operation on the input feature tensor;
所述残差单元的输入为所述编码器组件的输出,所述残差单元的输出为所述第一卷积层的输入;所述第一卷积层的输出为所述第二卷积层、所述第三卷积层、所述第四卷积层的输入;所述第一点积单元的输入为所述第二卷积层的输出和所述第三卷积层的输出;所述第二点积单元的输入为所述第一点积单元的输出和所述第四卷积层的输出;所述第五卷积层的输入为所述第二点积单元的输出;所述第一加和单元的输入为所述第五卷积层的输出和所述第一卷积层的输出;所述第六卷积层的输入为所述第一加和单元的输出;所述第二加和单元的输入为所述第六卷积层的输出和所述残差单元的输出,所述第二加和单元的输出为所述自注意力组件的输出。The input of the residual unit is the output of the encoder component, the output of the residual unit is the input of the first convolutional layer; the output of the first convolutional layer is the second convolutional layer layer, the third convolution layer, the input of the fourth convolution layer; the input of the first dot product unit is the output of the second convolution layer and the output of the third convolution layer; The input of the second dot product unit is the output of the first dot product unit and the output of the fourth convolution layer; the input of the fifth convolution layer is the output of the second dot product unit; The input of the first summing unit is the output of the fifth convolutional layer and the output of the first convolutional layer; the input of the sixth convolutional layer is the output of the first summing unit; The input of the second summing unit is the output of the sixth convolutional layer and the output of the residual unit, and the output of the second summing unit is the output of the self-attention component.
As an optional implementation of the embodiments of the present disclosure, the feature transfer component includes one downsampling unit and one residual unit.
The downsampling unit is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor; the residual unit is configured to perform a convolution operation on the input of the residual unit through three convolutional layers connected in series and to perform a summation operation on the convolution result of the convolution operation and the input of the residual unit.
The input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.
As an optional implementation of the embodiments of the present disclosure, the multi-scale analysis component includes: a seventh convolutional layer, an eighth convolutional layer, a ninth convolutional layer, a tenth convolutional layer, an eleventh convolutional layer, a twelfth convolutional layer, and a splicing unit. The dilation rates of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all different, and the splicing unit is configured to perform a concatenation operation on the input feature tensors.
The inputs of the seventh convolutional layer, the eighth convolutional layer, the ninth convolutional layer, and the tenth convolutional layer are all the output of the feature transfer component; the inputs of the splicing unit are the output of the feature transfer component, the output of the seventh convolutional layer, the output of the eighth convolutional layer, the output of the ninth convolutional layer, and the output of the tenth convolutional layer; the input of the eleventh convolutional layer is the output of the splicing unit; the input of the twelfth convolutional layer is the output of the eleventh convolutional layer; and the output of the twelfth convolutional layer is the output of the multi-scale analysis component.
As an optional implementation of the embodiments of the present disclosure, the decoder component includes four decoders connected in series, each decoder including an upsampling unit, a fusion unit, and a residual unit. The residual unit of each decoder is configured to perform a convolution operation on the input of the residual unit through three convolutional layers connected in series and to perform a summation operation on the convolution result of the convolution operation and the input of the residual unit; the fusion unit of each decoder is configured to perform a fusion operation on the input feature tensors; and the upsampling unit of each decoder is configured to upsample the input feature tensor into an output feature tensor whose number of channels is half the number of channels of the input feature tensor and whose length, width, and height are twice the length, width, and height of the input feature tensor.
The input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component; the inputs of the fusion unit of the first decoder are the output of the upsampling unit of the first decoder and the output of the self-attention component; and the inputs of the residual unit of the first decoder are the output of the fusion unit of the first decoder and the output of the self-attention component. The input of the upsampling unit of each of the second decoder, the third decoder, and the fourth decoder of the decoder component is the output of the preceding decoder; the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder are the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the decoder to which the fusion unit belongs; and the inputs of the residual units of the second decoder, the third decoder, and the fourth decoder are the output of the residual unit of the corresponding encoder and the output of the fusion unit of the decoder to which the residual unit belongs, respectively.
As an optional implementation of the embodiments of the present disclosure, the fusion unit of each decoder includes: a thirteenth convolutional layer, a fourteenth convolutional layer, a fifteenth convolutional layer, a third summation unit, a fourth summation unit, a third dot-product unit, and a fourth dot-product unit. The third summation unit and the fourth summation unit are configured to perform summation operations on their inputs, and the third dot-product unit and the fourth dot-product unit are configured to perform dot-product operations on their inputs.
The inputs of the thirteenth convolutional layer and the fourteenth convolutional layer of the fusion unit of the first decoder of the decoder component are, respectively, the output of the upsampling unit of the first decoder and the output of the self-attention component; the inputs of the fusion units of the second decoder, the third decoder, and the fourth decoder of the decoder component are the output of the upsampling unit of the decoder to which the fusion unit belongs and the output of the residual unit of the corresponding encoder; the inputs of the third summation unit are the output of the thirteenth convolutional layer and the output of the fourteenth convolutional layer; the input of the fifteenth convolutional layer is the output of the third summation unit; the inputs of the third dot-product unit are the output of the thirteenth convolutional layer and the output of the fifteenth convolutional layer; the inputs of the fourth dot-product unit are the output of the fourteenth convolutional layer and the output of the fifteenth convolutional layer; the inputs of the fourth summation unit are the output of the third dot-product unit and the output of the fourth dot-product unit; and the output of the fourth summation unit is the output of the fusion unit to which it belongs.
As an optional implementation of the embodiments of the present disclosure, the optimization unit is specifically configured to construct a loss function and to optimize the preset network model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, so as to obtain the dental cast deformation model;
wherein the loss function includes:
(loss-function formulas, published as images PCTCN2022081543-appb-000004 to PCTCN2022081543-appb-000006 in the original document)
wherein alpha is a constant, out_i is the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the respective decoders of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to, when invoking the computer program, execute the method for training a dental cast deformation model according to the first aspect or any optional implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for training a dental cast deformation model according to the first aspect or any optional implementation of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method for training a dental cast deformation model according to the first aspect or any optional implementation of the first aspect.
3. Beneficial Effects
The method for training a dental cast deformation model provided by the embodiments of the present disclosure first acquires sample data including a plurality of initial tooth models and a target deformation model corresponding to each initial tooth model; then acquires the TSDF value of each voxel in the cubic space in which each initial tooth model is located as the elements of the feature tensor corresponding to that initial tooth model; then inputs the feature tensor corresponding to each initial tooth model into a preset network model to obtain a predicted deformation model corresponding to each initial tooth model; and finally optimizes the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental cast deformation model. Because the method can obtain a dental cast deformation model and, based on that model, deform an initial model that needs to be deformed into a tooth model meeting specific product requirements, the dental cast deformation model obtained by this method can automatically convert an initial tooth model into a tooth model meeting specific product requirements.
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for training a dental cast deformation model provided by one or more embodiments of the present disclosure;
FIG. 2 is an architecture diagram of a preset network model provided by one or more embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of an encoder component provided by one or more embodiments of the present disclosure;
FIG. 4 is a schematic structural diagram of a self-attention component provided by one or more embodiments of the present disclosure;
FIG. 5 is a schematic structural diagram of a feature transfer component provided by one or more embodiments of the present disclosure;
FIG. 6 is a schematic structural diagram of a multi-scale analysis component provided by one or more embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of a decoder component provided by one or more embodiments of the present disclosure;
FIG. 8 is a schematic structural diagram of a fusion unit provided by one or more embodiments of the present disclosure;
FIG. 9 is a structural diagram of an apparatus for training a dental cast deformation model provided by one or more embodiments of the present disclosure;
FIG. 10 is a schematic diagram of the hardware structure of an electronic device provided by one or more embodiments of the present disclosure.
Detailed Description
In order to understand the above objects, features, and advantages of the present disclosure more clearly, the solutions of the present disclosure are further described below. It should be noted that, in the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described herein; obviously, the embodiments in the specification are only a part, rather than all, of the embodiments of the present disclosure.
In the embodiments of the present disclosure, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present disclosure should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the related concepts in a specific manner. In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
The execution subject of the method for training a dental cast deformation model provided by the embodiments of the present disclosure may be an apparatus for establishing a dental cast deformation model. The apparatus may be a terminal device such as a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart watch, or a smart bracelet, or the terminal device may be another type of terminal device; the embodiments of the present disclosure do not limit the type of the terminal device.
An embodiment of the present disclosure provides a method for training a dental cast deformation model. Referring to FIG. 1, the method includes the following steps S11 to S14:
S11. Acquire sample data.
The sample data includes a plurality of initial tooth models obtained by scanning oral cavities, and a target deformation model corresponding to each initial tooth model obtained by manually processing that initial tooth model.
Specifically, oral scans and 3D reconstruction may be performed for a plurality of users to obtain an initial tooth model for each user; part of the gum region of each initial tooth model is then manually removed based on specific requirements, and the model is scanned and re-modeled, thereby obtaining the target deformation model corresponding to each initial tooth model.
S12. Acquire the feature tensor corresponding to each initial tooth model.
The feature tensor corresponding to each initial tooth model consists of the truncated signed distance function (TSDF) values of the voxels in the cubic space in which that initial tooth model is located.
Specifically, acquiring the feature tensor corresponding to each initial tooth model may include: first establishing a square outer bounding box of the dental cast as the cubic space in which each initial tooth model is located; then voxelizing the cubic space in which each initial tooth model is located; and finally using the truncated signed distance function (TSDF) to compute the distance from each voxel to the surface of the initial tooth model as the TSDF value of that voxel. A distance value TSDF(x_i, y_i, z_i) = 0 indicates that the voxel lies on the surface of the tooth model, TSDF(x_i, y_i, z_i) > 0 indicates that the voxel lies outside the tooth model, and TSDF(x_i, y_i, z_i) < 0 indicates that the voxel lies inside the tooth model.
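To make this step concrete, the following is a minimal sketch of building such a feature tensor. The `signed_distance` callable is hypothetical (it could, for example, be backed by a mesh distance query from a geometry library), and the grid resolution and truncation radius are assumptions, since the disclosure does not specify them.

```python
import numpy as np

def tsdf_feature_tensor(signed_distance, bbox_min, bbox_max,
                        resolution=64, trunc=1.0):
    """Voxelize the cubic bounding box of a tooth model and evaluate a
    truncated signed distance at every voxel center.

    signed_distance: hypothetical callable mapping an (N, 3) array of points
    to an (N,) array of signed distances (negative inside the model, zero on
    the surface, positive outside).
    """
    # Voxel-center coordinates of a resolution^3 grid over the bounding box.
    axes = [np.linspace(lo, hi, resolution)
            for lo, hi in zip(bbox_min, bbox_max)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

    # Signed distance per voxel, truncated to [-trunc, trunc]; the
    # truncation radius is an assumption of this sketch.
    sdf = signed_distance(centers)
    tsdf = np.clip(sdf, -trunc, trunc)

    # One-channel feature tensor of shape (1, D, H, W) for the network input.
    return tsdf.reshape(1, resolution, resolution, resolution)
```

The resulting one-channel volume is what step S13 would feed into the preset network model.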
S13. Input the feature tensor corresponding to each initial tooth model into a preset network model to obtain a predicted deformation model corresponding to each initial tooth model.
That is, a preset network model for generating a dental cast deformation model is established in advance, the feature tensor corresponding to each initial tooth model in the sample data is input into the preset network model, and the corresponding output is taken as the predicted deformation model corresponding to that initial tooth model.
S14. Optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain a dental cast deformation model.
The method for training a dental cast deformation model provided by the embodiments of the present disclosure first acquires sample data including a plurality of initial tooth models and a target deformation model corresponding to each initial tooth model; then acquires the TSDF value of each voxel in the cubic space in which each initial tooth model is located as the elements of the feature tensor corresponding to that initial tooth model; then inputs the feature tensor corresponding to each initial tooth model into a preset network model to obtain a predicted deformation model corresponding to each initial tooth model; and finally optimizes the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental cast deformation model. Because the method can obtain a dental cast deformation model and, based on that model, deform an initial model that needs to be deformed into a tooth model meeting specific product requirements, the dental cast deformation model obtained by this method can automatically convert an initial tooth model into a tooth model meeting specific product requirements. The preset network model in the above embodiment is described in detail below.
Referring to FIG. 2, the preset network model in the embodiments of the present disclosure includes:
an encoder component 21 composed of a plurality of encoders connected in series, a self-attention component 22, a feature transfer component 23, a multi-scale analysis component 24, and a decoder component 25 composed of a plurality of decoders connected in series.
The input of the encoder component 21 is the input of the preset network model, and the output of the encoder component 21 is the input of the self-attention component 22; the output of the self-attention component 22 is the input of the feature transfer component 23; the output of the feature transfer component 23 is the input of the multi-scale analysis component 24; the output of the multi-scale analysis component 24 is the input of the decoder component 25; and the output of the decoder component 25 is the output of the preset network model.
The self-attention component 22 is configured to perform non-local information extraction on the feature tensor output by the encoder component 21 to obtain an environment feature tensor; the feature transfer component 23 processes the output of the self-attention component 22 and transfers the processing result to the multi-scale analysis component 24; and the multi-scale analysis component 24 is configured to extract, at multiple scales, feature tensors from the feature tensor output by the feature transfer component 23.
Because the self-attention component can perform non-local information extraction on the feature tensor output by the encoder component and capture dependencies between non-local features, and the multi-scale analysis component can extract feature tensors at multiple scales from the feature tensor output by the feature transfer component, thereby mining correlations between feature tensors at different scales and obtaining context information containing multi-scale analysis results, the dental cast deformation model obtained by the method for training a dental cast deformation model provided by the embodiments of the present disclosure can deform a tooth model more accurately and can more accurately obtain a tooth model meeting specific requirements.
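To make this data flow concrete, the following is a minimal PyTorch-style sketch of the top-level wiring, not the disclosed implementation. The component modules are injected as constructor arguments (sketches of the individual components follow their respective descriptions below), and the interfaces assumed here (an encoder that returns its skip features alongside its output, and a decoder called as `dec(features, skip)`) are conventions of this sketch only.

```python
import torch.nn as nn

class PresetNetwork(nn.Module):
    """Sketch of the top-level data flow: encoder -> self-attention ->
    feature transfer -> multi-scale analysis -> decoders with skip fusion."""

    def __init__(self, encoder, attention, transfer, multiscale, decoders):
        super().__init__()
        self.encoder = encoder        # series of encoders, skips collected
        self.attention = attention    # non-local feature extraction
        self.transfer = transfer      # downsample + residual bridge
        self.multiscale = multiscale  # dilated-convolution pyramid
        self.decoders = nn.ModuleList(decoders)

    def forward(self, x):
        feats, skips = self.encoder(x)   # skips: residual-unit outputs E1..E3
        attn = self.attention(feats)
        y = self.multiscale(self.transfer(attn))
        # The first decoder fuses with the self-attention output, the later
        # decoders with the encoder skip features in reverse (deepest first).
        fuse_inputs = [attn] + skips[::-1]
        for dec, skip in zip(self.decoders, fuse_inputs):
            y = dec(y, skip)
        return y
```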
Further, referring to FIG. 3, the encoder component 21 includes three encoders connected in series (encoder 211, encoder 212, encoder 213), each of which includes a residual unit (residual unit E1, residual unit E2, residual unit E3) and a downsampling unit (downsampling unit Do1, downsampling unit Do2, downsampling unit Do3). The residual unit of each encoder (residual unit E1, residual unit E2, residual unit E3) is configured to perform a convolution operation on the input of the residual unit through three convolutional layers connected in series and to perform a summation operation on the convolution result of the convolution operation and the input of the residual unit. The downsampling unit of each encoder (downsampling unit Do1, downsampling unit Do2, downsampling unit Do3) is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor.
The input of each residual unit is the input of the encoder to which it belongs (the input of residual unit E1 is the input of encoder 211, the input of residual unit E2 is the input of encoder 212, and the input of residual unit E3 is the input of encoder 213). The input of each downsampling unit is the output of the residual unit of the encoder to which it belongs (the input of downsampling unit Do1 is the output of residual unit E1 of encoder 211, the input of downsampling unit Do2 is the output of residual unit E2 of encoder 212, and the input of downsampling unit Do3 is the output of residual unit E3 of encoder 213), and the output of each downsampling unit is the output of the encoder to which it belongs (the output of downsampling unit Do1 is the output of encoder 211, the output of downsampling unit Do2 is the output of encoder 212, and the output of downsampling unit Do3 is the output of encoder 213). The input of the first encoder 211 is the input of the encoder component 21, the output of the third encoder 213 is the output of the encoder component 21, and the inputs of the second encoder 212 and the third encoder 213 are the outputs of the first encoder 211 and the second encoder 212, respectively.
That is, the input of the encoder component 21, the input of the first encoder 211, and the input of residual unit E1 are the same input, and the output of residual unit E1 is the input of downsampling unit Do1. The output of downsampling unit Do1 is the output of the first encoder 211. The input of the second encoder 212 and the input of residual unit E2 are the same input, namely the output of the first encoder 211. The output of residual unit E2 is the input of downsampling unit Do2. The output of downsampling unit Do2 is the output of the second encoder 212. The input of the third encoder 213 and the input of residual unit E3 are the same input, namely the output of the second encoder 212. The output of residual unit E3 is the input of downsampling unit Do3. The output of downsampling unit Do3 is the output of the third encoder 213 and the output of the encoder component 21.
Optionally, the convolution kernels of the three convolutional layers of the residual unit of each encoder are all 3×3×3, and the length, width, and height of the feature tensors output by these three convolutional layers are the same as those of the input feature tensor. The number of channels of the feature tensor output by the residual unit of the first encoder is 16 times the number of channels of its input feature tensor, while the numbers of channels of the feature tensors output by the residual units of the second and third encoders are the same as those of their input feature tensors. Each downsampling unit is a convolutional layer with a stride of 2 and a convolution kernel of 2×2×2.
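A minimal sketch of one residual unit and one encoder consistent with these specifications follows. The activation functions and the 1×1×1 projection on the skip path when the unit changes the channel count are assumptions, since the disclosure does not state them; chaining three such encoders and collecting the residual-unit outputs as skip features would form the encoder component.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Three 3x3x3 convolutions in series plus a skip connection. The ReLU
    activations and the 1x1x1 skip projection for channel changes (e.g. the
    16x expansion in the first encoder) are assumptions of this sketch."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv3d(in_ch, out_ch, kernel_size=1))

    def forward(self, x):
        return self.convs(x) + self.skip(x)

class Encoder(nn.Module):
    """Residual unit followed by a stride-2, kernel-2x2x2 downsampling
    convolution that doubles the channels and halves each spatial dim."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.res = ResidualUnit(in_ch, out_ch)
        self.down = nn.Conv3d(out_ch, 2 * out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.res(x)   # residual-unit output, reused as skip feature
        return self.down(skip), skip
```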
Further, referring to FIG. 4, the self-attention component 22 includes: a residual unit E4, a first convolutional layer Co1, a second convolutional layer Co2, a third convolutional layer Co3, a fourth convolutional layer Co4, a fifth convolutional layer Co5, a sixth convolutional layer Co6, a first dot-product unit Pro1, a second dot-product unit Pro2, a first summation unit Add1, and a second summation unit Add2.
Residual unit E4 is configured to perform a convolution operation on its input through three convolutional layers connected in series and to perform a summation operation on the convolution result and the input of the residual unit; the first dot-product unit Pro1 and the second dot-product unit Pro2 are configured to perform dot-product operations on input feature tensors; and the first summation unit Add1 and the second summation unit Add2 are configured to perform summation operations on input feature tensors.
The input of residual unit E4 is the output of the encoder component 21 (the output of downsampling unit Do3 of the third encoder 213 of the encoder component 21), and the output of residual unit E4 is the input of the first convolutional layer Co1. The output of the first convolutional layer Co1 is the input of the second convolutional layer Co2, the third convolutional layer Co3, and the fourth convolutional layer Co4. The inputs of the first dot-product unit Pro1 are the output of the second convolutional layer Co2 and the output of the third convolutional layer Co3. The inputs of the second dot-product unit Pro2 are the output of the first dot-product unit Pro1 and the output of the fourth convolutional layer Co4. The input of the fifth convolutional layer Co5 is the output of the second dot-product unit Pro2. The inputs of the first summation unit Add1 are the output of the fifth convolutional layer Co5 and the output of the first convolutional layer Co1. The input of the sixth convolutional layer Co6 is the output of the first summation unit Add1. The inputs of the second summation unit Add2 are the output of the sixth convolutional layer Co6 and the output of residual unit E4. The output of the second summation unit Add2 is the output of the self-attention component 22.
Optionally, the convolution kernels of the first, second, third, fourth, fifth, and sixth convolutional layers are all 1×1×1. The length, width, and height of the output feature tensor of the first convolutional layer are the same as those of its input feature tensor, and the number of channels of its output feature tensor is one eighth of the number of channels of its input feature tensor. The numbers of channels of the output feature tensors of the second, third, and fourth convolutional layers are half the numbers of channels of their input feature tensors. The number of channels of the output feature tensor of the fifth convolutional layer is twice the number of channels of its input feature tensor. The number of channels of the output feature tensor of the sixth convolutional layer is eight times the number of channels of its input feature tensor.
Let the feature tensor output by residual unit E4 be X ∈ R^(C×H×W×L), where C is the number of channels of the output feature tensor of residual unit E4 and H, W, L are its length, width, and height. Then the first convolutional layer Co1 outputs a feature tensor X_1 ∈ R^(C1×H×W×L) with C1 = C/8, the second convolutional layer Co2 outputs a feature tensor X_2 ∈ R^(C2×H×W×L) with C2 = C1/2, and the third convolutional layer Co3 outputs a feature tensor X_3 ∈ R^(C3×H×W×L) with C3 = C1/2.
Let x_i ∈ R^1 and x_j ∈ R^1 denote the indices of the i-th voxel in X_2 and the j-th voxel in X_3, respectively, let X_2(x_i) ∈ R^(C2) denote the feature vector of voxel x_i in X_2, and let X_3(x_j) ∈ R^(C3) denote the feature vector of voxel x_j in X_3. The attention distribution is then:
S(x_i, x_j) = exp(X_2(x_i) · X_3(x_j)) / Σ_{x_j} exp(X_2(x_i) · X_3(x_j))
The fourth convolutional layer Co4 outputs a feature tensor X_4 ∈ R^(C4×H×W×L) with C4 = C1/2. Performing a dot-product operation on X_4 and S yields the environment features describing the non-local dependencies:
Con2(x_i) = Σ_{x_j} S(x_i, x_j) X_4(x_j)
This yields the environment feature Con2 ∈ R^(C5×H×W×L) with C5 = C1/2.
The fifth convolutional layer Co5 outputs a feature tensor X_5; the first summation unit Add1 performs a summation operation on X_5 and X_1 and outputs a feature tensor res1 ∈ R^(C1×H×W×L); the sixth convolutional layer Co6 outputs a feature tensor X_6 ∈ R^(C×H×W×L); and the second summation unit Add2 performs a summation operation on X and X_6 to obtain the final environment feature tensor res2 ∈ R^(C×W×H×L).
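A minimal sketch of this self-attention component follows, reusing the ResidualUnit from the encoder sketch above and the channel ratios just stated. The softmax normalization matches the attention-distribution formula above; reading Co2, Co3, and Co4 as query, key, and value projections is this sketch's interpretation of the wiring.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3d(nn.Module):
    """Residual unit E4 followed by a non-local attention block built from
    1x1x1 convolutions (C/8 bottleneck, half-width projections)."""

    def __init__(self, channels):
        super().__init__()
        c1 = channels // 8
        self.res = ResidualUnit(channels, channels)  # E4 (encoder sketch)
        self.co1 = nn.Conv3d(channels, c1, 1)        # bottleneck -> X1
        self.co2 = nn.Conv3d(c1, c1 // 2, 1)         # query  -> X2
        self.co3 = nn.Conv3d(c1, c1 // 2, 1)         # key    -> X3
        self.co4 = nn.Conv3d(c1, c1 // 2, 1)         # value  -> X4
        self.co5 = nn.Conv3d(c1 // 2, c1, 1)         # expand back to C/8
        self.co6 = nn.Conv3d(c1, channels, 1)        # expand back to C

    def forward(self, x):
        r = self.res(x)
        x1 = self.co1(r)
        n, c, h, w, l = x1.shape                     # c == C/8
        q = self.co2(x1).flatten(2)                  # (N, C/16, HWL)
        k = self.co3(x1).flatten(2)
        v = self.co4(x1).flatten(2)
        # Attention distribution S: softmax over pairwise dot products.
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)      # (N, HWL, HWL)
        ctx = (v @ attn.transpose(1, 2)).view(n, c // 2, h, w, l)  # Con2
        out = self.co6(self.co5(ctx) + x1)           # Add1, then Co6
        return out + r                               # Add2 -> res2
```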
Further, referring to FIG. 5, the feature transfer component 23 includes one downsampling unit Do4 and one residual unit E5.
Downsampling unit Do4 is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice the number of channels of the input feature tensor and whose length, width, and height are half the length, width, and height of the input feature tensor. Residual unit E5 is configured to perform a convolution operation on its input through three convolutional layers connected in series and to perform a summation operation on the convolution result and the input of the residual unit.
The input of downsampling unit Do4 is the output of the self-attention component 22, the output of downsampling unit Do4 is the input of residual unit E5, and the output of residual unit E5 is the output of the feature transfer component 23.
Optionally, downsampling unit Do4 is a convolutional layer with a stride of 2 and a convolution kernel of 2×2×2. The convolution kernels of the three convolutional layers of residual unit E5 are all 3×3×3, the length, width, and height of the feature tensors output by these convolutional layers are the same as those of the input feature tensor, and the number of channels of the output feature tensor of residual unit E5 is the same as the number of channels of its input feature tensor.
Further, referring to FIG. 6, the multi-scale analysis component 24 includes: a seventh convolutional layer Co7, an eighth convolutional layer Co8, a ninth convolutional layer Co9, a tenth convolutional layer Co10, an eleventh convolutional layer Co11, a twelfth convolutional layer Co12, and a splicing unit MON.
The dilation rates of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are all different, and the splicing unit MON is configured to perform a concatenation operation on the input feature tensors.
The inputs of the seventh convolutional layer Co7, the eighth convolutional layer Co8, the ninth convolutional layer Co9, and the tenth convolutional layer Co10 are all the output of the feature transfer component 23; the inputs of the splicing unit MON are the output of the feature transfer component 23, the output of the seventh convolutional layer Co7, the output of the eighth convolutional layer Co8, the output of the ninth convolutional layer Co9, and the output of the tenth convolutional layer Co10; the input of the eleventh convolutional layer Co11 is the output of the splicing unit MON; the input of the twelfth convolutional layer Co12 is the output of the eleventh convolutional layer Co11; and the output of the twelfth convolutional layer Co12 is the output of the multi-scale analysis component 24.
Optionally, the convolution kernel of the seventh convolutional layer is 1×1×1, the number of channels of its output feature tensor is the same as that of its input feature tensor, and its dilation rate is 1. The convolution kernels of the eighth, ninth, and tenth convolutional layers are all 3×3×3, the numbers of channels of their output feature tensors are the same as those of their input feature tensors, and their dilation rates are 2, 3, and 4, respectively. The convolution kernel of the eleventh convolutional layer is 3×3×3, and the number of channels of its output feature tensor is one fifth of the number of channels of its input feature tensor. The convolution kernel of the twelfth convolutional layer is 3×3×3, and the number of channels of its output feature tensor is the same as that of its input feature tensor.
Let the feature tensor output by the feature transfer component 23 be A ∈ R^(C×H×W×L). Then the seventh convolutional layer Co7 outputs a feature tensor A1 ∈ R^(C×H×W×L), the eighth convolutional layer Co8 outputs a feature tensor A2 ∈ R^(C×H×W×L), the ninth convolutional layer Co9 outputs a feature tensor A3 ∈ R^(C×H×W×L), the tenth convolutional layer Co10 outputs a feature tensor A4 ∈ R^(C×H×W×L), the splicing unit MON outputs a feature tensor Cat ∈ R^(5C×H×W×L), the eleventh convolutional layer Co11 outputs a feature tensor Cat1 ∈ R^(C×H×W×L), and the twelfth convolutional layer Co12 outputs a feature tensor Cat2 ∈ R^(C×H×W×L).
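A minimal sketch of this multi-scale analysis component follows. The 'same' paddings that keep the spatial dimensions fixed for the dilated 3×3×3 kernels are implied by the shape specifications above rather than stated explicitly.

```python
import torch
import torch.nn as nn

class MultiScaleAnalysis(nn.Module):
    """Four parallel dilated convolutions over the feature-transfer output,
    concatenated with the input itself and fused by two further
    convolutions, per the layer specifications above."""

    def __init__(self, channels):
        super().__init__()
        self.co7 = nn.Conv3d(channels, channels, 1, dilation=1)
        # 'Same' padding for a 3x3x3 kernel at dilation d is d.
        self.co8 = nn.Conv3d(channels, channels, 3, padding=2, dilation=2)
        self.co9 = nn.Conv3d(channels, channels, 3, padding=3, dilation=3)
        self.co10 = nn.Conv3d(channels, channels, 3, padding=4, dilation=4)
        self.co11 = nn.Conv3d(5 * channels, channels, 3, padding=1)  # 5C -> C
        self.co12 = nn.Conv3d(channels, channels, 3, padding=1)

    def forward(self, a):
        cat = torch.cat([a, self.co7(a), self.co8(a),
                         self.co9(a), self.co10(a)], dim=1)  # Cat: 5C channels
        return self.co12(self.co11(cat))                     # Cat1, then Cat2
```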
Further, referring to FIG. 7, the decoder component 25 includes four decoders connected in series (decoder 251, decoder 252, decoder 253, decoder 254). Each decoder includes an upsampling unit (upsampling unit Up1 of decoder 251, upsampling unit Up2 of decoder 252, upsampling unit Up3 of decoder 253, upsampling unit Up4 of decoder 254), a fusion unit (fusion unit F1 of decoder 251, fusion unit F2 of decoder 252, fusion unit F3 of decoder 253, fusion unit F4 of decoder 254), and a residual unit (residual unit E6 of decoder 251, residual unit E7 of decoder 252, residual unit E8 of decoder 253, residual unit E9 of decoder 254).
The residual unit of each decoder is configured to perform a convolution operation on the input of the residual unit through three convolutional layers connected in series and to perform a summation operation on the convolution result of the convolution operation and the input of the residual unit; the fusion unit of each decoder is configured to perform a fusion operation on the input feature tensors; and the upsampling unit of each decoder is configured to upsample the input feature tensor into an output feature tensor whose number of channels is half the number of channels of the input feature tensor and whose length, width, and height are twice the length, width, and height of the input feature tensor.
The input of upsampling unit Up1 of the first decoder 251 of the decoder component is the output of the multi-scale analysis component 24; the inputs of fusion unit F1 of the first decoder 251 are the output of upsampling unit Up1 of the first decoder 251 and the output of the self-attention component 22; and the inputs of residual unit E6 of the first decoder 251 are the output of fusion unit F1 of the first decoder 251 and the output of the self-attention component 22. The inputs of the upsampling units of the second decoder 252, the third decoder 253, and the fourth decoder 254 (upsampling unit Up2, upsampling unit Up3, upsampling unit Up4) are each the output of the preceding decoder. The inputs of the fusion units of the second decoder 252, the third decoder 253, and the fourth decoder 254 (fusion unit F2, fusion unit F3, fusion unit F4) are the output of the residual unit of the corresponding encoder (residual unit E3, residual unit E2, residual unit E1, respectively) and the output of the upsampling unit of the decoder to which the fusion unit belongs. The inputs of the residual units of the second decoder 252, the third decoder 253, and the fourth decoder 254 (residual unit E7, residual unit E8, residual unit E9) are the output of the residual unit of the corresponding encoder (residual unit E3, residual unit E2, residual unit E1, respectively) and the output of the fusion unit of the decoder to which the residual unit belongs.
Optionally, the convolution kernels of the three convolutional layers of the residual unit of each decoder are all 3×3×3, the length, width, and height of the feature tensors output by these convolutional layers are the same as those of the input feature tensor, and the number of channels of the feature tensor output by the residual unit of each decoder is the same as the number of channels of its input feature tensor. Each upsampling unit is a deconvolution (transposed convolution) layer with a stride of 2 and a convolution kernel of 2×2×2.
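A minimal sketch of one decoder follows; the fusion unit it uses is sketched after the fusion-unit description below. For simplicity this sketch feeds only the fusion output into the residual unit, whereas the text above also lists a second input to the residual units; how those two inputs would be combined is not specified, so that detail is omitted here.

```python
import torch.nn as nn

class Decoder(nn.Module):
    """One decoder: a stride-2, kernel-2x2x2 transposed convolution that
    halves the channels and doubles each spatial dimension, followed by a
    fusion unit (see the fusion sketch below) and a residual unit."""

    def __init__(self, in_ch, fusion, residual):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, in_ch // 2,
                                     kernel_size=2, stride=2)
        self.fuse = fusion   # fuses upsampled features with the skip input
        self.res = residual  # e.g. ResidualUnit from the encoder sketch

    def forward(self, x, skip):
        up = self.up(x)
        return self.res(self.fuse(up, skip))
```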
Further, referring to FIG. 8, the fusion unit of each decoder includes: a thirteenth convolutional layer Co13, a fourteenth convolutional layer Co14, a fifteenth convolutional layer Co15, a third summation unit Add3, a fourth summation unit Add4, a third dot-product unit Pro3, and a fourth dot-product unit Pro4.
The third summation unit Add3 and the fourth summation unit Add4 are configured to perform summation operations on their inputs, and the third dot-product unit Pro3 and the fourth dot-product unit Pro4 are configured to perform dot-product operations on their inputs.
The inputs of the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14 of fusion unit F1 of the first decoder 251 of the decoder component 25 are, respectively, the output of upsampling unit Up1 of the first decoder 251 and the output of the self-attention component 22. The inputs of the fusion units of the second decoder 252, the third decoder 253, and the fourth decoder 254 of the decoder component (fusion unit F2, fusion unit F3, fusion unit F4) are the output of the upsampling unit of the decoder to which the fusion unit belongs and the output of the residual unit of the corresponding encoder (the inputs of fusion unit F2 are the output of residual unit E3 of encoder 213 and the output of upsampling unit Up2 of decoder 252; the inputs of fusion unit F3 are the output of residual unit E2 of encoder 212 and the output of upsampling unit Up3 of decoder 253; and the inputs of fusion unit F4 are the output of residual unit E1 of encoder 211 and the output of upsampling unit Up4 of decoder 254). The inputs of the third summation unit Add3 are the output of the thirteenth convolutional layer Co13 and the output of the fourteenth convolutional layer Co14; the input of the fifteenth convolutional layer Co15 is the output of the third summation unit Add3; the inputs of the third dot-product unit Pro3 are the output of the thirteenth convolutional layer Co13 and the output of the fifteenth convolutional layer Co15; the inputs of the fourth dot-product unit Pro4 are the output of the fourteenth convolutional layer Co14 and the output of the fifteenth convolutional layer Co15; the inputs of the fourth summation unit Add4 are the output of the third dot-product unit Pro3 and the output of the fourth dot-product unit Pro4; and the output of the fourth summation unit Add4 is the output of the fusion unit to which it belongs.
That is, as shown in FIG. 8, the fusion unit performs convolution operations on its two input feature tensors Ai and Bi through the thirteenth convolutional layer Co13 and the fourteenth convolutional layer Co14, respectively, to obtain dimension-reduced feature tensors Ci and Di. The third summation unit Add3 then performs a summation operation on Ci and Di to fuse them, and the fused result of the third summation unit Add3 is fed into the fifteenth convolutional layer Co15 to obtain the encoder weight coefficient tensor Ei and the decoder weight coefficient tensor Fi. The third dot-product unit Pro3 then performs a dot-product of the weight coefficient tensors Ci and Ei to obtain a result Gi, and the fourth dot-product unit Pro4 performs a dot-product of Di and Fi to obtain a result Hi. Finally, the fourth summation unit Add4 performs a summation operation on Gi and Hi to obtain a fused feature Zi, containing both the encoder output features and the decoder output features, as the input feature tensor of the residual unit of the i-th decoder.
Optionally, the convolution kernels of the thirteenth, fourteenth, and fifteenth convolutional layers are all 1×1×1; the numbers of channels of the output feature tensors of the thirteenth and fourteenth convolutional layers are half the numbers of channels of their input feature tensors, and the number of channels of the output feature tensor of the fifteenth convolutional layer is 1.
Let Ai ∈ R^(C×W×H×L) and Bi ∈ R^(C×W×H×L); then Ci ∈ R^((1/2)C×W×H×L), Di ∈ R^((1/2)C×W×H×L), Ei ∈ R^(1×W×H×L), and Fi ∈ R^(1×W×H×L).
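A minimal sketch of the fusion unit follows. The disclosure obtains two weight coefficient tensors Ei and Fi from the single-channel output of Co15; since how they are separated is not spelled out, this sketch uses one shared weight map for both branches, gated by a sigmoid (the sigmoid itself is also an assumption).

```python
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    """Reduce both inputs with 1x1x1 convolutions, sum them, map the sum to
    a single-channel weight map, gate each reduced branch with it, and sum
    the gated branches (Co13/Co14, Add3, Co15, Pro3/Pro4, Add4)."""

    def __init__(self, channels):
        super().__init__()
        self.co13 = nn.Conv3d(channels, channels // 2, 1)  # Ai -> Ci
        self.co14 = nn.Conv3d(channels, channels // 2, 1)  # Bi -> Di
        self.co15 = nn.Conv3d(channels // 2, 1, 1)         # weight map

    def forward(self, a, b):
        ci, di = self.co13(a), self.co14(b)
        # Add3 then Co15; the sigmoid gate is an assumption of this sketch.
        w = torch.sigmoid(self.co15(ci + di))
        return ci * w + di * w                             # Pro3 + Pro4 = Zi
```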
As an optional implementation of the embodiments of the present disclosure, the above step S14 (optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain a dental cast deformation model) includes:
constructing a loss function, and optimizing the preset network model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, to obtain the dental cast deformation model;
wherein the loss function includes:
(loss-function formulas, published as images PCTCN2022081543-appb-000009 to PCTCN2022081543-appb-000011 in the original document)
wherein alpha is a constant, out_i is the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the respective decoders of the decoder component, and seg is the intermediate supervision signal.
Optionally, out_1 is obtained by performing a convolution operation on the output of the multi-scale analysis component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, expanding the length, width, and height of the resulting feature tensor by a factor of 16 through trilinear interpolation, and then applying a sigmoid operation to the trilinear interpolation result;
out_2 is obtained by performing a convolution operation on the output of the first decoder of the decoder component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, expanding the length, width, and height of the resulting feature tensor by a factor of 8 through trilinear interpolation, and then applying a sigmoid operation to the trilinear interpolation result;
out_3 is obtained by performing a convolution operation on the output of the second decoder of the decoder component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, expanding the length, width, and height of the resulting feature tensor by a factor of 4 through trilinear interpolation, and then applying a sigmoid operation to the trilinear interpolation result;
out_4 is obtained by performing a convolution operation on the output of the third decoder of the decoder component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, expanding the length, width, and height of the resulting feature tensor by a factor of 2 through trilinear interpolation, and then applying a sigmoid operation to the trilinear interpolation result;
out_5 is obtained by performing a convolution operation on the output of the fourth decoder of the decoder component through a convolutional layer whose convolution kernel is 1×1×1 and whose output feature tensor has one channel, and then applying a sigmoid operation to the resulting feature tensor.
Optionally, alpha is 0.25.
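A minimal sketch of these intermediate supervision heads follows. The per-stage loss formulas are published as images above and are not reproduced here; the `deep_supervision_loss` helper is only an assumed stand-in (an alpha-weighted mean binary cross-entropy per scale), not the disclosed loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisionHeads(nn.Module):
    """out_1..out_5: project each stage feature to one channel with a 1x1x1
    convolution, upsample by trilinear interpolation back to the input
    resolution, and apply a sigmoid."""

    def __init__(self, in_channels, scale_factors=(16, 8, 4, 2, 1)):
        super().__init__()
        # in_channels: channel counts of the multi-scale analysis output and
        # of the four decoder outputs, in that order.
        self.heads = nn.ModuleList(nn.Conv3d(c, 1, kernel_size=1)
                                   for c in in_channels)
        self.scales = scale_factors

    def forward(self, stage_features):
        outs = []
        for head, s, feat in zip(self.heads, self.scales, stage_features):
            x = head(feat)
            if s > 1:
                x = F.interpolate(x, scale_factor=s, mode="trilinear",
                                  align_corners=False)
            outs.append(torch.sigmoid(x))
        return outs

def deep_supervision_loss(outs, seg, alpha=0.25):
    # Assumed stand-in for the image-only loss formulas: alpha-weighted mean
    # binary cross-entropy of each supervision output against the
    # intermediate supervision signal seg.
    return sum(alpha * F.binary_cross_entropy(o, seg) for o in outs)
```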
基于同一发明构思,作为对上述方法的实现,本公开实施例还提供了一种建立牙模 形变模型的装置,该装置实施例与前述方法实施例对应,为便于阅读,本装置实施例不再对前述方法实施例中的细节内容进行逐一赘述,但应当明确,本实施例中的建立牙模形变模型的装置能够对应实现前述方法实施例中的全部内容。Based on the same inventive concept, as an implementation of the above method, an embodiment of the present disclosure further provides an apparatus for establishing a dental model deformation model, and the apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, this apparatus embodiment does not The details in the foregoing method embodiments are described one by one, but it should be clear that the apparatus for establishing a dental model deformation model in this embodiment can correspondingly implement all the foregoing method embodiments.
FIG. 9 is a schematic structural diagram of the apparatus for establishing a dental model deformation model provided by an embodiment of the present disclosure. As shown in FIG. 9, the apparatus 900 provided in this embodiment includes:
a sample acquisition unit 91, configured to acquire sample data, where the sample data includes a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;

a preprocessing unit 92, configured to acquire the feature tensor corresponding to each initial tooth model, where each element of the feature tensor is the truncated signed distance function (TSDF) value of a voxel in the cubic space in which the initial tooth model is located, as illustrated in the sketch following this unit list;

a prediction unit 93, configured to input the feature tensor corresponding to each initial tooth model into a preset network model and acquire the predicted deformation model corresponding to each initial tooth model;

an optimization unit 94, configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, to obtain the dental model deformation model.
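For illustration, a sketch of the preprocessing unit's TSDF step follows. It assumes a signed-distance grid over the model's bounding cube has already been computed; the truncation distance is an assumed parameter, not specified in the patent:

```python
import numpy as np

def truncated_sdf(sdf: np.ndarray, trunc: float = 2.0) -> np.ndarray:
    """Clamp a signed-distance grid to [-trunc, trunc] and normalize to
    [-1, 1], yielding the TSDF feature tensor for one tooth model."""
    return np.clip(sdf, -trunc, trunc) / trunc
```

And a sketch of how the prediction unit and the optimization unit cooperate during training (all names are illustrative assumptions):

```python
import torch

def train_step(model, optimizer, loss_fn, features, targets):
    """One optimization step: predict deformation models from TSDF
    feature tensors and update the preset network model."""
    optimizer.zero_grad()
    predictions = model(features)          # prediction unit
    loss = loss_fn(predictions, targets)   # compare with target models
    loss.backward()
    optimizer.step()                       # optimization unit
    return loss.item()
```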
As an optional implementation of this embodiment of the present disclosure, the preset network model includes: an encoder component composed of a plurality of encoders in series, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of a plurality of decoders in series. The input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component; the output of the multi-scale analysis component is the input of the decoder component; and the output of the decoder component is the output of the preset network model.

The self-attention component is configured to perform non-local information extraction on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component and passes the processing result to the multi-scale analysis component; and the multi-scale analysis component is configured to extract, at multiple scales, feature tensors from the feature tensor output by the feature transfer component.
As an optional implementation of this embodiment of the present disclosure, the encoder component includes three encoders in series, each encoder including a residual unit and a downsampling unit. The residual unit of each encoder is configured to perform a convolution operation on the input of the residual unit through three convolution layers in series and to sum the convolution result with the input of the residual unit; the downsampling unit of each encoder is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor.

The input of each residual unit is the input of the encoder to which it belongs; the input of each downsampling unit is the output of the residual unit of the encoder to which it belongs; and the output of each downsampling unit is the output of that encoder. The input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the inputs of the second and third encoders are the outputs of the first and second encoders, respectively.
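A sketch of one encoder stage under these descriptions (kernel sizes, activations, and the use of a stride-2 convolution for the downsampling unit are assumptions; the patent only fixes the channel and size relations):

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Three 3D convolutions in series plus an additive skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x

class Encoder(nn.Module):
    """Residual unit followed by a downsampling unit that doubles the
    channel count and halves each spatial dimension."""
    def __init__(self, channels: int):
        super().__init__()
        self.res = ResidualUnit(channels)
        self.down = nn.Conv3d(channels, 2 * channels, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor):
        skip = self.res(x)      # also consumed by the matching decoder
        return self.down(skip), skip
```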
As an optional implementation of this embodiment of the present disclosure, the self-attention component includes: a residual unit, a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer, a sixth convolution layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit. The residual unit is configured to perform a convolution operation on its input through three convolution layers in series and to sum the convolution result with its input; the first and second dot product units are configured to perform dot product operations on their input feature tensors; and the first and second summation units are configured to perform summation operations on their input feature tensors.

The input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolution layer. The output of the first convolution layer is the input of the second, third, and fourth convolution layers. The inputs of the first dot product unit are the outputs of the second and third convolution layers; the inputs of the second dot product unit are the output of the first dot product unit and the output of the fourth convolution layer; the input of the fifth convolution layer is the output of the second dot product unit; the inputs of the first summation unit are the outputs of the fifth and first convolution layers; the input of the sixth convolution layer is the output of the first summation unit; the inputs of the second summation unit are the output of the sixth convolution layer and the output of the residual unit; and the output of the second summation unit is the output of the self-attention component.
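Wired up in code, this is essentially a non-local attention block. In the sketch below the dot products are batched matrix multiplications over flattened voxels; the softmax normalization and the reduced channel width are assumptions that the patent text does not state:

```python
import torch
import torch.nn as nn

class SelfAttention3D(nn.Module):
    """Non-local attention following the described wiring; `r` is the
    output of the residual unit (assumed precomputed here)."""
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 1)
        self.conv2 = nn.Conv3d(channels, reduced, 1)   # query branch
        self.conv3 = nn.Conv3d(channels, reduced, 1)   # key branch
        self.conv4 = nn.Conv3d(channels, reduced, 1)   # value branch
        self.conv5 = nn.Conv3d(reduced, channels, 1)
        self.conv6 = nn.Conv3d(channels, channels, 1)

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        x = self.conv1(r)
        n, _, d, h, w = x.shape
        q = self.conv2(x).flatten(2)                      # (N, C', DHW)
        k = self.conv3(x).flatten(2)
        v = self.conv4(x).flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k, -1)   # first dot product
        y = (v @ attn.transpose(1, 2)).view(n, -1, d, h, w)  # second dot product
        y = self.conv5(y) + x                             # first summation
        return self.conv6(y) + r                          # second summation
```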
As an optional implementation of this embodiment of the present disclosure, the feature transfer component includes one downsampling unit and one residual unit.

The downsampling unit is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor; the residual unit is configured to perform a convolution operation on its input through three convolution layers in series and to sum the convolution result with its input.

The input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.
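In code the feature transfer component is just these two building blocks in sequence; a minimal sketch, again assuming a stride-2 convolution for the downsampling unit and taking the residual unit as a module like the ResidualUnit sketched above:

```python
import torch.nn as nn

def feature_transfer(channels: int, residual_unit: nn.Module) -> nn.Sequential:
    """Downsampling unit (channels x2, spatial /2) followed by a residual
    unit; `residual_unit` must be built for 2 * channels channels."""
    return nn.Sequential(
        nn.Conv3d(channels, 2 * channels, kernel_size=2, stride=2),
        residual_unit,
    )
```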
As an optional implementation of this embodiment of the present disclosure, the multi-scale analysis component includes: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, an eleventh convolution layer, a twelfth convolution layer, and a splicing unit. The dilation rates of the seventh, eighth, ninth, and tenth convolution layers all differ; the splicing unit is configured to perform a concatenation operation on its input feature tensors.

The inputs of the seventh, eighth, ninth, and tenth convolution layers are all the output of the feature transfer component. The inputs of the splicing unit are the output of the feature transfer component and the outputs of the seventh, eighth, ninth, and tenth convolution layers. The input of the eleventh convolution layer is the output of the splicing unit, the input of the twelfth convolution layer is the output of the eleventh convolution layer, and the output of the twelfth convolution layer is the output of the multi-scale analysis component.
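This is an ASPP-style arrangement. A sketch with assumed dilation rates and channel counts (the patent only requires the four rates to differ):

```python
import torch
import torch.nn as nn

class MultiScaleAnalysis(nn.Module):
    """Four parallel dilated 3D convolutions (7th-10th layers),
    concatenated with their input (splicing unit), then fused by the
    11th and 12th convolution layers."""
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.conv11 = nn.Conv3d(channels * (len(rates) + 1), channels, 1)
        self.conv12 = nn.Conv3d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([x] + [b(x) for b in self.branches], dim=1)
        return self.conv12(self.conv11(spliced))
```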
As an optional implementation of this embodiment of the present disclosure, the decoder component includes four decoders in series, each decoder including an upsampling unit, a fusion unit, and a residual unit. The residual unit of each decoder is configured to perform a convolution operation on its input through three convolution layers in series and to sum the convolution result with its input; the fusion unit of each decoder is configured to perform a fusion operation on its input feature tensors; and the upsampling unit of each decoder is configured to upsample the input feature tensor into an output feature tensor whose number of channels is half that of the input feature tensor and whose length, width, and height are twice those of the input feature tensor.

The input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component; the inputs of the fusion unit of the first decoder are the output of the first decoder's upsampling unit and the output of the self-attention component; and the inputs of the residual unit of the first decoder are the output of the first decoder's fusion unit and the output of the self-attention component. The inputs of the upsampling units of the second, third, and fourth decoders are each the output of the preceding decoder; the inputs of the fusion units of the second, third, and fourth decoders are the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the decoder to which the fusion unit belongs; and the inputs of the residual units of the second, third, and fourth decoders are the output of the residual unit of the corresponding encoder and the output of the fusion unit of the decoder to which the residual unit belongs.
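A sketch of one decoder stage under these descriptions (a transposed convolution is assumed for the upsampling unit; the fusion and residual units are passed in as modules in the spirit of the sketches above):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Upsampling unit (channels /2, spatial x2), a fusion unit that
    combines the upsampled features with the skip features (an encoder
    residual output, or the self-attention output for the first
    decoder), then a residual unit."""
    def __init__(self, channels: int, fusion: nn.Module, residual: nn.Module):
        super().__init__()
        self.up = nn.ConvTranspose3d(channels, channels // 2,
                                     kernel_size=2, stride=2)
        self.fusion = fusion
        self.residual = residual

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        fused = self.fusion(self.up(x), skip)
        # The text lists both the fusion output and the skip as residual
        # unit inputs; feeding only the fused tensor is one simple reading.
        return self.residual(fused)
```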
As an optional implementation of this embodiment of the present disclosure, the fusion unit of each decoder includes: a thirteenth convolution layer, a fourteenth convolution layer, a fifteenth convolution layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit. The third and fourth summation units are configured to perform summation operations on their inputs, and the third and fourth dot product units are configured to perform dot product operations on their inputs.

The inputs of the thirteenth and fourteenth convolution layers of the fusion unit of the first decoder of the decoder component are, respectively, the output of the first decoder's upsampling unit and the output of the self-attention component; the inputs of the fusion units of the second, third, and fourth decoders are the output of the upsampling unit of the decoder to which the fusion unit belongs and the output of the residual unit of the corresponding encoder. The inputs of the third summation unit are the outputs of the thirteenth and fourteenth convolution layers; the input of the fifteenth convolution layer is the output of the third summation unit; the inputs of the third dot product unit are the outputs of the thirteenth and fifteenth convolution layers; the inputs of the fourth dot product unit are the outputs of the fourteenth and fifteenth convolution layers; the inputs of the fourth summation unit are the outputs of the third and fourth dot product units; and the output of the fourth summation unit is the output of the fusion unit to which it belongs.
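The wiring amounts to an additive attention gate. In the sketch below, the 'dot product' units are read as elementwise multiplications and a sigmoid is added after the fifteenth convolution layer; both are assumptions:

```python
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    """Fuses upsampled decoder features with skip features via a shared
    gate: conv13/conv14 on the two inputs, their sum through conv15,
    then two products recombined by a final sum."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv13 = nn.Conv3d(channels, channels, 1)
        self.conv14 = nn.Conv3d(channels, channels, 1)
        self.conv15 = nn.Conv3d(channels, channels, 1)

    def forward(self, up: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        a = self.conv13(up)                        # upsampling-unit output
        b = self.conv14(skip)                      # encoder/attention skip
        gate = torch.sigmoid(self.conv15(a + b))   # third summation, conv15
        return a * gate + b * gate                 # dot products + 4th sum
```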
As an optional implementation of this embodiment of the present disclosure, the optimization unit 94 is specifically configured to construct a loss function and to optimize the preset network model according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, so as to obtain the dental model deformation model;

where the loss function includes:

[The loss formulas, referenced as images PCTCN2022081543-appb-000012 to PCTCN2022081543-appb-000014 in the original publication, are not reproducible from this text.]

where alpha is a constant, out i denotes the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the individual decoders of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function.
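Since the loss formulas themselves are only available as images in the original publication, the sketch below merely illustrates a plausible deep-supervision objective over out 1 .. out 5 against seg; the binary cross-entropy terms and the way alpha enters are assumptions, not the patent's formulas:

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(outs, seg, alpha=0.25):
    """Average a per-output loss against the intermediate supervision
    signal; `outs` are the sigmoid outputs out 1 .. out 5, all at the
    same resolution as `seg`."""
    terms = [F.binary_cross_entropy(o, seg) for o in outs]
    return alpha * torch.stack(terms).mean()
```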
The apparatus for establishing a dental model deformation model provided in this embodiment can perform the method for training a dental model deformation model provided by the foregoing method embodiments; its implementation principle and technical effects are similar and are not repeated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device. FIG. 10 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. As shown in FIG. 10, the electronic device provided in this embodiment includes a memory 101 and a processor 102, where the memory 101 is configured to store a computer program and the processor 102 is configured to execute, when the computer program is invoked, the steps of the method for training a dental model deformation model provided by the foregoing method embodiments.
Specifically, the memory 101 may be used to store software programs and various data. The memory 101 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the device (such as audio data and a phone book). In addition, the memory 101 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 102 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines and, by running or executing the software programs and/or modules stored in the memory 101 and invoking the data stored in the memory 101, performs the various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole. The processor 102 may include one or more processing units.
In addition, it should be understood that the electronic device provided by the embodiments of the present disclosure may further include components such as a radio frequency unit, a network module, an audio output unit, sensors, a signal receiving unit, a display, a user receiving unit, an interface unit, and a power supply. Those skilled in the art will appreciate that the structure described above does not constitute a limitation on the electronic device; the electronic device may include more or fewer components, combine certain components, or arrange the components differently. In the embodiments of the present disclosure, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, palmtop computers, vehicle-mounted terminals, wearable devices, and pedometers.
The radio frequency unit may be used to receive and send signals in the course of receiving and sending information or during a call. Specifically, it receives downlink data from a base station and passes it to the processor 102 for processing, and it sends uplink data to the base station. Generally, the radio frequency unit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, and a duplexer. In addition, the radio frequency unit may also communicate with networks and other devices through a wireless communication system.

The electronic device provides users with wireless broadband Internet access through the network module, for example helping users send and receive e-mail, browse web pages, and access streaming media.

The audio output unit may convert audio data received by the radio frequency unit or the network module, or stored in the memory 101, into an audio signal and output it as sound. Moreover, the audio output unit may also provide audio output related to a specific function performed by the electronic device (for example, a call signal reception sound or a message reception sound). The audio output unit includes a speaker, a buzzer, a receiver, and the like.
The signal receiving unit is used to receive audio or video signals. The receiving unit may include a graphics processing unit (GPU) and a microphone. The graphics processing unit processes image data of still pictures or videos obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit, stored in the memory 101 (or another storage medium), or sent via the radio frequency unit or the network module. The microphone can receive sound and process it into audio data; in a telephone call mode, the processed audio data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit and output.

The electronic device further includes at least one sensor, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel according to the ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the electronic device is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used to identify the posture of the electronic device (such as switching between landscape and portrait orientation, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer or tapping). The sensors may further include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described here.
The display unit is used to display information input by the user or information provided to the user. The display unit may include a display panel, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.

The user receiving unit may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user receiving unit includes a touch panel and other input devices. The touch panel, also known as a touch screen, can collect the user's touch operations on or near it (such as operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory). The touch panel may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends the coordinates to the processor 102, and receives and executes commands sent by the processor 102. In addition, the touch panel may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel, the user receiving unit may also include other input devices, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described here.

Further, the touch panel may cover the display panel. When the touch panel detects a touch operation on or near it, it transmits the operation to the processor 102 to determine the type of the touch event, and the processor 102 then provides a corresponding visual output on the display panel according to the type of the touch event. Generally, the touch panel and the display panel are implemented as two independent components to realize the input and output functions of the electronic device, but in some embodiments the touch panel and the display panel may be integrated to realize these functions; this is not limited here.
The interface unit is an interface for connecting an external apparatus to the electronic device. For example, the external apparatus may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting an apparatus having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit may be used to receive input (for example, data information or power) from an external apparatus and transmit the received input to one or more elements in the electronic device, or may be used to transfer data between the electronic device and the external apparatus.

The electronic device may further include a power supply (such as a battery) that supplies power to the components. Optionally, the power supply may be logically connected to the processor 102 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method for training a dental model deformation model provided by the foregoing method embodiments is implemented.
Those skilled in the art will understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.

Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are only specific embodiments of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Industrial Applicability
The method for training a dental model deformation model provided by the embodiments of the present disclosure can obtain a dental model deformation model and, by applying the dental model deformation model to an initial model that needs to be deformed, obtain a tooth model that meets specific product requirements. The dental model deformation model obtained by this method can therefore automatically convert an initial tooth model into a tooth model that meets specific product requirements, and the method has strong industrial applicability.

Claims (10)

1. A method for training a dental model deformation model, comprising:

acquiring sample data, wherein the sample data comprises a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;

acquiring a feature tensor corresponding to each initial tooth model, wherein each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function (TSDF) value of a voxel in the cubic space in which the initial tooth model is located;

inputting the feature tensor corresponding to each initial tooth model into a preset network model, and acquiring a predicted deformation model corresponding to each initial tooth model; and

optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, to obtain the dental model deformation model.
2. The method according to claim 1, wherein the preset network model comprises: an encoder component composed of a plurality of encoders in series, a self-attention component, a feature transfer component, a multi-scale analysis component, and a decoder component composed of a plurality of decoders in series; the input of the encoder component is the input of the preset network model, and the output of the encoder component is the input of the self-attention component; the output of the self-attention component is the input of the feature transfer component; the output of the feature transfer component is the input of the multi-scale analysis component; the output of the multi-scale analysis component is the input of the decoder component; and the output of the decoder component is the output of the preset network model;

wherein the self-attention component is configured to perform non-local information extraction on the feature tensor output by the encoder component to obtain an environment feature tensor; the feature transfer component processes the output of the self-attention component and passes the processing result to the multi-scale analysis component; and the multi-scale analysis component is configured to extract, at multiple scales, feature tensors from the feature tensor output by the feature transfer component.
3. The method according to claim 2, wherein

the encoder component comprises three encoders in series, each encoder comprising a residual unit and a downsampling unit; the residual unit of each encoder is configured to perform a convolution operation on the input of the residual unit through three convolution layers in series and to sum the convolution result with the input of the residual unit, and the downsampling unit of each encoder is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor; and

the input of each residual unit is the input of the encoder to which it belongs, the input of each downsampling unit is the output of the residual unit of the encoder to which it belongs, the output of each downsampling unit is the output of that encoder, the input of the first encoder is the input of the encoder component, the output of the third encoder is the output of the encoder component, and the inputs of the second and third encoders are the outputs of the first and second encoders, respectively.
4. The method according to claim 2, wherein the self-attention component comprises: a residual unit, a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer, a sixth convolution layer, a first dot product unit, a second dot product unit, a first summation unit, and a second summation unit; the residual unit is configured to perform a convolution operation on the input of the residual unit through three convolution layers in series and to sum the convolution result with the input of the residual unit; the first dot product unit and the second dot product unit are configured to perform dot product operations on input feature tensors; and the first summation unit and the second summation unit are configured to perform summation operations on input feature tensors;

the input of the residual unit is the output of the encoder component, and the output of the residual unit is the input of the first convolution layer; the output of the first convolution layer is the input of the second, third, and fourth convolution layers; the inputs of the first dot product unit are the outputs of the second and third convolution layers; the inputs of the second dot product unit are the output of the first dot product unit and the output of the fourth convolution layer; the input of the fifth convolution layer is the output of the second dot product unit; the inputs of the first summation unit are the outputs of the fifth and first convolution layers; the input of the sixth convolution layer is the output of the first summation unit; the inputs of the second summation unit are the output of the sixth convolution layer and the output of the residual unit; and the output of the second summation unit is the output of the self-attention component.
5. The method according to claim 2, wherein the feature transfer component comprises one downsampling unit and one residual unit;

the downsampling unit is configured to downsample the input feature tensor into an output feature tensor whose number of channels is twice that of the input feature tensor and whose length, width, and height are half those of the input feature tensor, and the residual unit is configured to perform a convolution operation on the input of the residual unit through three convolution layers in series and to sum the convolution result with the input of the residual unit; and

the input of the downsampling unit is the output of the self-attention component, the output of the downsampling unit is the input of the residual unit, and the output of the residual unit is the output of the feature transfer component.
6. The method according to claim 2, wherein the multi-scale analysis component comprises: a seventh convolution layer, an eighth convolution layer, a ninth convolution layer, a tenth convolution layer, an eleventh convolution layer, a twelfth convolution layer, and a splicing unit; the dilation rates of the seventh, eighth, ninth, and tenth convolution layers all differ; and the splicing unit is configured to perform a concatenation operation on input feature tensors;

the inputs of the seventh, eighth, ninth, and tenth convolution layers are all the output of the feature transfer component; the inputs of the splicing unit are the output of the feature transfer component and the outputs of the seventh, eighth, ninth, and tenth convolution layers; the input of the eleventh convolution layer is the output of the splicing unit; the input of the twelfth convolution layer is the output of the eleventh convolution layer; and the output of the twelfth convolution layer is the output of the multi-scale analysis component.
7. The method according to claim 2, wherein the decoder component comprises four decoders in series, each decoder comprising an upsampling unit, a fusion unit, and a residual unit; the residual unit of each decoder is configured to perform a convolution operation on the input of the residual unit through three convolution layers in series and to sum the convolution result with the input of the residual unit; the fusion unit of each decoder is configured to perform a fusion operation on input feature tensors; and the upsampling unit of each decoder is configured to upsample the input feature tensor into an output feature tensor whose number of channels is half that of the input feature tensor and whose length, width, and height are twice those of the input feature tensor;

the input of the upsampling unit of the first decoder of the decoder component is the output of the multi-scale analysis component; the inputs of the fusion unit of the first decoder are the output of the first decoder's upsampling unit and the output of the self-attention component; the inputs of the residual unit of the first decoder are the output of the first decoder's fusion unit and the output of the self-attention component; the inputs of the upsampling units of the second, third, and fourth decoders of the decoder component are each the output of the preceding decoder; the inputs of the fusion units of the second, third, and fourth decoders are the output of the residual unit of the corresponding encoder and the output of the upsampling unit of the decoder to which the fusion unit belongs; and the inputs of the residual units of the second, third, and fourth decoders are the output of the residual unit of the corresponding encoder and the output of the fusion unit of the decoder to which the residual unit belongs.
8. The method according to claim 7, wherein the fusion unit of each decoder comprises: a thirteenth convolution layer, a fourteenth convolution layer, a fifteenth convolution layer, a third summation unit, a fourth summation unit, a third dot product unit, and a fourth dot product unit; the third and fourth summation units are configured to perform summation operations on their inputs, and the third and fourth dot product units are configured to perform dot product operations on their inputs;

the inputs of the thirteenth and fourteenth convolution layers of the fusion unit of the first decoder of the decoder component are, respectively, the output of the first decoder's upsampling unit and the output of the self-attention component; the inputs of the fusion units of the second, third, and fourth decoders of the decoder component are the output of the upsampling unit of the decoder to which the fusion unit belongs and the output of the residual unit of the corresponding encoder; the inputs of the third summation unit are the outputs of the thirteenth and fourteenth convolution layers; the input of the fifteenth convolution layer is the output of the third summation unit; the inputs of the third dot product unit are the outputs of the thirteenth and fifteenth convolution layers; the inputs of the fourth dot product unit are the outputs of the fourteenth and fifteenth convolution layers; the inputs of the fourth summation unit are the outputs of the third and fourth dot product units; and the output of the fourth summation unit is the output of the fusion unit to which it belongs.
9. The method according to any one of claims 1-8, wherein optimizing the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model to obtain the dental model deformation model comprises:

constructing a loss function and, according to the loss function, the target deformation model corresponding to each initial tooth model, and the predicted deformation model, optimizing the preset network model to obtain the dental model deformation model;

wherein the loss function includes:

[The loss formulas, referenced as images PCTCN2022081543-appb-100001 to PCTCN2022081543-appb-100003 in the original publication, are not reproducible from this text.]

wherein alpha is a constant, out i denotes the data obtained by processing, in order, the output of the multi-scale analysis component and the outputs of the individual decoders of the decoder component, seg is the intermediate supervision signal, and mean() is the averaging function.
10. An apparatus for establishing a dental model deformation model, comprising:

a sample acquisition unit, configured to acquire sample data, wherein the sample data comprises a plurality of initial tooth models obtained by scanning the oral cavity and, for each initial tooth model, a corresponding target deformation model obtained by manually processing that initial tooth model;

a preprocessing unit, configured to acquire a feature tensor corresponding to each initial tooth model, wherein each element of the feature tensor corresponding to each initial tooth model is the truncated signed distance function (TSDF) value of a voxel in the cubic space in which the initial tooth model is located;

a prediction unit, configured to input the feature tensor corresponding to each initial tooth model into a preset network model and acquire a predicted deformation model corresponding to each initial tooth model; and

an optimization unit, configured to optimize the preset network model according to the target deformation model and the predicted deformation model corresponding to each initial tooth model, to obtain the dental model deformation model.
PCT/CN2022/081543 2021-03-17 2022-03-17 Method and apparatus for training dental cast deformation model WO2022194258A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110287715.XA CN112884885A (en) 2021-03-17 2021-03-17 Method and device for training dental model deformation model
CN202110287715.X 2021-03-17

Publications (1)

Publication Number Publication Date
WO2022194258A1 true WO2022194258A1 (en) 2022-09-22

Family

ID=76041030

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081543 WO2022194258A1 (en) 2021-03-17 2022-03-17 Method and apparatus for training dental cast deformation model

Country Status (2)

Country Link
CN (1) CN112884885A (en)
WO (1) WO2022194258A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884885A (en) * 2021-03-17 2021-06-01 先临三维科技股份有限公司 Method and device for training dental model deformation model
CN113393568B (en) * 2021-06-08 2022-07-29 先临三维科技股份有限公司 Training method, device, equipment and medium for neck-edge line-shape-variation prediction model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190125493A1 (en) * 2016-04-22 2019-05-02 Dental Monitoring Dentition control method
CN110930513A (en) * 2019-11-18 2020-03-27 上海交通大学医学院附属第九人民医院 Dental jaw simulation model generation method and system and dental appliance
CN111265317A (en) * 2020-02-10 2020-06-12 上海牙典医疗器械有限公司 Tooth orthodontic process prediction method
CN111612778A (en) * 2020-05-26 2020-09-01 上海交通大学 Preoperative CTA and intraoperative X-ray coronary artery registration method
CN112884885A (en) * 2021-03-17 2021-06-01 先临三维科技股份有限公司 Method and device for training dental model deformation model

Also Published As

Publication number Publication date
CN112884885A (en) 2021-06-01


Legal Events

121 EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 22770610; country of ref document: EP; kind code of ref document: A1)

NENP: Non-entry into the national phase (ref country code: DE)

122 EP: PCT application non-entry in European phase (ref document number: 22770610; country of ref document: EP; kind code of ref document: A1)