CN116485809B - Tooth instance segmentation method and system based on self-attention and receptive field adjustment - Google Patents

Tooth instance segmentation method and system based on self-attention and receptive field adjustment

Info

Publication number
CN116485809B
CN116485809B (application CN202210767999.7A)
Authority
CN
China
Prior art keywords
tooth
centroid
layer
convolution layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210767999.7A
Other languages
Chinese (zh)
Other versions
CN116485809A (en)
Inventor
高珊珊
窦文涵
迟静
周元峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN202210767999.7A priority Critical patent/CN116485809B/en
Publication of CN116485809A publication Critical patent/CN116485809A/en
Application granted granted Critical
Publication of CN116485809B publication Critical patent/CN116485809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30036Dental; Teeth

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tooth instance segmentation method and system based on self-attention and receptive field adjustment. The method comprises: acquiring a tooth CBCT image to be segmented; and inputting the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain a tooth image segmentation result. The working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain tooth-centroid fusion features; determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features; and, based on the tooth-body geometric structure information, acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion. By designing a deep learning network, the invention automatically segments single teeth from the tooth CBCT image and accurately restores the tooth geometry and detail information.

Description

Tooth instance segmentation method and system based on self-attention and receptive field adjustment
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a tooth instance segmentation method and system based on self-attention and receptive field adjustment.
Background
The statements in this section merely provide background related to the present disclosure and do not necessarily constitute prior art.
Digitization technology has found wide application in personalized medicine, such as digital dentistry. Automatically and accurately acquiring 3D tooth segmentation results is one of the important and fundamental steps in digital dental diagnosis and treatment, such as digital orthodontics, since the subsequent diagnosis and treatment steps are generally based on independent single teeth.
Of the two existing means of acquiring 3D tooth data, intraoral scan data can only provide surface information of the tooth portion exposed in the oral cavity; it cannot capture tooth-body structures such as the root, so it cannot be used to generate a 3D tooth model with complete tooth geometry.
The following problems remain for instance segmentation of CBCT images:
(1) Adjacent teeth adhere to each other in CBCT data and are difficult to distinguish and identify, which hinders accurate segmentation;
(2) The tooth body, and especially the tooth root, has a complex shape; the roots of different teeth take anywhere from one to three different forms, and their contrast with the surrounding alveolar bone is low, so details such as the root are difficult to capture;
(3) CBCT images also contain background soft tissue, which easily introduces noise interference into tooth segmentation.
Facing these problems, even some of the most advanced deep learning methods still cannot adequately capture and restore features such as the complex and diverse tooth geometry and root details, so segmentation accuracy remains low.
Disclosure of Invention
In order to solve the deficiencies of the prior art, the invention provides a tooth instance segmentation method and system based on self-attention and receptive field adjustment; a deep learning network is designed that automatically segments single teeth from CBCT and accurately restores the tooth geometry and detail information.
In a first aspect, the present invention provides a method of tooth instance segmentation based on self-attention and receptive field adjustment;
a method of tooth instance segmentation based on self-attention and receptive field adjustment, comprising:
acquiring a tooth CBCT image to be segmented;
inputting the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain a tooth image segmentation result; the working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain tooth-centroid fusion features; determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features; and, based on the tooth-body geometric structure information, acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion.
In a second aspect, the present invention provides a tooth instance segmentation system based on self-attention and receptive field adjustment;
a tooth instance segmentation system based on self-attention and receptive field adjustment, comprising:
an acquisition module configured to: acquiring a tooth CBCT image to be segmented;
a segmentation module configured to: input the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain a tooth image segmentation result; the working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain tooth-centroid fusion features; determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features; and, based on the tooth-body geometric structure information, acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion.
In a third aspect, the present invention also provides an electronic device, including:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of the first aspect described above.
In a fourth aspect, the invention also provides a storage medium storing non-transitory computer readable instructions, wherein the instructions of the method of the first aspect are executed when the non-transitory computer readable instructions are executed by a computer.
In a fifth aspect, the invention also provides a computer program product comprising a computer program for implementing the method of the first aspect described above when run on one or more processors.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the low efficiency and poor generalization of traditional methods, a two-stage deep learning framework from tooth centroid positioning to tooth instance segmentation is proposed, realizing automatic and accurate tooth segmentation from CBCT images. In the first stage, a V-net-based centroid prediction network obtains the centroid of each complexly shaped tooth, achieving accurate spatial localization of single teeth while acquiring instance-level information for each tooth; in the second stage, two functional modules are introduced to improve the network's ability to extract and restore the semantic relations among features of the CBCT image, realizing robust and accurate tooth segmentation from CBCT data.
2. Aiming at the difficulty of capturing tooth geometry in CBCT images, which contain much similar-looking noise and exhibit low contrast, a tooth geometric-structure-information guiding module based on a 3D self-attention mechanism is proposed. It acquires intra-slice and inter-slice dependencies of the CBCT image simultaneously and improves the network's ability to classify and distinguish pixels of different tissues in CBCT tooth data, giving the constructed network better capture of the complete tooth geometry.
3. Aiming at the difficulty of preserving details such as the tooth root and tooth surface during segmentation, a tooth feature integration module based on multi-scale dilated-convolution fusion is proposed. Dilated convolutions with 3 different dilation rates capture the contextual semantic information of the tooth data in parallel, covering details from the tooth surface to the tooth root at multiple scales. By fusing dilated convolution layers of different dilation rates, the network captures and restores tooth detail information at multiple scales with a larger receptive field while the resolution stays unchanged, guiding the network's progressive decoding.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of the method according to Embodiment 1; the pipeline framework runs in two stages, the first stage obtaining the tooth centroids and the second stage performing tooth instance segmentation;
FIG. 2 shows the tooth geometric structure guiding module based on the 3D self-attention mechanism according to Embodiment 1;
FIGS. 3 (a) - 3 (c) are visualizations of the receptive fields corresponding to dilated convolutions with dilation rates 1, 3 and 5 in Embodiment 1;
FIG. 4 is a block diagram of the tooth feature integration module based on multi-scale dilated convolution according to Embodiment 1, in which dilated convolution layers with kernel size 3 and dilation rates 1, 3 and 5 are fused.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
All data acquisition in the embodiment is legal application of the data on the basis of meeting laws and regulations and agreements of users.
CBCT provides comprehensive oral information, including the complete tooth bodies and the surrounding alveolar bone, by three-dimensional cone-beam X-ray scanning, and its standardized data format is well suited to 3D segmentation of tooth data; the invention presents the case and result of segmenting teeth from CBCT images, as shown in fig. 2. The invention therefore selects oral CBCT data for the tooth instance segmentation and recognition work.
CBCT: Cone Beam Computed Tomography.
Embodiment 1
This embodiment provides a tooth instance segmentation method based on self-attention and receptive field adjustment;
a method of tooth instance segmentation based on self-attention and receptive field adjustment, comprising:
s101: acquiring a tooth CBCT image to be segmented;
s102: the method comprises the steps that a tooth CBCT image to be segmented is input into a trained tooth segmentation network, and a tooth image segmentation result is obtained;
the working principle of the tooth segmentation network comprises:
locating the tooth centroids to obtain tooth-centroid fusion features;
determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features;
acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion, based on the tooth-body geometric structure information.
Further, the tooth segmentation network comprises a tooth centroid positioning network and a tooth instance segmentation network. The tooth instance segmentation network comprises: the encoding part of the tooth instance segmentation network, a tooth-body information guiding module based on a 3D self-attention mechanism, and a tooth feature integration module with multi-scale dilated-convolution fusion.
Further, the tooth centroid positioning network comprises: a first V-net network;
a first V-net network comprising: a first encoder and a first decoder;
the input end of the first encoder and the output end of the first decoder are connected with a first multiplier;
the output end of the first multiplier is connected with the input end of the clustering module;
the output end of the clustering module is connected with the input end of the cropping module;
the output end of the cropping module and the input end of the first encoder are connected with the concatenation unit;
the output end of the concatenation unit is connected with the input end of the tooth instance segmentation network.
V-Net is a variant of U-Net and a deep learning framework used primarily for three-dimensional medical data. Compared with U-Net, it uses a residual architecture in each convolution stage, so the information in the feature maps is used more effectively, and it performs well in three-dimensional medical data segmentation; V-Net is therefore chosen as the base network for the tooth segmentation work.
Further, the clustering module is implemented with the clustering algorithm based on fast search and find of density peaks (hereinafter, the density-peaks clustering algorithm).
Further, the cropping module is configured to crop out a standard cuboid region centered on the predicted centroid of each tooth, so as to frame each tooth.
Further, the working principle of the tooth centroid positioning network comprises:
inputting the tooth CBCT image to be segmented into the first V-net network;
generating a 3D tooth centroid offset map of the teeth with the first V-net network;
converting each voxel in the 3D tooth centroid offset map into a three-dimensional vector, the target point the vector points to being the tooth centroid;
generating a tooth centroid density map H_C according to the frequency with which each position is pointed to by the three-dimensional vectors of the 3D tooth centroid offset map;
calculating, with the density-peaks clustering algorithm, the local density ρ_i of each voxel point of the tooth centroid density map H_C and the Euclidean distance δ_i between each voxel point of H_C and the higher-density voxel points;
taking the points whose local density ρ_i is greater than the set density threshold as a first candidate point set;
taking the points whose Euclidean distance δ_i is greater than the set distance threshold as a second candidate point set;
taking the intersection of the first candidate point set and the second candidate point set to generate the candidate tooth centroid points Ĉ_i;
comparing the predicted candidate tooth centroids with the ground-truth tooth centroids, and outputting the candidate tooth centroid with the smallest distance as the final predicted tooth centroid point.
Illustratively, let there be k three-dimensional vectors in the centroid offset map; the tooth centroid density map H_C is obtained by counting, point by point, the frequency with which each position is pointed to by these k vectors: H_C(p) = Σ_k 1[vector k points to position p].
Illustratively, the density-peaks clustering algorithm calculates the local density ρ_i of each voxel point of the tooth centroid density map H_C and the Euclidean distance δ_i of each voxel point of H_C to higher-density voxel points; specifically:
ρ_i = Σ_{j≠i} χ(d_ij − d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise,
and d_c is a cutoff distance, a hyperparameter; ρ_i is therefore the number of points whose distance from point i is smaller than d_c.
δ_i = min_{j: ρ_j > ρ_i} d_ij
describes the minimum distance from point i to any point of higher density.
It should be appreciated that the centroid of a tooth accurately represents its spatial position and is often used to localize single teeth in tooth instance segmentation work, distinguishing the positions of different teeth well. The invention therefore designs a tooth centroid positioning network to predict accurate tooth centroids.
In the data preprocessing stage, the coordinate center (x_i, y_i, z_i) of each labeled tooth is computed and set as that tooth's centroid label; the set of centroids C_i = {c_1, c_2, … c_k} serves as the ground truth for the tooth centroids during network training.
As shown in fig. 1, for a unified input, the 3D CBCT images are uniformly converted to 240 × 240 size as input to the tooth centroid positioning network, and a V-net framework is introduced to perform the first-stage coarse segmentation of the tooth data, simultaneously generating a binarization map and a centroid 3D offset map of the teeth.
The generated tooth binarization map distinguishes the foreground tooth objects from the background tissue in the CBCT image;
as shown in fig. 1, each voxel in the 3D tooth centroid offset map is converted into a three-dimensional vector whose direction points to the target point, i.e. the tooth centroid.
A tooth centroid density map H_C is generated from the frequency with which each position is pointed to by the three-dimensional vectors of the generated 3D centroid offset map; a position pointed to at higher frequency is more likely to be a centroid.
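To make the voting step concrete, the following is a minimal NumPy sketch (our illustration, not code from the patent) of how such a density map can be accumulated from a per-voxel offset map; the array names offsets and mask, and the (3, H, W, D) layout, are assumptions.

```python
import numpy as np

def centroid_density_map(offsets: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Count how often each voxel is pointed at by an offset vector.

    offsets: (3, H, W, D) per-voxel offsets towards the tooth centroid.
    mask:    (H, W, D) binary tooth/background map from the coarse stage.
    """
    H, W, D = mask.shape
    density = np.zeros((H, W, D), dtype=np.int32)
    xs, ys, zs = np.nonzero(mask)                  # only tooth voxels vote
    for x, y, z in zip(xs, ys, zs):
        dx, dy, dz = offsets[:, x, y, z]
        tx, ty, tz = int(round(x + dx)), int(round(y + dy)), int(round(z + dz))
        if 0 <= tx < H and 0 <= ty < W and 0 <= tz < D:
            density[tx, ty, tz] += 1               # frequent targets are likely centroids
    return density
```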
The density-peaks clustering algorithm calculates the local density ρ_i of each voxel point of the tooth centroid density map H_C and the Euclidean distance δ_i of each voxel point of H_C to higher-density voxel points.
Points that simultaneously have a larger distance δ_i and a larger local density ρ_i are defined as cluster target points, i.e. the predicted candidate tooth centroids Ĉ_i.
The predicted centroid candidates Ĉ_i are characterized by the predicate P_C:
P_C = (ρ_i > μ) ∩ (δ_i > λ) (1)
where the scalars μ = 20 and λ = 10 are the density threshold and the distance threshold, respectively, on which the invention bases its clustering criterion.
Furthermore, the predicted candidate tooth centroids Ĉ_i are compared by distance against the labeled tooth centroids C_i; for each labeled tooth centroid C_i, the center point of the nearest cluster is defined as the predicted tooth centroid point C_i^0.
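The ρ/δ computation and the thresholding of equation (1) can be sketched as follows, assuming the definitions above. Restricting the pairwise computation to a list of non-zero-density voxels of H_C, rather than all voxels, is our simplification to keep the N × N distance matrix small.

```python
import numpy as np

def density_peak_candidates(points: np.ndarray, dc=3.0, mu=20.0, lam=10.0):
    """points: (N, 3) coordinates of voxels with non-zero density in H_C.

    Returns candidate centroids satisfying P_C = (rho > mu) AND (delta > lam).
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1) - 1        # local density: neighbours within dc
    delta = np.empty(len(points))
    order = np.argsort(-rho)              # indices sorted by decreasing density
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = d[i].max()         # convention for the densest point
        else:
            delta[i] = d[i, order[:rank]].min()  # distance to nearest denser point
    return points[(rho > mu) & (delta > lam)]
```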
Through centroid prediction, the network acquires the spatial position information of the teeth and distinguishes and localizes them stably; at the same time, instance information of single teeth is obtained while the network learns the centroids.
To reduce the influence of redundant information on the network and prevent loss of tooth information, a standard 32 × 32 × 40 region centered on the predicted centroid Ĉ_i of each tooth is cropped out to frame each tooth, serving as input to the tooth instance segmentation network.
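A small sketch of this cropping step follows; the 32 × 32 × 40 extent follows the text above, and the simple clamping at the volume border is our choice.

```python
import numpy as np

def crop_around(volume: np.ndarray, centroid, size=(32, 32, 40)) -> np.ndarray:
    """Crop a fixed-size region centred on a predicted tooth centroid."""
    slices = []
    for c, s, dim in zip(centroid, size, volume.shape):
        start = min(max(int(round(c)) - s // 2, 0), max(dim - s, 0))
        slices.append(slice(start, start + s))     # clamped to stay in-volume
    return volume[tuple(slices)]
```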
Further, the tooth instance segmentation network includes: a second V-net network;
a second V-net network comprising: a second encoder, a self-attention mechanism layer, and a second decoder;
a second encoder, comprising: the first downsampling layer, the first convolution layer, the second downsampling layer, the second convolution layer, the third downsampling layer, the third convolution layer, the fourth downsampling layer and the fourth convolution layer are sequentially connected from top to bottom;
a second decoder, comprising: the first upsampling layer, the first dilated convolution layer, the second upsampling layer, the second dilated convolution layer, the third upsampling layer, the third dilated convolution layer, the fourth upsampling layer and the fifth convolution layer, which are sequentially connected from bottom to top;
the fourth convolution layer is connected with the input end of the self-attention mechanism layer;
the first upsampling layer is connected to the output of the self-attention mechanism layer.
The second encoder is the encoding part of the tooth instance segmentation network.
The self-attention mechanism layer is the tooth-body information guiding module based on the 3D self-attention mechanism.
The second decoder is the tooth feature integration module with multi-scale dilated-convolution fusion.
Further, the encoding part of the tooth instance segmentation network works as follows:
2× downsampling is performed stage by stage from top to bottom, extracting the feature information.
As shown in fig. 2, the self-attention mechanism layer includes:
the first channel, the second channel and the third channel are arranged in parallel;
wherein the first channel is provided with a first 3D convolution layer; the first 3D convolution layer is a 1 × 1 convolution layer;
wherein the second channel is provided with a second 3D convolution layer; the second 3D convolution layer is a 3 × 3 convolution layer;
wherein the third channel is provided with a third 3D convolution layer; the third 3D convolution layer is a 3 × 3 convolution layer;
the input values of the first channel, the second channel and the third channel are all characteristic diagrams output by the second encoder;
the output end of the first 3D convolution layer and the output end of the second 3D convolution layer are connected with the input end of the first dot-product unit; the output end of the first dot-product unit is connected with the input end of the softmax function layer;
the output end of the third 3D convolution layer and the output end of the softmax function layer are connected with the input end of the second dot-product unit; the output of the second dot-product unit serves as the output of the self-attention mechanism layer.
Further, the working principle of the self-attention mechanism layer is as follows:
Suppose the feature map in the high-level semantic hidden layer is defined as a K-channel feature F ∈ R^{H×W×D×K}, where H and W denote the height and width of the CBCT image feature map and D denotes the depth (number of slices) of the CBCT image.
Using 1 × 1 convolutions, the pixels x ∈ F of the input feature map are transformed into two feature subspaces of the overall feature space R, q(x) ∈ R^{D×H×W×K} and k(x) ∈ R^{D×H×W×K}, used for the first-stage dot-product computation of the self-attention mechanism; this transformation also serves as the original input of the feature encoding and channels.
The relation of each pixel to all other pixels in the CBCT image slice space is then calculated:
First, a reshape operation flattens the arrays from K × D × H × W into K × N, where N = H × W × D is the total number of pixels in the feature map.
Then, the inner product of the two vectors q(x_i) and k(x_j) gives the relation between every pair of positions, and the resulting convolution feature map V(x) is point-multiplied with this relation matrix to compute the spatial attention, as expressed by equations (2) and (3):
δ_{j,i} = q(x_i)^T k(x_j) (3)
where δ_{j,i} ∈ R^{K×N}. δ_{j,i} represents the dependency or feature similarity between the i-th position and the j-th position on the feature map F; the larger the value, the stronger the relation, and the smaller the value, the weaker the relation.
Next, the softmax function normalizes δ, and the relation matrix is then applied to the convolution feature map V(x) to compute the spatial attention, where V(x) ∈ R^{K×D×H×W} is produced by a 3 × 3 kernel convolution layer.
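A minimal PyTorch sketch of such a 3D self-attention layer is given below: q and k come from 1 × 1 convolutions, v from a 3 × 3 kernel convolution, the volume is flattened to N = H × W × D positions, and the softmax-normalised pairwise similarities reweight v. The reduced q/k channel width and the softmax axis are our assumptions; applying the layer only at the bottleneck keeps the N × N relation matrix affordable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        inner = max(channels // 8, 1)              # assumed reduced q/k width
        self.q = nn.Conv3d(channels, inner, kernel_size=1)
        self.k = nn.Conv3d(channels, inner, kernel_size=1)
        self.v = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, D, H, W = x.shape
        q = self.q(x).flatten(2)                   # (B, C', N), N = D*H*W
        k = self.k(x).flatten(2)                   # (B, C', N)
        v = self.v(x).flatten(2)                   # (B, C,  N)
        # delta_{j,i} = q(x_i)^T k(x_j): pairwise position dependencies
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (B, N, N)
        out = torch.bmm(v, attn)                   # reweight v by the relations
        return out.view(B, C, D, H, W)
```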
It should be appreciated that there is a correlation between the distance between any two pixel positions in the tooth feature space and their pixel values. For any two pixels, regardless of their spatial positions in the input image or in the feature maps obtained during network encoding, the spatial-distance dependency between them can be obtained by the self-attention computation; this is referred to herein as the pixel dependency.
There are two kinds of dependencies in tooth CBCT slice data: intra-slice dependencies and inter-slice dependencies. The dependency between different pixels of the tooth image within one CT slice is defined as the intra-slice dependency. The CBCT data presents, from top to bottom, the cross-sectional information of the tooth in the mouth from crown to root, and the relation between different slices is defined as the inter-slice dependency.
These two long-range dependencies in tooth CBCT data reflect more complete tooth information. An instance-segmentation guiding module based on a 3D self-attention mechanism is therefore proposed to process the intra-slice and inter-slice dependencies of the CBCT tooth data simultaneously; it improves the network's ability to classify and model spatial pixel relations during semantic segmentation of the tooth data, better accounts for adaptivity in the spatial dimensions, and better captures the tooth geometric structure information.
As the network's convolution layers operate on the tooth image, high-level features containing richer semantic features of the tooth image are gradually acquired, and these high-level features contain richer tooth geometric structure information.
Therefore, the invention connects the tooth geometric-structure-information guiding module after the encoder; using the self-attention computation, it captures the spatial-distance dependency between arbitrary pixel points of the tooth spatial feature map, better integrates the information of the whole tooth spatial feature map, and guides the instance segmentation process.
As shown in fig. 2, the flow of the tooth-body information guiding module based on the self-attention mechanism is illustrated. By capturing the positional dependencies among pixel points of the tooth feature space map through self-attention, the network perceives features such as the tooth geometric structure better, guiding the whole tooth instance segmentation network.
Further, the internal structures of the first dilated convolution layer, the second dilated convolution layer and the third dilated convolution layer are identical.
As shown in fig. 4, the first dilated convolution layer comprises: a first sub-dilated convolution layer, a second sub-dilated convolution layer and a third sub-dilated convolution layer, arranged in parallel;
the input ends of the first, second and third sub-dilated convolution layers are connected with the input end of the sixth convolution layer;
the output ends of the first, second and third sub-dilated convolution layers are connected with the input ends of the concatenation unit; the concatenation unit concatenates the output values of the three sub-dilated convolution layers, its output end is connected with the input end of the 1 × 1 convolution layer, and the output end of the 1 × 1 convolution layer is the output end of the first dilated convolution layer.
Further, the tooth feature integration module with multi-scale dilated-convolution fusion works as follows:
A larger receptive field for capturing feature information is obtained through dilated convolution, and different dilation rates correspond to different network receptive fields; for example, a 3 × 3 convolution layer with dilation rate d = 3 has a receptive field of size 7. Therefore, 3 dilated convolutions of different scales are concatenated and fused, and the multi-scale dilated convolutions capture more tooth detail features.
Three dilated convolution layers with dilation rates d of 1, 3 and 5 receive the same input, and their outputs are concatenated together; a 1 × 1 convolution then fuses these multi-scale features, so that the obtained output is a fused feature map covering different receptive-field scales, which is upsampled so that the information of the tooth feature map is preserved during decoding.
Let K denote the number of convolution kernels and D the dilation rate; the fused convolution Y_{K,D} of the three dilated convolutions H_{K,D} can then be expressed as
Y_{K,D} = Conv_{1×1}( concat( H_{K,1}(x), H_{K,3}(x), H_{K,5}(x) ) ),
i.e. Y_{K,D} is the convolution integrating the three dilated convolutions H_{K,D} with different K, D.
It should be appreciated that when decoding the acquired feature maps, different network receptive fields are obtained by adding dilated convolution layers with different dilation rates. The larger the receptive-field value, the larger the extent of the original tooth image that can be touched, the more global the captured tooth features, and the higher their semantic level; the smaller the convolution receptive field, the more the contained tooth features tend toward local details such as the root. Therefore, in order to capture and restore the detail information of the tooth root, tooth surface and so on, the invention provides a densely connected multi-scale dilated-convolution tooth feature integration module in the network decoder part. The module fuses convolutions of different dilation rates d, and each dilated convolution layer obtains a larger receptive field while maintaining the spatial resolution; as shown in figs. 3 (a) - 3 (c), a 3 × 3 convolution layer with dilation rate d = 3 has a corresponding receptive field of size 7. Convolution layers with kernel size 3 × 3 are used and, considering the actually small size of a single tooth, the dilation rates are set to d_i = [1, 3, 5] to relieve the computational pressure of the network, as shown in fig. 4.
By introducing the tooth feature integration module into the layer-by-layer decoding process, the invention enlarges the receptive field of the whole network, enhances the network's capture of tooth detail information, and guides the tooth instance segmentation.
Further, for the trained tooth segmentation network, the training process comprises:
constructing a training set; the training set is a tooth image with known tooth segmentation results;
inputting the training set into the tooth segmentation network, training the tooth segmentation network, and stopping training when the total loss function value is not reduced any more, so as to obtain the trained tooth segmentation network.
Further, the total loss function is the sum of the loss function of the centroid prediction network and the loss function of the tooth instance segmentation network.
Further, for the loss function of the centroid prediction network, the dice loss and the bce loss are used as the segmentation loss of the first V-net network, while the mse loss and the smooth L1 loss calculate the offset regression loss of the tooth-object voxels and the binary cross-entropy loss calculates the binary segmentation loss.
The smooth L1 loss combines the advantages of the L1 and L2 loss functions, making the propagated gradients more stable. The experimental balance coefficient λ was empirically set to 10 in the invention.
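Since the recovered text does not give the combined formula, the sketch below only illustrates one plausible arrangement of the named terms; in particular, applying λ = 10 to the offset-regression part is our assumption (dice_loss is sketched with the instance-segmentation loss below).

```python
import torch.nn.functional as F

def centroid_stage_loss(seg_pred, seg_gt, off_pred, off_gt, lam=10.0):
    # binary segmentation branch: dice + binary cross-entropy
    seg = dice_loss(seg_pred, seg_gt) + F.binary_cross_entropy(seg_pred, seg_gt)
    # offset regression over tooth voxels: mse + smooth L1
    reg = F.mse_loss(off_pred, off_gt) + F.smooth_l1_loss(off_pred, off_gt)
    return seg + lam * reg   # assumed placement of the balance coefficient
```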
further, the loss function of the tooth instance segmentation network is combined with the Dice loss function L dice And a binary cross entropy loss function L bce As network loss L seg The method is characterized by comprising the following steps:
L seg =L dice +L bce
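A direct sketch of this loss follows; the smoothing constant avoids division by zero, and its value is our choice.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, smooth: float = 1e-5):
    """pred: sigmoid probabilities; target: binary mask of the same shape."""
    inter = (pred * target).sum()
    return 1 - (2 * inter + smooth) / (pred.sum() + target.sum() + smooth)

def segmentation_loss(pred, target):
    return dice_loss(pred, target) + F.binary_cross_entropy(pred, target)
```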
the present invention proposes a tooth segmentation deep learning network TSDNet (Tooth SegmentationDeeplearning Network) for performing tooth instance segmentation on CBCT data, the framework being divided into two phases, a centroid prediction network and a tooth instance segmentation network, as the flow of the proposed method is illustrated in fig. 1:
in the mass center prediction stage, the tooth data is subjected to first-stage rough segmentation by introducing a V-net frame to obtain a tooth mass center offset map, and a density-based rapid search clustering algorithm is introduced to accurately and robustly obtain the tooth mass center and represent the spatial position information of each tooth.
In an example segmentation stage, introducing a tooth geometric structure information guiding module into a network coding part to enhance the capture of a network to a tooth body structure;
in the stage of step-by-step decoding, a tooth characteristic integration module is introduced and used for capturing tooth characteristic detail information in a multi-scale mode so as to obtain an accurate tooth segmentation result.
In order to better capture the geometric overall structure and detail information of a single tooth and remove noise information irrelevant to the tooth as much as possible, the invention takes a CBCT original image and centroid fusion characteristics acquired by a tooth centroid positioning network together as the two-channel input of a tooth example segmentation network. As shown in fig. 1, the present invention introduces V-net as an example of the present invention to divide the main frame, and introduces a tooth dentition information guiding module based on 3D self-attention mechanism behind the decoder, to better capture the whole information of the tooth dentition by calculating the correlation of spatial pixels in the tooth feature map. For complex details such as tooth roots, the multi-scale expansion convolution fusion tooth characteristic integration module is introduced into the segmentation network decoding part, and in the layer-by-layer decoding process, the tooth detail information can be captured in a multi-scale manner while the image resolution is kept unchanged, so that the tooth detail information of a segmentation result is well kept.
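Putting the pieces together, the following structural sketch shows how the second-stage network described above might be wired, reusing the SelfAttention3D and DilatedFusion modules sketched earlier. The channel widths, stride-2 downsampling, single-channel sigmoid head and the omission of V-net's skip connections are all our simplifications, not the patent's specification.

```python
import torch
import torch.nn as nn

class ToothInstanceNet(nn.Module):
    """Encoder -> 3D self-attention bottleneck -> multi-scale dilated decoder."""
    def __init__(self, in_ch: int = 2, base: int = 16):  # 2 channels: CBCT crop + centroid features
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8]
        downs, prev = [], in_ch
        for c in chs:                                     # downsampling + convolution layer
            downs.append(nn.Sequential(
                nn.Conv3d(prev, c, 2, stride=2),
                nn.Conv3d(c, c, 3, padding=1), nn.ReLU(inplace=True)))
            prev = c
        self.downs = nn.ModuleList(downs)
        self.attention = SelfAttention3D(prev)            # bottleneck guidance module
        ups = []
        for i, c in enumerate(reversed(chs)):             # first three stages use dilated fusion
            block = DilatedFusion(c) if i < 3 else nn.Conv3d(c, c, 3, padding=1)
            ups.append(nn.Sequential(nn.ConvTranspose3d(prev, c, 2, stride=2), block))
            prev = c
        self.ups = nn.ModuleList(ups)
        self.head = nn.Conv3d(prev, 1, 1)                 # per-voxel tooth probability

    def forward(self, x):                                 # spatial dims must divide by 16
        for down in self.downs:
            x = down(x)
        x = self.attention(x)
        for up in self.ups:
            x = up(x)
        return torch.sigmoid(self.head(x))

# e.g. ToothInstanceNet()(torch.randn(1, 2, 32, 32, 48)).shape -> (1, 1, 32, 32, 48)
```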
Embodiment 2
This embodiment provides a tooth instance segmentation system based on self-attention and receptive field adjustment;
a tooth instance segmentation system based on self-attention and receptive field adjustment, comprising:
an acquisition module configured to: acquiring a tooth CBCT image to be segmented;
a segmentation module configured to: input the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain a tooth image segmentation result; the working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain tooth-centroid fusion features; determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features; and, based on the tooth-body geometric structure information, acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion.
Here, it should be noted that the acquisition module and the segmentation module above correspond to steps S101 to S102 of Embodiment 1; the examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules above may be implemented, as part of a system, in a computer system such as a set of computer-executable instructions.
The foregoing embodiments describe the invention from different aspects; for details not elaborated in one embodiment, see the related description of another embodiment.
The proposed system may be implemented in other ways. For example, the system embodiment described above is merely illustrative; the division into the modules above is only a logical functional division, and other divisions are possible in actual implementation, e.g. multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Embodiment 3
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), an off-the-shelf field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method of Embodiment 1 may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Embodiment 4
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A tooth instance segmentation method based on self-attention and receptive field adjustment, comprising:
acquiring a tooth CBCT image to be segmented;
inputting the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain a tooth image segmentation result; the working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain tooth-centroid fusion features; determining the geometric structure information of the tooth bodies based on the tooth CBCT image to be segmented and the tooth-centroid fusion features; based on the tooth-body geometric structure information, acquiring the structure information of the tooth roots by multi-scale dilated-convolution fusion;
wherein the tooth segmentation network comprises: a tooth centroid positioning network and a tooth instance segmentation network;
wherein the working principle of the tooth centroid positioning network comprises:
inputting a tooth CBCT image to be segmented into a first V-net network;
generating a 3D tooth centroid offset map of the tooth by the first V-net network;
converting each voxel in the 3D tooth centroid offset map into a three-dimensional vector, the target point the vector points to being the tooth centroid;
generating a tooth centroid density map H_C according to the frequency with which each three-dimensional vector in the 3D tooth centroid offset map is pointed to by the other three-dimensional vectors;
calculating, with the density-peaks clustering algorithm, the local density ρ_i of each voxel point of the tooth centroid density map H_C and the Euclidean distance δ_i between each voxel point of H_C and the higher-density voxel points;
taking the points whose local density ρ_i is greater than the set density threshold as a first candidate point set;
taking the points whose Euclidean distance δ_i is greater than the set distance threshold as a second candidate point set;
taking the intersection of the first candidate point set and the second candidate point set to generate the candidate tooth centroid points Ĉ_i;
comparing the predicted candidate tooth centroids with the ground-truth tooth centroids, and outputting the candidate tooth centroid with the smallest distance as the final predicted tooth centroid point.
2. The tooth instance segmentation method based on self-attention and receptive field adjustment according to claim 1, wherein the tooth instance segmentation network comprises: the encoding part of the tooth instance segmentation network, a tooth-body information guiding module based on a 3D self-attention mechanism, and a tooth feature integration module with multi-scale dilated-convolution fusion.
3. The tooth instance segmentation method based on self-attention and receptive field adjustment according to claim 1, wherein the tooth centroid positioning network comprises a first V-net network; the first V-net network comprises a first encoder and a first decoder; the input end of the first encoder and the output end of the first decoder are connected with a first multiplier; the output end of the first multiplier is connected with the input end of the clustering module; the output end of the clustering module is connected with the input end of the cropping module; the output end of the cropping module and the input end of the first encoder are connected with the concatenation unit; and the output end of the concatenation unit is connected with the input end of the tooth instance segmentation network.
4. The tooth instance segmentation method based on self-attention and receptive field adjustment according to claim 1, wherein the tooth instance segmentation network comprises: a second V-net network;
a second V-net network comprising: a second encoder, a self-attention mechanism layer, and a second decoder;
a second encoder, comprising: the first downsampling layer, the first convolution layer, the second downsampling layer, the second convolution layer, the third downsampling layer, the third convolution layer, the fourth downsampling layer and the fourth convolution layer are sequentially connected from top to bottom;
a second decoder, comprising: a first upsampling layer, a first dilated convolution layer, a second upsampling layer, a second dilated convolution layer, a third upsampling layer, a third dilated convolution layer, a fourth upsampling layer and a fifth convolution layer, which are sequentially connected from bottom to top;
the fourth convolution layer is connected with the input end of the self-attention mechanism layer;
the first upsampling layer is connected to the output of the self-attention mechanism layer.
5. The tooth instance segmentation method based on self-attention and receptive field adjustment according to claim 4, wherein the self-attention mechanism layer comprises:
the first channel, the second channel and the third channel are arranged in parallel;
wherein the first channel is provided with a first 3D convolution layer; the first 3D convolution layer is a 1 × 1 convolution layer;
wherein the second channel is provided with a second 3D convolution layer; the second 3D convolution layer is a 3 × 3 convolution layer;
wherein the third channel is provided with a third 3D convolution layer; the third 3D convolution layer is a 3 × 3 convolution layer;
the input values of the first channel, the second channel and the third channel are all characteristic diagrams output by the second encoder;
the output end of the first 3D convolution layer and the output end of the second 3D convolution layer are connected with the input end of the first dot-product unit; the output end of the first dot-product unit is connected with the input end of the softmax function layer;
the output end of the third 3D convolution layer and the output end of the softmax function layer are connected with the input end of the second dot-product unit; the output of the second dot-product unit serves as the output of the self-attention mechanism layer.
6. The tooth instance segmentation method based on self-attention and receptive field adjustment according to claim 4, wherein the first dilated convolution layer comprises: a first sub-dilated convolution layer, a second sub-dilated convolution layer and a third sub-dilated convolution layer, arranged in parallel;
the input ends of the first, second and third sub-dilated convolution layers are connected with the input end of the sixth convolution layer;
the output ends of the first, second and third sub-dilated convolution layers are connected with the input ends of the concatenation unit; the concatenation unit concatenates the output values of the three sub-dilated convolution layers, its output end is connected with the input end of the 1 × 1 convolution layer, and the output end of the 1 × 1 convolution layer is the output end of the first dilated convolution layer;
or, the trained tooth segmentation network is obtained by a training process comprising the following steps:
constructing a training set, the training set being tooth images with known tooth segmentation results;
inputting the training set into the tooth segmentation network and training it, stopping training when the total loss function value no longer decreases, to obtain the trained tooth segmentation network;
the total loss function is the sum of the loss function of the centroid prediction network and the loss function of the tooth instance segmentation network;
the loss function of the centroid prediction network generates the segmentation and offset losses of the first V-net network using the Dice loss and the BCE loss, calculating the offset-regression loss of the tooth-object voxels with the MSE loss and the smooth L1 loss, and the binary segmentation loss with the binary cross-entropy loss;
the loss function of the tooth instance segmentation network combines the Dice loss function and the binary cross-entropy loss function as the network loss.
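A sketch of this training objective, assuming (consistent with the "sum" wording above) that the individual terms are added with equal weights; the function names and tensor shapes are illustrative:

import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    # Soft Dice loss computed on sigmoid probabilities.
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def instance_loss(logits, target):
    # Instance-segmentation network: Dice loss + binary cross-entropy.
    return dice_loss(logits, target) + F.binary_cross_entropy_with_logits(logits, target)

def centroid_loss(offset_pred, offset_gt, seg_logits, seg_gt, tooth_mask):
    # offset_pred, offset_gt: (N, 3) per-voxel centroid offsets; tooth_mask: (N,) bool.
    # Offset regression restricted to tooth-object voxels (MSE + smooth L1),
    # plus a binary segmentation term (BCE); equal weighting is an assumption.
    reg = F.mse_loss(offset_pred[tooth_mask], offset_gt[tooth_mask]) + \
          F.smooth_l1_loss(offset_pred[tooth_mask], offset_gt[tooth_mask])
    seg = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    return reg + seg

def total_loss(offset_pred, offset_gt, seg_logits, seg_gt, tooth_mask, inst_logits, inst_gt):
    # Total loss = centroid-prediction loss + instance-segmentation loss.
    return (centroid_loss(offset_pred, offset_gt, seg_logits, seg_gt, tooth_mask)
            + instance_loss(inst_logits, inst_gt))

Training then simply iterates until the total loss stops decreasing, as the claim states.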
7. A tooth example segmentation system based on self-attention and receptive field adjustment, comprising:
an acquisition module configured to: acquiring a tooth CBCT image to be segmented;
a segmentation module configured to: input the tooth CBCT image to be segmented into a trained tooth segmentation network to obtain the tooth image segmentation result; the working principle of the tooth segmentation network comprises: locating the tooth centroids to obtain fused tooth-centroid features; determining the geometric structure information of the tooth body based on the tooth CBCT image to be segmented and the fused tooth-centroid features; and acquiring the structure information of the tooth root, based on the geometric structure information of the tooth body, by multi-scale dilated-convolution fusion;
wherein the tooth segmentation network comprises: a tooth centroid positioning network and a tooth instance segmentation network;
wherein the working principle of the tooth centroid positioning network comprises:
inputting a tooth CBCT image to be segmented into a first V-net network;
generating a 3D tooth centroid offset map of the tooth by the first V-net network;
converting each voxel in the 3D tooth centroid offset map into a three-dimensional vector, the target point the vector points to being the tooth centroid;
generating a tooth centroid density map HC according to the frequency with which each position in the 3D tooth centroid offset map is pointed to by the other three-dimensional vectors;
calculating, with a density-based fast-search clustering algorithm, the local density ρ of each voxel point in the tooth centroid density map HC, and the Euclidean distance δ between each voxel point in HC and the highest-density voxel point;
taking the points whose local density ρ is greater than a set density threshold as the first candidate point set;
taking the points whose Euclidean distance δ is greater than a set distance threshold as the second candidate point set;
taking the intersection of the first candidate point set and the second candidate point set to generate the candidate tooth centroid points Ci;
and comparing the predicted candidate tooth centroids with the true tooth centroid, and outputting the candidate corresponding to the minimum distance as the final predicted tooth centroid point.
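The selection steps above amount to a density-peaks style rule over the vote map HC. A minimal NumPy sketch under stated assumptions: HC holds per-voxel vote frequencies that serve directly as the local density ρ, and δ is measured to the single highest-density voxel, as the claim literally reads:

import numpy as np

def candidate_centroids(hc, density_thresh, dist_thresh):
    # hc: 3D tooth centroid density map HC (per-voxel vote frequency).
    coords = np.argwhere(hc > 0)                   # voxel points of HC
    rho = hc[tuple(coords.T)]                      # local density rho
    peak = coords[np.argmax(rho)]                  # highest-density voxel point
    delta = np.linalg.norm(coords - peak, axis=1)  # Euclidean distance delta
    first = rho > density_thresh                   # first candidate point set
    second = delta > dist_thresh                   # second candidate point set
    return coords[first & second]                  # intersection: candidates Ci

def final_centroid(candidates, true_centroid):
    # Compare candidates with the true centroid; output the closest one.
    d = np.linalg.norm(candidates - true_centroid, axis=1)
    return candidates[int(np.argmin(d))]

Since the comparison with the true centroid requires ground truth, this last filtering step presumably applies at training time; at inference the candidate set Ci itself is the prediction.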
8. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-6.
9. A storage medium, non-transitorily storing computer-readable instructions, wherein the steps of the method of any one of claims 1-6 are performed when the non-transitory computer-readable instructions are executed by a computer.
CN202210767999.7A 2022-07-01 2022-07-01 Tooth example segmentation method and system based on self-attention and receptive field adjustment Active CN116485809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210767999.7A CN116485809B (en) 2022-07-01 2022-07-01 Tooth example segmentation method and system based on self-attention and receptive field adjustment

Publications (2)

Publication Number Publication Date
CN116485809A CN116485809A (en) 2023-07-25
CN116485809B (en) 2023-12-15

Family

ID=87216026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210767999.7A Active CN116485809B (en) 2022-07-01 2022-07-01 Tooth example segmentation method and system based on self-attention and receptive field adjustment

Country Status (1)

Country Link
CN (1) CN116485809B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372425B (en) * 2023-12-05 2024-03-19 Shandong Industrial Technology Research Institute Key point detection method for lateral cephalometric radiographs


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11819347B2 (en) * 2018-04-25 2023-11-21 Sota Cloud Corporation Dental imaging system utilizing artificial intelligence

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111656408A (en) * 2017-12-22 2020-09-11 Promaton Holding B.V. Automatic 3D root shape prediction using deep learning methods
WO2021155230A1 (en) * 2020-01-31 2021-08-05 James R. Glidewell Dental Ceramics, Inc. Teeth segmentation using neural networks
CN112200843A (en) * 2020-10-09 2021-01-08 福州大学 CBCT and laser scanning point cloud data tooth registration method based on hyper-voxels
CN113516784A (en) * 2021-07-27 2021-10-19 四川九洲电器集团有限责任公司 Tooth segmentation modeling method and device
CN114387259A (en) * 2022-01-14 2022-04-22 杭州柳叶刀机器人有限公司 Method and device for predicting missing tooth coordinates and training method of recognition model
CN114549356A (en) * 2022-02-23 2022-05-27 深圳市菲森科技有限公司 Automatic positioning and using method for CBCT teeth

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Descriptive analysis of dental X-ray images using various practical methods: A review; Anuj Kumar; Computer Science; pp. 1-41 *
Hierarchical Morphology-Guided Tooth Instance Segmentation from CBCT Images; Zhiming Cui; International Conference on Information Processing in Medical Imaging; pp. 150-162 *
Research on Segmentation Methods for Cone-Beam CT Dental Images; Liu Shiwei; China Master's Theses Full-text Database, Medicine & Health Sciences; pp. 1-38 *

Also Published As

Publication number Publication date
CN116485809A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
JP7451406B2 (en) Automatic 3D root shape prediction using deep learning methods
CN110998602B (en) Classification and 3D modeling of 3D dentofacial structures using deep learning methods
US11568533B2 (en) Automated classification and taxonomy of 3D teeth data using deep learning methods
US20210322136A1 (en) Automated orthodontic treatment planning using deep learning
EP3591616A1 (en) Automated determination of a canonical pose of a 3d dental structure and superimposition of 3d dental structures using deep learning
CN112634265B (en) Method and system for constructing and segmenting fully-automatic pancreas segmentation model based on DNN (deep neural network)
CN116485809B (en) Tooth example segmentation method and system based on self-attention and receptive field adjustment
CN112529863A (en) Method and device for measuring bone density
CN114418989A (en) Dental segmentation method, device, equipment and storage medium for oral medical image
JP7493464B2 (en) Automated canonical pose determination for 3D objects and 3D object registration using deep learning
US20240062882A1 (en) Method and system for automating dental crown design based on artificial intelligence
CN114926582A (en) Three-dimensional model generation method and device, equipment and storage medium
CN117409100A (en) CBCT image artifact correction system and method based on convolutional neural network
CN117218135A (en) Method and related equipment for segmenting plateau pulmonary edema chest film focus based on transducer
CN113706358A (en) Method and device for encrypting tomographic image interlamellar spacing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant