CN117115132A - Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model - Google Patents

Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model

Info

Publication number
CN117115132A
Authority
CN
China
Prior art keywords
model
channel
attention
module
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311179725.7A
Other languages
Chinese (zh)
Inventor
李冬
李华勇
范毅
刘金池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fussen Technology Co ltd
Original Assignee
Fussen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fussen Technology Co ltd filed Critical Fussen Technology Co ltd
Priority to CN202311179725.7A priority Critical patent/CN117115132A/en
Publication of CN117115132A publication Critical patent/CN117115132A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30036Dental; Teeth
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an oral cavity CBCT image tooth and soft tissue segmentation method based on an improved U-Net model, relating to the technical field of oral cavity CBCT imaging. The method is built on two main modules: collection and labeling of a data set, and determination of a loss function. The collection and labeling of the data set comprises acquiring oral cavity CBCT data with CBCT equipment and collecting multiple groups of oral image data. The classical U-shaped U-Net is selected as the reference network model: its parameter count and computational cost are small, it can achieve good segmentation results on a small amount of training data, it is easy to train, and it is robust. Meanwhile, to address the shortcomings of the conventional convolution module, a channel attention module and a spatial attention module are introduced to improve the model's attention to key positions and channels, and parallel convolution operations of different kernel sizes are introduced to improve the model's ability to extract features at different scales and its segmentation performance at segmentation boundaries.

Description

Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model
Technical Field
The application relates to the technical field of oral cavity CBCT images, in particular to an oral cavity CBCT image tooth and soft tissue segmentation model method based on an improved U-Net model.
Background
The development of stomatology is inseparable from progress in imaging. The application of traditional periapical and panoramic radiographs greatly improved the diagnosis rate of oral diseases, and the application of large CT scanners promoted the development of oral science. Dental CT, which applies cone-beam CT to three-dimensional imaging of the oral cavity, emerged in the late 1990s and has attracted wide attention from oral clinicians and CT researchers. Compared with conventional CT, it offers higher spatial resolution and lower dose, and it overcomes the image overlap and distortion inherent in conventional two-dimensional projection imaging. With the enormous development of CT technology, it has become one of the most promising and practical devices in oral and craniofacial imaging, bringing revolutionary changes to diagnosis and treatment in oral and craniofacial clinical practice. Reading oral and craniofacial CT requires professional knowledge, and the CT image data of each patient imposes a heavy workload on doctors, so reading speed and quality are difficult to guarantee. Research on automatic or semi-automatic image-reading assistance, and on the accurate segmentation and three-dimensional reconstruction of oral tissues in CBCT images, is therefore of great significance.
In recent years, vision Transformers have attracted great attention due to their ability to model long-range dependencies, but they often lack a strong local inductive bias and usually require training on large data sets to achieve good results. In the field of medical image segmentation, many studies have therefore combined CNNs with vision Transformers. For example, TransUNet integrates the advantages of the Transformer by fusing convolution and self-attention: contextual information is extracted by a self-attention mechanism from the feature maps generated by a CNN encoder, improving medical image segmentation performance. Another approach, Swin-Unet, is a pure-Transformer, U-Net-like model for medical image segmentation, in which tokenized image patches are fed into a Transformer-based encoder-decoder architecture with skip connections for local and global semantic feature learning.
However, such models have a very large number of parameters, pose large deployment challenges, and require large amounts of data, so they are not suitable as the reference model for an oral cavity CBCT image segmentation task. The present application therefore provides an oral cavity CBCT image tooth and soft tissue segmentation method based on an improved U-Net model.
Disclosure of Invention
The present application has been made in view of the above-described problems occurring in the prior art.
Therefore, the application aims to provide an oral cavity CBCT image tooth and soft tissue segmentation model method based on an improved U-Net model, which solves the problems.
In order to achieve the above object, the present application provides the following technical solutions:
An oral cavity CBCT image tooth and soft tissue segmentation method based on an improved U-Net model is built on two main modules: collection and labeling of a data set, and determination of a loss function. The collection and labeling of the data set comprises the following:
Oral cavity CBCT data are acquired with CBCT equipment, and multiple groups of oral image data are collected. The CBCT data of each patient are formatted as 500 dcm images of size 750 × 750. The training data set of the deep learning oral CBCT model is annotated with the labelme online image annotation tool: each oral CBCT image is labeled into 3 classes, and the CBCT image and its corresponding label are visualized. On the grey-value label image corresponding to the annotated CBCT image, the pixel values correspond to the structures as follows: pixel value 0 corresponds to the background region, pixel value 1 to the soft tissue region, and pixel value 2 to the tooth region;
The annotated training data set consists of the acquired groups of oral cavity CBCT image data, each group containing 500 dcm slices of size 750 × 750. All acquired image data are converted from dcm format to png format to obtain the png data set used for deep learning model training, and the training set and the test set are randomly divided in a ratio of 9:1.
Further, determining the loss function comprises:
In order to ensure pixel-level positional accuracy, a hybrid loss function is adopted in which the cross-entropy loss and the Dice coefficient loss are combined by a weighted sum. The hybrid loss is computed as follows:
Loss = w1 × Loss_ce + w2 × Loss_dice, where Loss_ce denotes the cross-entropy loss function, Loss_dice denotes the Dice coefficient loss function, and Loss denotes the hybrid loss function used by the model. According to experiments, the model performs best when the weights take the following values: w1 = 0.4 and w2 = 0.6.
Further, the oral cavity CBCT image tooth and soft tissue segmentation method based on an improved U-Net model comprises model building and model testing, wherein model building comprises selection of the reference model structure, the CPCA module, and the design of the overall model structure, as follows:
S1, selection of the reference model structure: a U-shaped model for segmenting oral cavity CBCT images is constructed. The U-Net network model is adopted as the reference network for the oral cavity CBCT segmentation task; an attention mechanism is used to improve the model's attention to important channels and key positions of the feature maps while keeping the U-shaped network design, and the CPCA module, a channel prior convolutional attention module, is added to the encoder and decoder to replace the original convolution process.
S2, the CPCA module: when the attention module is built, attention weights are dynamically distributed mainly in the channel and spatial dimensions, and the channel prior convolutional attention module performs the channel attention and spatial attention operations in sequence. Given an input intermediate feature map F ∈ R^(C×H×W), a 1D channel attention weight map M_C ∈ R^(C×1×1) is obtained by the channel attention module (CA). M_C, holding the channel weight values, is broadcast along the spatial dimensions to obtain the channel-attention-refined feature map F_C ∈ R^(C×H×W). The spatial attention module (SA) uses F_C to generate a 3D spatial attention weight map M_S ∈ R^(C×H×W). Multiplying M_S and F_C element-wise gives the final output feature map. The overall attention mechanism can be summarized as the following process:
F_C = CA(F) × F
F_out = SA(F_C) × F_C
Channel attention: the channel attention module is responsible for generating the channel attention map, which it achieves by exploring the inter-channel relationships present in the features. Following the method proposed by CBAM, spatial information is aggregated from the feature map by average pooling and max pooling operations. This aggregation produces two independent spatial context descriptors; the two channel feature descriptors are concatenated by a splicing operation and then fed into a multi-layer perceptron (MLP), which refines the channel weight descriptors and reduces their dimensionality to obtain a channel attention weight map whose channel dimension matches that of the input feature map. To reduce parameter overhead, the shared MLP consists of a single hidden layer, where the number of hidden-layer neurons is set to C/r × 1 and r denotes the reduction ratio. The computation of the channel attention can be summarized as:
CA(F)=σ(MLP(concat(AvgPool(F),MaxPool(F))))
where F denotes the input feature map, MaxPool denotes the max pooling operation, AvgPool denotes the average pooling operation, MLP denotes the multi-layer perceptron, concat denotes concatenation along the channel dimension, and σ denotes the sigmoid function;
Spatial attention: the spatial attention map is generated by extracting spatial mapping relationships. We consider that enforcing consistency across the spatial attention weight maps of all channels should be avoided; dynamically distributing attention weights in both the channel and spatial dimensions better matches reality. As illustrated in FIG. 2, depth-wise convolution is used to capture the spatial relationships between features, ensuring that the spatial attention feature map preserves the relationships between channels while reducing computational complexity. A multi-scale structure is adopted to enhance the convolution operation's ability to capture spatial relationships at different scales. Channel mixing is performed at the tail of the spatial attention module with a 1 × 1 convolution, generating a finer spatial attention weight map. The spatial attention is computed as follows:
SA(F) = Conv_1×1(Concat(Branch_1(DwConv(F)), Branch_2(DwConv(F)), Branch_3(DwConv(F)), F))
where DwConv denotes depth-wise convolution, Branch_i, i ∈ {1, 2, 3}, denotes the i-th branch, and F denotes the input feature map. Two depth-wise one-dimensional convolutions are used to approximate a standard channel convolution with a large kernel, and the convolution kernel size differs for each channel. Channel attention and spatial attention focus on "what" and "where", respectively, and can be placed in parallel or sequentially; in this module, the spatial attention design follows the channel attention as its precedent, thereby forming the channel prior convolutional attention module.
S3, design of the overall model structure: U-Net is selected as the reference network of the segmentation model, and the segmentation structure is designed on this basis. The input image data to be segmented are 3-dimensional image data. A 3×3 convolution module (including the subsequent ReLU and BN operations) changes the number of channels of the generated feature map to 64, after which the CPCA module further extracts the features of the input image. Downsampling is then performed by a 2×2 convolution module with stride 2, reducing the resolution to one half of the original; a common 3×3 convolution module raises the dimension, changing the number of channels of the feature map to twice that before the input, and the CPCA module further extracts feature-map features. Downsampling, convolution and CPCA are repeated in this way to form the encoder. In the decoder, after a common 3×3 convolution module, upsampling is performed by transposed convolution and the CPCA module performs finer feature extraction. To reduce the loss of feature-map information caused by the downsampling operations, the feature map from the encoder at the same level is combined with the decoder feature map, and the combined features are processed by convolution and the CPCA module. Finally, the feature map is mapped to the segmentation output to obtain the final segmentation result with restored detail.
Further, the model test comprises a training stage and a testing stage, and comprises the following contents:
In the training stage, the network model is trained with the preprocessed data. The data set is divided into a training set and a test set in a ratio of 9:1, the PyTorch deep learning framework is used, and the SGD optimization function is adopted, where the momentum value is 0.9, the weight decay parameter weight_decay is set to 0.0001, the initial learning rate is set to 0.001, the training period is set to 300 epochs, and the batch size is set to 16. StepLR is then used to adjust the learning rate, which is updated gradually according to a decay formula in which lr_new denotes the learning rate of the new training round, lr_old denotes the learning rate of the previous training round, iter_num denotes the current number of iterations, and max_iterations denotes the maximum number of iterations set for the experiment. The loss function used is the hybrid loss function. After training, the parameter weights corresponding to the minimum loss value over the whole training period are selected as the final weight parameters. Model training also adopts an EarlyStopping mechanism to prevent overfitting during training: when the training loss value does not decrease by more than a set threshold over a certain number of consecutive rounds, the model can be considered to have converged successfully and training can be stopped directly, the model parameters at that time being the final weight parameters;
In the test stage, the trained model is obtained by loading the trained model weight parameters, and the divided test set image data are then fed into the model for testing; the predicted tooth and soft tissue segmentation maps and the corresponding index parameters are finally obtained, completing the segmentation by the model.
Furthermore, the model builds a U-shaped structure based on the U-Net network model: an encoder module constructed from downsampling, a decoder module constructed from upsampling, a bottleneck block connecting the encoder and the decoder, and a skip connection structure used to reduce the loss of segmentation-boundary detail caused by the encoder's downsampling.
In the technical scheme, the application has the technical effects and advantages that:
1. In the present application, the classical U-shaped U-Net is selected as the reference network model: its parameter count and computational cost are small, it can achieve good segmentation results on a small amount of training data, it is easy to train, and it is robust. Meanwhile, to address the shortcomings of the conventional convolution module, a channel attention module and a spatial attention module are introduced to improve the model's attention to key positions and channels, and parallel convolution operations of different kernel sizes are introduced to improve the model's ability to extract features at different scales and its segmentation performance at segmentation boundaries.
2. The present application approximates a large-kernel standard channel convolution with two depth-wise one-dimensional convolutions. To capture multi-scale information, the convolution kernel size differs for each channel. Channel attention and spatial attention focus on "what" and "where", respectively, and can be placed in parallel or sequentially; in this module, the spatial attention design follows the channel attention as its precedent, and the sequential arrangement works better than the parallel one, improving the effect of the resulting channel prior convolution attention module.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present application, and those skilled in the art may obtain other drawings from them.
FIG. 1 is a schematic diagram of a label image of the present application;
fig. 2 is a schematic structural diagram of a CPCA module according to the present application;
FIG. 3 is a schematic view of the overall structure of the segmentation model according to the present application;
FIG. 4 is a schematic diagram of a model test flow according to the present application;
FIG. 5 is a schematic diagram of the model segmentation of the present application.
Detailed Description
In order to make the technical scheme of the present application better understood by those skilled in the art, the present application will be further described in detail with reference to the accompanying drawings.
The embodiment of the application discloses an oral cavity CBCT image tooth and soft tissue segmentation model method based on an improved U-Net model.
As shown in FIGS. 1-5, the application provides an oral cavity CBCT image tooth and soft tissue segmentation method based on an improved U-Net model, built on two main modules: collection and labeling of a data set, and determination of a loss function. The collection and labeling of the data set comprises the following:
Oral cavity CBCT data are acquired with CBCT equipment, and multiple groups of oral image data are collected. The CBCT data of each patient are formatted as 500 dcm images of size 750 × 750. The training data set of the deep learning oral CBCT model is annotated with the labelme online image annotation tool: each oral CBCT image is labeled into 3 classes, and the CBCT image and its corresponding label are visualized. On the grey-value label image corresponding to the annotated CBCT image, the pixel values correspond to the structures as follows: pixel value 0 corresponds to the background region, pixel value 1 to the soft tissue region, and pixel value 2 to the tooth region;
The annotated training data set consists of the acquired groups of oral cavity CBCT image data, each group containing 500 dcm slices of size 750 × 750. All acquired image data are converted from dcm format to png format to obtain the png data set used for deep learning model training, and the training set and the test set are randomly divided in a ratio of 9:1;
Semantic segmentation is performed on the teeth and soft tissues in each image of the patient's CBCT image sequence to obtain their specific positions in the CBCT image, providing necessary results for post-processing operations such as CBCT image noise reduction and three-dimensional visualization.
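As an illustration of the data preparation described above, the following is a minimal Python sketch of the dcm-to-png conversion and the 9:1 random split; the file paths, the min-max windowing, and the random seed are assumptions rather than details from the original.

import random

import numpy as np
import pydicom
from PIL import Image


def dcm_to_png(dcm_path, png_path):
    """Read one dcm slice, rescale it to 8-bit, and save it as a png image."""
    slice_array = pydicom.dcmread(dcm_path).pixel_array.astype(np.float32)
    slice_array -= slice_array.min()          # assumed min-max normalisation
    if slice_array.max() > 0:
        slice_array /= slice_array.max()
    Image.fromarray((slice_array * 255).astype(np.uint8)).save(png_path)


def split_dataset(png_files, train_ratio=0.9, seed=0):
    """Randomly split the converted png files into training and test sets (9:1)."""
    files = list(png_files)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]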
Wherein determining the loss function comprises:
In order to ensure pixel-level positional accuracy, a hybrid loss function is adopted in which the cross-entropy loss and the Dice coefficient loss are combined by a weighted sum. The hybrid loss is computed as follows:
Loss = w1 × Loss_ce + w2 × Loss_dice, where Loss_ce denotes the cross-entropy loss function, Loss_dice denotes the Dice coefficient loss function, and Loss denotes the hybrid loss function used by the model. According to experiments, the model result is best when the weights take the following values: w1 = 0.4 and w2 = 0.6;
By adopting the hybrid loss function that weights and sums the cross-entropy loss and the Dice coefficient loss, both classification accuracy and pixel-position accuracy are taken into account, making the classification result more accurate.
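A minimal PyTorch sketch of this hybrid loss (0.4 × cross-entropy + 0.6 × Dice) is given below; the smoothing constant and the soft-Dice formulation over softmax probabilities are assumptions, not details taken from the original.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridLoss(nn.Module):
    """Weighted sum of the cross-entropy loss and the Dice coefficient loss."""

    def __init__(self, w_ce=0.4, w_dice=0.6, smooth=1e-5):
        super().__init__()
        self.w_ce, self.w_dice, self.smooth = w_ce, w_dice, smooth
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        # logits: (N, C, H, W); target: (N, H, W) with class indices 0, 1, 2
        loss_ce = self.ce(logits, target)
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1])
        one_hot = one_hot.permute(0, 3, 1, 2).float()
        inter = (probs * one_hot).sum(dim=(2, 3))
        union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
        loss_dice = 1.0 - ((2 * inter + self.smooth) / (union + self.smooth)).mean()
        return self.w_ce * loss_ce + self.w_dice * loss_dice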
The oral cavity CBCT image tooth and soft tissue segmentation method based on the improved U-Net model comprises model building and model testing, wherein model building comprises selection of the reference model structure, the CPCA module, and the design of the overall model structure, as follows:
S1, selection of the reference model structure: a U-shaped model for segmenting oral cavity CBCT images is constructed. The U-Net network model is adopted as the reference network for the oral cavity CBCT segmentation task; an attention mechanism is used to improve the model's attention to important channels and key positions of the feature maps while keeping the U-shaped network design, and the CPCA module, a channel prior convolutional attention module, is added to the encoder and decoder to replace the original convolution process;
In the oral cavity CBCT segmentation task, the U-Net network model is adopted as the reference network, and the model's attention to important channels and key positions of the feature maps is improved by introducing a recent attention mechanism. U-Net is still the most common and effective basic framework for medical segmentation models, and many models are improvements built on its U-shaped structure; the classical U-shaped network is therefore retained, and the original convolution process is replaced by introducing the CPCA channel prior convolutional attention module into the encoder and decoder, improving the segmentation performance of the model.
S2, the CPCA module: when the attention module is built, attention weights are dynamically distributed mainly in the channel and spatial dimensions, and the channel prior convolutional attention module performs the channel attention and spatial attention operations in sequence. An intermediate feature map F ∈ R^(C×H×W) is input, where the intermediate feature map is the intermediate feature information output by the network structure, C denotes the number of channels, H the height, and W the width of the feature map. A 1D channel attention weight map M_C ∈ R^(C×1×1) is obtained by the channel attention module (CA); M_C, holding the channel weight values, is broadcast along the spatial dimensions to obtain the channel-attention-refined feature map F_C ∈ R^(C×H×W). The spatial attention module (SA) uses F_C to generate a 3D spatial attention weight map M_S ∈ R^(C×H×W), and multiplying M_S and F_C element-wise gives the final output feature map. The overall attention mechanism can be summarized as the following process:
F_C = CA(F) × F
F_out = SA(F_C) × F_C
Channel attention: the channel attention module is responsible for generating the channel attention map, which it achieves by exploring the relationships between the channels present in the features. Following the method proposed by CBAM, spatial information is aggregated from the feature map by average pooling and max pooling operations. This aggregation produces two independent spatial context descriptors; the two channel feature descriptors are concatenated by a splicing operation and then fed into a multi-layer perceptron (MLP), which refines the channel weight descriptors and reduces their dimensionality to obtain a channel attention weight map whose channel dimension matches that of the input feature map. To reduce parameter overhead, the shared MLP consists of a single hidden layer, where the number of hidden-layer neurons is set to C/r × 1 and r denotes the reduction ratio. The computation of the channel attention can be summarized as:
CA(F)=σ(MLP(concat(AvgPool(F),MaxPool(F))))
where F denotes the input feature map, MaxPool denotes the max pooling operation, AvgPool denotes the average pooling operation, MLP denotes the multi-layer perceptron, concat denotes concatenation along the channel dimension, and σ denotes the sigmoid function;
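The following is a hedged PyTorch sketch of this channel attention branch (average and max pooling, concatenation, a shared single-hidden-layer MLP with reduction ratio r, and a sigmoid); implementing the MLP with 1 × 1 convolutions and the default reduction ratio of 16 are assumptions for illustration.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CA(F) = sigmoid(MLP(concat(AvgPool(F), MaxPool(F))))."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # shared MLP with a single hidden layer of size C / r
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        # concatenate the two pooled channel descriptors along the channel dimension
        pooled = torch.cat([self.avg_pool(x), self.max_pool(x)], dim=1)
        return torch.sigmoid(self.mlp(pooled))  # (N, C, 1, 1) channel weights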
Spatial attention: the spatial attention map is generated by extracting spatial mapping relationships. We consider that enforcing consistency across the spatial attention weight maps of all channels should be avoided; dynamically distributing attention weights in both the channel and spatial dimensions better matches reality. As illustrated in FIG. 2, depth-wise convolution is used to capture the spatial relationships between features, ensuring that the spatial attention feature map preserves the relationships between channels while reducing computational complexity. A multi-scale structure is adopted to enhance the convolution operation's ability to capture spatial relationships at different scales. Channel mixing is performed at the tail of the spatial attention module with a 1 × 1 convolution, generating a finer spatial attention weight map. The spatial attention is computed as follows:
SA(F) = Conv_1×1(Concat(Branch_1(DwConv(F)), Branch_2(DwConv(F)), Branch_3(DwConv(F)), F))
where DwConv denotes depth-wise convolution, Branch_i, i ∈ {1, 2, 3}, denotes the i-th branch, F denotes the input feature map, Conv_1×1 denotes a 1 × 1 convolution, and Concat denotes concatenation along the channel dimension. Two depth-wise one-dimensional convolutions are used to approximate a standard channel convolution with a large kernel, and the convolution kernel size differs for each channel. Channel attention and spatial attention focus on "what" and "where", respectively, and can be placed in parallel or sequentially; in this module, the spatial attention design follows the channel attention as its precedent, thereby forming the channel prior convolutional attention module.
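Below is a hedged PyTorch sketch of the multi-scale spatial attention branch and of the channel-prior ordering (channel attention first, spatial attention second), reusing the ChannelAttention sketch above; the branch kernel sizes (5, 7, 11) and the shared 5 × 5 depth-wise convolution are assumptions for illustration.

import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """SA(F) = Conv_1x1(Concat(Branch_1(DwConv(F)), Branch_2(DwConv(F)), Branch_3(DwConv(F)), F))."""

    def __init__(self, channels, kernel_sizes=(5, 7, 11)):
        super().__init__()
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=5, padding=2, groups=channels)
        # each branch approximates a large-kernel convolution with two
        # depth-wise one-dimensional convolutions (1 x k followed by k x 1)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in kernel_sizes
        ])
        # 1 x 1 convolution at the tail for channel mixing
        self.mix = nn.Conv2d((len(kernel_sizes) + 1) * channels, channels, kernel_size=1)

    def forward(self, x):
        base = self.dwconv(x)
        feats = [branch(base) for branch in self.branches] + [x]
        return self.mix(torch.cat(feats, dim=1))


class CPCA(nn.Module):
    """Channel prior convolutional attention: channel attention first, then spatial attention."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)  # from the sketch above
        self.sa = SpatialAttention(channels)

    def forward(self, x):
        x = self.ca(x) * x       # F_C = CA(F) x F
        return self.sa(x) * x    # output = SA(F_C) x F_C

In this sketch the three branch outputs and the input are concatenated and mixed by the 1 × 1 convolution, matching the formula above; other aggregation choices, for example summing the branches, would also be plausible.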
S3, design of the overall model structure: U-Net is selected as the reference network of the segmentation model, and the segmentation structure is designed on this basis. The input image data to be segmented are 3-dimensional image data. A 3×3 convolution module (including the subsequent ReLU and BN operations) changes the number of channels of the generated feature map to 64, after which the CPCA module further extracts the features of the input image. Downsampling is then performed by a 2×2 convolution module with stride 2, reducing the resolution to one half of the original; a common 3×3 convolution module raises the dimension, changing the number of channels of the feature map to twice that before the input, and the CPCA module further extracts feature-map features. Downsampling, convolution and CPCA are repeated in this way to form the encoder. In the decoder, after a common 3×3 convolution module, upsampling is performed by transposed convolution and the CPCA module performs finer feature extraction. To reduce the loss of feature-map information caused by the downsampling operations, the feature map from the encoder at the same level is combined with the decoder feature map, and the combined features are processed by convolution and the CPCA module. Finally, the feature map is mapped to the segmentation output to obtain the final segmentation result with restored detail;
A skip connection concatenates the output feature map of the encoder at the same level with the feature map obtained by upsampling from the next-lower level of the decoder, so that detail segmentation accuracy is improved when the feature-map resolution is restored in the upsampling operation.
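To make the stage structure concrete, here is a simplified, hedged sketch of one encoder stage and one decoder stage of the U-shaped network (3 × 3 convolution with BN and ReLU, CPCA refinement, stride-2 downsampling, transposed-convolution upsampling, and the skip-connection concatenation), reusing the CPCA sketch above; the number of stages, the channel widths beyond the initial 64, and the exact ordering inside a decoder stage are assumptions.

import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 convolution followed by BN and ReLU, as in the stages described above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class EncoderStage(nn.Module):
    """Stride-2 downsampling, channel-doubling convolution, then CPCA refinement."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, in_ch, kernel_size=2, stride=2)  # halve resolution
        self.conv = conv_block(in_ch, out_ch)                          # e.g. 64 -> 128
        self.cpca = CPCA(out_ch)

    def forward(self, x):
        return self.cpca(self.conv(self.down(x)))


class DecoderStage(nn.Module):
    """Transposed-convolution upsampling, skip concatenation, convolution, CPCA."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = conv_block(2 * out_ch, out_ch)  # channels after skip concatenation
        self.cpca = CPCA(out_ch)

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)    # jump connection with the encoder feature
        return self.cpca(self.conv(x))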
The model test comprises a training stage and a testing stage, and comprises the following steps:
In the training stage, the network model is trained with the preprocessed data. The data set is divided into a training set and a test set in a ratio of 9:1, the PyTorch deep learning framework is used, and the SGD optimization function is adopted, where the momentum value is 0.9, the weight decay parameter weight_decay is set to 0.0001, the initial learning rate is set to 0.001, the training period is set to 300 epochs, and the batch size is set to 16. StepLR is then used to adjust the learning rate, which is updated gradually according to a decay formula in which lr_new denotes the learning rate of the new training round, lr_old denotes the learning rate of the previous training round, iter_num denotes the current number of iterations, and max_iterations denotes the maximum number of iterations set for the experiment. The loss function used is the hybrid loss function. After training, the parameter weights corresponding to the minimum loss value over the whole training period are selected as the final weight parameters. Model training also adopts an EarlyStopping mechanism to prevent overfitting during training: when the training loss value does not decrease by more than a set threshold over a certain number of consecutive rounds, the model can be considered to have converged successfully and training can be stopped directly, the model parameters at that time being the final weight parameters;
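A hedged sketch of this training configuration in PyTorch follows (SGD with momentum 0.9 and weight_decay 0.0001, initial learning rate 0.001, 300 epochs, batch size 16, StepLR scheduling, the hybrid loss, weight saving at the minimum loss, and early stopping); the StepLR step size and gamma, the early-stopping patience and threshold, and the `model` and `train_loader` objects are assumptions or placeholders, and the exact learning-rate decay formula of the original is not reproduced here.

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = HybridLoss(w_ce=0.4, w_dice=0.6)

best_loss, patience, bad_epochs = float("inf"), 20, 0
for epoch in range(300):
    model.train()
    epoch_loss = 0.0
    for images, labels in train_loader:          # batches of size 16
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step()

    # keep the weights with the lowest loss and stop early when the loss has not
    # improved by more than a small threshold for `patience` consecutive epochs
    if epoch_loss < best_loss - 1e-4:
        best_loss, bad_epochs = epoch_loss, 0
        torch.save(model.state_dict(), "best_weights.pth")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break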
In the test stage, the trained model is obtained by loading the trained model weight parameters, and the divided test set image data are then fed into the model for testing; the predicted tooth and soft tissue segmentation maps and the corresponding index parameters are finally obtained, completing the segmentation by the model.
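A minimal sketch of the test stage follows: the trained weights are loaded, the test images are passed through the model, and the per-pixel argmax gives the background / soft-tissue / tooth segmentation map; `test_loader` and the weight file name are placeholders.

import torch

model.load_state_dict(torch.load("best_weights.pth"))
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        logits = model(images)                 # (N, 3, H, W) class scores
        prediction = logits.argmax(dim=1)      # 0 = background, 1 = soft tissue, 2 = tooth
        # index parameters such as the Dice coefficient could be computed here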
The model builds a U-shaped structure based on the U-Net network model: an encoder module constructed from downsampling, a decoder module constructed from upsampling, a bottleneck block connecting the encoder and the decoder, and a skip connection structure used to reduce the loss of segmentation-boundary detail caused by the encoder's downsampling.
The specific positions of the teeth and soft tissues in the CBCT image can be obtained by performing semantic segmentation on the teeth and soft tissues in each image of the patient's CBCT image sequence, providing necessary results for post-processing operations such as CBCT image noise reduction, artifact removal, and three-dimensional visualization. Considering that building an oral cavity CBCT data set is difficult, that a large amount of training data cannot be obtained, and that oral CBCT image segmentation is simpler than natural image segmentation, the classical standard segmentation model U-Net from the field of medical image segmentation is selected as the base model. To address the shortcoming that the original U-Net model only has conventional 3×3 convolutions, the new CPCA module is introduced, improving the model's attention to key information across different channels and spatial positions, and parallel convolution branches with different kernels improve the model's ability to extract features at different scales.
While certain exemplary embodiments of the present application have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the application, which is defined by the appended claims.

Claims (5)

1. An oral cavity CBCT image tooth and soft tissue segmentation model method based on an improved U-Net model, characterized in that it is based on two major modules, collection and labeling of a data set and determination of a loss function, wherein the collection and labeling of the data set comprises:
oral cavity CBCT data are acquired with CBCT equipment and multiple groups of oral image data are collected; the CBCT data of each patient are formatted as 500 dcm images of size 750 × 750; the training data set of the deep learning oral CBCT model is annotated with the labelme online image annotation tool, each oral CBCT image is labeled into 3 classes, and the CBCT oral image and its corresponding label are visualized; on the grey-value label image corresponding to the annotated CBCT image, the pixel values correspond to the structures as follows: pixel value 0 corresponds to the background region, pixel value 1 to the soft tissue region, and pixel value 2 to the tooth region,
the annotated training data set consists of the acquired groups of oral cavity CBCT image data, each group containing 500 dcm slices of size 750 × 750; all acquired image data are converted from dcm format to png format to obtain the png data set used for deep learning model training, and the training set and the test set are randomly divided in a ratio of 9:1.
2. The oral cavity CBCT image tooth and soft tissue segmentation model method based on the improved U-Net model of claim 1, wherein determining the loss function comprises:
the hybrid loss function used to ensure pixel-level positional accuracy is obtained by weighting and summing the cross-entropy loss and the Dice coefficient loss function, and the result is used as the loss function of the model; the specific hybrid function is computed as follows:
Loss = w1 × Loss_ce + w2 × Loss_dice, where Loss_ce denotes the cross-entropy loss function, Loss_dice denotes the Dice coefficient loss function, w1 and w2 are the weights of the two loss functions, and Loss denotes the hybrid loss function; in the present method w1 is 0.4 and w2 is 0.6.
3. An oral cavity CBCT image tooth and soft tissue segmentation model method based on an improved U-Net model, used with the method as claimed in any one of claims 1-2, characterized by comprising model building and model testing, wherein model building comprises selection of the reference model structure, the CPCA module, and the design of the overall model structure, as follows:
S1, selection of the reference model structure: a U-shaped model for segmenting oral cavity CBCT images is constructed; the U-Net network model is adopted as the reference network for the oral cavity CBCT segmentation task, an attention mechanism is used to improve the model's attention to important channels and key positions of the feature maps while keeping the U-shaped network design, and the CPCA module, a channel prior convolutional attention module, is added to the encoder and decoder to replace the original convolution process;
S2, the CPCA module: when the attention module is built, attention weights are dynamically distributed mainly in the channel and spatial dimensions, and the channel prior convolutional attention module performs the channel attention and spatial attention operations in sequence; given an input intermediate feature map F ∈ R^(C×H×W), a 1D channel attention weight map M_C ∈ R^(C×1×1) is obtained by the channel attention module (CA); M_C, holding the channel weight values, is broadcast along the spatial dimensions to obtain the channel-attention-refined feature map F_C ∈ R^(C×H×W); the spatial attention module (SA) uses F_C to generate a 3D spatial attention weight map M_S ∈ R^(C×H×W), and multiplying M_S and F_C element-wise gives the final output feature map; the overall attention mechanism can be summarized as the following process:
F_C = CA(F) × F
F_out = SA(F_C) × F_C
the channel attention module is responsible for generating the channel attention map, which it achieves by exploring the relationships between the channels present in the features; following the method proposed by CBAM, spatial information is aggregated from the feature map by average pooling and max pooling operations; this aggregation produces two independent spatial context descriptors, the two independent channel feature descriptors are concatenated by a splicing operation and then fed into a multi-layer perceptron (MLP), which refines the channel weight descriptors and reduces their dimensionality to obtain a channel attention weight map consistent with the channel dimension of the input feature map; to reduce parameter overhead, the shared MLP consists of a single hidden layer, where the number of hidden-layer neurons is set to C/r × 1 and r denotes the reduction ratio; the computation of the channel attention can be summarized as:
CA(F) = σ(MLP(concat(AvgPool(F), MaxPool(F))))
where F denotes the input feature map, MaxPool denotes the max pooling operation, AvgPool denotes the average pooling operation, MLP denotes the multi-layer perceptron, concat denotes concatenation along the channel dimension, and σ denotes the sigmoid function;
the spatial attention map is generated by extracting spatial mapping relationships; enforcing consistency across the spatial attention weight maps of all channels is avoided, and dynamically distributing attention weights in both the channel and spatial dimensions better matches the actual demand; as illustrated in FIG. 2, depth-wise convolution is used to capture the spatial relationships between features, ensuring that the spatial attention feature map preserves the relationships between channels while reducing computational complexity; a multi-scale structure is adopted to enhance the convolution operation's ability to capture spatial relationships at different scales, and channel mixing is performed with a 1 × 1 convolution at the tail of the spatial attention module, generating a finer spatial attention weight map; the spatial attention is computed as follows:
SA(F) = Conv_1×1(Concat(Branch_1(DwConv(F)), Branch_2(DwConv(F)), Branch_3(DwConv(F)), F))
where DwConv denotes depth-wise convolution, Branch_i, i ∈ {1, 2, 3}, denotes the i-th branch, and F denotes the input feature map; two depth-wise one-dimensional convolutions are used to approximate a standard channel convolution with a large kernel, and the convolution kernel size differs for each channel; channel attention and spatial attention focus on "what" and "where", respectively, and can be placed in parallel or sequentially, and in this module the spatial attention design follows the channel attention as its precedent, thereby forming the channel prior convolutional attention module;
S3, design of the overall model structure: U-Net is selected as the reference network of the segmentation model, and the segmentation structure is designed on this basis; the input image data to be segmented are 3-dimensional image data; a 3×3 convolution module (including the subsequent ReLU and BN operations) changes the number of channels of the generated feature map to 64, after which the CPCA module further extracts the features of the input image; downsampling is then performed by a 2×2 convolution module with stride 2, reducing the resolution to one half of the original, a common 3×3 convolution module raises the dimension, changing the number of channels of the feature map to twice that before the input, and the CPCA module further extracts feature-map features; downsampling, convolution and CPCA are repeated in this way to form the encoder; in the decoder, after a common 3×3 convolution module, upsampling is performed by transposed convolution and the CPCA module performs finer feature extraction; to reduce the loss of feature-map information caused by the downsampling operations, the feature map from the encoder at the same level is combined with the decoder feature map, the combined features are processed by convolution and the CPCA module, and finally the feature map is mapped to the segmentation output to obtain the final segmentation result with restored detail.
4. The oral cavity CBCT image tooth and soft tissue segmentation model method based on the improved U-Net model according to claim 3, wherein the model test comprises a training stage and a testing stage, comprising the following:
in the training stage, the network model is trained with the preprocessed data; the data set is divided into a training set and a test set in a ratio of 9:1, the PyTorch deep learning framework is used, and the SGD optimization function is adopted, where the momentum value is 0.9, the weight decay parameter weight_decay is set to 0.0001, the initial learning rate is set to 0.001, the training period is set to 300 epochs, and the batch size is set to 16; StepLR is then used to adjust the learning rate, which is updated gradually according to a decay formula in which lr_new denotes the learning rate of the new training round, lr_old denotes the learning rate of the previous training round, iter_num denotes the current number of iterations, and max_iterations denotes the maximum number of iterations set for the experiment; the loss function used is the hybrid loss function; after training, the parameter weights corresponding to the minimum loss value over the whole training period are selected as the final weight parameters; model training also adopts an EarlyStopping mechanism to prevent overfitting during training, the model being considered successfully converged when the training loss value does not decrease by more than a set threshold over a certain number of consecutive rounds, whereupon training can be stopped directly, the model parameters at that time being the final weight parameters;
in the test stage, the trained model is obtained by loading the trained model weight parameters, and the divided test set image data are then fed into the model for testing; the predicted tooth and soft tissue segmentation maps and the corresponding index parameters are finally obtained, completing the segmentation by the model.
5. The oral cavity CBCT image tooth and soft tissue segmentation model method based on the improved U-Net model according to claim 4, wherein the model builds a U-shaped structure based on the U-Net network model: an encoder module constructed from downsampling, a decoder module constructed from upsampling, a bottleneck block connecting the encoder and the decoder, and a skip connection structure used to reduce the loss of segmentation-boundary detail caused by the encoder's downsampling.
CN202311179725.7A 2023-09-13 2023-09-13 Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model Pending CN117115132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311179725.7A CN117115132A (en) 2023-09-13 2023-09-13 Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311179725.7A CN117115132A (en) 2023-09-13 2023-09-13 Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model

Publications (1)

Publication Number Publication Date
CN117115132A true CN117115132A (en) 2023-11-24

Family

ID=88798338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311179725.7A Pending CN117115132A (en) 2023-09-13 2023-09-13 Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model

Country Status (1)

Country Link
CN (1) CN117115132A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474044A (en) * 2023-12-28 2024-01-30 感跃医疗科技(成都)有限公司 Tooth CBCT image segmentation network based on Flowformer and gated attention
CN117952987A (en) * 2024-03-27 2024-04-30 有方(合肥)医疗科技有限公司 CBCT image data processing method and device, electronic equipment and medium


Similar Documents

Publication Publication Date Title
CN111461983B (en) Image super-resolution reconstruction model and method based on different frequency information
WO2022199143A1 (en) Medical image segmentation method based on u-shaped network
CN111627019B (en) Liver tumor segmentation method and system based on convolutional neural network
CN117115132A (en) Oral cavity CBCT image tooth and soft tissue segmentation model method based on improved U-Net model
CN110675406A (en) CT image kidney segmentation algorithm based on residual double-attention depth network
CN112734764A (en) Unsupervised medical image segmentation method based on countermeasure network
CN110648331B (en) Detection method for medical image segmentation, medical image segmentation method and device
CN113744275B (en) Feature transformation-based three-dimensional CBCT tooth image segmentation method
CN116612334B (en) Medical hyperspectral image classification method based on spatial spectrum combined attention mechanism
CN116188452A (en) Medical image interlayer interpolation and three-dimensional reconstruction method
CN112785609A (en) CBCT tooth segmentation method based on deep learning
CN116563533A (en) Medical image segmentation method and system based on target position priori information
Tian et al. Efficient tooth gingival margin line reconstruction via adversarial learning
CN113011514B (en) Intracranial hemorrhage sub-type classification algorithm applied to CT image based on bilinear pooling
Yin et al. CoT-UNet++: A medical image segmentation method based on contextual Transformer and dense connection
CN112967295B (en) Image processing method and system based on residual network and attention mechanism
CN115359071A (en) 3D semantic segmentation network construction method, CBCT3D tooth instance segmentation method and system
CN115100306A (en) Four-dimensional cone-beam CT imaging method and device for pancreatic region
Du et al. X-ray image super-resolution reconstruction based on a multiple distillation feedback network
CN115471512A (en) Medical image segmentation method based on self-supervision contrast learning
CN115018780A (en) Thyroid nodule segmentation method fusing global reasoning and MLP framework
CN114820636A (en) Three-dimensional medical image segmentation model and training method and application thereof
CN114913164A (en) Two-stage weak supervision new crown lesion segmentation method based on super pixels
CN110706209B (en) Method for positioning tumor in brain magnetic resonance image of grid network
CN114638745B (en) Medical image intelligent conversion method based on multi-borrowing information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination