CN114926470A - System and method for segmenting impacted tooth medical image based on deep learning

System and method for segmenting impacted tooth medical image based on deep learning

Info

Publication number
CN114926470A
CN114926470A
Authority
CN
China
Prior art keywords
sub
volume data
segmentation
learning
module
Prior art date
Legal status
Pending
Application number
CN202210517932.8A
Other languages
Chinese (zh)
Inventor
Zhang Yiyu
Yang Genke
Chu Jian
Current Assignee
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Ningbo Stomatological Hospital Group Co ltd
Original Assignee
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority to CN202210517932.8A
Publication of CN114926470A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10072 - Tomographic images
    • G06T 2207/10081 - Computed x-ray tomography [CT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20092 - Interactive image processing based on input by user
    • G06T 2207/20104 - Interactive definition of region of interest [ROI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30004 - Biomedical image processing
    • G06T 2207/30036 - Dental; Teeth

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based impacted tooth medical image segmentation system, relating to the technical field of medical image-aided diagnosis; the system comprises a pre-training module, an encoder module and a decoder module. The invention also discloses a method for segmenting the impacted tooth medical image based on deep learning, comprising the following steps: S100, pre-training; S200, encoding; S300, decoding. The invention improves the accuracy of the segmentation algorithm and meets the precision requirements of medical image segmentation.

Description

System and method for segmenting impacted tooth medical image based on deep learning
Technical Field
The invention relates to the technical field of medical image auxiliary diagnosis, in particular to a system and a method for segmenting an impacted tooth medical image based on deep learning.
Background
Teeth that, because of improper positioning, cannot erupt into the normal occlusal position in the jaw are called impacted teeth. The most commonly impacted tooth is the mandibular third molar, followed by the maxillary third molar and the maxillary canine. The gum around the crown of an impacted tooth is often inflamed and painful; in severe cases the cheek swells, mouth opening becomes difficult, and there may even be systemic fever and enlarged submandibular lymph nodes. Impacted teeth often cause caries, loosening, and alveolar bone resorption in adjacent teeth. Because of their malposition, some impacted teeth are blocked by adjacent teeth or even completely embedded in bone tissue. Extraction of an impacted tooth is therefore more difficult than that of other teeth: gum covering the tooth must be incised, embedding bone must be removed, and a crown blocked by an adjacent tooth must be sectioned and extracted in pieces. Consequently the operation takes longer and postoperative complications are more frequent, including bleeding, root fracture, injury to adjacent teeth, postoperative lower-lip numbness, and dry socket. If the surgeon operates improperly during extraction, the inferior alveolar nerve canal may be damaged, causing lower-lip numbness, which is a serious medical accident. Therefore, before the operation, the CBCT image of the impacted tooth must be segmented to evaluate the anatomical positional relationship between the impacted third molar, the inferior alveolar nerve canal, and the surrounding alveolar bone.
Automatic analysis of medical images has long been a research hotspot in computer vision, with wide application value in surgical planning, pathological analysis, disease diagnosis, and related directions. Medical image segmentation aims to make anatomical or pathological structural changes in an image clearer, and often plays a key role in computer-aided diagnosis and intelligent medicine. It is one part of automatic medical image analysis and the basis of tasks such as lesion detection and identification. Quantitative tissue analysis, surgical planning, lesion localization, and lesion-size measurement by doctors all depend on precise segmentation of medical images.
Compared with traditional machine learning methods, deep learning methods have excellent feature-expression capability: they eliminate the drawbacks of hand-designed features, learn data features automatically, and save a great deal of manual effort while improving accuracy, thereby promoting the development of medical image segmentation. Medical image segmentation based on deep learning is the trend; putting this technology into use can bring great economic benefit to society and meet people's ever-growing health needs.
At present, most medical image segmentation models are extensions of natural image segmentation techniques; the mainstream network frameworks include the Convolutional Neural Network (CNN), the Fully Convolutional Network (FCN), and the U-shaped convolutional neural network (U-Net). In particular, U-Net, a deep convolutional neural network based on a U-shaped architecture with skip connections, is widely applied to various medical imaging tasks.
Although CNN-based methods perform excellently in medical image segmentation, the locality of the convolution operation prevents them from learning global, long-range semantic interactions well, so they still cannot fully meet the strict accuracy requirements of medical applications. Recently, Transformer-based approaches have become very popular: they replace the convolution operator with self-attention modules, composing an entire encoder-decoder structure able to encode long-range dependencies. Transformer-based methods have achieved great success in the field of medical image segmentation.
Although Transformer-based methods can improve the accuracy of medical image segmentation, they suffer from high computational cost and a large memory footprint. Current studies suggest that combining the Transformer with the CNN may lead to better results.
In the patent "Medical image segmentation method" (application number CN201811405685.2), an original medical image is input into a preprocessing network to obtain a corresponding medical image feature map; the feature map is input into a region-extraction network to obtain all foreground-object feature maps; and a CNN classifies, detects, and segments the foreground-object feature maps to obtain the final segmentation result. That patent uses instance segmentation to accurately segment each region of interest of the medical image, solves the problem of similar objects adhering to each other in existing segmentation methods, and is simple and efficient. However, it uses only a CNN-based method and cannot fully meet the strict accuracy requirements of medical applications.
Accordingly, those skilled in the art seek to develop a deep learning-based impacted tooth medical image segmentation system and method.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to meet the requirement of medical image segmentation precision and improve the accuracy of the segmentation algorithm.
The inventor's analysis is that the prior art segments medical images using only CNN-based methods and cannot meet the strict precision requirements of medical applications. Therefore, the inventor combines a Convolutional Neural Network (CNN) and a deep self-attention network (Transformer) to segment Cone Beam Computed Tomography (CBCT) images of impacted teeth, and uses Cross-Shaped Window Self-Attention (CSWin Transformer) to improve segmentation precision while reducing computational cost. In addition, the encoder is pre-trained, which further improves segmentation precision.
In one embodiment of the present invention, a system for segmenting an impacted tooth medical image based on deep learning is provided, which includes:
the pre-training module, which is used for inputting an impacted-tooth CBCT image, performing self-supervised representation learning, encoding region-of-interest (ROI)-aware information, and formulating it with a multi-objective loss function to obtain sub-volume data;
the encoder module, which is used for encoding the sub-volume data, performing representation learning, and extracting the sub-volume data features;
and the decoder module, which is used for decoding the sub-volume data features, performing feature up-sampling through residual blocks, calculating the segmentation probability, and outputting the segmentation result.
The pre-training module is connected with the encoder module, and the encoder module is connected with the decoder module; the impacted-tooth CBCT image undergoes self-supervised representation learning in the pre-training module, the encoder module extracts the sub-volume data features, and the decoder module decodes the extracted sub-volume data features and calculates the segmentation result.
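As a non-limiting structural sketch of how the three modules fit together, the following PyTorch-style fragment wires a four-stage encoder to a decoder through U-shaped skip connections; the class and parameter names are illustrative assumptions, not the exact implementation.

    # Structural sketch only: `encoder_stages` stands in for the four serially
    # connected CSWin Transformer encoders and `decoder` for the CNN decoder
    # described above; both are assumed interfaces.
    import torch.nn as nn

    class ImpactedToothSegNet(nn.Module):
        def __init__(self, encoder_stages, decoder):
            super().__init__()
            self.encoder_stages = nn.ModuleList(encoder_stages)  # four stages
            self.decoder = decoder  # CNN decoder with residual up-sampling blocks

        def forward(self, subvolume):
            x, skips = subvolume, []
            for stage in self.encoder_stages:
                x = stage(x)     # each stage halves the spatial resolution
                skips.append(x)  # every stage also feeds the decoder directly
            # The decoder fuses the four multi-resolution feature maps and
            # outputs voxel-wise segmentation probabilities.
            return self.decoder(skips)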
Optionally, in the deep learning-based impacted tooth medical image segmentation system described above, the pre-training module comprises a mask volume restoration sub-module, a three-dimensional rotation prediction sub-module, and a contrast learning sub-module connected in series in that order.
Optionally, in the deep learning-based impacted tooth medical image segmentation system according to any of the above embodiments, the encoder module includes four CSWin Transformer encoders, and the four CSWin Transformer encoders are connected in series.
Further, in the deep learning-based impacted tooth medical image segmentation system of the above embodiment, four CSWin Transformer encoders in the encoder module are respectively connected to the decoder module.
Further, in the deep learning-based impacted tooth medical image segmentation system of the above embodiment, the resolutions of the four CSWin Transformer encoders are reduced by 1/2 in sequence to extract feature representations at four different resolutions.
Optionally, in the deep learning-based impacted tooth medical image segmentation system according to any of the above embodiments, the decoder module includes a segmentation head that uses the up-sampled sub-volume data features to determine whether a region belongs to the impacted tooth or to the neural canal region to be segmented, and calculates the final output.
Optionally, in the deep learning-based impacted tooth medical image segmentation system according to any of the above embodiments, the residual block in the decoder module consists of two post-normalized 3 × 3 × 3 convolutional layers with instance normalization, creating a U-shaped convolutional neural network based on the CNN.
Based on any one of the embodiments, another embodiment of the present invention provides a method for segmenting an impacted tooth medical image based on deep learning, which includes the following steps:
s100, pre-training, wherein the CBCT image of the impacted teeth is subjected to self-supervision representing learning in a pre-training module to obtain pre-trained sub-volume data;
s200, coding, namely coding the pre-trained sub-volume data in a coder module, performing representation learning, and extracting the characteristics of the sub-volume data;
and S300, decoding, namely decoding the extracted sub-volume data characteristics in a decoder module, calculating a segmentation result and outputting the segmentation result.
Optionally, in the method for segmenting the impacted tooth medical image based on deep learning of the above embodiment, the pre-training module comprises a mask volume restoration sub-module, a three-dimensional rotation prediction sub-module, and a contrast learning sub-module.
Further, in the method for segmenting the impacted tooth medical image based on deep learning of the above embodiment, the step S100 includes:
s110, inputting a CBCT image of the impacted teeth, inputting the CBCT image of the impacted teeth into a pre-training module, and changing the CBCT image of the impacted teeth into sub-volume data through pre-training;
s120, mask volume restoration, wherein the mask volume restoration submodule learns the texture and structure of a mask area and the corresponding relation of the mask area and the surrounding environment, and a mask volume restoration target loss function formula is defined as follows:
Figure BDA0003642075400000041
wherein x is the sub-volume data before the mask volume is restored,
Figure BDA0003642075400000042
the volume data after restoration;
s130, three-dimensional rotation prediction, wherein the three-dimensional rotation prediction submodule learns the structure content of the CBCT image of the occlusion teeth and generates sub-volume data of different rotation angles for comparison learning, and a rotation angle loss function formula is as follows:
Figure BDA0003642075400000043
wherein, y r In the true value, the value of,
Figure BDA0003642075400000044
for the activation function probability, R is the number of different subvolume data, R is the total number of subvolume data;
s140, contrast learning, namely distinguishing various interested regions of different sub-volume data in a contrast learning submodule, wherein a contrast learning loss function formula is as follows:
Figure BDA0003642075400000051
wherein v is i ,v j Is enhancement data from the same subvolume, t is a measure of normalized temperature scale, l is an indicator function, sim represents the dot product between normalized embeddings, k is the number of subvolume enhancement data, N is the total number of subvolume enhancement data;
s150, calculating a minimized total loss function, wherein the pre-training module needs to minimize the total loss function to achieve the best effect, and the formula is as follows:
l tot =λ 1 L inpaint2 L contrast3 L rot (4)
wherein λ is 1 ,λ 2 ,λ 3 Is an adjustable parameter.
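The three objectives and their weighted sum, equations (1) to (4), can be sketched as follows, assuming PyTorch; the L1 form of the inpainting loss, the NT-Xent form of the contrastive loss, and all shapes and λ values are illustrative assumptions consistent with the definitions above, not values taken from the patent.

    import torch
    import torch.nn.functional as F

    def inpaint_loss(x_hat, x):
        # Eq. (1): reconstruction error between restored and original sub-volume.
        return (x_hat - x).abs().mean()

    def rotation_loss(logits, target):
        # Eq. (2): cross-entropy over the R rotation-angle classes.
        return F.cross_entropy(logits, target)

    def contrastive_loss(v_i, v_j, t=0.5):
        # Eq. (3): NT-Xent loss between two augmentations of the same sub-volume.
        z = F.normalize(torch.cat([v_i, v_j], dim=0), dim=1)   # 2N embeddings
        sim = z @ z.t() / t                                    # pairwise similarities
        n = v_i.shape[0]
        mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
        sim.masked_fill_(mask, float('-inf'))                  # drop the k == i terms
        # Positive pair of sample i is its augmentation at index i +/- n.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
        return F.cross_entropy(sim, targets)

    def total_loss(x_hat, x, rot_logits, rot_target, v_i, v_j,
                   lambdas=(1.0, 1.0, 1.0)):
        # Eq. (4): weighted sum of the three self-supervised objectives.
        l1, l2, l3 = lambdas
        return (l1 * inpaint_loss(x_hat, x)
                + l2 * contrastive_loss(v_i, v_j)
                + l3 * rotation_loss(rot_logits, rot_target))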
Optionally, in the method for segmenting the impacted tooth medical image based on deep learning according to any of the above embodiments, the encoder module comprises four CSWin Transformer encoders, connected in series and each connected to the decoder module; their resolutions are reduced by 1/2 in sequence, so as to extract feature representations at four different resolutions.
Further, in the method for segmenting the impacted tooth medical image based on deep learning of the above embodiment, the step S200 includes:
S210, sub-volume data input, wherein the pre-trained sub-volume data are input into the encoder module and processed by the four CSWin Transformer encoders;
s220, Cross-Shaped Window Self-Attention (CSWin) calculation, Self-Attention in the horizontal and vertical directions, and output connection, wherein the formula is as follows:
CSWin-Attention(X)=Concat(head 1 ,...,head k )W O (5)
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003642075400000052
wherein, W O ∈R C×C Is a commonly used projection matrix, k is the linear projection value of the subvolume data, H-Attention k (X) is the horizontal self-Attention, V-Attention, of the kth head k (X) is the vertical self-attention of the kth head, K is the total number of linear projection values of the subvolume data;
s230, adding important position information of the sub-volume data by using local-enhanced Positional Encoding (LePE), and using local enhanced Positional Encoding;
and S240, resolution reduction, wherein the sub-volume resolution is halved in each successive CSWin Transformer encoder, yielding four different resolutions and extracting the underlying features.
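Equations (5) and (6) can be made concrete with a simplified two-dimensional sketch (the patent operates on 3D sub-volumes; a 2D version is shown to keep the code short). Half of the heads attend within horizontal stripes and half within vertical stripes, and LePE injects position as a depthwise convolution on V, as in the CSWin Transformer paper; the stripe width, head count, and all shapes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CSWinAttention(nn.Module):
        """Simplified 2D cross-shaped window self-attention (eqs. 5-6)."""
        def __init__(self, dim, heads=4, stripe=2):
            super().__init__()
            assert dim % heads == 0 and heads % 2 == 0
            self.heads, self.stripe = heads, stripe
            self.qkv = nn.Linear(dim, 3 * dim)
            self.proj = nn.Linear(dim, dim)                 # W^O in eq. (5)
            half = dim // 2
            # LePE: positional information added as a depthwise conv over V.
            self.lepe = nn.ModuleList(
                [nn.Conv2d(half, half, 3, padding=1, groups=half) for _ in range(2)]
            )

        def _stripe_attn(self, q, k, v, lepe, H, W):
            # q, k, v: (B, C/2, H, W); attend within horizontal stripes of
            # `stripe` rows (H and W are assumed divisible by the stripe width).
            B, C, _, _ = q.shape
            h, s = self.heads // 2, self.stripe
            dh = C // h
            pos = lepe(v)                                   # LePE applied to V
            def windows(t):                                 # -> (B*H/s, h, s*W, dh)
                t = t.reshape(B, h, dh, H // s, s, W)
                return t.permute(0, 3, 1, 4, 5, 2).reshape(-1, h, s * W, dh)
            q, k, v = windows(q), windows(k), windows(v)
            attn = F.softmax(q @ k.transpose(-2, -1) / dh ** 0.5, dim=-1)
            out = attn @ v                                  # per-stripe attention
            out = out.reshape(B, H // s, h, s, W, dh).permute(0, 2, 5, 1, 3, 4)
            return out.reshape(B, C, H, W) + pos

        def forward(self, x, H, W):
            # x: (B, H*W, C) token sequence over an H x W feature map.
            B, N, C = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            def grid(t):                                    # two (B, C/2, H, W) halves
                t = t.transpose(1, 2).reshape(B, C, H, W)
                return t[:, : C // 2], t[:, C // 2:]
            (qh, qv), (kh, kv), (vh, vv) = grid(q), grid(k), grid(v)
            # First half of the heads: horizontal stripes (H-Attention).
            out_h = self._stripe_attn(qh, kh, vh, self.lepe[0], H, W)
            # Second half: vertical stripes, obtained by transposing H and W
            # so the same stripe routine computes V-Attention.
            out_v = self._stripe_attn(
                qv.transpose(-2, -1).contiguous(), kv.transpose(-2, -1).contiguous(),
                vv.transpose(-2, -1).contiguous(), self.lepe[1], W, H
            ).transpose(-2, -1)
            out = torch.cat([out_h, out_v], dim=1)          # Concat in eq. (5)
            return self.proj(out.reshape(B, C, N).transpose(1, 2))

    # Example: CSWinAttention(64, heads=4, stripe=2)(torch.randn(2, 64, 64), 8, 8)

Computing the two stripe orientations on disjoint halves of the heads is what keeps the cost linear in the stripe width while still enlarging the attended region in both axes within one block.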
Optionally, in the method for segmenting the impacted tooth medical image based on deep learning according to any of the above embodiments, the step S300 includes:
S310, feature up-sampling, wherein the extracted sub-volume data features are up-sampled using residual blocks;
and S320, image segmentation, wherein the up-sampled sub-volume data features are input into a final convolutional layer with a suitable activation function, the segmentation probability is calculated, and the segmentation result is output.
Further, in the method for segmenting the impacted tooth medical image based on deep learning of the above embodiment, the residual block consists of two post-normalized 3 × 3 × 3 convolutional layers with instance normalization, creating a U-shaped convolutional neural network based on the CNN.
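For concreteness, a minimal volumetric sketch of such a residual up-sampling block and the final segmentation head follows; the two post-normalized 3 × 3 × 3 convolutions with instance normalization come from the text above, while the channel counts, the transposed-convolution up-sampling, and the LeakyReLU/sigmoid activations are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class ResidualBlock3D(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # Two 3x3x3 convolutions, each followed by instance normalization
            # (post-normalization); the nonlinearity is an assumed choice.
            self.body = nn.Sequential(
                nn.Conv3d(channels, channels, 3, padding=1),
                nn.InstanceNorm3d(channels),
                nn.LeakyReLU(inplace=True),
                nn.Conv3d(channels, channels, 3, padding=1),
                nn.InstanceNorm3d(channels),
            )
            self.act = nn.LeakyReLU(inplace=True)

        def forward(self, x):
            return self.act(x + self.body(x))       # residual connection

    class DecoderStage(nn.Module):
        """Up-sample 2x, fuse the matching encoder feature, refine residually."""
        def __init__(self, in_ch, skip_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
            self.fuse = nn.Conv3d(out_ch + skip_ch, out_ch, 1)
            self.res = ResidualBlock3D(out_ch)

        def forward(self, x, skip):
            x = self.up(x)
            x = self.fuse(torch.cat([x, skip], dim=1))
            return self.res(x)

    # Segmentation head: a final 1x1x1 convolution with a suitable activation
    # (here assumed sigmoid over two channels: impacted tooth, neural canal).
    head = nn.Sequential(nn.Conv3d(32, 2, kernel_size=1), nn.Sigmoid())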
The invention combines the CNN and the Transformer, using a CSWin Transformer as the encoder of the segmentation network and connecting it to a CNN-based decoder module. The CSWin Transformer encoder uses the cross-shaped window mechanism to compute self-attention in the horizontal and vertical directions in parallel, reducing the amount of computation while improving precision, and introduces locally-enhanced positional encoding to better handle local positional information. The invention improves the accuracy of the segmentation algorithm and meets the precision requirements of medical image segmentation.
The conception, specific structure and technical effects of the present invention will be further described in conjunction with the accompanying drawings to fully understand the purpose, characteristics and effects of the present invention.
Drawings
FIG. 1 is a block diagram illustrating a deep learning-based impacted tooth medical image segmentation system according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating deep learning-based impacted tooth medical image segmentation according to an exemplary embodiment.
Detailed Description
The technical contents of the preferred embodiments of the present invention will be made clear and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
In the drawings, structurally identical elements are represented by like reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrarily illustrated, and the present invention is not limited to the size and thickness of each component. The thickness of the components is exaggerated somewhat schematically and appropriately in order to make the illustration clearer.
The inventor designs a deep learning-based impacted tooth medical image segmentation system, as shown in fig. 1, which comprises:
the pre-training module, which is used for inputting an impacted-tooth CBCT image, performing self-supervised representation learning, encoding region-of-interest (ROI)-aware information, and formulating it with a multi-objective loss function to obtain sub-volume data; the pre-training module comprises a mask volume restoration sub-module, a three-dimensional rotation prediction sub-module, and a contrast learning sub-module connected in series in that order; the mask volume restoration sub-module learns the texture and structure of the masked region and its correspondence with the surrounding context; the three-dimensional rotation prediction sub-module learns the structural content of the impacted-tooth CBCT image, generates sub-volume data usable for contrastive learning, and predicts the rotation-angle class of the input sub-volume data; and the contrast learning sub-module distinguishes the regions of interest of different segmented parts;
the encoder module, which is used for encoding the sub-volume data, performing representation learning, and extracting the sub-volume data features; the encoder module comprises four CSWin Transformer encoders, connected in series and each connected to the decoder module, whose resolutions are reduced by 1/2 in sequence so that feature representations are extracted at four different resolutions;
the decoder module, which is used for decoding the sub-volume data features and performing feature up-sampling through residual blocks, each consisting of two post-normalized 3 × 3 × 3 convolutional layers with instance normalization, creating a U-shaped convolutional neural network based on the CNN, and for calculating the segmentation probability to obtain the segmentation result; the decoder module includes a terminal sub-module, namely a segmentation head, for computing the final output;
the pre-training module is connected with the encoder module, the encoder module is connected with the decoder module, the impacted tooth CBCT image performs self-supervision representation learning in the pre-training module, the features are extracted in the encoder module, the decoder module decodes the extracted features, and the segmentation result is calculated.
Based on the above embodiments, the inventor provides a method for segmenting an impacted tooth medical image based on deep learning, as shown in fig. 2, comprising the following steps:
S100, pre-training, wherein the impacted-tooth CBCT image undergoes self-supervised representation learning in the pre-training module to obtain pre-trained sub-volume data; the step specifically comprises:
S110, impacted-tooth CBCT image input, wherein the impacted-tooth CBCT image is input into the pre-training module and converted into sub-volume data through pre-training;
s120, mask volume restoration, wherein the mask volume restoration submodule learns the texture and structure of a mask area and the corresponding relation of the mask area and the surrounding environment, and a mask volume restoration target loss function formula is defined as follows:
Figure BDA0003642075400000071
wherein x is subvolume data before the mask volume is restored,
Figure BDA0003642075400000072
the volume data after restoration;
s130, three-dimensional rotation prediction, wherein the three-dimensional rotation prediction submodule learns the structure content of the CBCT image of the vital teeth and generates sub-volume data of different rotation angles which can be used for comparison learning, the rotation angle category of the input sub-volume data is predicted, and the rotation angle loss function formula is as follows:
Figure BDA0003642075400000081
wherein, y r The value is true and the value is true,
Figure BDA0003642075400000082
for the activation function probability, R is the number of different subvolume data, R is the total number of subvolume data;
s140, contrast learning, wherein various regions of interest of different sub-volume data are distinguished in a contrast learning submodule, and a contrast learning loss function formula is as follows:
Figure BDA0003642075400000083
wherein v is i ,v j Is enhancement data from the same subvolume, t is a measure of normalized temperature scale, l is an indicator function, sim represents the dot product between normalized embeddings, k is the number of subvolume enhancement data, N is the total number of subvolume enhancement data;
s150, calculating a minimized total loss function, wherein the pre-training module needs to minimize the total loss function to achieve the best effect, and the formula is as follows:
l tot =λ 1 L inpaint2 L contrast3 L rot (4)
wherein λ is 1 ,λ 2 ,λ 3 Is an adjustable parameter.
S200, encoding, wherein the pre-trained sub-volume data are encoded in the encoder module for representation learning and the sub-volume data features are extracted; the encoder module comprises four CSWin Transformer encoders, connected in series and each connected to the decoder module, whose resolutions are reduced by 1/2 in sequence so that feature representations are extracted at four different resolutions; the step specifically comprises:
S210, sub-volume data input, wherein the pre-trained sub-volume data are input into the encoder module and processed by the four CSWin Transformer encoders;
s220, Cross-Shaped Window Self-Attention (CSWin) calculation, Self-Attention in the horizontal and vertical directions, and output connection, wherein the formula is as follows:
CSWin-Attention(X)=Concat(head 1 ,...,head k )W O (5)
wherein the content of the first and second substances,
Figure BDA0003642075400000084
wherein, W O ∈R C×C Is a commonly used projection matrix, k is the linear projection value of the subvolume data, H-Attention k (X) is the horizontal self-Attention, V-Attention, of the kth head k (X) is the vertical self-attention of the kth head, K is the total number of linear projection values of the subvolume data;
s230, adding important position information of the sub-volume data by using local-enhanced Positional Encoding (LePE), and using local enhanced Positional Encoding;
and S240, reducing the resolution of the sub-volume data, and sequentially reducing the resolution by 1/2 in each CSWin transform encoder to obtain four different resolutions, and extracting the bottom-layer features.
S300, decoding, wherein the extracted sub-volume data features are decoded in the decoder module, and the segmentation result is calculated and output; the step specifically comprises:
S310, feature up-sampling, wherein the extracted sub-volume data features are up-sampled using residual blocks, each consisting of two post-normalized 3 × 3 × 3 convolutional layers with instance normalization, creating a U-shaped convolutional neural network based on the CNN;
and S320, image segmentation, wherein the up-sampled sub-volume data features are input into a final convolutional layer with a suitable activation function, the segmentation probability is calculated, and the segmentation result is output.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by a person skilled in the art through logical analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims (10)

1. A system for segmentation of an impacted tooth medical image based on deep learning, comprising:
the pre-training module, which is used for inputting an impacted-tooth CBCT image, performing self-supervised representation learning, encoding region-of-interest-aware information, and formulating it with a multi-objective loss function to obtain sub-volume data;
the encoder module encodes the sub-volume data, performs representation learning and extracts the characteristics of the sub-volume data;
and the decoder module decodes the sub-volume data characteristics, performs characteristic up-sampling through the residual block, calculates the segmentation probability and outputs the segmentation result.
The pre-training module is connected with the encoder module, and the encoder module is connected with the decoder module; the impacted-tooth CBCT image undergoes self-supervised representation learning in the pre-training module, the encoder module extracts the sub-volume data features, and the decoder module decodes the sub-volume data features and calculates the segmentation result.
2. The system as claimed in claim 1, wherein the pre-training module comprises a mask volume restoration sub-module, a three-dimensional rotation prediction sub-module, and a contrast learning sub-module connected in series in that order; the mask volume restoration sub-module learns the texture and structure of the masked region and its correspondence with the surrounding context; the three-dimensional rotation prediction sub-module learns the structural content of the impacted-tooth CBCT image, generates sub-volume data usable for contrastive learning, and predicts the rotation-angle class of the input sub-volume data; and the contrast learning sub-module distinguishes the regions of interest of different segmented parts.
3. The deep learning-based impacted tooth medical image segmentation system of claim 1, wherein the encoder module comprises four CSWin Transformer encoders connected in series.
4. The deep learning-based impacted tooth medical image segmentation system of claim 3, wherein the resolutions of the four CSWin Transformer encoders are reduced by 1/2 in sequence to extract feature representations at four different resolutions.
5. The deep learning-based impacted tooth medical image segmentation system of claim 1, wherein the residual block consists of two post-normalized 3 × 3 × 3 convolutional layers with instance normalization, creating a U-shaped convolutional neural network based on the CNN.
6. A method for deep learning-based impacted tooth medical image segmentation, using the deep learning-based impacted tooth medical image segmentation system as claimed in claim 4, comprising the steps of:
s100, pre-training, wherein the CBCT image of the impacted teeth is subjected to self-supervision representing learning in the pre-training module to obtain pre-trained sub-volume data;
s200, coding, wherein the pre-trained sub-volume data are coded in the coder module, and are used for representing learning and extracting the characteristics of the sub-volume data;
and S300, decoding, wherein the sub-volume data characteristics are decoded in the decoder module, the segmentation result is calculated, and the segmentation result is output.
7. The method for deep learning-based impacted tooth medical image segmentation as claimed in claim 6, wherein the pre-training module comprises a mask volume restoration sub-module, a three-dimensional rotation prediction sub-module, and a contrast learning sub-module.
8. The method for deep learning-based impacted tooth medical image segmentation according to claim 7, wherein the step S100 includes:
S110, impacted-tooth Cone Beam Computed Tomography (CBCT) image input, wherein the impacted-tooth CBCT image is input into the pre-training module and converted into sub-volume data through pre-training;
s120, mask volume restoration, wherein the mask volume restoration submodule learns the texture, the structure and the corresponding relation with the surrounding environment of the mask area, and a mask volume restoration target loss function formula is defined as follows:
Figure FDA0003642075390000021
s130, three-dimensional rotation prediction, wherein the three-dimensional rotation prediction submodule learns the structure content of the CBCT image of the vital teeth and generates sub-volume data of different rotation angles which can be used for contrast learning, the rotation angle category of the input sub-volume data is predicted, and the rotation angle loss function formula is as follows:
Figure FDA0003642075390000022
s140, contrast learning, wherein various regions of interest of different sub-volume data are distinguished in the contrast learning submodule, and a contrast learning loss function formula is as follows:
Figure FDA0003642075390000031
s150, calculating a minimized total loss function, wherein the pre-training module needs to minimize the total loss function to achieve the best effect, and the formula is as follows:
l tot =λ 1 L inpaint2 L contrast3 L rot (4)。
9. The method for deep learning-based impacted tooth medical image segmentation according to claim 6, wherein the step S200 comprises:
S210, sub-volume data input, wherein the pre-trained sub-volume data are input into the encoder module and processed by the four CSWin Transformer encoders;
s220, Cross-Shaped Window Self-Attention (CSWin) calculation, Self-Attention in the horizontal and vertical directions, and output connection, wherein the formula is as follows:
CSWin-Attention(X)=Concat(head 1 ,...,head k )W O (5);
s230, adding important position information of the sub-volume data by using local enhanced position coding and using the local enhanced position coding;
and S240, reducing the resolution of the sub-volume data, and sequentially reducing the resolution by 1/2 in each CSWin transform encoder to obtain four different resolutions, and extracting the bottom-layer features.
10. The method for deep learning-based impacted tooth medical image segmentation according to claim 6, wherein the step S300 comprises:
S310, feature up-sampling, wherein the sub-volume data features are up-sampled using residual blocks;
and S320, image segmentation, wherein the up-sampled sub-volume data features are input into a final convolutional layer with a suitable activation function, the segmentation probability is calculated, and the segmentation result is output.
CN202210517932.8A 2022-05-13 2022-05-13 System and method for segmenting impacted tooth medical image based on deep learning Pending CN114926470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210517932.8A CN114926470A (en) 2022-05-13 2022-05-13 System and method for segmenting impacted tooth medical image based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210517932.8A CN114926470A (en) 2022-05-13 2022-05-13 System and method for segmenting impacted tooth medical image based on deep learning

Publications (1)

Publication Number Publication Date
CN114926470A true CN114926470A (en) 2022-08-19

Family

ID=82808266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210517932.8A Pending CN114926470A (en) 2022-05-13 2022-05-13 System and method for segmenting impacted tooth medical image based on deep learning

Country Status (1)

Country Link
CN (1) CN114926470A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116369981A (en) * 2023-03-31 2023-07-04 脉得智能科技(无锡)有限公司 Method, system, equipment and medium for predicting breast-conserving cutting edge state based on ultrasonic radiography
CN116369981B (en) * 2023-03-31 2024-03-22 脉得智能科技(无锡)有限公司 Method, system, equipment and medium for predicting breast-conserving cutting edge state based on ultrasonic radiography
CN117351215A (en) * 2023-12-06 2024-01-05 上海交通大学宁波人工智能研究院 Artificial shoulder joint prosthesis design system and method
CN117351215B (en) * 2023-12-06 2024-02-23 上海交通大学宁波人工智能研究院 Artificial shoulder joint prosthesis design system and method
CN117746045A (en) * 2024-02-08 2024-03-22 江西师范大学 Method and system for segmenting medical image by fusion of transducer and convolution
CN117746045B (en) * 2024-02-08 2024-05-28 江西师范大学 Method and system for segmenting medical image by fusion of transducer and convolution

Similar Documents

Publication Publication Date Title
CN114926470A (en) System and method for segmenting impacted tooth medical image based on deep learning
JP7451406B2 (en) Automatic 3D root shape prediction using deep learning methods
Muresan et al. Teeth detection and dental problem classification in panoramic X-ray images using deep learning and image processing techniques
US20200305808A1 (en) Automated Tooth Localization, Enumeration, and Diagnostic System and Method
CN111047605B (en) Construction method and segmentation method of vertebra CT segmentation network model
CN111815766B (en) Processing method and system for reconstructing three-dimensional model of blood vessel based on 2D-DSA image
CN112785609B (en) CBCT tooth segmentation method based on deep learning
CN113628223A (en) Dental CBCT three-dimensional tooth segmentation method based on deep learning
CN115471470A (en) Esophageal cancer CT image segmentation method
CN113744275A (en) Feature transformation-based three-dimensional CBCT tooth image segmentation method
CN113221945B (en) Dental caries identification method based on oral panoramic film and dual attention module
Yin et al. CoT-UNet++: A medical image segmentation method based on contextual Transformer and dense connection
CN116823729A (en) Alveolar bone absorption judging method based on SegFormer and oral cavity curved surface broken sheet
CN116433654A (en) Improved U-Net network spine integral segmentation method
CN116630621A (en) Image segmentation method integrating multi-scale features
CN114037665A (en) Mandibular neural tube segmentation method, mandibular neural tube segmentation device, electronic apparatus, and storage medium
CN116485809A (en) Tooth example segmentation method and system based on self-attention and receptive field adjustment
CN114581459A (en) Improved 3D U-Net model-based segmentation method for image region of interest of preschool child lung
KR20230030682A (en) Apparatus and Method for Automatically Detecting 3D Cephalometric Landmarks using Dental Computerized Tomography
Dhar et al. Automatic tracing of mandibular canal pathways using deep learning
US20230419495A1 (en) Method of automatic segmentation of maxillofacial bone in ct image using deep learning
CN116883428B (en) Mandible spiral CT image partition segmentation method
CN116721309A (en) Oral cavity semantic model training method, oral cavity cone beam CT image optimization method and device
US20240169532A1 (en) Misalignment Classification, Detection of Teeth Occlusions and Gaps, and Generating Final State Teeth Aligner Structure File
CN117197448A (en) Automatic two-dimensional image dental plaque segmentation method and system

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB03: Change of inventor or designer information
  Inventors after: Zhang Yiyu, Yang Genke, Sun Jianfeng, Guo Jing, Chu Jian
  Inventors before: Zhang Yiyu, Yang Genke, Chu Jian
TA01: Transfer of patent application right
  Effective date of registration: 20231215
  Address after: 315012 No. 29, Nanzhan West Road, Nanmen Street, Haishu District, Ningbo City, Zhejiang Province
  Applicants after: Ningbo Institute of Artificial Intelligence, Shanghai Jiaotong University; Ningbo Stomatological Hospital Group Co., Ltd.
  Address before: 315012 No. 29, Nanzhan West Road, Nanmen Street, Haishu District, Ningbo City, Zhejiang Province
  Applicant before: Ningbo Institute of Artificial Intelligence, Shanghai Jiaotong University