CN112085028B - Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision - Google Patents


Info

Publication number
CN112085028B
CN112085028B (application CN202010894993.7A)
Authority
CN
China
Prior art keywords
feature map
tooth
feature
disturbance
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010894993.7A
Other languages
Chinese (zh)
Other versions
CN112085028A (en)
Inventor
吴福理
张凡
郝鹏翼
陈大千
郑宇祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202010894993.7A priority Critical patent/CN112085028B/en
Publication of CN112085028A publication Critical patent/CN112085028A/en
Application granted granted Critical
Publication of CN112085028B publication Critical patent/CN112085028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30036Dental; Teeth

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision. After a tooth full-view film is obtained, it is sharpened to produce a full-view film with clearer tooth boundaries; the sharpened film is then fed into a trained disturbance feature map extraction network for feature extraction, yielding a deep disturbance feature map. Finally, the deep disturbance feature map is input separately into a trained mask network and a trained boundary network to obtain the tooth region segmentation result and the tooth contour segmentation result. The invention greatly strengthens the generalization ability of the network, so that when the trained model encounters special cases it can still exploit the common features that remain and produce a reasonable segmentation result.

Description

Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision
Technical Field
The invention relates to the field of medical image processing, and in particular to a tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision.
Background
The shortage of oral medical resources in China is mainly reflected in a serious shortage of dentists, unbalanced regional development, and insufficient momentum in the development of domestic oral medical instruments and equipment. According to the 2019 national oral industry trend report, the World Health Organization (WHO) recommends a dentist-to-population ratio of 1:5000, rising to 1:2000 for developed countries. The dentist-to-population ratio in China is below 1:8000, far lower than the level of other countries and the WHO recommendation. With the rapid development of oral medicine in the developed eastern regions, the number of practitioners there has grown markedly; the ratio of dentists to residents in urban Beijing, for example, is about 1:2000, close to that of developed countries, but it falls to roughly 1:8000 in suburban areas and to 1:20000 or even 1:30000 in the west. This imbalance in regional development is a prominent problem facing the country. Although the number of oral practitioners per ten thousand people has risen slightly in recent years, it still falls far short of the population's demand for oral medical services. Besides the shortfall in numbers, the overall qualifications of dentists are also low: statistics indicate that in 2015 only about 45% of China's dentists held a bachelor's degree or above, most of them concentrated in public or high-end medical institutions in large and medium-sized cities, while a considerable proportion of those working in oral and related medical fields have received no formal oral education or only elementary oral medical training.
Moreover, even in public oral medical institutions that enjoy high public trust, the patient volume far exceeds the normal load, so doctors often only have time to diagnose and treat the tooth the patient complains about, while other oral problems are ignored and their treatment delayed or missed. Differences in physician skill also make some oral diseases prone to missed diagnosis or misdiagnosis. If the panoramic film could be pre-read with artificial intelligence (AI) technology and a preliminary diagnosis report generated automatically, the efficiency and accuracy of oral disease diagnosis could be improved and misdiagnosis caused by missed findings could be reduced. Segmenting the teeth in the full-view film is the foundation of all dental disease detection.
Patent title: Panoramic film permanent tooth recognition method and device based on deep learning; application number CN109949319A; filing date 2019-03-12. The patent describes a deep-learning-based method and device for recognizing permanent teeth in panoramic films: an alveolar bone line segmentation model produces an alveolar bone line segmentation result, periodontal region image patches are cropped from the original full-view film according to that result, and the patches are then fed into a deep-learning permanent tooth segmentation model to obtain the permanent tooth segmentation result and label the tooth position numbers.
Patent title: Tooth segmentation method, device and computer equipment based on depth contour perception; application number CN110473243A; filing date 2019-08-09. The patent describes a tooth segmentation method, device and computer equipment based on depth contour perception: a contour mask is extracted from the original mask by morphological processing and thickened, the thickened contour mask is used as supervision information, the preprocessed original tooth image is passed through a fully convolutional network, and the network is trained by minimizing a loss function to obtain a contour prediction probability map. The preprocessed tooth image and the contour prediction probability map are then fused, and a U-shaped depth contour perception network supervised by the original mask produces the tooth segmentation result map.
In terms of strengthening model generalization and exploiting boundary information, the prior art applies no targeted strategy to improve generalization and pays insufficient attention to boundary information: boundary information is not treated on an equal footing with mask information, so the extracted features are not general enough and the segmentation results are poor.
Disclosure of Invention
The purpose of the present application is to provide a tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision, which solves the problem that the prior art cannot exploit the common features that remain in special cases and therefore segments poorly when such cases are encountered.
In order to achieve the above purpose, the technical solution of the application is as follows:
a tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision comprises the following steps:
obtaining a tooth full view picture, and sharpening the tooth full view picture to obtain a tooth full view picture I with clearer tooth boundaries;
inputting the tooth full-scene picture I into a disturbance feature map extraction network after training to obtain a deep disturbance feature map F deep
Map F of deep disturbance deep And respectively inputting the tooth region segmentation result and the tooth contour segmentation result into a mask network and a boundary network after training is completed.
Further, obtaining the tooth full-view film and sharpening it to obtain the tooth full-view film I with clearer tooth boundaries comprises:
inputting the original tooth full-view film I_original and applying a sharpening filtering operation to the whole film with a filter whose kernel size is 3×3, to obtain the sharpened tooth full-view film I.
Further, inputting the tooth full-view film I into the trained disturbance feature map extraction network for feature extraction to obtain the deep disturbance feature map F_deep comprises:
Step 2.1, inputting the tooth full-view film I into a simple feature extraction module with a convolution kernel size of 3×3 to obtain an output feature map F1 of dimension C1×H1×W1;
Step 2.2, pooling the feature map F1 and inputting it into a disturbance feature extraction module with a convolution kernel size of 3×3 to obtain a feature map F2 of dimension C2×H2×W2;
Step 2.3, pooling the feature map F2 and inputting it into a disturbance feature extraction module with a convolution kernel size of 3×3 to obtain a feature map F3 of dimension C3×H3×W3;
Step 2.4, pooling the feature map F3 and inputting it into a disturbance feature extraction module with a convolution kernel size of 3×3 to obtain a feature map F4 of dimension C4×H4×W4;
Step 2.5, pooling the feature map F4 and inputting it into a disturbance feature extraction module with a convolution kernel size of 3×3 to obtain the deep disturbance feature map F_deep of dimension C5×H5×W5.
Further, inputting the deep disturbance feature map F_deep separately into the trained mask network and boundary network to obtain the tooth region segmentation result and the tooth contour segmentation result comprises:
Step 3.1, in the mask network, upsampling the deep disturbance feature map F_deep and inputting it together with F4 into a channel fusion module to obtain a channel-concatenated feature map with C5+C4 channels, then inputting it into a simple feature extraction module to obtain a feature map UP4 of dimension C4×H4×W4;
Step 3.2, upsampling the feature map UP4 and inputting it together with F3 into a channel fusion module to obtain a channel-concatenated feature map with C4+C3 channels, then inputting it into a simple feature extraction module to obtain a feature map UP3 of dimension C3×H3×W3;
Step 3.3, upsampling the feature map UP3 and inputting it together with F2 into a channel fusion module to obtain a channel-concatenated feature map with C3+C2 channels, then inputting it into a simple feature extraction module to obtain a feature map UP2 of dimension C2×H2×W2;
Step 3.4, upsampling the feature map UP2 and inputting it together with F1 into a channel fusion module to obtain a channel-concatenated feature map with C2+C1 channels, then inputting it into a simple feature extraction module to obtain a feature map UP1 of dimension C1×H1×W1;
Step 3.5, inputting the feature map UP1 into a 1×1 convolution block to obtain a feature map UP0 of dimension 32×H1×W1, where 32 corresponds to the 32 different teeth; each channel of UP0 is activated by a sigmoid function to obtain the probability that each pixel belongs to the corresponding tooth region, and multiplying by 255 gives the final segmentation result for the 32 teeth;
Step 3.6, the boundary network applies the same operations as steps 3.1 to 3.5 and finally outputs the tooth contour segmentation result.
Further, the simple feature extraction module comprises two serially connected groups, each consisting of a convolution layer with a kernel size of 3×3, a batch normalization layer and an activation layer.
Further, the disturbance feature extraction module comprises two serially connected groups, each consisting of a convolution layer with a kernel size of 3×3, a feature disturbance operation layer, a batch normalization layer and an activation layer.
Further, the channel fusion module is configured to concatenate the upsampled lower-level feature map and the current-level feature map along the channel dimension; the output feature map keeps the same spatial size, and its channel number is the sum of the channel numbers of the lower-level and current-level feature maps.
Further, the feature disturbance operation perturbs the feature map according to a formula in which x_i is the feature map input to the i-th layer of the network, f(x_i) and its disturbed counterpart denote the feature map before and after disturbance respectively, m_i is a mask consisting of 0s and 1s that follows a Bernoulli distribution, ε_i controls the disturbance amplitude and its value is optimized automatically during training, and the multiplication between matrices is element-wise (point by point).
In the tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision of the present application, on the one hand, the disturbance feature extraction module perturbs the feature maps during feature extraction, so the disturbed feature maps lose part of their feature information; by learning how to obtain segmentation results from feature maps with partially missing features, the neural network greatly strengthens its generalization ability and can still obtain reasonable segmentation results from the common features that remain when it encounters special cases. On the other hand, the boundary network allows the characteristics of segmentation region boundaries to be learned directly, so the boundaries of segmented regions are found more easily, and the segmentation effect is improved in situations with intra-class difference and inter-class similarity.
Drawings
FIG. 1 is a flowchart of the tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision;
FIG. 2 is a schematic diagram of a network structure according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a simple feature extraction module structure of the present application;
FIG. 4 is a schematic diagram of a disturbance feature extraction module structure of the present application;
fig. 5 is a schematic structural diagram of a channel fusion module of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a method for semantic segmentation of a dental full-view film based on feature map perturbation and boundary supervision is provided, comprising:
step S1, obtaining a tooth full-view film, and sharpening the tooth full-view film to obtain a tooth full-view film I with clearer tooth boundaries.
The application performs the necessary preprocessing on the acquired tooth full-view film. Obtaining the tooth full-view film and sharpening it to obtain a tooth full-view film I with clearer tooth boundaries comprises:
inputting the original tooth full-view film I_original and applying a sharpening filtering operation to the whole film with a filter whose kernel size is 3×3, which yields the sharpened tooth full-view film I.
It should be noted that the present application may also use the original tooth full-view film directly for subsequent processing without sharpening. The filter kernel may also be set to 5×5 or 7×7 as required.
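For illustration, a minimal sketch of this sharpening step is given below. The patent fixes only the 3×3 kernel size, not its coefficients, so the Laplacian-style kernel, the function name and the use of OpenCV are assumptions.

```python
import cv2
import numpy as np

# Assumed 3x3 sharpening kernel; the patent specifies only the kernel size,
# so any standard sharpening kernel could be substituted here.
SHARPEN_KERNEL = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)

def sharpen_full_view_film(i_original: np.ndarray) -> np.ndarray:
    """Sharpen the original tooth full-view film I_original to obtain I."""
    # ddepth = -1 keeps the output depth equal to the input depth.
    return cv2.filter2D(i_original, -1, SHARPEN_KERNEL)
```

A film loaded as a grayscale array, for example with cv2.imread(path, cv2.IMREAD_GRAYSCALE), could be passed directly to this function.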
Step S2, inputting the tooth full-view film I into the trained disturbance feature map extraction network for feature extraction to obtain a deep disturbance feature map F_deep.
Performing feature extraction on the tooth full-view film I to obtain the deep disturbance feature map F_deep comprises:
step 2.1, inputting the tooth full-scene film I into a simple feature extraction module with the convolution kernel size of 3*3 to obtain an output feature map F 1 Its dimension is C 1 ×H 1 ×W 1
Step 2.2, feature map F 1 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 2 Its dimension is C 2 ×H 2 ×W 2
Step 2.3, feature map F 2 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 3 Its dimension is C 3 ×H 3 ×W 3
Step 2.4, feature map F 3 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 4 Its dimension is C 4 ×H 4 ×W 4
Step 2.5, feature map F 4 The pooled disturbance characteristic is input into a disturbance characteristic extraction module with the convolution kernel size of 3*3 to obtain a deep disturbance characteristic diagram F deep Its dimension is C 5 ×H 5 ×W 5
As shown in FIG. 2, the disturbance feature map extraction network of the present application consists of one simple feature extraction module (CBR) followed by four disturbance feature extraction modules (CDBR). In other embodiments, the disturbance feature map extraction network may also use other arrangements, for example three simple feature extraction modules followed by two disturbance feature extraction modules.
Compared with an undisturbed feature map, the disturbed deep feature map retains more common features, which helps to improve the generalization ability of the network.
The simple feature extraction module, as shown in FIG. 3, consists of two serially connected groups, each containing a convolution layer with kernel size 3×3 (conv 3×3), a batch normalization layer (BN) and an activation layer (ReLU).
The input feature map is first passed through a convolution layer, then processed by batch normalization and a ReLU activation layer; it then passes through the second group of convolution, batch normalization and ReLU activation layers, and the processed feature map is output.
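A minimal PyTorch sketch of such a module might look as follows; the class name, channel arguments and padding choice are illustrative assumptions, and only the twice-repeated conv, BN, ReLU structure comes from the description above.

```python
import torch.nn as nn

class CBR(nn.Module):
    """Simple feature extraction module: two serial groups of
    3x3 convolution -> batch normalization -> ReLU (a sketch)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # keep the spatial size unchanged
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, padding=pad),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size, padding=pad),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```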
The disturbance feature extraction module, as shown in FIG. 4, consists of two serially connected groups, each containing a convolution layer with kernel size 3×3 (conv 3×3), a feature disturbance operation, a batch normalization layer (BN) and an activation layer (ReLU).
The input feature map is first passed through a convolution layer, then perturbed by the feature disturbance operation, then processed by batch normalization and a ReLU activation layer; it then passes through the second group of convolution, feature disturbance, batch normalization and ReLU activation layers, and the processed feature map is output.
The feature disturbance operation perturbs the feature map according to a formula in which x_i is the feature map input to the i-th layer of the network, f(x_i) and its disturbed counterpart denote the feature map before and after disturbance respectively, m_i is a mask consisting of 0s and 1s that follows a Bernoulli distribution, ε_i controls the disturbance amplitude and its value is optimized automatically during training, and the multiplication between matrices is element-wise (point by point).
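The exact disturbance formula is not reproduced in this text, so the sketch below assumes a simple multiplicative form, disturbed = f(x_i) * (1 + ε_i * m_i), with m_i a Bernoulli 0/1 mask, ε_i a learnable amplitude and * element-wise multiplication; the Bernoulli probability, the initial amplitude and the training-only behaviour are likewise assumptions.

```python
import torch
import torch.nn as nn

class FeatureDisturbance(nn.Module):
    """Feature disturbance operation (a sketch under an assumed formula).

    Assumed form: out = f(x) * (1 + eps * m), where m is a 0/1 mask drawn
    from a Bernoulli distribution and eps is a learnable amplitude that is
    optimized during training; multiplication is element-wise.
    """

    def __init__(self, p: float = 0.1, eps_init: float = 0.1):
        super().__init__()
        self.p = p                                       # Bernoulli probability (assumed)
        self.eps = nn.Parameter(torch.tensor(eps_init))  # disturbance amplitude, trained

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Whether the disturbance is also applied at inference time is not
        # specified; here it is applied only while the module is in training mode.
        if not self.training:
            return feat
        mask = torch.bernoulli(torch.full_like(feat, self.p))  # 0/1 mask m_i
        return feat * (1.0 + self.eps * mask)
```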
It should be noted that the convolution kernel sizes of the simple feature extraction module and the disturbance feature extraction module may also be set to 5×5 or 7×7 as required.
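Putting the modules together, a minimal sketch of the disturbance feature map extraction network (steps 2.1 to 2.5) could look as follows; it reuses the CBR and FeatureDisturbance sketches above, and the channel widths and the single-channel input are assumptions, since C1 to C5 are not fixed by the patent.

```python
import torch.nn as nn

class CDBR(nn.Module):
    """Disturbance feature extraction module: two serial groups of
    3x3 convolution -> feature disturbance -> BN -> ReLU (a sketch)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2

        def group(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size, padding=pad),
                FeatureDisturbance(),            # sketch defined above
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.block = nn.Sequential(group(in_channels, out_channels),
                                   group(out_channels, out_channels))

    def forward(self, x):
        return self.block(x)


class DisturbanceEncoder(nn.Module):
    """Steps 2.1-2.5: one CBR stage followed by four pooling + CDBR stages."""

    def __init__(self, channels=(64, 128, 256, 512, 1024)):   # assumed widths
        super().__init__()
        self.cbr = CBR(1, channels[0])           # full-view film assumed single-channel
        self.pool = nn.MaxPool2d(2)
        self.cdbr = nn.ModuleList(
            CDBR(channels[i], channels[i + 1]) for i in range(4)
        )

    def forward(self, image):
        f = self.cbr(image)                      # F1
        features = [f]
        for stage in self.cdbr:
            f = stage(self.pool(f))              # F2, F3, F4, F_deep
            features.append(f)
        return features                          # [F1, F2, F3, F4, F_deep]
```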
Step S3, inputting the deep disturbance feature map F_deep separately into the trained mask network and boundary network to obtain the tooth region segmentation result and the tooth contour segmentation result.
Inputting the deep disturbance feature map F_deep separately into the trained mask network and boundary network to obtain the tooth region segmentation result and the tooth contour segmentation result comprises:
step 3.1, in the mask network, depth disturbance feature map F deep Up-sampling and F 4 Inputting the feature images into a channel fusion module together to obtain a feature image after channel combination, wherein the number of channels is C 5 +C 4 Inputting the simple feature extraction module to obtain a feature map UP 4 Its dimension is C 4 ×H 4 ×W 4
Step 3.2, feature map UP 4 Up-sampling and F 3 Inputting the feature images into a channel fusion module together to obtain a feature image after channel combination, wherein the number of channels is C 4 +C 3 Inputting the simple feature extraction module to obtain a feature map UP 3 Its dimension is C 3 ×H 3 ×W 3
Step 3.3, feature map UP 3 Up-sampling and F 2 Inputting the feature images into a channel fusion module together to obtain a feature image after channel combination, wherein the number of channels is C 3 +C 2 Inputting the simple feature extraction module to obtain a feature map UP 2 Its dimension is C 2 ×H 2 ×W 2
Step 3.4, feature map UP 2 Up-sampling and F 1 Inputting the feature images into a channel fusion module together to obtain a feature image after channel combination, wherein the number of channels is C 2 +C 1 Inputting the simple feature extraction module to obtain a feature map UP 1 Its dimension is C 1 ×H 1 ×W 1
Step (a)3.5 feature map UP 1 Inputting 1*1 convolution block to obtain a characteristic map UP 0 Dimension 32 XH 1 ×W 1 Wherein 32 represents 32 different teeth, UP 0 Each channel of (2) is activated by the following formula to obtain UP 0 Multiplying the probability that each pixel point belongs to the tooth area by 255 to obtain a segmentation result of the final 32 teeth;
and step 3.6, the operations of the steps 3.1 to 3.5 are adopted in the boundary network, and finally, the tooth profile segmentation result is output.
Here sigmoid(x) = 1 / (1 + e^(−x)), where sigmoid is the activation function and e is the natural constant.
In this application, the channel fusion module (Copy), as shown in FIG. 5, concatenates the upsampled lower-level feature map and the current-level feature map along the channel dimension; the output feature map keeps the same spatial size, and its channel number is the sum of the channel numbers of the lower-level and current-level feature maps.
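A sketch of one mask-network decoder step and of the output head (steps 3.1 and 3.5) is given below; it reuses the CBR sketch above, and the bilinear upsampling mode, the class names and the channel counts are assumptions not fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionUpStage(nn.Module):
    """One decoder step: upsample the lower-level map, concatenate it with the
    same-level encoder map along the channel axis (channel fusion module),
    then apply a simple feature extraction (CBR) block."""

    def __init__(self, low_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        self.cbr = CBR(low_channels + skip_channels, out_channels)

    def forward(self, low: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        low = F.interpolate(low, size=skip.shape[2:], mode="bilinear",
                            align_corners=False)         # upsample to skip's size
        fused = torch.cat([low, skip], dim=1)             # channels: C_low + C_skip
        return self.cbr(fused)


class ToothMaskHead(nn.Module):
    """Step 3.5: 1x1 convolution to 32 channels (one per tooth), sigmoid per
    channel, scaled to 0-255 to form the final segmentation maps."""

    def __init__(self, in_channels: int = 64):            # assumed channel count of UP1
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, 32, kernel_size=1)

    def forward(self, up1: torch.Tensor) -> torch.Tensor:
        prob = torch.sigmoid(self.conv1x1(up1))            # probability per tooth channel
        return prob * 255.0                                 # 32 x H1 x W1 segmentation result
```

The boundary network would reuse the same decoder stages with its own head to produce the tooth contour segmentation result (step 3.6).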
Similarly, the convolution kernel size of the simple feature extraction module in this embodiment is 3×3, and may be set to 5×5 or 7×7 as needed.
It should be noted that in the present application C denotes the number of channels, H the height of the image and W the width of the image; the subscript indicates the stage index and distinguishes the dimensions of different feature maps.
With this method, the characteristics of segmentation region boundaries can be learned directly through the boundary network, so the boundaries of segmented regions are found more easily and the segmentation effect is improved in situations with intra-class difference and inter-class similarity. In some images, two parts of the same semantic region may differ greatly in image characteristics and are easily identified as two different semantic regions; this is called intra-class difference. Similarly, image features in different semantic regions may be very similar, so the two parts are easily identified as one semantic region; this is called inter-class similarity. By learning boundary information, correct semantic boundaries can be found more reliably, which markedly improves the segmentation of images exhibiting intra-class difference and inter-class similarity.
The above examples merely represent several embodiments of the present application; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and such modifications and improvements fall within the scope of the present application. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (6)

1. A tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision, characterized by comprising the following steps:
obtaining a tooth full-view film, and sharpening it to obtain a tooth full-view film I with clearer tooth boundaries;
inputting the tooth full-view film I into a trained disturbance feature map extraction network for feature extraction to obtain a deep disturbance feature map F_deep;
inputting the deep disturbance feature map F_deep separately into a trained mask network and a trained boundary network to obtain a tooth region segmentation result and a tooth contour segmentation result;
the tooth full-scene image I is input into a disturbance feature image extraction network after training to perform feature extraction, so as to obtain a deep disturbance feature image F deep Comprising:
step 2.1, inputting the tooth full-scene film I into a simple feature extraction module with the convolution kernel size of 3*3 to obtain an output feature map F 1 Its dimension is C 1 ×H 1 ×W 1
Step 2.2, feature map F 1 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 2 Its dimension is C 2 ×H 2 ×W 2
Step 2.3, feature mapF 2 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 3 Its dimension is C 3 ×H 3 ×W 3
Step 2.4, feature map F 3 After pooling, inputting the obtained product to a disturbance feature extraction module with a convolution kernel of 3*3 to obtain a feature map F 4 Its dimension is C 4 ×H 4 ×W 4
Step 2.5, feature map F 4 The pooled disturbance characteristic is input into a disturbance characteristic extraction module with the convolution kernel size of 3*3 to obtain a deep disturbance characteristic diagram F deep Its dimension is C 5 ×H 5 ×W 5
and wherein inputting the deep disturbance feature map F_deep separately into the trained mask network and boundary network to obtain the tooth region segmentation result and the tooth contour segmentation result comprises:
step 3.1, in the mask network, upsampling the deep disturbance feature map F_deep and inputting it together with F4 into a channel fusion module to obtain a channel-concatenated feature map with C5+C4 channels, then inputting it into a simple feature extraction module to obtain a feature map UP4 of dimension C4×H4×W4;
step 3.2, upsampling the feature map UP4 and inputting it together with F3 into a channel fusion module to obtain a channel-concatenated feature map with C4+C3 channels, then inputting it into a simple feature extraction module to obtain a feature map UP3 of dimension C3×H3×W3;
step 3.3, upsampling the feature map UP3 and inputting it together with F2 into a channel fusion module to obtain a channel-concatenated feature map with C3+C2 channels, then inputting it into a simple feature extraction module to obtain a feature map UP2 of dimension C2×H2×W2;
step 3.4, upsampling the feature map UP2 and inputting it together with F1 into a channel fusion module to obtain a channel-concatenated feature map with C2+C1 channels, then inputting it into a simple feature extraction module to obtain a feature map UP1 of dimension C1×H1×W1;
step 3.5, inputting the feature map UP1 into a 1×1 convolution block to obtain a feature map UP0 of dimension 32×H1×W1, where 32 corresponds to the 32 different teeth; each channel of UP0 is activated by a sigmoid function to obtain the probability that each pixel belongs to the corresponding tooth region, and multiplying by 255 gives the final segmentation result for the 32 teeth;
step 3.6, the boundary network applies the same operations as steps 3.1 to 3.5 and finally outputs the tooth contour segmentation result.
2. The tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision according to claim 1, wherein obtaining the tooth full-view film and sharpening it to obtain the tooth full-view film I with clearer tooth boundaries comprises:
inputting the original tooth full-view film I_original and applying a sharpening filtering operation to the whole film with a filter whose kernel size is 3×3, to obtain the sharpened tooth full-view film I.
3. The tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision according to claim 1, wherein the simple feature extraction module comprises two serially connected groups, each consisting of a convolution layer with a kernel size of 3×3, a batch normalization layer and an activation layer.
4. The tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision according to claim 1, wherein the disturbance feature extraction module comprises two serially connected groups, each consisting of a convolution layer with a kernel size of 3×3, a feature map disturbance operation, a batch normalization layer and an activation layer.
5. The tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision according to claim 1, wherein the channel fusion module is configured to concatenate the upsampled lower-level feature map and the current-level feature map along the channel dimension, and to output a feature map whose spatial size is unchanged and whose channel number is the sum of the channel numbers of the lower-level and current-level feature maps.
6. The tooth full-view film semantic segmentation method based on feature map disturbance and boundary supervision according to claim 4, wherein the feature map disturbance operation perturbs the feature map according to a formula in which x_i is the feature map input to the i-th layer of the network, f(x_i) and its disturbed counterpart denote the feature map before and after disturbance respectively, m_i is a mask consisting of 0s and 1s that follows a Bernoulli distribution, ε_i controls the disturbance amplitude and its value is optimized automatically during training, and the multiplication between matrices is element-wise (point by point).
CN202010894993.7A 2020-08-31 2020-08-31 Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision Active CN112085028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010894993.7A CN112085028B (en) 2020-08-31 2020-08-31 Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010894993.7A CN112085028B (en) 2020-08-31 2020-08-31 Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision

Publications (2)

Publication Number Publication Date
CN112085028A CN112085028A (en) 2020-12-15
CN112085028B true CN112085028B (en) 2024-03-12

Family

ID=73731245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010894993.7A Active CN112085028B (en) 2020-08-31 2020-08-31 Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision

Country Status (1)

Country Link
CN (1) CN112085028B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750111B (en) * 2021-01-14 2024-02-06 浙江工业大学 Disease identification and segmentation method in tooth full-view film
CN114004831B (en) * 2021-12-24 2022-04-08 杭州柳叶刀机器人有限公司 Method for assisting implant replacement based on deep learning and auxiliary intelligent system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087327A (en) * 2018-07-13 2018-12-25 天津大学 A kind of thyroid nodule ultrasonic image division method cascading full convolutional neural networks
CN109829900A (en) * 2019-01-18 2019-05-31 创新奇智(北京)科技有限公司 A kind of steel coil end-face defect inspection method based on deep learning
CN110473243A (en) * 2019-08-09 2019-11-19 重庆邮电大学 Tooth dividing method, device and computer equipment based on depth profile perception

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11645746B2 (en) * 2018-11-28 2023-05-09 Orca Dental AI Ltd. Dental image segmentation and registration with machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087327A (en) * 2018-07-13 2018-12-25 天津大学 A kind of thyroid nodule ultrasonic image division method cascading full convolutional neural networks
CN109829900A (en) * 2019-01-18 2019-05-31 创新奇智(北京)科技有限公司 A kind of steel coil end-face defect inspection method based on deep learning
CN110473243A (en) * 2019-08-09 2019-11-19 重庆邮电大学 Tooth dividing method, device and computer equipment based on depth profile perception

Also Published As

Publication number Publication date
CN112085028A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN112085028B (en) Tooth full-scene semantic segmentation method based on feature map disturbance and boundary supervision
CN112750111B (en) Disease identification and segmentation method in tooth full-view film
CN110675462B (en) Gray image colorization method based on convolutional neural network
CN108492271A (en) A kind of automated graphics enhancing system and method for fusion multi-scale information
CN113221945B (en) Dental caries identification method based on oral panoramic film and dual attention module
CN111696084B (en) Cell image segmentation method, device, electronic equipment and readable storage medium
Hou et al. Teeth U-Net: A segmentation model of dental panoramic X-ray images for context semantics and contrast enhancement
CN112837278B (en) Tooth full-scene caries identification method based on depth boundary supervision
CN109544451A (en) A kind of image super-resolution rebuilding method and system based on gradual iterative backprojection
CN114332123A (en) Automatic caries grading method and system based on panoramic film
CN113344950A (en) CBCT image tooth segmentation method combining deep learning with point cloud semantics
CN116630621A (en) Image segmentation method integrating multi-scale features
CN113344867A (en) Periodontitis absorption degree identification method based on near-middle and far-middle key points
CN115796306B (en) Training of permanent tooth maturity grading model and permanent tooth maturity grading method
CN114743251B (en) Drama character facial expression recognition method based on shared integrated convolutional neural network
CN115482384A (en) Visible light OCT image retina layer segmentation method and system
CN116205925A (en) Tooth occlusion wing tooth caries segmentation method based on improved U-Net network
CN115760875A (en) Full-field medical picture region segmentation method based on self-supervision learning
CN113705613A (en) X-ray sheet distal radius fracture classification method based on spatial position guidance
CN114360034A (en) Method, system and equipment for detecting deeply forged human face based on triplet network
CN109446944B (en) Visual semantic-structured analytic method of sign language
Reeshav et al. Sign language recognition using convolutional neural network
CN116402682B (en) Image reconstruction method and system based on differential value dense residual super-resolution
CN1508756A (en) Sensitive image identifying method based on body local and shape information
Krishnaveni Classification of Dental Disease Through Various CNN Techniques and Performance Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant