CN116935048A - DSA image semantic segmentation method, system and storage medium based on knowledge distillation - Google Patents


Info

Publication number
CN116935048A
CN116935048A (Application number CN202310859857.8A)
Authority
CN
China
Prior art keywords
model
loss function
semantic
student model
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310859857.8A
Other languages
Chinese (zh)
Inventor
Qi Peng (齐鹏)
Yao Tianliang (姚天亮)
Wang Yu (王玉)
Wang Ying (汪颖)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202310859857.8A
Publication of CN116935048A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a DSA image semantic segmentation method, system and storage medium based on knowledge distillation. The method introduces a semantic mask knowledge distillation technique into a UNet network structure: a semantic mask layer is arranged at each encoder stage and each decoder stage, and generates a binarized semantic mask map from the input feature map. The semantic mask map generated by the teacher model is used as a supervision signal to guide the student model to generate a similar or consistent semantic mask map, and a loss function between the two semantic mask maps is calculated. The application can effectively extract the semantic mask features of the teacher model and transfer them to the student model, thereby improving the generalization capability and robustness of the student model; it can adaptively generate masks of different shapes and sizes according to the characteristics of different tasks, reducing information loss and noise interference; it can realize knowledge transfer among various tasks, saving computing resources and time cost; and it reduces the interference of background noise and artifacts.

Description

DSA image semantic segmentation method, system and storage medium based on knowledge distillation
Technical Field
The application relates to the technical field of DSA image semantic segmentation, in particular to a DSA image semantic segmentation method, a DSA image semantic segmentation system and a storage medium based on knowledge distillation.
Background
Digital Subtraction Angiography (DSA) is a medical imaging technique that uses X-rays and computer processing to dynamically display vascular structure and function. It is an important auxiliary reference in the diagnosis of cardiovascular diseases, interventional surgical treatment and related fields, and is regarded as the gold standard for the diagnosis of vascular diseases.
However, DSA imaging also has clinical difficulties and limitations. Imaging is often disturbed by factors such as patient movement, heartbeat and respiration, which readily produce motion artifacts, overlapping artifacts and other noise, so the resulting DSA images are blurred and of low contrast. In addition, because experience in reading DSA images varies from person to person, clinicians show subjective deviation and error when observing and analyzing useful information such as blood vessels, which adversely affects diagnosis and treatment.
To address these clinical pain points, semantic segmentation techniques have been used to process DSA images and extract useful information from them to assist doctors in diagnosis and treatment. Currently, mainstream DSA image semantic segmentation techniques include algorithms based on graph partitioning and morphology, Markov random field algorithms, deep learning methods and the like. However, conventional semantic segmentation methods have difficulty obtaining accurate and robust results.
Huang Xiaoxue et al. (Institute of Technology and Commerce) proposed a coronary angiography segmentation method, apparatus and computer-readable storage medium [application number: CN201910568453.7]. The method combines Otsu's method with threshold segmentation to segment coronary angiograms. However, the images obtained by this method contain a large amount of speckle noise, which still interferes with the judgment of doctors or intelligent medical systems.
Beijing Yuewei Medical Science and Technology Co., Ltd. proposed a contrast-image coronary artery segmentation method, electronic equipment, processing system and storage medium [application number: CN202211706591.5]. The method segments the coronary arteries of angiograms using a key encoder and memory-based contrast image segments, which can improve segmentation accuracy, but training requires a large number of contrast images, while the number of such medical images is usually small owing to their specificity.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a DSA image semantic segmentation method and system based on knowledge distillation, which introduce a semantic mask knowledge distillation technique into a UNet network structure. Specifically, a semantic mask layer (Semantic Mask Layer) is arranged at each encoder stage and each decoder stage; the semantic mask layer generates a binarized semantic mask map (Semantic Mask Map) from the input feature map, representing the category or foreground/background region to which each pixel belongs. The semantic mask map generated by the teacher model is then used as a supervision signal to guide the student model to generate a similar or consistent semantic mask map, and a Loss Function between the two semantic mask maps is calculated. In this way, the student model can capture the rich and useful semantic and structural information contained in the teacher model at each stage, and its convergence is accelerated.
In order to achieve the above object, in one aspect, the present application provides a DSA image semantic segmentation method based on knowledge distillation technology, which is characterized by comprising the following steps:
step S101, preprocessing the DSA image dataset: cropping and scaling are performed in sequence so that the picture sizes are consistent, and then a normalization operation is performed;
step S102, selecting a pre-trained DeepLabV3 network as the teacher model, and adding a semantic mask layer at each encoder stage and decoder stage;
step S103, initializing and pruning part of the parameters, taking a UNet network with reduced complexity and computation as the student model, and adding at each encoder stage and decoder stage a semantic mask layer of the same type as the corresponding position in the teacher model, with trainable parameters;
step S104, during training, inputting a given batch of DSA images into the teacher model and the student model, and obtaining the output feature maps and semantic mask maps of the teacher model and the student model respectively;
step S105, calculating a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model; calculating a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model;
step S106, calculating a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage, as the knowledge distillation signal;
step S107, weighting and summing the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function in proportion to obtain the final overall loss function, and performing back propagation and parameter updating on the teacher model and the student model;
step S108, repeating step S107 until the preset number of training rounds is reached or a convergence condition is met, then saving the trained student model and using it to perform segmentation prediction on new DSA images.
Further, the DeepLabV3 network adopts an ASPP (Atrous Spatial Pyramid Pooling) structure on a ResNet101 backbone, and the backbone model is used to extract image features.
Further, the last layers of the ResNet101 backbone use atrous (hole) convolution; finally, the ASPP structure classifies the individual pixels of the output image and processes them through a 1 × 1 convolution layer to recover the original size.
Further, the UNet network has a five-layer structure.
Further, assume that the output feature map of the teacher model is T, the output feature map of the student model is S, the real label is Y, and the semantic mask maps generated by the teacher model and the student model at the i-th stage are $M_i^T$ and $M_i^S$ respectively. The overall loss function can be expressed as:
$$L = \alpha L_{CE}(T, Y) + \beta L_{CE}(S, Y) + \gamma \sum_i L_{MSE}(M_i^T, M_i^S)$$
where $L_{CE}$ represents the cross entropy loss function, defined as:
$$L_{CE}(P, Q) = -\sum_{c=1}^{C} P_c \log Q_c$$
where P and Q represent two probability distribution vectors, and C represents the number of categories;
$L_{MSE}$ represents the mean square error loss function, defined as:
$$L_{MSE}(X, Z) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(X_{h,w} - Z_{h,w}\right)^2$$
where X and Z represent two semantic mask maps, and H and W represent their height and width respectively;
$\alpha$, $\beta$, $\gamma$ represent the weight coefficients of the three loss functions.
On the other hand, the application provides a DSA image semantic segmentation system based on a knowledge distillation technology, which is characterized by comprising an image preprocessing module, a knowledge distillation module and an image prediction module:
the image preprocessing module performs preprocessing on the DSA image dataset: cropping and scaling are performed in sequence so that the picture sizes are consistent, and then a normalization operation is performed;
the knowledge distillation module comprises a teacher model and a student model and is used for training the student model. A pre-trained DeepLabV3 network serves as the teacher model, with a semantic mask layer added at each encoder stage and decoder stage; after initializing and pruning part of the parameters, a UNet network with reduced complexity and computation serves as the student model, with a semantic mask layer of the same type as the corresponding position in the teacher model, with trainable parameters, added at each encoder stage and decoder stage. The knowledge distillation module calculates a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model; calculates a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model; calculates a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage as the knowledge distillation signal; and weights and sums the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function in proportion to obtain the final overall loss function, performing back propagation and parameter updating on the teacher model and the student model;
the image prediction module comprises a trained student model, and the trained student model is adopted to conduct segmentation prediction on a new DSA image.
In yet another aspect, the present application provides a computer readable storage medium storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform the DSA image semantic segmentation method described above.
In a final aspect, the present application provides a computer-readable storage medium storing the DSA image semantic segmentation system described above.
Compared with the prior art, the application has the following advantages or beneficial effects:
(1) By introducing semantic segmentation knowledge distillation into the basic network structures of the DeepLabV3 teacher model and the UNet student model, the application can effectively extract semantic features from the teacher model and transfer them to the student model, improving the generalization capability and robustness of the student model;
(2) The application has self-adaption, and can self-adaptively generate masks with different shapes and sizes according to the characteristics of different tasks, thereby reducing information loss and noise interference;
(3) The application has stronger clinical significance, can improve the quality and the readability of DSA images, reduce the interference of background noise and artifacts, and provide effective assistance for related diagnosis and treatment.
Drawings
The application and its features and advantages will become more apparent from reading of the detailed description of non-limiting embodiments, given with reference to the following drawings.
FIG. 1 is a workflow of a DSA image semantic segmentation method according to an embodiment of the present application;
FIG. 2 shows segmentation results of a DSA image semantic segmentation system according to an embodiment of the present application, validated on a small-scale dataset;
fig. 3 is a schematic structural diagram of a DSA image semantic segmentation system according to an embodiment of the present application.
Detailed Description
The application will now be further described with reference to the accompanying drawings and specific examples, which are not intended to limit the application.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the application. However, it will be apparent to those skilled in the art that the application may be practiced without some of these details; well-known algorithms and models are not shown in detail in order to avoid obscuring the principles of the present application.
The order of execution of the operations, steps, and the like in the apparatus and methods shown in the claims, the specification, and the drawings may be in any order as long as the order is not particularly limited, and the output of the preceding process is not used in the subsequent process.
Example 1
Referring to fig. 1, the present embodiment provides a DSA image semantic segmentation method based on knowledge distillation technology. The core idea of the DSA image semantic segmentation method is to transmit semantic information and structural information learned in a Teacher Model (Teacher Model) to a Student Model (Student Model) by utilizing a semantic mask knowledge distillation (Semantic Mask Knowledge Distillation, SMKD) technology, so that the Student Model can achieve segmentation performance similar to or even exceeding that of the Teacher Model under a smaller network structure.
This embodiment adopts a DeepLabV3 network as the teacher model and a UNet network as the basic network structure of the student model. DeepLabV3 is a semantic segmentation model based on atrous (hole) convolution and spatial pyramid pooling; it can extract multi-scale information without reducing the resolution of the feature map, provides rich semantic features and global context information, and guides the student model to learn more effective feature representations. UNet is a classical and efficient image segmentation network characterized by a U-shaped encoder-decoder structure, in which the encoder feature map of the corresponding level is concatenated (Concat) with the decoder feature map at each up-sampling stage, thereby retaining more low-level and high-level feature information.
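The skip-connection splicing of the UNet structure can be sketched with plain arrays. This is an illustrative simplification under stated assumptions (nearest-neighbour upsampling, single sample in (C, H, W) layout), not the patent's implementation:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def skip_concat(encoder_feat, decoder_feat):
    """UNet-style skip connection: upsample the decoder feature map and
    concatenate it with the same-level encoder feature map along channels."""
    up = upsample_nearest(decoder_feat)
    assert up.shape[1:] == encoder_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([encoder_feat, up], axis=0)

enc = np.random.rand(64, 32, 32)   # encoder feature map at some level
dec = np.random.rand(128, 16, 16)  # decoder feature map from the level below
fused = skip_concat(enc, dec)
print(fused.shape)  # (192, 32, 32)
```

The fused map keeps both low-level detail (from the encoder) and high-level semantics (from the decoder), which is exactly why the concatenation is used instead of addition.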
Semantic mask knowledge distillation is a pixel-level based feature distillation method that can adaptively select valuable regions in a teacher model to distill and assign different weights. The method has excellent performance in tasks such as target detection, semantic segmentation, image classification and the like.
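One plausible minimal form of such a semantic mask layer is a 1 × 1 projection followed by a sigmoid and thresholding; the specific layer design is not detailed in the patent, so the projection, threshold and shapes below are illustrative assumptions:

```python
import numpy as np

def semantic_mask_layer(feat, weights, bias=0.0, threshold=0.5):
    """Hypothetical semantic mask layer: a 1x1 convolution (per-channel
    weights) collapses the (C, H, W) feature map to one channel, a sigmoid
    maps it to [0, 1], and thresholding yields a binary mask."""
    logits = np.tensordot(weights, feat, axes=([0], [0])) + bias  # 1x1 conv
    probs = 1.0 / (1.0 + np.exp(-logits))                         # sigmoid
    return (probs > threshold).astype(np.float64)                 # binarise

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))   # mock feature map, C=8
w = rng.standard_normal(8) * 0.1          # 1x1 conv weights, one per channel
mask = semantic_mask_layer(feat, w)
print(mask.shape)  # (16, 16)
```

The distillation loss then compares teacher and student masks produced at the same stage.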
The specific principle and the implementation steps are as follows:
step S101, preprocessing the DSA image dataset: cropping and scaling are performed in sequence so that the picture sizes are consistent, for example 512 × 512, and then a normalization operation is performed;
step S102, selecting a pre-trained DeepLabV3 network as the teacher model, and adding a semantic mask layer at each encoder stage and decoder stage;
step S103, initializing and pruning part of the parameters, taking a UNet network with reduced complexity and computation as the student model, and adding at each encoder stage and decoder stage a semantic mask layer of the same type as the corresponding position in the teacher model, with trainable parameters;
step S104, during training, inputting a given batch of DSA images into the teacher model and the student model, and obtaining the output feature maps and semantic mask maps of the teacher model and the student model respectively;
step S105, calculating a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model; calculating a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model;
step S106, calculating a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage, as the knowledge distillation signal;
step S107, weighting and summing the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function in proportion to obtain the final overall loss function, and performing back propagation and parameter updating on the teacher model and the student model;
step S108, repeating step S107 until the preset number of training rounds is reached or a convergence condition is met, then saving the trained student model and using it to perform segmentation prediction on new DSA images.
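The preprocessing of step S101 can be sketched as follows. The centre-crop strategy and nearest-neighbour resize are assumptions for illustration; the patent only specifies crop, scale and normalize:

```python
import numpy as np

def preprocess(img, size=512):
    """Centre-crop a single-channel image to a square, resize to
    size x size by nearest-neighbour sampling, then normalise to
    zero mean / unit variance."""
    h, w = img.shape
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    img = img[top:top + s, left:left + s]           # centre crop
    idx = (np.arange(size) * s / size).astype(int)  # nearest-neighbour grid
    img = img[np.ix_(idx, idx)].astype(np.float64)  # resize
    return (img - img.mean()) / (img.std() + 1e-8)  # normalise

frame = np.random.rand(600, 800)  # a mock single-channel DSA frame
out = preprocess(frame)
print(out.shape)  # (512, 512)
```

In practice a library resize (e.g. bilinear) would be preferred; the point here is only the crop-scale-normalize order of step S101.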
As a preferred technical solution: the DeepLabV3 network adopts an ASPP structure on a ResNet101 backbone, and the backbone model is used to extract image features. Preferably, the last layers of the ResNet101 backbone use atrous (hole) convolution; finally, the ASPP structure classifies the individual pixels of the output image and processes them through a 1 × 1 convolution layer to recover the original size. The UNet network has a five-layer structure.
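To illustrate the atrous (hole) convolution mentioned above: a single-channel, valid-mode toy in which the kernel taps are spaced `rate` pixels apart, enlarging the receptive field without extra parameters. This is a didactic sketch, not the ResNet101/ASPP implementation (which pads to preserve resolution):

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """Valid-mode 2D atrous convolution on a single-channel image."""
    kh, kw = kernel.shape
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1  # effective extent
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eh:rate, j:j + ew:rate]  # taps `rate` apart
            out[i, j] = (patch * kernel).sum()
    return out

img = np.ones((10, 10))
k = np.ones((3, 3))
out1 = atrous_conv2d(img, k, rate=1)  # ordinary 3x3 convolution
out2 = atrous_conv2d(img, k, rate=2)  # same 9 taps, 5x5 receptive field
print(out1.shape, out2.shape)  # (8, 8) (6, 6)
```

Note the rate-2 kernel covers a 5 × 5 region with only nine weights, which is how ASPP gathers multi-scale context cheaply.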
Assume that the output feature map of the teacher model is T, the output feature map of the student model is S, the real label is Y, and the semantic mask maps generated by the teacher model and the student model at the i-th stage are $M_i^T$ and $M_i^S$ respectively. The overall loss function can be expressed as:
$$L = \alpha L_{CE}(T, Y) + \beta L_{CE}(S, Y) + \gamma \sum_i L_{MSE}(M_i^T, M_i^S)$$
where $L_{CE}$ represents the cross entropy loss function, defined as:
$$L_{CE}(P, Q) = -\sum_{c=1}^{C} P_c \log Q_c$$
where P and Q represent two probability distribution vectors, and C represents the number of categories;
$L_{MSE}$ represents the mean square error loss function, defined as:
$$L_{MSE}(X, Z) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(X_{h,w} - Z_{h,w}\right)^2$$
where X and Z represent two semantic mask maps, and H and W represent their height and width respectively;
$\alpha$, $\beta$, $\gamma$ represent the weight coefficients of the three loss functions.
The present embodiment sets α=0.5, β=0.2, and γ=0.3.
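The overall loss of step S107 with these weights can be sketched as follows. This is a numpy stand-in under stated assumptions; the toy shapes, the one-hot labels and the softmax outputs are illustrative, not taken from the patent:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Pixel-wise cross entropy between one-hot target p and prediction q,
    both shaped (C, H, W), averaged over pixels."""
    return float(-(p * np.log(q + eps)).sum(axis=0).mean())

def mse(x, z):
    """Mean squared error between two same-sized semantic mask maps."""
    return float(((x - z) ** 2).mean())

def overall_loss(t_out, s_out, y, masks_t, masks_s,
                 alpha=0.5, beta=0.2, gamma=0.3):
    """Weighted sum of teacher CE, student CE and the stage-wise
    mask-MSE distillation term, as in step S107."""
    distill = sum(mse(mt, ms) for mt, ms in zip(masks_t, masks_s))
    return (alpha * cross_entropy(y, t_out)
            + beta * cross_entropy(y, s_out)
            + gamma * distill)

rng = np.random.default_rng(1)
softmax = lambda x: np.exp(x - x.max(0)) / np.exp(x - x.max(0)).sum(0)
y = np.eye(2)[rng.integers(0, 2, (8, 8))].transpose(2, 0, 1)  # one-hot labels
t_out = softmax(rng.standard_normal((2, 8, 8)))  # mock teacher output
s_out = softmax(rng.standard_normal((2, 8, 8)))  # mock student output
masks_t = [rng.integers(0, 2, (8, 8)).astype(float) for _ in range(3)]
masks_s = [rng.integers(0, 2, (8, 8)).astype(float) for _ in range(3)]
loss = overall_loss(t_out, s_out, y, masks_t, masks_s)
print(loss > 0)  # True
```

Gradients of this scalar would then drive the back propagation and parameter updates of step S107.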
Repeating the steps to continuously perform back propagation and parameter updating until the preset training round number is reached or convergence conditions are met, then saving a trained student model, and performing segmentation prediction on a new DSA image by using the student model.
In short, semantic mask knowledge distillation is to select and weight valuable feature regions in a teacher model by learning a set of dynamic, differentiated, highly interpretable semantic masks and pass them to a student model.
Fig. 2 illustrates the semantic segmentation effect of this embodiment, validated on a small-scale dataset. In the figure, Ground Truth is the vessel contour annotated by an experienced doctor, and Segmentation Result is the vessel contour obtained by model segmentation; the similarity measure between the two contours, DSC = 94.6%, shows that the student model trained in this embodiment achieves a good semantic segmentation effect on DSA images.
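The DSC figure quoted above is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|); a standard computation on binary masks can be sketched as follows (the toy masks are illustrative):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

a = np.zeros((4, 4)); a[1:3, 1:3] = 1   # 4-pixel square "ground truth"
b = np.zeros((4, 4)); b[1:3, 1:4] = 1   # 6-pixel overlapping "prediction"
print(round(dice(a, b), 3))  # 0.8
```

A DSC of 1.0 means the predicted and annotated vessel contours coincide exactly.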
Example 2
Referring to fig. 3, the present embodiment provides a DSA image semantic segmentation system based on knowledge distillation technology, which includes an image preprocessing module 201, a knowledge distillation module, and an image prediction module 204:
the image preprocessing module 201 performs preprocessing on the DSA image dataset: cropping and scaling are performed in sequence so that the picture sizes are consistent, and then a normalization operation is performed;
the knowledge distillation module comprises a teacher model 202 and a student model 203 and is used for training the student model 203. A pre-trained DeepLabV3 network serves as the teacher model 202, with a semantic mask layer added at each encoder stage and decoder stage; after initializing and pruning part of the parameters, a UNet network with reduced complexity and computation serves as the student model 203, with a semantic mask layer of the same type as the corresponding position in the teacher model, with trainable parameters, added at each encoder stage and decoder stage. The knowledge distillation module calculates a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model; calculates a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model; calculates a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage as the knowledge distillation signal; and weights and sums the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function in proportion to obtain the final overall loss function, performing back propagation and parameter updating on the teacher model 202 and the student model 203;
the image prediction module 204 includes a trained student model that is employed to segment and predict new DSA images.
Example 3
The present embodiment provides a computer-readable storage medium storing executable instructions that, when executed by one or more processors, can cause the one or more processors to perform the DSA image semantic segmentation method as described in embodiment 1.
Example 4
The present embodiment provides a computer-readable storage medium storing a DSA image semantic segmentation system as described in embodiment 2.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In summary, the application provides a DSA image semantic segmentation method, system and storage medium based on knowledge distillation. The method introduces a semantic mask knowledge distillation technique into a UNet network structure: a semantic mask layer is arranged at each encoder stage and each decoder stage, and generates a binarized semantic mask map from the input feature map. The semantic mask map generated by the teacher model is used as a supervision signal to guide the student model to generate a similar or consistent semantic mask map, and a loss function between the two semantic mask maps is calculated. The application can effectively extract the semantic mask features of the teacher model and transfer them to the student model, thereby improving the generalization capability and robustness of the student model; it can adaptively generate masks of different shapes and sizes according to the characteristics of different tasks, reducing information loss and noise interference; it can realize knowledge transfer among various tasks, saving computing resources and time cost; and it reduces the interference of background noise and artifacts.
Those skilled in the art will understand that such modifications can be implemented in combination with the prior art and the above embodiments; they do not affect the essence of the present application and are not described in detail here.
The preferred embodiments of the present application have been described above. It is to be understood that the application is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present application or modifications to equivalent embodiments without departing from the scope of the technical solution of the present application, using the methods and technical contents disclosed above, without affecting the essential content of the present application. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present application still fall within the scope of the technical solution of the present application.
Those of ordinary skill in the art will appreciate that the elements of the various examples described in connection with the present embodiments, i.e., the algorithm steps, can be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Claims (8)

1. A DSA image semantic segmentation method based on knowledge distillation technology, characterized by comprising the following steps:
step S101, preprocessing a DSA image dataset: performing cropping and scaling in sequence so that all pictures have a consistent size, and then performing a normalization operation;
step S102, selecting a pre-trained DeepLabV3 network as the teacher model, and adding a semantic mask layer at each encoder stage and decoder stage;
step S103, taking a UNet network with reduced complexity and computation as the student model, pruning and initializing part of its parameters, and adding, at each encoder stage and decoder stage, a semantic mask layer of the same type as that at the corresponding position in the teacher model, with learnable parameters;
step S104, during training, feeding a given batch of DSA images into the teacher model and the student model, and obtaining the output feature map and semantic mask maps of each model respectively;
step S105, calculating a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model, and calculating a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model;
step S106, calculating a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage, as the knowledge distillation signal;
step S107, computing the weighted sum of the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function to obtain the final overall loss function, and performing back propagation and parameter updating on the teacher model and the student model;
and step S108, repeating steps S104 to S107 until a preset number of training rounds is reached or a convergence condition is met, then saving the trained student model, and using the trained student model to perform segmentation prediction on new DSA images.
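Steps S104 to S107 together form one distillation iteration. A minimal, framework-agnostic sketch of such an iteration is given below; the `teacher`, `student`, loss and `optimizer` callables are hypothetical stand-ins for illustration, not the patent's actual implementation:

```python
def distillation_step(batch, labels, teacher, student,
                      ce_loss, mse_loss, alpha, beta, gamma, optimizer):
    """One training iteration following steps S104-S107.

    teacher(x) / student(x) are assumed to return a pair
    (output_feature_map, [mask_stage_1, ..., mask_stage_n]).
    """
    # S104: forward both models to get feature maps and stage-wise semantic masks
    t_out, t_masks = teacher(batch)
    s_out, s_masks = student(batch)
    # S105: supervision signals against the real labels
    loss_t = ce_loss(t_out, labels)
    loss_s = ce_loss(s_out, labels)
    # S106: distillation signal between corresponding semantic mask maps
    loss_kd = sum(mse_loss(mt, ms) for mt, ms in zip(t_masks, s_masks))
    # S107: weighted overall loss, then back propagation and parameter update
    total = alpha * loss_t + beta * loss_s + gamma * loss_kd
    optimizer.step(total)
    return total
```

In a real training loop this function would be called once per batch (step S108) until the round limit or convergence criterion is reached.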
2. The DSA image semantic segmentation method based on knowledge distillation technology according to claim 1, wherein the DeepLabV3 network adopts an ASPP module with a ResNet101 backbone, whose pre-trained parameters are transferred, and the backbone model is used to extract image features.
3. The DSA image semantic segmentation method based on knowledge distillation technology according to claim 2, wherein the ASPP model structure uses ResNet101, and the last layers of the backbone model use dilated (atrous) convolution; finally, the ASPP model structure classifies each pixel of the output image and processes it through a 1×1 convolution layer to recover the original size.
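The dilated (atrous) convolution referred to in claim 3 enlarges the receptive field by spacing kernel taps apart without adding parameters. A one-dimensional toy sketch of the idea (real ASPP applies 2-D dilated convolutions at several parallel rates; this is only an illustration):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """1-D dilated convolution: kernel taps are spaced `rate` samples apart,
    so a k-tap kernel spans (k - 1) * rate + 1 input positions."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(kernel[j] * x[i + j * rate] for j in range(k)))
    return np.array(out)
```

With rate 1 this reduces to an ordinary (valid) convolution; increasing the rate widens the receptive field at no extra parameter cost.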
4. The DSA image semantic segmentation method based on knowledge distillation technology according to claim 2, wherein the UNet network has a five-layer structure.
5. The DSA image semantic segmentation method based on knowledge distillation technology according to claim 1 or 4, wherein the teacher model output feature map is denoted $T$, the student model output feature map is denoted $S$, the real label is denoted $Y$, and the semantic mask maps generated by the teacher model and the student model at the $i$-th stage are denoted $M_{iT}$ and $M_{iS}$ respectively; the overall loss function can be expressed as:
$$L = \alpha L_{CE}(T, Y) + \beta L_{CE}(S, Y) + \gamma \sum_{i} L_{MSE}(M_{iT}, M_{iS})$$
wherein $L_{CE}$ represents the cross entropy loss function, defined as:
$$L_{CE}(P, Q) = -\sum_{c=1}^{C} P_c \log Q_c$$
wherein $P$, $Q$ represent two probability distribution vectors, and $C$ represents the number of categories;
$L_{MSE}$ represents the mean square error loss function, defined as:
$$L_{MSE}(X, Z) = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} (X_{h,w} - Z_{h,w})^2$$
wherein $X$ and $Z$ represent two semantic mask maps, and $H$ and $W$ represent their height and width respectively;
$\alpha$, $\beta$, $\gamma$ represent the weight coefficients of the three loss functions.
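The three loss terms of claim 5 can be sketched directly from their definitions. A minimal NumPy version, assuming the feature maps are already probability vectors and the masks are H × W arrays (a simplification of the per-pixel case):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """L_CE(P, Q) = -sum_c P_c * log(Q_c) over C categories."""
    q = np.clip(q, eps, 1.0)   # avoid log(0)
    return -np.sum(p * np.log(q))

def mse_loss(x, z):
    """L_MSE(X, Z): mean squared error over an H x W semantic mask map."""
    h, w = x.shape
    return np.sum((x - z) ** 2) / (h * w)

def overall_loss(t, s, y, masks_t, masks_s, alpha, beta, gamma):
    """L = alpha*L_CE(T,Y) + beta*L_CE(S,Y) + gamma*sum_i L_MSE(M_iT, M_iS)."""
    distill = sum(mse_loss(mt, ms) for mt, ms in zip(masks_t, masks_s))
    return alpha * cross_entropy(y, t) + beta * cross_entropy(y, s) + gamma * distill
```

The weight coefficients α, β, γ trade off teacher supervision, student supervision and the distillation signal; the patent leaves their values unspecified.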
6. A DSA image semantic segmentation system based on knowledge distillation technology, characterized by comprising an image preprocessing module, a knowledge distillation module and an image prediction module:
the image preprocessing module preprocesses a DSA image dataset, performing cropping and scaling in sequence so that all pictures have a consistent size, and then performing a normalization operation;
the knowledge distillation module comprises a teacher model and a student model, and is used for training the student model; a pre-trained DeepLabV3 network serves as the teacher model, with a semantic mask layer added at each encoder stage and decoder stage; a UNet network with reduced complexity and computation serves as the student model, with part of its parameters pruned and initialized, and with a semantic mask layer, of the same type as that at the corresponding position in the teacher model and with learnable parameters, added at each encoder stage and decoder stage; the knowledge distillation module calculates a first cross entropy loss function between the teacher model output feature map and the real label as the supervision signal of the teacher model, calculates a second cross entropy loss function between the student model output feature map and the real label as the supervision signal of the student model, and calculates a mean square error loss function between the semantic mask maps generated by the teacher model and the student model at each stage as the knowledge distillation signal; the first cross entropy loss function, the second cross entropy loss function and the mean square error loss function are summed with proportional weights to obtain the final overall loss function, and back propagation and parameter updating are performed on the teacher model and the student model;
the image prediction module comprises the trained student model, and uses the trained student model to perform segmentation prediction on new DSA images.
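The image preprocessing module's pipeline (crop, scale, normalize, per step S101) can be sketched as follows. The target resolution and the nearest-neighbour resize are illustrative assumptions; the patent does not specify a size or an interpolation method:

```python
import numpy as np

def preprocess(img, size=256):
    """Center-crop to a square, scale to `size` x `size`, then normalize.

    `size` is an assumed target resolution; a production pipeline would
    use a proper resize (e.g. bilinear) instead of index-mapped sampling.
    """
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = img[top:top + side, left:left + side]      # center crop to a square
    idx = np.arange(size) * side // size             # nearest-neighbour sampling
    img = img[idx][:, idx]
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)   # zero-mean, unit-variance
```

Applied to every image in the dataset, this yields consistently sized, normalized inputs for both the teacher and the student model.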
7. A computer readable storage medium storing executable instructions which, when executed by one or more processors, cause the one or more processors to perform the DSA image semantic segmentation method of any one of claims 1 to 5.
8. A computer readable storage medium storing the DSA image semantic segmentation system of claim 6.
CN202310859857.8A 2023-07-13 2023-07-13 DSA image semantic segmentation method, system and storage medium based on knowledge distillation Pending CN116935048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310859857.8A CN116935048A (en) 2023-07-13 2023-07-13 DSA image semantic segmentation method, system and storage medium based on knowledge distillation


Publications (1)

Publication Number Publication Date
CN116935048A true CN116935048A (en) 2023-10-24

Family

ID=88388905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310859857.8A Pending CN116935048A (en) 2023-07-13 2023-07-13 DSA image semantic segmentation method, system and storage medium based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN116935048A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination