CN116843901A - Medical image segmentation model training method and medical image segmentation method - Google Patents


Info

Publication number
CN116843901A
Authority
CN
China
Prior art keywords
medical image
training
sub
image segmentation
network
Legal status
Pending
Application number
CN202310872747.5A
Other languages
Chinese (zh)
Inventor
Request not to publish name (请求不公布姓名)
Current Assignee
Suzhou Xiaowei Changxing Robot Co ltd
Original Assignee
Suzhou Xiaowei Changxing Robot Co ltd
Application filed by Suzhou Xiaowei Changxing Robot Co ltd filed Critical Suzhou Xiaowei Changxing Robot Co ltd
Priority claimed from CN202310872747.5A
Publication of CN116843901A

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00 - Arrangements for image or video recognition or understanding
                    • G06V 10/20 - Image preprocessing
                        • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
                    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                        • G06V 10/764 - Using classification, e.g. of video objects
                        • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                            • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06V 10/84 - Using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
        • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
                • G16H 30/00 - ICT specially adapted for the handling or processing of medical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a medical image segmentation model training method and a medical image segmentation method, wherein the medical image is a medical image corresponding to hard tissue. The training method comprises: acquiring a first number of first medical images; carrying out random mask processing on each first medical image to obtain a corresponding masked medical image, and forming a first training sample from the masked medical image with the corresponding first medical image as its label; training a pre-constructed self-supervision model according to the first training samples, wherein the self-supervision model comprises a first coding network and a first decoding network; and migrating the weight parameters of the first coding network in the trained self-supervision model to a pre-constructed medical image segmentation model, and training the medical image segmentation model with second training samples. The invention greatly reduces the dependence of deep-learning-based hard tissue medical image segmentation algorithms on the amount of labeled data and reduces the workload of data labeling.

Description

Medical image segmentation model training method and medical image segmentation method
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a medical image segmentation model training method, a medical image segmentation method, an electronic device, and a readable storage medium.
Background
When planning dental implantation, the position and size of each tooth must first be calculated. At present, tooth segmentation is achieved in one of two ways: a semi-automatic segmentation approach, in which an operator interacts with a machine to segment each tooth; or a traditional three-dimensional convolutional neural network, which achieves a better segmentation effect by collecting a large amount of labeled data.
The traditional semi-automatic segmentation approach requires the operator to interact frequently with the computer to calibrate the foreground and background of each tooth class; its segmentation efficiency and effect are not ideal, and it is gradually being replaced by segmentation algorithms based on deep learning. However, obtaining a relatively satisfactory segmentation result with a traditional three-dimensional convolutional neural network segmentation algorithm requires a large amount of supporting data: empirically, more than 1000 cases of precisely labeled data are needed for the 32 categories of full-mouth teeth, and the difficulty of collecting and labeling medical data has always been a major problem in the industry.
It should be noted that the information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a medical image segmentation model training method and a medical image segmentation method, which can reduce the dependence of a hard tissue medical image segmentation algorithm based on deep learning on the labeling data quantity and reduce the workload of data labeling.
In order to achieve the above object, the present invention provides a training method for a medical image segmentation model, wherein the medical image is a medical image corresponding to hard tissue, and the training method comprises:
acquiring a first number of first medical images;
for each first medical image, carrying out random mask processing on the first medical image to obtain a corresponding mask medical image, and forming a first training sample by taking the first medical image as a label and the mask medical image corresponding to the first medical image;
training a pre-constructed self-supervision model according to the first training sample until a first preset training ending condition is met, wherein the self-supervision model comprises a first coding network for extracting tissue characteristics and a first decoding network for carrying out pixel value regression;
migrating the weight parameters of the first coding network in the trained self-supervision model to a pre-constructed medical image segmentation model, and training the medical image segmentation model by adopting a second number of pre-acquired second training samples until a second preset training ending condition is met, wherein the second number is smaller than the first number, each second training sample comprises a second medical image and its corresponding target tissue mask label image, the medical image segmentation model comprises a second coding network for extracting target tissue features and a second decoding network for performing target tissue segmentation, and the network structure of the second coding network is the same as that of the first coding network.
Optionally, before performing the random masking processing on each of the first medical images, the training method further includes:
for each first medical image, performing normalization processing of resolution and normalization processing of pixel values on the first medical image; and/or
Before training the medical image segmentation model with a second number of pre-acquired second training samples, the training method further comprises:
for each second training sample, performing resolution normalization processing on the second medical image and the target tissue mask label image in the second training sample, and performing pixel value normalization processing on the second medical image.
Optionally, the performing random masking processing on the first medical image to obtain a corresponding masked medical image includes:
adjusting the size of the first medical image to be an integer multiple of a preset size;
dividing the first medical image with the adjusted size into a plurality of first subareas according to the preset size, wherein the size of each first subarea is the preset size;
and randomly selecting a plurality of first sub-areas from all the first sub-areas of the first medical image according to a preset proportion to carry out random mask processing so as to acquire a corresponding mask medical image.
Optionally, the first encoding network includes a first feature extraction sub-network, a first position encoding module and a self-attention calculating module, where the first feature extraction sub-network is configured to perform feature extraction on all first sub-regions that are not masked in the masked medical image, so as to obtain first feature maps corresponding to the first sub-regions that are not masked; the first position coding module is used for carrying out position coding on each pixel point in each first feature map aiming at each first feature map, and splicing a position coding result with the first feature map to obtain a corresponding second feature map; the self-attention calculating module is used for fusing the second feature images corresponding to all the first sub-areas which are not masked in the masked medical image based on a self-attention mechanism so as to obtain corresponding fused feature images;
The first decoding network comprises a second feature extraction sub-network, a second position coding module, a merging layer and a full-connection layer, wherein the second feature extraction sub-network is used for respectively extracting features of all first sub-areas in the mask medical image so as to obtain third feature images corresponding to the first sub-areas; the second position coding module is used for carrying out position coding on each pixel point in the third feature map aiming at each third feature map, and splicing a position coding result with the third feature map to obtain a fourth feature map; the merging layer is used for merging the fusion feature map output by the first coding network with all the fourth feature maps; and the full connection layer is used for carrying out pixel value regression on the output of the merging layer so as to obtain sub-prediction medical images corresponding to all the first sub-regions in the mask medical image.
Optionally, the first preset training ending condition is:
the error value between the predicted medical image corresponding to the mask medical image and the first medical image corresponding to the mask medical image is smaller than a first preset error threshold, wherein the predicted medical image corresponding to the mask medical image is obtained by combining the sub-predicted medical images corresponding to all the first sub-regions in the mask medical image.
Optionally, before training the medical image segmentation model using the second training sample, the training method further comprises:
adjusting the size of the second medical image in the second training sample to be an integer multiple of the preset size;
dividing the second medical image with the adjusted size into a plurality of second subareas according to the preset size, wherein the size of each second subarea is the preset size.
Optionally, the second preset training ending condition is:
the error value between the predicted target tissue mask image corresponding to the second medical image and the target tissue mask label image corresponding to the second medical image is smaller than a second preset error threshold, wherein the predicted target tissue mask image corresponding to the second medical image is obtained by combining sub-predicted target tissue segmentation images corresponding to all the second sub-regions in the second medical image.
Optionally, the training the medical image segmentation model using a second number of second training samples acquired in advance includes:
taking the trained weight parameters of the first coding network in the self-supervision model as the weight parameters of the second coding network, and setting initial values of the weight parameters of the second decoding network in the medical image segmentation model;
and training the medical image segmentation model by adopting the second training samples according to the weight parameters of the second coding network and the initial values of the weight parameters of the second decoding network, so as to adjust the weight parameters of the second decoding network.
In order to achieve the above object, the present invention further provides a medical image segmentation method, wherein the medical image is a medical image corresponding to hard tissue, and the medical image segmentation method includes:
acquiring a trained medical image segmentation model by adopting the medical image segmentation model training method;
and dividing the target tissue of the acquired medical image to be divided by adopting the trained medical image dividing model so as to acquire a target tissue dividing image.
Optionally, the segmenting the target tissue of the acquired medical image to be segmented by using the trained medical image segmentation model to acquire a segmented image of the target tissue includes:
the size of the medical image to be segmented is adjusted to be an integral multiple of a preset size;
dividing the medical image to be segmented after the size adjustment into a plurality of third subareas according to the preset size, wherein the size of each third subarea is the preset size;
inputting all the third subareas into the medical image segmentation model to obtain sub-target tissue segmentation images corresponding to all the third subareas;
and combining all the sub-target tissue segmentation images to obtain the target tissue segmentation images.
Compared with the prior art, the medical image segmentation model training method and the medical image segmentation method provided by the invention have the following advantages:
according to the medical image segmentation model training method provided by the invention, a large amount of non-labeling data (namely, a first number of first training samples) is adopted to train a pre-built self-supervision model so as to train to obtain a basic large model (namely, a trained self-supervision model), then weight parameters of a first coding network used for extracting features in the trained self-supervision model are migrated into the pre-built medical image segmentation model, and the medical image segmentation model is trained according to a specific segmentation task (such as a tooth image segmentation task, namely, a target tissue is teeth) by adopting a small amount of labeling data (namely, a second number of second training samples), so that a medical image segmentation model required for the specific segmentation task (such as the tooth image segmentation task) is finally obtained. Therefore, the medical image segmentation model training method provided by the invention greatly reduces the dependence of the hard tissue medical image segmentation algorithm based on the deep learning on the labeling data quantity, and reduces the workload of data labeling, so that the time cost and the labor cost of the hard tissue medical image segmentation algorithm based on the deep learning can be reduced, and the problem of difficult segmentation model training caused by multi-category segmentation tasks is effectively solved.
Because the medical image segmentation method provided by the invention and the medical image segmentation model training method provided by the invention belong to the same inventive concept, the medical image segmentation method has at least all the advantages of the medical image segmentation model training method; reference may be made to the description above, and these advantages are not repeated here one by one.
Drawings
FIG. 1 is a flow chart of a medical image segmentation model training method according to an embodiment of the present invention;
FIG. 2a is a schematic diagram of image division according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a random mask according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a network structure of a self-supervision model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the operation of a first coding network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a self-supervision model according to an embodiment of the present invention acquiring a predicted medical image corresponding to a masked medical image;
FIG. 6 is a schematic diagram of a network structure of a medical image segmentation model according to an embodiment of the present invention;
fig. 7 is a schematic diagram of the operation of a second decoding network according to an embodiment of the present invention;
FIG. 8 is a flowchart of a medical image segmentation method according to an embodiment of the present invention;
FIG. 9 is a schematic illustration of segmenting a dental image using the medical image segmentation method provided by the present invention;
FIG. 10 is a schematic illustration of segmentation of a spine image using the medical image segmentation method provided by the present invention;
fig. 11 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The medical image segmentation model training method, the medical image segmentation method, the electronic device, and the readable storage medium according to the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a greatly simplified form and are not drawn to precise scale; they are intended only to facilitate a convenient and clear description of the objects of the present invention. It should be understood that the structures, proportions, and sizes shown in the drawings are provided only to aid understanding and reading of the present disclosure and are not intended to limit the scope of the invention, which is defined by the appended claims; any structural modification, change of proportion, or adjustment of size that achieves the same or similar effects and objectives as the present invention should fall within its scope.
The invention provides a medical image segmentation model training method, a medical image segmentation method, an electronic device, and a readable storage medium, which can reduce the dependence of deep-learning-based medical image segmentation algorithms on the amount of labeled data and reduce the workload of data labeling. It should be noted that the medical image segmentation model training method and the medical image segmentation method provided by the invention can be applied to the electronic device provided by the invention, wherein the electronic device can be a personal computer, a mobile terminal, or the like, and the mobile terminal can be a hardware device with any of various operating systems, such as a mobile phone or a tablet computer. It should also be noted that, although the present invention is described taking tooth image segmentation as a specific segmentation task as an example, as will be appreciated by those skilled in the art, the medical image segmentation model training method and the medical image segmentation method provided by the present invention are also applicable to segmentation tasks in other medical scenarios, such as scenarios requiring multi-category segmentation of hard tissue anatomical structures, for example spine image segmentation, pelvis image segmentation, and phalange image segmentation.
In order to realize the above-mentioned idea, the present invention provides a medical image segmentation model training method, where the medical image is a medical image corresponding to hard tissue. Fig. 1 is a flowchart illustrating the medical image segmentation model training method according to an embodiment of the invention. As shown in fig. 1, the training method provided by the invention comprises the following steps:
Step S110, acquiring a first number of first medical images.
Step S120, for each first medical image, performing random masking processing on the first medical image to obtain a corresponding masked medical image, and forming a first training sample with the first medical image as a label and the corresponding masked medical image.
Step S130, training the pre-constructed self-supervision model according to the first training samples until a first preset training ending condition is met.
Step S140, migrating the weight parameters of the first coding network in the trained self-supervision model to a pre-constructed medical image segmentation model, and training the medical image segmentation model by adopting a second number of pre-acquired second training samples until a second preset training ending condition is met.
The self-supervision model comprises a first coding network for extracting tissue characteristics and a first decoding network for carrying out pixel value regression, the second number is smaller than the first number, the second training sample comprises a second medical image and a target tissue mask label image corresponding to the second medical image, the medical image segmentation model comprises a second coding network for extracting target tissue characteristics and a second decoding network for carrying out target tissue segmentation, and the network structure of the second coding network is the same as that of the first coding network.
According to the medical image segmentation model training method provided by the invention, a large amount of non-labeling data (namely, a first number of first training samples) is adopted to train a pre-built self-supervision model so as to train to obtain a basic large model (namely, a trained self-supervision model), then weight parameters of a first coding network used for extracting features in the trained self-supervision model are migrated into the pre-built medical image segmentation model, and the medical image segmentation model is trained by adopting a small amount of labeling data (namely, a second number of second training samples) according to a specific segmentation task (such as a tooth image segmentation task, namely, a target tissue is teeth), so that a medical image segmentation model required for the specific segmentation task (such as the tooth image segmentation task) is finally obtained. Therefore, the medical image segmentation model training method provided by the invention greatly reduces the dependence of the hard tissue medical image segmentation algorithm based on the deep learning on the labeling data quantity, and reduces the workload of data labeling, so that the time cost and the labor cost of the hard tissue medical image segmentation algorithm based on the deep learning can be reduced, and the problem of difficult segmentation model training caused by multi-category segmentation tasks is effectively solved. It should be noted that, as those skilled in the art can understand, the first decoding network in the trained self-supervised model may be replaced with the second decoding network to construct a medical image segmentation model, and then the medical image segmentation model is trained using the second training samples.
Specifically, at least a proportion (e.g., 10% -30%) of the first medical images in the first number of acquired first medical images are medical images including the target tissue (i.e., medical images related to the target tissue segmentation task, such as tooth CT images including the case of full mouth teeth, half mouth teeth, etc.), and further, medical images of other tissues (i.e., medical images unrelated to the target tissue segmentation task, such as spine CT images, hip CT images, lower limb CT images, etc.) may be included. The second number of acquired second medical images are medical images (such as tooth CT images, including the conditions of full mouth teeth, half mouth teeth and the like) comprising target tissues, and target tissue mask label images (such as tooth label images) corresponding to the second medical images can be marked by operators with relevant medical experience.
In some exemplary embodiments, before performing the random masking process on each of the first medical images, the training method further includes:
for each first medical image, performing normalization processing of resolution and normalization processing of pixel values on the first medical image.
Therefore, the resolution of the first medical images acquired by the same type of equipment can be unified through the normalization processing of the resolution of the first medical images, so that the self-supervision model can learn the tissue characteristics better. Through the normalization processing of the pixel values of the first medical images, the pixel values of all the pixel points in the first medical images can be adjusted to be between 0 and 1, so that the gradient is effectively prevented from being changed greatly in the training process of the self-supervision model, and the self-supervision model has a better iteration convergence effect.
Specifically, the resolution of the first medical image may be normalized to a target resolution corresponding to the data type of the first medical image. For example, if the data type of the first medical image is CT data, the resolution (spacing) of the first medical image is unified to 1.0 mm, that is, the resolution of the first medical image is adjusted to (1.0, 1.0, 1.0); if the data type of the first medical image is CBCT data, the resolution of the first medical image is unified to 0.4 mm, that is, the resolution of the first medical image is adjusted to (0.4, 0.4, 0.4). It should be noted that, as those skilled in the art can understand, the resolution of CBCT data is in the range of 0.076 mm to 0.4 mm, and in most cases the resolution of a first medical image whose data type is CBCT data is in the range of 0.2 mm to 0.4 mm, so the resolution of first medical images of CBCT data may instead be unified to any other target resolution within 0.2 mm to 0.4 mm (for example, 0.2 mm), selected according to actual requirements; the present invention is not limited in this respect. Likewise, because the resolution range of CT data differs, the resolution of first medical images whose data type is CT data may be unified to a target resolution other than 1.0 mm according to actual requirements.
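By way of illustration only, the resolution unification described above can be sketched with a generic interpolation routine. The following Python sketch is not the patent's implementation; the function name and the use of scipy.ndimage.zoom are assumptions, and the target spacings are the example values given above.

```python
# Minimal resampling sketch (illustrative, not the patent's implementation).
import numpy as np
from scipy.ndimage import zoom

def resample_to_spacing(vol: np.ndarray, spacing, target):
    """Resample `vol` from its current voxel spacing (mm) to `target` (mm)
    by linear interpolation; new_size = old_size * spacing / target."""
    factors = [s / t for s, t in zip(spacing, target)]
    return zoom(vol, factors, order=1)

# Example: unify a CT volume to (1.0, 1.0, 1.0) mm spacing.
ct = np.zeros((100, 120, 80), dtype=np.float32)
ct_iso = resample_to_spacing(ct, spacing=(0.8, 0.8, 1.5), target=(1.0, 1.0, 1.0))
```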
Further, the normalization processing of the pixel values may be performed on the first medical image by:
performing truncation processing on the first medical image with the adjusted resolution ratio so as to adjust the pixel value of each pixel point in the first medical image to be within a target range;
and adjusting the pixel value of each pixel point in the first medical image after the truncation processing to be between 0 and 1.
Specifically, in the first medical image after resolution adjustment, the pixel value of any pixel whose value is smaller than the minimum threshold (i.e., the minimum of the target range) may be set to the minimum threshold, the pixel value of any pixel whose value is larger than the maximum threshold (i.e., the maximum of the target range) may be set to the maximum threshold, and the pixel value of any pixel between the minimum and maximum thresholds is kept unchanged, so as to adjust the pixel value of each pixel in the first medical image to within [minimum threshold, maximum threshold]. It should be noted that, as understood by those skilled in the art, the target ranges corresponding to first medical images containing different tissue classes are different; for each tissue class, the pixel values of all pixels in the tissue region of the first medical images of that class may be counted, with the pixel value ranked at a first preset percentile (e.g., 0.05%) taken as the minimum threshold and the pixel value ranked at a second preset percentile (e.g., 99.5%) taken as the maximum threshold for first medical images of that tissue class. It should further be noted that, as will be understood by those skilled in the art, for how to adjust the pixel value of each pixel in the truncated first medical image to between 0 and 1, reference may be made to normalization techniques known to those skilled in the art, which are not described further here.
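As a minimal sketch of the truncation and [0, 1] scaling just described (the intensity window used below is a hypothetical example, not a value taken from the patent):

```python
import numpy as np

def normalize_pixels(volume: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Clip intensities to [lo, hi] (truncation), then min-max scale to [0, 1]."""
    clipped = np.clip(volume, lo, hi)
    return (clipped - lo) / (hi - lo)

# Hypothetical window standing in for the percentile-derived thresholds.
ct = np.random.uniform(-1000.0, 3000.0, size=(64, 64, 64)).astype(np.float32)
ct_norm = normalize_pixels(ct, lo=-200.0, hi=1500.0)
assert 0.0 <= ct_norm.min() and ct_norm.max() <= 1.0
```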
In some exemplary embodiments, the performing the random masking processing on the first medical image to obtain a corresponding masked medical image includes:
adjusting the size of the first medical image to be an integer multiple of a preset size;
dividing the first medical image with the adjusted size into a plurality of first subareas according to the preset size, wherein the size of each first subarea is the preset size;
and randomly selecting a plurality of first sub-areas from all the first sub-areas of the first medical image according to a preset proportion to carry out random mask processing so as to acquire a corresponding mask medical image.
Specifically, the preset size may be set according to practical situations, for example, the preset size may be (16, 16, 16), which is not limited by the present invention. Taking the preset size (16, 16, 16) as an example, the size of the first medical image (preferably the normalized first medical image) may be adjusted to an integer multiple of the preset size according to the following formula:

$x' = \lfloor x / 16 \rfloor \times 16, \quad x \in \{H, W, D\}$

where $\lfloor \cdot \rfloor$ denotes rounding down, H is the height dimension, W is the width dimension, and D is the depth dimension (i.e., the number of layers in the Z direction) of the first medical image (preferably the normalized first medical image).
Further, please refer to fig. 2a and fig. 2b, wherein fig. 2a is a schematic diagram illustrating image division according to an embodiment of the present invention and fig. 2b is a schematic diagram of a random mask according to an embodiment of the present invention. As shown in fig. 2a and 2b, the resized first medical image (preferably the resized normalized first medical image) may be divided into N first sub-regions (each small box in the figure representing one first sub-region) according to the preset size (e.g., (16, 16, 16)), and then $\lceil a \times N \rceil$ first sub-regions are randomly selected and subjected to random mask processing (i.e., the pixel values of the selected $\lceil a \times N \rceil$ first sub-regions are set to 0), where $a$ is the preset proportion and $\lceil \cdot \rceil$ denotes rounding up. It should be noted that the preset proportion may be set according to the actual situation, which is not limited by the present invention; for example, the preset proportion may be 75%. Preferably, in order to improve the tissue feature extraction effect of the self-supervision model, the preset proportion is greater than or equal to 50%.
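The size adjustment, sub-region division, and random masking above can be sketched as follows; this is an illustrative reading of the procedure, with the patch size and mask ratio set to the example values (16 and 75%) from the text, and the helper names are assumptions.

```python
import numpy as np

PATCH = 16          # preset size (16, 16, 16)
MASK_RATIO = 0.75   # preset proportion a

def crop_to_multiple(vol: np.ndarray, p: int = PATCH) -> np.ndarray:
    """Adjust each spatial dimension to floor(x / p) * p."""
    h, w, d = (s // p * p for s in vol.shape)
    return vol[:h, :w, :d]

def divide_and_mask(vol: np.ndarray, p: int = PATCH, a: float = MASK_RATIO):
    """Split the volume into p^3-sized first sub-regions and zero out
    ceil(a * N) randomly chosen ones; return the masked volume and flags."""
    vol = crop_to_multiple(vol, p).copy()
    grid = tuple(s // p for s in vol.shape)
    n = int(np.prod(grid))
    k = int(np.ceil(a * n))                          # ceil(a * N) sub-regions
    masked = np.random.choice(n, size=k, replace=False)
    flags = np.zeros(n, dtype=bool)
    flags[masked] = True
    for idx in masked:
        i, j, l = np.unravel_index(idx, grid)
        vol[i*p:(i+1)*p, j*p:(j+1)*p, l*p:(l+1)*p] = 0.0  # masked pixels -> 0
    return vol, flags
```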
Fig. 3 is a schematic diagram of a network structure of a self-supervision model according to an embodiment of the invention. As shown in fig. 3, in some exemplary embodiments, the first encoding network includes a first feature extraction sub-network, a first position encoding module and a self-attention calculating module, where the first feature extraction sub-network is configured to perform feature extraction on all first sub-regions that are not masked in the masked medical image, so as to obtain a first feature map corresponding to each first sub-region that is not masked; the first position coding module is used for carrying out position coding on each pixel point in each first feature map, and splicing the position coding result with the first feature map to obtain a corresponding second feature map; and the self-attention calculating module is used for fusing the second feature maps corresponding to all the first sub-regions which are not masked in the masked medical image based on a self-attention mechanism, so as to obtain a corresponding fused feature map. The first decoding network comprises a second feature extraction sub-network, a second position coding module, a merging layer and a fully-connected layer, where the second feature extraction sub-network is used to perform feature extraction on all first sub-regions in the masked medical image respectively, so as to obtain third feature maps corresponding to the first sub-regions; the second position coding module is used for carrying out position coding on each pixel point in each third feature map, and splicing the position coding result with the third feature map to obtain a fourth feature map; the merging layer is used for merging the fused feature map output by the first coding network with all the fourth feature maps; and the fully-connected layer is used for carrying out pixel value regression on the output of the merging layer, so as to obtain sub-predicted medical images corresponding to all the first sub-regions in the masked medical image.
Specifically, please refer to fig. 4, which is a schematic diagram illustrating the operation of the first coding network according to an embodiment of the present invention, in which the gray rectangular blocks represent first feature maps. As shown in fig. 4, feature extraction at multiple levels is performed on each unmasked first sub-region through the first feature extraction sub-network, so as to obtain first feature maps of multiple levels corresponding to each unmasked first sub-region; then the first position coding module performs position coding on the first feature maps of each level corresponding to each unmasked first sub-region, and splices the position coding results with the corresponding first feature maps to obtain corresponding second feature maps; afterwards, the second feature maps corresponding to each unmasked first sub-region are input together into the self-attention calculating module to calculate the self-attention among the hierarchical features of the unmasked first sub-regions, so that the hierarchical features of different first sub-regions can be better fused and the self-supervision model can learn the relations among different first sub-regions. It should be noted that, as those skilled in the art will understand, the specific network structures of the first feature extraction sub-network and the second feature extraction sub-network are not limited by the present invention; the first feature extraction sub-network and the second feature extraction sub-network may be, but are not limited to, 3D ResNet networks.
Further, since the dimensions of the first feature maps of different levels are different, before self-attention is calculated, it is necessary to perform dimension normalization processing on the first feature maps of different levels, so as to adjust the dimensions of the first feature maps of each level to be consistent, and then perform position coding on the first feature maps subjected to the dimension normalization processing by using a first position coding module.
Still further, the position encoding may be performed according to the following formulas:

$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$

$PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$

where $PE$ denotes the position encoding result, $pos$ denotes the position to be encoded, $2i$ denotes the 2i-th dimension, $2i+1$ denotes the (2i+1)-th dimension, and $d_{model}$ denotes the dimension of the feature map that needs to be encoded.
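A minimal sketch of this sinusoidal position encoding (assuming an even d_model):

```python
import numpy as np

def sinusoidal_pe(num_positions: int, d_model: int) -> np.ndarray:
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model));
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    pos = np.arange(num_positions)[:, None]        # (P, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((num_positions, d_model), dtype=np.float32)
    pe[:, 0::2] = np.sin(angle)                    # even dimensions
    pe[:, 1::2] = np.cos(angle)                    # odd dimensions
    return pe
```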
It should be noted that, as those skilled in the art can understand, the specific details of how to perform the dimension normalization process can refer to the related art known to those skilled in the art, so the description thereof will not be repeated here.
In some exemplary embodiments, the self-attention computation module is configured to fuse the second feature maps corresponding to the first sub-regions, which are not masked, in the masked medical image by using a multi-head self-attention mechanism, so as to obtain corresponding fused feature maps.
Specifically, the second feature graphs corresponding to all the first sub-regions which are not masked in the masked medical image may be spliced to obtain a spliced feature matrix; then, for each head in the multi-head attention mechanism: mapping the spliced feature matrix into a key vector, a value vector and a query vector respectively, calculating dot products of the query vector and the key vector, normalizing a dot product result to obtain similarity weights of the query vector and the value vector, and weighting the value vector and the similarity weights to finish the calculation of the attention of the head; and finally, splicing the calculation results of the attention of each head, and performing linear transformation on the splicing results to obtain corresponding fusion characteristic diagrams.
Further, the multi-head attention may be calculated according to the following formulas:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$

$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\, K W_i^K,\, V W_i^V)$

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_K}}\right) V$

where $W^O$ denotes a weight matrix, $\mathrm{head}_i$ denotes the i-th attention head, $Q$ denotes the query vector, $K$ denotes the key vector, $V$ denotes the value vector, $T$ denotes the transpose, $d_K$ denotes the dimension of the key vector, and $W_i^Q$, $W_i^K$, and $W_i^V$ denote the weight matrices of the query, key, and value vectors in the i-th attention head.
Since different attention heads map the input to different representation subspaces, employing a multi-head attention mechanism for feature fusion allows the model to attend to different positions in different representation subspaces. It should be noted that, for more details about the multi-head attention mechanism, reference may be made to related techniques known to those skilled in the art, which are not described further here.
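To make the computation above concrete, here is a minimal PyTorch sketch of multi-head self-attention over a batch of token sequences; treating each unmasked sub-region's feature map as one token vector is a simplifying assumption for illustration.

```python
import torch
import torch.nn.functional as F

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """x: (B, N, d_model); wq/wk/wv/wo: (d_model, d_model) weight matrices."""
    B, N, d = x.shape
    dk = d // num_heads

    def split(t):  # (B, N, d) -> (B, num_heads, N, dk)
        return t.view(B, N, num_heads, dk).transpose(1, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(-2, -1) / dk ** 0.5   # Q K^T / sqrt(d_K)
    attn = F.softmax(scores, dim=-1)               # similarity weights
    heads = attn @ v                               # weighted value vectors
    out = heads.transpose(1, 2).reshape(B, N, d)   # Concat(head_1, ..., head_h)
    return out @ wo                                # linear transform W^O

# Example: 27 unmasked sub-regions, each represented by a 128-d token.
x = torch.randn(2, 27, 128)
w = [torch.randn(128, 128) / 128 ** 0.5 for _ in range(4)]
fused = multi_head_self_attention(x, *w, num_heads=8)  # -> (2, 27, 128)
```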
Please continue to refer to fig. 5, which is a schematic diagram illustrating a self-monitoring model according to an embodiment of the present invention to obtain a predicted medical image corresponding to a masked medical image. As shown in fig. 5, the sub-predictive medical images corresponding to the first sub-areas in the mask medical image may be acquired through the self-supervision model, and the predictive medical image corresponding to the mask medical image may be acquired by combining all the sub-predictive medical images of the mask medical image together.
In some exemplary embodiments, the first preset training end condition is: the error value between the predicted medical image corresponding to the mask medical image and the first medical image corresponding to the mask medical image is smaller than a first preset error threshold, wherein the predicted medical image corresponding to the mask medical image is obtained by combining the sub-predicted medical images corresponding to all the first sub-regions in the mask medical image.
Specifically, the error value between the predicted medical image corresponding to the mask medical image and the first medical image corresponding to the mask medical image may be calculated according to a first preset loss function, where the first preset loss function may be a Dice loss function or other loss functions known to those skilled in the art, and will not be described herein. It should be noted that, as those skilled in the art will understand, the first preset error threshold may be set according to the specific situation, which is not limited by the present invention. In addition, it should be noted that, as those skilled in the art can understand, the first preset training ending condition may also be that the training number reaches the first preset iteration number.
In some exemplary embodiments, before training the medical image segmentation model with a second number of second training samples acquired in advance, the training method further comprises:
and for each second training sample, performing resolution normalization processing on a second medical image and a target tissue mask label image in the second training sample, and performing pixel value normalization processing on the second medical image.
Specifically, the relevant content of how to perform the normalization processing of the resolution ratio on the second medical image and the target tissue mask label image may refer to the relevant content of how to perform the normalization processing of the resolution ratio on the first medical image, and the relevant content of how to perform the normalization processing of the pixel value on the second medical image may refer to the relevant content of how to perform the normalization processing of the pixel value on the first medical image, which will not be described herein.
Before training the medical image segmentation model with the second training sample, the training method further comprises:
adjusting the size of the second medical image in the second training sample to be an integer multiple of the preset size;
dividing the second medical image with the adjusted size into a plurality of second subareas according to the preset size, wherein the size of each second subarea is the preset size.
Thus, by also dividing the second medical image into a plurality of second sub-regions each of the preset size, e.g., (16, 16, 16), the second encoding network in the medical image segmentation model can better learn the features required for target tissue segmentation, i.e., the target tissue features.
Please continue to refer to fig. 6, which is a schematic diagram illustrating a network structure of a medical image segmentation model according to an embodiment of the present invention. As shown in fig. 6, the structure of the second encoding network in the medical image segmentation model is the same as that of the first encoding network, and it likewise includes a first feature extraction sub-network, a first position encoding module, and a self-attention calculating module. The first feature extraction sub-network in the second coding network is used to perform feature extraction on all second sub-regions in the second medical image respectively, so as to obtain first feature maps corresponding to the second sub-regions; the first position coding module in the second coding network is used to perform position coding on the first feature maps corresponding to the second sub-regions respectively, so as to obtain corresponding second feature maps; and the self-attention calculating module in the second coding network is used to fuse the second feature maps corresponding to all the second sub-regions in the second medical image based on a self-attention mechanism, so as to obtain a corresponding fused feature map. Further, the self-attention calculating module in the second coding network also adopts a multi-head self-attention mechanism to fuse the second feature maps corresponding to all the second sub-regions in the second medical image so as to obtain the corresponding fused feature map.
Please continue to refer to fig. 7, which is a schematic diagram illustrating the operation of the second decoding network according to an embodiment of the present invention. As shown in fig. 7, the second decoding network gradually upsamples the features output by the second encoding network (i.e., the fused feature map) to restore the size of the input data (i.e., the size of the second sub-region input to the second encoding network, for example, 16×16×16) while predicting the target tissue segmentation result corresponding to the input data (i.e., the second sub-region), thereby obtaining the sub-predicted target tissue segmentation images corresponding to the second sub-regions; the predicted target tissue mask image corresponding to the second medical image can then be obtained by combining the sub-predicted target tissue segmentation images corresponding to all the second sub-regions in the second medical image.
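For illustration, a decoder of this kind might be sketched as below; the channel counts, the number of upsampling stages, and the class count are assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

# Hypothetical fused feature map for one second sub-region: (B, 256, 4, 4, 4).
decoder = nn.Sequential(
    nn.ConvTranspose3d(256, 128, kernel_size=2, stride=2),  # 4^3 -> 8^3
    nn.ReLU(inplace=True),
    nn.ConvTranspose3d(128, 64, kernel_size=2, stride=2),   # 8^3 -> 16^3
    nn.ReLU(inplace=True),
    nn.Conv3d(64, 33, kernel_size=1),  # e.g. 32 tooth classes + background
)
logits = decoder(torch.randn(1, 256, 4, 4, 4))  # -> (1, 33, 16, 16, 16)
```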
In some exemplary embodiments, the second preset training end condition is: and the error value between the predicted target tissue mask image corresponding to the second medical image and the target tissue mask label image corresponding to the second medical image output by the medical image segmentation model is smaller than a second preset error threshold.
Specifically, the error value between the predicted target tissue mask image corresponding to the second medical image and the target tissue mask label image corresponding to the second medical image may be calculated according to a second preset loss function, where the second preset loss function may be a Dice loss function or other loss functions known to those skilled in the art, and will not be described herein. It should be noted that, as those skilled in the art will understand, the second preset error threshold may be set according to the specific situation, which is not limited by the present invention. Furthermore, it should be noted that, as those skilled in the art can understand, the second preset training ending condition may also be that the training number reaches the second preset iteration number.
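As one common formulation of the Dice loss mentioned above (not necessarily the exact variant used), a soft Dice loss over a predicted mask and its label can be sketched as:

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Soft Dice loss between predicted probabilities in [0, 1] and a
    binary target mask of the same shape; 0 means perfect overlap."""
    inter = (pred * target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)
```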
In some exemplary embodiments, the training the medical image segmentation model with the second pre-acquired number of second training samples comprises:
taking the trained weight parameters of the first coding network in the self-supervision model as the weight parameters of the second coding network, and setting initial values of the weight parameters of the second decoding network in the medical image segmentation model;
and training the medical image segmentation model by adopting the second training sample according to the weight parameters of the second coding network and the initial values of the weight parameters of the second decoding network so as to adjust the weight parameters of the second decoding network.
Because the first coding network undergoes a great amount of training in the training stage of the self-supervision model, the first coding network in the trained self-supervision model has a good tissue feature extraction capability. The weight parameters of the first coding network in the trained self-supervision model can therefore be directly adopted as the weight parameters of the second coding network in the medical image segmentation model; that is, the weight parameters of the second coding network do not need to be adjusted during the training of the medical image segmentation model, and only the weight parameters of the second decoding network need to be adjusted, so the number of second training samples can be reduced.
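A minimal PyTorch sketch of this weight migration and encoder freezing; `encoder` and `decoder` are hypothetical attribute names, the patent only requiring that the two coding networks share the same structure.

```python
import torch

def migrate_encoder_weights(self_supervised_model, seg_model, lr=1e-4):
    # Copy the trained first-coding-network weights into the second coding
    # network (identical architectures, so the state dicts are compatible).
    seg_model.encoder.load_state_dict(self_supervised_model.encoder.state_dict())
    # Freeze the encoder so only the second decoding network is adjusted.
    for p in seg_model.encoder.parameters():
        p.requires_grad = False
    # The optimizer receives only the decoder's parameters.
    return torch.optim.Adam(seg_model.decoder.parameters(), lr=lr)
```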
The invention also provides a medical image segmentation method, please refer to fig. 8, which is a flow chart of the medical image segmentation method according to an embodiment of the invention. As shown in fig. 8, the medical image segmentation method provided by the invention comprises the following steps:
step S210, obtaining a trained medical image segmentation model by adopting the medical image segmentation model training method.
And step S220, segmenting target tissues of the acquired medical image to be segmented by adopting the trained medical image segmentation model so as to acquire a target tissue segmentation image.
The medical image segmentation model adopted in the medical image segmentation method is obtained by training the medical image segmentation model training method, so that the medical image segmentation method greatly reduces the dependence of a hard tissue medical image segmentation algorithm based on deep learning on the labeling data amount, reduces the workload of data labeling, can reduce the time cost and the labor cost of the hard tissue medical image segmentation algorithm based on the deep learning, and effectively solves the problem of difficult training of the segmentation model caused by multi-category segmentation tasks.
In some exemplary embodiments, before segmenting the target tissue of the acquired medical image to be segmented using the trained medical image segmentation model, the medical image segmentation method further comprises:
and carrying out resolution normalization processing and pixel value normalization processing on the medical image to be segmented.
Specifically, the details of how to perform the normalization processing of the resolution and the normalization processing of the pixel values on the medical image to be segmented may refer to the relevant details of how to perform the normalization processing of the resolution and the normalization processing of the pixel values on the first medical image, which are not described herein.
In some exemplary embodiments, the segmenting the target tissue of the acquired medical image to be segmented using the trained medical image segmentation model to acquire a target tissue segmented image includes:
the size of the medical image to be segmented (preferably the normalized medical image to be segmented) is adjusted to be an integer multiple of a preset size;
dividing the medical image to be segmented after the size adjustment into a plurality of third subareas according to the preset size, wherein the size of each third subarea is the preset size;
inputting all the third subareas into the medical image segmentation model to obtain sub-target tissue segmentation images corresponding to all the third subareas;
and combining all the sub-target tissue segmentation images to obtain a target tissue segmentation image.
In particular, for details of how to adjust the size of the medical image to be segmented (preferably the normalized medical image to be segmented) to an integer multiple of a preset size, for example (16, 16, 16), reference may be made to the description above of how to adjust the size of the first medical image to an integer multiple of the preset size, which is not repeated here.
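Putting the inference steps together, a minimal sketch might look as follows. It reuses the hypothetical `crop_to_multiple` helper from the masking example above, assumes the model returns per-class logits of shape (1, C, 16, 16, 16) for each sub-volume, and, for simplicity, feeds the third sub-regions one at a time, whereas the described model fuses all sub-regions jointly via self-attention.

```python
import numpy as np
import torch

def segment_volume(seg_model, volume: np.ndarray, p: int = 16) -> np.ndarray:
    vol = crop_to_multiple(volume, p)            # size -> integer multiple of p
    grid = tuple(s // p for s in vol.shape)
    labels = np.zeros(vol.shape, dtype=np.int64)
    seg_model.eval()
    with torch.no_grad():
        for i in range(grid[0]):
            for j in range(grid[1]):
                for l in range(grid[2]):
                    sub = vol[i*p:(i+1)*p, j*p:(j+1)*p, l*p:(l+1)*p]
                    x = torch.from_numpy(sub[None, None].astype(np.float32))
                    pred = seg_model(x)          # (1, C, p, p, p) logits
                    labels[i*p:(i+1)*p, j*p:(j+1)*p, l*p:(l+1)*p] = (
                        pred.argmax(dim=1)[0].numpy()
                    )                            # combine sub-results
    return labels
```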
With continued reference to fig. 9 and 10, fig. 9 is a schematic diagram illustrating a tooth image segmentation using the medical image segmentation method according to the present invention; fig. 10 is a schematic view of a spine image segmented using the medical image segmentation method provided by the present invention. As shown in fig. 9 and 10, each tooth can be accurately segmented from the tooth image by using the medical image segmentation method provided by the present invention, and each section of the spine can be accurately segmented from the spine image by using the medical image segmentation method provided by the present invention. It should be noted that, as those skilled in the art will understand, the second training samples used in the training process of the medical image segmentation model for segmenting the tooth image include the tooth image and the corresponding tooth mask label image, and the second training samples used in the training process of the medical image segmentation model for segmenting the spine image include the spine image and the corresponding spine mask label image.
Based on the same inventive concept, the present invention also provides an electronic device, please refer to fig. 11, which is a block structure schematic diagram of the electronic device according to an embodiment of the present invention. As shown in fig. 11, the electronic device provided by the present invention includes a processor 101 and a memory 103, where the memory 103 stores a computer program, and when the computer program is executed by the processor 101, the medical image segmentation model training method described above or the medical image segmentation method described above is implemented. Because the electronic device provided by the invention and the medical image segmentation model training method or the medical image segmentation method provided by the invention belong to the same inventive concept, the electronic device provided by the invention has all advantages of the medical image segmentation model training method or the medical image segmentation method provided by the invention, and specific reference can be made to the related description above, and details are not repeated here.
As shown in fig. 11, the electronic device further comprises a communication interface 102 and a communication bus 104, where the processor 101, the communication interface 102, and the memory 103 communicate with each other via the communication bus 104. The communication bus 104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 104 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface 102 is used for communication between the electronic device and other devices.
The processor 101 of the present invention may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 101 is the control center of the electronic device and connects the various parts of the entire electronic device using various interfaces and lines.
The memory 103 may be used to store the computer program, and the processor 101 may implement various functions of the electronic device by running or executing the computer program stored in the memory 103 and invoking data stored in the memory 103.
The present invention also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the medical image segmentation model training method or the medical image segmentation method described above. Since the readable storage medium belongs to the same inventive concept as these methods, it has all of their advantages; reference may be made to the related description above, which is not repeated here.
It should be noted that the readable storage medium provided by the present invention may employ any combination of one or more computer readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Further, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
In summary, compared with the prior art, the medical image segmentation model training method, the medical image segmentation method, the electronic device and the readable storage medium provided by the invention have the following advantages:
according to the present invention, a large amount of unlabeled data (namely a first number of first training samples) is used to train a pre-constructed self-supervision model, yielding a basic large model (namely the trained self-supervision model). The weight parameters of the first coding network, which performs feature extraction in the trained self-supervision model, are then migrated into a pre-constructed medical image segmentation model, and the medical image segmentation model is trained for a specific segmentation task (for example, a tooth image segmentation task, i.e., the target tissue is teeth) using a small amount of labeled data (namely a second number of second training samples), finally obtaining the medical image segmentation model required for that task. The invention thus greatly reduces the dependence of deep-learning-based hard tissue medical image segmentation algorithms on the amount of labeled data and reduces the workload of data labeling, thereby lowering the time cost and labor cost of such algorithms and effectively alleviating the difficulty of training segmentation models for multi-category segmentation tasks.
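As a non-limiting illustration of the weight migration described above, the following sketch uses hypothetical stand-in modules; the class names, layer choices and the assumed class count of 33 are all illustrative, since the invention requires only that the second coding network share the first coding network's structure and receive its trained weights.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the first/second coding network (structure assumed)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

class SelfSupervisionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()            # first coding network
        self.decoder = nn.Conv2d(32, 1, 1)  # pixel value regression head

class SegmentationModel(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = Encoder()            # second coding network, same structure
        self.decoder = nn.Conv2d(32, num_classes, 1)  # target tissue segmentation head

ssl_model = SelfSupervisionModel()          # assume already trained on unlabeled data
seg_model = SegmentationModel(num_classes=33)  # e.g. 32 teeth + background (assumed)
seg_model.encoder.load_state_dict(ssl_model.encoder.state_dict())  # weight migration
```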
It should be noted that computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should also be noted that the apparatus and methods disclosed in the embodiments herein may be implemented in other manners, and the apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments herein. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. In addition, the functional modules in the embodiments herein may be integrated together to form a single part, the modules may exist separately, or two or more modules may be integrated into a single part.
Furthermore, the foregoing description concerns only the preferred embodiments of the present invention and does not limit its scope. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention in light of the above disclosure without departing from its spirit or scope, and the present invention is intended to cover such modifications and variations insofar as they fall within the scope of the invention and its equivalents.

Claims (10)

1. A medical image segmentation model training method, the medical image being a medical image corresponding to hard tissue, characterized in that the training method comprises the following steps:
acquiring a first number of first medical images;
for each first medical image, performing random mask processing on the first medical image to obtain a corresponding mask medical image, and forming a first training sample from the mask medical image together with the first medical image taken as its label;
training a pre-constructed self-supervision model with the first training samples until a first preset training ending condition is met, wherein the self-supervision model comprises a first coding network for extracting tissue features and a first decoding network for performing pixel value regression;
migrating the weight parameters of the first coding network in the trained self-supervision model to a pre-constructed medical image segmentation model, and training the medical image segmentation model with a second number of pre-acquired second training samples until a second preset training ending condition is met, wherein the second number is smaller than the first number, each second training sample comprises a second medical image and its corresponding target tissue mask label image, the medical image segmentation model comprises a second coding network for extracting target tissue features and a second decoding network for performing target tissue segmentation, and the network structure of the second coding network is the same as that of the first coding network.
2. The medical image segmentation model training method according to claim 1, wherein before performing the random masking process on each of the first medical images, the training method further comprises:
for each first medical image, performing normalization processing of resolution and normalization processing of pixel values on the first medical image; and/or
before training the medical image segmentation model with a second number of pre-acquired second training samples, the training method further comprises:
and for each second training sample, performing resolution normalization processing on the second medical image and the target tissue mask label image in the second training sample, and performing pixel value normalization processing on the second medical image.
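By way of illustration, the pixel value normalization processing of claim 2 might be implemented as in the sketch below; min-max scaling to [0, 1] is an assumption, as the claim does not fix a particular normalization formula.

```python
import numpy as np

def normalize_pixels(image: np.ndarray) -> np.ndarray:
    """Min-max scale pixel values to [0, 1] (assumed formula)."""
    lo, hi = float(image.min()), float(image.max())
    return (image - lo) / (hi - lo + 1e-8)  # epsilon guards constant images
```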
3. The medical image segmentation model training method according to claim 1, wherein the performing a random masking process on the first medical image to obtain a corresponding masked medical image comprises:
adjusting the size of the first medical image to be an integer multiple of a preset size;
dividing the first medical image with the adjusted size into a plurality of first sub-areas according to the preset size, wherein the size of each first sub-area is the preset size;
and randomly selecting a plurality of first sub-areas from all the first sub-areas of the first medical image according to a preset proportion to carry out random mask processing so as to acquire a corresponding mask medical image.
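By way of illustration, the random mask processing of claim 3 might be sketched as follows, assuming the image sides are already integer multiples of the preset size; the 75% proportion and the zero fill value are illustrative assumptions, the claim fixing only "a preset proportion".

```python
import numpy as np

def random_mask(image, patch=16, ratio=0.75, rng=None):
    """Zero out a randomly chosen subset of `patch`-sized first sub-areas."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    coords = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    masked = image.copy()
    n_masked = int(len(coords) * ratio)  # preset proportion of sub-areas
    for k in rng.choice(len(coords), size=n_masked, replace=False):
        i, j = coords[k]
        masked[i:i + patch, j:j + patch] = 0  # fill value assumed to be zero
    return masked
```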
4. The medical image segmentation model training method according to claim 3, wherein
the first coding network comprises a first feature extraction sub-network, a first position coding module and a self-attention calculation module, wherein the first feature extraction sub-network is used for performing feature extraction on each first sub-area that is not masked in the mask medical image to obtain a first feature map corresponding to each unmasked first sub-area; the first position coding module is used for, for each first feature map, position-coding each pixel point in the first feature map and splicing the position coding result with the first feature map to obtain a corresponding second feature map; and the self-attention calculation module is used for fusing, based on a self-attention mechanism, the second feature maps corresponding to all the unmasked first sub-areas in the mask medical image to obtain a corresponding fused feature map;
the first decoding network comprises a second feature extraction sub-network, a second position coding module, a merging layer and a fully connected layer, wherein the second feature extraction sub-network is used for performing feature extraction on each first sub-area in the mask medical image to obtain a third feature map corresponding to each first sub-area; the second position coding module is used for, for each third feature map, position-coding each pixel point in the third feature map and splicing the position coding result with the third feature map to obtain a fourth feature map; the merging layer is used for merging the fused feature map output by the first coding network with all the fourth feature maps; and the fully connected layer is used for performing pixel value regression on the output of the merging layer to obtain sub-predicted medical images corresponding to all the first sub-areas in the mask medical image.
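The data flow of the first coding network in claim 4 might be sketched as follows. This is a simplified, assumed reading: each unmasked first sub-area becomes one token, the per-pixel position coding of the claim is reduced to a learned per-patch embedding, and the fusion uses standard multi-head self-attention; all dimensions and layer choices are illustrative.

```python
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Assumed data flow of the first coding network (see caveats above)."""
    def __init__(self, feat_dim=64, pos_dim=16, n_heads=4, max_patches=4096):
        super().__init__()
        # first feature extraction sub-network: one feature vector per sub-area
        self.extract = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # first position coding module (learned per-patch embedding, assumed)
        self.pos_embed = nn.Embedding(max_patches, pos_dim)
        # self-attention calculation module fusing all unmasked sub-areas
        self.attn = nn.MultiheadAttention(feat_dim + pos_dim, n_heads,
                                          batch_first=True)

    def forward(self, patches):
        # patches: (batch, n_unmasked, 1, 16, 16), the unmasked first sub-areas
        b, n = patches.shape[:2]
        first_maps = self.extract(patches.flatten(0, 1)).view(b, n, -1)
        pos = self.pos_embed(torch.arange(n, device=patches.device))
        second_maps = torch.cat([first_maps, pos.expand(b, -1, -1)], dim=-1)
        fused, _ = self.attn(second_maps, second_maps, second_maps)
        return fused  # fused feature map, one token per unmasked sub-area
```

For example, PatchEncoder()(torch.randn(2, 10, 1, 16, 16)) returns a fused feature tensor of shape (2, 10, 80).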
5. The medical image segmentation model training method according to claim 4, wherein the first preset training ending condition is:
the error value between the predicted medical image corresponding to the mask medical image and the first medical image corresponding to the mask medical image is smaller than a first preset error threshold, wherein the predicted medical image corresponding to the mask medical image is obtained by combining the sub-predicted medical images corresponding to all the first sub-areas in the mask medical image.
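The first preset training ending condition could be checked with a test such as the following; mean squared error is an assumed metric, since claim 5 requires only that "an error value" fall below a preset threshold.

```python
import numpy as np

def first_stop_condition(predicted, original, threshold=1e-3):
    """True once the merged predicted medical image is close enough to the
    first medical image; MSE and the threshold value are illustrative."""
    return float(np.mean((predicted - original) ** 2)) < threshold
```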
6. The medical image segmentation model training method according to claim 1, further comprising, before training the medical image segmentation model with the second training samples:
adjusting the size of the second medical image in the second training sample to be an integer multiple of the preset size;
dividing the second medical image with the adjusted size into a plurality of second sub-areas according to the preset size, wherein the size of each second sub-area is the preset size.
7. The medical image segmentation model training method according to claim 6, wherein the second preset training ending condition is:
the error value between the predicted target tissue mask image corresponding to the second medical image and the target tissue mask label image corresponding to the second medical image is smaller than a second preset error threshold, wherein the predicted target tissue mask image corresponding to the second medical image is obtained by combining the sub-predicted target tissue segmentation images corresponding to all the second sub-areas in the second medical image.
8. The medical image segmentation model training method according to claim 1, wherein training the medical image segmentation model with a second number of pre-acquired second training samples comprises:
taking the weight parameters of the first coding network in the trained self-supervision model as the weight parameters of the second coding network, and setting initial values for the weight parameters of the second decoding network in the medical image segmentation model;
and training the medical image segmentation model with the second training samples, starting from the weight parameters of the second coding network and the initial values of the weight parameters of the second decoding network, so as to adjust the weight parameters of the second decoding network.
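The initialization described in claim 8 might look like the following sketch with stand-in single-layer modules; freezing the migrated encoder weights is an assumption, the claim stating only that training adjusts the second decoding network's weight parameters.

```python
import torch
import torch.nn as nn

encoder = nn.Conv2d(1, 32, 3, padding=1)  # second coding network (weights migrated)
decoder = nn.Conv2d(32, 33, 1)            # second decoding network (fresh initial values)

# encoder.load_state_dict(...) would receive the first coding network's weights here.
for p in encoder.parameters():
    p.requires_grad = False               # assumed: migrated weights kept fixed
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)  # adjusts decoder only
```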
9. A medical image segmentation method, the medical image being a medical image corresponding to hard tissue, the medical image segmentation method comprising:
obtaining a trained medical image segmentation model using the medical image segmentation model training method according to any one of claims 1 to 8;
and segmenting the target tissue of an acquired medical image to be segmented using the trained medical image segmentation model to obtain a target tissue segmentation image.
10. The medical image segmentation method according to claim 9, wherein segmenting the target tissue of the acquired medical image to be segmented using the trained medical image segmentation model to obtain a target tissue segmentation image comprises:
adjusting the size of the medical image to be segmented to an integer multiple of a preset size;
dividing the medical image to be segmented with the adjusted size into a plurality of third sub-areas according to the preset size, wherein the size of each third sub-area is the preset size;
inputting all the third sub-areas into the medical image segmentation model to obtain sub-target tissue segmentation images corresponding to all the third sub-areas;
and combining all the sub-target tissue segmentation images to obtain the target tissue segmentation image.
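Putting claim 10 together, a single inference pass might look like the sketch below, where `model` is a hypothetical callable mapping one third sub-area to its sub-target tissue segmentation image; processing sub-areas independently (rather than inputting them all at once, as the claim states) and the final crop back to the original size are simplifying assumptions.

```python
import numpy as np

def segment_image(image, model, patch=16):
    """Sketch of claim 10: pad, split into third sub-areas, predict, merge."""
    h0, w0 = image.shape
    padded = np.pad(image, ((0, (-h0) % patch), (0, (-w0) % patch)))
    out = np.zeros(padded.shape, dtype=float)
    for i in range(0, padded.shape[0], patch):
        for j in range(0, padded.shape[1], patch):
            sub = padded[i:i + patch, j:j + patch]      # one third sub-area
            out[i:i + patch, j:j + patch] = model(sub)  # sub-target tissue mask
    return out[:h0, :w0]                                # combined segmentation image
```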
CN202310872747.5A 2023-07-17 2023-07-17 Medical image segmentation model training method and medical image segmentation method Pending CN116843901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310872747.5A CN116843901A (en) 2023-07-17 2023-07-17 Medical image segmentation model training method and medical image segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310872747.5A CN116843901A (en) 2023-07-17 2023-07-17 Medical image segmentation model training method and medical image segmentation method

Publications (1)

Publication Number Publication Date
CN116843901A true CN116843901A (en) 2023-10-03

Family

ID=88161570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310872747.5A Pending CN116843901A (en) 2023-07-17 2023-07-17 Medical image segmentation model training method and medical image segmentation method

Country Status (1)

Country Link
CN (1) CN116843901A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095395A (en) * 2023-10-19 2023-11-21 北京智源人工智能研究院 Model training method and device for heart ultrasonic image segmentation and segmentation method
CN117095395B (en) * 2023-10-19 2024-02-09 北京智源人工智能研究院 Model training method and device for heart ultrasonic image segmentation and segmentation method
CN117809025A (en) * 2024-03-01 2024-04-02 深圳魔视智能科技有限公司 Attention network-based target tracking method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination