CN113989501A - Training method of image segmentation model and related device

Training method of image segmentation model and related device

Info

Publication number: CN113989501A
Application number: CN202111236105.3A
Authority: CN (China)
Prior art keywords: image, segmentation model, pixel, test, feature map
Legal status: Pending
Original language: Chinese (zh)
Inventors: 倪东, 黄晓琼, 杨鑫, 刘振东
Original and current assignee: Shenzhen University
Application filed by Shenzhen University


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The application discloses a training method of an image segmentation model and a related device. The method comprises: performing data enhancement on a test image in a test image set to obtain a plurality of enhanced test images; determining a prediction probability map of the test image and a prediction probability map of each enhanced test image through a pre-trained segmentation model, and determining a master probability map based on the determined prediction probability maps; determining a plurality of maximum joint mask maps based on the master probability map, and determining pseudo labels based on the maximum joint mask maps; and fine-tuning the model parameters of the segmentation model based on the pseudo labels to obtain the image segmentation model. The segmentation model is adjusted online in a self-supervised learning manner: reliable pseudo labels are generated through test-time data enhancement to drive the online learning of the segmentation model, and learning is stopped dynamically according to the consistency of the pseudo labels, so that the performance and generalization of the image segmentation model can be improved.

Description

Training method of image segmentation model and related device
Technical Field
The present application relates to the field of medical image processing technologies, and in particular, to a training method for an image segmentation model and a related apparatus.
Background
Data-driven learning algorithms have made major breakthroughs in various key and challenging tasks, but under the assumption that the data used for model training and the data encountered at test time are sampled independently from the same distribution. In practice, due to the influence of the imaging device, imaging parameters, and even different operators and imaging time points, medical images exhibit a typical gray-distribution drift problem, so that an intelligent model learned from the training data has poor prediction accuracy on medical images and lacks generalization.
Accordingly, the prior art remains to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the present application is to provide a training method of an image segmentation model and a related apparatus, aiming at the deficiencies of the prior art.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides a training method for an image segmentation model, where the training method includes:
performing data enhancement on the test images in the test image set to obtain a plurality of enhanced test images;
determining a prediction probability map of the test image and a prediction probability map of each enhanced test image through a pre-trained segmentation model, and determining a master probability map based on the determined prediction probability maps;
dividing the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps;
determining pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps;
and fine-tuning the segmentation model based on each pseudo label, and returning to the step of performing data enhancement on the test images in the test image set until a fine-tuning end condition is met, so as to obtain the image segmentation model.
The training method of the image segmentation model, wherein the determining of the master probability map based on the determined prediction probability maps specifically comprises:
respectively performing an inverse enhancement operation on the prediction probability map corresponding to each enhanced test image to obtain a candidate test probability map corresponding to each enhanced test image;
and adding each candidate test probability map and the prediction probability map corresponding to the test image to obtain the master probability map.
The training method of the image segmentation model, wherein the dividing of the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps specifically comprises:
selecting a plurality of preset thresholds, wherein the plurality of preset thresholds comprise all integers smaller than the number of prediction probability maps;
and for each preset threshold, taking, for each pixel in the master probability map, the larger of zero and the difference between the pixel value and the preset threshold, and generating the maximum joint mask map corresponding to the preset threshold as the smaller of that value and 1, so as to obtain the plurality of maximum joint mask maps.
The training method of the image segmentation model, wherein the determining of the pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps specifically includes:
selecting a preset number of maximum joint mask maps from the plurality of maximum joint mask maps to form a pseudo label set;
and selecting, by means of a distance similarity index, the pseudo label corresponding to each prediction probability map from the pseudo label set, so as to obtain the pseudo labels of the test image and each enhanced test image.
The training method of the image segmentation model, wherein the fine-tuning of the segmentation model based on each pseudo label and the continued execution of the step of performing data enhancement on the test images in the test image set until a fine-tuning end condition is met, so as to obtain the image segmentation model, specifically comprises:
modifying model parameters of the segmentation model based on each pseudo label;
selecting a first maximum joint mask map corresponding to the maximum preset threshold and a second maximum joint mask map corresponding to the minimum preset threshold, and calculating a similarity coefficient of the first maximum joint mask map and the second maximum joint mask map;
judging whether the segmentation model meets the fine-tuning end condition based on the similarity coefficient and the number of fine-tunings of the segmentation model, wherein the fine-tuning end condition is that the similarity coefficient is smaller than a preset coefficient or the number of fine-tunings equals a preset count threshold;
when the fine-tuning end condition is met, taking the fine-tuned segmentation model as the image segmentation model;
and when the fine-tuning end condition is not met, continuing to perform the step of performing data enhancement on the test images in the test image set.
The training method of the image segmentation model, wherein the segmentation model comprises at least one style conversion module, and the determining of the prediction probability map of the test image and the prediction probability map of each enhanced test image through the pre-trained segmentation model specifically includes:
for each reference test image in a test image group formed by the test image and each enhanced test image, respectively inputting the reference test image and a source domain image into the segmentation model, wherein the source domain image is a training image in a preset training sample set used for training the segmentation model;
controlling the network layers of the segmentation model located before the style conversion module to determine a first feature map corresponding to the reference test image and a second feature map corresponding to the source domain image;
controlling the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain an adjusted first feature map;
and passing the adjusted first feature map and the second feature map through the network layers of the segmentation model located after the style conversion module to obtain the prediction probability map of the reference test image, so as to obtain the prediction probability map of the test image and the prediction probability map of each enhanced test image.
The training method of the image segmentation model, wherein the controlling of the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically includes:
for each channel of the first feature map and each channel of the second feature map, determining the sequence number of the pixel value of each pixel in the channel among all pixels of the channel, wherein the sequence numbers are assigned in ascending order of pixel value;
and for each channel in the first feature map, selecting the candidate channel corresponding to the channel in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate channel based on the sequence number of each pixel in the channel, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain the adjusted first feature map.
The training method of the image segmentation model, wherein the controlling of the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically includes:
for each channel of the first feature map and each channel of the second feature map, dividing the channel into a plurality of sliding windows in a sliding-window manner, and determining the sequence number of the pixel value of each pixel in each sliding window among all pixels of that sliding window, wherein the sequence numbers are assigned in ascending order of pixel value;
for each sliding window in the first feature map, selecting the candidate sliding window corresponding to the sliding window in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate sliding window based on the sequence number of each pixel in the sliding window, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain the adjusted sliding window corresponding to the sliding window;
and for each pixel in each channel in the first feature map, acquiring the adjusted pixel value of the pixel in each adjusted sliding window that includes the pixel, and taking the average of all acquired adjusted pixel values as the pixel value of the pixel, so as to obtain the adjusted first feature map.
A second aspect of embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement steps in a method for training an image segmentation model as described in any one of the above.
A third aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method for training an image segmentation model as described in any one of the above.
Advantageous effects: compared with the prior art, the present application provides a training method of an image segmentation model and a related device, wherein the method comprises: performing data enhancement on a test image in a test image set to obtain a plurality of enhanced test images; determining a prediction probability map of the test image and a prediction probability map of each enhanced test image through a pre-trained segmentation model, and determining a master probability map based on the determined prediction probability maps; determining a plurality of maximum joint mask maps based on the master probability map; determining pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps; and fine-tuning the segmentation model based on each pseudo label, and returning to the step of performing data enhancement on the test images in the test image set until a fine-tuning end condition is met, so as to obtain the image segmentation model. The segmentation model is adjusted online in a self-supervised learning manner: reliable pseudo labels are generated through test-time data enhancement to drive the online learning of the segmentation model, and learning is stopped dynamically according to the consistency of the pseudo labels, so that the performance and generalization of the image segmentation model can be improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a training method of an image segmentation model provided in the present application.
Fig. 2 is a flowchart of an embodiment of a training method of an image segmentation model provided in the present application.
Fig. 3 is a schematic diagram illustrating an alignment operation of order statistics in the training method of the image segmentation model provided in the present application.
Fig. 4 is a schematic diagram illustrating alignment of order statistics based on a sliding window in the training method of the image segmentation model provided in the present application.
Fig. 5 is a schematic diagram of a pseudo tag generation strategy framework in the training method of the image segmentation model provided by the present application.
Fig. 6 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
In order to make the purpose, technical solution, and effect of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of each process is determined by its function and internal logic, and should not constitute any limitation on the implementation process of this embodiment.
The inventor has found that data-driven learning algorithms have made great breakthroughs in various key and challenging tasks, but under the assumption that the data used for model training and the data at test time are sampled independently from the same distribution. In practice, due to the influence of the imaging device, imaging parameters, and even different operators and imaging time points, medical images exhibit a typical gray-distribution drift problem, so that an intelligent model learned from the training data has poor prediction accuracy on medical images and lacks generalization.
The most direct way to address this problem is to keep increasing the amount of training data, so that the model can learn more of the possible gray-distribution drifts during training. In practice, however, the mix of imaging factors causes each test image to exhibit its own unique shift in gray distribution, and the space of possible shifts is effectively infinite. Moreover, the detailed delineation of label information for data is time-consuming and expensive, particularly in the medical field where specialized knowledge is required, which makes this approach very challenging to apply in the clinic. On the other hand, after a learned model is deployed in everyday medical instruments, machine computing resources are limited; for example, some ultrasound machines are configured with only a CPU and limited memory, making large-scale re-learning of the network model on the device difficult. It is therefore desirable to overcome the risk caused by unpredictable drift of the image gray distribution while largely avoiding the dilemma of extensive repeated re-learning of the model under different imaging conditions.
Many scholars have proposed solutions to this dilemma; depending on whether target data and their corresponding labels are needed, these can be divided into supervised and unsupervised learning:
(1) Supervised learning: the network is supervisedly fine-tuned with target data, which requires both a sufficient amount of target data and the corresponding label information. It has been shown that, in multi-center prostate segmentation, fine-tuning a pre-trained network with at least 8 labeled images from an unseen center can alleviate the model generalization problem. Even so, labeling each new image is time-consuming, laborious, and requires a certain amount of expertise, which makes this approach less practical in clinical practice.
(2) Unsupervised learning: for the target domain, only the images of the target domain are needed, without corresponding label information, so the labeling process for target images is omitted. The two main methods are discriminative adversarial networks and image-to-image translation networks. The core idea of the discriminative adversarial approach is to make the network learn, through adversarial learning, the common invariant features underlying different image appearances, ignoring the drift between different appearance distributions. Image-to-image translation methods can be further divided into generative adversarial networks and feed-forward approximate stylization; their essence is to convert images of the source domain into images with the style characteristics of the target domain, and then train the target task network on the converted source-domain data so as to achieve model generalization. However, in clinical situations there is an infinite variety of possible gray-distribution shifts, and not all shifts are known or available, so the premise of obtaining a sufficient amount of target-domain image data does not always hold.
Based on this, in the embodiment of the present application, data enhancement is performed on a test image in a test image set to obtain a plurality of enhanced test images; a prediction probability map of the test image and a prediction probability map of each enhanced test image are determined through a pre-trained segmentation model, and a master probability map is determined based on the determined prediction probability maps; a plurality of maximum joint mask maps are determined based on the master probability map; pseudo labels of the test image and each enhanced test image are determined based on the plurality of maximum joint mask maps; and the segmentation model is fine-tuned based on each pseudo label, with the step of performing data enhancement on the test images in the test image set executed again until a fine-tuning end condition is met, so as to obtain the image segmentation model. The segmentation model is adjusted online in a self-supervised learning manner: reliable pseudo labels are generated through test-time data enhancement to drive the online learning of the segmentation model, and learning is stopped dynamically according to the consistency of the pseudo labels, so that the performance and generalization of the image segmentation model can be improved.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
This embodiment provides a training method of an image segmentation model; as shown in fig. 1 and fig. 2, the method includes:
and S10, performing data enhancement on the test images in the test image set to obtain a plurality of enhanced test images.
Specifically, the test image set includes a plurality of test images, none of which carries annotation information. In the present embodiment, each of the plurality of test images is a medical image; for example, each test image is an ultrasound image, or each test image is a magnetic resonance image, etc. The test image set is used for testing a pre-trained segmentation model, where the segmentation model is used for medical image segmentation; for example, the test image is an ultrasound image of the thyroid, and the segmentation model is used to segment the thyroid ultrasound image to obtain the nodule region in it.
Each of the enhanced test images is obtained by performing data enhancement on the same test image; that is, for a test image in the test image set, data enhancement is performed on it several times to obtain a plurality of enhanced test images. The enhancement mode or enhancement intensity differs between the enhanced test images: denoting any two of them as a first enhanced test image and a second enhanced test image, the enhancement mode and enhancement intensity of the first enhanced test image are not both identical to those of the second enhanced test image. For example, the enhancement mode of the first enhanced test image is rotation while that of the second enhanced test image is horizontal mirroring; or both enhancement modes are rotation, with the first enhanced test image rotated by 30 degrees and the second by 40 degrees. In a specific implementation, there are 4 enhanced test images: one is enhanced by horizontal mirroring, and the other three are enhanced by rotation with different rotation angles.
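For illustration only, the following Python sketch shows one way to realize this test-time enhancement step under the 4-image configuration described above (one horizontal mirror plus three rotations). The function name, the concrete rotation angles, and the use of scipy are assumptions of the sketch, not details fixed by the application; the records returned make it possible to apply the matching inverse operation to the predictions later.

```python
import numpy as np
from scipy.ndimage import rotate

def enhance_test_image(test_image: np.ndarray):
    """Generate several enhanced copies of a single test image.

    Returns (mode, params, image) records so that the matching inverse
    enhancement can later be applied to the prediction probability maps.
    The three rotation angles are illustrative assumptions.
    """
    enhanced = [("mirror", None, np.fliplr(test_image))]
    for angle in (30, 40, 50):  # three rotations with different intensities
        rotated = rotate(test_image, angle, reshape=False, order=1, mode="nearest")
        enhanced.append(("rotate", angle, rotated))
    return enhanced
```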
S20, determining the prediction probability map of the test image and the prediction probability map of each enhanced test image through the pre-trained segmentation model, and determining the master probability map based on each determined prediction probability map.
Specifically, the segmentation model is a pre-trained network model; the preset training sample set used to train the segmentation model includes a plurality of training images, each of which carries annotation information. In addition, before training the segmentation model with the preset training sample set, data preprocessing may be performed on the preset training sample set, where the data preprocessing at least includes standardization and may further include normalization, scaling, data expansion, unbalanced-sample processing, and the like. Specifically, the standardization subtracts the pixel mean from all pixels of the image and then divides by the pixel standard deviation, so that the gray distribution of the image has mean 0 and variance 1. In most cases, the appearance differences within pre-training sample sets are large and the distribution is unbalanced; without data preprocessing, the subsequent training of the segmentation model may be affected, e.g. limited accuracy, convergence speed, and generalization ability. Data preprocessing therefore reduces the differences between the data, adapts to imaging data from different devices, enhances the generalization ability of the model, and facilitates training optimization.
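As a minimal sketch of the standardization step just described (the function name and the epsilon guard are assumptions):

```python
import numpy as np

def standardize(image: np.ndarray) -> np.ndarray:
    """Shift the image gray distribution to mean 0 and variance 1."""
    return (image - image.mean()) / (image.std() + 1e-8)  # epsilon guards flat images
```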
In one implementation of this embodiment, the segmentation model includes at least one style conversion module; the determining of the prediction probability map of the test image and the prediction probability map of each enhanced test image through the pre-trained segmentation model specifically includes:
for each reference test image in a test image group formed by the test image and each enhanced test image, respectively inputting the reference test image and a source domain image into the segmentation model, wherein the source domain image is a training image in a preset training sample set used for training the segmentation model;
controlling the network layers of the segmentation model located before the style conversion module to determine a first feature map corresponding to the reference test image and a second feature map corresponding to the source domain image;
controlling the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain an adjusted first feature map;
and passing the adjusted first feature map and the second feature map through the network layers of the segmentation model located after the style conversion module to obtain the prediction probability map of the reference test image, so as to obtain the prediction probability map of the test image and the prediction probability map of each enhanced test image.
Specifically, the style conversion module is used to change the appearance gray distribution of the test image according to the appearance gray distribution of the source domain image, so as to adjust the image gray style of the test image. The style conversion module may adopt a plug-and-play architecture and be configured in the segmentation model during the testing of the segmentation model. There are several options: the segmentation model may not include the style conversion module during training, with the module inserted into the segmentation model after training is completed and the resulting model used as the pre-trained segmentation model; or the style conversion module is configured during training but skipped, i.e. after a training image passes through the network layers located before the style conversion module it is fed directly into the network layers located after it; or the style conversion module is configured during training but leaves its input unchanged, i.e. the input and output of the style conversion module are the same.
In an implementation of this embodiment, the segmentation model may include an encoder and a decoder, with the style conversion module located in the encoder. The encoder may include one style conversion module or several; when the encoder includes several style conversion modules, their positions in the encoder differ while their functions are the same: each is used to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map, where the first feature map is generated from the test image by the network layers located before that style conversion module, and the second feature map is generated from the training image by the same layers. It will be appreciated that when several style conversion modules are included, the image gray style of the test image is adjusted several times based on the image gray style of the source domain image. The description here takes an encoder including one style conversion module as an example.
The reference test image is any image in the test image group consisting of the test image and each enhanced test image; that is, the reference test image may be either the test image or an enhanced test image. The source domain image is a training image in the preset training sample set used for training the segmentation model. That is, when the segmentation model is tested with a test image, one training image is selected from the preset training sample set as the source domain image corresponding to the test image; the test image and the source domain image are then input into the segmentation model respectively, and the first feature map of the test image and the second feature map of the source domain image are obtained through the segmentation model, so that the image gray scale of the first feature map can be adjusted based on the image gray-scale distribution of the second feature map.
In an implementation of this embodiment, the controlling of the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically includes:
for each channel of the first feature map and each channel of the second feature map, determining the sequence number of the pixel value of each pixel in the channel among all pixels of the channel;
and for each channel in the first feature map, selecting the candidate channel corresponding to the channel in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate channel based on the sequence number of each pixel in the channel, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain the adjusted first feature map.
In particular, the image gray scale is used to reflect image style characteristics, which may include contrast, brightness, texture, subtle noise, and the like. The sequence numbers are assigned in ascending order of pixel value; that is, a pixel sequence is obtained by sorting the pixel values of the pixels in the channel from small to large, and the position number of each pixel in the pixel sequence is the sequence number of that pixel. For example, sorting the pixel values of the channel F_c[n] shown in FIG. 3 in ascending order gives the sequence-number map of F_c[n] shown in FIG. 3, where the sequence number corresponding to -1.32 is 0, indicating that the pixel value of the pixel in the third row, first column ranks 0th among all pixel values, and the sequence number corresponding to 2.57 is 8, indicating that the pixel value of the pixel in the third row, third column ranks 8th among all pixel values. In this way, each channel in the first feature map and each channel in the second feature map yields a sequence-number map, and the sequence number at each position of a sequence-number map identifies the rank, among all pixel values, of the pixel value at that position in the channel corresponding to the sequence-number map.
After the sequence number of each pixel in each channel is obtained, each channel in the first feature map is matched with a channel in the second feature map according to the channel number, giving, for each channel in the first feature map, a corresponding channel in the second feature map, recorded as its candidate channel; the channel number of each channel is the same as that of its candidate channel. This is possible because the first feature map and the second feature map are both output by the network layers located before the style conversion module, so the image scale of the first feature map is the same as that of the second feature map; thus each channel in the first feature map can select the candidate channel with the same channel number in the second feature map.
After the candidate channel is obtained, for each pixel in the channel of the first feature map, the candidate pixel with the same sequence number as the pixel is selected in the candidate channel, and the pixel value of the candidate pixel is taken as the new pixel value of the pixel. For example, as shown in FIG. 3, for the pixel with sequence number 0 in channel F_c[n], the pixel with sequence number 0 in the candidate channel F_s[n] is selected as the candidate pixel; since the pixel value of that pixel in F_s[n] is -0.74, the value -1.32 of the sequence-number-0 pixel in F_c[n] is replaced with -0.74. Adjusting channel F_c[n] in this way yields F'_c[n]; that is, each pixel value of F_c[n] is replaced with the pixel value in F_s[n] that has the same sequence number. After the replacement, F'_c[n] and F_s[n] contain the same set of values, so the feature statistics representing image style information, such as mean, standard deviation, and covariance, are migrated; meanwhile, the magnitude ordering of the values of the replaced F'_c[n] is consistent with that of F_c[n] before replacement, i.e. its order statistics are preserved. The style conversion module can thus change the pixel values of the first feature map of the test image based on the pixel values of the second feature map of the source domain image, transforming the appearance distribution of the test image into that of the source domain image, while retaining the semantic structure information of the test image by preserving the order statistics of the feature-map values, so that a test image with distribution drift can be robustly segmented by the existing segmentation model.
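The channel-wise replacement described above can be sketched as follows, where f_c and f_s stand for one channel of the first (test) and second (source-domain) feature maps respectively; the rank computation via a double argsort is an implementation assumption:

```python
import numpy as np

def align_order_statistics(f_c: np.ndarray, f_s: np.ndarray) -> np.ndarray:
    """Replace each value of the test channel f_c with the source-channel
    value of the same sequence number (rank), migrating the value statistics
    of f_s while preserving the order statistics of f_c."""
    assert f_c.shape == f_s.shape
    flat_c, flat_s = f_c.ravel(), f_s.ravel()
    ranks = flat_c.argsort().argsort()  # sequence number of every pixel of f_c
    sorted_s = np.sort(flat_s)          # source values ordered by sequence number
    return sorted_s[ranks].reshape(f_c.shape)
```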
In another implementation of this embodiment, medical images generally have an uneven gray distribution; this is especially frequent in ultrasound images. Therefore, when the style conversion module adjusts the image gray scale, the channel can be divided into a plurality of sliding windows and the image gray scale of each sliding window adjusted separately, further addressing the generally uneven gray distribution of medical images. Based on this, the controlling of the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically includes:
for each channel of the first feature map and each channel of the second feature map, dividing the channel into a plurality of sliding windows in a sliding-window manner, and determining the sequence number of the pixel value of each pixel in each sliding window among all pixels of that sliding window, wherein the sequence numbers are assigned in ascending order of pixel value;
for each sliding window in the first feature map, selecting the candidate sliding window corresponding to the sliding window in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate sliding window based on the sequence number of each pixel in the sliding window, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain the adjusted sliding window corresponding to the sliding window;
and for each pixel in each channel in the first feature map, acquiring the adjusted pixel value of the pixel in each adjusted sliding window that includes the pixel, and taking the average of all acquired adjusted pixel values as the pixel value of the pixel, so as to obtain the adjusted first feature map.
Specifically, the sliding windows are obtained in a sliding-window manner, and the window sizes of the sliding windows are the same, e.g. 3 × 3. The number of sliding windows obtained by dividing each channel of the first feature map is therefore the same as the number obtained by dividing each channel of the second feature map, so the sliding windows of the first feature map correspond one-to-one with those of the second feature map. After the sliding windows of the first feature map and those of the second feature map are acquired, image gray-scale adjustment is performed on the sliding windows of the first feature map; the adjustment process for each sliding window is the same as the adjustment process for each channel in the foregoing embodiment, which is not repeated here; see the description above.
Further, after the adjusted sliding windows are obtained, two sliding windows may include the same pixel; for example, with a 3 × 3 sliding window and a stride of 2, the pixel in the first row, third column is included in two sliding windows. Therefore, when the adjusted first feature map is determined after the adjusted sliding windows are acquired, for each pixel in each channel of the first feature map, the pixel position of the pixel can be obtained, every sliding window covering that pixel position selected, the pixel value at that position read from the adjusted sliding window corresponding to each such sliding window, and the average of all read pixel values taken as the adjusted pixel value of the pixel, thereby obtaining the adjusted first feature map. For example, as shown in FIG. 4, a channel is divided into four sliding windows (the window symbols in the original appear only as embedded formula images and are not reproduced here); order-statistics alignment is applied to each sliding window to obtain four adjusted sliding windows, and the adjusted channel is then formed from them, where the adjusted value of a pixel covered by several sliding windows is the average of its values in the adjusted sliding windows that include it. Order-statistics alignment here refers to the sequence-number-based pixel replacement described above.
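A sketch of this sliding-window variant, reusing align_order_statistics from the previous sketch; the 3 × 3 window size and the stride of 2 are the illustrative values from the text, and the accumulation scheme is an assumption:

```python
import numpy as np

def align_order_statistics_windowed(f_c, f_s, win=3, stride=2):
    """Apply order-statistics alignment per sliding window; a pixel covered by
    several windows receives the average of its adjusted values."""
    h, w = f_c.shape
    acc = np.zeros((h, w), dtype=np.float64)  # sum of adjusted values per pixel
    cnt = np.zeros((h, w), dtype=np.float64)  # number of windows covering each pixel
    for i in range(0, h - win + 1, stride):
        for j in range(0, w - win + 1, stride):
            acc[i:i + win, j:j + win] += align_order_statistics(
                f_c[i:i + win, j:j + win], f_s[i:i + win, j:j + win])
            cnt[i:i + win, j:j + win] += 1
    out = f_c.astype(np.float64).copy()
    covered = cnt > 0
    out[covered] = acc[covered] / cnt[covered]  # average over covering windows
    return out
```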
In an implementation of this embodiment, the determining of the master probability map based on the determined prediction probability maps specifically includes:
respectively performing an inverse enhancement operation on the prediction probability map corresponding to each enhanced test image to obtain a candidate test probability map corresponding to each enhanced test image;
and adding each candidate test probability map and the prediction probability map corresponding to the test image to obtain the master probability map.
Specifically, the master probability map is obtained by adding each candidate test probability map and the prediction probability map corresponding to the test image, where the addition means adding the pixel values at corresponding pixel positions of each candidate test probability map and the prediction probability map. The image scale of the master probability map is the same as that of the prediction probability map; the pixel value at each pixel position of the master probability map equals the sum of the pixel value at that position in each candidate test probability map and the pixel value at that position in the prediction probability map, so the pixel value at each position of the master probability map lies in [0, N], where N equals the number of enhanced predicted images plus 1. For example, as shown in FIG. 5, if the number of enhanced prediction images is 4, the pixel value at each position of the master probability map lies in [0, 5]. The inverse enhancement operation restores the prediction result of each pixel in the prediction probability map corresponding to an enhanced test image to its original position, yielding the candidate test probability map corresponding to each enhanced test image.
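A sketch of assembling the master probability map from the records produced by the enhance_test_image sketch above; the inverse operations simply mirror the forward ones (interpolation makes the rotation inverse approximate), and all names are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def inverse_enhance(prob_map, mode, params):
    """Restore each pixel's prediction to its original position."""
    if mode == "mirror":
        return np.fliplr(prob_map)
    if mode == "rotate":
        return rotate(prob_map, -params, reshape=False, order=1, mode="nearest")
    raise ValueError(mode)

def master_probability_map(pred_test, enhanced_preds):
    """Sum the test prediction and the inversely-enhanced predictions.

    enhanced_preds: (mode, params, prob_map) records for the enhanced images.
    The result P takes values in [0, N], N = total number of predictions."""
    master = pred_test.astype(np.float64).copy()
    for mode, params, prob in enhanced_preds:
        master += inverse_enhance(prob, mode, params)
    return master
```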
S30, dividing the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps.
Specifically, the preset thresholds are set in advance, each preset threshold is an integer, and the number of preset thresholds is smaller than or equal to the number of enhanced test images plus 1; in a typical implementation, the number of preset thresholds equals the number of enhanced test images plus 1. For example, as shown in FIG. 5, when there are 4 enhanced test images, there are 5 preset thresholds. The number of maximum joint mask maps equals the number of preset thresholds; that is, the master probability map is divided based on each preset threshold into one maximum joint mask map, each maximum joint mask map is obtained by dividing the master probability map, and the preset thresholds corresponding to different maximum joint mask maps are different.
In an implementation of this embodiment, the dividing of the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps specifically includes:
selecting a plurality of preset thresholds;
and for each preset threshold, taking, for each pixel in the master probability map, the larger of zero and the difference between the pixel value and the preset threshold, and generating the maximum joint mask map corresponding to the preset threshold as the smaller of that value and 1, so as to obtain the plurality of maximum joint mask maps.
Specifically, each of the preset thresholds is an integer smaller than the number of prediction probability maps. Since the prediction probability maps comprise the prediction probability map corresponding to the test image and the prediction probability map corresponding to each enhanced test image, the number of prediction probability maps equals the number of enhanced test images plus 1, and the number of preset thresholds equals the number of enhanced test images plus 1; every integer smaller than the number of prediction probability maps is a preset threshold, so the preset thresholds may be 0, 1, 2, ..., N-1. For example, if the number of prediction probability maps is 5, the preset thresholds include 0, 1, 2, 3, and 4. Selecting the plurality of preset thresholds may therefore specifically include: acquiring the number of prediction probability maps, selecting all integers smaller than that number, and taking all the selected integers as the preset thresholds.
After the preset thresholds are selected, the master probability map is divided based on each preset threshold; the process of determining the maximum joint mask map can be expressed as:

p_{i,i+1} = min{ max{ P - i, 0 }, 1 }, where i is an integer and i < N

where i denotes a preset threshold, P denotes the master probability map, p_{i,i+1} denotes the maximum joint mask map, and N equals the number of enhanced predicted images plus 1.
In this embodiment, the master probability map is divided by a series of adjacent integers into corresponding neighborhood intervals, so that a plurality of maximum joint mask maps with different confidence levels can be obtained. For example, as shown in FIG. 5, if the preset thresholds include 0, 1, 2, 3, and 4, then 5 maximum joint mask maps with different confidence levels can be obtained, namely p_{0,1}, p_{1,2}, p_{2,3}, p_{3,4}, and p_{4,5}. Here p_{0,1} can be viewed as the maximum union map of the five prediction probability maps, preserving all non-zero-probability positions; p_{1,2}, p_{2,3}, and p_{3,4} are joint maps in which at least 1, 2, and 3 predictions, respectively, take the position as foreground; and p_{4,5} can be viewed as the intersection map of the five prediction probability maps. Maximum joint mask maps at different confidence levels can thus serve as supplementary information to the prediction probability maps, the candidate pseudo labels determined from them give a more reasonable foreground indication, and the accuracy of fine-tuning the segmentation model on the test images can be improved.
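The per-threshold formula above amounts to clipping P - i to the interval [0, 1]; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def maximum_joint_masks(master: np.ndarray, n_preds: int):
    """One mask per integer threshold i < N: p_{i,i+1} = min(max(P - i, 0), 1)."""
    return [np.clip(master - i, 0.0, 1.0) for i in range(n_preds)]
```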
S40, determining the pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps.
Specifically, the pseudo labels serve as annotation information; after the pseudo labels of the test image and of each enhanced test image are obtained, the segmentation model is fine-tuned based on these pseudo labels. The pseudo label of the test image and of each enhanced test image is one of the plurality of maximum joint mask maps, determined based on the distance similarity index between the corresponding prediction probability map and each maximum joint mask map.
Based on this, in an implementation of this embodiment, the determining of the pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps specifically includes:
selecting a preset number of maximum joint mask maps from the plurality of maximum joint mask maps to form a pseudo label set;
and selecting, by means of a distance similarity index, the pseudo label corresponding to each prediction probability map from the pseudo label set, so as to obtain the pseudo labels of the test image and each enhanced test image.
Specifically, the pseudo label set includes a preset number of maximum joint mask maps, which is smaller than or equal to the total number of maximum joint mask maps; that is, the pseudo label set may include some of the maximum joint mask maps, or all of them. In a specific implementation, since the maximum joint masks generated by lower thresholds may contain more information supplementary to the prediction result of the test image, the preset number of maximum joint mask maps is selected in ascending order of the corresponding preset thresholds, with the preset number smaller than the total number of maximum joint mask maps. For example, if the maximum joint mask maps include p_{0,1}, p_{1,2}, p_{2,3}, p_{3,4}, and p_{4,5}, then p_{0,1}, p_{1,2}, and p_{2,3} are selected to form the pseudo label set; i.e., the pseudo label set comprises p_{0,1}, p_{1,2}, and p_{2,3}.
After the pseudo label set is obtained, as shown in FIG. 5, the Earth Mover's Distance (EMD) similarity index is used to select, for each prediction probability map, the most similar maximum joint mask map in the pseudo label set Pset as its pseudo label p'_i, so as to obtain the pseudo labels of the test image and each enhanced test image. In this embodiment, the probability map with the smallest distribution distance is selected as the pseudo label, which to a great extent incorporates the model's own prediction probability map and thereby avoids the collapse that large-scale adjustment of the model during self-learning might cause.
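A sketch of this selection step. The application names the Earth Mover's Distance as the similarity index but does not spell out its computation; the 1D histogram approximation below (L1 distance between cumulative histograms of the map values) is therefore an assumption, as are the function names and bin count:

```python
import numpy as np

def emd_1d(p: np.ndarray, q: np.ndarray, bins: int = 32) -> float:
    """Approximate EMD between the value distributions of two maps in [0, 1]."""
    hp, _ = np.histogram(p, bins=bins, range=(0.0, 1.0), density=True)
    hq, _ = np.histogram(q, bins=bins, range=(0.0, 1.0), density=True)
    return float(np.abs(np.cumsum(hp) - np.cumsum(hq)).sum() / bins)

def select_pseudo_labels(pred_maps, pseudo_label_set):
    """Pick, for each prediction probability map, the lowest-EMD
    maximum joint mask map from the pseudo label set."""
    return [min(pseudo_label_set, key=lambda m: emd_1d(pred, m))
            for pred in pred_maps]
```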
S50, fine-tuning the segmentation model based on each pseudo label, and continuing to perform the step of performing data enhancement on the test images in the test image set until the fine-tuning end condition is met, so as to obtain the image segmentation model.
Specifically, after each pseudo label is obtained, the model parameters of the segmentation model are modified based on the prediction probability maps and pseudo labels corresponding to the test image and each enhanced test image, so as to fine-tune the segmentation model. In addition, since the test image carries no label information, it cannot be directly verified whether the segmentation model is being optimized in the correct direction toward reasonable results, nor when to stop fine-tuning; this embodiment therefore provides a fine-tuning end condition, namely that the similarity coefficient D_cur is smaller than a preset coefficient D_pre, or that the number of fine-tunings k equals a preset count threshold k_max. When the similarity coefficient corresponding to the test image is smaller than the preset coefficient, or the number of fine-tunings equals the preset count threshold, the segmentation model meets the fine-tuning end condition, and the fine-tuning process of the segmentation model ends.
Based on this, the fine-tuning of the segmentation model based on each pseudo label and the continued execution of the step of performing data enhancement on the test images in the test image set until the fine-tuning end condition is met, so as to obtain the image segmentation model, specifically includes:
modifying model parameters of the segmentation model based on each pseudo label;
selecting the first maximum joint mask map corresponding to the maximum preset threshold and the second maximum joint mask map corresponding to the minimum preset threshold, and calculating the similarity coefficient of the first maximum joint mask map and the second maximum joint mask map;
judging whether the segmentation model meets the fine-tuning end condition based on the similarity coefficient and the number of fine-tunings of the segmentation model, wherein the fine-tuning end condition is that the similarity coefficient is smaller than the preset coefficient or the number of fine-tunings equals the preset count threshold;
when the fine-tuning end condition is met, taking the fine-tuned segmentation model as the image segmentation model;
and when the fine-tuning end condition is not met, continuing to perform the step of performing data enhancement on the test images in the test image set.
Specifically, the similarity coefficient adopts the Dice coefficient: the shape consistency of the first maximum joint mask map corresponding to the maximum preset threshold and the second maximum joint mask map corresponding to the minimum preset threshold is calculated via the Dice coefficient, and the calculated shape consistency is taken as the similarity coefficient. When the similarity coefficient of the first and second maximum joint mask maps is smaller than the preset coefficient, the maximum joint masks at different confidence levels tend to be mutually consistent, and the prediction results of the segmentation model on the test image and its enhanced test images are stable, indicating that the segmentation model has been optimized correctly, so fine-tuning of the segmentation model can be stopped. In addition, the condition that the number of fine-tunings equals the preset count threshold is added to the fine-tuning end condition to prevent the fine-tuning process from entering an endless loop.
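A sketch of the stopping check; the concrete values of the preset coefficient D_pre, the count threshold k_max, and the binarization threshold are placeholders, not values fixed by the application:

```python
import numpy as np

def dice_coefficient(mask_a, mask_b, thresh=0.5, eps=1e-8):
    """Shape consistency of two (soft) masks, binarized at `thresh`."""
    a, b = mask_a > thresh, mask_b > thresh
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

def fine_tuning_should_end(masks, k, d_pre=0.95, k_max=20):
    """End condition: similarity coefficient D_cur < D_pre, or the number of
    fine-tunings k has reached the preset count threshold k_max.

    masks: maximum joint mask maps ordered by preset threshold, so masks[-1]
    and masks[0] correspond to the maximum and minimum thresholds."""
    d_cur = dice_coefficient(masks[-1], masks[0])
    return d_cur < d_pre or k >= k_max
```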
In summary, this embodiment provides a training method for an image segmentation model. The method includes performing data enhancement on a test image in a test image set to obtain a plurality of enhanced test images; determining a prediction probability map of the test image and a prediction probability map of each enhanced test image through a pre-trained segmentation model, and determining a master probability map based on the determined prediction probability maps; determining a plurality of maximum joint mask maps based on the master probability map; determining pseudo labels for the test image and each enhanced test image based on the plurality of maximum joint mask maps; and fine-tuning the segmentation model based on each pseudo label, and continuing to perform the step of performing data enhancement on the test images in the test image set until the fine-tuning end condition is met, so as to obtain the image segmentation model. The segmentation model is adjusted online in a self-supervised learning manner: reliable pseudo labels are generated through test-time data augmentation to drive the online learning of the segmentation model, and learning is stopped dynamically according to the degree of consistency of the pseudo labels, so that the performance and the generalization of the image segmentation model can be improved.
Based on the above training method of the image segmentation model, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the training method of the image segmentation model according to the above embodiment.
Based on the above training method of the image segmentation model, the present application further provides a medical image segmentation method that applies the image segmentation model obtained by training with the training method provided in the above embodiment. The medical image segmentation method specifically includes:
inputting a medical image to be segmented into the image segmentation model, and determining a segmentation region corresponding to the medical image through the image segmentation model.
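As an illustration only, a minimal inference sketch assuming a trained PyTorch segmentation model that outputs per-class logits; the function and variable names are not from the application.

```python
import torch

@torch.no_grad()
def segment(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel label map for one medical image shaped (C, H, W)."""
    model.eval()                                   # freeze dropout / batch-norm stats
    logits = model(image.unsqueeze(0))             # add batch dimension -> (1, K, H, W)
    return logits.softmax(dim=1).argmax(dim=1).squeeze(0)   # (H, W) label map
```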
Based on the above training method of the image segmentation model, the present application further provides a terminal device, as shown in fig. 6, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, and may further include a communication interface (Communications Interface) 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
The memory 22, as a computer-readable storage medium, may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes functional applications and data processing, i.e., implements the methods in the above-described embodiments, by running the software programs, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory, for example, any of a variety of media that can store program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; a transient storage medium may also be used.
In addition, the specific processes by which the storage medium and the processor in the terminal device load and execute the instructions have been described in detail in the method above and are not restated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A training method of an image segmentation model is characterized by comprising the following steps:
performing data enhancement on the test images in the test image set to obtain a plurality of enhanced test images;
determining a prediction probability map of the test image and a prediction probability map of each enhanced test image through a pre-trained segmentation model, and determining a master probability map based on the determined prediction probability maps;
dividing the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps;
determining pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps;
and fine-tuning the segmentation model based on each pseudo label, and continuing to perform the step of performing data enhancement on the test images in the test image set until a fine-tuning end condition is met, so as to obtain the image segmentation model.
2. The method for training an image segmentation model according to claim 1, wherein determining the master probability map based on the determined prediction probability maps specifically comprises:
performing a reverse enhancement operation on the prediction probability map corresponding to each enhanced test image to obtain a candidate test probability map corresponding to each enhanced test image;
and adding each candidate test probability map and the prediction probability map corresponding to the test image to obtain the master probability map.
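A minimal sketch of claim 2, assuming horizontal and vertical flips as the data enhancements; the reverse-enhancement operators simply undo each enhancement so that all probability maps align pixelwise before summation. All names are illustrative.

```python
import numpy as np

def master_probability_map(prob_orig, probs_aug, inverse_ops):
    """Sum the original prediction map with each reverse-enhanced candidate map."""
    total = np.asarray(prob_orig, dtype=np.float64).copy()
    for prob, inverse in zip(probs_aug, inverse_ops):
        total += inverse(prob)                 # reverse-enhance, then accumulate
    return total

# Illustrative inverse operators for horizontal and vertical flips:
inverse_ops = [lambda p: p[:, ::-1], lambda p: p[::-1, :]]
```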
3. The method for training an image segmentation model according to claim 1, wherein dividing the master probability map based on each of a plurality of preset thresholds to obtain a plurality of maximum joint mask maps specifically comprises:
selecting a plurality of preset thresholds, wherein the plurality of preset thresholds comprise all integers smaller than the number of prediction probability maps;
and for each preset threshold, obtaining the maximum difference value among the differences between the pixel value of each pixel in the master probability map and the preset threshold, and generating the maximum joint mask map corresponding to the preset threshold based on the minimum of the maximum difference value and 1, so as to obtain the plurality of maximum joint mask maps.
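One possible reading of claim 3, under the assumption that the master probability map is the sum of N prediction probability maps (pixel values in [0, N]) and the thresholds are the non-negative integers below N: each mask pixel becomes min(max(p - t, 0), 1), i.e. the pixel-threshold difference clamped to [0, 1]. This interpretation is illustrative only.

```python
import numpy as np

def maximum_joint_masks(master: np.ndarray, n_maps: int) -> list:
    """One mask per integer threshold t < n_maps, clamping master - t to [0, 1]."""
    return [np.clip(master - t, 0.0, 1.0) for t in range(n_maps)]
```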
4. The method for training an image segmentation model according to claim 1, wherein determining the pseudo labels of the test image and each enhanced test image based on the plurality of maximum joint mask maps specifically comprises:
selecting a preset number of maximum joint mask maps from the plurality of maximum joint mask maps to form a pseudo-label set;
and selecting, from the pseudo-label set, the pseudo label corresponding to each prediction probability map by means of a distance similarity index, so as to obtain the pseudo labels of the test image and each enhanced test image.
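A sketch of claim 4, assuming mean squared distance as the "distance similarity index"; the claim does not fix the metric, so this choice is illustrative.

```python
import numpy as np

def assign_pseudo_labels(prob_maps, label_set):
    """For each prediction probability map, pick the closest mask in the set."""
    assigned = []
    for prob in prob_maps:
        dists = [np.mean((prob - mask) ** 2) for mask in label_set]
        assigned.append(label_set[int(np.argmin(dists))])   # closest mask wins
    return assigned
```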
5. The method for training an image segmentation model according to claim 1, wherein fine-tuning the segmentation model based on each pseudo label and continuing to perform the step of performing data enhancement on the test images in the test image set until the fine-tuning end condition is satisfied, so as to obtain the image segmentation model, specifically comprises:
modifying model parameters of the segmentation model based on each pseudo label;
selecting a first maximum joint mask map corresponding to the largest preset threshold and a second maximum joint mask map corresponding to the smallest preset threshold, and calculating a similarity coefficient of the first maximum joint mask map and the second maximum joint mask map;
judging whether the segmentation model meets the fine-tuning end condition based on the similarity coefficient and the number of fine-tuning iterations of the segmentation model, wherein the fine-tuning end condition is that the similarity coefficient is smaller than a preset coefficient or the number of fine-tuning iterations equals a preset threshold;
when the fine-tuning end condition is met, taking the fine-tuned segmentation model as the image segmentation model;
and when the fine-tuning end condition is not met, continuing to perform the step of performing data enhancement on the test images in the test image set.
6. The method for training an image segmentation model according to any one of claims 1 to 5, wherein the segmentation model comprises at least one style conversion module, and determining the prediction probability map of the test image and the prediction probability map of each enhanced test image through the pre-trained segmentation model specifically comprises:
for each reference test image in a test image group formed by the test image and each enhanced test image, respectively inputting the reference test image and a source domain image into the segmentation model, wherein the source domain image is a training image in a preset training sample set for training the segmentation model;
controlling the network layers of the segmentation model located before the style conversion module to determine a first feature map corresponding to the reference test image and a second feature map corresponding to the source domain image;
controlling the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map, so as to obtain an adjusted first feature map;
and passing the adjusted first feature map and the second feature map through the network layers of the segmentation model located after the style conversion module to obtain a prediction probability map of the reference test image, so as to obtain the prediction probability map of the test image and the prediction probability map of each enhanced test image.
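A minimal sketch of the two-stream forward pass in claim 6, assuming the segmentation model is split into pre (the layers before the style conversion module), style (the module itself) and post (the remaining layers); all names are illustrative.

```python
def forward_with_style(pre, style, post, test_image, source_image):
    f_test = pre(test_image)        # first feature map (reference test image)
    f_src = pre(source_image)       # second feature map (source-domain image)
    f_adj = style(f_test, f_src)    # align gray-level distribution to the source
    return post(f_adj)              # prediction probability map of the test image
```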
7. The method for training an image segmentation model according to claim 6, wherein controlling the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically comprises:
for each channel of the first feature map and each channel of the second feature map, determining the sequence number of the pixel value of each pixel in the channel among all pixels of the channel, wherein the sequence numbers are assigned in ascending order of pixel value;
and for each channel of the first feature map, selecting the candidate channel corresponding to the channel in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate channel based on the sequence number corresponding to each pixel in the channel, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain the adjusted first feature map.
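A sketch of the channel-wise adjustment in claim 7, under the assumption that the paired channels have equal size: each pixel of a test channel takes the value of the equally ranked pixel in the corresponding source-domain channel, which amounts to sort-based histogram matching. All names are illustrative.

```python
import numpy as np

def rank_match_channel(test_ch: np.ndarray, src_ch: np.ndarray) -> np.ndarray:
    """Replace each test pixel with the equally ranked source pixel value."""
    flat_test = test_ch.ravel()
    sorted_src = np.sort(src_ch.ravel())              # source values, ascending
    order = np.argsort(flat_test, kind="stable")      # test pixel indices by rank
    out = np.empty_like(flat_test)
    out[order] = sorted_src                           # rank k gets k-th source value
    return out.reshape(test_ch.shape)

def rank_match(test_fm: np.ndarray, src_fm: np.ndarray) -> np.ndarray:
    """Apply the per-channel matching to (C, H, W) feature maps."""
    return np.stack([rank_match_channel(t, s) for t, s in zip(test_fm, src_fm)])
```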
8. The method for training an image segmentation model according to claim 6, wherein controlling the style conversion module to adjust the image gray scale of the first feature map based on the image gray-scale distribution of the second feature map to obtain the adjusted first feature map specifically comprises:
for each channel of the first feature map and each channel of the second feature map, dividing the channel into a plurality of sliding windows in a sliding-window manner, and determining the sequence number of the pixel value of each pixel in each sliding window among all pixels of that window, wherein the sequence numbers are assigned in ascending order of pixel value;
for each sliding window in the first feature map, selecting the candidate sliding window corresponding to the sliding window in the second feature map, selecting the candidate pixel corresponding to each pixel in the candidate sliding window based on the sequence number corresponding to each pixel in the sliding window, and replacing the pixel value of the pixel corresponding to each candidate pixel with the pixel value of that candidate pixel, so as to obtain an adjusted sliding window corresponding to the sliding window;
and for each pixel in each channel of the first feature map, acquiring the adjusted pixel value of the pixel in each adjusted sliding window that contains the pixel, and taking the average of all acquired adjusted pixel values as the pixel value of that pixel, so as to obtain the adjusted first feature map.
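A sketch of the windowed variant in claim 8, reusing rank_match_channel from the previous sketch and assuming square windows with a fixed stride (both hypothetical parameters); each pixel is finally averaged over all windows that contain it.

```python
import numpy as np

def windowed_rank_match(test_ch: np.ndarray, src_ch: np.ndarray,
                        win: int = 32, stride: int = 16) -> np.ndarray:
    """Rank-match per sliding window, then average the overlapping adjustments."""
    h, w = test_ch.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for i in range(0, h - win + 1, stride):
        for j in range(0, w - win + 1, stride):
            sl = (slice(i, i + win), slice(j, j + win))
            acc[sl] += rank_match_channel(test_ch[sl], src_ch[sl])
            cnt[sl] += 1.0
    return acc / np.maximum(cnt, 1.0)   # mean over all windows covering each pixel
```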
9. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the method for training an image segmentation model according to any one of claims 1 to 8.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method of training an image segmentation model according to any one of claims 1 to 8.
CN202111236105.3A 2021-10-22 2021-10-22 Training method of image segmentation model and related device Pending CN113989501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111236105.3A CN113989501A (en) 2021-10-22 2021-10-22 Training method of image segmentation model and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111236105.3A CN113989501A (en) 2021-10-22 2021-10-22 Training method of image segmentation model and related device

Publications (1)

Publication Number Publication Date
CN113989501A true CN113989501A (en) 2022-01-28

Family

ID=79740621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111236105.3A Pending CN113989501A (en) 2021-10-22 2021-10-22 Training method of image segmentation model and related device

Country Status (1)

Country Link
CN (1) CN113989501A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439654A (en) * 2022-11-07 2022-12-06 武汉数字家园科技有限公司 Method and system for finely dividing weakly supervised farmland plots under dynamic constraint
CN115631116A (en) * 2022-12-21 2023-01-20 南昌航空大学 Aircraft power inspection system based on binocular vision


Similar Documents

Publication Publication Date Title
Zhao et al. Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN113989501A (en) Training method of image segmentation model and related device
CN113706564B (en) Meibomian gland segmentation network training method and device based on multiple supervision modes
WO2022195285A1 (en) Image processing using machine learning
US20240029272A1 (en) Matting network training method and matting method
CN114255237A (en) Semi-supervised learning-based image segmentation model training method and segmentation method
US10373022B1 (en) Text image processing using stroke-aware max-min pooling for OCR system employing artificial neural network
CN105528610A (en) Character recognition method and device
CN111161249A (en) Unsupervised medical image segmentation method based on domain adaptation
CN113674288A (en) Automatic segmentation method for non-small cell lung cancer digital pathological image tissues
CN111582004A (en) Target area segmentation method and device in ground image
CN115601330A (en) Colonic polyp segmentation method based on multi-scale space reverse attention mechanism
CN116206106A (en) Semi-supervised image segmentation method based on uncertain pseudo tag correction
US11189031B2 (en) Importance sampling for segmentation network training modification
CN114399640A (en) Road segmentation method and device for uncertain region discovery and model improvement
CN112801162B (en) Adaptive soft label regularization method based on image attribute prior
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN116958548A (en) Pseudo tag self-distillation semantic segmentation method based on category statistics driving
CN116385466A (en) Method and system for dividing targets in image based on boundary box weak annotation
CN116563315A (en) Medical image gland segmentation method
Yarkony et al. Hierarchical planar correlation clustering for cell segmentation
Wang et al. Curiosity-driven salient object detection with fragment attention
CN115578400A (en) Image processing method, and training method and device of image segmentation network
CN113436115A (en) Image shadow detection method based on depth unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination