CN117115437A - Multi-index multi-organ medical image segmentation model evaluation system based on region - Google Patents

Multi-index multi-organ medical image segmentation model evaluation system based on region

Info

Publication number
CN117115437A
Authority
CN
China
Prior art keywords
model
medical image
organ
image segmentation
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899309.8A
Other languages
Chinese (zh)
Inventor
叶淇
郭礼华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202310899309.8A priority Critical patent/CN117115437A/en
Publication of CN117115437A publication Critical patent/CN117115437A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/031Recognition of patterns in medical or anatomical images of internal organs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention discloses a region-based multi-index multi-organ medical image segmentation model evaluation system, which comprises a data acquisition module, an organ region sketching module, a data preprocessing module, a medical image segmentation model training module, a medical image segmentation model testing module and a medical image segmentation model evaluation module. The system provides a multi-index multi-organ medical image segmentation model evaluation method that simultaneously quantifies multi-organ segmentation results, multiple accuracy indexes and confidence estimates in a single simple and unified measure, provides information on a model's comprehensive performance and degree of clinical availability, and brings a novel and intuitive standard for clinically evaluating multi-index, multi-organ medical image segmentation models.

Description

Multi-index multi-organ medical image segmentation model evaluation system based on region
Technical Field
The invention relates to the technical field of medical image processing, and in particular to a region-based multi-index multi-organ medical image segmentation model evaluation system.
Background
In recent years, U-Net and its derived variants have achieved state-of-the-art results in a variety of clinical medical image segmentation tasks. However, research on model evaluation techniques lags far behind the development of the models themselves. Existing model evaluation techniques are too complex to guide clinical practice well and are not comprehensive, and evaluation techniques aimed at specific clinical application scenarios, especially multi-organ segmentation tasks, are still lacking.
In clinical application, the most basic model evaluation approach is subjective assessment by a clinician of the model's segmentation results, from which the accuracy of the model is judged. This approach is highly reliable and professional, but it is time-consuming, increases staff workload and suffers from subjective variability. Traditional model evaluation techniques introduce objective indexes and focus on segmentation accuracy, computing correlation coefficients between the segmentation result and the true value label. This approach is objective and can reflect clinical diagnostic outcomes, but because the various accuracy indexes differ in emphasis and clinical meaning, the resulting model rankings are inconsistent, which increases the complexity of clinical model evaluation and confuses clinicians choosing a model. Considering segmentation accuracy alone cannot guarantee that the model's predictions are reliable in clinical application, yet reliability has a critical influence on deploying a model clinically. Evaluating model reliability is another important technique; however, once a reliability index is introduced, ambiguity between accuracy and reliability is hard to avoid, and the situation in which one model is more accurate while another is more reliable is commonplace. When deploying a model clinically, doctors must weigh, and even trade off, its accuracy against its reliability; existing evaluation techniques still do not effectively address how to jointly measure multiple accuracy indexes and reliability estimates in a multi-organ segmentation clinical scenario, and model evaluation results remain uncertain and non-uniform.
In general, the model evaluation techniques commonly used in the clinic today perform well in certain medical image segmentation tasks, but still suffer from the following drawbacks:
1. The approach based on subjective assessment by doctors requires much time and effort and is subject to subjective differences, which increases the doctors' workload and the uncertainty of deploying a model.
2. The approach based on objective accuracy indexes can yield inconsistent model rankings across different accuracy indexes, because the indexes differ in emphasis and clinical significance. This increases the complexity of clinical model evaluation and confuses the physician's clinical choice of model.
3. The approach based on reliability indexes suffers from ambiguity between reliability and accuracy in the evaluation results. Evaluating model accuracy and reliability separately in the clinic is not comprehensive and does not provide a unified view of model evaluation, forcing physicians to choose and trade off between accuracy and reliability.
In summary, in medical image segmentation model evaluation, how to consider multiple accuracy indexes and reliability measures comprehensively and simultaneously, reduce the uncertainty and complexity of model evaluation, and provide clinicians with a more reliable and effective basis for model selection is a critical problem to be solved urgently.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a region-based multi-index multi-organ medical image segmentation model evaluation system. Using a statistical method, it offers a new clinical model evaluation approach that quantifies multi-organ segmentation results, multiple accuracy indexes and reliability estimates simultaneously with a single unified measure, providing a comprehensive yet concise perspective for clinical model evaluation.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a region-based multi-index multi-organ medical image segmentation model evaluation system, comprising:
the data acquisition module is used for acquiring an image data set, and comprises CT image data of the same multi-organ part in a plurality of samples; randomly dividing the image data set into a training set, a verification set and a test set;
the organ region sketching module is used for sketching an organ region of interest in the acquired CT image data and is used as a true value label;
the data preprocessing module is used for preprocessing the acquired CT image data and the sketched true value label, and adopts the modes of cutting, format conversion and normalization to enable the acquired CT image data and the sketched true value label to meet the input requirements of a medical image segmentation model, so that a new data format is obtained;
The medical image segmentation model training module is used for iteratively training the medical image segmentation model with the divided training set, adjusting the model parameters according to the value of the loss function during training so that it gradually converges to an optimum, and adjusting the model parameters with the validation set to prevent the model from over-fitting during training, finally obtaining a trained medical image segmentation model;
the medical image segmentation model test module is used for inputting a test set into each trained model obtained in the medical image segmentation model training module, generating a corresponding organ segmentation result, calculating confidence coefficient estimated values of each organ under different models as reliability indexes of model prediction according to the organ segmentation result, and evaluating segmentation quality of each organ under different models by combining with the true value labels obtained by the organ region sketching module to obtain corresponding accuracy indexes of model prediction;
the medical image segmentation model evaluation module is used for evaluating the relative merits of different medical image segmentation models; it summarizes the accuracy indexes of all models generated in the medical image segmentation model test module and, by a statistical method, automatically generates, among the models, a threshold for screening whether the segmentation result of each sample organ is clinically available; for each model to be evaluated it establishes the rank correlation between accuracy and confidence, computes clinically acceptable confidence intervals, screens them against the generated thresholds, and finally uses a region-value-based measure to uniformly quantify the multiple accuracy indexes, confidence estimates and multi-organ segmentation results, generating a concrete and concise measurement index and providing information on the comprehensive performance and degree of clinical availability of the models.
Further, the organ region sketching module manually determines and sketches the organ region of interest of each sample according to the tissue structure characteristics in the reference image.
Furthermore, the data preprocessing module preprocesses data according to the requirements and characteristics of the medical image segmentation model, so that the performance and generalization capability of the medical image segmentation model are improved, and the practical problem is better solved;
for a medical image segmentation model that can only process two-dimensional data, i.e. can only take two-dimensional CT images as input, the data preprocessing module performs the following operations: the three-dimensional CT image data are sliced into two-dimensional CT images, the two-dimensional CT images are converted into Numpy format, the images are clipped to the numerical range from -125 to 275, each CT image is normalized, and the processed two-dimensional CT images are randomly divided into a training set, a validation set and a test set at a ratio of 8:1:1, with the test set stored in h5 format;
for a medical image segmentation model capable of directly processing three-dimensional data, i.e. taking three-dimensional images as input, the data preprocessing module performs the following operations: the CT values of the whole three-dimensional CT image, whose numerical range starts at -1000, are normalized to the range 0 to 1, each slice is resampled to an isotropic voxel spacing of 1.0 mm during preprocessing, and the processed data are then randomly divided into a training set, a validation set and a test set at a ratio of 8:1:1.
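The two preprocessing paths above can be illustrated with a short sketch. The following Python code is only an illustration under assumed function names (preprocess_2d_slices, split_dataset) and fixed parameter choices; it is not part of the claimed system:

import numpy as np

def preprocess_2d_slices(volume_hu):
    # Clip a 3-D CT volume (Hounsfield units) to the window [-125, 275],
    # normalize to [0, 1] and split it into 2-D slices along the first axis.
    clipped = np.clip(volume_hu, -125.0, 275.0)
    normalized = (clipped + 125.0) / 400.0
    return [normalized[d].astype(np.float32) for d in range(normalized.shape[0])]

def split_dataset(samples, seed=0):
    # Randomly divide samples into training/validation/test sets at an 8:1:1 ratio.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train, n_val = int(0.8 * len(samples)), int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test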
Further, the medical image segmentation model training module divides the training set obtained in the data preprocessing module into n small batches and trains the medical image segmentation model batch by batch; a data enhancement strategy is adopted in the training stage, including random rotation by 90, 180 and 270 degrees, random axial, sagittal and coronal flipping, and random scaling; using I_CT to denote the CT image data of the current batch and g the true value labels corresponding to the multi-organ segmentation, the training process comprises the following steps:
1) For a medical image segmentation model that requires pre-training, the model is pre-trained on the large ImageNet database and initialized with the generated weights;
2) I_CT is input into the medical image segmentation model S for forward propagation; during forward propagation the data are fed into an encoder to obtain a series of feature maps, and a decoder then produces a segmentation result p of the same size as the input image; the segmentation result p is determined by:
p = S(I_CT)
3) Comparing the segmentation result p output by the model with a true value label g, and calculating a loss function L; wherein F represents a function for calculating a correlation coefficient between the segmentation result and the true value label, and L is determined by the following formula:
L=F(p,g)
4) Updating parameters of the model by using a back propagation algorithm according to the gradient of the loss function, automatically calculating the gradient of the loss function on each parameter by the medical image segmentation model in the back propagation process, and adaptively adjusting the learning rate and updating the parameters by using an Adam optimization algorithm;
5) According to the Adam optimization algorithm, the parameters of the model are updated using the computed gradients, and each parameter update reduces the loss function L of the model; every preset number of iterations, the validation set is used to test the medical image segmentation model and calculate its accuracy indexes; if the test result does not meet the requirement, the model structure, the loss function and the hyper-parameters are adjusted and the model is retrained;
6) Repeating the steps 2) -5) until the loss function L of the model is stable or reaches the preset iteration times.
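A minimal PyTorch-style sketch of steps 2)-5) is given below; the model S, the loss function F and the validate helper are placeholders assumed for illustration only and do not fix any particular network architecture or loss:

import torch

def train(model, train_loader, val_loader, loss_fn, validate,
          max_iters=30000, val_every=500, lr=1e-4):
    # Iterative training with Adam; stops after a preset number of iterations.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for image_ct, label in train_loader:      # one mini-batch per step
            pred = model(image_ct)                # forward pass: p = S(I_CT)
            loss = loss_fn(pred, label)           # L = F(p, g)
            optimizer.zero_grad()
            loss.backward()                       # back-propagation of gradients
            optimizer.step()                      # Adam parameter update
            it += 1
            if it % val_every == 0:
                validate(model, val_loader)       # accuracy check on the validation set
            if it >= max_iters:
                break
    return model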
Further, the medical image segmentation model testing module tests the medical image segmentation model obtained by the medical image segmentation model training module, respectively inputs a testing set to each trained model, obtains segmentation results of each organ, and respectively calculates an accuracy index and a reliability index according to the segmentation results;
The accuracy indexes are calculated by measuring the difference between the model segmentation result and the true value label, and are divided into region-based and boundary-based calculation methods; the region-based measurements compare the similarity between the segmentation result and the true value label, and the indexes are the Dice coefficient and the intersection-over-union (IOU);
the Dice coefficient is an index comparing the degree of overlap between the segmentation result and the true value label, and is calculated as follows: the segmentation result and the true value label are each converted into binary images, the binary images are multiplied to obtain their intersection, and twice the number of pixels in the intersection is divided by the sum of the total numbers of pixels in the two binary images; the quotient is taken as the Dice coefficient, which ranges from 0 to 1, and the closer the value is to 1, the better the quality of the segmentation result;
p denotes the segmentation result, g the true value label, i the i-th voxel and I the total number of voxels; the Dice coefficient is determined by the following expression:
Dice = 2 * Σ_i p_i g_i / (Σ_i p_i + Σ_i g_i), with the sums taken over i = 1, ..., I
the intersection-over-union IOU is used to measure the precision and accuracy of the segmentation result, and is calculated as follows: the segmentation result and the true value label are each converted into binary images, the binary images are multiplied to obtain their intersection, the binary images are added and the intersection is subtracted to obtain their union, and the number of pixels in the intersection is divided by the number of pixels in the union to obtain the intersection-over-union IOU; the IOU ranges from 0 to 1, and the closer the value is to 1, the better the quality of the segmentation result;
The intersection-over-union IOU is determined by the following expression:
IOU = Σ_i p_i g_i / (Σ_i p_i + Σ_i g_i - Σ_i p_i g_i)
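A sketch of the two region-based indexes over binary masks is given below for illustration; the function names are assumptions and not part of the invention:

import numpy as np

def dice_coefficient(pred, label, eps=1e-6):
    # Dice = 2*|P ∩ G| / (|P| + |G|); value in [0, 1], closer to 1 is better.
    pred, label = pred.astype(bool), label.astype(bool)
    intersection = np.logical_and(pred, label).sum()
    return (2.0 * intersection + eps) / (pred.sum() + label.sum() + eps)

def iou(pred, label, eps=1e-6):
    # IOU = |P ∩ G| / |P ∪ G|; value in [0, 1], closer to 1 is better.
    pred, label = pred.astype(bool), label.astype(bool)
    intersection = np.logical_and(pred, label).sum()
    union = np.logical_or(pred, label).sum()
    return (intersection + eps) / (union + eps)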
the boundary-based measurement compares the boundary difference between the segmentation result and the true value label, and the index is the Hausdorff distance, calculated as follows: the shortest distance from each pixel point in the true value label to the segmentation result and the shortest distance from each pixel point in the segmentation result to the true value label are computed, and the larger of the two maxima of these shortest distances is taken as the Hausdorff distance; the smaller the Hausdorff distance, the smaller the boundary difference between the segmentation result and the true value label;
G' and P' denote the point sets on the surfaces of the true value label and the segmentation result, respectively, g' and p' denote points in these point sets, and HD denotes the Hausdorff distance, which is determined by the following expression:
HD(G', P') = max{ max_{g'∈G'} min_{p'∈P'} ||g' - p'|| , max_{p'∈P'} min_{g'∈G'} ||p' - g'|| }
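The symmetric Hausdorff distance between the two point sets can be sketched as follows (a brute-force illustration; in practice a dedicated surface-distance library would typically be used):

import numpy as np
from scipy.spatial.distance import cdist

def hausdorff_distance(points_g, points_p):
    # points_g: (n, 3) surface points of the true value label G'
    # points_p: (m, 3) surface points of the segmentation result P'
    d = cdist(points_g, points_p)        # pairwise Euclidean distances
    forward = d.min(axis=1).max()        # max over G' of the shortest distance to P'
    backward = d.min(axis=0).max()       # max over P' of the shortest distance to G'
    return max(forward, backward)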
the medical image segmentation model test module needs to select two or more accuracy indexes;
the reliability index of the medical image segmentation model can be obtained through a calculation mode of a direct method or an indirect method;
the direct method obtains a confidence index; the confidence measures the reliability of the prediction result of the medical image segmentation model, i.e. how certain the model is about its classification of each pixel point. It is calculated by taking, for each pixel, the maximum value of the prediction logits, specifically the value after a Sigmoid or Softmax activation function, and then averaging over all pixels of each test sample in each channel to obtain the confidence value of each organ. With conf denoting the confidence value, N the total number of samples and r_a the predicted probability value of the a-th sample, the confidence value is determined by the following expression:
conf = (1/N) * Σ_{a=1}^{N} r_a
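One possible reading of the direct confidence computation is sketched below; the per-organ averaging rule (mean of the per-voxel maximum softmax probability over the voxels assigned to each organ) is an assumption of this illustration:

import torch

def organ_confidence(logits):
    # logits: tensor of shape (C, D, H, W) for one test sample, C organ channels.
    probs = torch.softmax(logits, dim=0)       # per-voxel class probabilities
    max_prob, pred_class = probs.max(dim=0)    # per-voxel certainty and predicted organ
    conf = []
    for c in range(probs.shape[0]):
        mask = pred_class == c
        # average certainty over the voxels assigned to organ c
        conf.append(max_prob[mask].mean() if mask.any() else torch.tensor(0.0))
    return torch.stack(conf)                   # one confidence value per organ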
the indirect method is calculated by combining accuracy indexes; the two indexes adopted are the prediction calibration error and the maximum calibration error, which together measure the calibration stability of the medical image segmentation model. They are calculated from the absolute difference between the accuracy index and the confidence: the prediction calibration error is the mean of the absolute differences, and the maximum calibration error is their maximum;
with ECE denoting the prediction calibration error, N the total number of samples, s_a the accuracy index of the a-th sample and conf_a the confidence index of the a-th sample, the prediction calibration error is determined by the following expression:
ECE = (1/N) * Σ_{a=1}^{N} |s_a - conf_a|
the maximum calibration error is denoted by MCE and is determined by the following expression:
MCE = max_a |s_a - conf_a|
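Given per-sample accuracy scores and confidence values, a sketch of both indirect indexes (the function name is an illustrative assumption):

import numpy as np

def calibration_errors(accuracy, confidence):
    # accuracy, confidence: 1-D arrays of length N (one value per test sample).
    # ECE = mean of the absolute differences, MCE = their maximum.
    gap = np.abs(np.asarray(accuracy, dtype=float) - np.asarray(confidence, dtype=float))
    return gap.mean(), gap.max()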
the medical image segmentation model test module preferentially selects the confidence value output by the model itself, i.e. the direct calculation method, as the reliability index.
Further, the medical image segmentation model evaluation module comprehensively measures the performance of the models based on the results generated by the medical image segmentation model test module, specifically as follows:
summarizing the accuracy indexes of all models generated in the medical image segmentation model test module and, for each organ segmentation result under each sample, adopting the Bootstrapping statistical method to generate, among the models, a threshold for each accuracy index under each organ for screening clinically available sample organ segmentation results;
Bootstrapping, i.e. the bootstrap method, is a computer-based statistical inference method that does not require the data to follow a specific distribution; it draws a preset number of samples from the existing data and, through statistical analysis of these samples, infers sample characteristics that better match the actual distribution. Its core idea is to resample the existing limited samples, repeatedly drawing smaller sample sets at random and processing each small sample set, so as to construct a sample distribution closer to reality for inference;
the accuracy index thresholds in the medical image segmentation model evaluation module are obtained with Bootstrapping as follows: for each organ under each accuracy index, o samples are randomly drawn from the total test set samples, where o is less than or equal to the total number of samples, to form a new sample set; this is repeated B times to generate B new sets; each set is evaluated with a statistic θ, producing B estimates of θ; the medical image segmentation model evaluation module sets the mean as the default statistic θ; the B estimates are used to construct a new distribution, which is sorted in descending order, and the value corresponding to a percentile is selected as the threshold under that accuracy index, the percentile being set to 50% by default in the medical image segmentation model evaluation module, i.e. the median of the distribution;
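A sketch of the threshold generation is shown below; resampling with replacement is assumed, as is usual for Bootstrapping, and the function name is illustrative only:

import numpy as np

def bootstrap_threshold(scores, o=None, B=1000, statistic=np.mean,
                        percentile=50.0, seed=0):
    # scores: accuracy values of one organ under one index, pooled over all
    # test samples and all models to be evaluated.
    scores = np.asarray(scores, dtype=float)
    o = len(scores) if o is None else o          # o <= total number of samples
    rng = np.random.default_rng(seed)
    estimates = np.array([statistic(rng.choice(scores, size=o, replace=True))
                          for _ in range(B)])    # B estimates of the statistic θ
    return np.percentile(estimates, percentile)  # default: the median of the distribution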
For each model, the multi-organ segmentation results, the multiple accuracy indexes, the confidence estimates and the generated accuracy-index thresholds over all test set samples are summarized; sorting is performed, confidence intervals are computed, and screening and comparison against the thresholds generate a set of sample organs that simultaneously satisfy all threshold conditions, from which the availability/comprehensive score of the model is calculated; this score serves as the basis for comparing the comprehensive performance or the degree of clinical availability among multiple models;
based on the results generated by the medical image segmentation model test module, each test set sample in a model has a corresponding accuracy index and confidence estimate for each organ; the availability/comprehensive score of the model is calculated in the medical image segmentation model evaluation module as follows: for each organ segmentation result of a sample, a one-to-one association is established between the different accuracy indexes and the confidence estimate, and the results are sorted in descending order of the confidence value to obtain a new sorted set; the sorted results are traversed to the end, and when the j-th element of the set is reached, the Bootstrapping technique is applied again to the first j elements to compute a confidence interval for each accuracy index; since a clinically acceptable segmentation accuracy threshold involves a 95% confidence interval, the accuracy index value corresponding to the 95th percentile is selected and compared with the generated threshold; because several different accuracy indexes are used, an organ segmentation result in a sample is regarded as acceptable only when all accuracy indexes satisfy their threshold conditions simultaneously; the same statistical calculation is performed for the segmentation results of all organs, generating the set of sample organs that satisfy the threshold conditions over all samples; the larger this set, the better the comprehensive performance of the model; the final availability/comprehensive score is the number of sample organs in this set divided by the total number of sample-organ segmentation results over all test samples, giving a value between 0 and 1, where a higher value indicates better comprehensive performance or a higher degree of clinical availability of the model.
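The following sketch shows one possible reading of the availability/comprehensive score; the traversal and termination rule, the handling of the 95th percentile as the lower bound of the descending-ordered bootstrap distribution, and the assumption that all indexes are higher-is-better (a lower-is-better index such as the Hausdorff distance would need its comparison reversed) are interpretations made for illustration only:

import numpy as np

def availability_score(per_organ_results, thresholds, B=200, seed=0):
    # per_organ_results: dict organ -> list of (confidence, {index_name: value})
    #                    with one entry per test sample.
    # thresholds:        dict organ -> {index_name: threshold} from bootstrap_threshold.
    rng = np.random.default_rng(seed)
    accepted, total = 0, 0
    for organ, results in per_organ_results.items():
        results = sorted(results, key=lambda r: r[0], reverse=True)  # descending confidence
        total += len(results)
        best_j = 0
        for j in range(1, len(results) + 1):
            ok = True
            for name, thr in thresholds[organ].items():
                vals = np.array([r[1][name] for r in results[:j]])
                boots = np.array([rng.choice(vals, size=j, replace=True).mean()
                                  for _ in range(B)])
                # 95th percentile of the descending-sorted bootstrap distribution,
                # i.e. its lower confidence bound (interpretation assumed).
                if np.percentile(boots, 5) < thr:
                    ok = False
                    break
            if ok:
                best_j = j          # all accuracy indexes still satisfy their thresholds
        accepted += best_j          # size of the accepted sample-organ set for this organ
    return accepted / total if total else 0.0  # score in [0, 1], higher is better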
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Aiming at the clinical application scenario of specific multi-index, multi-organ segmentation tasks, the system generates thresholds among the models to be evaluated, combines the multi-organ segmentation results, multiple accuracy indexes and confidence estimates in its calculation, screens and measures each sample organ result, generates the set of sample organs satisfying the threshold conditions, and finally produces a simple and intuitive availability/comprehensive score value representing the strengths and weaknesses of each model, bringing a more comprehensive, unified and novel insight to clinical model evaluation.
2. Compared with a manner based on subjective evaluation of doctors, the system is an objective evaluation standard, does not need manual intervention evaluation, and is objective without subjective difference.
3. Compared with an evaluation mode based on objective accuracy indexes, the system provided by the invention measures and unifies different accuracy indexes at the same time, so that the problem of inconsistent sequencing results of a clinical evaluation model is solved, and the complexity of the clinical evaluation model is reduced and the system is more comprehensive.
4. Compared with an evaluation mode based on reliability indexes, the system provided by the invention simultaneously considers the accuracy and the reliability of the medical image segmentation model, uniformly measures the accuracy and the reliability, eliminates the ambiguity problem between the accuracy and the reliability faced by clinical evaluation of the model, and reduces the uncertainty and the complexity of evaluation.
5. Compared with existing medical image segmentation model evaluation methods, the system of the invention is more comprehensive, more objective and more quantitative, and has a wider range of application within the field of medical image segmentation model evaluation. At the same time, it offers stronger comparability: the comprehensive performance of different models can be compared to determine their relative strengths and weaknesses, making it easier for clinicians to understand the characteristics of the models and choose among them.
Drawings
Fig. 1 is a diagram of the architecture of the system of the present invention.
Fig. 2 is a schematic diagram of a medical image segmentation model test module.
FIG. 3 is a schematic diagram of a medical image segmentation model evaluation module.
Fig. 4 is a detailed implementation diagram of availability/comprehensive score calculation.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
As shown in fig. 1, the present embodiment discloses a region-based multi-index multi-organ medical image segmentation model evaluation system, including: the system comprises a data acquisition module, an organ region sketching module, a data preprocessing module, a medical image segmentation model training module, a medical image segmentation model testing module and a medical image segmentation model evaluation module.
The data acquisition module is used for acquiring an image data set comprising CT image data of the same multi-organ site from a plurality of samples. In this example, all samples are abdominal multi-organ samples, and the selected site covers the abdominal organs, including the stomach, liver, kidneys, gall bladder, left and right adrenal glands, spleen, esophagus, pancreas and portal vein; two public data sets of different scales, BTCV and AMOS 2022, were used. The patients were randomly divided, with a training:validation:test ratio of 8:1:1.
The organ region delineating module is used for delineating the organ region of interest in the reference image, and as the evaluation system is applied to a multi-organ segmentation scene, a doctor of an imaging department delineates various organ regions of the abdomen of the human body on the CT image, and the result of the organ delineating the region is used as a true value label in the embodiment.
The data preprocessing module is used for preprocessing the CT image acquired by the data acquisition module so as to enable the CT image to meet the input requirement of the medical image segmentation model;
In the present embodiment, medical image segmentation models such as TransUNet and Swin-UNet can only process two-dimensional data. For these, the data preprocessing module performs the following operations: the three-dimensional CT image data are sliced into two-dimensional CT images, the two-dimensional CT images are converted into Numpy format, the images are clipped to the numerical range from -125 to 275, each CT image is normalized, and the processed two-dimensional CT images are randomly divided into a training set, a validation set and a test set at a ratio of 8:1:1, with the test set stored in h5 format;
In this embodiment, medical image segmentation models such as 3D U-Net, nnU-Net, V-Net, Attention U-Net, UNETR and Swin UNETR can directly process three-dimensional data. For these, the data preprocessing module performs the following operations: the CT values of the whole three-dimensional CT image, whose numerical range starts at -1000, are normalized to the range 0 to 1, each slice is resampled to an isotropic voxel spacing of 1.0 mm during preprocessing, and the processed data are then randomly divided into a training set, a validation set and a test set at a ratio of 8:1:1.
The medical image segmentation model training module divides the training set into n mini-batches of data for training; the mini-batch size is denoted m and can be adjusted according to the available GPU memory, and in this embodiment m is set to 1. During the training phase, a data enhancement strategy is adopted, including random rotation by 90°, 180° and 270°, random axial, sagittal and coronal flipping, and random scaling. Using I_CT to denote the CT image data of the current batch and g the corresponding multi-organ true value labels, the training process comprises the following steps:
1) For medical image segmentation models that require pre-training, such as TransUNet and Swin-UNet, the models are pre-trained on the large ImageNet database and initialized with the generated weights;
2) For the 3D U-Net, nnU-Net, V-Net, Attention U-Net, UNETR and Swin UNETR models, I_CT is input into the medical image segmentation model S for forward propagation; during forward propagation the data are fed into an encoder to obtain a series of feature maps, and a decoder then produces a segmentation result p of the same size as the input image; the segmentation result p is determined by:
p = S(I_CT)
3) And comparing the segmentation result p output by the model with the true value label g, and calculating a loss function L. Wherein F represents a function for calculating a correlation coefficient between the segmentation result and the true value label, and L is determined by the following formula:
L=F(p,g)
4) The parameters of the model are updated using a back-propagation algorithm based on the gradient of the loss function. In the back propagation process, the medical image segmentation model automatically calculates the gradient of the loss function to each parameter, and adaptively adjusts the learning rate and updates the parameters by using an Adam optimization algorithm in the embodiment;
5) And updating parameters of the model by using the calculated gradient according to an Adam optimization algorithm. Each parameter update reduces the model's loss function L to some extent. Testing the medical image segmentation model by using a verification set at regular iteration times, calculating an accuracy index of the model, and if a test result of the model does not meet the requirement, adjusting a model structure, a loss function and super parameters, and retraining the model;
6) Repeating the steps 2) -5) until the loss function L of the model is stable or reaches the preset iteration times.
The medical image segmentation model test module inputs CT image data in the test set into each training-completed model obtained by the medical image segmentation model training module, generates a corresponding organ segmentation result, and calculates an accuracy index and a reliability index according to the segmentation result;
As shown in fig. 2, the CT image data in the test set are input to the image segmentation model S to obtain a segmentation result p. Combined with the manually annotated true value label g obtained from the organ region sketching module, the accuracy indexes are calculated over voxel values, where i denotes the i-th voxel and I the total number of voxels.
The medical image segmentation model test module needs to select two or more accuracy indexes; in this embodiment, the Dice coefficient and the Hausdorff distance are mainly adopted. The Dice coefficient is an accuracy index measuring the degree of overlap and similarity between the segmentation result and the true value label. It is calculated as follows: the segmentation result and the true value label are each converted into binary images, the binary images are multiplied to obtain their intersection, and twice the number of pixels in the intersection is divided by the sum of the total numbers of pixels in the two binary images; the quotient is taken as the Dice coefficient. The Dice coefficient ranges from 0 to 1, and the closer the value is to 1, the better the quality of the segmentation result. The Dice coefficient is determined by the following expression:
Dice = 2 * Σ_i p_i g_i / (Σ_i p_i + Σ_i g_i), with the sums taken over i = 1, ..., I
The Hausdorff distance is mainly used to compare the boundary difference between the segmentation result and the true value label, and is calculated as follows: the shortest distance from each pixel point in the true value label to the segmentation result and the shortest distance from each pixel point in the segmentation result to the true value label are computed, and the larger of the two maxima of these shortest distances is taken as the Hausdorff distance. The smaller the Hausdorff distance, the smaller the boundary difference between the segmentation result and the true value label.
G' and P' denote the point sets on the surfaces of the true value label and the segmentation result, respectively, g' and p' denote points in these point sets, and HD denotes the Hausdorff distance, which is determined by the following expression:
HD(G', P') = max{ max_{g'∈G'} min_{p'∈P'} ||g' - p'|| , max_{p'∈P'} min_{g'∈G'} ||p' - g'|| }
in this embodiment, the results of testing different medical image segmentation models in two abdomen multi-organ public data sets of BTCV and AMOS 2022 and calculating the Dice coefficient and hausdorff distance are as follows:
table 1 results of the test of average Dice coefficient and hausdorff distance for each model in two data sets
As shown in the table above, consider first the evaluation by average Dice coefficient (higher is better). On the BTCV dataset, the average Dice coefficients of the Attention U-Net and Swin UNETR models are 0.832 and 0.838, respectively. On the AMOS 2022 dataset, the average Dice coefficients of nnU-Net and Swin UNETR are 0.87 and 0.876, respectively. Although Swin UNETR performs better than the other models on both datasets, its margin over the second-best model is small. Notably, the worst-performing models by Dice coefficient are Swin-UNet and U-Net, respectively, so the resulting ordering differs across datasets. For the average Hausdorff distance (lower is better), TransUNet scores lowest, i.e. performs best, on the BTCV dataset, with a clear gap to the other models. On the AMOS 2022 dataset, Swin UNETR ranks first, with a score slightly lower than that of TransUNet (a difference of 0.03), but both datasets indicate that U-Net exhibits the worst boundary similarity. However, the orderings given by the Hausdorff distance and the Dice coefficient may conflict with each other. This ambiguity in multi-index assessment leads to varying final orderings of the models, which complicates model deployment by doctors and experts in clinical practice.
In the present embodiment, a manner of directly obtaining the confidence value is mainly adopted for the reliability index. The confidence level mainly measures the reliability of the prediction result of the medical image segmentation model, namely the determination degree of the classification result of the model for each pixel point, and the calculation mode is that the maximum value of the prediction logits is taken for each pixel, which is usually a value after a Sigmoid or Softmax activation function, and then the average value of all pixels of each test sample in each channel is calculated as the confidence level value of each organ. Confidence value is represented by conf, N represents total sample number, r a A probability value representing a prediction of the a-th sample, the confidence value being determined by the following expression:
In addition, the reliability index can also be obtained indirectly; the indirect method combines the accuracy indexes, adopting two indexes, the prediction calibration error and the maximum calibration error, which together measure the calibration stability of the medical image segmentation model. They are calculated from the absolute difference between the accuracy index and the confidence: the prediction calibration error is the mean of the absolute differences, and the maximum calibration error is their maximum.
With ECE denoting the prediction calibration error, N the total number of samples, s_a the accuracy index of the a-th sample and conf_a the confidence index of the a-th sample, the prediction calibration error is determined by the following expression:
ECE = (1/N) * Σ_{a=1}^{N} |s_a - conf_a|
the maximum calibration error is denoted by MCE and is determined by the following expression:
MCE = max_a |s_a - conf_a|
in this embodiment, the results of testing different medical image segmentation models in two abdomen multi-organ public data sets of BTCV and AMOS 2022 and calculating the predicted calibration error ECE and the maximum calibration error MCE are as follows:
table 2 test results of predicted calibration error ECE and maximum calibration error MCE for each model in two data sets
As shown in the table above, in terms of ECE and MCE (lower is better), Swin UNETR shows the best level of calibration on both abdominal multi-organ datasets. In contrast, the U-Net model achieves the highest score in both ECE and MCE, indicating the worst reliability. It should be noted, however, that the ordering of the other models varies depending on whether ECE or MCE is considered and is also affected by the dataset. In addition, phenomena arise such as the V-Net model performing better than UNETR in prediction accuracy but worse than UNETR in reliability. This creates difficulties when selecting a model for clinical deployment, because a choice must be made between the accuracy and the reliability of the models, making direct comparison difficult.
The medical image segmentation model evaluation module is used for evaluating the relative merits of different medical image segmentation models. It summarizes the accuracy indexes of all models in the model test module and, using a statistical method, autonomously generates, among the models, a threshold for screening whether the segmentation result of each sample organ is clinically available. For each model to be evaluated, it establishes the rank correlation between accuracy and confidence, computes clinically acceptable confidence intervals, screens them against the generated thresholds, and uses a unified region-value-based measure to quantify the multiple accuracy indexes, confidence estimates and multi-organ segmentation results, generating a concrete and concise measurement index and providing information on the comprehensive performance and degree of clinical availability of the models.
As shown in fig. 3, in this embodiment the Dice coefficients and Hausdorff distance indexes of all models generated in the medical image segmentation model test module are summarized; for each organ segmentation result under each sample, the Bootstrapping statistical method is adopted to generate, among the models, a threshold for each accuracy index under each organ, used to screen clinically available sample organ segmentation results. The specific implementation is as follows: for each organ under the Dice coefficient or Hausdorff distance index, o samples are randomly drawn from the total test set samples, where o is less than or equal to the total number of samples, to form a new sample set; this is repeated B times to generate B new sets. Each set is evaluated with a statistic θ, producing B estimates of θ. For generality, this embodiment sets the mean as the default statistic θ. The B estimates are used to construct a new distribution, which is sorted in descending order, and the value corresponding to a percentile is selected as the threshold under that accuracy index; in this embodiment the percentile is set to 50% by default, i.e. the median of the distribution.
For each model trained in the medical image segmentation model training module in this embodiment, the multi-organ segmentation results, Dice coefficients, Hausdorff distances, confidence estimates and the generated thresholds over all test set samples are gathered; the results are sorted by confidence value, confidence intervals are computed, and screening and comparison against the thresholds generate the set of sample organs that simultaneously satisfy all threshold conditions, from which the availability/comprehensive score of each model is calculated; this score serves as the basis for comparing the comprehensive performance or the degree of clinical availability among the models;
As shown in fig. 4, the availability/comprehensive score of each model in this embodiment is calculated as follows: for each organ segmentation result of a sample, a one-to-one association between the confidence estimate and each of the Dice coefficient and Hausdorff distance indexes is established, and the results are sorted in descending order of the confidence value to obtain a new sorted set. The sorted results are traversed organ by organ until the end; when the j-th element of the set is reached, the Bootstrapping method is applied to the first j elements to compute confidence intervals for the Dice coefficient and the Hausdorff distance. Since a clinically acceptable segmentation accuracy threshold involves a 95% confidence interval, the accuracy index value corresponding to the 95th percentile of the descending-ordered distribution is selected and compared with the generated threshold. Because several different accuracy indexes are adopted, an organ segmentation result in a sample is regarded as meeting the standard only when all thresholds are satisfied simultaneously. The same statistical calculation is performed for the segmentation results of all organs, generating the set of sample organs that satisfy the threshold conditions over all samples, shown as the dark area in fig. 4; the larger this set, the better the comprehensive performance of the model. The final availability/comprehensive score of the model is the number of sample organs in this set divided by the accumulated number of organs to be segmented over all test samples, giving a value between 0 and 1; a higher value indicates better comprehensive performance or a higher degree of clinical availability.
In this example, the results of testing different medical image segmentation models in two abdominal multi-organ public datasets, BTCV and AMOS 2022, and calculating the final usability/comprehensiveness score of the models are as follows:
table 3 availability/comprehensiveness score results for each model in two datasets
As shown in the table above, the medical image segmentation model evaluation module finally generates a single, very intuitive number: the Swin UNETR model has the best availability/comprehensive score, while U-Net scores lowest, i.e. performs worst. The ranking of the remaining models may vary between datasets, but tied scores are very unlikely. This approach resolves the ambiguity among multiple indexes and between accuracy and reliability, successfully using one unified measure to quantify multiple accuracy indexes, confidence estimates and multi-organ segmentation results simultaneously, and provides insight into comprehensive performance that accuracy indexes, confidence estimates and calibration error indexes alone cannot fully describe. Finally, it yields a compact score, reducing the complexity and uncertainty of comparing and deploying models in the clinic.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims (6)

1. A region-based multi-index multi-organ medical image segmentation model evaluation system, comprising:
the data acquisition module is used for acquiring an image data set, and comprises CT image data of the same multi-organ part in a plurality of samples; randomly dividing the image data set into a training set, a verification set and a test set;
the organ region sketching module is used for sketching an organ region of interest in the acquired CT image data and is used as a true value label;
the data preprocessing module is used for preprocessing the acquired CT image data and the sketched true value label, and adopts the modes of cutting, format conversion and normalization to enable the acquired CT image data and the sketched true value label to meet the input requirements of a medical image segmentation model, so that a new data format is obtained;
the medical image segmentation model training module is used for iteratively training the medical image segmentation model with the divided training set, adjusting the model parameters according to the value of the loss function during training so that it gradually converges to an optimum, and adjusting the model parameters with the validation set to prevent the model from over-fitting during training, finally obtaining a trained medical image segmentation model;
the medical image segmentation model test module is used for inputting a test set into each trained model obtained in the medical image segmentation model training module, generating a corresponding organ segmentation result, calculating confidence coefficient estimated values of each organ under different models as reliability indexes of model prediction according to the organ segmentation result, and evaluating segmentation quality of each organ under different models by combining with the true value labels obtained by the organ region sketching module to obtain corresponding accuracy indexes of model prediction;
the medical image segmentation model evaluation module is used for evaluating the relative merit of different medical image segmentation models; it summarizes the accuracy indexes of all models generated by the medical image segmentation model test module, automatically generates, by statistics across the models, a threshold for screening whether the segmentation result of each sample organ is clinically usable, establishes a ranking correlation between accuracy and confidence for each model to be evaluated, computes clinically acceptable confidence intervals, screens with the generated thresholds, and finally uses a region-value-based measure to uniformly quantify the multiple accuracy indexes, the confidence estimates and the segmentation results of the multiple organs, producing a single concise metric that conveys both the overall performance of the model and its degree of clinical usability.
2. The region-based multi-index multi-organ medical image segmentation model evaluation system according to claim 1, wherein: the organ region delineation module manually determines and delineates the organ region of interest of each sample according to the tissue structure characteristics in the reference image.
3. The region-based multi-index multi-organ medical image segmentation model evaluation system according to claim 2, wherein: the data preprocessing module preprocesses the data according to the requirements and characteristics of the medical image segmentation model, thereby improving the performance and generalization ability of the medical image segmentation model and better solving the practical problem;
for a medical image segmentation model that can only process two-dimensional data, i.e. the model can only take two-dimensional CT images as input, the data preprocessing module performs the following operations: the three-dimensional CT image data is sliced into two-dimensional CT images, the two-dimensional CT images are converted into NumPy format, the intensity values are clipped to the range -125 to 275, each CT image is normalized, and the processed two-dimensional CT images are randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1, wherein the test set is stored in h5 format;
for a medical image segmentation model that can directly process three-dimensional data, i.e. the model can take three-dimensional images as input directly, the data preprocessing module performs the following operations: the CT values of the whole three-dimensional CT image data are normalized from their original numerical range (lower bound -1000) to the range 0 to 1, each slice is resampled to an isotropic voxel spacing of 1.0 mm during preprocessing, and the processed data is then randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1.
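By way of illustration only, the following is a minimal Python sketch of the two-dimensional preprocessing and the 8:1:1 random split described above, assuming the CT volume and label are NumPy arrays; the function names preprocess_2d and split_8_1_1 and the fixed random seed are illustrative assumptions, not part of the claimed system.

```python
import numpy as np

def preprocess_2d(volume_3d, label_3d, clip_min=-125, clip_max=275):
    """Slice a 3D CT volume into clipped, normalized 2D slices (2D branch of the preprocessing)."""
    slices = []
    for z in range(volume_3d.shape[0]):
        img = volume_3d[z].astype(np.float32)
        img = np.clip(img, clip_min, clip_max)          # clip intensities to [-125, 275]
        img = (img - clip_min) / (clip_max - clip_min)  # normalize each slice to [0, 1]
        slices.append((img, label_3d[z].astype(np.uint8)))
    return slices

def split_8_1_1(items, seed=0):
    """Randomly divide a list of samples into training/validation/test sets in a ratio of 8:1:1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test
```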
4. The region-based multi-index multi-organ medical image segmentation model evaluation system according to claim 3, wherein: the medical image segmentation model training module divides the training set obtained from the data preprocessing module into n mini-batches and trains the medical image segmentation model batch by batch; a data enhancement strategy is adopted in the training stage, including random rotations of 90, 180 and 270 degrees, random axial, sagittal and coronal flips, and random scaling; I_CT denotes the CT image data of the current batch and g denotes the corresponding multi-organ ground-truth label; for the medical image segmentation model, the training process comprises the following steps:
1) for a medical image segmentation model that requires pre-training, pre-training is performed on the large ImageNet database and the generated weights are used to initialize the model;
2) I_CT is input into the medical image segmentation model S for forward propagation; during forward propagation the data first passes through an encoder to obtain a series of feature maps and then through a decoder to obtain a segmentation result p of the same size as the input image; the segmentation result p is determined by the following formula:
p = S(I_CT)
3) the segmentation result p output by the model is compared with the ground-truth label g and the loss function L is calculated, where F denotes a function that computes the correlation coefficient between the segmentation result and the ground-truth label; L is determined by the following formula:
L = F(p, g)
4) the model parameters are updated with a back-propagation algorithm according to the gradient of the loss function; during back-propagation the medical image segmentation model automatically calculates the gradient of the loss function with respect to each parameter, and the Adam optimization algorithm is used to adaptively adjust the learning rate and update the parameters;
5) according to the Adam optimization algorithm, the model parameters are updated with the calculated gradients, and each parameter update reduces the loss function L of the model; every preset number of iterations the medical image segmentation model is tested on the validation set and its accuracy indexes are calculated; if the test result of the model does not meet the requirements, the model structure, the loss function and the hyperparameters are adjusted and the model is retrained;
6) steps 2)-5) are repeated until the loss function L of the model stabilizes or the preset number of iterations is reached; a minimal code sketch of this training loop is given below.
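By way of illustration only, the following is a minimal Python (PyTorch) sketch of the training loop in steps 2)-6), assuming S is a segmentation network, F is a loss function such as a Dice or cross-entropy loss, and train_loader/val_loader yield (I_CT, g) batches; the function name train and the optional validate_fn hook are illustrative assumptions, and the hyperparameter adjustment and retraining of step 5) are not shown.

```python
import torch

def train(S, F, train_loader, val_loader, validate_fn=None,
          max_iters=10000, val_every=500, lr=1e-4):
    """Steps 2)-6): forward pass p = S(I_CT), loss L = F(p, g), Adam update, periodic validation."""
    optimizer = torch.optim.Adam(S.parameters(), lr=lr)  # Adam adaptively adjusts the learning rate
    it = 0
    while it < max_iters:
        for I_CT, g in train_loader:
            p = S(I_CT)                     # step 2): forward propagation through encoder and decoder
            L = F(p, g)                     # step 3): loss between prediction and ground-truth label
            optimizer.zero_grad()
            L.backward()                    # step 4): back-propagation computes per-parameter gradients
            optimizer.step()                # step 5): parameter update with the computed gradients
            it += 1
            if validate_fn is not None and it % val_every == 0:
                validate_fn(S, val_loader)  # compute accuracy indexes on the validation set
            if it >= max_iters:             # step 6): stop after the preset number of iterations
                break
    return S
```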
5. The region-based multi-index multi-organ medical image segmentation model evaluation system according to claim 4, wherein: the medical image segmentation model test module tests the medical image segmentation models obtained by the medical image segmentation model training module: the test set is input into each trained model to obtain the segmentation result of each organ, and an accuracy index and a reliability index are calculated from the segmentation results;
the accuracy index is calculated by measuring the difference between the model segmentation result and the ground-truth label, and the calculation methods are divided into region-based and boundary-based; the region-based measures compare the similarity between the segmentation result and the ground-truth label, and the indexes used are the Dice coefficient and the intersection-over-union (IoU);
the Dice coefficient is an index that compares the degree of overlap between the segmentation result and the ground-truth label; it is calculated as follows: the segmentation result and the ground-truth label are each converted into a binary image, the two binary images are multiplied to obtain their intersection, and twice the number of pixels in the intersection is divided by the sum of the total numbers of pixels in the two binary images; the result is the Dice coefficient, whose value ranges from 0 to 1, and the closer the value is to 1, the better the quality of the segmentation result;
the segmentation result is denoted by p, g denotes the ground-truth label, i denotes the i-th voxel, and I denotes the total number of voxels; the Dice coefficient is determined by the following expression:
Dice = 2·Σ_{i=1}^{I} p_i g_i / (Σ_{i=1}^{I} p_i + Σ_{i=1}^{I} g_i)
the intersection-over-union IoU is used to measure the precision and accuracy of the segmentation result; it is calculated as follows: the segmentation result and the ground-truth label are each converted into a binary image, the two binary images are multiplied to obtain their intersection, their sum minus the intersection gives their union, and the number of pixels in the intersection is divided by the number of pixels in the union to obtain the IoU; its value ranges from 0 to 1, and the closer the value is to 1, the better the quality of the segmentation result;
the IoU is determined by the following expression:
IoU = Σ_{i=1}^{I} p_i g_i / (Σ_{i=1}^{I} p_i + Σ_{i=1}^{I} g_i − Σ_{i=1}^{I} p_i g_i)
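By way of illustration only, the following is a minimal Python sketch of the two region-based indexes described above, assuming binary (or thresholdable) NumPy masks; the function names dice_coefficient and iou are illustrative.

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice = 2 * |P ∩ G| / (|P| + |G|); value in [0, 1], higher is better."""
    p = (np.asarray(pred) > 0).astype(np.float64)
    g = (np.asarray(gt) > 0).astype(np.float64)
    intersection = np.sum(p * g)
    denom = np.sum(p) + np.sum(g)
    return 2.0 * intersection / denom if denom > 0 else 1.0

def iou(pred, gt):
    """IoU = |P ∩ G| / |P ∪ G|; value in [0, 1], higher is better."""
    p = (np.asarray(pred) > 0).astype(np.float64)
    g = (np.asarray(gt) > 0).astype(np.float64)
    intersection = np.sum(p * g)
    union = np.sum(p) + np.sum(g) - intersection
    return intersection / union if union > 0 else 1.0
```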
the boundary-based measure compares the boundary difference between the segmentation result and the ground-truth label; the index used is the Hausdorff distance, which is calculated as follows: the shortest distance from each point of the ground-truth label to the segmentation result and the shortest distance from each point of the segmentation result to the ground-truth label are calculated, and the larger of the two maxima of these shortest distances is taken as the Hausdorff distance; the smaller the Hausdorff distance, the smaller the boundary difference between the segmentation result and the ground-truth label;
G' and P' denote the point sets on the surfaces of the ground-truth label and the segmentation result respectively, g' and p' denote points in these sets, and HD denotes the Hausdorff distance, which is determined by the following expression:
HD(G', P') = max{ max_{g'∈G'} min_{p'∈P'} ||g' − p'||, max_{p'∈P'} min_{g'∈G'} ||p' − g'|| }
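By way of illustration only, the following is a minimal Python sketch of the Hausdorff distance between two boundary point sets, assuming the surface points have already been extracted as coordinate arrays; the brute-force pairwise distance matrix is used only for clarity and is not efficient for large point sets.

```python
import numpy as np

def hausdorff_distance(G_points, P_points):
    """HD(G', P') = max{ max_g' min_p' ||g' - p'||, max_p' min_g' ||p' - g'|| }."""
    G = np.asarray(G_points, dtype=np.float64)  # ground-truth surface points, shape (n, d)
    P = np.asarray(P_points, dtype=np.float64)  # segmentation surface points, shape (m, d)
    d = np.linalg.norm(G[:, None, :] - P[None, :, :], axis=-1)  # pairwise Euclidean distances
    forward = d.min(axis=1).max()   # largest shortest distance from G' to P'
    backward = d.min(axis=0).max()  # largest shortest distance from P' to G'
    return max(forward, backward)
```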
the medical image segmentation model test module needs to select two or more accuracy indexes;
the reliability index of the medical image segmentation model can be obtained by either a direct or an indirect calculation method;
the direct method yields the confidence index; the confidence measures how reliable the prediction of the medical image segmentation model is, i.e. how certain the model is about the classification of each pixel; it is calculated by taking, for each pixel, the maximum of the predicted logits after a Sigmoid or Softmax activation function, and then averaging over all pixels of each test sample in each channel to obtain the confidence value of each organ; the confidence value is denoted by conf, N denotes the total number of samples, and r_a denotes the predicted probability value of the a-th sample; the confidence value is determined by the following expression:
conf = (1/N) Σ_{a=1}^{N} r_a
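By way of illustration only, the following is a minimal Python (PyTorch) sketch of the direct confidence estimate for one test sample, assuming logits with one channel per organ; averaging the per-pixel maximum probability over the pixels assigned to each organ is one plausible reading of the per-channel averaging described above, and the function name organ_confidence is illustrative. Averaging the resulting per-sample values r_a over the N test samples then gives conf as in the expression above.

```python
import torch

def organ_confidence(logits):
    """Per-pixel max of softmax probabilities, averaged per predicted organ channel."""
    probs = torch.softmax(logits, dim=0)   # per-pixel class probabilities, shape (C, ...)
    max_prob, pred = probs.max(dim=0)      # per-pixel confidence and predicted organ label
    conf = []
    for c in range(probs.shape[0]):
        mask = pred == c                   # pixels assigned to organ c
        conf.append(max_prob[mask].mean().item() if mask.any() else float("nan"))
    return conf                            # one confidence value per organ channel
```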
the indirect method is calculated in combination with the accuracy indexes; the two indexes adopted are the expected calibration error and the maximum calibration error, which together measure the stability of the medical image segmentation model; both are calculated from the absolute difference between the accuracy index and the confidence: the expected calibration error is the average of these absolute differences, and the maximum calibration error is their maximum;
ECE denotes the expected calibration error, N denotes the total number of samples, s_a denotes the accuracy index of the a-th sample, and conf_a denotes the confidence index of the a-th sample; the expected calibration error is determined by the following expression:
ECE = (1/N) Σ_{a=1}^{N} |s_a − conf_a|
the maximum calibration error is denoted by MCE and is determined by the following expression:
MCE = max_{a∈{1,…,N}} |s_a − conf_a|
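By way of illustration only, the following is a minimal Python sketch of the indirect reliability indexes, assuming per-sample accuracy values (e.g. Dice) and per-sample confidence estimates are already available; the function name calibration_errors is illustrative.

```python
import numpy as np

def calibration_errors(accuracy, confidence):
    """ECE = mean_a |s_a - conf_a|, MCE = max_a |s_a - conf_a| over the N test samples."""
    s = np.asarray(accuracy, dtype=np.float64)    # per-sample accuracy index s_a
    c = np.asarray(confidence, dtype=np.float64)  # per-sample confidence conf_a
    diff = np.abs(s - c)
    return diff.mean(), diff.max()                # (ECE, MCE)
```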
in the medical image segmentation model test module, the confidence value output by the model itself, i.e. the direct calculation method, is preferentially selected as the reliability index.
6. The region-based multi-index multi-organ medical image segmentation model evaluation system according to claim 5, wherein: the medical image segmentation model evaluation module measures the overall performance of the models produced by the medical image segmentation model test module, specifically as follows:
the accuracy indexes of all models generated by the medical image segmentation model test module are summarized; for the segmentation result of each organ under each sample, the Bootstrapping statistical method is applied across the models to generate, for each accuracy index under each organ, a threshold used to screen clinically usable sample-organ segmentation results;
Bootstrapping refers to the bootstrap method, a computer-based statistical inference method that does not require the data to follow a specific distribution; it draws a preset number of samples from the existing data and then infers sample characteristics that better match the actual distribution through statistical analysis of those samples; its core idea is to resample the existing limited samples, repeatedly drawing smaller sample sets at random and processing each of them, so as to construct a sample distribution that better matches reality and can be used for inference;
the specific implementation of obtaining the accuracy index threshold by Bootstrapping in the medical image segmentation model evaluation module is as follows: for each organ under each accuracy index, o samples are randomly drawn from the whole test set, where o is less than or equal to the total number of samples, to form a new sample set; this is repeated B times to generate B new sets; each set is evaluated with a statistic θ, producing B estimates of θ; the medical image segmentation model evaluation module sets the mean as the default statistic θ; the B estimates are used to construct a new distribution, which is sorted in descending order, and the value corresponding to a chosen percentile is selected as the threshold under that accuracy index, where the percentile defaults to 50%, i.e. the median of the distribution;
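By way of illustration only, the following is a minimal Python sketch of the Bootstrapping threshold generation for one accuracy index of one organ, assuming the per-sample accuracy values pooled across models are given as an array; the function name bootstrap_threshold and the default values of o, B and the percentile are illustrative, with the mean as the default statistic θ and the median (50th percentile) as the default threshold.

```python
import numpy as np

def bootstrap_threshold(scores, o=None, B=1000, percentile=50.0, seed=0):
    """Resample o values with replacement B times, take the mean of each resample,
    and return the chosen percentile of the B bootstrap estimates as the threshold."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=np.float64)
    o = len(scores) if o is None else min(o, len(scores))
    estimates = np.array([rng.choice(scores, size=o, replace=True).mean() for _ in range(B)])
    return float(np.percentile(estimates, percentile))  # 50% by default, i.e. the median
```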
for each model, the multi-organ segmentation results, the multiple accuracy indexes, the confidence estimates and the generated thresholds under each accuracy index are summarized over all test-set samples; sorting and calculation are performed, confidence intervals are computed, screening and comparison are carried out against the thresholds, and the set of sample organs that simultaneously satisfy all threshold conditions is generated; the availability/comprehensiveness score of the model is then calculated, and this score serves as the basis for comparing the overall performance or the degree of clinical usability among multiple models;
according to the results generated by the medical image segmentation model test module, every test-set sample under a model has a corresponding accuracy index and confidence estimate for each organ; the availability/comprehensiveness score of a model is calculated in the medical image segmentation model evaluation module as follows: for each organ segmentation result of a sample, a one-to-one association is established between the different accuracy indexes and the confidence estimate, and the results are sorted in descending order of the confidence value to obtain a new ordered set; the ordered results are traversed to the end, and when the j-th element of the set is reached, the Bootstrapping technique is reused on the first j elements of the set to calculate the confidence interval of each accuracy index; since a clinically acceptable segmentation accuracy threshold involves a 95% confidence interval, the accuracy index value corresponding to the 95th percentile is selected when choosing the percentile, and this value is compared with the generated threshold; within one sample, an organ segmentation result is counted only if all of its accuracy indexes satisfy the threshold conditions; the same statistical calculation is performed for the organ segmentation results of every sample, generating the set of sample organs that satisfy all threshold conditions, and the larger this set, the better the overall performance of the model; the final availability/comprehensiveness score is obtained by dividing the number of sample organs in this set by the total number of sample organs, so the score lies between 0 and 1, and the higher the score, the better the overall performance of the model and the higher its degree of clinical usability.
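By way of illustration only, the following is a minimal Python sketch of the final counting step of the availability/comprehensiveness score, assuming the confidence-ranked traversal and the bootstrap confidence-interval screening described above have already decided, for every sample organ, whether each accuracy index satisfies its threshold; the function name availability_score and the dictionary layout are illustrative.

```python
def availability_score(results, thresholds):
    """Fraction of sample organs whose accuracy indexes all satisfy the thresholds (0 to 1)."""
    # results: one dict per sample organ, e.g. {"dice": 0.91, "iou": 0.85}
    # thresholds: dict with the same keys, e.g. {"dice": 0.80, "iou": 0.70}
    passed = sum(
        all(r[name] >= thr for name, thr in thresholds.items())
        for r in results
    )
    return passed / len(results) if results else 0.0
```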
CN202310899309.8A 2023-07-21 2023-07-21 Multi-index multi-organ medical image segmentation model evaluation system based on region Pending CN117115437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310899309.8A CN117115437A (en) 2023-07-21 2023-07-21 Multi-index multi-organ medical image segmentation model evaluation system based on region

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310899309.8A CN117115437A (en) 2023-07-21 2023-07-21 Multi-index multi-organ medical image segmentation model evaluation system based on region

Publications (1)

Publication Number Publication Date
CN117115437A true CN117115437A (en) 2023-11-24

Family

ID=88804621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310899309.8A Pending CN117115437A (en) 2023-07-21 2023-07-21 Multi-index multi-organ medical image segmentation model evaluation system based on region

Country Status (1)

Country Link
CN (1) CN117115437A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576076A (en) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 Bare soil detection method and device and electronic equipment

Similar Documents

Publication Publication Date Title
Xian et al. Automatic breast ultrasound image segmentation: A survey
US10467757B2 (en) System and method for computer aided diagnosis
US20070165916A1 (en) Automatic multi-dimensional intravascular ultrasound image segmentation method
Qian et al. An integrated method for atherosclerotic carotid plaque segmentation in ultrasound image
CN111105424A (en) Lymph node automatic delineation method and device
Li et al. Automated measurement network for accurate segmentation and parameter modification in fetal head ultrasound images
CN107766874B (en) Measuring method and measuring system for ultrasonic volume biological parameters
CN111768366A (en) Ultrasonic imaging system, BI-RADS classification method and model training method
US11972571B2 (en) Method for image segmentation, method for training image segmentation model
WO2021136368A1 (en) Method and apparatus for automatically detecting pectoralis major region in molybdenum target image
JP2013051988A (en) Device, method and program for image processing
CN109919254B (en) Breast density classification method, system, readable storage medium and computer device
US11684333B2 (en) Medical image analyzing system and method thereof
CN114782307A (en) Enhanced CT image colorectal cancer staging auxiliary diagnosis system based on deep learning
Yao et al. Advances on pancreas segmentation: a review
CN117115437A (en) Multi-index multi-organ medical image segmentation model evaluation system based on region
CN112508902A (en) White matter high signal grading method, electronic device and storage medium
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
Kitrungrotsakul et al. Interactive deep refinement network for medical image segmentation
Chen et al. Pulmonary nodule segmentation in computed tomography with an encoder-decoder architecture
Jose et al. Liver Tumor Classification using Optimal Opposition-Based Grey Wolf Optimization
Chen et al. An Improved Region-based Fully Convolutional Network for Automatic Pulmonary Nodules Detection
WO2019210124A1 (en) Atlas for segmentation of retina layers from oct images
CN114445421B (en) Identification and segmentation method, device and system for nasopharyngeal carcinoma lymph node region
CN112750137B (en) Liver tumor segmentation method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination