CN114119516A

CN114119516A - Virus focus segmentation method based on transfer learning and cascade adaptive hole convolution

Info

Publication number: CN114119516A
Application number: CN202111343978.4A
Authority: CN
Inventors: 蒋宗礼; 王少猛; 张津丽; 顾问
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-11-14
Filing date: 2021-11-14
Publication date: 2022-03-01

Abstract

The invention discloses a virus lesion segmentation method based on transfer learning and cascaded adaptive dilated (hole) convolution, in which a new module and a new framework are designed to segment CT images. First, a cascade structure based on three-dimensional convolution is applied: the first-level structure performs coarse segmentation on the whole image to preserve its integrity, and the second-level structure combines the coarse segmentation to obtain an accurate segmentation result. Within the second-level structure, a parallel multi-scale feature extraction block is introduced to improve the adaptability of the system to lesions of different sizes. Finally, transfer learning is used to mitigate overfitting. The main contribution of the invention is a multi-scale parallel dilated convolution (MPDC) block, which both reduces the amount of computation and makes the model adaptive to lesion size. Tests on two data sets show that the virus lesion extraction model designed by the invention performs better than the compared existing models.

Description

Virus focus segmentation method based on transfer learning and cascade adaptive hole convolution

Technical Field

The invention relates to the technical field of medical image segmentation, and in particular to a method for automatic segmentation of virus lesions in CT images based on transfer learning, a coarse-to-fine segmentation cascade, and adaptive dilated (hole) convolution.

Background

Reverse transcription polymerase chain reaction (RT-PCR, i.e. nucleic acid detection) is the most critical indicator for diagnosing viral infection. However, because sampling methods and patients' disease conditions differ, nucleic acid detection suffers from low sensitivity and a high false negative rate. Several studies have shown that chest imaging, such as X-ray and computed tomography (CT) scans, is more sensitive and accurate for diagnosing viral infection in patients. Although X-rays are more readily available, only CT scans show the location of specific lesions in the lung and accurately distinguish ground-glass opacity from lung consolidation. CT is therefore an indispensable method for clinical diagnosis of viral infection. Reading and diagnosing CT requires extensive medical expertise and experience. Radiologists can distinguish viral pneumonia from general inflammation by observing features of CT images. However, manual evaluation of large numbers of CT images is time-consuming and inefficient. An accurate automatic diagnosis system is therefore needed to improve the speed and efficiency of diagnosis.

With the rise of deep learning, neural networks play a key role in computer-aided clinical diagnosis and lesion analysis. Medical image segmentation covers many tasks and data types, such as brain tumor segmentation, liver tumor segmentation, cell segmentation, and lung segmentation. As data sets have become larger and more widely available, many deep learning models have been proposed. U-Net is the most classical basic model, using a symmetric structure and skip connections to fuse low-level and high-level image features. Oktay et al. proposed Attention U-Net for pancreas segmentation, which uses attention blocks to fuse features from different levels and improve segmentation accuracy. Zhou et al. proposed UNet++, which adds dense connections to the classical U-Net to aggregate multi-scale features. Isensee et al. proposed nnU-Net, an adaptive model that can process both two-dimensional and three-dimensional images. To process three-dimensional volumetric data such as CT and magnetic resonance imaging (MRI), some classical models, such as 3D U-Net and V-Net, apply three-dimensional convolution and process the volume directly without slicing it. The advantage of processing three-dimensional data directly is that high-dimensional spatial features in the data can be extracted, which improves segmentation accuracy.

Recently, many deep learning methods have been applied to the automatic classification or segmentation of viral medical images. For example, Ozturk et al. constructed the DarkCovidNet model for detecting and classifying virus cases from X-ray images. Wang et al. developed a weakly supervised framework for classification and lesion localization on CT image slices. Gao et al. proposed a dual-branch combination network that accomplishes both slice-level segmentation and individual-level classification. Fan et al. implemented a semi-supervised lesion segmentation model (Semi-Inf-Net) that segments the lesions and successfully distinguishes ground-glass opacity from lung consolidation. Yan et al. used a feature variation block to enhance feature representation and then fused features through progressive atrous spatial pyramid pooling blocks, completing lung segmentation and lesion segmentation at the same time.

Processing CT images with two-dimensional convolution is computationally inexpensive, but it cannot extract high-dimensional spatial features. Two-dimensional segmentation therefore essentially ignores the three-dimensional structure of the lung, which leads to inaccurate localization of lung lesions. Three-dimensional convolution makes up for this shortcoming, but it has two drawbacks of its own: the number of parameters is multiplied, and the volume data has to be cut into small blocks to reduce memory consumption. As a result, the system is prone to overfitting and can lose the integrity of the CT image. Finally, the most common problem to be solved is the lack of correctly annotated data.

To solve the above problems, a new module and a new framework are designed to segment the CT image. First, a cascade structure based on three-dimensional convolution is applied: the first-level structure performs coarse segmentation on the whole image to preserve its integrity, and the second-level structure combines the coarse segmentation to obtain an accurate segmentation result. Within the second-level structure, a parallel multi-scale feature extraction block is introduced to improve the adaptability of the system to lesions of different sizes. Finally, transfer learning is used to mitigate overfitting.

Disclosure of Invention

The invention aims to realize a high-accuracy virus focus segmentation system based on a convolutional neural network, which is used for realizing automatic diagnosis and segmentation of CT images and solving the problems mentioned in the background technology.

To achieve this purpose, the invention adopts the following technical scheme: an automatic virus lesion extraction method based on a convolutional neural network, comprising the following steps:

step S1: a multi-scale parallel dilated (hole) convolution (MPDC) block is designed based on CNN to extract features, so that multi-scale adaptation to medical image features is realized;

step S2: constructing an encoder and a decoder based on the MPDC block designed in the step S1 to realize feature extraction and upsampling;

step S3: the designed module is used for carrying out overall design on the implementation process, so that the CT image is segmented, and the result is obtained.

The dashed box in Fig. 1 is the multi-scale parallel feature extraction module designed by the present invention. According to the characteristics of virus lesions in CT images, the module extracts features with different receptive fields in parallel. This design extracts multi-scale features and adapts well to lesions of different sizes. As shown in Fig. 1, the whole system is built from three-dimensional convolutions so that volume data can be processed directly. A parallel multi-branch convolution structure is designed so that the system adapts to lesion features of different sizes in CT images. The resulting MPDC block combines the advantages of multi-scale convolution while reducing computational cost. As can be seen in Fig. 1, the MPDC block consists of four parallel branches. The first branch (shown in red) is a 3 × 3 × 3 max pooling layer, which retains the most salient features of the image; the second branch is an ordinary 3 × 3 × 3 convolution layer, which extracts small-scale features; the third and fourth branches are dilated (hole) convolution layers with dilation rates of 4 and 8, respectively, whose purpose is to enlarge the receptive field of the convolution without increasing the amount of computation, so as to extract larger-scale features. The four parallel branches process the feature map from the previous layer simultaneously, and the outputs of all branches are then concatenated along the channel dimension. After concatenation, a 1 × 1 × 1 convolution layer reduces the amount of computation. Connecting the four branches in parallel lets the module adapt automatically to features of different sizes. The introduction of dilated convolution and the 1 × 1 × 1 convolution greatly reduces the model parameters and the amount of computation, so that the method places lower demands on hardware and facilities.
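The trade-off behind the dilated branches can be checked with simple arithmetic. The following is a minimal sketch, not the patented implementation: it assumes stride 1 and uses hypothetical channel counts (`IN_CH`, `OUT_CH`) chosen only for illustration. It shows that a 3 × 3 × 3 kernel with dilation rate 4 or 8 spans 9 or 17 voxels per axis while keeping the 27 weights per channel pair of an ordinary 3 × 3 × 3 kernel.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Extent per axis covered by a dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv3d_weights(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weight count of a 3D convolution (bias ignored); dilation adds no weights."""
    return in_ch * out_ch * kernel ** 3


def same_padding(kernel: int, dilation: int) -> int:
    """Zero padding per side that keeps the spatial size unchanged at stride 1."""
    return dilation * (kernel - 1) // 2


# Hypothetical channel counts, not taken from the patent text.
IN_CH, OUT_CH = 64, 64

for name, dilation in [("ordinary 3x3x3", 1), ("dilation 4", 4), ("dilation 8", 8)]:
    extent = effective_kernel(3, dilation)
    dilated_cost = conv3d_weights(IN_CH, OUT_CH, 3)
    dense_cost = conv3d_weights(IN_CH, OUT_CH, extent)  # dense kernel of equal extent
    print(name, extent, same_padding(3, dilation), dilated_cost, dense_cost)
```

A dense kernel covering the same 9-voxel extent would need 729 weights per channel pair instead of 27, which is the saving the description refers to.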

As a further solution of the present invention, the designed MPDC block is used to construct the main architecture of the model, as shown in fig. 1 as a whole:

The classical architecture for feature extraction and segmentation, namely the encoder-decoder architecture, is used. In the specific implementation, four MPDC blocks are connected in series to form an encoder that extracts the features of the CT image. A 2 × 2 × 2 max pooling layer is applied after each of the first three MPDC blocks to downsample the feature map, reducing the data size while extracting higher-level features. Correspondingly, the decoder comprises three groups of convolutions, each consisting of an ordinary 3 × 3 × 3 convolution and an upsampling module in series. The three upsampling layers in the decoder restore the feature map to the size of the original image. In addition, skip connections feed the output of each MPDC block directly to the decoder. Skip connections not only alleviate vanishing or exploding gradients but also fuse high-level and low-level features of the CT image, improving the accuracy of the system. Based on experimental verification, the output channels of the four MPDC blocks are set to 64, 128, 256 and 512, respectively.
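The feature-map sizes implied by this encoder-decoder layout can be traced as follows. This is an illustrative sketch under stated assumptions: each MPDC block preserves the spatial size, each pooling layer halves every axis, each upsampling layer doubles it, and the input is a 32 × 128 × 128 patch as used later for FineNet. The function names are hypothetical.

```python
def encoder_shapes(size, channels=(64, 128, 256, 512)):
    """Return (channels, (D, H, W)) at the output of each MPDC block."""
    depth, height, width = size
    outputs = []
    for index, ch in enumerate(channels):
        outputs.append((ch, (depth, height, width)))
        # 2x2x2 max pooling follows the first three blocks only.
        if index < len(channels) - 1:
            depth, height, width = depth // 2, height // 2, width // 2
    return outputs


def decoder_output_size(bottleneck_size, num_upsamplings=3, factor=2):
    """Spatial size after the decoder's upsampling layers (factor 2 is assumed)."""
    return tuple(s * factor ** num_upsamplings for s in bottleneck_size)


stages = encoder_shapes((32, 128, 128))
for ch, size in stages:
    print(ch, size)
print(decoder_output_size(stages[-1][1]))
```

The four encoder outputs are also the tensors that the skip connections hand to the decoder, so their sizes must match the decoder stage they are fused with.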

As a further scheme of the invention, the whole system framework is designed according to the actual situation:

In practice, a CT image is three-dimensional and has a large data volume, so common hardware cannot process a whole three-dimensional CT image directly. The CT image therefore has to be cut into small patches for processing. However, cutting the CT image into patches easily loses its global information. Step S3 of the present invention designs a cascaded framework that captures the global information of the CT image while reducing the hardware requirements. As shown in Fig. 2, a cascade network is designed: the first network, CoarseNet, performs coarse segmentation on the low-resolution image, and the second network, FineNet, performs fine segmentation based on the result of the coarse segmentation network. The overall framework of the system is as follows:

{O₁, O₂, ..., O_n} denotes the normalized CT images at the original resolution. First, each high-resolution image O_i is down-sampled to one eighth of the original resolution, and the resulting low-resolution image is used as training data for CoarseNet. Second, the low-resolution image is input to CoarseNet to extract a low-resolution segmentation map of the viral lesions. Then, a cascade attention block takes the high-resolution image patch O'_i and the coarse segmentation patch U'_i as input and learns a new feature map A_i. Two design details need attention. First, U_i in the figure is obtained by up-sampling the low-resolution segmentation map by a factor of eight to the original resolution. Second, O_i and U_i are each cut into patches to obtain O'_i and U'_i. Finally, the output A_i of the cascade attention block is input to FineNet to obtain the final accurate segmentation map S_i of the lung infection area. CoarseNet and FineNet are the core components of the overall segmentation system, and both adopt the classical encoder-decoder architecture. There are many ways of combining coarse and fine segmentation; here, a cascade attention block is specially designed to connect the two parts so that the coarse and fine segmentation are fully fused. The design of the cascade attention block is described in detail below:

In the designed model, CoarseNet is used to extract global features from low-resolution CT images. The resulting coarse segmentation maps help filter out background information and focus on the target area. To make better use of the coarse segmentation, an attention block is employed to fuse the cascaded information. The implementation of the cascade attention block is described next. O'_i denotes a patch of the original CT image, and U'_i denotes the corresponding patch of the coarse segmentation map.

C_i＝ReLU(BN(Conv(O_i'))) (1)

F_i＝ReLU(BN(Conv(U_i'))) (2)

Here the operator symbols used in the formulas denote, respectively, the sigmoid activation function, concatenation along the channel dimension, and the Hadamard product, while ReLU, BN and Conv denote the ReLU activation function, batch normalization, and a 1 × 1 × 1 convolution layer, respectively. In the implementation, formulas (1) and (2) each use 64 1 × 1 × 1 convolution kernels, so that C_i and F_i each have 64 channels; formula (3) then uses one convolution block to obtain the coefficient matrix Coe, whose channel dimension is 1.

Drawings

FIG. 1 is the network architecture of FineNet, in which the dashed box is the multi-scale parallel dilated (hole) convolution (MPDC) block.

Fig. 2 is the overall architecture of the present invention.

FIG. 3 is a comparison of the segmentation results of the model of the present invention and other models.

Detailed Description

The technical scheme and the implementation details in the specific implementation process of the invention are clearly and completely described below with reference to the attached drawings. Obviously, the described implementation examples are only a part of implementation examples of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of this experiment is as follows:

step S1: a data set is constructed. The publicly available data set COVID-CT-MD, published on 20 April 2021, is used. The data set contains three types of CT samples: 169 confirmed virus cases, 60 common pneumonia cases, and 76 normal cases. All CT scan slices are in Digital Imaging and Communications in Medicine (DICOM) format. The data set provides no segmentation labels for lesions. Therefore, radiologists were asked to manually annotate the pneumonia lesions in 10 virus cases and the entire lungs in all normal cases.

Step S2: data preprocessing. The Hounsfield Unit (HU) values of the raw CT images range on average from -1024 to 1600, which is too wide for image processing. In the medical imaging standard, the HU value of the lung region is generally considered to lie between -1000 and -700, and the HU value of the virus lesion region is observed to lie between -700 and 100. Therefore, the HU range is narrowed by clipping the HU values of the pixels of the original CT image to a minimum of -1000 and a maximum of +400. In addition, the intensity distribution of the CT image is crucial to segmentation, and variation in its range strongly affects segmentation accuracy. Therefore, all CT image intensity values are normalized to the range 0 to 1, which is better suited to training neural network models. The HU clipping and normalization can be expressed by the following formula, where c_i denotes the intensity value of a pixel in the original CT image and p_i denotes the clipped and normalized pixel value:
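The clipping and scaling described in this step can be sketched as below. This is a minimal illustration that assumes plain min-max scaling over the clipped range [-1000, 400]; the function name `normalize_hu` is hypothetical, and the exact form of the patent's own formula may differ.

```python
HU_MIN, HU_MAX = -1000.0, 400.0  # clipping bounds stated in the description


def normalize_hu(c, hu_min=HU_MIN, hu_max=HU_MAX):
    """Clip an HU value c to [hu_min, hu_max] and scale it to [0, 1].

    Assumption: min-max scaling, p = (clip(c) - hu_min) / (hu_max - hu_min).
    """
    clipped = min(max(c, hu_min), hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Values outside the window saturate at 0 or 1.
print([normalize_hu(c) for c in (-1024, -1000, -300, 400, 1600)])
```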

The voxel spacing of CT volume data is anisotropic, which is unfavorable for the system. Therefore, the voxel spacing in all dimensions is first resampled to the same value to reduce complexity during training. For simplicity, all volume data is resampled to a fixed voxel spacing of 1 × 1 × 1 mm. After resampling, the in-plane resolution of the CT images is 512 × 512 and the average number of slices is 150. The original CT image is then down-sampled to one eighth using trilinear interpolation to obtain a low-resolution CT image, which retains complete context information while occupying less memory. After down-sampling, zero padding is used to ensure that all low-resolution CT images have size 24 × 64 × 64, and the padded low-resolution images are used as input to CoarseNet. When training FineNet, on the other hand, loading the complete image directly consumes too much memory. The original CT image therefore needs to be cut into smaller volumes so that the designed system can run on all computer devices. In practice, each CT scan is cut into small volumes of 32 × 128 × 128, and zero padding is likewise used to keep the patch size consistent. For most virus cases, a size of 32 × 128 × 128 covers the horizontal view of the lesion, which facilitates the extraction of lesion features.
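The size bookkeeping in this step can be verified with a short calculation. The sketch below makes several assumptions that the text does not state: fractional sizes are rounded up after down-sampling, padding is counted per axis as a total, and patches are cut on a non-overlapping grid. The function names are hypothetical.

```python
import math


def downsampled_shape(shape, factor=8):
    """Shape after down-sampling every axis by `factor` (rounding up is assumed)."""
    return tuple(math.ceil(s / factor) for s in shape)


def padding_needed(shape, target):
    """Total number of zero voxels to add per axis to reach `target`."""
    if any(s > t for s, t in zip(shape, target)):
        raise ValueError("volume is larger than the target size")
    return tuple(t - s for s, t in zip(shape, target))


def patch_grid(shape, patch):
    """Number of non-overlapping patches per axis; the last one is zero-padded."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, patch))


volume = (150, 512, 512)                      # average case from the description
low_res = downsampled_shape(volume)           # CoarseNet input before padding
pad = padding_needed(low_res, (24, 64, 64))   # zero padding for CoarseNet
grid = patch_grid(volume, (32, 128, 128))     # FineNet patches
print(low_res, pad, grid, grid[0] * grid[1] * grid[2])
```

For the average 150-slice scan this gives a 19 × 64 × 64 low-resolution volume padded by 5 slices, and 80 FineNet patches, of which the last layer along the slice axis contains only 22 real slices.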

Step S3: data augmentation. Only 10 labeled cases are available. To obtain good performance with so little labeled data, data augmentation is applied to create more data with plausible patterns. Two types of augmentation are implemented: spatial transformations such as rotation and mirroring, and the addition of Gaussian noise. The augmentation is applied to each small volume before it is input to the neural network. As a further implementation detail, because the number of labeled positive cases is small, every augmentation method is applied to all labeled CT images. Augmentation reduces the likelihood of overfitting and improves the performance of the model.
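The two kinds of augmentation can be illustrated on a small volume stored as nested lists. This is a sketch only: the 90-degree in-plane rotation, the noise level `sigma`, and the clamping of noisy values to [0, 1] are assumptions, since the description does not give angles or noise parameters.

```python
import random


def mirror(volume):
    """Flip every slice left to right."""
    return [[row[::-1] for row in sl] for sl in volume]


def rotate90(volume):
    """Rotate every slice by 90 degrees clockwise in its own plane."""
    return [[list(col) for col in zip(*sl[::-1])] for sl in volume]


def add_gaussian_noise(volume, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise to normalized intensities, clamped to [0, 1]."""
    rng = rng or random.Random()
    return [
        [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in sl]
        for sl in volume
    ]


patch = [[[0.1, 0.2], [0.3, 0.4]]]  # one 2x2 slice, values already in [0, 1]
print(mirror(patch))
print(rotate90(patch))
print(add_gaussian_noise(patch, rng=random.Random(0)))
```

Spatial transformations have to be applied identically to the image patch and its label patch, whereas noise is added to the image only.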

Step S4: network training. To further improve the generalization ability of the designed model and avoid overfitting, a network trained by transfer learning is applied to address the poor generalization caused by insufficient training data. In the experiment, the 76 normal-case CT images with lung annotations are used to pre-train the model. Then an identical network is initialized with the pre-trained weights. Finally, with the encoder frozen, the decoder is retrained using the virus images with lesion annotations.

All pre-training and transfer training is performed with PyTorch on a server equipped with 2 NVIDIA RTX 2080 Ti GPUs. CoarseNet and FineNet are trained separately to reduce the complexity of training and the memory requirement of the device. First, CoarseNet is pre-trained and fine-tuned with a batch size of 32, for 2000 rounds each, to obtain the optimal result. Then FineNet is pre-trained and fine-tuned with a batch size of 6, for 300 rounds each. The Adam optimizer is used, with an initial learning rate of 1e-3 in the pre-training phase and 1e-4 in the fine-tuning phase. Segmentation has to take into account both the overall shape of the lesion and its pixel-level details, so a joint loss consisting of cross-entropy loss and Dice loss is employed in training. Analysis of the labeled data shows that in normal CT images background and lung account for 86% and 14% of the volume, respectively, which is not a severe imbalance. In contrast, the virus images show a severe class imbalance, with a background-to-lesion ratio of 87:1. To compensate for this imbalance, weighted cross-entropy is used during transfer-learning fine-tuning to balance positive and negative samples.
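A joint loss of this kind can be sketched on flat lists of predicted lesion probabilities and binary labels. The details below are assumptions rather than the patent's definition: the two terms are simply added with equal weight, the positive-class weight `pos_weight` is set to 87 to mirror the 87:1 ratio, and a smoothing term of 1 is used in the Dice loss.

```python
import math


def weighted_cross_entropy(probs, labels, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy in which lesion (label 1) terms are scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum of both masks + smooth)."""
    overlap = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * overlap + smooth) / (sum(probs) + sum(labels) + smooth)


def joint_loss(probs, labels, pos_weight=87.0):
    """Assumed combination: weighted cross-entropy plus Dice loss, equal weights."""
    return weighted_cross_entropy(probs, labels, pos_weight) + dice_loss(probs, labels)


labels = [1, 0, 0, 0]
print(joint_loss([0.9, 0.1, 0.1, 0.1], labels))
print(joint_loss([0.1, 0.1, 0.1, 0.1], labels))  # missed lesion voxel costs far more
```

With the positive weight in place, missing a lesion voxel is penalized much more heavily than a false alarm on a background voxel, which is the balancing effect the description attributes to weighted cross-entropy.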

Step S5: model testing. The labeled data set consists of the 10 manually annotated virus cases from COVID-CT-MD, divided in a 4:3:3 ratio for training, validation and testing, respectively. In addition, to verify the robustness and generalization of the model, another public data set, virus-CT-Seg, is used as an extended test. At test time, the CT image only needs to be normalized according to formula (5) and then input directly into the system, which outputs the virus lesion segmentation map after a short time.

FIG. 3 compares the segmentation results of the model designed herein with those of other existing models. In FIG. 3, (1), (2), (3) and (4) are different horizontal slice views of CT images. (a), (b), (c), (d), (e), (f) and (g) are, respectively, the ground-truth label, the segmentation map of the model of the invention, and the segmentation maps obtained from five models: 2D U-Net, V-Net, 3D U-Net, 3D Attention U-Net and LCOV-Net. The figure shows that the segmentation map of the viral lesions obtained by the model of the invention is very close to the ground-truth label. In contrast, the other methods produce many over-segmented regions, which indicates that the model of the invention performs better in segmenting viral lesions. As can be seen from cases (1) and (4), 2D U-Net marks regions outside the lung as lesions, which is a serious error and unacceptable in diagnosis. The reason is that a two-dimensional segmentation method cannot extract the critical three-dimensional information about the lung boundary. Cases (2) and (3) show that the other three-dimensional segmentation methods predict many false positive pixels. Analysis suggests this is because the other methods cannot adapt to the size of the lesion area and do not preserve the global features of the CT image when it is cut into patches. This demonstrates the effectiveness of the MPDC block and the cascade structure of the invention.

The segmentation problem is a binary classification of pixels: lesion pixels are labeled 1 and background pixels are labeled 0. Precision, recall, the Dice coefficient and intersection over union (IoU) are used as evaluation criteria during testing. Precision is the probability that a pixel predicted as lesion is actually a lesion pixel. Recall is the probability that an actual lesion pixel is predicted as lesion. Precision and recall are each one-sided measures, whereas the Dice coefficient and IoU are more comprehensive indicators.

Here TP, FP, TN and FN denote the numbers of true positive, false positive, true negative and false negative pixels in the predicted image, respectively.
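The four criteria follow from the confusion counts by their standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), Dice = 2TP / (2TP + FP + FN), and IoU = TP / (TP + FP + FN). The sketch below computes them for flat binary masks; the function names and the convention of returning 0 for an empty denominator are assumptions made for illustration.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for two binary masks given as flat 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn


def segmentation_metrics(pred, truth):
    """Precision, recall, Dice coefficient and IoU of the lesion class."""
    tp, fp, _tn, fn = confusion_counts(pred, truth)

    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for empty masks

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


print(segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

True negatives do not enter any of the four criteria, which is why they remain informative even when background pixels vastly outnumber lesion pixels.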

The main innovation of the method is the design of a multi-scale parallel dilated (hole) convolution (MPDC) block, which reduces the amount of computation and makes the model adaptive to lesion size. Tests on two data sets show that the virus lesion extraction model designed by the invention performs better than the compared existing models.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein; any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; the description is written this way for clarity only. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims

1. A virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution, characterized by comprising the following steps:

step S1: a multi-scale parallel dilated (hole) convolution (MPDC) block is designed based on CNN to carry out feature extraction, so that multi-scale adaptation to medical image features is realized;

step S3: the overall implementation process is designed using the designed MPDC block, so that the CT image is segmented and the result is obtained.

2. The virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution of claim 1, wherein three-dimensional convolution is used to design a parallel multi-branch convolution that adapts to CT image lesion features of different sizes; the MPDC block consists of four parallel branches; the first branch is a 3 × 3 × 3 max pooling layer, which retains the most salient features of the image; the second branch is an ordinary 3 × 3 × 3 convolution layer, which extracts small-scale features of the image; the third and fourth branches are dilated (hole) convolution layers with dilation rates of 4 and 8, respectively, whose purpose is to enlarge the receptive field of the convolution without increasing the amount of computation, so as to extract larger-scale features of the image; the four parallel branches simultaneously process the feature map input from the previous layer, and the outputs of all branches are then concatenated along the channel dimension; after concatenation, the amount of computation is reduced by a 1 × 1 × 1 convolution layer; the MPDC block adapts automatically to features of different sizes by connecting the four branches in parallel; the introduction of dilated convolution and the 1 × 1 × 1 convolution reduces the model parameters and the amount of computation.

3. The virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution of claim 1 or 2, wherein the MPDC blocks are used in the classical structure for feature extraction and segmentation, namely the encoder-decoder structure; in the specific implementation, four MPDC blocks are connected in series to form an encoder that extracts the features of the CT image; a 2 × 2 × 2 max pooling layer is applied after each of the first three MPDC blocks to downsample the feature map, reducing the data size of the image while extracting higher-level features; the decoder comprises three groups of convolutions, each consisting of an ordinary 3 × 3 × 3 convolution and an upsampling module connected in series; the three upsampling layers in the decoder restore the feature map to the size of the original image; skip connections feed the output of each MPDC block directly to the decoder; the output channels of the four MPDC blocks are set to 64, 128, 256 and 512, respectively.

4. The virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution as claimed in claim 1, wherein step S3 designs a cascaded framework to capture the global information of the CT image and reduce the hardware requirements; in the cascade network of the cascaded framework, the first network, CoarseNet, performs coarse segmentation on the low-resolution image, and the second network, FineNet, performs fine segmentation based on the result of the coarse segmentation network; the whole framework is as follows:

{O₁, O₂, ..., O_n} denotes the normalized CT images at the original resolution; first, each high-resolution image O_i is down-sampled to one eighth of the original resolution, and the resulting low-resolution image is used as training data for CoarseNet; second, the low-resolution image is input to CoarseNet to extract a low-resolution segmentation map of the viral lesions; then, a cascade attention block takes the high-resolution image patch O'_i and the coarse segmentation patch U'_i as input and learns a new feature map A_i; U_i is obtained by up-sampling the low-resolution segmentation map by a factor of eight to the original resolution; O_i and U_i are each cut into patches to obtain O'_i and U'_i; the output A_i of the cascade attention block is input to FineNet to obtain the final accurate segmentation map S_i of the infected area; CoarseNet and FineNet are the core components of the whole segmentation system, and both adopt the classical encoder-decoder structure.

5. The virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution as claimed in claim 4, characterized in that a cascade attention block is designed to connect the two parts so that the coarse segmentation and the fine segmentation are fully fused;

CoarseNet is used to extract global features from the low-resolution CT image; the resulting coarse segmentation maps help filter out background information and focus on the target area; to make use of the coarse segmentation, an attention block is adopted to fuse the cascaded information; the implementation of the cascade attention block is as follows; O'_i denotes a patch of the original CT image, and U'_i denotes the corresponding patch of the coarse segmentation map;

C_i＝ReLU(BN(Conv(O′_i))) (1)

F_i＝ReLU(BN(Conv(U′_i))) (2)

wherein the operator symbols used in the formulas denote, respectively, the sigmoid activation function, concatenation along the channel dimension, and the Hadamard product; ReLU, BN and Conv denote the ReLU activation function, batch normalization and a 1 × 1 × 1 convolution layer, respectively; in the implementation, formulas (1) and (2) each use 64 1 × 1 × 1 convolution kernels, so that C_i and F_i each have 64 channels; formula (3) uses one convolution block to obtain the coefficient matrix Coe, whose channel dimension is 1.

6. The virus lesion segmentation method based on transfer learning and cascade adaptive hole convolution according to claim 1, wherein the segmentation problem is a binary classification of pixels, i.e. lesion pixels are labeled 1 and background pixels are labeled 0; precision, recall, the Dice coefficient and intersection over union (IoU) are used as evaluation criteria during testing; precision is the probability that a pixel predicted as lesion is actually a lesion pixel; recall is the probability that an actual lesion pixel is predicted as lesion; precision and recall are each one-sided measures, whereas the Dice coefficient and IoU are more comprehensive indicators;