CN115775252A - Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade - Google Patents


Info

Publication number
CN115775252A
CN115775252A
Authority
CN
China
Prior art keywords
global
conv
magnetic resonance
feature
cervical cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211135688.5A
Other languages
Chinese (zh)
Inventor
Fang Faming (方发明)
Jin Zhiwei (金智伟)
Zhang Guixu (张桂戌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202211135688.5A priority Critical patent/CN115775252A/en
Publication of CN115775252A publication Critical patent/CN115775252A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a magnetic resonance image cervical cancer tumor segmentation method based on global-local cascade, belonging to the fields of medical image processing and computer vision. The method fuses global and local features to solve the problem of segmenting small lesions, and specifically comprises the following steps: preprocessing a training data set; establishing a global-local cascade magnetic resonance image cervical cancer tumor segmentation network model; and performing model inference to obtain the lesion segmentation result. Compared with other prior-art methods, this method simultaneously considers and fuses the global and local features of the image, and effectively achieves accurate segmentation of tiny cervical cancer tumors on magnetic resonance images under limited computing resources.

Description

Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade
Technical Field
The invention relates to the technical field of magnetic resonance image processing, in particular to a magnetic resonance image cervical cancer lesion segmentation method based on global local cascade.
Background
Cervical cancer is the second most common cancer worldwide, affecting more than 500,000 women each year and causing more than 300,000 deaths. Magnetic resonance imaging is the best screening tool for pre-treatment assessment of cervical cancer. Segmentation of cervical lesions on magnetic resonance images is an important step in the cervical cancer diagnosis process: from the marked lesion region, a doctor can accurately analyze and calculate the radiation dose, providing guidance for the subsequent operation.
However, magnetic resonance images are affected by factors such as field strength, slice thickness, and field of view, which can result in a low signal-to-noise ratio and unclear lesion edges. An accurate and repeatable automatic segmentation algorithm for detecting cancer foci, which helps doctors improve diagnostic efficiency and reduce their workload, is therefore an important research direction in intelligent medicine.
Since the rise of deep learning, methods represented by convolutional neural networks have been widely applied across computer vision, and various network models based on encoder-decoder structures have appeared for magnetic resonance image segmentation. These methods do not distinguish the object to be detected, whereas the cervical cancer lesions to be segmented are very small and vary greatly between slices, so such methods struggle to achieve satisfactory results. In addition, some methods directly use 3D convolutional networks to exploit the three-dimensional nature of magnetic resonance images and improve segmentation performance, but these suffer from significant disadvantages such as heavy computation, difficult optimization, and poor real-time performance. Moreover, because the three-dimensional lesion region occupies only a small part of the original image, it is difficult to obtain many tumor-containing sub-images during preprocessing. Designing a deep convolutional neural network to segment cervical cancer lesions in MRI is therefore challenging.
Disclosure of Invention
The invention aims to provide, in view of the defects of the prior art, a magnetic resonance image cervical cancer lesion segmentation method based on global-local cascade. The method is a medical image semantic segmentation neural network model based on an encoder-decoder framework and adopts a global-local cascade strategy: the image to be processed is input into a trained model for cervical cancer tumor segmentation, and the model outputs a precisely segmented cervical cancer tumor region. This solves the problem that small target lesions cannot be accurately segmented, facilitates accurate analysis and calculation of the radiation dose in the medical diagnosis process, and provides guidance for the subsequent operation.
The specific technical scheme for realizing the purpose of the invention is as follows:
a cervical cancer focus segmentation method based on global local cascade magnetic resonance imaging is characterized in that fusion of global features and local features specifically comprises the following steps:
step 1: preprocessing the data set;
the method is realized on a CeTS magnetic resonance image data set, slices are extracted from a 3D magnetic resonance image, a training set and a testing set are divided, the image is subjected to data enhancement through random horizontal overturning, vertical overturning and scaling, and data are normalized;
step 2: constructing a neural network model based on global local cascade;
based on a PyTorch deep learning framework, a magnetic resonance image cervical cancer tumor segmentation neural network model based on global local cascade is constructed, the whole model is based on a coder-decoder framework, and the following modules are inserted into paths of a coder and a decoder in a cascade mode:
a) Global feature fusion module: the input is an H×W×3 sample, where H and W denote the height and width of the image. The input passes sequentially through three 3×3 convolutional sub-blocks, each comprising a convolutional layer, a squeeze-excitation block, and a ReLU activation function. The importance of the feature map along the channel dimension is adjusted according to the attention values generated by the squeeze-excitation block, so that the multi-scale feature maps aggregate more global information. The outputs of the three convolutional blocks are then extracted and fused together, and spatial features at different scales are extracted from the fusion. Finally, 1×1 convolutional layers are introduced between the blocks to add skip connections. The module outputs a feature map of size H×W×C_1, where C_1 denotes the number of intermediate channels;
b) Feature decomposition and recombination module: the input is the output F_s0 of the global feature fusion module. A 1×1 convolution first selects from the global feature map a feature map F_s requiring local processing. A feature decomposition method then decomposes F_s into four sub-blocks of the same size, each processed with one convolution, yielding F_1, F_2, F_3, F_4. Next, a feature recombination method stitches the two horizontally adjacent blocks, obtaining two feature maps F_12 and F_34 that are each half the original input, and another convolutional layer acts on F_12 and F_34 to extract more features over a larger pixel domain. The same operation is performed on the two vertically adjacent blocks, yielding feature maps F_13 and F_24. F_12 and F_34 are then stitched to obtain F_1234, F_13 and F_24 are stitched to obtain F_1324, and convolutions are applied to F_1234 and F_1324. Finally, F_1234 and F_1324 are concatenated with the original input feature map along the channel dimension and fused with a 1×1 convolution;
the formula of the feature decomposition method is expressed as:
F_1 = Conv(F_s[0:C, 0:H/2+B, 0:W/2+B]),
F_2 = Conv(F_s[0:C, 0:H/2+B, W/2-B:W]),
F_3 = Conv(F_s[0:C, H/2-B:H, 0:W/2+B]),
F_4 = Conv(F_s[0:C, H/2-B:H, W/2-B:W]),
where the subscripts 1, 2, 3, 4 of F denote the feature maps at the upper-left, upper-right, lower-left, and lower-right positions after division; F_s denotes the output F_s0 of the global feature fusion module after a 1×1 convolution; Conv denotes the convolution operation; C, H, and W denote the number of channels, height, and width of the feature map; and B denotes the length of the overlap between two adjacent sub-blocks;
the formula of the characteristic reorganization method is expressed as follows:
F_12 = Conv(F_1 (s) F_2),
F_34 = Conv(F_3 (s) F_4),
F_13 = Conv(F_1 (s) F_3),
F_24 = Conv(F_2 (s) F_4),
where (s) is the stitching operator, indicating that two adjacent feature maps are stitched in the spatial plane;
the formula for fusion is expressed as:
F_1234 = Conv(F_12 (s) F_34),
F_1324 = Conv(F_13 (s) F_24),
F_out = Conv(F_1234 (c) F_1324 (c) F_in),
where (c) denotes the channel concatenation operator;
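For concreteness, a minimal PyTorch sketch of this feature decomposition and recombination is given below. It is a sketch under stated assumptions rather than the patent's implementation: the class and helper names are illustrative, and cropping the overlapping strip of length B when stitching adjacent sub-blocks is one plausible way to make the stitched sizes match the formulas, a detail the description leaves implicit.

```python
import torch
import torch.nn as nn

def _stitch(a: torch.Tensor, b: torch.Tensor, dim: int, overlap: int) -> torch.Tensor:
    """Spatial stitching operator (s): place two adjacent sub-blocks side by side.
    The overlapping strip of length `overlap` is cropped so the sizes line up
    (an assumption; the patent leaves the overlap handling implicit)."""
    if overlap == 0:
        return torch.cat([a, b], dim=dim)
    idx_a = [slice(None)] * a.dim()
    idx_b = [slice(None)] * b.dim()
    idx_a[dim] = slice(0, a.shape[dim] - overlap)
    idx_b[dim] = slice(overlap, b.shape[dim])
    return torch.cat([a[tuple(idx_a)], b[tuple(idx_b)]], dim=dim)

class FeatureDecompRecomb(nn.Module):
    """Illustrative sketch of the feature decomposition-recombination module."""
    def __init__(self, channels: int, overlap: int = 4):
        super().__init__()
        self.B = overlap
        self.select = nn.Conv2d(channels, channels, 1)  # 1x1 conv selecting F_s from F_s0
        # one 3x3 conv per sub-block F_1..F_4, per stitched pair, and per fused map
        self.sub = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4))
        self.c12, self.c34, self.c13, self.c24, self.c1234, self.c1324 = (
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(6))
        self.fuse = nn.Conv2d(3 * channels, channels, 1)  # final 1x1 channel fusion

    def forward(self, f_s0: torch.Tensor) -> torch.Tensor:
        B = self.B
        f_s = self.select(f_s0)
        H, W = f_s.shape[2], f_s.shape[3]
        # feature decomposition into four equal, overlapping sub-blocks
        f1 = self.sub[0](f_s[:, :, 0:H // 2 + B, 0:W // 2 + B])
        f2 = self.sub[1](f_s[:, :, 0:H // 2 + B, W // 2 - B:W])
        f3 = self.sub[2](f_s[:, :, H // 2 - B:H, 0:W // 2 + B])
        f4 = self.sub[3](f_s[:, :, H // 2 - B:H, W // 2 - B:W])
        # feature recombination: stitch horizontally, then vertically adjacent blocks
        f12 = self.c12(_stitch(f1, f2, dim=3, overlap=B))
        f34 = self.c34(_stitch(f3, f4, dim=3, overlap=B))
        f13 = self.c13(_stitch(f1, f3, dim=2, overlap=B))
        f24 = self.c24(_stitch(f2, f4, dim=2, overlap=B))
        f1234 = self.c1234(_stitch(f12, f34, dim=2, overlap=B))
        f1324 = self.c1324(_stitch(f13, f24, dim=3, overlap=B))
        # channel concatenation (c) with the original input, fused by a 1x1 conv
        return self.fuse(torch.cat([f1234, f1324, f_s0], dim=1))
```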
c) Channel-spatial attention gating module: first, the decoder features x_d ∈ R^(C×H×W) are used to generate a channel attention map g_c ∈ R^(C×1×1) describing the relationships between channels; g_c is then broadcast to R^(C×H×W) and multiplied pixel by pixel with the encoder features x_e ∈ R^(C×H×W) corresponding to the decoder features x_d, yielding x_c ∈ R^(C×H×W); the spatial relationships of x_c are then used to generate a spatial attention map s_s ∈ R^(1×H×W); finally, the spatial attention map s_s is multiplied pixel by pixel with the encoder features x_e to obtain a feature map x_out ∈ R^(C×H×W) to which attention is applied simultaneously in the spatial domain and along the channel direction;
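A minimal sketch of the channel-spatial attention gating module follows. The internals of the two attention branches are not pinned down above, so the sketch assumes a squeeze-excitation-style channel branch (global average pooling plus 1×1 convolutions) and a CBAM-style spatial branch (channel-wise mean and max pooling followed by a 7×7 convolution); these choices, and all names, are illustrative.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttentionGate(nn.Module):
    """Illustrative sketch of the channel-spatial attention gating module."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # channel branch producing g_c in R^(C x 1 x 1) from decoder features x_d
        # (squeeze-excitation-style internals are an assumption)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        # spatial branch producing s_s in R^(1 x H x W) from x_c
        # (mean/max pooling over channels + 7x7 conv is an assumption)
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid())

    def forward(self, x_d: torch.Tensor, x_e: torch.Tensor) -> torch.Tensor:
        g_c = self.channel(x_d)              # channel attention map from decoder features
        x_c = x_e * g_c                      # broadcast over H x W and gate encoder features
        pooled = torch.cat([x_c.mean(dim=1, keepdim=True),
                            x_c.max(dim=1, keepdim=True).values], dim=1)
        s_s = self.spatial(pooled)           # spatial attention map from x_c
        # per the description, s_s gates the encoder features; channel information
        # enters through s_s, which is derived from the channel-gated x_c
        return x_e * s_s                     # x_out in R^(C x H x W)
```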
Step 3: training the network model;
The preprocessed training set is input into the neural network model for training, constrained by a loss function integrating the Dice loss and the cross-entropy loss:
L_CE = -(1/n) Σ_{i=1}^{n} [ p_i log p̂_i + (1 - p_i) log(1 - p̂_i) ],
L_Dice = 1 - 2 Σ_{c=1}^{C} ⟨p_c, p̂_c⟩ / Σ_{c=1}^{C} ( ||p_c||² + ||p̂_c||² ),
L = λ L_Dice + (1 - λ) L_CE,
where λ denotes the balance factor between the two loss functions, set to 0.5; n denotes the total number of pixels; C denotes the number of channels contained in the prediction matrix; p denotes the label of a pixel as a positive sample; p̂ denotes the probability that a pixel is predicted as a positive sample; ⟨p_c, p̂_c⟩ denotes the dot product of each channel's ground-truth labels with the matrix of the corresponding prediction result; and ||·||² denotes the squared Euclidean norm. A validation set is also divided during training, and the neural network model that performs best on the validation set is saved;
and 4, step 4: segmentation of cervical cancer lesions;
and (4) outputting a visual result of the lesion segmentation on the test set by using the optimal network model stored in the step (3).
Compared with the prior art, the invention has the following beneficial technical effects:
1) The constructed network model fuses the global and local features of the image, so the features of small lesions are not lost as they propagate through the network, realizing accurate segmentation of small target lesions;
2) 2D slices are used as the model input, so the computational cost is low and model inference is fast;
3) Multi-level image feature fusion is realized by cascading the global feature fusion module with the feature decomposition-recombination module, and a channel-spatial attention gating mechanism makes the output focus on specific features during propagation.
Drawings
FIG. 1 is a schematic diagram of a neural network model constructed according to the present invention;
FIG. 2 is a schematic diagram of a global feature fusion module;
FIG. 3 is a schematic diagram of a feature decomposition and reorganization module;
FIG. 4 is a schematic diagram of a channel spatial attention gating module;
FIG. 5 is a schematic image of the experimental results of the model of the example.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Referring to fig. 1, the present invention performs cervical cancer lesion segmentation as follows:
step 1: preprocessing a data set
The method is mainly realized on a CeTS magnetic resonance image data set: slices are extracted from the 3D magnetic resonance images, a training set and a test set are divided, the images are augmented by random horizontal flipping, vertical flipping, and scaling, and the data are normalized.
Step 2: construction of magnetic resonance image cervical cancer focus segmentation neural network model based on global local cascade
Based on the PyTorch deep learning framework, a magnetic resonance image cervical cancer tumor segmentation neural network model based on global-local cascade is constructed; the model consists of an encoder, a decoder, a global feature fusion module, a feature decomposition-recombination module, and a channel-spatial attention gating module.
Step 3: training of the network model
The preprocessed data samples in the training set are input into the neural network model for training; a validation set is divided during training, and the network weights that perform best on the validation set are saved.
Step 4: segmentation of cervical cancer lesions
Model inference is performed on the test set using the optimal network model saved in step 3, and visual lesion segmentation results are output.
The neural network model constructed in step 2 consists of five parts: an encoder, a decoder, a global feature fusion module, a feature decomposition and recombination module, and a channel-spatial attention gating module.
The global feature fusion module and the feature decomposition and recombination module are cascaded on each layer of the encoder path to fuse global and local features, preserving both the global and local features of the image while performing feature recombination.
The channel-spatial attention gating module acts at the decoder stage: in each decoder layer, it is applied to the upsampled deep feature map and the shallow feature map passed from the same encoder layer, reducing feature-map noise and focusing on specific features.
Examples
Step 1: preprocessing a data set
This embodiment is implemented on a CeTS magnetic resonance image dataset: slices are extracted from the 3D magnetic resonance images, training and test sets are divided, the images are augmented by random horizontal flipping, vertical flipping, and scaling, and the data are normalized.
The CeTS MRI dataset selected in this example consists of clinical data from an obstetrics and gynecology hospital. It includes 1800 sagittal T2-weighted samples with detailed cervical tumor regions annotated by experienced radiologists. Each 3D sample contains 20 to 35 2D slices, each 256 × 256 pixels in size.
For each sample, all of its slices are used, since in clinical diagnosis every slice may influence the diagnosis of cervical cancer. Furthermore, unlike many other medical image segmentation datasets, most early cervical tumors in the CeTS dataset are very small; the original image size is therefore used directly, without cropping or resampling, to avoid the features of small objects changing or even disappearing after such operations. All 1800 samples were split 7:3 into training and test sets, and the 3D samples were then converted into 2D slices, yielding 7089 slices for training and 2972 slices for testing. Each slice is clipped to between the 0.5% and 99.5% percentiles of the intensity values of the whole image. The mean and standard deviation of each channel are then computed independently within the non-zero region, and all two-dimensional slices are normalized with these statistics; no normalization is applied to the background, because the background portion is large and its pixel values are all zero. The images are then randomly horizontally flipped, vertically flipped, and scaled for augmentation. To speed up model convergence, the input is set to three channels: the middle channel is the image to be segmented, and the other two channels are the slices adjacent to it in the original volume.
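A numpy sketch of this preprocessing pipeline is given below (the function name is illustrative and the random-augmentation step is omitted): percentile clipping, per-channel normalization over the non-zero region, and stacking each slice with its two neighbours into a three-channel input.

```python
import numpy as np

def preprocess_volume(volume: np.ndarray) -> np.ndarray:
    """(D, 256, 256) 3D sample -> (D, 3, 256, 256) three-channel 2D inputs."""
    # clip intensities to the 0.5%-99.5% range of the whole volume
    lo, hi = np.percentile(volume, [0.5, 99.5])
    volume = np.clip(volume.astype(np.float32), lo, hi)
    inputs = []
    for i in range(volume.shape[0]):
        # middle channel is the slice to segment; the others are its neighbours
        idx = [max(i - 1, 0), i, min(i + 1, volume.shape[0] - 1)]
        stack = volume[idx].copy()
        # normalize each channel over its non-zero region only;
        # the all-zero background is left untouched
        for c in range(3):
            fg = stack[c] != 0
            if fg.any():
                stack[c][fg] = (stack[c][fg] - stack[c][fg].mean()) / (stack[c][fg].std() + 1e-8)
        inputs.append(stack)
    return np.stack(inputs)
```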
Step 2: construction of magnetic resonance image cervical cancer focus segmentation neural network model based on global local cascade
Referring to fig. 1, the magnetic resonance image cervical cancer lesion segmentation neural network model based on global-local cascade constructed by the invention mainly comprises five parts: an encoder, a decoder, a global feature fusion module, a feature decomposition-recombination module, and a channel-spatial attention gating module.
The present embodiment uses the encoder-decoder architecture, with the global feature fusion module and the feature decomposition-recombination module on the encoder path and the channel-spatial attention gating module on the decoder path.
Referring to fig. 2, each global feature fusion module first applies, in sequence, three 3×3 convolutional sub-blocks, each containing a convolutional layer, a squeeze-excitation block, and a ReLU activation function. Adjusting the importance of the feature maps along the channel dimension according to the attention values generated by the squeeze-excitation block lets the multi-scale feature maps aggregate more global information. Outputs are then extracted from the three convolutional blocks and fused together to extract spatial features at multiple scales. This series of three convolutional blocks simultaneously possesses the receptive fields produced by 3×3, 5×5, and 7×7 convolution operations, similar to Inception blocks, but with fewer parameters and less computation. Skip connections are then added by introducing 1×1 convolutional layers between the blocks.
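The following PyTorch sketch illustrates this module under stated assumptions: the squeeze-excitation internals follow the standard design, the exact placement of the 1×1 skip connections between blocks is one plausible reading of the description, and all channel counts and names are illustrative.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-excitation block used inside each convolutional sub-block."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)  # reweight channels by the generated attention values

class ConvSubBlock(nn.Module):
    """3x3 convolution -> squeeze-excitation -> ReLU."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), SEBlock(c_out), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

class GlobalFeatureFusion(nn.Module):
    """Illustrative sketch: three stacked sub-blocks whose outputs (with 3x3, 5x5,
    and 7x7 effective receptive fields) are fused, plus 1x1 skip connections."""
    def __init__(self, c_in: int, c_mid: int):
        super().__init__()
        self.block1 = ConvSubBlock(c_in, c_mid)
        self.block2 = ConvSubBlock(c_mid, c_mid)
        self.block3 = ConvSubBlock(c_mid, c_mid)
        self.skip1 = nn.Conv2d(c_in, c_mid, 1)   # 1x1 skip connections between blocks
        self.skip2 = nn.Conv2d(c_mid, c_mid, 1)
        self.fuse = nn.Conv2d(3 * c_mid, c_mid, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.block1(x)
        f2 = self.block2(f1 + self.skip1(x))
        f3 = self.block3(f2 + self.skip2(f1))
        return self.fuse(torch.cat([f1, f2, f3], dim=1))  # H x W x C_1 output
```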
Referring to fig. 3, in each feature decomposition and recombination module, a 1×1 convolution first selects from the global feature map a feature map requiring local processing, which is then decomposed into four sub-blocks of the same size, each processed with one convolution. The two horizontally adjacent blocks are stitched to obtain two feature maps, each half the original input, and another convolutional layer extracts more features over a larger pixel domain from the two stitched maps. The same operation is applied to the two vertically adjacent pairs. Finally, the two local feature maps are concatenated with the original global input feature and fused by a 1×1 convolution.
Referring to fig. 4, the channel-spatial attention gating mechanism focuses on specific features by integrating the low-level feature information of the encoder with the high-level feature information of the decoder, reducing noise responses. First, the inter-channel relationships of the decoder features are multiplied with the corresponding encoder features to obtain a channel-attended map. A spatial attention map is then generated from the spatial relationships of this output and multiplied with the same encoder features.
Step 3: training of the network model
This example is based on a PyTorch 1.8 environment and is trained on a 24 GB NVIDIA GeForce RTX 3090 graphics card. All models are trained from scratch with Kaiming initialization, and the network is optimized with the Adam optimizer; the initial learning rate is set to 0.0003, the batch size to 20, and the number of training epochs to 250. In addition, this example feeds three consecutive slices into the network together to learn the continuity information between slices. The preprocessed data are input into the neural network model for training; a validation set is also divided during training, and the network weights that perform best on the validation set are saved.
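A minimal training-loop sketch with this configuration follows; `model`, `train_loader`, and `criterion` are placeholders for the network, the preprocessed data loader, and the combined loss described below.

```python
import torch
import torch.nn as nn

def kaiming_init(m: nn.Module) -> None:
    # train from scratch with Kaiming initialization, as in this example
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def train(model, train_loader, criterion, epochs=250, lr=3e-4, device='cuda'):
    model.apply(kaiming_init)
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # initial lr 0.0003
    model.train()
    for _ in range(epochs):                     # 250 training epochs
        for images, labels in train_loader:     # batch size 20, three-slice inputs
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
    return model
```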
The image segmentation performance of the model is quantified with five common evaluation indexes: mean intersection over union (mIoU), Dice coefficient (DSC), accuracy (Acc), sensitivity (Sen), and precision (Prec); the higher the index value, the better the segmentation.
The mean intersection over union (mIoU), averaging the IoU of the foreground and background classes, is calculated by the following formula:
mIoU = (1/2) [ TP / (TP + FP + FN) + TN / (TN + FP + FN) ]
the Dice coefficient (DSC) is calculated by the following formula:
Figure BDA0003851951650000062
the accuracy (Acc) is calculated by the following formula:
Figure BDA0003851951650000063
the sensitivity (Sen) is calculated by the following formula:
Figure BDA0003851951650000064
the accuracy (Prec) is calculated by the following formula:
Figure BDA0003851951650000065
where TP, TN, FP, and FN denote, respectively, the true positives, true negatives, false positives, and false negatives in the prediction.
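The five indexes can be computed from these counts as in the following numpy sketch (for mIoU, averaging the foreground and background IoU values is one standard reading for binary segmentation):

```python
import numpy as np

def evaluate(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Compute the five indexes from a binary prediction and ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)        # true positives
    tn = np.sum(~pred & ~gt)      # true negatives
    fp = np.sum(pred & ~gt)       # false positives
    fn = np.sum(~pred & gt)       # false negatives
    iou_fg = tp / (tp + fp + fn + eps)   # foreground (tumor) IoU
    iou_bg = tn / (tn + fp + fn + eps)   # background IoU
    return {
        'mIoU': (iou_fg + iou_bg) / 2,
        'DSC':  2 * tp / (2 * tp + fp + fn + eps),
        'Acc':  (tp + tn) / (tp + tn + fp + fn + eps),
        'Sen':  tp / (tp + fn + eps),
        'Prec': tp / (tp + fp + eps),
    }
```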
During training, the cross-entropy loss function constrains the segmentation result at the pixel level, and the Dice loss function constrains the segmentation result as a whole.
The cross-entropy loss function is defined as:
L_CE = -(1/n) Σ_{i=1}^{n} [ p_i log p̂_i + (1 - p_i) log(1 - p̂_i) ]
The Dice loss function is defined as:
L_Dice = 1 - 2 Σ_{c=1}^{C} ⟨p_c, p̂_c⟩ / Σ_{c=1}^{C} ( ||p_c||² + ||p̂_c||² )
The final loss function is expressed as:
L = λ L_Dice + (1 - λ) L_CE
where p denotes, for each pixel of the image, the label indicating a lesion; p̂ denotes, for each pixel of the segmentation result, the probability of being predicted as tumor; ⟨p_c, p̂_c⟩ denotes the dot product of the annotation and the prediction on the corresponding channels; ||·||² denotes the squared Euclidean norm; λ denotes the balance factor between the two loss functions, set in this embodiment to 0.5; n denotes the number of pixels; and C denotes the number of channels of the prediction matrix.
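A PyTorch sketch of the combined loss follows, assuming the final form L = λ·L_Dice + (1 − λ)·L_CE with λ = 0.5 (with λ = 0.5 the two plausible weightings coincide) and logits as the network output; the class name is illustrative.

```python
import torch
import torch.nn as nn

class DiceCELoss(nn.Module):
    """L = lam * L_Dice + (1 - lam) * L_CE, lam = 0.5 (weighting form assumed)."""
    def __init__(self, lam: float = 0.5, eps: float = 1e-6):
        super().__init__()
        self.lam, self.eps = lam, eps
        self.bce = nn.BCEWithLogitsLoss()  # pixel-level cross-entropy term

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """logits, target: (N, C, H, W); target holds the lesion labels."""
        ce = self.bce(logits, target)
        prob = torch.sigmoid(logits)
        # Dice term: per-channel dot products over squared Euclidean norms
        inter = (prob * target).sum(dim=(0, 2, 3))
        norms = (prob ** 2).sum(dim=(0, 2, 3)) + (target ** 2).sum(dim=(0, 2, 3))
        dice = 1 - (2 * inter + self.eps) / (norms + self.eps)
        return self.lam * dice.mean() + (1 - self.lam) * ce
```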
Step 4: segmentation of cervical cancer lesions
Model inference is performed on the test set using the optimal network model saved in step 3, and visual lesion segmentation results are output.
Referring to fig. 5, the magnetic resonance image to be processed (5a) is input into the optimal network model saved in step 3 and processed by the network to obtain the segmentation result map (5c); (5b) shows the tumor region annotated by the doctor.
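A minimal inference sketch follows; the checkpoint path, the 0.5 threshold, and the function name are illustrative assumptions.

```python
import torch

@torch.no_grad()
def segment_slice(model, slice_tensor, ckpt_path='best_model.pth', device='cuda'):
    """Run the saved best model on one preprocessed (3, H, W) slice tensor."""
    model.load_state_dict(torch.load(ckpt_path, map_location=device))
    model.to(device).eval()
    prob = torch.sigmoid(model(slice_tensor.unsqueeze(0).to(device)))
    return (prob > 0.5).squeeze().cpu().numpy()  # binary tumor mask for overlay
```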
The neural network constructed by the method is compared with U-Net, Attention U-Net, UNet 3+, MultiResUNet, DC-UNet, DualNorm-UNet, and EAR-U-Net, and outperforms the other models in mean intersection over union (mIoU), Dice coefficient (DSC), accuracy (Acc), and sensitivity (Sen). The evaluation index values of each network model on the CeTS test set are shown in Table 1 below:
Table 1: evaluation indexes of each network on the CeTS data set
(The numerical values of Table 1 are rendered only as images in the source and are not reproduced here.)
The effectiveness of the global feature fusion module, the feature decomposition-recombination module, and the channel-spatial attention gating module is demonstrated through ablation experiments; the experimental results are shown in Table 2:
Table 2: ablation effect of the modules on the CeTS dataset
(The numerical values of Table 2 are rendered only as images in the source and are not reproduced here.)
Here Net1 is the backbone of the network architecture, a simple improvement of the original U-Net. Net2, Net3, and Net4 are trained with different global convolution modules: ResBlock, MultiResBlock, and the global feature fusion module proposed by the invention, respectively. Net5 is trained with the global feature fusion module and the feature decomposition-recombination module. Net6 and Net7 add, respectively, attention gating and the channel-spatial attention gating proposed by the invention to Net5.
The above embodiments describe in detail a magnetic resonance image cervical cancer lesion segmentation method based on global-local cascade. Those skilled in the art may, following the idea of the invention, vary the specific implementation and application range: for example, the encoder of the network model may be replaced by an encoder such as VGG, ResNet, or EfficientNet to extract features, and the gating mechanism of the network may be replaced by an SE block, a CBAM block, etc. In conclusion, this description should not be construed as limiting the invention.

Claims (1)

1. A magnetic resonance image cervical cancer tumor segmentation method based on global local cascade is characterized by comprising the following specific steps:
step 1: preprocessing the data set;
the method is realized on a CeTS magnetic resonance image data set, slices are extracted from a 3D magnetic resonance image, a training set and a testing set are divided, data enhancement is carried out on the image through random horizontal overturning, vertical overturning and scaling, and data normalization is carried out;
and 2, step: constructing a neural network model based on global local cascade;
based on a PyTorch deep learning framework, a neural network model based on global local cascade is constructed, the model is wholly based on an encoder-decoder framework, and the following modules are inserted into paths of an encoder and a decoder in a cascade mode:
a) Global feature fusion module: the input is an H×W×3 sample, where H and W denote the height and width of the image. The input passes sequentially through three 3×3 convolutional sub-blocks, each comprising a convolutional layer, a squeeze-excitation block, and a ReLU activation function. The importance of the feature map along the channel dimension is adjusted according to the attention values generated by the squeeze-excitation block, so that the multi-scale feature maps aggregate more global information. The outputs of the three convolutional blocks are then extracted and fused together, and spatial features at different scales are extracted from the fusion. Finally, 1×1 convolutional layers are introduced between the blocks to add skip connections. The module outputs a feature map of size H×W×C_1, where C_1 denotes the number of intermediate channels;
b) Feature decomposition and recombination module: the input is the output F_s0 of the global feature fusion module. A 1×1 convolution first selects from the global feature map a feature map F_s requiring local processing. A feature decomposition method then decomposes F_s into four sub-blocks of the same size, each processed with one convolution, yielding F_1, F_2, F_3, F_4. Next, a feature recombination method stitches the two horizontally adjacent blocks, obtaining two feature maps F_12 and F_34 that are each half the original input, and another convolutional layer acts on F_12 and F_34 to extract more features over a larger pixel domain. The same operation is performed on the two vertically adjacent blocks, yielding feature maps F_13 and F_24. F_12 and F_34 are then stitched to obtain F_1234, F_13 and F_24 are stitched to obtain F_1324, and convolutions are applied to F_1234 and F_1324. Finally, F_1234 and F_1324 are concatenated with the original input feature map along the channel dimension and fused with a 1×1 convolution;
the formula of the feature decomposition method is expressed as:
F_1 = Conv(F_s[0:C, 0:H/2+B, 0:W/2+B]),
F_2 = Conv(F_s[0:C, 0:H/2+B, W/2-B:W]),
F_3 = Conv(F_s[0:C, H/2-B:H, 0:W/2+B]),
F_4 = Conv(F_s[0:C, H/2-B:H, W/2-B:W]),
where the subscripts 1, 2, 3, 4 of F denote the feature maps at the upper-left, upper-right, lower-left, and lower-right positions after division; F_s denotes the output F_s0 of the global feature fusion module after a 1×1 convolution; Conv denotes the convolution operation; C, H, and W denote the number of channels, height, and width of the feature map; and B denotes the length of the overlap between two adjacent sub-blocks;
the formula of the characteristic recombination method is expressed as follows:
F_12 = Conv(F_1 (s) F_2),
F_34 = Conv(F_3 (s) F_4),
F_13 = Conv(F_1 (s) F_3),
F_24 = Conv(F_2 (s) F_4),
where (s) is the stitching operator, indicating that two adjacent feature maps are stitched in the spatial plane;
the formula for fusion is expressed as:
F_1234 = Conv(F_12 (s) F_34),
F_1324 = Conv(F_13 (s) F_24),
F_out = Conv(F_1234 (c) F_1324 (c) F_in),
where (c) denotes the channel concatenation operator;
c) Channel-spatial attention gating module: first, the decoder features x_d ∈ R^(C×H×W) are used to generate a channel attention map g_c ∈ R^(C×1×1) describing the relationships between channels; g_c is then broadcast to R^(C×H×W) and multiplied pixel by pixel with the encoder features x_e ∈ R^(C×H×W) corresponding to the decoder features x_d, yielding x_c ∈ R^(C×H×W); the spatial relationships of x_c are then used to generate a spatial attention map s_s ∈ R^(1×H×W); finally, the spatial attention map s_s is multiplied pixel by pixel with the encoder features x_e to obtain a feature map x_out ∈ R^(C×H×W) to which attention is applied simultaneously in the spatial domain and along the channel direction;
Step 3: training the network model;
The preprocessed training set is input into the neural network model for training, constrained by a loss function integrating the Dice loss and the cross-entropy loss:
L_CE = -(1/n) Σ_{i=1}^{n} [ p_i log p̂_i + (1 - p_i) log(1 - p̂_i) ],
L_Dice = 1 - 2 Σ_{c=1}^{C} ⟨p_c, p̂_c⟩ / Σ_{c=1}^{C} ( ||p_c||² + ||p̂_c||² ),
L = λ L_Dice + (1 - λ) L_CE,
where λ denotes the balance factor between the two loss functions, set to 0.5; n denotes the total number of pixels; C denotes the number of channels contained in the prediction matrix; p denotes the label of a pixel as a positive sample; p̂ denotes the probability that a pixel is predicted as a positive sample; ⟨p_c, p̂_c⟩ denotes the dot product of each channel's ground-truth labels with the matrix of the corresponding prediction result; and ||·||² denotes the squared Euclidean norm. A validation set is further divided during training, and the weights of the neural network model that performs best on the validation set are saved;
and 4, step 4: segmenting a cervical cancer focus;
and (4) outputting a visual result of the lesion segmentation on the test set by using the optimal neural network model stored in the step (3).
CN202211135688.5A 2022-09-19 2022-09-19 Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade Pending CN115775252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211135688.5A CN115775252A (en) 2022-09-19 2022-09-19 Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade


Publications (1)

Publication Number Publication Date
CN115775252A true CN115775252A (en) 2023-03-10

Family

ID=85388494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211135688.5A Pending CN115775252A (en) 2022-09-19 2022-09-19 Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade

Country Status (1)

Country Link
CN (1) CN115775252A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094895A (en) * 2023-09-05 2023-11-21 杭州一隅千象科技有限公司 Image panorama stitching method and system
CN117094895B (en) * 2023-09-05 2024-03-26 杭州一隅千象科技有限公司 Image panorama stitching method and system

Similar Documents

Publication Publication Date Title
CN111340828A (en) Brain glioma segmentation based on cascaded convolutional neural networks
CN109903292A (en) A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
Deng et al. Classification of breast density categories based on SE-Attention neural networks
CN108898175A (en) Area of computer aided model building method based on deep learning gastric cancer pathological section
CN107506761A (en) Brain image dividing method and system based on notable inquiry learning convolutional neural networks
Li et al. Classification of breast mass in two‐view mammograms via deep learning
CN108053417A (en) A kind of lung segmenting device of the 3DU-Net networks based on mixing coarse segmentation feature
CN106408001A (en) Rapid area-of-interest detection method based on depth kernelized hashing
CN111325750B (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
CN113674253A (en) Rectal cancer CT image automatic segmentation method based on U-transducer
CN110189308A (en) A kind of lesion detection approach and device based on BM3D and the fusion of dense convolutional network
CN114332572B (en) Method for extracting breast lesion ultrasonic image multi-scale fusion characteristic parameters based on saliency map-guided hierarchical dense characteristic fusion network
Skeika et al. Convolutional neural network to detect and measure fetal skull circumference in ultrasound imaging
CN112465754B (en) 3D medical image segmentation method and device based on layered perception fusion and storage medium
CN115311194A (en) Automatic CT liver image segmentation method based on transformer and SE block
CN115546605A (en) Training method and device based on image labeling and segmentation model
Dai et al. TD-Net: Trans-Deformer network for automatic pancreas segmentation
Pei et al. Alzheimer’s disease diagnosis based on long-range dependency mechanism using convolutional neural network
CN116596846A (en) Image segmentation method, image segmentation model construction method, device and medium
CN114511554A (en) Automatic nasopharyngeal carcinoma target area delineating method and system based on deep learning
CN117152433A (en) Medical image segmentation method based on multi-scale cross-layer attention fusion network
CN115661165A (en) Glioma fusion segmentation system and method based on attention enhancement coding and decoding network
CN115775252A (en) Magnetic resonance image cervical cancer tumor segmentation method based on global local cascade
Hou et al. Mass segmentation for whole mammograms via attentive multi-task learning framework
CN113344933B (en) Glandular cell segmentation method based on multi-level feature fusion network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination