CN116486075A - HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image - Google Patents

HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Info

Publication number
CN116486075A
Authority
CN
China
Prior art keywords
feature
remote sensing
hrnet
model
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310337060.1A
Other languages
Chinese (zh)
Inventor
宋永端
龙鸿
吴将娱
姚栋
胡芳
张景
刘伯威
王玉娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202310337060.1A
Publication of CN116486075A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images. Based on a ground feature element extraction sample data set, a multi-scale strong fusion semantic segmentation network, MT-HRNet, which incorporates a triple attention mechanism, is constructed. The MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary ground feature element extraction result. A segmentation loss is then calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; this loss guides the MT-HRNet feature extraction network to extract features sufficiently, improving segmentation accuracy, and training continues until the MT-HRNet model converges. By applying the HRNet network to semantic segmentation of remote sensing images, the method improves the network's feature extraction capability and yields more accurate extraction results.

Description

HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image
Technical Field
The invention relates to remote sensing imagery, ground feature element extraction, deep learning, attention mechanisms, multi-scale strong fusion, and semantic segmentation, and in particular to an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images.
Background
With the development of aerospace technology, large volumes of remote sensing data are acquired every day, and extracting the intrinsic information they contain can effectively support people's lives and production. Ground feature element extraction is one of the basic tasks of remote sensing image analysis; its results provide important data support for urban construction and planning, land resource management, land cover statistics, land mapping, and other applications, so it is widely used. However, most existing semantic segmentation models were developed for natural images and segment remote sensing images poorly when applied directly; the segmentation models underlying existing semantic segmentation algorithms for remote sensing images are dated and have seldom been optimized for this task. Once trained, a semantic segmentation model can automatically extract the features of an input remote sensing image, forming an end-to-end segmentation network whose output segmentation results are highly accurate. Extracting ground feature elements from satellite remote sensing data greatly reduces labor costs, helps promote rational planning and use of land resources, and provides a more accurate picture of land resource conditions.
Therefore, this patent proposes an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images. The method addresses the insufficient accuracy of target-category segmentation in the remote sensing ground feature extraction task. By applying the HRNet network to semantic segmentation of remote sensing images, it improves feature extraction capability and yields more accurate extraction results. A global adaptive upsampling module and a strong multi-scale information fusion strategy are incorporated, overcoming the insufficient fusion of feature maps at different resolutions and retaining more spatial position information and high-level semantic information. A triple attention mechanism (TAM) is added, resolving the discontinuous or erroneous segmentation of slender categories such as 'water body' and 'road'. Remote sensing image information is introduced into the network training process, and a segmentation loss computed against the ground-truth labels effectively guides the training of the semantic segmentation network, improving the extraction of fine texture features while classifying ground feature elements. The invention can extract ground feature information from remote sensing images of different resolutions, segment ground feature categories more accurately, extract categories of different colors, textures, and scales, and improve the extraction of slender categories.
Disclosure of Invention
The technical solution of the invention is as follows: an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images is provided; it accomplishes the ground feature element extraction task for remote sensing images while ensuring accuracy and connectivity.
The technical scheme of the invention is as follows: the method first acquires remote sensing data and divides them into data sets, forming a ground feature element extraction sample data set. Based on this data set, a multi-scale strong fusion semantic segmentation network, MT-HRNet, incorporating a triple attention mechanism is constructed; the MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary ground feature element extraction result. A segmentation loss is calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; the loss guides the MT-HRNet feature extraction network to extract features sufficiently, improving segmentation accuracy, until the MT-HRNet model converges. Finally, the converged MT-HRNet semantic segmentation model predicts on the test set to obtain the ground feature element extraction result. The specific steps are as follows:
(1) Remote sensing data preprocessing and data set partitioning.
In step (1), the remote sensing data are preprocessed into VOC-format training files, generating a remote sensing image sample data set.
Further, the remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPGs and the labels are PNGs. The value of each label pixel is the class to which that pixel belongs, e.g. background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
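For concreteness, a minimal Python sketch of the label convention implied by this step is shown below; the file path, class count, and helper name are illustrative assumptions, not part of the patent.

```python
# Hedged sketch of the VOC-style label convention described above.
# Assumes JPG input images and single-channel PNG labels whose pixel
# values are class indices: 0 = background, 1 = class 1, 2 = class 2, ...
import numpy as np
from PIL import Image

def check_label(label_path: str, num_classes: int) -> None:
    label = np.array(Image.open(label_path))  # H x W array of class ids
    assert label.ndim == 2, "labels must be single-channel PNGs"
    assert int(label.max()) < num_classes, "pixel value outside class range"

# Hypothetical dataset layout; adjust the path to the actual VOC-format files.
check_label("VOCdevkit/SegmentationClass/tile_0001.png", num_classes=4)
```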
(2) Performing ground feature element extraction on the remote sensing image sample data set obtained in step (1) to construct the semantic segmentation model MT-HRNet, which specifically comprises the following steps:
(a) Construct the remote sensing image ground feature extraction network from parallel sub-networks at different resolutions. A 256×256 three-channel picture is input and first downsampled twice using convolution layers (with BN and ReLU) of kernel size 3×3, stride 2, and padding 1; at this point the feature map has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and its channel count C doubled. Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2 respectively, and padding 1, generating two branches at different resolutions, namely at four times and eight times downsampling relative to the original image. Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 likewise adds a 32-times-downsampled scale. Information at different scales is fused through the Stage structures in the network. Each branch within a Stage stacks four BasicBlock modules; BasicBlock is the residual block used in the shallower ResNet variants. High-resolution representations are maintained throughout, and repeatedly fusing information at different scales improves the model's feature extraction capability on remote sensing images, making the extraction results more accurate.
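To illustrate the stem and Transition mechanics described above, the following hedged PyTorch sketch reproduces the two stride-2 convolutions and a Transition1-style branch split; the channel widths (64 and 32) are illustrative assumptions, not the patent's exact configuration.

```python
# Hedged sketch: HRNet-style stem (4x total downsampling) and Transition1.
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride):
    # 3x3 convolution with the given stride and padding 1, then BN and ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Stem: two stride-2 convolutions -> overall 4x downsampling of a 256x256 input.
stem = nn.Sequential(conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2))

# Transition1: one branch keeps the 4x resolution; the new branch halves H and W
# and doubles the channel count relative to it (the 8x branch), as described.
transition1 = nn.ModuleList([
    conv_bn_relu(64, 32, stride=1),   # 4x branch
    conv_bn_relu(64, 64, stride=2),   # 8x branch: H, W halved, C doubled vs. 32
])

x = torch.randn(1, 3, 256, 256)
y = stem(x)                              # -> (1, 64, 64, 64)
branches = [t(y) for t in transition1]   # -> (1, 32, 64, 64) and (1, 64, 32, 32)
```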
(b) Construct the global adaptive upsampling module GSAU to unify the different scale resolutions. The GSAU module applies a 3×3 convolution to the low-resolution features, reducing the feature map's channel count. Global context information obtained from the high-resolution feature map passes through a 1×1 convolution, a batch normalization (BN) operation, and a ReLU activation, and is then multiplied with the low-resolution feature map, completing the mapping of high-resolution spatial information onto the low-resolution category localization. Finally, the high-resolution feature map is added to the weighted low-resolution feature map to obtain a preliminarily fused feature map. The high-resolution feature map of the upper branch is downsampled and then added to the feature map just produced, completing the global adaptive upsampling operation. Notably, the GSAU module acts only in feature fusion steps that involve an upsampling operation.
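A hedged PyTorch sketch of the GSAU computation as described above follows; the exact layer ordering and channel sizes in the patent may differ, and the module name, arguments, and use of global average pooling for the context are assumptions.

```python
# Hedged sketch of GSAU: low-res features are channel-reduced by a 3x3 conv,
# re-weighted by global context from the high-res map (1x1 conv + BN + ReLU),
# upsampled, and added to the high-resolution feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GSAU(nn.Module):
    def __init__(self, c_low: int, c_high: int):
        super().__init__()
        self.reduce = nn.Sequential(               # 3x3 conv on the low-res branch
            nn.Conv2d(c_low, c_high, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_high),
        )
        self.context = nn.Sequential(              # 1x1 conv + BN + ReLU
            nn.Conv2d(c_high, c_high, 1, bias=False),
            nn.BatchNorm2d(c_high),
            nn.ReLU(inplace=True),
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        low = self.reduce(low)
        # Global context of the high-res map, assumed via global average pooling.
        ctx = self.context(F.adaptive_avg_pool2d(high, 1))
        weighted = F.interpolate(low * ctx, size=high.shape[2:],
                                 mode="bilinear", align_corners=False)
        return high + weighted                     # preliminary fusion

fused = GSAU(c_low=64, c_high=32)(torch.randn(2, 32, 64, 64),
                                  torch.randn(2, 64, 32, 32))
```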
(c) Construct the multi-scale strong fusion module MSSFM to address the insufficient fusion of features at different resolutions. The GSAU output feature map is taken as the MSSFM input feature map. The feature map's spatial information is compressed by global average pooling and global max pooling to output suitable one-dimensional channel weight parameters; these pass through a 1×1 convolution, a BN layer, and a nonlinear ReLU activation, and the final channel attention weights are then obtained through another 1×1 convolution and a Sigmoid. Finally, the weights are multiplied with the processed high-resolution feature maps, i.e. the channel attention weights guide a better fusion of feature maps at different resolutions, making them more reliable. The processed high-resolution feature map is then added back to the output; this residual connection effectively mitigates vanishing gradients, exploding gradients, and degradation. Feature maps processed by the MSSFM retain more spatial information and high-level semantic information, ensuring the predicted image better matches the layout of ground feature elements in the label image. In Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module. Stage3 is handled similarly.
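The channel-attention path of the MSSFM can be sketched as below; the reduction ratio and the decision to share the 1×1 convolutions between the two pooling paths are assumptions made for brevity, not the patent's stated design.

```python
# Hedged sketch of the MSSFM channel attention: global avg + max pooling,
# 1x1 conv + BN + ReLU, then 1x1 conv + Sigmoid, applied with a residual add.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSSFM(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = F.adaptive_avg_pool2d(x, 1)                     # spatial squeeze
        mx = F.adaptive_max_pool2d(x, 1)
        weight = torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # channel weights
        return x + x * weight                                 # weighted + residual

out = MSSFM(32)(torch.randn(2, 32, 64, 64))                   # shape preserved
```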
(d) The feature maps produced by each stack of four BasicBlock modules in Stage4 are all processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are finally aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with average pooling and max pooling; this preserves the rich features of the actual tensor as far as possible while reducing the depth of the network, keeping the module lightweight. TAM processing yields a refined tensor with the same shape as the TAM input but with more detailed features, improving the segmentation accuracy of slender categories such as 'road' and 'water body'. The output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module.
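Since the three-branch rotation scheme is easiest to see in code, a hedged sketch of the TAM with its Z-Pool layer follows, in the spirit of the published triple attention ("Rotate to Attend") design; the 7×7 gate kernel size is an assumption.

```python
# Hedged sketch of triple attention: Z-Pool reduces channels to 2 (max + mean),
# each branch rotates the tensor so a different dimension pair interacts, and
# the three branch outputs are averaged.
import torch
import torch.nn as nn

class ZPool(nn.Module):
    def forward(self, x):
        # Channel dimension reduced to 2: max-pool and average-pool along dim 1.
        return torch.cat([x.max(1, keepdim=True).values,
                          x.mean(1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    def __init__(self, k: int = 7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False), nn.BatchNorm2d(1))

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.cw, self.ch, self.hw = AttentionGate(), AttentionGate(), AttentionGate()

    def forward(self, x):                                           # x: (N, C, H, W)
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)   # (C, W) pair
        x_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)   # (C, H) pair
        x_hw = self.hw(x)                                           # (H, W) pair
        return (x_cw + x_ch + x_hw) / 3.0          # simple average of the branches

y = TAM()(torch.randn(2, 32, 64, 64))              # same shape as the input
```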
(3) Based on the constructed MT-HRNet semantic segmentation network, training is carried out on a training set, and model parameters are optimized until the MT-HRNet semantic segmentation network converges.
(a) Ground feature element model training. The initial model learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule. The training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed.
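A hedged sketch of this training configuration in PyTorch is given below; the stand-in model and the single synthetic batch are placeholders, not the MT-HRNet implementation, which would iterate over the VOC-format training set instead.

```python
# Hedged sketch of the stated schedule: SGD, momentum 0.9, initial lr 0.004,
# cosine decay to 0.00004, batch size 16, 200 epochs.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 4, 1)                    # stand-in for MT-HRNet (4 classes)
images = torch.randn(16, 3, 256, 256)         # one synthetic batch of 16
labels = torch.randint(0, 4, (16, 256, 256))

optimizer = torch.optim.SGD(model.parameters(), lr=0.004, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200, eta_min=0.00004)    # cosine decay to the stated floor
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)   # segmentation loss vs. labels
    loss.backward()
    optimizer.step()
    scheduler.step()                          # per-epoch learning-rate decay
```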
(b) Model verification. The best model from the training rounds in step (a) is retained and used to run a first round of verification on the validation set, generating a prediction set whose results are inspected.
(c) Steps (a) and (b) are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
(4) Based on the converged MT-HRNet semantic segmentation model, predict on the test set to obtain the visualized ground feature element extraction result. Specifically: the MT-HRNet model extracts ground feature elements from the test set, and the extraction metrics MPA and MIoU are saved together with a visualization of the segmentation results for the whole remote sensing image data set.
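The two reported metrics can be computed from a pixel-level confusion matrix; the following is a minimal sketch, assuming MPA denotes mean per-class pixel accuracy and MIoU mean intersection over union, with hypothetical prediction and label arrays.

```python
# Hedged sketch: MPA and MIoU computed from a pixel-level confusion matrix.
import numpy as np

def confusion_matrix(pred, label, num_classes):
    mask = (label >= 0) & (label < num_classes)
    return np.bincount(num_classes * label[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mpa_miou(cm):
    diag = np.diag(cm).astype(float)
    mpa = (diag / np.maximum(cm.sum(axis=1), 1)).mean()   # mean per-class accuracy
    miou = (diag / np.maximum(cm.sum(axis=1) + cm.sum(axis=0) - diag, 1)).mean()
    return mpa, miou

pred = np.random.randint(0, 4, (256, 256))   # hypothetical prediction map
label = np.random.randint(0, 4, (256, 256))  # hypothetical ground truth
print(mpa_miou(confusion_matrix(pred, label, 4)))
```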
Compared with the prior art, the invention has the advantages that:
1. and the high-resolution network HRNet is used for extracting ground feature elements of the remote sensing image, and the characterization of the high-resolution feature map is kept consistent in the whole training process, and different resolution information is fused repeatedly. The feature extraction capability of the network is improved, and the feature element extraction result is more accurate.
2. Global adaptive upsampling is incorporated to unify the different scale resolutions and output feature maps at the same size as the high-resolution branch, effectively maintaining robustness to spatial changes in the input and learning the distinctive characteristics within the feature maps, yielding finer spatial information. In addition, a strong multi-scale information fusion strategy overcomes the insufficient fusion of feature maps at different resolutions, retains more spatial position information and high-level semantic information, and improves the model's segmentation accuracy.
3. A triple attention mechanism added in the last Stage of the network makes full use of cross-channel interaction information, helps channel information and spatial detail information propagate effectively through the network, and obtains a refined tensor of the same shape with more detailed features. This avoids losing the texture information of fine strip-shaped targets and improves the ground feature element extraction results.
Drawings
Fig. 1 is an overall flow chart of the present invention.
FIG. 2 is the basic structure of the Stage and Transition modules.
Fig. 3 is a network configuration diagram of GSAU.
Fig. 4 is a network configuration diagram of the MSSFM.
Fig. 5 is a network configuration diagram of the triple attention mechanism module TAM.
FIG. 6 is a visualization of the segmentation results of HRNet and MT-HRNet.
Detailed Description
To help those skilled in the art better understand the solution of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings.
The invention comprises the following steps:
1. Remote sensing data preprocessing and data set partitioning. The remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPGs and the labels are PNGs. The value of each label pixel is the class to which that pixel belongs, e.g. background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
2. Ground feature elements are extracted from the remote sensing image sample data set to construct the semantic segmentation model, specifically: build the ground feature element learning model, train it, test it, and save the best model.
Building the ground feature element learning model MT-HRNet: a 256×256 three-channel picture is input and first downsampled twice, each time through a convolution layer (with BN and ReLU) of kernel size 3×3, stride 2, and padding 1; at this point the feature map has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and its channel count C doubled. Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2 respectively, and padding 1, generating two branches at different resolutions, namely at four and eight times downsampling relative to the original image. Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 likewise adds a 32-times-downsampled scale. The main work of the Stage structures in the network is to fuse information at different scales. Stage2 first applies four BasicBlock modules (BasicBlock is the residual block used in the shallower ResNet variants), and the output of each branch is the fusion of all branches. In Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module, retaining more remote sensing image spatial information and high-level semantic information and ensuring the predicted image better matches the layout of ground feature elements in the label image. The Stage3 branches are handled similarly. The feature maps produced by each stack of four BasicBlock modules in Stage4 are all processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are finally aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with average pooling and max pooling; this preserves the rich features of the actual tensor as far as possible while reducing the depth of the network, keeping the module lightweight.
TAM processing yields a refined tensor with the same shape as the TAM input but with more detailed features, improving the segmentation accuracy of slender categories such as 'road' and 'water body'. The output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module.
Ground feature element model training: the initial model learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule. The training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed.
Ground feature element model testing: the best model from the training rounds is retained and used to run a first round of testing on the test set, generating a prediction set whose results are inspected.
Saving the best model: the training and testing processes are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
3. Ground feature elements are extracted from the remote sensing image data based on the MT-HRNet semantic segmentation model, specifically: extract the ground feature elements and save the segmentation results.
Ground feature element extraction: all remote sensing image data sets are placed into the test set, ground feature elements are extracted with the MT-HRNet semantic segmentation model, and the extraction results, i.e. MPA, MIoU, and the visualized ground feature element data, are produced.
Saving the segmentation results: the extracted ground feature element data are physically stored using a centralized high-performance computing facility or a distributed computing environment and storage architecture.
Table 1: Experimental results
It should be noted that the method of the embodiments of the invention is suitable for extracting ground feature elements from remote sensing images of different resolutions.
The embodiments of the invention have been described in detail above with reference to the accompanying drawings, solely to aid understanding of the invention; since those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the invention, this description should not be construed as limiting the invention.

Claims (5)

1. An HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images, characterized in that remote sensing data are first acquired and divided into data sets to form a ground feature element extraction sample data set; based on the sample data set, a multi-scale strong fusion semantic segmentation network MT-HRNet incorporating a triple attention mechanism is constructed, the MT-HRNet semantic segmentation model is trained on the training set, and its parameters are optimized to obtain a preliminary ground feature element extraction result; a segmentation loss is calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; the segmentation loss guides the MT-HRNet feature extraction network to extract features until the MT-HRNet model converges; and the converged MT-HRNet semantic segmentation model predicts on the test set to obtain the ground feature element extraction result.
2. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method according to claim 1, wherein the semantic segmentation method comprises the following specific steps:
(1) Preprocessing remote sensing data and dividing a data set; preprocessing remote sensing data, processing the remote sensing data into a training file in a VOC format, and generating a remote sensing image sample data set;
(2) Performing feature element extraction on the remote sensing image sample data set obtained in the step (1) to construct a semantic segmentation model MT-HRNet, wherein the method specifically comprises the following steps of:
(a) Constructing the remote sensing image ground feature extraction network from parallel sub-networks at different resolutions; a 256×256 three-channel picture is input and first downsampled twice using convolution layers of kernel size 3×3, stride 2, and padding 1, for a total downsampling factor of four; the channel count is adjusted by a Layer1 module without changing the feature map size, Layer1 being built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants, followed by the Transition and Stage structures; each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and the channel count C doubled; Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2, and padding 1, generating two branches at different resolutions, namely at four and eight times downsampling relative to the original image; Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 adds a 32-times-downsampled scale; information at different scales is fused through the Stage structures in the network; each branch within a Stage stacks four BasicBlock modules, BasicBlock being the residual block used in the shallower ResNet variants; high-resolution representations are maintained throughout, and repeatedly fusing information at different scales improves the feature extraction capability on remote sensing images, making the ground feature extraction results more accurate;
(b) Constructing the global adaptive upsampling module GSAU to unify the different scale resolutions; the GSAU module applies a 3×3 convolution to the low-resolution features; global context information obtained from the high-resolution feature map passes through a 1×1 convolution, a batch normalization (BN) operation, and a ReLU activation, and is then multiplied with the low-resolution feature map, completing the mapping of high-resolution spatial information onto the low-resolution category localization; finally, the high-resolution feature map is added to the weighted low-resolution feature map to obtain a preliminarily fused feature map; the high-resolution feature map of the upper branch is downsampled and then added to the feature map just produced, completing the global adaptive upsampling operation; the GSAU module acts only in feature fusion steps that involve an upsampling operation;
(c) Constructing the multi-scale strong fusion module MSSFM, taking the GSAU output feature map as the MSSFM input feature map; the feature map's spatial information is compressed by global average pooling and global max pooling to output suitable one-dimensional channel weight parameters, which are processed through a 1×1 convolution and a BN layer, and the final channel attention weights are obtained through another 1×1 convolution and a Sigmoid; the channel weights are finally multiplied with the processed high-resolution feature map, which is then added back to the output;
(d) The feature maps produced by each stack of four BasicBlock modules in Stage4 are processed by the triple attention mechanism TAM module; the three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively; the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, the bottom branch establishes the dependency between H and W, and the three branch weights are aggregated by averaging; the Z-Pool layer reduces the channel dimension of a tensor to two, processing the feature map with average pooling and max pooling; the TAM module's output has the same shape as its input, and the output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module;
(3) Training on a training set based on the constructed MT-HRNet semantic segmentation network, and optimizing model parameters until the MT-HRNet network converges;
(4) Based on the converged MT-HRNet semantic segmentation model, predicting on the test set to obtain the visualized ground feature element extraction result; specifically: the MT-HRNet model extracts ground feature elements from the test set, and the extraction metrics MPA and MIoU are saved together with a visualization of the segmentation results for the whole remote sensing image data set.
3. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that in the remote sensing data preprocessing and data set partitioning, the remote sensing data labels are adjusted to the VOC format via a Python script; the input pictures are JPGs and the labels are PNGs; the value of each label pixel is the class to which that pixel belongs.
4. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that the feature maps processed by the multi-scale strong fusion module MSSFM retain more spatial information and high-level semantic information, ensuring the predicted image matches the layout of ground feature elements in the label image; in Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module.
5. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that training on the training set to optimize the model parameters specifically comprises (a) ground feature element model training: the initial model learning rate is set to 0.004 with a minimum learning rate of 0.00004; the optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule; the training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed;
(b) Model verification: the best model from the training rounds in step (a) is retained and used to run a first round of verification on the validation set, generating a prediction set whose results are inspected;
(c) Steps (a) and (b) are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
CN202310337060.1A 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image Pending CN116486075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310337060.1A CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310337060.1A CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Publications (1)

Publication Number Publication Date
CN116486075A 2023-07-25

Family

ID=87214720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310337060.1A Pending CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Country Status (1)

Country Link
CN (1) CN116486075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350171A (en) * 2023-12-04 2024-01-05 山东省计算中心(国家超级计算济南中心) Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model
CN117350171B (en) * 2023-12-04 2024-03-12 山东省计算中心(国家超级计算济南中心) Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination