CN110322453B - 3D point cloud semantic segmentation method based on position attention and auxiliary network - Google Patents

3D point cloud semantic segmentation method based on position attention and auxiliary network

Info

Publication number
CN110322453B
CN110322453B (Application CN201910604264.0A)
Authority
CN
China
Prior art keywords
network
point cloud
training
representing
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910604264.0A
Other languages
Chinese (zh)
Other versions
CN110322453A (en)
Inventor
焦李成
冯志玺
张格格
杨淑媛
程曦娜
马清华
张�杰
郭雨薇
丁静怡
唐旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910604264.0A priority Critical patent/CN110322453B/en
Publication of CN110322453A publication Critical patent/CN110322453A/en
Application granted granted Critical
Publication of CN110322453B publication Critical patent/CN110322453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, which mainly solves the problem of low segmentation precision in the prior art. The implementation scheme is as follows: acquire a training set T and a test set V; construct a 3D point cloud semantic segmentation network comprising a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network cascaded in sequence, and set the loss function of the network; perform P rounds of supervised training on the segmentation network with the training set T, adjusting the network parameters according to the loss function in each round and, after the P rounds are completed, taking the network model with the highest segmentation precision as the trained model; input the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point. The method improves the semantic segmentation precision of 3D point clouds and can be used for automatic driving, robots, 3D scene reconstruction, quality inspection, 3D mapping and smart city construction.

Description

3D point cloud semantic segmentation method based on position attention and auxiliary network
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a 3D point cloud semantic segmentation method which can be used for automatic driving, robots, 3D scene reconstruction, quality detection, 3D mapping and smart city construction.
Background
In recent years, with the wide application of lidar, RGB-D cameras and other 3D sensors in the fields of robotics and autonomous driving, applying deep learning to 3D point cloud data has become a research hotspot. 3D point cloud data is a set of vectors in a three-dimensional coordinate system, usually expressed as (x, y, z) coordinates and typically used to represent the shape of the outer surface of an object. Besides the geometric information given by (x, y, z), each point may also carry RGB color, intensity, gray value, depth or number of returns. Point cloud data is typically acquired by 3D scanning devices such as lidar or RGB-D cameras: these sensors automatically measure a large number of points on the object surface and output the point cloud in a data file. Point cloud data is unordered and unstructured and may have varying density in 3D space, which makes applying deep learning to 3D point clouds a considerable challenge.
3D point cloud semantic segmentation assigns a category to every point in the input point cloud. In early work, 3D point cloud data was usually converted into hand-crafted voxel-grid features or multi-view image features before being fed into a deep network for feature extraction. Such conversion-based methods not only inflate the data volume and the computational cost, but also lose segmentation accuracy when the resolution is reduced. It is therefore particularly important to process point cloud data directly with deep learning methods.
In 2017, Charles R. Qi et al. published the CVPR paper "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", which discloses a deep learning framework that processes 3D point cloud data directly and uses the symmetric max-pooling function to handle the unordered nature of point clouds, extracting a global feature for each point. However, this method only considers global features and ignores the local features around each point. Shortly after PointNet, Charles R. Qi's team published the NIPS paper "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". PointNet++ is a hierarchical version of PointNet in which each layer has three stages: sampling, grouping and feature extraction. It first selects a number of representative points as the centroids of local regions, then selects the k nearest neighbors of each centroid according to Euclidean distance, treats these k neighbors as a local point cloud from which features are extracted with a PointNet network, and finally propagates the deep features back to obtain the semantic segmentation result of the 3D point cloud.
Compared with traditional methods, both of these approaches process 3D point cloud data directly, are computationally simple, effectively handle the unordered nature of point clouds, and improve segmentation accuracy. However, PointNet++ does not consider the relationship among the features of the centroids, i.e. the context information, so its feature representation is relatively weak; moreover, PointNet++ follows a generic encoding-decoding framework and does not exploit more of the low-level information, so its segmentation accuracy is still limited and there remains room for improvement.
Disclosure of Invention
The invention aims to provide a 3D point cloud semantic segmentation method based on position attention and an auxiliary network that addresses the above shortcomings of the prior art: position attention over context features is combined with an auxiliary network that reconstructs low-level information, thereby improving segmentation accuracy.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from the ScanNet official website, and performing category statistics and block-cutting processing on them to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud semantic segmentation network by using the training set T, wherein P ≥ 500:
(4a) In each round of training, adjusting the network parameters according to the loss function of the semantic segmentation network to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model on the test-set samples, and storing the current network model if its segmentation accuracy is higher than that of the previously stored model, wherein P_1 ≥ 2;
(4c) After the P rounds of training are finished, taking the network model with the highest segmentation accuracy as the trained network model;
(5) Inputting the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point.
Compared with the prior art, the invention has the following advantages:
Because the invention constructs a 3D point cloud semantic segmentation network in which the position attention module computes the correlation among the features represented by each centroid of the input data, context information is added to the local centroid features of the network; at the same time, the auxiliary network propagates the low-level features of the network back, so that its low-level information is reconstructed, which effectively improves the accuracy of 3D point cloud semantic segmentation.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is the overall structure diagram of the 3D point cloud semantic segmentation network constructed in the invention;
FIG. 3 is a block diagram of a location attention module according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, implementation steps of this example include the following.
Step 1, a training set T and a test set V are obtained.
1.1) Downloading the training file and the test file of 3D point cloud data from the ScanNet official website, where the training file contains f_0 point cloud scenes and the test file contains f_1 point cloud scenes; in this example f_0 = 1201 and f_1 = 312;
1.2) Using histogram statistics to count, over all f_0 scenes in the training file, the number of points of each category, and calculating the weight w_k of each category from G_k, M and L (the weighting formula is given in the original figure), where G_k denotes the number of points of the k-th category, M denotes the total number of points, L denotes the number of segmentation categories, L ≥ 2, and L = 21 in this embodiment;
1.3) Randomly selecting a point as a center point for each scene in the training file, with coordinates (x, y, z), and taking the points in the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block;
1.4) Setting the number of sampling points N_0, and comparing the number of points in the data block obtained in 1.3) with N_0 to judge whether the block is reasonable: if the number of points in the data block is larger than N_0, the data block is judged reasonable and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded. The training set T is obtained in this way; in this embodiment N_0 = 8192;
1.5) For each of the f_1 scenes in the test file, cutting data blocks with a sliding cubic window of size 1.5 × 1.5 × 3, and randomly sampling N_0 points from each data block to form one sample, obtaining the test set V.
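As an illustration of the block-cutting rule in steps 1.3)-1.4), the following NumPy sketch cuts one training block around a random center point and keeps it only when it contains more than N_0 points. The function name, the array layout and the label handling are assumptions made for this example, not part of the patent text.

```python
import numpy as np

def cut_training_block(scene_points, scene_labels, n_sample=8192, half_size=0.75):
    """Cut one training sample from a scene (sketch of steps 1.3-1.4).

    scene_points: (M, 3+) array whose first three columns are x, y, z.
    n_sample=8192 and half_size=0.75 follow the embodiment (a 1.5 m cube, N_0 = 8192).
    """
    centre = scene_points[np.random.randint(len(scene_points)), :3]
    low, high = centre - half_size, centre + half_size
    in_block = np.all((scene_points[:, :3] >= low) & (scene_points[:, :3] <= high), axis=1)
    idx = np.where(in_block)[0]
    if len(idx) <= n_sample:                      # block judged unreasonable: discard it
        return None
    choice = np.random.choice(idx, n_sample, replace=False)   # random sampling of N_0 points
    return scene_points[choice], scene_labels[choice]
```

Applied repeatedly to each scene of the training file, blocks that return None are simply skipped, and the kept samples form the training set T.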
Step 2, constructing the 3D point cloud semantic segmentation network.
Referring to fig. 2, the 3D point cloud semantic segmentation network constructed in this step includes a feature down-sampling network, a location attention module, a feature up-sampling network, and an auxiliary network, which are sequentially cascaded.
2.1 Set up a feature downsampling network:
the feature downsampling network comprises n cascaded PointSA modules, wherein each PointSA module comprises a point cloud centroid sampling and grouping layer and a point cloud feature extraction layer which are cascaded in sequence, n is larger than or equal to 2, and the parameter is set to be n =4 in the embodiment;
For the centroid sampling and grouping layer of the m-th PointSA module (m = 1, 2, ..., n): first, N_m points are sampled from the input of the module as centroid points; then, taking each sampled centroid as the center, a spherical search algorithm searches within a specific radius r_m for K_m points, which constitute one group. In this embodiment the radii are set to r_1 = 0.1 for the 1st PointSA module, r_2 = 0.2 for the 2nd, r_3 = 0.4 for the 3rd and r_4 = 0.8 for the 4th (the per-module values of N_m and K_m are specified in the original figures);
And the point cloud feature extraction layer of the m-th PointSA module comprises 3 sequentially cascaded 2D convolutional layers, which extract features from the data output by the centroid sampling and grouping layer and pool the extracted region features with a max-pooling strategy. In this embodiment, the convolution kernels of the 3 2D convolutional layers of the point cloud feature extraction layer of the 1st PointSA module are all 1 × 1 with stride 1, and the numbers of output channels are 32, 32 and 64, respectively; for the 2nd PointSA module the kernels are all 1 × 1 with stride 1 and the numbers of output channels are 64, 64 and 128; for the 3rd PointSA module the kernels are all 1 × 1 with stride 1 and the numbers of output channels are 128, 128 and 256; for the 4th PointSA module the kernels are all 1 × 1 with stride 1 and the numbers of output channels are 256, 256 and 512;
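The following Python sketch illustrates one PointSA module as described above: centroid sampling, spherical grouping within radius r_m, a shared per-point MLP (the 1 × 1 convolutions) and max pooling over each group. Farthest point sampling, the use of coordinates relative to the centroid, and ReLU activations are assumptions borrowed from the PointNet++ convention; the patent itself only specifies the layer structure and parameter settings.

```python
import numpy as np

def farthest_point_sample(xyz, n_centroid):
    """Greedy farthest point sampling (assumed sampling rule, as in PointNet++)."""
    n = xyz.shape[0]
    chosen = np.zeros(n_centroid, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)
    for i in range(1, n_centroid):
        dist = np.minimum(dist, np.sum((xyz - xyz[chosen[i - 1]]) ** 2, axis=1))
        chosen[i] = np.argmax(dist)
    return chosen

def point_sa(xyz, feats, n_centroid, radius, n_neighbor, mlp_weights):
    """One PointSA module: centroid sampling, spherical grouping, shared MLP, max pooling.

    mlp_weights is a list of (w, b) pairs; the first w must accept 3 + feats.shape[1] inputs.
    """
    centre_idx = farthest_point_sample(xyz, n_centroid)
    centres = xyz[centre_idx]
    d = np.linalg.norm(xyz[None, :, :] - centres[:, None, :], axis=-1)   # (n_centroid, N)
    grouped = []
    for i in range(n_centroid):
        inside = np.where(d[i] < radius)[0]                  # spherical search within r_m
        if len(inside) == 0:
            inside = np.array([centre_idx[i]])
        pick = np.random.choice(inside, n_neighbor, replace=len(inside) < n_neighbor)
        local = np.concatenate([xyz[pick] - centres[i], feats[pick]], axis=1)
        grouped.append(local)
    grouped = np.stack(grouped)                              # (n_centroid, n_neighbor, C_in)
    for w, b in mlp_weights:                                 # 1x1 convolutions = shared per-point MLP
        grouped = np.maximum(grouped @ w + b, 0.0)           # ReLU
    return centres, grouped.max(axis=1)                      # max pooling over each group
```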
2.2) Set up the position attention module, which computes the correlation among the features represented by each centroid of its input data F and outputs the attention-enhanced feature E:
referring to fig. 3, the module works as follows:
2.2.1) The input data F is passed through the first 1D convolutional layer Q to obtain the feature Q_i of the i-th centroid, i = 1, 2, ..., N, where N denotes the number of centroids of F; through the second 1D convolutional layer U to obtain the feature U_j of the j-th centroid, j = 1, 2, ..., N; and through the third 1D convolutional layer V to obtain the feature V_j of the j-th centroid. The convolution kernels of the three 1D convolutional layers Q, U and V are all of size 1 with stride 1; the numbers of output feature channels of the first layer Q and the second layer U are a fixed fraction of the number of feature channels of the input data F (the reduction factor is given in the original figure), and the number of output feature channels of the third layer V is the same as that of the input data F;
2.2.2) Compute the attention-influence value t_ij between the features represented by each pair of centroids,

t_ij = exp(Q_i · U_j^T) / Σ_{j=1}^{N} exp(Q_i · U_j^T),

and use the values t_ij to form the N × N matrix A = [t_ij];

2.2.3) Compute the position attention feature of each centroid as the attention-weighted sum of the V features,

J_i = Σ_{j=1}^{N} t_ij · V_j;
2.2.4) Output the attention-enhanced feature E:

E = [E_1; E_2; ...; E_i; ...; E_N],

where E_i = α · J_i + F_i denotes the feature of the i-th centroid in E, α denotes the weight of the position attention feature J_i, and F_i denotes the feature of the i-th centroid of the input;
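A minimal NumPy sketch of the position attention computation in 2.2.1)-2.2.4) follows; because the 1D convolutions have kernel size 1, they reduce here to per-centroid matrix multiplications. The softmax normalisation of the attention matrix and the reduced channel width of Q and U are assumptions consistent with the reconstruction above; the residual combination E_i = αJ_i + F_i follows the text. The parameter names Wq, Wu, Wv and alpha are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(F, Wq, Wu, Wv, alpha):
    """Position attention over centroid features.

    F: (N, C) centroid features; Wq, Wu: (C, C_r) with reduced width C_r; Wv: (C, C).
    The 1D convolutions of kernel size 1 are expressed as plain matrix products.
    """
    Q = F @ Wq                       # (N, C_r) features from layer Q
    U = F @ Wu                       # (N, C_r) features from layer U
    V = F @ Wv                       # (N, C)   features from layer V
    A = softmax(Q @ U.T, axis=1)     # (N, N) attention-influence matrix, rows sum to 1
    J = A @ V                        # position attention features J_i
    return alpha * J + F             # E_i = alpha * J_i + F_i
```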
2.3 Set up a feature upsampling network:
the characteristic up-sampling network comprises a plurality of PointFP modules, a 1D convolution layer, a Dropout layer and a 1D convolution layer for classification which are sequentially cascaded, wherein each PointFP module comprises a characteristic interpolation layer and a characteristic extraction layer which are sequentially cascaded, a is more than or equal to 2, and the parameter is set to be a =4 in the embodiment;
the a PointFP modules have different structures of a characteristic interpolation layer and a characteristic extraction layer, wherein:
for the 1 st PointFP module, the characteristic interpolation layer interpolates the output characteristics of the position attention module, and the characteristics after interpolation are cascaded with the output characteristics of the 3 rd PointSA module to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of the convolutional kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 256 respectively;
for the 2 nd PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 1 st PointFP module, and the interpolated characteristics and the output characteristics of the 2 nd PointSA module are cascaded to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of the convolutional kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 256 respectively;
for the 3 rd PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 2 nd PointFP module, and the interpolated characteristics and the output characteristics of the 1 st PointSA module are cascaded to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of convolution kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 128 respectively;
for the 4 th PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 3 rd PointFP module to obtain interpolated characteristics, and the interpolated characteristics are used as the output characteristics of the characteristic interpolation layer; the feature extraction layer comprises 3 sequentially cascaded 2D convolutional layers and is used for further extracting the output feature, the sizes of convolution kernels of the 3 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 128, 128 and 128 respectively.
The convolution kernel of the 1D convolution layer is 1, the step length is 1, and the number of output characteristic channels is set to be 128;
the Dropout layer, the retention probability of which is set to 0.5;
the 1D convolutional layer for classification has the convolutional kernel size of 1 and the step length of 1, and the number of output characteristic channels is set as the number L of the classification of the segmentation.
2.4 Set up the auxiliary network:
the auxiliary network comprises b PointAux modules and 1D convolutional layers for classification, wherein the b is more than or equal to 1, and the b =2 in the embodiment;
for the 1 st PointAux module, the 1D convolution layer of the 1 st PointAux module is used for extracting the characteristics of the output data of the 2 nd PointFP module, the size of a convolution kernel is 1, the step length is 1, and an output characteristic channel is the number L of the divided categories; the characteristic interpolation layer is used for interpolating the characteristics extracted by the 1D convolution layer;
for the 2 nd PointAux module, the 1D convolution layer is used for extracting the characteristics of the output data of the 1 st PointAux module, the size of the convolution kernel is 1, the step length is 1, and the output characteristic channel is the segmented class number L; the characteristic interpolation layer is used for interpolating the characteristics extracted by the 1D convolution layer;
and the 1D convolutional layer is used for classifying the output characteristics of the 2 nd PointAux module, the size of a convolutional kernel is 1, the step length is 1, and the number of output characteristic channels is set as the number L of the divided categories.
Step 3, setting the loss function of the 3D point cloud semantic segmentation network.
The example takes a multi-class cross-entropy loss function as the loss function of the 3D point cloud semantic segmentation network, expressed as

Loss = -(1/C) Σ_{i=1}^{C} Σ_{k=1}^{L} w_k · p_{i,k} · ( log ŷ^{(1)}_{i,k} + w_a · log ŷ^{(2)}_{i,k} ),

where C denotes the number of training sample points, L denotes the total number of categories, w_k is the weight of category k, and w_a is the weight of the auxiliary-network loss, w_a ∈ [0, 1], with w_a = 0.5 in this embodiment;

p_{i,k} denotes the true probability that the i-th sample point belongs to the k-th category: it is 1 if the i-th sample point belongs to the k-th category and 0 otherwise;

ŷ^{(1)}_{i,k} and ŷ^{(2)}_{i,k} respectively denote the probabilities, predicted by the feature up-sampling network and the auxiliary network, that the i-th sample point belongs to the k-th category, computed as the softmax of the corresponding output channels:

ŷ^{(1)}_{i,k} = exp(z^{(1)}_{i,k}) / Σ_{l=1}^{L} exp(z^{(1)}_{i,l}),   ŷ^{(2)}_{i,k} = exp(z^{(2)}_{i,k}) / Σ_{l=1}^{L} exp(z^{(2)}_{i,l}),

where z^{(1)}_{i,k} and z^{(2)}_{i,k} respectively denote the k-th channel value of the i-th sample point output by the feature up-sampling network and the auxiliary network, computed as

z^{(1)}_i = f_1(x_i; θ_1),   z^{(2)}_i = f_2(x_i; θ_2),

where x_i denotes the input feature of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 denotes its parameters, f_2 denotes the auxiliary network, and θ_2 denotes its parameters.
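A hedged NumPy sketch of this loss follows, computed from the per-point channel values of the feature up-sampling network and the auxiliary network. How exactly the auxiliary weight w_a enters the sum is an assumption here (it simply scales the auxiliary cross-entropy term, as in the reconstruction above); the class weights w_k and the one-hot ground truth follow the definitions above.

```python
import numpy as np

def segmentation_loss(z_main, z_aux, labels, class_weight, w_aux=0.5, eps=1e-12):
    """Weighted multi-class cross entropy with an auxiliary term (sketch of step 3).

    z_main, z_aux: (C, L) per-point channel values from the feature up-sampling
    network and the auxiliary network; labels: (C,) integer class indices;
    class_weight: (L,) per-class weights w_k; w_aux corresponds to w_a.
    """
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    p_main = softmax(z_main)                         # predicted probabilities of the main branch
    p_aux = softmax(z_aux)                           # predicted probabilities of the auxiliary branch
    idx = np.arange(len(labels))
    wk = class_weight[labels]                        # w_k of each point's true class
    loss_main = -np.mean(wk * np.log(p_main[idx, labels] + eps))
    loss_aux = -np.mean(wk * np.log(p_aux[idx, labels] + eps))
    return loss_main + w_aux * loss_aux
```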
Step 4, performing P rounds of supervised training on the 3D point cloud semantic segmentation network with the training set T, where P ≥ 500.
In this embodiment, P =1000 is taken, and the training steps are as follows:
4.1) In the q-th round of training, let l_q be the learning rate of the q-th round and θ_q the parameters of the network model in the q-th round; according to the loss function set in step 3, the parameters are adjusted by the formula

θ_{q+1} = θ_q - l_q · ∇_{θ_q} Loss(θ_q),

giving the network model parameters θ_{q+1} used in the (q+1)-th round and thus the network model after the q-th round of training;
4.2) Every P_1 rounds, input the test set into the current network model to obtain the predicted category of every point in the test set, where P_1 ≥ 2 and P_1 = 5 in this example;
4.3) Count the number of test-set points whose predicted category is the same as their true category, and compute the segmentation accuracy

acc = R / H,

where R denotes the number of test-set points whose predicted category is the same as their true category, and H denotes the total number of points in the test set;
4.4) Compare the segmentation accuracy acc of the current network model with that of the previously stored network model; if the accuracy of the current model is higher, the current model is better and is stored, otherwise it is not stored.
4.5 After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
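The training schedule of step 4 can be summarised by the following Python sketch: P rounds of supervised training, an accuracy check on the test set every P_1 rounds, and retention of the best-scoring model. The model interface (train_step, predict, save) and the batch generator are assumed abstractions, not part of the patent; P = 1000 and P_1 = 5 follow this embodiment.

```python
import numpy as np

def evaluate_accuracy(model, test_points, test_labels):
    """Segmentation accuracy acc = R / H over all test-set points (step 4.3)."""
    pred = model.predict(test_points)               # (H,) predicted class per point
    return float(np.mean(pred == test_labels))

def train_segmentation_network(model, train_batches, test_points, test_labels,
                               P=1000, P1=5):
    """P rounds of supervised training with evaluation every P1 rounds (step 4)."""
    best_acc = 0.0
    for q in range(1, P + 1):
        for points, labels in train_batches():
            model.train_step(points, labels)        # one gradient step on the loss of step 3
        if q % P1 == 0:                             # every P_1 rounds, check accuracy
            acc = evaluate_accuracy(model, test_points, test_labels)
            if acc > best_acc:                      # keep only the best model so far
                best_acc = acc
                model.save("best_model")
    return best_acc
```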
and 5) inputting the test set V into the trained network model obtained in the step 4.5) for semantic segmentation to obtain a segmentation result of each point.
The technical effects of the invention are further explained below in combination with simulation experiments.
1. Simulation conditions
The simulation experiment of the present invention was performed in the following environment.
Hardware platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, 64 GB of RAM, Ubuntu 16.04 operating system, GeForce GTX TITAN X GPU;
Software platform: TensorFlow deep learning framework, Python 3.5. The dataset used in the experiment is the ScanNet point cloud dataset.
ScanNet is a point cloud dataset of indoor scenes scanned and reconstructed with an RGB-D camera. It contains 1513 scenes in total, of which 1201 are used as the training set and 312 as the test set, and it covers 21 categories.
2. Simulation experiment:
according to the method, a training set and a test set are obtained, a 3D point cloud semantic segmentation network is constructed, supervised training is carried out on the 3D point cloud semantic segmentation network by using the training set, then points in the test set are predicted by using a trained network model, and the segmentation precision of the 3D point cloud segmentation network on the test set V is calculated according to the method in the step 4.3.
The semantic segmentation precision of the invention on the point cloud data is compared with that of the existing PointNet++ method, using segmentation precision as the evaluation index; the results are shown in Table 1:
TABLE 1  Comparison of segmentation accuracy on the ScanNet dataset

Evaluation index         Prior art (PointNet++)    The invention
Segmentation accuracy    0.836                     0.852
As can be seen from Table 1, the segmentation accuracy of the invention on the ScanNet dataset exceeds that of the prior-art PointNet++ by 1.6 percentage points, indicating that the invention segments 3D point clouds better than PointNet++.

Claims (7)

1. A3D point cloud semantic segmentation method based on position attention and an auxiliary network is characterized by comprising the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from the ScanNet official website, and performing category statistics and block-cutting processing on them to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
the location attention module comprises 3 independent 1D convolutional layers Q, U, V,for extracting features of the input data F of the module and calculating attention impact values t between features represented by respective centroids ij And feature E after attention boost:
Figure FDA0004064341810000011
E=[E 1 ;E 2 ;...;E i ;...;E N ]
wherein, U i Features representing the ith centroid of input data F of the location attention Module through the 1D convolutional layer U, Q j T The input data F representing the position attention module is transposed by the characteristics of the jth centroid extracted by the 1D convolutional layer Q, N represents the number of centroids of F, and E represents the number of centroids of F i And (3) representing the characteristic of the ith centroid in the E, wherein the calculation formula is as follows:
Figure FDA0004064341810000012
wherein, V j Features representing the jth centroid of F extracted through 1D convolutional layer V,
Figure FDA0004064341810000013
features representing the ith centroid after positional attention has passed, a represents the weight of the positional attention feature, F i A feature representing an ith centroid of the input;
the auxiliary network comprises b pointAux modules and 1D convolutional layers for classification, wherein the b modules are sequentially cascaded, each pointAux module comprises a 1D convolutional layer and a characteristic interpolation layer, and b is more than or equal to 1;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud semantic segmentation network by using the training set T, wherein P ≥ 500:
(4a) In each round of training, adjusting the network parameters according to the loss function of the semantic segmentation network to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model on the test-set samples, and storing the current network model if its segmentation accuracy is higher than that of the previously stored network model, wherein P_1 ≥ 2;
(4c) After the P rounds of training are finished, taking the network model with the highest segmentation accuracy as the trained network model;
(5) Inputting the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point.
2. The method according to claim 1, wherein (1) the class statistics and the block cutting processing are performed on the point cloud data as follows:
(1a) Using histogram statistics to count, over all f_0 scenes in the training file, the number of points of each category, and calculating the weight w_k of each category from G_k, M and L (the weighting formula is given in the original figure), wherein G_k denotes the number of points of the k-th category, M denotes the total number of points, L denotes the number of segmentation categories, f_0 ≥ 1000, and L ≥ 2;
(1b) For each scene in the training file, randomly selecting a point with coordinates (x, y, z) as the center point, taking the points in the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block, and comparing the number of points in the data block with the number of sampling points N_0 to judge whether the block is reasonable: if the number of points in the data block is larger than N_0, the block is judged reasonable and N_0 points are randomly sampled from it to form one sample, otherwise the data block is discarded; the training set T is obtained in this way, wherein N_0 ≥ 4096;
(1c) For each of the f_1 scenes in the test file, cutting data blocks with a sliding cubic window of size 1.5 × 1.5 × 3 and randomly sampling N_0 points from each data block to form one sample, obtaining the test set V, wherein f_1 ≥ 300.
3. The method of claim 1, wherein the feature downsampling network in (2) comprises n cascaded PointSA modules, each PointSA module comprises a point cloud centroid sampling layer, a grouping layer and a point cloud feature extraction layer, which are cascaded in sequence, wherein n is greater than or equal to 2.
4. The method according to claim 1, wherein the feature upsampling network in (2) comprises a pointFP modules, a 1D convolutional layer, a Dropout layer and a 1D convolutional layer for classification, which are sequentially cascaded, and each pointFP module comprises a feature interpolation layer and a feature extraction layer, which are sequentially cascaded, wherein a is greater than or equal to 2.
5. The method according to claim 1, wherein the loss function of the 3D point cloud semantic segmentation network in step (3) is calculated as follows:
Loss = -(1/C) Σ_{i=1}^{C} Σ_{k=1}^{L} w_k · p_{i,k} · ( log ŷ^{(1)}_{i,k} + w_a · log ŷ^{(2)}_{i,k} ),

wherein C denotes the number of training sample points, L denotes the total number of categories, w_k is the weight of category k, w_a is the weight of the auxiliary-network loss, w_a ∈ [0, 1]; p_{i,k} denotes the true probability that the i-th sample point belongs to the k-th category, which is 1 if the i-th sample point belongs to the k-th category and 0 otherwise;

ŷ^{(1)}_{i,k} and ŷ^{(2)}_{i,k} respectively denote the probabilities, predicted by the feature up-sampling network and the auxiliary network, that the i-th sample point belongs to the k-th category, calculated as

ŷ^{(1)}_{i,k} = exp(z^{(1)}_{i,k}) / Σ_{l=1}^{L} exp(z^{(1)}_{i,l}),   ŷ^{(2)}_{i,k} = exp(z^{(2)}_{i,k}) / Σ_{l=1}^{L} exp(z^{(2)}_{i,l}),

wherein z^{(1)}_{i,k} and z^{(2)}_{i,k} respectively denote the k-th channel value of the i-th sample point output by the feature up-sampling network and the auxiliary network, calculated as

z^{(1)}_i = f_1(x_i; θ_1),   z^{(2)}_i = f_2(x_i; θ_2),

wherein x_i denotes the input feature of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 denotes the parameters of the feature up-sampling network, f_2 denotes the auxiliary network, and θ_2 denotes the parameters of the auxiliary network.
6. The method of claim 5, wherein adjusting the network parameters in (4a) according to the loss function of the semantic segmentation network is performed by the following formula:

θ_{q+1} = θ_q - l_q · ∇_{θ_q} Loss(θ_q),

wherein l_q denotes the learning rate of the q-th round of training, θ_q denotes the parameters of the 3D point cloud semantic segmentation network in the q-th round of training, and θ_{q+1} denotes the parameters obtained by adjusting θ_q, used in the (q+1)-th round of training.
7. The method of claim 1, wherein evaluating the segmentation accuracy of the current network model every P_1 rounds in (4b) is implemented as follows:
(4b1) Every P_1 rounds, inputting the test set into the current network model to obtain the predicted category of every point in the test set;
(4b2) Counting the number of test-set points whose predicted category is the same as their true category, and computing the segmentation accuracy

acc = R / H,

wherein R denotes the number of test-set points whose predicted category is the same as their true category, and H denotes the total number of points in the test set;
(4b3) And comparing the segmentation precision of the current network model with the segmentation precision of the previously stored network model, if the segmentation precision of the current network model is higher than that of the previously stored network model, indicating that the current network model is better, and storing the current network model, otherwise, not storing the current network model.
CN201910604264.0A 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network Active CN110322453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604264.0A CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910604264.0A CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Publications (2)

Publication Number Publication Date
CN110322453A CN110322453A (en) 2019-10-11
CN110322453B true CN110322453B (en) 2023-04-18

Family

ID=68122807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604264.0A Active CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Country Status (1)

Country Link
CN (1) CN110322453B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827398B (en) * 2019-11-04 2023-12-26 北京建筑大学 Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111223120B (en) * 2019-12-10 2023-08-04 南京理工大学 Point cloud semantic segmentation method
CN111192270A (en) * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning
CN111428619B (en) * 2020-03-20 2022-08-05 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111583263B (en) * 2020-04-30 2022-09-23 北京工业大学 Point cloud segmentation method based on joint dynamic graph convolution
CN112633330B (en) * 2020-12-06 2024-02-02 西安电子科技大学 Point cloud segmentation method, system, medium, computer equipment, terminal and application
CN112560865B (en) * 2020-12-23 2022-08-12 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112927248B (en) * 2021-03-23 2022-05-10 重庆邮电大学 Point cloud segmentation method based on local feature enhancement and conditional random field
CN113205509B (en) * 2021-05-24 2021-11-09 山东省人工智能研究院 Blood vessel plaque CT image segmentation method based on position convolution attention network
CN113470048B (en) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN114140841A (en) * 2021-10-30 2022-03-04 华为技术有限公司 Point cloud data processing method, neural network training method and related equipment
CN115619963B (en) * 2022-11-14 2023-06-02 吉奥时空信息技术股份有限公司 Urban building entity modeling method based on content perception

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
CN102036073B (en) * 2010-12-21 2012-11-28 西安交通大学 Method for encoding and decoding JPEG2000 image based on vision potential attention target area
US11094137B2 (en) * 2012-02-24 2021-08-17 Matterport, Inc. Employing three-dimensional (3D) data predicted from two-dimensional (2D) images using neural networks for 3D modeling applications and other applications
CN103871050B (en) * 2014-02-19 2017-12-29 小米科技有限责任公司 icon dividing method, device and terminal
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds
US10824862B2 (en) * 2017-11-14 2020-11-03 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN109871532B (en) * 2019-01-04 2022-07-08 平安科技(深圳)有限公司 Text theme extraction method and device and storage medium

Also Published As

Publication number Publication date
CN110322453A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110322453B (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
WO2022088676A1 (en) Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN111199214B (en) Residual network multispectral image ground object classification method
CN111079685B (en) 3D target detection method
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN107633226B (en) Human body motion tracking feature processing method
CN111523521A (en) Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN108960404B (en) Image-based crowd counting method and device
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110309835B (en) Image local feature extraction method and device
CN111310821B (en) Multi-view feature fusion method, system, computer equipment and storage medium
CN111028327A (en) Three-dimensional point cloud processing method, device and equipment
CN111738114B (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN111339924B (en) Polarized SAR image classification method based on superpixel and full convolution network
CN115205264A (en) High-resolution remote sensing ship detection method based on improved YOLOv4
CN111798469A (en) Digital image small data set semantic segmentation method based on deep convolutional neural network
CN114299405A (en) Unmanned aerial vehicle image real-time target detection method
CN113450269A (en) Point cloud key point extraction method based on 3D vision
CN116091823A (en) Single-feature anchor-frame-free target detection method based on fast grouping residual error module
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant