CN115984559B - Intelligent sample selection method and related device - Google Patents
Abstract
The application discloses an intelligent sample selection method and a related device. The method applies a sample selection technique based on remote-sensing result data to the selection of semantic segmentation samples: block-wise precision evaluation locates image data regions where the model generalizes poorly; a local minimum F1 index prevents samples with locally poor annotation from being selected as high-quality samples; and iterative selection gradually improves model generalization. During selection, candidate samples are divided into high-quality samples, low-quality samples and error samples, and the distribution of the three classes is analysed, so that selection reduces to extracting the high-quality samples and iterative sample selection is realized. Through this virtuous circle, the selection model continuously gains generalization on the candidate sample set, from which high-quality samples are continuously extracted.
Description
Technical Field
The application relates to the technical field of sample selection, in particular to an intelligent sample selection method and a related device.
Background
Remote-sensing data volumes are enormous, and ground-feature information extraction is a core task in the remote-sensing field. Deep learning is currently becoming the mainstream method for extracting remote-sensing ground-feature information; however, data-driven deep learning usually relies on a large number of samples to achieve good extraction results. Among tasks such as scene classification, target detection and semantic segmentation, the sample annotation work of the semantic segmentation task is the most time- and labour-consuming. Researchers have studied many automatic or semi-automatic methods for annotating semantic segmentation samples, but the low efficiency and low precision of semantic segmentation sample annotation remain unsolved.
Therefore, how to improve the efficiency and precision of sample annotation for the semantic segmentation task has become an urgent technical problem.
Disclosure of Invention
To solve the above problem, the present application provides an intelligent sample selection method and a related device.
In a first aspect, the present application provides an intelligent sample selection method that adopts the following technical scheme:
a method of smart sample beneficiation, comprising:
obtaining a sample to be trained, and storing the sample to be trained into a preset high-quality sample set;
model training is carried out on the preset high-quality sample set to obtain a target model;
acquiring historical image data, and predicting the historical image data by using the target model to acquire a prediction result;
performing block evaluation in the prediction result to obtain an image data area, cutting the image data area into a target sample, and storing the target sample into a preset candidate sample set;
determining local minimum F1 indexes of all target samples in the preset candidate sample set by using the target model; determining a final sample set according to a preset rule in combination with the local minimum F1 indexes, and storing the final sample set into a preset carefully selected sample set;
and performing selection iterations on the preset carefully selected sample set to obtain the carefully selected sample set.
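The claimed steps can be sketched as one iterative loop. The sketch below is illustrative only: `score_fn` stands in for predicting a candidate with the current model and computing its local minimum F1, `threshold` for the preset limit threshold, and "retraining" is reduced to growing the model's sample pool.

```python
def refine_samples(high_quality, candidates, score_fn, threshold, max_rounds=5):
    """Iteratively move candidates whose score under the current model
    meets `threshold` into the refined set, "retraining" (here: enlarging
    the model's sample pool) between rounds."""
    refined = []
    for _ in range(max_rounds):
        model = list(high_quality) + refined   # stand-in for retraining
        picked = [s for s in candidates if score_fn(model, s) >= threshold]
        if not picked:                         # preset stop condition
            break
        refined.extend(picked)
        candidates = [s for s in candidates if s not in picked]
    # merge the refined picks with the original high-quality set
    return refined + list(high_quality), candidates
```

As the model's pool grows, scores rise, so samples rejected in one round may be picked in a later one, which is the "virtuous circle" the abstract describes.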
Optionally, the step of performing block evaluation in the prediction result to obtain an image data area, and clipping the image data area into a target sample and storing the target sample in a preset candidate sample set includes:
dividing the image information into standard image blocks according to geographic units and/or grids in the prediction result;
acquiring a prediction quality evaluation index of the standard image block;
acquiring an image data area by combining the prediction quality evaluation index in a block evaluation mode;
and cutting the image data area into target samples and storing the target samples into a preset candidate sample set.
Optionally, the step of obtaining the image data area by combining the prediction quality evaluation index in a block evaluation manner includes:
obtaining a feature prediction result and a feature outline vector result corresponding to the feature prediction result from the historical image data; superposing the ground object prediction result and the ground object outline vector result to generate a superposition image;
carrying out block evaluation on the superimposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block; and extracting an image area with the evaluation value smaller than a preset threshold value as an image data acquisition area.
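As a simplified sketch of this block evaluation step, the snippet below scores binary prediction/label masks with a pixelwise F1 and keeps the blocks whose score falls below the preset threshold; the function names and flat-list mask representation are illustrative assumptions, not the patent's implementation.

```python
def f1_score(pred, label):
    """Pixelwise F1 between two equal-length binary masks (flat lists)."""
    tp = sum(1 for p, l in zip(pred, label) if p and l)
    fp = sum(1 for p, l in zip(pred, label) if p and not l)
    fn = sum(1 for p, l in zip(pred, label) if l and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def low_generalization_blocks(block_scores, threshold):
    """(row, col) indices of blocks whose evaluation score is below the
    preset threshold -- the candidate-sample acquisition area."""
    return [(r, c)
            for r, row in enumerate(block_scores)
            for c, score in enumerate(row)
            if score < threshold]
```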
Optionally, before the step of determining the local minimum F1 index of all the target samples by using the target model in the preset candidate sample set, the method further includes:
judging whether sample selection is performed for the first time in the preset candidate sample set;
if yes, determining local minimum F1 indexes of all the target samples according to the target model;
if not, training a model on the existing carefully selected sample set to obtain a model with better generalization, updating the target model with it, performing selection on the remaining candidate samples with the new target model, and determining the local minimum F1 indexes of the remaining candidate samples by using the target model.
Optionally, the step of determining the local minimum F1 index of all the target samples in the preset candidate sample set by using the target model includes:
determining a Local minimum F1 index Local (N) minF1 of all the target samples in the candidate sample set by using the target model, wherein the calculation formula is as follows:
Local(N)minF1 = min(Local(N_1)F1, ..., Local(N_i)F1, ..., Local(N_n)F1)
where Local(N_i)F1 denotes the local F1 score, N is the number of local blocks the sample is divided into, and N_i is the i-th local block; P and R are the global precision and global recall of the sample, P_L and R_L are its local precision and local recall, p is the number of pixels occupied by the positive class in the sample's prediction result, and l is the number occupied by the positive class in the label.
Optionally, the step of determining a final sample set according to a preset rule in combination with the local minimum F1 index and storing the final sample set in a preset carefully chosen sample set includes:
acquiring a preset ordering rule, and determining a sample sequence according to the preset ordering rule and a local minimum F1 index of the target sample;
acquiring a preset limit threshold, and screening in the sample sequence through the preset limit threshold to determine a final sample set; and storing the final sample set into a preset carefully chosen sample set.
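The ordering rule and limit-threshold screening described above can be illustrated in a few lines (assuming, hypothetically, that each candidate is a `(sample, local_min_f1)` pair and that the preset ordering rule is descending local minimum F1, as the embodiments suggest):

```python
def screen_candidates(scored_samples, limit_threshold):
    """Sort (sample, local_min_f1) pairs descending by score, then keep
    only those above the preset limit threshold as the final sample set."""
    ordered = sorted(scored_samples, key=lambda pair: pair[1], reverse=True)
    return [sample for sample, score in ordered if score > limit_threshold]
```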
Optionally, the step of performing selection iterations on the preset carefully selected sample set to obtain the carefully selected sample set includes: training the target model with the preset carefully selected sample set so as to update the target model;
updating the preset carefully chosen sample set by combining the current target model with the current candidate sample set;
and merging the current preset carefully chosen sample set with the high-quality sample set to obtain a carefully chosen sample set when the current preset carefully chosen sample set meets a preset stop condition.
In a second aspect, the present application provides an intelligent sample selection device, comprising:
the sample to be trained acquisition module is used for acquiring a sample to be trained and storing the sample to be trained into a preset high-quality sample set; the model training module is used for carrying out model training on the preset high-quality sample set to obtain a target model;
the prediction result acquisition module is used for acquiring historical image data, and predicting the historical image data by utilizing the target model to acquire a prediction result;
the block evaluation module is used for carrying out block evaluation in the prediction result to obtain an image data area, cutting the image data area into a target sample and storing the target sample into a preset candidate sample set;
the local minimum F1 index acquisition module is used for determining local minimum F1 indexes of all target samples in the preset candidate sample set by utilizing the target model;
the preset condition module is used for determining a final sample set according to a preset rule by combining the local minimum F1 index, and storing the final sample set into a preset carefully selected sample set;
and the carefully selected sample set module is used for performing selection iterations on the preset carefully selected sample set to obtain the carefully selected sample set.
In a third aspect, the present application provides a computer device, the device comprising a memory and a processor, the processor performing the method described in any one of the above aspects when executing the computer instructions stored in the memory.
In a fourth aspect, the present application provides a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform a method as described above.
In summary, the present application includes the following beneficial technical effects:
the sample selection technique based on remote-sensing result data is applied to semantic segmentation sample selection: block-wise precision evaluation locates image data regions where the model generalizes poorly; the local minimum F1 index prevents samples with locally poor annotation from being selected as high-quality samples; and iterative selection gradually improves model generalization. During selection, candidate samples are divided into high-quality samples, low-quality samples and error samples, and the distribution of the three classes is analysed, so that selection reduces to extracting the high-quality samples and iterative sample selection is realized. Through this virtuous circle, the selection model continuously gains generalization on the candidate sample set, from which high-quality samples are continuously extracted.
Drawings
FIG. 1 is a schematic diagram of a computer device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of the intelligent sample selection method of the present invention;
FIG. 3 is a schematic diagram of three samples of different quality in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 4 is an evaluation distribution diagram of different prediction results combined with labels of different quality in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 5 is a schematic illustration of building semantic segmentation sample selection in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 6 is an original image in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 7 is a gray-scale image produced by a model trained on the candidate sample set in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 8 is a gray-scale image produced by a model trained on the carefully selected sample set in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 9 is a gray-scale comparison of the extraction results for a local building in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 10 is a sample selection training schematic diagram of the first embodiment of the intelligent sample selection method of the present invention;
FIG. 11 is a flow chart of a second embodiment of the intelligent sample selection method of the present invention;
FIG. 12 is a block diagram of a first embodiment of the intelligent sample selection device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail by means of the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of a computer device structure of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the computer device may include: a processor 1001, such as a graphics processor (Graphics Processing Unit, GPU), a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The Memory 1005 may be a high-speed random access Memory (Random Access Memory, RAM) or a stable nonvolatile Memory (NVM), such as a disk Memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is not limiting of a computer device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in FIG. 1, an operating system, a network communication module, a user interface module, and an intelligent sample beneficiation program may be included in memory 1005, which is a storage medium.
In the computer device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The computer device invokes the intelligent sample selection program stored in the memory 1005 through the processor 1001 and executes the intelligent sample selection method provided by the embodiment of the present invention.
An embodiment of the present invention provides an intelligent sample selection method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the intelligent sample selection method of the present invention.
In this embodiment, the intelligent sample selection method includes the following steps:
step S10: and acquiring a sample to be trained, and storing the sample to be trained into a preset high-quality sample set.
It should be noted that the problem this embodiment solves is obtaining high-quality ground-feature semantic segmentation samples from historical remote-sensing images and ground-feature contour vectors, so as to supplement the stock of such samples. Generating samples by effectively using historical data reduces the cost of manually producing samples, shortens the update and training period of the intelligent ground-feature extraction model, and improves its precision.
It can be appreciated that, to generate high-quality ground-feature semantic segmentation samples from historical remote-sensing images and ground-feature contour vectors, this embodiment must first understand three potential problems. First, the historical image is not necessarily the base map from which the ground-feature contour vectors were produced, so the two may be slightly mismatched in time phase, causing the ground-feature labels in the vector data to disagree with the actual situation in the image. Second, when generating ground-feature semantic segmentation samples, the introduction of a large number of redundant samples must be avoided: even if their labels are perfectly accurate, redundant samples contribute little to model precision. Third, the quality of each generated sample label must be measured; although current selection techniques offer various ways of measuring label quality, for a semantic segmentation sample the quality of every local part must be considered to guarantee the quality of the whole label.
To solve the above problems, this embodiment proposes a sample selection technique based on remote-sensing result data for semantic segmentation sample selection. In practical applications, large training data sets often contain a large amount of irrelevant, redundant, incomplete and noisy data due to various factors; in the scenario addressed by the invention, apart from variability in manual operation, the redundant, noisy and erroneous data are mainly caused by the first problem above, which pushes the model toward a poor locally optimal solution. Little prior research exists on the selection of semantic segmentation samples, while the large stock of historical images and ground-feature contour vectors provides strong support for this research. For the second problem, the invention proposes block evaluation to obtain samples on which the model generalizes poorly; for the third problem, the invention proposes the local minimum F1 index to attend to the local labelling quality of each sample.
In a specific implementation, this embodiment belongs to sample selection techniques. As remote-sensing data volumes grow, training sets accumulate large amounts of redundant and noisy data; at the same time, large-scale training data demands more storage and greater computational complexity, which harms generalization capability and reduces prediction accuracy. Both the number and the quality of samples affect computational cost and model robustness. Sample selection methods reduce computational cost, and can even improve learning accuracy, by discarding redundant data, noisy data and other negative samples. Current sample selection techniques fall broadly into two types: data compression, which removes a portion of the irrelevant or redundant samples, and active learning, which selects a representative set of unlabelled samples to learn from.
In a specific implementation, in the remote-sensing ground-feature semantic segmentation annotation scenario faced by this embodiment, every pixel of the image must be labelled, and labelling quality directly affects the precision of the trained model. The prior art offers few methods for selecting high-quality semantic segmentation samples. Unsupervised active learning does not apply to this scenario, because its historical ground-feature contour vector annotations can serve as standby labels. Sample selection techniques such as random sub-sampling, uniform sub-sampling and high-amplitude rank sub-sampling cannot meet the requirement of selecting high-quality labelled samples from historical data. As for deep learning methods that select samples using various index information, most existing techniques target scene classification or object detection in computer vision; few address semantic segmentation samples, and the methods mentioned in the current technologies are difficult to apply directly to their selection.
It should be noted that the purpose of this embodiment is to obtain high-quality ground-feature semantic segmentation samples from historical remote-sensing images and ground-feature contour vectors, so as to supplement such samples, thereby reducing the cost of manually producing samples, shortening the update and training period of the intelligent ground-feature extraction model, and improving its precision. Specifically: the collected historical images are predicted using a model trained on the existing high-quality samples; block evaluation is then performed with the prediction results and the historical ground-feature contour vector data to obtain the image data regions on which the model generalizes poorly, the aim being to avoid introducing more redundant samples during selection; finally, the image and vector data within the acquisition range are cut into samples and selection is performed, where the local minimum F1 index prevents samples with locally poor annotation from being selected as high-quality samples.
In an implementation, a high-quality sample set, a candidate sample set and a carefully selected sample set are defined, used respectively to store the high-quality, candidate and carefully selected semantic segmentation samples involved in the invention. High-quality samples are existing samples with good labelling quality; carefully selected samples are well-labelled samples chosen from the candidates.
Note that, the sample to be trained in this embodiment is a high-quality sample.
It will be appreciated that in order to determine the sample refinement scheme, the present invention first classifies candidate samples. The invention divides candidate samples into three types of high quality samples, low quality samples and error samples according to the quality of the labels. For three types of samples of different quality, we describe them as follows:
error samples: refers to a sample with large area multi-label or less-label for the alignment class in the label.
Low quality samples: refers to a sample with misalignment of the normal class or a small number of more or less marks in the normal class gathering area.
High quality samples: the index marks are samples with the dislocation of the alignment class in the index marks within the allowable error range or with little or more marks in the aggregation area of the alignment class.
Meanwhile, after sorting the candidate sample set by local minimum F1 index in descending order, we find that the high-quality, low-quality and error samples follow a certain distribution rule. The low-precision error samples at the rear of the sequence include many well-labelled samples, and the medium-precision low-quality samples in the middle of the sequence also include well-labelled samples; analysis shows that this happens because the model initially used for selection does not generalize well on the candidate sample set and therefore struggles to recognize these samples. The high-quality samples at the front of the sequence, by contrast, contain only a very small number of low-quality and error samples. From this distribution rule, the invention determines the strategy of refining high-quality samples from the candidate sample set according to a local minimum F1 index greater than a certain threshold, as shown in fig. 3.
Further, the predicted and unpredicted results are counted against the expected precision of the high-quality, low-quality and error labels, as shown in fig. 4. It can be seen that high-quality samples are mixed with others at low precision, while at high precision the samples can essentially be considered high quality. This further demonstrates the feasibility of refining high-quality samples from the candidate sample set according to a local minimum F1 index threshold.
Step S20: model training is carried out on a preset high-quality sample set to obtain a target model.
It will be appreciated that candidate samples are obtained as follows: model training is performed with the high-quality sample set to obtain the best model on that set, and the model is then used to predict historical images, yielding the ground-feature prediction results.
Step S30: and acquiring historical image data, and predicting the historical image data by using the target model to acquire a prediction result.
Step S40: and carrying out block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples in a preset candidate sample set.
The image data region obtained in this embodiment refers to the image data region with poor generalization, i.e. the candidate-sample collection region.
In a specific implementation, the target model is used to predict the historical images, obtaining ground-feature prediction results. The candidate-sample collection area is obtained by superposing the ground-feature prediction result of each historical image with the ground-feature contour vector result corresponding to that image, performing block evaluation, and extracting the image areas whose evaluation index is smaller than threshold 1. Finally, the data within the collection area are cut to generate the candidate sample set.
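The final cutting step might look like the following sketch, which tiles an acquisition region (represented here as a plain 2D list) into fixed-size target samples and drops ragged edges; the tile size and representation are illustrative assumptions.

```python
def crop_region(region, tile):
    """Cut an acquisition region (H x W grid) into tile x tile target
    samples, row-major, dropping ragged edges."""
    h, w = len(region), len(region[0])
    return [[row[c:c + tile] for row in region[r:r + tile]]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]
```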
Step S50: and determining the local minimum F1 index of all target samples by utilizing the target model in the preset candidate sample set.
In a specific implementation, the purpose of the invention is to carefully select semantic segmentation samples, labeling of the semantic segmentation samples is pixel by pixel, in general, the process of evaluating the prediction Precision of a model is to firstly predict an image by using the model to obtain a prediction result, then to superimpose the prediction result and the label corresponding to the image, count each pixel category of the prediction result, and complete Precision evaluation, wherein the Precision indexes generally comprise Precision (Precision), recall (Recall), intersection ratio (IoU), F1 Score (F1 Score) and the like. The invention utilizes the F1 score index to evaluate the precision of the candidate set, and the F1 score simultaneously considers the precision and the recall, so that the candidate set can be regarded as a harmonic mean of the precision and the recall. However, when the local inaccuracy or the local error of the sample labeling is found in the experiment and the local proportion of the sample is very small, the F1 fraction reduction of the whole sample is not greatly influenced, and the sample is mistakenly regarded as a high-quality sample and is carefully selected, so that the influence on model training is caused. In order to weaken the influence caused by local to a large extent, the invention provides a local minimum F1 index, wherein the local minimum F1 index formula is as follows:
Wherein Local (N i ) F1 means a local F1 fraction, N means a local block number into which the sample is divided, N i Refers to the i-th local block, P is the global accuracy of the sample, R is the global recall of the sample, P L Is the local accuracy of the sample, R L The local recall rate of the sample is calculated, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number occupied by the positive class in the label. The meaning of the formula is that when p+l=0, it indicates that the prediction result and the label of the sample are not available, so that the local F1 score of the positive class is equal to 1; when 0 is<When p+l is less than or equal to 500, the number of pixels of the positive class in the prediction result or the label is few, the accuracy of calculation is possibly low or high, and the influence of the small number of error labels on the model is considered to be within tolerance, so that the global F1 score of the sample is directly used for assigning a value to the local F1 score of the local; when p+l>500, we consider that labeling has some impact on model training, so the F1 score in the local is computed strictly. Finally, the formula for defining the local minimum F1 fraction is as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
The local minimum F1 index proposed by the invention reflects the local quality of a sample more accurately.
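As an illustration, the piecewise local minimum F1 rule above can be sketched in Python; the function name, the 128-pixel block size default, and the NumPy-based layout are assumptions of this sketch, not the patented implementation:

```python
import numpy as np

def local_min_f1(pred, label, block=128, pixel_floor=500):
    """Local minimum F1 index for one binary segmentation sample.

    pred, label: 2-D arrays of equal square size, 1 = positive class.
    block: side length of each local block (assumed default: 128 px).
    pixel_floor: the 500-pixel tolerance of the piecewise rule.
    """
    # Global precision P and recall R over the whole sample.
    tp = int(np.logical_and(pred == 1, label == 1).sum())
    P = tp / max(int(pred.sum()), 1)
    R = tp / max(int(label.sum()), 1)
    global_f1 = 2 * P * R / max(P + R, 1e-12)

    scores = []
    n = pred.shape[0] // block            # e.g. 512 // 128 = 4 blocks per side
    for i in range(n):
        for j in range(n):
            sl = np.s_[i*block:(i+1)*block, j*block:(j+1)*block]
            p = int(pred[sl].sum())       # positive pixels in local prediction
            l = int(label[sl].sum())      # positive pixels in local label
            if p + l == 0:
                scores.append(1.0)        # no positives predicted or labelled
            elif p + l <= pixel_floor:
                scores.append(global_f1)  # too few pixels: fall back to global F1
            else:                         # strict local F1 from local P_L, R_L
                tp_l = int(np.logical_and(pred[sl] == 1, label[sl] == 1).sum())
                P_L = tp_l / max(p, 1)
                R_L = tp_l / max(l, 1)
                scores.append(2 * P_L * R_L / max(P_L + R_L, 1e-12))
    return min(scores)
```

With 512 x 512 samples and 128-pixel blocks this yields 16 local blocks per sample, matching the embodiment described later.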
Sample refinement is performed from the candidate sample set. If this is the first sample refinement, the refined sample set is still empty, so the candidate sample set is refined with the model trained on the high-quality sample set; if it is not the first refinement, the refined sample set already contains samples and the model has been retrained on them, so the remaining candidate sample set is refined with the model trained on the refined sample set.
It will be appreciated that the specific steps for one sample selection are as follows:
Step 1: obtain the best model for sample refinement. If this is the first sample refinement, the best model is trained on the high-quality sample set; if not, it is trained on the existing refined sample set.
Step 2: predict the candidate (remaining candidate) sample set with the best model, and calculate the Local minimum F1 index Local(N)minF1 of each sample, where Local(N)minF1 is calculated as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
wherein Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i is the i-th local block, P is the global precision of the sample, R is the global recall of the sample, P_L is the local precision of the sample, R_L is the local recall of the sample, p is the number of pixels occupied by the positive class in the prediction result of the block, and l is the number of pixels occupied by the positive class in the label.
Step 3: sort the candidate (remaining candidate) samples by the local minimum F1 index from large to small to obtain a sample sequence. The low-precision error samples at the rear of the sequence include some samples with good labeling quality, and the medium-precision low-quality samples in the middle of the sequence also include some well-labeled samples; this is because the model initially used for refinement generalizes poorly on the candidate sample set and has difficulty recognizing them, which causes these two distributions. The high-quality samples at the front of the sequence include only a very small number of low-quality and error samples. Here, high-quality samples are those whose positive-class contour deviation in the label is within the allowable error range, or which have a few extra or missing marks inside a positive-class aggregation area; low-quality samples are those whose positive-class contour in the label is misaligned, or which have a small number of extra or missing marks in a positive-class aggregation area; and error samples are those whose labels contain large areas of extra or missing marks. Threshold 2 is then determined as the limit for selecting high-quality samples.
Step 4: move the samples whose local minimum F1 index is greater than threshold 2 into the refined sample set, completing one pass of sample refinement.
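A minimal sketch of steps 3 and 4 above, assuming the local minimum F1 indexes have already been computed; the function and variable names and the 0.75 default (taken from the embodiment) are illustrative:

```python
def refine_once(scores, threshold2=0.75):
    """One pass of sample refinement over precomputed indexes.

    scores: {sample_id: Local(N)minF1}; returns (picked, remaining),
    both sorted from large to small by the index.
    """
    # Step 3: sort samples by the local minimum F1 index, descending.
    ordered = sorted(scores, key=scores.get, reverse=True)
    # Step 4: samples above threshold 2 move into the refined sample set.
    picked = [s for s in ordered if scores[s] > threshold2]
    remaining = [s for s in ordered if scores[s] <= threshold2]
    return picked, remaining
```

The remaining samples stay in the candidate set for later passes, once a model retrained on the refined set generalizes better.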
Further, to reduce computation on samples already refined, before the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model, the method further includes: judging whether this is the first sample refinement on the preset candidate sample set; if yes, determining the local minimum F1 indexes of all the target samples with the target model; if not, training a more generalized model on the existing refined sample set, updating the target model with it, and determining the local minimum F1 indexes of the remaining candidate samples with the updated target model.
Further, in order to obtain the F1 index, the step of determining, in the preset candidate sample set, the local minimum F1 index of all the target samples by using the target model includes:
determining a Local minimum F1 index Local (N) minF1 of all the target samples in the candidate sample set by using the target model, wherein the calculation formula is as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i is the i-th local block, P is the global precision of the sample, R is the global recall of the sample, P_L is the local precision of the sample, R_L is the local recall of the sample, p is the number of pixels occupied by the positive class in the prediction result of the block, and l is the number of pixels occupied by the positive class in the label.
Step S60: and determining a final sample set according to a preset rule and combining the local minimum F1 index, and storing the final sample set into a preset carefully selected sample set.
Further, in order to improve the sample quality in the final sample set, the step of determining the final sample set according to the preset rule in combination with the local minimum F1 index and storing the final sample set in a preset carefully chosen sample set includes: acquiring a preset ordering rule, and determining a sample sequence according to the preset ordering rule and a local minimum F1 index of the target sample; acquiring a preset limit threshold, and screening in the sample sequence through the preset limit threshold to determine a final sample set; and storing the final sample set into a preset carefully chosen sample set.
Step S70: a beneficiation iteration is performed on the preset beneficiation sample set to obtain a beneficiation sample set.
It should be noted that the refinement iteration is the process of obtaining refined samples from the remaining candidate sample set with a continuously updated target model until a preset requirement is met, after which the refined sample set is merged with the high-quality sample set.
Further, to implement the culling iteration, the step of performing the culling iteration on the preset culling sample set to obtain a culling sample set includes: training the target model by using the preset carefully chosen sample set so as to update the target model; updating the preset carefully chosen sample set by combining the current target model with the current candidate sample set; and merging the current preset carefully chosen sample set with the high-quality sample set to obtain a carefully chosen sample set when the current preset carefully chosen sample set meets a preset stop condition.
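One possible reading of this refinement iteration, with training and scoring injected as callables; all names, the pass cap, and the stop-when-nothing-is-picked termination are assumptions of this sketch rather than the patented procedure:

```python
def refinement_iteration(train_fn, score_fn, high_quality, candidates,
                         threshold2=0.75, max_passes=10):
    """Iterative refinement sketch.

    train_fn(samples) -> model; score_fn(model, sample) -> Local(N)minF1.
    Returns (model trained on refined + high-quality sets, merged sample list).
    """
    refined, remaining = [], list(candidates)
    model = train_fn(high_quality)            # first pass: high-quality model
    for _ in range(max_passes):
        picked = [s for s in remaining if score_fn(model, s) > threshold2]
        if not picked:
            break                             # no new refined samples: stop
        refined += picked
        remaining = [s for s in remaining if s not in picked]
        model = train_fn(refined)             # later passes: refined set only
    merged = refined + high_quality           # final merge with high-quality set
    return train_fn(merged), merged
```

Retraining on the refined set alone mirrors the time-saving noted later for the second and subsequent passes; the real stop condition is the user-defined requirement of the invention.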
In a specific implementation, as shown in fig. 3, this embodiment applies sample refinement to buildings. To enable those skilled in the art to further understand this embodiment: historical data must be converted into semantic segmentation samples for a deep learning model in order to improve the model's performance. The refinement scene uses building semantic segmentation samples at 0.5 m spatial resolution: 10000 groups of building samples with boundary errors within 2 pixels, delineated over 21 counties, plus historical data from 15 additional counties. The effect of this embodiment on semantic segmentation sample refinement can be seen from some of the result data below. The specific steps of this embodiment are as follows:
Step one: define a high-quality sample set, a candidate sample set, and a refined sample set for the building.
The three sample sets store the high-quality samples, candidate samples, and refined samples of the semantic segmentation samples involved in the invention. The high-quality samples are existing samples with good labeling quality, 10000 groups in total, each 512 x 512 pixels; the refined samples are the well-labeled samples selected from the candidate samples; and the candidate sample set is generated from the historical images and building contour vector result data of the other 15 counties and cities.
Step two: obtain candidate samples.
The high-quality building samples are randomly divided into a training set and a verification set; DeepLabV3+ is selected as the building extraction model; the training set of high-quality building samples is fed into DeepLabV3+ for training; and the model with the highest IoU on the verification set is saved as the best model. This model is used to predict the historical images and evaluate the regions where it generalizes poorly, and also for the first sample refinement. The historical images are predicted with the DeepLabV3+ model trained on the high-quality sample set; with the building contour vectors corresponding to the historical images as labels, the prediction results are evaluated block by block to obtain the accuracy of each block. Image regions whose evaluation index is smaller than threshold 1 are extracted as candidate sample acquisition regions. Finally, the data of the candidate sample acquisition regions are cut to generate a candidate sample set with a sample size of 512 x 512 pixels. The steps for acquiring the candidate sample acquisition regions are as follows.
Step 2.1, dividing the image into standard image blocks according to a grid of 5000 x 5000 pixels.
And 2.2, taking the F1 score as a standard image block prediction quality evaluation index for block evaluation.
And 2.3, superposing a ground feature prediction result on the historical image and a ground feature outline vector result corresponding to the image, and carrying out block evaluation on the image by using the F1 score by taking a standard image block as a unit to obtain an F1 score evaluation value of each standard image block.
Step 2.4: set the F1 score threshold for standard image block prediction to 0.7 as threshold 1. Because of factors such as image quality, resolution, spectrum, and feature complexity, the difficulty of extracting different features differs; a professional sets the threshold empirically to distinguish regions where the model generalizes poorly, and if experience is lacking, the threshold can be set with reference to the accuracy obtained on the verification set during high-quality sample training. This embodiment determines the threshold empirically.
And 2.5, extracting an image region with the evaluation index smaller than the threshold value 1 as a candidate sample set acquisition region.
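Steps 2.1 to 2.5 can be sketched as follows; the function name and the NumPy representation of the rasterized prediction and label are assumptions, and edge blocks smaller than the grid are simply kept:

```python
import numpy as np

def candidate_regions(pred, label, grid=5000, threshold1=0.7):
    """Block evaluation: tile the image into grid x grid standard blocks,
    score each block with F1 against the rasterized contour-vector label,
    and return the (row, col) origins of blocks whose F1 is below threshold 1.
    """
    regions = []
    H, W = pred.shape
    for r in range(0, H, grid):
        for c in range(0, W, grid):
            p = pred[r:r+grid, c:c+grid]
            l = label[r:r+grid, c:c+grid]
            tp = int(np.logical_and(p == 1, l == 1).sum())
            prec = tp / max(int(p.sum()), 1)
            rec = tp / max(int(l.sum()), 1)
            f1 = 2 * prec * rec / max(prec + rec, 1e-12)
            if f1 < threshold1:
                regions.append((r, c))  # poor generalization: cut samples here
    return regions
```

The returned regions are then cut into 512 x 512 candidate samples, as step two describes.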
Step three: perform sample refinement from the candidate sample set.
If this is the first sample refinement, the refined sample set is still empty, so the candidate sample set is refined with the model trained on the high-quality sample set; if it is not the first refinement, the refined sample set already contains samples and the model has been retrained on them, so the remaining candidate sample set is refined with the model trained on the refined sample set. The specific procedure for one pass of sample refinement is as follows.
Step 3.1: obtain the best model for sample refinement. If this is the first sample refinement, the best model is trained on the high-quality sample set; if not, it is trained on the existing refined sample set.
Step 3.2: predict the candidate (remaining candidate) sample set with the best model, and calculate the Local minimum F1 index Local(N)minF1 of each sample, with N = (512/128)² = 16 local blocks per sample.
Step 3.3: sort the candidate (remaining candidate) samples by the local minimum F1 index from large to small to obtain a sample sequence, then set threshold 2 = 0.75 as the limit for high-quality samples. Because of factors such as image quality, resolution, spectrum, and feature complexity, the difficulty of extracting different features differs; a professional sets the threshold empirically to distinguish regions where the model generalizes poorly, and if experience is lacking, the threshold can be set with reference to the accuracy obtained on the verification set during high-quality sample training. This embodiment determines the threshold empirically.
Step 3.4: move the samples whose local minimum F1 index is greater than threshold 2 into the refined sample set, completing one pass of sample refinement.
Step four: obtain the refined sample set through refinement iteration. The specific steps are as follows.
Step 4.1: after one pass of sample refinement, randomly divide the refined sample set into a training set and a verification set, train a model on the refined sample set, and save the model with the highest IoU on the verification set as the best model, thereby obtaining a best model that generalizes further on the candidate sample set and serves as the model for the next pass of refinement.
Step 4.2: continue refining the remaining candidate sample set with the best model trained on the refined sample set, and repeat the refinement iteration of steps 4.1 and 4.2 until the iteration reaches the increment, i.e. the number of samples added to the refined sample set since the model was last trained; in this embodiment the increment is set to 3000.
Step 4.3: merge the refined sample set obtained by the refinement iteration with the high-quality sample set, train, and obtain a best model. If the model meets the user-defined requirement, refinement ends and the refined sample set is obtained; if not, the refinement iteration continues until the requirement is met and the refined sample set is obtained. The invention does not limit the custom index used to judge whether the model meets the requirement.
It will be appreciated that, as shown in the table below, the candidate sample set of buildings in this embodiment contains about 140000 groups of samples, and the refined sample set obtained by sample refinement contains about 55000 groups. Comparing the verification-set IoU of models trained on the two sets, the model trained on the refined sample set gains about 10% verification-set IoU; the lower precision on the candidate sample set is precisely due to the incorrect labels in the historical data, which shows that sample purity has an important influence on model training.
| | Candidate sample set | Refined sample set |
| --- | --- | --- |
| Number / groups | About 140000 | About 55000 |
| Verification set IoU | 0.6123 | 0.7075 |
In addition, when a model is trained on the candidate sample set, as shown in fig. 4, the large number of wrong and poor-quality labels in the candidate sample set makes the model converge poorly and predict poorly; the grayscale rendering of the prediction result in fig. 5 shows many gray patches of erroneous extraction. The prediction after sample refinement can be seen in fig. 6, in which the house extraction result is partially improved; the result of the model obtained after sample refinement is shown in fig. 7.
In particular implementations, as shown in fig. 8, this embodiment relies on the high-quality sample set only once: a model is trained to predict the image, obtain the regions of poor generalization, and generate the candidate sample set; afterwards the iterative operation of prediction, refinement, and training follows. For the first refinement on the candidate sample set, the blocks used in the block evaluation that located the poor-generalization regions are far larger than the sample size, so poor generalization on a block does not mean the model predicts poorly at every local part of it; rather, most local areas predict poorly, which makes the overall accuracy of the block low. Therefore the first sample refinement does not fail to extract high-quality samples because of generalization. After the first refinement, according to the first law of geography (the law of spatial correlation) — everything is related to everything else, but near things are more related — the model trained on the refined sample set gains generalization on the candidate sample set. This benign iteration thus continuously improves the model's generalization on the candidate sample set and continuously selects the high-quality samples mixed into the middle and rear sections of the precision evaluation sequence. In addition, from the second refinement onward, only the refined sample set is used to train the model, which reduces the time cost of training together with the high-quality sample set.
The sample refinement technology based on remote sensing information result data is applied to semantic segmentation sample refinement: image data regions where the model generalizes poorly are obtained through block precision evaluation; the local minimum F1 index prevents samples with poor local annotation from being selected as high-quality samples; and iterative refinement gradually improves the model's generalization. During refinement, the candidate samples are divided into high-quality, low-quality, and error samples, and the distribution of the three parts is analyzed, so that refinement proceeds by acquiring the high-quality samples and iterative refinement is realized. Through this virtuous circle, the refinement model continuously gains generalization on the candidate sample set, and high-quality samples are continuously extracted from it.
Referring to fig. 11, a flow chart of a second embodiment of the smart sample beneficiation method of the present invention is shown.
Based on the above-mentioned first embodiment, the step S40 of the intelligent sample selection method of the present embodiment further includes:
step S401: the image information is divided into standard image blocks in the prediction result in a geographic unit and/or grid mode.
It should be noted that one of the problems faced by the invention is how to avoid introducing a large number of redundant samples when generating feature semantic segmentation samples: although the labeling of these samples may be very accurate, introducing them is of limited value for improving model accuracy. Block evaluation obtains the regions of poor model generalization as those whose evaluation index is lower than a prescribed threshold. The image can be divided into grids of a specified width and height, or into regions along cognitive boundaries such as streets, rivers, and mountains, and the evaluation index of each image block is then obtained. Because of factors such as image quality, resolution, spectrum, and feature complexity, the difficulty of extracting different features differs; a professional sets the threshold empirically to distinguish regions where the model generalizes poorly, and if experience is lacking, the threshold can be set with reference to the accuracy obtained on the verification set during high-quality sample training.
In block evaluation, the feature contour vectors corresponding to the historical image serve as labels. As described above, a vector label may be inconsistent with the actual features in the image, so there are local areas where the prediction is good but the label is not, or where both are poor; either way the evaluation index is low. Candidate samples prepared from such areas would not be selected in the subsequent refinement anyway, so the goal of this step is to exclude the well-predicted regions from the labeled area and avoid redundant samples that do nothing to improve model performance.
Step S402: and obtaining a prediction quality evaluation index of the standard image block.
Step S403: and acquiring an image data area by combining the prediction quality evaluation index in a block evaluation mode.
In this embodiment, the image is divided into standard image blocks according to a geographic unit or grid mode. And determining a standard image block prediction quality evaluation index for block evaluation. And superposing the ground object prediction result on the historical image and the ground object outline vector result corresponding to the image, and carrying out block evaluation on the image by taking the standard image block as a unit by using the determined evaluation index to obtain an evaluation value of each standard image block. And determining a standard image block prediction quality evaluation threshold value as a threshold value 1. And extracting an image region with the evaluation index smaller than the threshold value 1 as a candidate sample set acquisition region.
Further, in order to accurately acquire the image data acquisition area, the step of acquiring the image data area by combining the prediction quality evaluation index in a block evaluation manner includes: obtaining a feature prediction result and a feature outline vector result corresponding to the feature prediction result from the historical image data; superposing the ground object prediction result and the ground object outline vector result to generate a superposition image; carrying out block evaluation on the superimposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block; and extracting an image area with the evaluation value smaller than a preset threshold value as an image data acquisition area.
Step S404: and cutting the data image area into target samples and storing the target samples into a preset candidate sample set.
In the embodiment, the image information is divided into standard image blocks according to the geographic units and/or grids in the prediction result; acquiring a prediction quality evaluation index of the standard image block; acquiring an image data area by combining the prediction quality evaluation index in a block evaluation mode; cutting the data image area into target samples and storing the target samples into a preset candidate sample set; the technical effect of accurately generating the data image area in the preset candidate sample set is achieved.
Furthermore, embodiments of the present invention provide a computer readable storage medium having stored thereon a program of sample refinement, which when executed by a processor, implements the steps of the method of sample refinement as described above.
Referring to fig. 12, fig. 12 is a block diagram illustrating a first embodiment of the intelligent sample concentrating apparatus according to the present invention.
As shown in fig. 12, the intelligent sample selecting device according to the embodiment of the present invention includes:
the sample to be trained obtaining module 10 is configured to obtain a sample to be trained, and store the sample to be trained into a preset high-quality sample set;
the model training module 20 is configured to perform model training on the preset high-quality sample set to obtain a target model;
a prediction result obtaining module 30, configured to obtain historical image data, and predict the historical image data by using the target model to obtain a prediction result;
the block evaluation module 40 is configured to perform block evaluation in the prediction result to obtain an image data area, and cut the image data area into a target sample and store the target sample in a preset candidate sample set;
a local minimum F1 index obtaining module 50, configured to determine local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model;
A preset condition module 60, configured to determine a final sample set according to a preset rule in combination with the local minimum F1 index, and store the final sample set in a preset carefully chosen sample set;
a pick sample set module 70 for performing a pick iteration on the preset pick sample set to obtain a pick sample set.
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
The sample refinement technology based on remote sensing information result data is applied to semantic segmentation sample refinement: image data regions where the model generalizes poorly are obtained through block precision evaluation; the local minimum F1 index prevents samples with poor local annotation from being selected as high-quality samples; and iterative refinement gradually improves the model's generalization. During refinement, the candidate samples are divided into high-quality, low-quality, and error samples, and the distribution of the three parts is analyzed, so that refinement proceeds by acquiring the high-quality samples and iterative refinement is realized. Through this virtuous circle, the refinement model continuously gains generalization on the candidate sample set, and high-quality samples are continuously extracted from it.
In an embodiment, the block evaluation module 40 is further configured to divide the image information into standard image blocks according to geographic units and/or grids in the prediction result; acquiring a prediction quality evaluation index of the standard image block; acquiring an image data area by combining the prediction quality evaluation index in a block evaluation mode; and cutting the data image area into target samples and storing the target samples into a preset candidate sample set.
In an embodiment, the block evaluation module 40 is further configured to obtain a feature prediction result and a feature contour vector result corresponding to the feature prediction result from the historical image data; superposing the ground object prediction result and the ground object outline vector result to generate a superposition image; carrying out block evaluation on the superimposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block; and extracting an image area with the evaluation value smaller than a preset threshold value as an image data acquisition area.
In an embodiment, the local minimum F1 index obtaining module 50 is further configured to judge whether this is the first sample refinement on the preset candidate sample set; if yes, determine the local minimum F1 indexes of all the target samples according to the target model; if not, train a more generalized model on the existing refined sample set, update the target model with it, refine the remaining candidate samples with the updated target model, and determine their local minimum F1 indexes.
In an embodiment, the Local minimum F1 index obtaining module 50 is further configured to determine a Local minimum F1 index Local (N) minF1 of all the target samples in the candidate sample set by using the target model, where a calculation formula is as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i is the i-th local block, P is the global precision of the sample, R is the global recall of the sample, P_L is the local precision of the sample, R_L is the local recall of the sample, p is the number of pixels occupied by the positive class in the prediction result of the block, and l is the number of pixels occupied by the positive class in the label.
In an embodiment, the preset condition module 60 is further configured to obtain a preset ordering rule, and determine a sample sequence according to the preset ordering rule in combination with a local minimum F1 index of the target sample; acquiring a preset limit threshold, and screening in the sample sequence through the preset limit threshold to determine a final sample set; and storing the final sample set into a preset carefully chosen sample set.
In one embodiment, the fine sample set module 70 is further configured to train the target model using the preset fine sample set to update the target model; updating the preset carefully chosen sample set by combining the current target model with the current candidate sample set; and merging the current preset carefully chosen sample set with the high-quality sample set to obtain a carefully chosen sample set when the current preset carefully chosen sample set meets a preset stop condition.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details not described in detail in this embodiment may refer to the method for sample selection provided in any embodiment of the present invention, which is not described herein.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, or optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its scope; any equivalent structure or equivalent process derived from the contents of this specification, used directly or indirectly in other related technical fields, is likewise included within the scope of the invention.
Claims (7)
1. An intelligent sample selection method, comprising:
obtaining a sample to be trained, and storing the sample to be trained into a preset high-quality sample set;
performing model training on the preset high-quality sample set to obtain a target model;
acquiring historical image data, and predicting the historical image data by using the target model to obtain a prediction result;
performing block evaluation on the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples into a preset candidate sample set;
determining local minimum F1 indexes of all target samples in the preset candidate sample set by using the target model;
determining a final sample set according to a preset rule in combination with the local minimum F1 indexes, and storing the final sample set into a preset refined sample set;
performing refinement iterations on the preset refined sample set to obtain a refined sample set;
the step of performing block evaluation on the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples into the preset candidate sample set comprises:
dividing the image data in the prediction result into standard image blocks according to geographic units and/or grids;
acquiring a prediction quality evaluation index of each standard image block;
obtaining the image data area by block evaluation in combination with the prediction quality evaluation index;
cutting the image data area into target samples and storing the target samples into the preset candidate sample set;
the step of obtaining the image data area by block evaluation in combination with the prediction quality evaluation index comprises:
obtaining a ground feature prediction result and a corresponding ground feature contour vector result from the historical image data;
superimposing the ground feature prediction result and the ground feature contour vector result to generate a superimposed image;
performing block evaluation on the superimposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block;
extracting image areas whose evaluation values are smaller than a preset threshold value as image data acquisition areas;
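For illustration only, the threshold-extraction step above can be sketched in Python as follows. This is not the patented implementation: the per-block evaluation values are assumed to have been computed already, and the block identifiers and threshold value are hypothetical.

```python
# Sketch: keep only blocks whose prediction-quality evaluation value falls
# below a preset threshold; these become image-data acquisition areas.
# block_scores maps a (hypothetical) block id to its evaluation value.
def extract_low_quality_blocks(block_scores: dict, threshold: float) -> list:
    """Return the ids of blocks scoring below the preset threshold."""
    return [bid for bid, score in block_scores.items() if score < threshold]
```

A sample whose blocks all score above the threshold contributes no acquisition area, which matches the intent of harvesting only poorly predicted regions.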
the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model comprises:
determining the local minimum F1 index Local(N)minF1 of each target sample in the candidate sample set by using the target model, wherein the calculation formula is as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
wherein Local(N_i)F1 denotes the local F1 score, N denotes the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the global precision of the sample, R is the global recall of the sample, P_L is the local precision of the sample, R_L is the local recall of the sample, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
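The local-minimum-F1 score above can be sketched as follows. This is a minimal illustration, not the patented implementation: the uniform grid layout, the NumPy mask representation, and the zero-division convention are all assumptions the claim does not specify.

```python
# Sketch of Local(N)minF1: split a sample's binary prediction and label masks
# into local blocks, compute F1 per block, and take the minimum over blocks.
import numpy as np

def f1_score(pred: np.ndarray, label: np.ndarray) -> float:
    """F1 of a binary mask; pred and label are 0/1 arrays of the same shape."""
    p = pred.sum()     # pixels predicted positive (p in the claim)
    l = label.sum()    # positive pixels in the label (l in the claim)
    tp = np.logical_and(pred == 1, label == 1).sum()
    if p == 0 or l == 0:
        return 0.0     # assumed convention for empty prediction/label
    precision = tp / p
    recall = tp / l
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def local_min_f1(pred: np.ndarray, label: np.ndarray, grid: int = 2) -> float:
    """Split the masks into grid x grid local blocks; return the minimum F1."""
    h, w = pred.shape
    bh, bw = h // grid, w // grid
    scores = []
    for i in range(grid):
        for j in range(grid):
            blk = (slice(i * bh, (i + 1) * bh), slice(j * bw, (j + 1) * bw))
            scores.append(f1_score(pred[blk], label[blk]))
    return min(scores)
```

Taking the minimum rather than the mean makes the score sensitive to any single badly predicted block, which is what makes it useful for flagging hard samples.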
2. The intelligent sample selection method according to claim 1, further comprising, before the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model:
judging whether sample selection is performed for the first time on the preset candidate sample set;
if yes, determining the local minimum F1 indexes of all the target samples according to the target model;
if not, training a more generalized model on the existing refined sample set to update the target model, and determining the local minimum F1 indexes of the remaining candidate samples by using the updated target model.
3. The intelligent sample selection method according to claim 1, wherein the step of determining a final sample set according to a preset rule in combination with the local minimum F1 index and storing the final sample set into a preset refined sample set comprises:
acquiring a preset ordering rule, and determining a sample sequence according to the preset ordering rule and the local minimum F1 indexes of the target samples;
acquiring a preset limit threshold, and screening the sample sequence by the preset limit threshold to determine the final sample set;
and storing the final sample set into the preset refined sample set.
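A minimal sketch of the ordering-and-screening rule in this claim, assuming the preset ordering rule is "ascending local minimum F1" (hardest samples first) and the preset limit threshold is a maximum sample count; both are assumptions, since the claim leaves them preset but unspecified.

```python
# Sketch: order candidates by their local-minimum-F1 score and keep at most
# `limit` of them as the final sample set.
def select_final_samples(candidates: list, limit: int) -> list:
    """candidates: list of (sample_id, local_min_f1) pairs."""
    ordered = sorted(candidates, key=lambda c: c[1])  # worst-scoring first
    return [sample_id for sample_id, _ in ordered[:limit]]
```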
4. The intelligent sample selection method according to claim 1, wherein the step of performing refinement iterations on the preset refined sample set to obtain a refined sample set comprises:
training the target model with the preset refined sample set so as to update the target model;
updating the preset refined sample set by combining the current target model with the current candidate sample set;
and when the current preset refined sample set meets a preset stop condition, merging the current preset refined sample set with the high-quality sample set to obtain the refined sample set.
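The refinement iteration of this claim can be sketched as the following loop. The stop condition (a fixed round budget) and the `train`, `score`, and `select` callables are placeholders for the model-training, local-minimum-F1-scoring, and screening steps; none of these names come from the patent.

```python
# Sketch of the refinement loop: retrain on the curated set, re-score the
# remaining candidates with the updated model, move the selected samples over,
# and finally merge with the high-quality set.
def refine_iterations(model, curated: list, candidates: list, quality_set: list,
                      train, score, select, max_rounds: int = 5) -> list:
    for _ in range(max_rounds):
        model = train(model, curated)                  # update the target model
        if not candidates:
            break
        scores = [(s, score(model, s)) for s in candidates]
        chosen = select(scores)                        # e.g. lowest local-min-F1
        curated.extend(chosen)
        candidates = [s for s in candidates if s not in chosen]
    return curated + quality_set                       # merge with high-quality set
```

With trivial stubs for the three callables the loop drains the candidate pool one sample per round until the round budget is exhausted.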
5. An intelligent sample selection device, comprising:
a to-be-trained sample acquisition module, configured to acquire a sample to be trained and store the sample to be trained into a preset high-quality sample set;
a model training module, configured to perform model training on the preset high-quality sample set to obtain a target model;
a prediction result acquisition module, configured to acquire historical image data and predict the historical image data by using the target model to obtain a prediction result;
a block evaluation module, configured to perform block evaluation on the prediction result to obtain an image data area, cut the image data area into target samples, and store the target samples into a preset candidate sample set;
a local minimum F1 index acquisition module, configured to determine local minimum F1 indexes of all target samples in the preset candidate sample set by using the target model;
a preset condition module, configured to determine a final sample set according to a preset rule in combination with the local minimum F1 indexes, and store the final sample set into a preset refined sample set;
a refined sample set module, configured to perform refinement iterations on the preset refined sample set to obtain a refined sample set;
the block evaluation module is further configured to divide the image data in the prediction result into standard image blocks according to geographic units and/or grids;
acquire a prediction quality evaluation index of each standard image block;
obtain the image data area by block evaluation in combination with the prediction quality evaluation index;
and cut the image data area into target samples and store the target samples into the preset candidate sample set;
the block evaluation module is further configured to obtain a ground feature prediction result and a corresponding ground feature contour vector result from the historical image data;
superimpose the ground feature prediction result and the ground feature contour vector result to generate a superimposed image;
perform block evaluation on the superimposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block; and extract image areas whose evaluation values are smaller than a preset threshold value as image data acquisition areas;
the local minimum F1 index acquisition module is further configured to determine the local minimum F1 index Local(N)minF1 of each target sample in the candidate sample set by using the target model, wherein the calculation formula is as follows:
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
wherein Local(N_i)F1 denotes the local F1 score, N denotes the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the global precision of the sample, R is the global recall of the sample, P_L is the local precision of the sample, R_L is the local recall of the sample, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
6. A computer device, comprising a memory and a processor, wherein the processor, when executing computer instructions stored in the memory, performs the method of any one of claims 1 to 4.
7. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211685928.9A CN115984559B (en) | 2022-12-27 | 2022-12-27 | Intelligent sample selection method and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115984559A CN115984559A (en) | 2023-04-18 |
CN115984559B (en) | 2024-01-12
Family
ID=85973606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211685928.9A Active CN115984559B (en) | 2022-12-27 | 2022-12-27 | Intelligent sample selection method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115984559B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117011698B (en) * | 2023-06-25 | 2024-05-03 | 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) | Multi-dimensional and multi-model earth surface full coverage interpretation sample set evaluation method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104331716A (en) * | 2014-11-20 | 2015-02-04 | 武汉图歌信息技术有限责任公司 | SVM active learning classification algorithm for large-scale training data |
CN110021019A (en) * | 2019-04-15 | 2019-07-16 | 中国医学科学院皮肤病医院 | A kind of thickness distributional analysis method of the AI auxiliary hair of AGA clinical image |
CN110298348A (en) * | 2019-06-12 | 2019-10-01 | 苏州中科天启遥感科技有限公司 | Remote sensing image building sample areas extracting method and system, storage medium, equipment |
CN112884791A (en) * | 2021-02-02 | 2021-06-01 | 重庆市地理信息和遥感应用中心 | Method for constructing large-scale remote sensing image semantic segmentation model training sample set |
WO2022082848A1 (en) * | 2020-10-23 | 2022-04-28 | 深圳大学 | Hyperspectral image classification method and related device |
Non-Patent Citations (1)
Title |
---|
A spatio-temporal extension method for remote sensing image classification training samples based on semi-supervised learning; Ren Guangbo; Zhang Jie; Ma Yi; Song Pingjian; Remote Sensing for Land and Resources (Issue 02); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112232371B (en) | American license plate recognition method based on YOLOv3 and text recognition | |
CN111626993A (en) | Image automatic detection counting method and system based on embedded FEFnet network | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN113223042B (en) | Intelligent acquisition method and equipment for remote sensing image deep learning sample | |
CN109087337B (en) | Long-time target tracking method and system based on hierarchical convolution characteristics | |
CN110738132B (en) | Target detection quality blind evaluation method with discriminant perception capability | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN113628180B (en) | Remote sensing building detection method and system based on semantic segmentation network | |
CN115984559B (en) | Intelligent sample selection method and related device | |
CN115861738A (en) | Category semantic information guided remote sensing target detection active sampling method | |
CN113989287A (en) | Urban road remote sensing image segmentation method and device, electronic equipment and storage medium | |
CN114998744A (en) | Agricultural machinery track field segmentation method based on motion and vision dual-feature fusion | |
CN115496891A (en) | Wheat lodging degree grading method and device | |
CN116385374A (en) | Cell counting method based on convolutional neural network | |
CN113361530A (en) | Image semantic accurate segmentation and optimization method using interaction means | |
CN109543278B (en) | Land use change simulation method and system | |
CN108845999A (en) | A kind of trademark image retrieval method compared based on multiple dimensioned provincial characteristics | |
CN116681657B (en) | Asphalt pavement disease detection method based on improved YOLOv7 model | |
CN114972739B (en) | Image target detection method based on target centroid relation | |
CN117132802A (en) | Method, device and storage medium for identifying field wheat diseases and insect pests | |
CN110889418A (en) | Gas contour identification method | |
CN114782983A (en) | Road scene pedestrian detection method based on improved feature pyramid and boundary loss | |
CN112861689A (en) | Searching method and device of coordinate recognition model based on NAS technology | |
CN115393388A (en) | Single-target tracking method based on position uncertainty estimation | |
US20230290123A1 (en) | Pre-Processing for Automatic Topographic Feature Extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||