CN115984559A - Intelligent sample selection method and related device - Google Patents


Publication number
CN115984559A
Authority
CN
China
Prior art keywords
sample
sample set
preset
samples
local
Prior art date
Legal status
Granted
Application number
CN202211685928.9A
Other languages
Chinese (zh)
Other versions
CN115984559B (en)
Inventor
陈婷
李志强
丁媛
朱艳丽
刘书静
何建军
Current Assignee
Twenty First Century Aerospace Technology Co ltd
Original Assignee
Twenty First Century Aerospace Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Twenty First Century Aerospace Technology Co ltd
Priority to CN202211685928.9A
Publication of CN115984559A
Application granted
Publication of CN115984559B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an intelligent sample selection method and a related device. A sample selection technique based on remote sensing information result data is used to select semantic segmentation samples: blockwise precision evaluation locates the image data areas on which the model generalizes poorly; a local minimum F1 index prevents samples with locally poor labeling from being selected as high-quality samples; and iterative selection gradually improves model generalization. During selection, the candidate samples are divided into high-quality, low-quality, and erroneous samples, and the distribution of the three types is analyzed to determine the strategy for extracting the high-quality samples and to realize iterative selection. Through this virtuous cycle, the selection model continually improves its generalization over the candidate sample set and continually extracts high-quality samples from it.

Description

Intelligent sample selection method and related device
Technical Field
The present application relates to the field of sample selection technologies, and in particular, to an intelligent sample selection method and a related apparatus.
Background
As is well known, the volume of remote sensing data is huge, and extracting surface feature information is a core task in the remote sensing field. Deep learning is gradually becoming the mainstream method for extracting remote sensing surface feature information; however, data-driven deep learning usually relies on a large number of samples to obtain a good extraction effect. Among tasks such as scene classification, target detection, and semantic segmentation, the sample labeling work for the semantic segmentation task is the most time- and labor-consuming. Researchers have studied many automatic or semi-automatic labeling methods for semantic segmentation samples, but the low efficiency and low accuracy of this labeling work remain unsolved.
Therefore, how to improve the efficiency and accuracy of sample labeling for the semantic segmentation task has become a technical problem to be solved urgently.
Disclosure of Invention
In order to solve the problems of low efficiency and low accuracy of sample labeling for the semantic segmentation task, the present application provides an intelligent sample selection method and a related device.
In a first aspect, the intelligent sample selection method provided by the present application adopts the following technical scheme:
an intelligent sample selection method, comprising:
obtaining a sample to be trained, and storing the sample to be trained into a preset high-quality sample set;
performing model training on the preset high-quality sample set to obtain a target model;
acquiring historical image data, and predicting the historical image data by using the target model to acquire a prediction result;
performing block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples in a preset candidate sample set;
determining local minimum F1 indexes of all the target samples by using the target model on the preset candidate sample set; determining a final sample set according to a preset rule in combination with the local minimum F1 index, and storing the final sample set into a preset selected sample set;
performing a selection iteration on the preset selected sample set to obtain a selected sample set.
Optionally, the step of performing block evaluation in the prediction result to obtain an image data region, cutting the image data region into target samples, and storing the target samples in a preset candidate sample set includes:
dividing image information into standard image blocks in the prediction result according to a geographic unit and/or a grid mode;
obtaining a prediction quality evaluation index of the standard image block;
obtaining an image data area by combining the prediction quality evaluation index in a blocking evaluation mode;
and cutting the image data area into target samples and storing the target samples in a preset candidate sample set.
Optionally, the step of obtaining the image data region by combining the prediction quality evaluation index in a block evaluation manner includes:
acquiring a feature prediction result and a feature contour vector result corresponding to the feature prediction result from the historical image data; superposing the feature prediction result and the feature outline vector result to generate an overlaid image;
block evaluation is carried out on the superposed image according to the prediction quality evaluation index so as to obtain an evaluation value of each standard image block; and extracting the image area with the evaluation value smaller than the preset threshold value as an image data acquisition area.
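As an illustrative sketch (not the patent's own code), the blockwise evaluation above can be modeled as tiling the overlaid prediction and label masks into standard blocks, scoring each block, and keeping the blocks whose evaluation value falls below the preset threshold. The grid layout, the Dice/F1-style score, and all names here are assumptions.

```python
# Hypothetical sketch of blockwise evaluation: masks are flat 0/1 lists in
# row-major order; blocks scoring below `threshold` form the acquisition area.

def split_into_blocks(mask, width, block):
    """Split a flat mask of the given width into (row, col)-indexed square blocks."""
    height = len(mask) // width
    blocks = {}
    for r in range(0, height, block):
        for c in range(0, width, block):
            blocks[(r // block, c // block)] = [
                mask[(r + dr) * width + (c + dc)]
                for dr in range(block) for dc in range(block)
            ]
    return blocks

def block_f1(pred, label):
    """F1 (equivalently Dice) of a binary block; 1.0 when both are empty."""
    tp = sum(p and l for p, l in zip(pred, label))
    denom = sum(pred) + sum(label)
    return 1.0 if denom == 0 else 2 * tp / denom

def poor_blocks(pred_mask, label_mask, width, block, threshold):
    """Return the block indices whose evaluation value is below the threshold."""
    pred_b = split_into_blocks(pred_mask, width, block)
    label_b = split_into_blocks(label_mask, width, block)
    return sorted(k for k in pred_b if block_f1(pred_b[k], label_b[k]) < threshold)
```

In practice the blocks would be geographic units or grid tiles of real rasters; this toy version only shows the shape of the computation.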
Optionally, before the step of determining the local minimum F1 index of all the target samples by using the target model in the preset candidate sample set, the method further includes:
judging whether sample selection is being performed for the first time on the preset candidate sample set;
if yes, determining the local minimum F1 indexes of all the target samples according to the target model;
if not, training the model with the existing selected sample set to obtain a model with better generalization, updating the target model, and using the new target model to determine the local minimum F1 indexes of the remaining candidate samples.
Optionally, the step of determining the local minimum F1 index of all the target samples by using the target model in the preset candidate sample set includes:
determining a Local minimum F1 index, Local(N)minF1, of all the target samples in the candidate sample set using the target model, as calculated by:

Local(N_i)F1 = 1,                        if p + l = 0
Local(N_i)F1 = 2PR / (P + R),            if 0 < p + l ≤ 500
Local(N_i)F1 = 2·P_L·R_L / (P_L + R_L),  if p + l > 500

Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)

where Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i refers to the i-th local block, P is the sample's global precision, R is the sample's global recall, P_L is the sample's local precision, R_L is the sample's local recall, p is the number of pixels occupied by the positive class in the sample's prediction result within the block, and l is the number of pixels occupied by the positive class in the label within the block.
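A minimal sketch of how the Local(N)minF1 index described above might be computed over binary masks. The flat-list mask representation, the default 500-pixel tolerance, and all function names are assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the local minimum F1 index over binary 0/1 masks.

def f1_from_masks(pred, label):
    """Global F1 score of a binary prediction against a binary label."""
    tp = sum(1 for p, l in zip(pred, label) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(pred, label) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(pred, label) if p == 0 and l == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)   # P
    recall = tp / (tp + fn)      # R
    return 2 * precision * recall / (precision + recall)

def local_min_f1(pred_blocks, label_blocks, pred, label, pixel_thresh=500):
    """Local(N)minF1: minimum of the per-block local F1 scores.

    For each local block N_i, with p and l positive-pixel counts:
      p + l == 0            -> local F1 is 1 (no positives in prediction or label)
      0 < p + l <= thresh   -> fall back to the sample's global F1
      p + l > thresh        -> strict local F1 from local precision/recall
    """
    global_f1 = f1_from_masks(pred, label)
    scores = []
    for bp, bl in zip(pred_blocks, label_blocks):
        p, l = sum(bp), sum(bl)
        if p + l == 0:
            scores.append(1.0)
        elif p + l <= pixel_thresh:
            scores.append(global_f1)
        else:
            scores.append(f1_from_masks(bp, bl))
    return min(scores)
```

A locally mislabeled block drags the whole sample's score down through the `min`, which is the point of the index: the global F1 alone would barely notice a small bad region.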
Optionally, the step of determining a final sample set according to a preset rule in combination with the local minimum F1 index, and storing the final sample set into a preset selected sample set, includes:
acquiring a preset ordering rule, and determining a sample sequence according to the preset ordering rule in combination with the local minimum F1 index of the target sample;
obtaining a preset limit threshold, and screening in the sample sequence through the preset limit threshold to determine a final sample set; and storing the final sample set into a preset selected sample set.
Optionally, the step of performing a selection iteration on the preset selected sample set to obtain a selected sample set includes: training the target model using the preset selected sample set to update the target model;
updating the preset selected sample set by using the current target model and the current candidate sample set;
when the current preset selected sample set meets a preset stop condition, merging the current preset selected sample set and the high-quality sample set to obtain a selected sample set.
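The iteration described above can be sketched schematically as follows: retrain on the current selected set, re-score the remaining candidates with the new model, move newly qualified samples over, and stop when a pass yields too few new samples. `train` and `score` stand in for model training and local-minimum-F1 scoring; all names and the concrete stop rule are assumptions.

```python
# Hypothetical sketch of the selection iteration loop.

def refine_iteratively(high_quality, candidates, train, score, threshold,
                       min_new=1):
    selected = []
    while candidates:
        # The first pass trains on the high-quality set; later passes train
        # on the selected set, mirroring the steps described in the text.
        model = train(selected) if selected else train(high_quality)
        scored = [(s, score(model, s)) for s in candidates]
        new = [s for s, v in scored if v > threshold]
        if len(new) < min_new:          # preset stop condition
            break
        selected.extend(new)
        candidates = [s for s, v in scored if v <= threshold]
    return high_quality + selected      # merged final selected sample set
```

With dummy `train`/`score` callables this runs as-is; in the real pipeline, `score` would predict a sample with the model and compute its Local(N)minF1.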
In a second aspect, the present application provides an intelligent sample selection device, comprising:
a to-be-trained sample acquisition module, configured to acquire a sample to be trained and store the sample to be trained into a preset high-quality sample set; a model training module, configured to perform model training on the preset high-quality sample set to obtain a target model;
the prediction result acquisition module is used for acquiring historical image data and predicting the historical image data by using the target model to acquire a prediction result;
the block evaluation module is used for carrying out block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples and storing the target samples into a preset candidate sample set;
a local minimum F1 index acquisition module, configured to determine local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model;
a preset condition module, configured to determine a final sample set according to a preset rule in combination with the local minimum F1 index, and store the final sample set into a preset selected sample set;
a selected sample set module, configured to perform a selection iteration on the preset selected sample set to obtain a selected sample set.
In a third aspect, the present application provides a computer device, the device comprising: a memory and a processor, wherein the processor, when executing computer instructions stored in the memory, performs the method of any one of the above.
In a fourth aspect, the present application provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method as described above.
In summary, the present application includes the following advantageous technical effects:
the sample selection technology based on remote sensing information result data is used for selecting semantic segmentation samples, image data areas with poor model generalization are obtained through block precision evaluation, the samples with poor local labeling are prevented from being selected as high-quality samples through a local minimum F1 index, and the model generalization is gradually improved through iterative selection; dividing the candidate sample into a high-quality sample, a low-quality sample and an error sample in sample selection, and analyzing the distribution conditions of the three samples, thereby determining that the sample selection is performed by obtaining the high-quality sample and realizing the sample iterative selection; and through virtuous cycle, the selection model can continuously increase the generalization of the candidate sample set, and continuously extract high-quality samples from the candidate sample set.
Drawings
FIG. 1 is a schematic diagram of a computer device architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the intelligent sample selection method of the present invention;
FIG. 3 is a schematic diagram of three samples of different quality in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 4 is an evaluation distribution plot of different prediction results combined with labels of different quality for the first embodiment of the intelligent sample selection method of the present invention;
FIG. 5 is a schematic diagram of building semantic segmentation sample selection according to the first embodiment of the intelligent sample selection method of the present invention;
FIG. 6 is a raw image of the first embodiment of the intelligent sample selection method of the present invention;
FIG. 7 is a gray scale image of the prediction result of the model trained with the candidate sample set in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 8 is a gray scale image of the prediction result of the model trained with the selected sample set in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 9 is a gray scale comparison of local building extraction results for the first embodiment of the intelligent sample selection method of the present invention;
FIG. 10 is a schematic diagram of sample selection training in the first embodiment of the intelligent sample selection method of the present invention;
FIG. 11 is a schematic flow chart of a second embodiment of the intelligent sample selection method of the present invention;
FIG. 12 is a block diagram of the first embodiment of the intelligent sample selection device of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a computer device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the computer device may include: a processor 1001, such as a Graphics Processing Unit (GPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The Memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk Memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 is not intended to be limiting of computer devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is one type of storage medium, may include an operating system, a network communication module, a user interface module, and an intelligent sample selection program.
In the computer device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The computer device calls the intelligent sample selection program stored in the memory 1005 through the processor 1001 and performs the intelligent sample selection method provided by the embodiment of the present invention.
An embodiment of the present invention provides an intelligent sample selection method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the intelligent sample selection method according to the present invention.
In this embodiment, the intelligent sample selection method includes the following steps:
step S10: and acquiring a sample to be trained, and storing the sample to be trained into a preset high-quality sample set.
It should be noted that the problem to be solved in this embodiment is how to obtain high-quality surface feature semantic segmentation samples from historical remote sensing images and surface feature contour vectors as a supplement, so that effective use of historical data to generate samples reduces the cost of manually producing samples, shortens the update-training period of the intelligent surface feature extraction model, and improves its precision.
It can be understood that, to generate high-quality surface feature semantic segmentation samples from historical remote sensing images and surface feature contour vectors, this embodiment first needs to understand the possible problems. First, the historical remote sensing image is not necessarily the base map from which the surface feature contour vector was produced, and the two may correspond poorly in time phase, which causes surface feature labels in the vector to be inconsistent with the actual situation in the image. Second, when generating surface feature semantic segmentation samples, the introduction of a large number of redundant samples should be avoided: although such samples may be labeled very accurately, introducing them contributes little to improving model precision. Third, the quality of a generated sample label must be measured; although various methods for measuring label quality exist in current sample selection technology, for semantic segmentation samples the quality of every local part must be considered in order to guarantee the quality of the whole label.
In order to solve the above problems, the present embodiment provides a sample selection technique based on remote sensing information result data, used for semantic segmentation sample selection. In practical applications, the large data sets used for training often contain a large amount of irrelevant, redundant, incomplete, and noisy data. In the scenario targeted by the present invention, redundant, noisy, and erroneous data are, apart from differences in manual operation, mainly caused by the first problem mentioned above, and such data can trap the model in a poor local optimum. Little prior work studies sample selection for semantic segmentation samples; the scenario here benefits from a large number of historical images and surface feature contour vectors, which provide strong support. To address the second problem, the invention proposes block evaluation to obtain the samples on which the model generalizes poorly; to address the third problem, it proposes the local minimum F1 index to attend to the local labeling quality of each sample.
In one implementation, the present embodiment concerns sample selection techniques. With the ever-increasing volume of remote sensing data, training sets contain large amounts of redundant and noisy data; at the same time, large-scale training data brings greater storage and computational complexity, which can hurt generalization and reduce prediction accuracy. The number and quality of samples affect both computational cost and model robustness. Sample selection can reduce computational cost, and even improve learning precision, by discarding redundant and noisy data and other unhelpful samples. Current sample selection techniques can be broadly divided into data compression, which removes a portion of irrelevant or redundant samples, and active learning, which selects a representative set of unlabeled samples for labeling and learning.
In specific implementation, in the remote sensing surface feature semantic segmentation labeling scenario faced by this embodiment, every pixel of an image must be labeled according to the semantic segmentation sample standard, and labeling quality affects the precision of the model training result. Few prior techniques provide high-quality sample selection for semantic segmentation samples. Unsupervised active learning does not suit this scenario, because the existing historical contour vector labeling data of the surface features can serve as fallback labels. Sample selection techniques such as random sub-sampling, uniform sub-sampling, and high-amplitude rank sub-sampling cannot meet the requirement of selecting high-quality labeled samples from historical data in this scenario. As for methods that select samples using various index information in deep learning, most current techniques target scene classification or target detection in computer vision; few address semantic segmentation samples, and the methods mentioned in the current literature are difficult to transfer directly to semantic segmentation sample selection.
It should be noted that the purpose of this embodiment is to obtain high-quality surface feature semantic segmentation samples from historical remote sensing images and surface feature contour vectors as a supplement, reducing the cost of manually producing samples through effective use of historical data, shortening the update-training period of the intelligent surface feature extraction model, and improving its precision. Specifically, a model trained on the existing high-quality samples is used to predict the collected historical images; block evaluation is then performed with the prediction results and the historical surface feature contour vector data to obtain the image data areas on which the model generalizes poorly, which avoids introducing more redundant samples during sample selection; finally, the image data and vector data within the acquisition range are cut into samples and sample selection is performed, with a local minimum F1 index used during selection to avoid selecting samples with locally poor labeling as high-quality samples.
In a specific implementation, a high-quality sample set, a candidate sample set, and a selected sample set are defined. The three sample sets store, respectively, the high-quality samples, candidate samples, and selected samples of the semantic segmentation samples involved in the invention. High-quality samples are existing samples with good labeling quality; selected samples are samples with good labeling quality selected from the candidate samples.
It should be noted that the sample to be trained in this embodiment is a high-quality sample.
It is to be understood that, in order to determine the sample selection scheme, the invention first classifies the candidate samples. According to labeling quality, the candidate samples are roughly divided into three types: high-quality samples, low-quality samples, and erroneous samples, described as follows:
Erroneous samples: samples in which the positive class in the label is over- or under-labeled over a large area.
Low-quality samples: samples in which the positive class in the label is misaligned, or in which the positive-class aggregation areas contain noticeable over- or under-labeling.
High-quality samples: samples in which any misalignment of the positive class in the label is within the allowable error range, or the positive-class aggregation areas contain only slight over- or under-labeling.
Meanwhile, after sorting the candidate sample set by the local minimum F1 index from large to small, we find a distribution rule among the high-quality, low-quality, and erroneous samples: the low-precision region at the rear of the sequence, dominated by erroneous samples, still contains some well-labeled samples, and the medium-precision region in the middle, dominated by low-quality samples, also contains some well-labeled samples. Based on this distribution, the invention adopts the strategy of selecting as high-quality samples those candidates whose local minimum F1 index exceeds a certain threshold, as shown in FIG. 3.
Further, statistics of good and poor prediction results crossed with high-quality, low-quality, and erroneous labels show, as in FIG. 4, that at low precision the high-quality samples are mixed in with the others, whereas at high precision the samples can essentially all be considered high quality. This further demonstrates the feasibility of the strategy of selecting high-quality samples from the candidate sample set by requiring the local minimum F1 index to exceed a threshold.
Step S20: and performing model training on a preset high-quality sample set to obtain a target model.
It can be understood that candidate samples are then obtained: model training is performed with the high-quality sample set to obtain the model that is optimal on that set, and this model is used to predict the historical images and obtain the surface feature prediction results.
Step S30: historical image data are obtained, and the target model is used for predicting the historical image data to obtain a prediction result.
Step S40: and performing block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples in a preset candidate sample set.
It should be noted that the image data area acquired in this embodiment refers to an acquired image data area on which generalization is poor, that is, the candidate sample set acquisition area.
In the specific implementation, the target model is used for predicting the historical images to obtain the prediction result of the ground feature. And acquiring a candidate sample set acquisition area, wherein the method comprises the steps of overlapping a feature prediction result on a historical image with a feature contour vector result corresponding to the image, performing block evaluation, and extracting an image area with an evaluation index smaller than a threshold value 1 as the candidate sample set acquisition area. And finally, cutting the data of the candidate sample set acquisition area and generating a candidate sample set.
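The final cutting step can be sketched as tiling the acquisition area into fixed-size sample windows that are stored as the candidate sample set. The window size, stride, and function name here are illustrative assumptions rather than the patent's parameters.

```python
# Hypothetical sketch of cutting an acquisition area into candidate samples.

def tile_acquisition_area(height, width, sample_size, stride):
    """Return the (top, left) origin of each candidate sample window
    cut from an acquisition area of the given height and width."""
    origins = []
    for top in range(0, max(height - sample_size, 0) + 1, stride):
        for left in range(0, max(width - sample_size, 0) + 1, stride):
            origins.append((top, left))
    return origins
```

Each origin would then be used to crop the image and rasterize the corresponding contour vector label into one candidate sample.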
Step S50: and determining local minimum F1 indexes of all target samples by using the target model in a preset candidate sample set.
In a specific implementation, the semantic segmentation samples are selected. Labels of semantic segmentation samples are per-pixel. In general, evaluating model prediction precision proceeds by first obtaining a prediction result with the model, then overlaying the prediction result with the image's corresponding label, counting each pixel's class, and computing the precision indexes, which commonly include Precision, Recall, Intersection over Union (IoU), and F1 Score. This method evaluates the candidate set with the F1 score, which balances precision and recall as their harmonic mean. However, experiments show that if a sample's labeling is locally inaccurate or locally wrong, and the affected local area is small, the sample's overall F1 score is not greatly reduced; the sample may then be mistaken for a high-quality sample and selected, harming model training. To weaken the influence of such local defects, the invention proposes a local minimum F1 index, whose local F1 formula is as follows:
Local(N_i)F1 = 1,                        if p + l = 0
Local(N_i)F1 = 2PR / (P + R),            if 0 < p + l ≤ 500
Local(N_i)F1 = 2·P_L·R_L / (P_L + R_L),  if p + l > 500

where Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i refers to the i-th local block, P is the sample's global precision, R is the sample's global recall, P_L is the sample's local precision, R_L is the sample's local recall, p is the number of pixels occupied by the positive class in the sample's prediction result within the block, and l is the number of pixels occupied by the positive class in the label within the block. The meaning of this formula is as follows. When p + l = 0, neither the sample's prediction result nor its label contains the positive class in this block, so the block's positive-class local F1 score equals 1. When 0 < p + l ≤ 500, the block contains only a few positive-class pixels in either the prediction result or the label, so a locally computed score could come out spuriously low or high; since the influence of a small number of mislabeled pixels on the model is considered within tolerance, the sample's global F1 score is assigned as the block's local F1 score. When p + l > 500, the labeling will have a real impact on model training, so the local F1 score is computed strictly. Finally, the invention defines the local minimum F1 score as follows:

Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
the local minimum and local minimum F1 index provided by the invention reflects the evaluation of the local index of the sample more accurately.
It should be noted that sample selection is performed from the candidate sample set. If samples are being selected for the first time, the selected sample set contains no samples yet, and the model trained on the high-quality sample set is used to select from the candidate sample set; if not, the selected sample set already contains samples and the model has been retrained with them, and the model trained on the selected samples is used to select from the remaining candidate sample set.
It can be understood that one pass of sample selection proceeds as follows:
step 1, obtaining an optimal model for the current sample selection, wherein the optimal model is obtained by training a high-quality sample set if the current sample selection is the first sample selection, and the optimal model is obtained by training an existing selected sample set if the current sample selection is not the first sample selection.
Step 2: predict the candidate (remaining candidate) sample set with the best model and calculate the local minimum F1 index Local(N)minF1 of each sample, where Local(N)minF1 is calculated as follows:
$$\mathrm{Local}(N_i)F1=\begin{cases}1, & p+l=0\\[4pt] \dfrac{2PR}{P+R}, & 0<p+l\le 500\\[4pt] \dfrac{2P_L R_L}{P_L+R_L}, & p+l>500\end{cases}$$
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
wherein Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the sample global precision, R is the sample global recall, P_L is the sample local precision, R_L is the sample local recall, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
Step 3: sort the candidate (remaining candidate) samples by local minimum F1 index from large to small to obtain a sample sequence. The low-accuracy error samples at the rear of the sequence are mixed with a number of well-labeled samples, and the medium-accuracy low-quality samples in the middle of the sequence also contain well-labeled samples. Here, a high-quality sample is one whose positive-class label misalignment is within the allowable error range, or which has only minor over- or under-labeling in positive-class aggregation areas; a low-quality sample is one whose positive-class label is noticeably misaligned, or which has noticeable over- or under-labeling in positive-class aggregation areas; and an error sample is one whose positive class is over- or under-labeled over a large area. Threshold 2 is then determined as the limit for selecting high-quality samples.
Step 4: select the samples whose local minimum F1 index is greater than threshold 2 and move them into the refined sample set, completing one round of sample refinement.
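Steps 3 and 4 amount to a sort-and-threshold pass over the candidates. A minimal sketch follows; the function name, the generic `score_fn` hook, and the default threshold of 0.75 (taken from the embodiment below) are assumptions, not the patent's code.

```python
def refine_once(candidates, score_fn, threshold2=0.75):
    """One round of sample refinement (sketch).

    Sorts candidates by their local minimum F1 index (descending) and moves
    those strictly above threshold2 into the refined set; the rest remain
    candidates for the next round.
    """
    scored = sorted(((score_fn(c), c) for c in candidates),
                    key=lambda t: t[0], reverse=True)
    refined = [c for s, c in scored if s > threshold2]
    remaining = [c for s, c in scored if s <= threshold2]
    return refined, remaining
```

In practice `score_fn` would run the current best model on each candidate sample and compute its local minimum F1 index against the label.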
Further, in order to reduce the computation of sample refinement, before the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set using the target model, the method further includes: judging whether sample refinement is performed for the first time on the preset candidate sample set; if yes, determining the local minimum F1 indexes of all the target samples with the target model; if not, training the model with the existing refined sample set to obtain a more generalized model, updating the target model, refining the remaining candidate samples with the new target model, and determining the local minimum F1 indexes of the remaining candidate samples through the target model.
Further, in order to obtain the F1 index, the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set using the target model includes:
determining a Local minimum F1 index, local (N) minF1, of all the target samples in the set of candidate samples using the target model, as calculated by:
$$\mathrm{Local}(N_i)F1=\begin{cases}1, & p+l=0\\[4pt] \dfrac{2PR}{P+R}, & 0<p+l\le 500\\[4pt] \dfrac{2P_L R_L}{P_L+R_L}, & p+l>500\end{cases}$$
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the sample global precision, R is the sample global recall, P_L is the sample local precision, R_L is the sample local recall, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
Step S60: and determining a final sample set according to a preset rule and the local minimum F1 index, and storing the final sample set into a preset selected sample set.
Further, in order to improve the quality of the samples in the final sample set, the step of determining the final sample set according to a preset rule in combination with the local minimum F1 index and storing the final sample set into a preset selected sample set includes: acquiring a preset ordering rule, and determining a sample sequence according to the preset ordering rule in combination with the local minimum F1 index of the target sample; obtaining a preset limit threshold, and screening in the sample sequence through the preset limit threshold to determine a final sample set; and storing the final sample set into a preset selected sample set.
Step S70: a refinement iteration is performed on a preset refinement sample set to obtain a refinement sample set.
It should be noted that the refinement iteration is a process of obtaining refined samples from the remaining candidate sample set by updating the target model until a preset requirement is met, and finally merging the refined sample set with the high-quality sample set.
Further, to implement a culling iteration, the step of performing a culling iteration on the preset culling sample set to obtain a culling sample set comprises: training the target model using the preset cull sample set to update the target model; updating the preset refined sample set by using the current target model and the current candidate sample set; when the current preset selected sample set meets a preset stop condition, merging the current preset selected sample set and the high-quality sample set to obtain a selected sample set.
In a specific implementation, as shown in fig. 3, this embodiment presents an example of building-sample refinement. To aid understanding of the embodiment of the present invention, semantic segmentation samples of buildings with a spatial resolution of 0.5 meters are refined. The scene has 10000 sets of building samples with boundary errors within 2 pixels, delineated over 21 counties and cities, together with historical images and building-contour vector result data of 15 other counties and cities. This historical data needs to be converted into semantic segmentation samples usable by the deep learning model so as to improve model performance. The embodiment contains further details, and the effect of the embodiment on semantic-segmentation sample refinement can be seen from some of the result data. The specific steps of this example are as follows:
Step one: define a high-quality sample set, a candidate sample set and a refined sample set for buildings.
The three sample sets are used to store the high-quality samples, candidate samples and refined samples of the semantic segmentation samples involved in the invention, respectively. The high-quality samples are the 10000 groups of well-labeled samples, each 512 × 512 pixels; the refined samples are well-labeled samples selected from the candidates; and the candidate sample set is generated from the historical images and building-contour vector production data of the other 15 counties.
And step two, obtaining a candidate sample.
Randomly divide the high-quality building samples into a training set and a validation set, select the DeepLabV3+ model as the building-extraction method, train it on the training set of high-quality building samples, and save the model with the highest IoU on the validation set as the best model. This model is used to predict the historical images and evaluate the regions where generalization is poor, as well as for the first sample refinement. Predict the historical images with the DeepLabV3+ model trained on the high-quality sample set, take the building contour vectors corresponding to the historical images as labels, perform block evaluation on the prediction result, and obtain the accuracy of each block. Extract the image regions whose evaluation index is below threshold 1 as candidate-sample acquisition regions. Finally, clip the data of the acquisition regions to generate a candidate sample set with a sample size of 512 × 512 pixels. The steps for obtaining the candidate-sample acquisition regions are as follows.
Step 2.1 divides the image into standard image blocks according to a 5000 x 5000 pixel grid.
And 2.2, taking the F1 score as a standard image block prediction quality evaluation index for block evaluation.
And 2.3, superposing the feature prediction result on the historical image and the feature contour vector result corresponding to the image, and performing block evaluation on the image by taking the standard image block as a unit by using the F1 score to obtain an F1 score evaluation value of each standard image block.
Step 2.4: set the F1-score threshold for standard image block prediction to 0.7 as threshold 1. Owing to factors such as image quality, resolution, spectrum and complexity, different ground features are extracted by the model with different difficulty, so a professional sets the threshold empirically to separate regions where the model generalizes poorly; without such experience, the threshold can be set with reference to the accuracy obtained on the validation set when training on the high-quality samples. This embodiment determines the threshold empirically.
And 2.5, extracting the image area with the evaluation index smaller than the threshold value 1 as a candidate sample set acquisition area.
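Steps 2.1 to 2.5 above can be sketched as a grid-wise F1 evaluation. This is a minimal sketch under stated assumptions: the function name is hypothetical, the 5000-pixel grid and 0.7 threshold come from the embodiment, and `pred`/`label` stand in for the rasterized feature prediction and contour-vector label.

```python
import numpy as np

def candidate_regions(pred, label, block=5000, threshold1=0.7):
    """Block evaluation (sketch): split the image into block x block standard
    image blocks, compute the F1 score of each block against the label, and
    return the top-left (row, col) offsets of blocks whose F1 falls below
    threshold1 -- the candidate-sample acquisition regions.
    """
    eps = 1e-12
    h, w = pred.shape
    regions = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            pb = pred[r:r+block, c:c+block]
            lb = label[r:r+block, c:c+block]
            tp = np.logical_and(pb == 1, lb == 1).sum()
            prec = tp / (pb.sum() + eps)               # block precision
            rec = tp / (lb.sum() + eps)                # block recall
            f1 = 2 * prec * rec / (prec + rec + eps)
            if f1 < threshold1:                        # poorly generalized block
                regions.append((r, c))
    return regions
```

Blocks where the model already predicts well are skipped, which is exactly how the method avoids generating redundant, uninformative candidate samples.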
And step three, selecting samples from the candidate sample set.
If this is the first sample refinement, the refined sample set contains no samples yet, and the candidate sample set is refined with the model trained on the high-quality sample set; if not, the refined sample set already contains samples and the model has been retrained with them, so the remaining candidate samples are refined with the model trained on the refined sample set. The specific steps of one round of sample refinement are as follows.

Step 3.1: obtain the best model for this round of refinement. If this is the first refinement, the best model is the one trained on the high-quality sample set; otherwise it is the one trained on the existing refined sample set.
Step 3.2: predict the candidate (remaining candidate) sample set with the best model and calculate the local minimum F1 index Local(N)minF1 of each sample, with N = (512/128)² = 16 local blocks per sample.
Step 3.3: sort the candidate (remaining candidate) samples by local minimum F1 index from large to small to obtain a sample sequence, then set threshold 2 equal to 0.75 as the boundary for high-quality samples. Owing to factors such as image quality, resolution, spectrum and complexity, different ground features are extracted by the model with different difficulty, so a professional sets the threshold empirically to separate regions where the model generalizes poorly; without such experience, the threshold can be set with reference to the accuracy obtained on the validation set when training on the high-quality samples. This embodiment determines the threshold empirically.
Step 3.4: select the samples whose local minimum F1 index is greater than threshold 2 and move them into the refined sample set, completing one round of sample refinement.
And step four, obtaining a refined sample set through refined iteration. The method comprises the following specific steps.
Step 4.1: after one round of sample refinement, randomly divide the refined sample set into a training set and a validation set, train a model on the refined set, and save the model with the highest IoU on the validation set as the best model. This model generalizes further over the candidate sample set and serves as the model for the next round of refinement.
Step 4.2: continue refining the remaining candidate sample set with the best model trained on the refined sample set, and repeat steps 4.1 and 4.2 until the increment of one refinement round (the number of samples added to the refined sample set since the model was last trained, starting from the high-quality building sample set) no longer exceeds the preset value, which is set to 3000 in this example.
Step 4.3: merge the refined sample set obtained by the refinement iteration with the high-quality sample set, then train and obtain the best model. If the model meets the user-defined requirement, refinement is finished and the refined sample set is obtained; if not, the refinement iteration continues until the requirement is met. The invention does not limit the user-defined index used to judge whether the model meets the requirement.
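The train-refine loop of step four can be sketched as follows. This is a sketch under stated assumptions: `train_fn` and `select_fn` are caller-supplied hooks standing in for model training and one refinement round, and the interpretation that iteration stops once a round's increment falls to the preset value (3000 in the embodiment) follows step 4.2.

```python
def refinement_iteration(high_quality, candidates, train_fn, select_fn,
                         increment_limit=3000):
    """Refinement iteration (sketch).

    Train first on the high-quality set, then repeatedly select from the
    remaining candidates and retrain on the refined set, stopping when a
    round adds no more than increment_limit samples; finally merge the
    refined set with the high-quality set.
    """
    refined = []
    model = train_fn(high_quality)            # first round: high-quality model
    while True:
        newly, candidates = select_fn(model, candidates)
        refined.extend(newly)
        if len(newly) <= increment_limit:     # increment small enough: stop
            break
        model = train_fn(refined)             # later rounds: refined-set model only
    return refined + high_quality             # step 4.3: merge the two sets
```

Note that from the second round onward the model is trained only on the refined set, matching the time-saving point made later in the description.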
It can be understood that, as shown in the following table, the building candidate sample set in this embodiment contains about 140000 groups of samples, and the refined sample set obtained through sample refinement contains about 55000 groups. Comparing the IoU on the validation set of models trained on the two sets, the model trained on the refined set improves validation IoU by about 10%. The lower accuracy on the candidate sample set is caused by the wrong labels present in the historical data, which shows that sample purity has an important influence on model training.
                      Candidate sample set    Refined sample set
Number / groups       about 140000            about 55000
Validation set IoU    0.6123                  0.7075
In addition, the images were tested and compared. As shown in fig. 4, when a model is trained on the candidate sample set, the large number of wrong and poor-quality labels leads to poor convergence and poor prediction results; the gray-scale prediction map in fig. 5 contains many hazy, wrongly extracted results, while fig. 6, after sample refinement, shows a great improvement. Meanwhile, as shown in fig. 7, the local house-extraction results of the model obtained after sample refinement are also better.
In a specific implementation, as shown in fig. 8, this embodiment trains a model on high-quality samples only once, at the beginning, to predict and locate poorly generalized regions on the images and generate the candidate sample set; after that, it iterates prediction, refinement and training. For the first refinement of the candidate sample set, the block size used in the block evaluation that locates poor generalization is far larger than the sample size, so a poorly generalized block does not mean the model predicts badly in every local part of the block; rather, poor prediction in most local areas lowers the overall block accuracy. There is therefore no concern that poor generalization prevents high-quality samples from being extracted in the first round. After the first refinement, according to the first law of geography (the law of spatial correlation: everything is related to everything else, but near things are more related than distant things), a model trained on the refined sample set improves its generalization over the candidate sample set. The invention thus uses this benign iteration to continuously improve the model's generalization over the candidate sample set and to keep picking out the high-quality samples mixed into the middle and rear sections of the accuracy-evaluation sequence. In addition, from the second refinement onward the model is trained only on the refined sample set, which reduces the time cost of training together with the high-quality sample set.
The sample selection technology based on remote sensing information result data is used for selecting semantically segmented samples, an image data area with poor model generalization is obtained through block precision evaluation, the local minimum F1 index is used for preventing the sample with poor local labeling from being selected as a high-quality sample, and the model generalization is gradually improved through iterative selection; dividing the candidate sample into a high-quality sample, a low-quality sample and an error sample in sample selection, and analyzing the distribution conditions of the three samples, thereby determining that the sample selection is performed by obtaining the high-quality sample and realizing the sample iterative selection; and through virtuous cycle, the selection model can continuously increase the generalization of the candidate sample set, and continuously extract high-quality samples from the candidate sample set.
Referring to FIG. 11, a flowchart of a second embodiment of the intelligent sample culling method of the invention is shown.
Based on the first embodiment, the step S40 of the intelligent sample selection method of this embodiment further includes:
step S401: and dividing the image information into standard image blocks in a geographic unit and/or grid mode in the prediction result.
It should be noted that one of the problems faced by the present invention is how to avoid introducing a large number of redundant samples when generating surface feature semantic segmentation samples, and although the labeling of these samples may be very accurate, the introduction of them has a limited meaning for improving the model accuracy. The block evaluation can obtain the region with poor generalization ability of the model by obtaining the evaluation index below a prescribed threshold value. The blocking mode can be used for carrying out grid division on the image according to the specified width and height, or carrying out region division on the image according to cognitive boundaries such as streets, rivers, mountains and the like, and then obtaining the evaluation index on each image block after evaluation. Due to factors such as quality, resolution, spectrum and complexity of the images, the model has different extraction difficulty degrees for different ground objects, a professional sets a threshold according to experience, distinguishes an area with poor generalization of the model, and can set the threshold by referring to the precision obtained on a verification set during training of a high-quality sample if the professional does not have experience.
In the block evaluation, the feature contour vectors corresponding to the historical images are used as labels. As described above, such vector labels may not match the actual ground features in the image, so local regions with low evaluation indexes exist where the prediction is good but the label is poor, or where both prediction and label are poor; candidate samples made from these regions will not be selected in subsequent refinement. In this step, therefore, well-predicted regions are excluded from the acquisition regions, which avoids producing redundant samples that contribute little to improving model performance.
Step S402: and obtaining the prediction quality evaluation index of the standard image block.
Step S403: and acquiring an image data area by combining the prediction quality evaluation index in a block evaluation mode.
In the present embodiment, the video is divided into standard video blocks according to a geographic unit or a grid manner. And determining a standard image block prediction quality evaluation index for block evaluation. And superposing the feature prediction result on the historical image and the feature contour vector result corresponding to the image, and performing block evaluation on the image by taking the standard image block as a unit by using the determined evaluation index to obtain the evaluation value of each standard image block. A standard image block prediction quality evaluation threshold is determined as threshold 1. And extracting the image area with the evaluation index smaller than the threshold value 1 as a candidate sample set acquisition area.
Further, in order to accurately obtain an image data acquisition region, the step of obtaining the image data region by combining the prediction quality evaluation index in a block evaluation manner includes: acquiring a feature prediction result and a feature contour vector result corresponding to the feature prediction result from the historical image data; superposing the feature prediction result and the feature contour vector result to generate an superposed image; block evaluation is carried out on the superposed image according to the prediction quality evaluation index so as to obtain an evaluation value of each standard image block; and extracting the image area with the evaluation value smaller than the preset threshold value as an image data acquisition area.
Step S404: and cutting the data image area into a target sample and storing the target sample in a preset candidate sample set.
In the embodiment, the image information is divided into standard image blocks in the prediction result according to a geographic unit and/or a grid mode; obtaining a prediction quality evaluation index of the standard image block; acquiring an image data area by combining the prediction quality evaluation index in a blocking evaluation mode; cutting the data image area into a target sample and storing the target sample in a preset candidate sample set; the technical effect of accurately generating the data image area in the preset candidate sample set is achieved.
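The final clipping step above can be sketched as a simple tiling pass. A minimal sketch: the function name is hypothetical, the 512 × 512 tile size follows the embodiment, and edge remainders smaller than a full tile are simply dropped here (handling them by padding or overlap is an implementation choice the patent does not fix).

```python
import numpy as np

def clip_to_samples(region, tile=512):
    """Clip an extracted image-data region into tile x tile target samples
    for the preset candidate sample set (sketch)."""
    h, w = region.shape[:2]
    samples = []
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            samples.append(region[r:r+tile, c:c+tile])
    return samples
```

Each returned array is one target sample ready to be stored in the preset candidate sample set.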
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which a program for sample culling is stored, which when executed by a processor implements the steps of the method for sample culling as described above.
Referring to FIG. 12, FIG. 12 is a block diagram of the first embodiment of the intelligent sample culling apparatus of the invention.
As shown in fig. 12, the intelligent sample culling apparatus according to an embodiment of the invention includes:
a to-be-trained sample acquisition module 10, configured to acquire a to-be-trained sample, and store the to-be-trained sample in a preset high-quality sample set;
the model training module 20 is configured to perform model training on the preset high-quality sample set to obtain a target model;
a prediction result obtaining module 30, configured to obtain historical image data, and predict the historical image data by using the target model to obtain a prediction result;
a block evaluation module 40, configured to perform block evaluation on the prediction result to obtain an image data area, cut the image data area into target samples, and store the target samples in a preset candidate sample set;
a local minimum F1 index obtaining module 50, configured to determine, in the preset candidate sample set, a local minimum F1 index of all the target samples by using the target model;
a preset condition module 60, configured to determine a final sample set according to a preset rule in combination with the local minimum F1 index, and store the final sample set in a preset refined sample set;
a refined sample set module 70 configured to perform a refinement iteration on the preset refined sample set to obtain a refined sample set.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
The sample selection technology based on remote sensing information result data is used for selecting semantic segmentation samples, an image data area with poor model generalization is obtained through block precision evaluation, a sample with poor local labeling is prevented from being selected as a high-quality sample through a local minimum F1 index, and the model generalization is gradually improved through iterative selection; dividing the candidate sample into a high-quality sample, a low-quality sample and an error sample in sample selection, and analyzing the distribution conditions of the three samples, thereby determining that the sample selection is performed by obtaining the high-quality sample and realizing the sample iterative selection; and the selected model can continuously increase the generalization on the candidate sample set through virtuous cycle, and continuously extract high-quality samples from the candidate sample set.
In an embodiment, the block evaluation module 40 is further configured to divide the image information into standard image blocks in the prediction result according to a geographic unit and/or a grid; obtaining a prediction quality evaluation index of the standard image block; obtaining an image data area by combining the prediction quality evaluation index in a blocking evaluation mode; and cutting the data image area into a target sample and storing the target sample in a preset candidate sample set.
In an embodiment, the block evaluation module 40 is further configured to obtain a feature prediction result and a feature contour vector result corresponding to the feature prediction result from the historical image data; superposing the feature prediction result and the feature outline vector result to generate an overlaid image; block evaluation is carried out on the superposed image according to the prediction quality evaluation index so as to obtain an evaluation value of each standard image block; and extracting the image area with the evaluation value smaller than the preset threshold value as an image data acquisition area.
In an embodiment, the local minimum F1 index obtaining module 50 is further configured to determine whether to perform sample refinement for the first time in the preset candidate sample set; if yes, determining local minimum F1 indexes of all the target samples according to the target model; if not, using the existing refined sample set to train the model, obtaining a more generalized model, updating the target model, using the new target model to refine the remaining candidate samples, and determining the local minimum F1 index of the remaining candidate samples through the target model.
In an embodiment, the Local minimum F1 index obtaining module 50 is further configured to determine a Local minimum F1 index Local (N) minF1 of all the target samples in the candidate sample set by using the target model, and the calculation formula is:
$$\mathrm{Local}(N_i)F1=\begin{cases}1, & p+l=0\\[4pt] \dfrac{2PR}{P+R}, & 0<p+l\le 500\\[4pt] \dfrac{2P_L R_L}{P_L+R_L}, & p+l>500\end{cases}$$
Local(N)minF1 = min(Local(N_1)F1, …, Local(N_i)F1, …, Local(N_n)F1)
Local(N_i)F1 is the local F1 score, n is the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the sample global precision, R is the sample global recall, P_L is the sample local precision, R_L is the sample local recall, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
In an embodiment, the preset condition module 60 is further configured to obtain a preset ordering rule, and determine a sample sequence according to the preset ordering rule in combination with the local minimum F1 index of the target sample; obtaining a preset limit threshold value, and screening in the sample sequence through the preset limit threshold value to determine a final sample set; and storing the final sample set into a preset selection sample set.
In one embodiment, the refined sample set module 70 is further configured to train the target model using the preset refined sample set to update the target model; updating the preset refined sample set by using the current target model and combining with the current candidate sample set; when the current preset selected sample set meets a preset stop condition, merging the current preset selected sample set and the high-quality sample set to obtain a selected sample set.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not elaborated in this embodiment can be referred to the method for selecting samples provided by any embodiment of the present invention, and are not described herein again.
Further, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An intelligent sample selection method, comprising:
obtaining a sample to be trained, and storing the sample to be trained into a preset high-quality sample set;
performing model training on the preset high-quality sample set to obtain a target model;
acquiring historical image data, and predicting the historical image data by using the target model to acquire a prediction result;
performing block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples, and storing the target samples in a preset candidate sample set;
determining local minimum F1 indexes of all the target samples by using the target model in the preset candidate sample set;
determining a final sample set according to a preset rule in combination with the local minimum F1 index, and storing the final sample set into a preset refined sample set;
performing a refinement iteration on the preset refined sample set to obtain a refined sample set.
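The seven steps of claim 1 can be sketched as a single pipeline pass. This is a minimal illustration, not the patented implementation: every callable here (train_model, predict, block_evaluate, clip_samples, local_min_f1) is a hypothetical stand-in for a component described in the dependent claims, and the refinement iteration of claim 7 is omitted.

```python
def select_samples(seed_samples, history_images, train_model, predict,
                   block_evaluate, clip_samples, local_min_f1, limit=0.8):
    """One pass of the claim-1 pipeline: train, predict, block-evaluate, clip, score."""
    quality_set = list(seed_samples)                      # preset high-quality sample set
    model = train_model(quality_set)                      # target model
    predictions = [predict(model, img) for img in history_images]
    areas = [a for pred in predictions for a in block_evaluate(pred)]   # block evaluation
    candidates = [s for a in areas for s in clip_samples(a)]            # candidate sample set
    scored = [(s, local_min_f1(model, s)) for s in candidates]
    return [s for s, score in scored if score >= limit]   # final (refined) sample set
```

With stub callables in place of real training and prediction, the function simply threads data through the claimed steps in order.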
2. The intelligent sample selection method according to claim 1, wherein the step of performing block evaluation in the prediction result to obtain an image data area, and cutting the image data area into target samples to be stored in a preset candidate sample set comprises:
dividing the image information in the prediction result into standard image blocks by geographic unit and/or grid;
obtaining a prediction quality evaluation index of the standard image blocks;
obtaining an image data area by block evaluation in combination with the prediction quality evaluation index;
and cutting the image data area into target samples and storing the target samples in a preset candidate sample set.
3. The intelligent sample selection method according to claim 2, wherein the step of obtaining an image data area by block evaluation in combination with the prediction quality evaluation index comprises:
acquiring a feature prediction result and a corresponding feature contour vector result from the historical image data, and superposing the feature prediction result and the feature contour vector result to generate a superposed image;
performing block evaluation on the superposed image according to the prediction quality evaluation index to obtain an evaluation value of each standard image block, and extracting the image areas whose evaluation values are smaller than a preset threshold as image data acquisition areas.
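A minimal sketch of the block evaluation in claims 2 and 3, under two assumptions the claims leave open: the scene is gridded into square blocks of a fixed size, and IoU between the prediction and the superposed reference stands in for the unspecified prediction quality evaluation index. Blocks scoring below the preset threshold are returned as image data acquisition areas.

```python
import numpy as np

def iou(pred, ref):
    """Intersection over union of two binary masks; an empty pair counts as perfect."""
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    return inter / union if union else 1.0

def low_quality_blocks(pred, ref, block=256, threshold=0.5):
    """Return (row, col) offsets of grid blocks whose score falls below threshold."""
    h, w = pred.shape
    areas = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile_p = pred[y:y + block, x:x + block]
            tile_r = ref[y:y + block, x:x + block]
            if iou(tile_p, tile_r) < threshold:
                areas.append((y, x))          # image data acquisition area
    return areas
```

The returned offsets are the regions where the model generalizes poorly, which claim 2 then clips into candidate samples.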
4. The intelligent sample selection method according to claim 1, wherein the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model is preceded by:
judging whether sample refinement is being performed on the preset candidate sample set for the first time;
if yes, determining the local minimum F1 indexes of all the target samples according to the target model;
if not, training the model with the existing refined sample set to obtain a model with stronger generalization, updating the target model accordingly, refining the remaining candidate samples with the updated target model, and determining the local minimum F1 indexes of the remaining candidate samples through the target model.
5. The intelligent sample selection method according to claim 1, wherein the step of determining the local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model comprises:
determining a local minimum F1 index Local(N)minF1 for all of the target samples in the candidate sample set by using the target model, which is calculated by:
F1 = 2PR/(P + R)
Local(N_i)F1 = 2P_L R_L/(P_L + R_L)
Local(N)minF1 = min(Local(N_1)F1, ……, Local(N_i)F1, ……, Local(N_n)F1)
wherein Local(N_i)F1 is the local F1 score, N is the number of local blocks into which the sample is divided, N_i denotes the i-th local block, P is the sample global precision, R is the sample global recall, P_L is the sample local precision, R_L is the sample local recall, p is the number of pixels occupied by the positive class in the sample prediction result, and l is the number of pixels occupied by the positive class in the label.
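The local minimum F1 index of claim 5 can be sketched as follows, assuming binary masks and an n x n grid of equal local blocks; the function names (f1, local_min_f1) are illustrative, and precision/recall are taken as TP/p and TP/l with TP the correctly predicted positive pixels (an assumption consistent with the claim's p and l). A sample with even one badly labelled block then scores low, which is what keeps locally poor labels out of the refined set.

```python
import numpy as np

def f1(pred, label):
    """F1 score of a binary mask pair; p and l follow the claim's notation."""
    tp = np.logical_and(pred, label).sum()
    p = pred.sum()           # positive pixels in the prediction
    l = label.sum()          # positive pixels in the label
    if p == 0 or l == 0:
        return 0.0
    precision = tp / p
    recall = tp / l
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def local_min_f1(pred, label, n=2):
    """Split the sample into an n x n grid of local blocks; return the minimum local F1."""
    h, w = pred.shape
    scores = []
    for i in range(n):
        for j in range(n):
            ys = slice(i * h // n, (i + 1) * h // n)
            xs = slice(j * w // n, (j + 1) * w // n)
            scores.append(f1(pred[ys, xs], label[ys, xs]))
    return min(scores)
```

A sample whose global F1 looks acceptable can still have local_min_f1 of zero when one block is mislabelled, which is exactly the case the index is designed to catch.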
6. The intelligent sample selection method according to claim 1, wherein the step of determining a final sample set according to a preset rule in combination with the local minimum F1 index and storing the final sample set into a preset refined sample set comprises:
obtaining a preset ordering rule, and determining a sample sequence according to the preset ordering rule in combination with the local minimum F1 indexes of the target samples;
obtaining a preset limit threshold, and screening the sample sequence by the preset limit threshold to determine a final sample set;
and storing the final sample set into the preset refined sample set.
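Claim 6's ordering and thresholding step admits a very small sketch, assuming descending order by local minimum F1 as the preset ordering rule and a scalar score cut-off as the preset limit threshold; the function name and pair layout are illustrative only.

```python
def refine_candidates(candidates, limit=0.8):
    """candidates: list of (sample_id, local_min_f1) pairs.

    Sort best-labelled samples first, then keep those at or above the limit
    threshold as the final sample set.
    """
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [sid for sid, score in ordered if score >= limit]
```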
7. The intelligent sample selection method according to claim 1, wherein the step of performing a refinement iteration on the preset refined sample set to obtain a refined sample set comprises:
training the target model by using the preset refined sample set to update the target model;
updating the preset refined sample set by using the current target model and the current candidate sample set;
and when the current preset refined sample set meets a preset stop condition, merging the current preset refined sample set and the high-quality sample set to obtain the refined sample set.
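The refinement iteration of claims 4 and 7 can be sketched as the loop below. train and score are assumed callables (model training and per-sample local minimum F1 scoring), and "fewer than min_new newly accepted samples" stands in for the unspecified preset stop condition; none of these names come from the patent.

```python
def refinement_loop(train, score, quality_set, candidates, limit=0.8, min_new=1):
    """Iteratively retrain on the refined set and re-score the remaining candidates."""
    refined = []
    while candidates:
        model = train(quality_set + refined)          # update the target model
        scored = [(s, score(model, s)) for s in candidates]
        picked = [s for s, f1 in scored if f1 >= limit]
        if len(picked) < min_new:                     # preset stop condition
            break
        refined.extend(picked)
        candidates = [s for s in candidates if s not in picked]
    return quality_set + refined                      # merged refined sample set
```

Each pass trains on a larger, cleaner set, so the model's generalization grows and samples rejected earlier may clear the threshold in a later iteration, which is the virtuous cycle the abstract describes.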
8. An intelligent sample selection apparatus, characterized in that the intelligent sample selection apparatus comprises:
the sample-to-be-trained acquisition module is used for acquiring a sample to be trained and storing the sample to be trained into a preset high-quality sample set;
the model training module is used for performing model training on the preset high-quality sample set to obtain a target model;
the prediction result acquisition module is used for acquiring historical image data and predicting the historical image data by using the target model to acquire a prediction result;
the block evaluation module is used for carrying out block evaluation in the prediction result to obtain an image data area, cutting the image data area into target samples and storing the target samples into a preset candidate sample set;
a local minimum F1 index acquisition module, configured to determine local minimum F1 indexes of all the target samples in the preset candidate sample set by using the target model;
the preset condition module is used for determining a final sample set according to a preset rule in combination with the local minimum F1 index and storing the final sample set into a preset refined sample set;
a refined sample set module for performing a refinement iteration on the preset refined sample set to obtain a refined sample set.
9. A computer device, comprising: a memory and a processor, wherein the processor, when executing computer instructions stored in the memory, performs the method of any one of claims 1-7.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN202211685928.9A 2022-12-27 2022-12-27 Intelligent sample selection method and related device Active CN115984559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211685928.9A CN115984559B (en) 2022-12-27 2022-12-27 Intelligent sample selection method and related device


Publications (2)

Publication Number Publication Date
CN115984559A true CN115984559A (en) 2023-04-18
CN115984559B CN115984559B (en) 2024-01-12

Family

ID=85973606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211685928.9A Active CN115984559B (en) 2022-12-27 2022-12-27 Intelligent sample selection method and related device

Country Status (1)

Country Link
CN (1) CN115984559B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331716A (en) * 2014-11-20 2015-02-04 武汉图歌信息技术有限责任公司 SVM active learning classification algorithm for large-scale training data
CN110021019A (en) * 2019-04-15 2019-07-16 中国医学科学院皮肤病医院 A kind of thickness distributional analysis method of the AI auxiliary hair of AGA clinical image
CN110298348A (en) * 2019-06-12 2019-10-01 苏州中科天启遥感科技有限公司 Remote sensing image building sample areas extracting method and system, storage medium, equipment
CN112884791A (en) * 2021-02-02 2021-06-01 重庆市地理信息和遥感应用中心 Method for constructing large-scale remote sensing image semantic segmentation model training sample set
WO2022082848A1 (en) * 2020-10-23 2022-04-28 深圳大学 Hyperspectral image classification method and related device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
REN Guangbo; ZHANG Jie; MA Yi; SONG Pingjian: "A spatio-temporal expansion method of training samples for remote sensing image classification based on semi-supervised learning", Remote Sensing for Land and Resources, no. 02 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011698A (en) * 2023-06-25 2023-11-07 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Multi-dimensional and multi-model earth surface full coverage interpretation sample set evaluation method
CN117011698B (en) * 2023-06-25 2024-05-03 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Multi-dimensional and multi-model earth surface full coverage interpretation sample set evaluation method

Also Published As

Publication number Publication date
CN115984559B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111695482A (en) Pipeline defect identification method
CN113223042B (en) Intelligent acquisition method and equipment for remote sensing image deep learning sample
CN111915746B (en) Weak-labeling-based three-dimensional point cloud target detection method and labeling tool
CN114998744B (en) Agricultural machinery track field dividing method and device based on motion and vision dual-feature fusion
CN112215217B (en) Digital image recognition method and device for simulating doctor to read film
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN117152484B (en) Small target cloth flaw detection method based on improved YOLOv5s
CN115984559A (en) Intelligent sample selection method and related device
CN110136174A (en) A kind of target object tracking and device
CN117495735A (en) Automatic building elevation texture repairing method and system based on structure guidance
CN114882306A (en) Topographic map scale identification method and device, storage medium and electronic equipment
CN109961129A (en) A kind of Ocean stationary targets search scheme generation method based on improvement population
CN113505261B (en) Data labeling method and device and data labeling model training method and device
CN111027551B (en) Image processing method, apparatus and medium
CN113435266A (en) FCOS intelligent target detection method based on extreme point feature enhancement
CN111210452B (en) Certificate photo portrait segmentation method based on graph segmentation and mean shift
CN110889418A (en) Gas contour identification method
WO2022150013A1 (en) Method, data processing apparatus and computer program product for generating map data
CN114782983A (en) Road scene pedestrian detection method based on improved feature pyramid and boundary loss
CN112861689A (en) Searching method and device of coordinate recognition model based on NAS technology
CN113361530A (en) Image semantic accurate segmentation and optimization method using interaction means
CN105654457A (en) Device and method for processing image
CN112199984A (en) Target rapid detection method of large-scale remote sensing image
EP4242868A1 (en) Generation of training data for automatic topographic feature extraction
Lu Evaluation and Comparison of Deep Learning Methods for Pavement Crack Identification with Visual Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant