CN111310846A - Method, device, storage medium and server for selecting sample image - Google Patents

Method, device, storage medium and server for selecting sample image

Info

Publication number
CN111310846A
Authority
CN
China
Prior art keywords
image
sample
sample image
unlabeled
uncertainty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010127598.6A
Other languages
Chinese (zh)
Other versions
CN111310846B (en)
Inventor
王俊
高鹏
谢国彤
柳杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010127598.6A priority Critical patent/CN111310846B/en
Publication of CN111310846A publication Critical patent/CN111310846A/en
Priority to PCT/CN2020/119302 priority patent/WO2021169301A1/en
Application granted granted Critical
Publication of CN111310846B publication Critical patent/CN111310846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of computers, and provides a method, a device, a storage medium and a server for selecting a sample image. With the method, when sample images are selected for manual annotation, an uncertainty index and a representative index are calculated for each unlabeled sample image, and the annotation value of each sample image is determined by combining the two indexes. Because highly representative samples are unlikely to be outliers and better reflect the characteristics of the samples in the sample set, they, like samples with high uncertainty, are samples with high labeling value. Since both the uncertainty and the representativeness of a sample are considered when measuring its annotation value, the portion of sample images with high annotation value can be selected for manual annotation, so that the performance of the image classification model is better optimized.

Description

Method, device, storage medium and server for selecting sample image
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a storage medium, and a server for selecting a sample image.
Background
Currently, when images are classified, an image annotation method based on active learning is generally adopted. The method mainly includes: acquiring a portion of labeled images and a portion of unlabeled images; using the labeled images as a training set to train an initial classification model; classifying the unlabeled images with the initial classification model to obtain a prediction result for each image; calculating the credibility of each prediction result, selecting the images with the greatest uncertainty, and submitting them to an expert for manual annotation; adding the manually annotated images to the training set and retraining the classification model to optimize it; and repeating these steps iteratively until the accuracy of the classification model meets the requirement or the number of iterations reaches a specified number. With this active-learning-based image annotation method, a portion of samples can be selected from a large number of unlabeled images and submitted for manual annotation, thereby reducing the workload of manual annotation. However, when images are selected for manual annotation from the prediction results, only the uncertainty of the images is considered, which does not reflect the annotation value of the images well, and it is difficult to ensure that the images with high annotation value are selected for manual annotation.
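For illustration only (not part of the original disclosure), the following Python sketch shows one round of the uncertainty-only selection described above, assuming a scikit-learn-style classifier with fit and predict_proba methods; the model, feature arrays and the value of k are placeholders.

```python
import numpy as np

def pick_most_uncertain(model, X_labeled, y_labeled, X_unlabeled, k=1):
    """One round of the prior-art loop: train, predict, return the k most uncertain samples."""
    model.fit(X_labeled, y_labeled)                    # train the initial classification model
    probs = model.predict_proba(X_unlabeled)           # prediction result for each unlabeled image
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:k]                    # indices to submit for manual annotation
```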
Disclosure of Invention
In view of this, the present application provides a method for selecting a sample image, which can select a portion of the sample image with a high labeling value for manual labeling, so as to better optimize the performance of an image classification model.
In a first aspect, an embodiment of the present application provides a method for selecting a sample image, including:
acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
training to obtain an image classification model by taking the marked image set as a training set;
classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
for each unlabeled sample image, respectively calculating an uncertainty index and a representative index according to its classification result, and determining its labeling value by combining the uncertainty index and the representative index, wherein the uncertainty index is used for measuring the uncertainty of the image classification result of the sample, and the representative index is used for measuring the probability that the sample can serve as a representative sample of the unlabeled image set;
and selecting and outputting the sample image with the highest labeling value from the unlabeled sample images.
In the process, when sample images are selected for manual annotation, an uncertainty index and a representative index are calculated for each unlabeled sample image, and the annotation value of each sample image is determined by combining the two indexes. Because highly representative samples are unlikely to be outliers and better reflect the characteristics of the samples in the sample set, they, like samples with high uncertainty, are samples with high labeling value. Since both the uncertainty and the representativeness of a sample are considered when measuring its annotation value, the portion of sample images with high annotation value can be selected for manual annotation, so that the performance of the image classification model is better optimized.
Further, the uncertainty indicator of any one unlabeled target sample image in the set of unlabeled images can be calculated by the following formula:
f(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
where f(x, L, u) represents the uncertainty indicator of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories.
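As a hedged illustration (not from the patent text), the uncertainty indicator above can be computed from the predicted label probabilities roughly as follows; the probability vector is an assumed input.

```python
import numpy as np

def uncertainty_index(p_y_given_x, eps=1e-12):
    """f(x, L, u) = -sum over y in Y of p_theta(y|x) * log(p_theta(y|x))."""
    p = np.asarray(p_y_given_x, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

# Predictions near 0.5 contribute the most uncertainty:
print(uncertainty_index([0.5, 0.5, 0.5]))    # ~1.04
print(uncertainty_index([0.99, 0.01, 0.9]))  # ~0.15
```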
Further, the representative index of the target sample image may be calculated by the following formula:
Rep(x) = (1/n) * ∑_{i=1}^{n} sim(x, x_i)
where Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, and sim(x, x_i) represents the similarity between the target sample image x and a sample image x_i of the unlabeled image set. Suppose the target sample image x is expressed in the attribute space as x = {x_1, x_2, ..., x_j, ..., x_m} and the sample image x_i is expressed as x_i = {x_{i1}, x_{i2}, ..., x_{ij}, ..., x_{im}}; then sim(x, x_i) is specifically expressed as:
sim(x, x_i) = (∑_{j=1}^{m} x_j * x_{ij}) / (√(∑_{j=1}^{m} x_j²) * √(∑_{j=1}^{m} x_{ij}²))
the annotation value Value(x) of the target sample image can be calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x).
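A minimal sketch of the similarity-based representative index and the resulting annotation value, assuming each image is already represented by an attribute (feature) vector; the function and variable names are illustrative and not taken from the patent.

```python
import numpy as np

def cosine_sim(x, xi, eps=1e-12):
    # sim(x, x_i) = <x, x_i> / (||x|| * ||x_i||)
    return float(np.dot(x, xi) / (np.linalg.norm(x) * np.linalg.norm(xi) + eps))

def representative_index(x, unlabeled_features):
    # Rep(x): average cosine similarity between x and the n samples of the unlabeled set
    return float(np.mean([cosine_sim(x, xi) for xi in unlabeled_features]))

def annotation_value(uncertainty, x, unlabeled_features):
    # Value(x) = f(x, L, u) * Rep(x)
    return uncertainty * representative_index(x, unlabeled_features)
```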
further, the uncertainty index of any one unlabeled target sample image in the set of unlabeled images can be determined by:
calculating an information entropy index of the target sample image;
counting the number of labels obtained by classifying the target sample image according to the classification result of the target sample image;
and calculating to obtain an uncertainty index of the target sample image by combining the information entropy index and the number of the labels.
The number of labels obtained by classifying a sample image can be used to measure the information diversity of the sample; by combining the information entropy index with this label diversity, a more accurate index that comprehensively considers both the uncertainty of the sample prediction and the label diversity can be obtained.
Specifically, the information entropy index of the target sample image can be calculated by the following formula:
Ent(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
where Ent(x, L, u) represents the information entropy index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories;
the uncertainty indicator for the target sample image may be calculated by the following formula:
f(x, L, u) = Ent(x, L, u) * Mul(x)^a
where f(x, L, u) represents the uncertainty index of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the relative weight.
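A hedged sketch of this combined indicator; how the predicted label count Mul(x) is obtained from the classification result is not fixed here, so a 0.5 probability threshold is assumed purely for illustration.

```python
import numpy as np

def multilabel_uncertainty(p_y_given_x, a=1.0, threshold=0.5, eps=1e-12):
    p = np.asarray(p_y_given_x, dtype=float)
    ent = float(-np.sum(p * np.log(p + eps)))      # Ent(x, L, u): information entropy index
    mul = max(int(np.sum(p >= threshold)), 1)      # Mul(x): number of predicted labels (assumed rule)
    return ent * mul ** a                          # f(x, L, u) = Ent(x, L, u) * Mul(x)^a
```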
Further, the representative index of the target sample image may be calculated by the following kernel density estimation formula:
Rep(x) = (1/(n*h)) * ∑_{i=1}^{n} K((x - x_i)/h)
The annotation value Value(x) of the target sample image can be calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x)^β
where Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, h is the bandwidth of the kernel density estimation, the sample images of the unlabeled image set are represented by {x_1, x_2, ..., x_i, ..., x_n}, K(·) is a preset weight (kernel) function, and β is a parameter for adjusting the relative weight.
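A one-dimensional sketch of the kernel-density-based representative index and the weighted annotation value, assuming a Gaussian weight function K; in practice the samples would be points in the reduced-dimension feature space described later, and the names here are illustrative.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_representative_index(x, samples, h):
    # Rep(x) = (1 / (n*h)) * sum_i K((x - x_i) / h)
    samples = np.asarray(samples, dtype=float)
    return float(np.sum(gaussian_kernel((x - samples) / h)) / (samples.size * h))

def annotation_value(uncertainty, rep, beta=1.0):
    # Value(x) = f(x, L, u) * Rep(x)**beta
    return uncertainty * rep ** beta
```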
Further, after selecting and outputting a sample image with the highest labeling value from the unlabeled sample images, the method may further include:
transferring the sample image with the highest labeling value after manual labeling from the unlabeled image set to the labeled image set, and updating the labeled image set;
taking the updated labeled image set as a training set, and performing optimization updating on the image classification model;
and if the optimization updating times of the image classification model reach the set iteration times or the accuracy of the image classification model reaches the set threshold value, determining the current image classification model as the final image classification model.
After a sample image with the highest labeling value is selected, manual labeling is carried out on the part of sample image, then the sample image after manual labeling is transferred from the unlabeled image set to the labeled image set, the updated labeled image set is used as a training set, and the image classification model is optimized and updated, so that the performance of the image classification model is improved.
In a second aspect, an embodiment of the present application provides an apparatus for selecting a sample image, including:
the image set acquisition module is used for acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
the classification model training module is used for training to obtain an image classification model by taking the marked image set as a training set;
the sample image classification module is used for classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
a sample annotation value determining module, configured to calculate, for each unlabeled sample image, a respective uncertainty index and a respective representative index according to the respective classification result, and determine a respective annotation value by combining them, where the uncertainty index is used to measure the uncertainty of the image classification result of a sample, and the representative index is used to measure the probability that a sample can serve as a representative sample of the unlabeled image set;
and the sample image selecting module is used for selecting and outputting the sample image with the highest labeling value from the unlabeled sample images.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for selecting a sample image as set forth in the first aspect of the embodiment of the present application is implemented.
In a fourth aspect, an embodiment of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for selecting a sample image as set forth in the first aspect of the embodiment of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the method for selecting a sample image according to the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of a first embodiment of a method for selecting a sample image according to an embodiment of the present application;
FIG. 2 is a flowchart of a second embodiment of a method for selecting a sample image according to an embodiment of the present application;
FIG. 3 is a block diagram of an embodiment of an apparatus for selecting a sample image according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a server according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The application provides a method for selecting sample images, which can select the sample images with high labeling value to be manually labeled, so that the performance of an image classification model is better optimized.
It should be understood that the execution subject of the method for selecting a sample image proposed in the embodiments of the present application is a server.
Referring to fig. 1, a first embodiment of a method for selecting a sample image according to an embodiment of the present application includes:
101. acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
firstly, an unlabeled image set and a labeled image set are obtained, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images. These sample images may be multi-label images, i.e. one image contains a plurality of different class labels. For example, an ophthalmic OCT image may simultaneously carry 1 to 6 lesion-type labels (vitreous macular traction, epiretinal membrane, macular hole, intra-retinal effusion, pigment epithelium detachment, drusen or retinal atrophy, etc.).
102. Training to obtain an image classification model by taking the marked image set as a training set;
and then, an image classification model is obtained by training with the labeled image set as the training set. The image classification model may adopt various types of depth models, such as DenseNet, ResNet, ResNeXt, MobileNet, NASNet, and the like. Among them, DenseNet is preferred: by concatenating features along the channel dimension it reuses features, and it can achieve better performance than ResNet with fewer parameters and lower computational cost.
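For illustration, a hedged PyTorch/torchvision sketch of training such a multi-label DenseNet classifier on the labeled image set; the data loader, number of labels, learning rate and epoch count are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

def train_image_classifier(train_loader, num_labels, epochs=5, device="cuda"):
    model = models.densenet121(pretrained=True)                        # dense connections reuse features
    model.classifier = nn.Linear(model.classifier.in_features, num_labels)
    model = model.to(device)
    criterion = nn.BCEWithLogitsLoss()                                  # one sigmoid per label (multi-label)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:                             # labels: multi-hot vectors
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.float().to(device))
            loss.backward()
            optimizer.step()
    return model
```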
103. Classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
after the image classification model is obtained by training, it is used to classify each unlabeled sample image in the unlabeled image set, obtaining a classification result for each unlabeled sample image. The classification result of a sample image is specifically the probability that the sample image belongs to each of the different preset class labels, such as class label A: 90%, class label B: 20%, and so on.
104. For each unlabeled sample image, respectively calculating to obtain respective uncertainty indexes and representative indexes according to respective classification results, and determining respective labeling values by combining the respective uncertainty indexes and the representative indexes;
the embodiment of the application selects a strategy of uncertainties) + retrieval (Representativeness) to measure the annotation value of the sample image. For any unmarked sample image, the uncertainty index and the representative index of the sample image are calculated according to the classification result of the sample image, and then the marking value of the sample image is determined by combining the uncertainty index and the representative index. The uncertainty index is used for measuring the uncertainty of the image classification result of the sample, and the representative index is used for measuring the probability size of the sample which can be used as the representative sample of the unlabeled image set.
Optionally, the uncertainty indicator of any unlabeled target sample image in the set of unlabeled images may be calculated by the following formula (1-1):
f(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))    (1-1)
where f(x, L, u) represents the uncertainty indicator of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories. In the classification result of a sample, the closer the predicted probability for a certain label is to 0.5, the higher the uncertainty of the current model about that label of the sample, that is, the more the sample is worth labeling.
Considering the representativeness of a sample, it can be measured by how many samples are similar to it: a highly representative sample is unlikely to be an outlier, and the higher the similarity between samples, the more consistent their characteristics; if the similarity is greater than a set threshold, the sample information is considered redundant. Specifically, the similarity between sample points may be calculated using a similarity coefficient in the following manner. Assume that the target sample image x is represented in the attribute space as x = {x_1, x_2, ..., x_j, ..., x_m}, and a sample image x_i of the unlabeled image set is expressed as x_i = {x_{i1}, x_{i2}, ..., x_{ij}, ..., x_{im}}.
The cosine formula is used to calculate the similarity between the two, i.e. the following formula (1-2):
sim(x, x_i) = (∑_{j=1}^{m} x_j * x_{ij}) / (√(∑_{j=1}^{m} x_j²) * √(∑_{j=1}^{m} x_{ij}²))    (1-2)
Then the representative index of the target sample image x can be obtained by averaging the similarities, as shown in the following formula (1-3):
Rep(x) = (1/n) * ∑_{i=1}^{n} sim(x, x_i)    (1-3)
where Rep(x) represents the representative index of the target sample image x, and n represents the number of sample images in the unlabeled image set.
After the uncertainty index and the representative index of the target sample image are obtained, the annotation value of the target sample image can be determined from these two indexes; for example, the annotation value Value(x) can be calculated by the following formula (1-4):
Value(x) = f(x, L, u) * Rep(x)    (1-4)
The first term f(x, L, u) in formula (1-4) represents the information amount (uncertainty index) of the sample x under the current query strategy, and the second term is the representative index, obtained by calculating the similarity of the sample relative to the other samples and expressed as the average similarity between sample x and all samples in the sample space. The higher the density of the region where a sample lies, the higher its information content and the greater its chance of being selected for labeling. By introducing the representativeness criterion, samples with high mutual similarity are discarded to reduce the addition of redundant samples; by integrating the uncertainty index and the representative index, representative high-information samples are retained, and the influence of isolated points on the quality of sample selection is effectively reduced.
Obviously, the sample image x* with the highest labeling value is:
x* = argmax_x [f(x, L, u) * Rep(x)]
Here argmax denotes the argument of the maximum. For a function y = f(x), argmax(f(x)) is the value x_0 at which f(x) attains its maximum over its domain; if several points give the same maximum, argmax(f(x)) is the set of those points. In other words, argmax(f(x)) is the point x (or set of points) corresponding to the maximum value of f(x).
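A small illustrative helper (not from the patent) that applies this argmax selection to a list of pre-computed annotation values, returning all indices on ties:

```python
import numpy as np

def select_highest_value(values):
    # x* = argmax_x Value(x): index of the unlabeled sample with the highest annotation value
    values = np.asarray(values, dtype=float)
    best = np.flatnonzero(values == values.max())
    return best.tolist() if best.size > 1 else int(best[0])
```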
One way of calculating the uncertainty index and the representative index, and of determining the annotation value of the sample image by combining these two indexes, has been shown above; a different way of calculation is presented below.
Optionally, the uncertainty indicator of any unlabeled target sample image in the set of unlabeled images may be determined by:
(1) calculating an information entropy index of the target sample image;
(2) counting the number of labels obtained by classifying the target sample image according to the classification result of the target sample image;
(3) and calculating to obtain an uncertainty index of the target sample image by combining the information entropy index and the number of the labels.
For step (1), the uncertainty of the current sample is measured by information entropy. The information entropy index of the sample image x is defined as Ent(x, L, u), where L represents the samples of the labeled set and u represents the samples of the unlabeled set; Ent(x, L, u) can be calculated by the following formula (1-5):
Ent(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))    (1-5)
where p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories.
Regarding steps (2) and (3), considering the difference between multi-label and single-label classification, the sample selection strategy can be guided and improved by mining the label diversity in multi-label classification. The uncertainty of the current sample is measured by information entropy, and the diversity of labels is further fused through the number of labels predicted for the sample by the model: when the previous round of the depth model predicts more labels for a sample, the sample is considered to contain more information for improving model performance. By jointly considering the two dimensions of sample and label, the labeling value of an unlabeled sample can be measured more effectively. Specifically, the uncertainty index of the target sample image can be calculated by the following formula (1-6):
f(x, L, u) = Ent(x, L, u) * Mul(x)^a    (1-6)
where f(x, L, u) represents the uncertainty index of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the relative weight.
For the representative index of a sample image, representativeness can be measured by the similarity between the sample and the other samples; a highly representative sample is unlikely to be an outlier. When calculating the similarity between samples, the last fully-connected layer of a deep pre-trained network is first used to extract the feature vector of each image sample, and the LargeVis method can then be used to reduce the dimension of the extracted high-dimensional feature vectors. In the two-dimensional space of the dimension-reduced sample points, the distribution density at the position of each sample point is calculated by the kernel density method: the higher the kernel density of the region where a sample point lies, the more representative the sample. That is, the representativeness of a sample point can be characterized by its kernel density, and the process of calculating the representative index can be converted into the process of calculating the kernel density.
Further, the representative index of the target sample image may be calculated by the following kernel density estimation formula (1-7):
Rep(x) = (1/(n*h)) * ∑_{i=1}^{n} K((x - x_i)/h)    (1-7)
where Rep(x) represents the representative index of the target sample image x, n represents the number of sample points, i.e. the number of sample images in the unlabeled image set, h is the bandwidth of the kernel density estimation, the sample images of the unlabeled image set are represented by {x_1, x_2, ..., x_i, ..., x_n}, and K(·) is a preset weight function.
Formula (1-7) is a weighted-average calculation, and the kernel K(·) is a weight function whose shape and support control how many data points are used, and to what degree, when estimating Rep(x) at the point x; intuitively, the effect of kernel density estimation depends on the choice of the kernel and the bandwidth h. A typical weight function is symmetric about the origin and integrates to 1, such as the commonly used Uniform, Epanechnikov, Quartic and Gaussian functions.
For the Uniform kernel, only the points for which |(x - x_i)/h| is less than 1 (that is, points whose distance from x is less than the bandwidth h) are used to estimate the value of Rep(x), and all contributing data points receive the same weight.
For the Gaussian kernel, as can be seen from the expression of Rep(x), the closer x_i is to x, the closer (x - x_i)/h is to zero and the larger its contribution to the density value. Since the support of the normal density is the entire real axis, all data points are used to estimate the value of Rep(x); points closer to x simply have a greater effect on the estimate. When h is small, only points particularly close to x have a large effect, and as h increases, the influence of more distant points increases.
If kernel functions of the Epanechnikov or Quartic form are used, there is not only truncation (points at a distance from x greater than the bandwidth h contribute nothing), but the weight of the contributing points also decreases as the distance from x increases. In general, the choice of kernel function affects the kernel estimate far less than the choice of the bandwidth h.
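For reference, minimal implementations of the weight functions mentioned above (Uniform, Epanechnikov, Quartic and Gaussian); each is symmetric about the origin and integrates to 1. This is illustrative code, not part of the patent.

```python
import numpy as np

def uniform_kernel(u):
    return 0.5 * (np.abs(u) <= 1)                        # truncated; equal weight for all points used

def epanechnikov_kernel(u):
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1)      # truncated; weight decays with |u|

def quartic_kernel(u):
    return (15.0 / 16.0) * (1.0 - u ** 2) ** 2 * (np.abs(u) <= 1)

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # no truncation; every point contributes
```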
The choice of the bandwidth h has a great influence on the estimator Rep(x). If h is too small, the density estimate concentrates the probability density too closely around the observed data, producing many false peaks in the estimated density function; if h is too large, the density estimate spreads the probability density too widely, and some important features of the density function are lost.
Specifically, the bandwidth h can be selected in the following 2 ways.
The selection mode (1) of the bandwidth is as follows:
to judge the quality of a bandwidth, it is necessary to know how to evaluate the density estimator Rep(x). The integrated mean square error is typically used as the criterion for whether the density estimator is good or bad. Writing p(x) for the density being estimated, its expression is:
MISE(h) = E ∫ (Rep(x) - p(x))² dx ≈ AMISE(h)
wherein:
AMISE(h) = R(K)/(n*h) + (1/4) * σ_K⁴ * h⁴ * R(p''),  with R(g) = ∫ g²(z) dz and σ_K² = ∫ z² * K(z) dz
AMISE(h) is called the asymptotic mean integrated square error. To minimize this error, h must take some intermediate value, so that Rep(x) avoids having either too large a bias or too large a variance. To find the h that minimizes AMISE(h), the orders of the bias term and the variance term in AMISE(h) must be balanced exactly, so the optimal bandwidth is:
h_opt = [R(K) / (σ_K⁴ * R(p'') * n)]^(1/5)
the selection mode (2) of the bandwidth is as follows:
the bandwidth is chosen using Silverman's rule of thumb. For simplicity, define R(g) = ∫ g²(z) dz. The optimal bandwidth obtained by minimizing AMISE(h) contains the unknown quantity R(p''), where p is the density estimated by Rep(x). A first-class method, the rule of thumb (i.e. an empirical method), can be used: p is replaced by a normal density whose variance matches the estimated variance, which is equivalent to using
R(p'') ≈ R(φ'') / σ̂⁵ = 3 / (8 * √π * σ̂⁵)
to estimate R(p''), where φ is the standard normal density function. Taking K(·) as a Gaussian density kernel and the sample variance
σ̂² = (1/(n-1)) * ∑_{i=1}^{n} (x_i - x̄)²
as the variance estimate, the optimal bandwidth obtained with Silverman's rule of thumb is:
ĥ = (4/(3*n))^(1/5) * σ̂ ≈ 1.06 * σ̂ * n^(-1/5)
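A sketch of this rule-of-thumb bandwidth for a Gaussian kernel in one dimension, using the form given above; the sample array is an assumed input.

```python
import numpy as np

def silverman_bandwidth(samples):
    # h_hat = (4 / (3n))**(1/5) * sigma_hat  (approximately 1.06 * sigma_hat * n**(-1/5))
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sigma_hat = samples.std(ddof=1)
    return (4.0 / (3.0 * n)) ** 0.2 * sigma_hat
```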
after the uncertainty index and the representative index of the target sample image are calculated, the annotation value Value(x) of the target sample image can be calculated by the following formula (1-12):
Value(x) = f(x, L, u) * Rep(x)^β    (1-12)
In formula (1-12), the first term f(x, L, u) is the uncertainty index of the target sample image x, based on the information content (including uncertainty and label diversity information) of the sample x under the current query strategy; the second term Rep(x) represents the representative index of the target sample image x, obtained by calculating the kernel density of each instance in the dimension-reduced feature space and expressed as the average similarity between x and the other samples in the sample space. β is a parameter that adjusts the relative weight of the two terms. The higher the density of the region where a sample lies, the greater its chance of being selected for labeling.
Then, the sample image x* with the highest labeling value is:
x* = argmax_x [f(x, L, u) * Rep(x)^β]
105. and selecting and outputting the sample image with the highest labeling value from the unlabeled sample images.
After the labeling value of each sample image is determined by combining the uncertainty index and the representative index, the sample image with the highest labeling value is selected, output, and submitted for manual labeling. Testing shows that, compared with the common approach of selecting sample images by Random Sampling (RS), the method of this embodiment, which fuses deep learning with Active Learning (AL), can in each round select from the current unlabeled sample set the sample image that contributes most to classification, based on the strong feature expression capability of the depth model combined with the active selection strategy. Among a large number of unlabeled original images, only a portion of high-value samples is selected for expert labeling instead of labeling all samples; lower-quality samples are filtered out, and the samples most valuable for improving the deep learning model are selected in each round and added to training, effectively reducing the workload of manual labeling while ensuring task accuracy.
According to the method for selecting sample images provided above, when sample images are selected for manual annotation, an uncertainty index and a representative index are calculated for each unlabeled sample image, and the annotation value of each sample image is determined by combining the two indexes. Because highly representative samples are unlikely to be outliers and better reflect the characteristics of the samples in the sample set, they, like samples with high uncertainty, are samples with high labeling value. Since both the uncertainty and the representativeness of a sample are considered when measuring its annotation value, the portion of sample images with high annotation value can be selected for manual annotation, so that the performance of the image classification model is better optimized.
Referring to fig. 2, a second embodiment of a method for selecting a sample image according to an embodiment of the present application includes:
201. acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
202. training to obtain an image classification model by taking the marked image set as a training set;
203. classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
204. for each unlabeled sample image, respectively calculating to obtain respective uncertainty indexes and representative indexes according to respective classification results, and determining respective labeling values by combining the respective uncertainty indexes and the representative indexes;
205. selecting and outputting a sample image with the highest labeling value from the unlabeled sample images;
the steps 201-205 are the same as the steps 101-105, and the related description of the steps 101-105 can be referred to.
206. Transferring the sample image with the highest labeling value after manual labeling from the unlabeled image set to the labeled image set, and updating the labeled image set;
after the sample image with the highest labeling value is output, the sample image is handed to an expert for manual labeling. And then, transferring the manually marked sample image from the unmarked image set to the marked image set, and updating the marked image set.
207. Taking the updated labeled image set as a training set, and performing optimization updating on the image classification model;
and then, retraining the image classification model by taking the updated labeled image set as a training set so as to optimize and update model parameters and improve the performance of the model.
208. Judging whether the optimization updating times of the image classification model reach set iteration times or not, or whether the accuracy of the image classification model reaches a set threshold value or not;
after the image classification model is optimized and updated, whether the current optimization and update times reach the set iteration times or not or whether the accuracy of the image classification model reaches the set threshold value or not is judged. If yes, go to step 209; if not, returning to the step 203, and re-executing the iterative optimization operation of the classification model until the condition is met.
209. And determining the current image classification model as a final image classification model.
When the number of optimization updates of the image classification model reaches the set number of iterations, or the accuracy of the image classification model reaches the set threshold, the optimization and updating of the image classification model is complete. The current model can therefore be determined as the final image classification model, and the images to be classified are classified using this final image classification model.
According to the method and the device, after the sample image with the highest labeling value is selected, the sample image is labeled manually, the sample image after being labeled manually is transferred from the unlabeled image set to the labeled image set, the updated labeled image set is used as a training set, the image classification model is optimized and updated, and therefore the performance of the image classification model is improved.
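A hedged end-to-end sketch of the iterative procedure of steps 201-209, assuming a scikit-learn-style model (fit/predict_proba/score) and caller-supplied functions for computing the annotation value and obtaining the expert label; none of these names come from the patent.

```python
import numpy as np

def active_learning_loop(model, X_labeled, y_labeled, X_unlabeled,
                         annotation_value_fn, expert_label_fn,
                         X_val, y_val, max_iters=50, acc_threshold=0.95):
    X_labeled, y_labeled, X_unlabeled = list(X_labeled), list(y_labeled), list(X_unlabeled)
    for _ in range(max_iters):                                        # step 208: iteration limit
        model.fit(np.array(X_labeled), np.array(y_labeled))           # steps 202/207: (re)train the model
        if model.score(np.array(X_val), np.array(y_val)) >= acc_threshold:
            break                                                     # step 208: accuracy threshold reached
        probs = model.predict_proba(np.array(X_unlabeled))            # step 203: classify the unlabeled set
        values = [annotation_value_fn(x, p, X_unlabeled)              # step 204: uncertainty * representativeness
                  for x, p in zip(X_unlabeled, probs)]
        best = int(np.argmax(values))                                 # step 205: highest annotation value
        x_new = X_unlabeled.pop(best)                                 # step 206: move to the labeled set
        X_labeled.append(x_new)
        y_labeled.append(expert_label_fn(x_new))                      # manual annotation by an expert
    return model                                                      # step 209: final classification model
```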
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 3 is a block diagram of a device for selecting a sample image according to an embodiment of the present application, which corresponds to the method for selecting a sample image described in the foregoing embodiments; for convenience of description, only the portions related to the embodiment of the present application are shown.
Referring to fig. 3, the apparatus includes:
an image set acquisition module 301, configured to acquire an unlabeled image set and a labeled image set, where the unlabeled image set includes multiple unlabeled sample images, and the labeled image set includes multiple labeled sample images;
a classification model training module 302, configured to train to obtain an image classification model by using the labeled image set as a training set;
a sample image classification module 303, configured to classify each unlabeled sample image in the unlabeled image set by using the image classification model, respectively, to obtain a classification result of each unlabeled sample image;
a sample annotation value determining module 304, configured to calculate, for each unlabeled sample image, a respective uncertainty index and a respective representative index according to the respective classification result, and determine a respective annotation value by combining them, where the uncertainty index is used to measure the uncertainty of the image classification result of a sample, and the representative index is used to measure the probability that a sample can serve as a representative sample of the unlabeled image set;
and a sample image selecting module 305, configured to select and output a sample image with the highest labeling value from the unlabeled sample images.
Further, the sample annotation value determination module can include:
a first uncertainty index calculation unit, configured to calculate an uncertainty index of any one unlabeled target sample image in the set of unlabeled images by using the following formula:
f(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
where f(x, L, u) represents the uncertainty indicator of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories.
Further, the sample annotation value determination module can include:
a first representative index calculation unit for calculating a representative index of the target sample image by the following formula:
Rep(x) = (1/n) * ∑_{i=1}^{n} sim(x, x_i)
where Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, and sim(x, x_i) represents the similarity between the target sample image x and a sample image x_i of the unlabeled image set; suppose the target sample image x is expressed in the attribute space as x = {x_1, x_2, ..., x_j, ..., x_m} and the sample image x_i is expressed as x_i = {x_{i1}, x_{i2}, ..., x_{ij}, ..., x_{im}}, then sim(x, x_i) is specifically expressed as:
sim(x, x_i) = (∑_{j=1}^{m} x_j * x_{ij}) / (√(∑_{j=1}^{m} x_j²) * √(∑_{j=1}^{m} x_{ij}²))
The annotation value Value(x) of the target sample image is calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x).
further, the sample annotation value determination module can include:
the information entropy calculation unit is used for calculating an information entropy index of the target sample image;
the label quantity counting unit is used for counting the quantity of labels obtained by classifying the target sample images according to the classification result of the target sample images;
and the uncertainty index determining unit is used for calculating the uncertainty index of the target sample image by combining the information entropy index and the number of the labels.
Further, the information entropy calculating unit is specifically configured to calculate an information entropy index of the target sample image according to the following formula:
Ent(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
where Ent(x, L, u) represents the information entropy index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories;
further, the sample annotation value determination module can include:
a second uncertainty index calculation unit for calculating an uncertainty index of the target sample image by the following formula:
f(x, L, u) = Ent(x, L, u) * Mul(x)^a
where f(x, L, u) represents the uncertainty index of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the relative weight.
Further, the sample annotation value determination module can include:
a second representative index calculation unit for calculating a representative index of the target sample image by the following kernel density estimation formula:
Rep(x) = (1/(n*h)) * ∑_{i=1}^{n} K((x - x_i)/h)
The annotation value Value(x) of the target sample image is calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x)^β;
where Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, h is the bandwidth of the kernel density estimation, the sample images of the unlabeled image set are represented by {x_1, x_2, ..., x_i, ..., x_n}, K(·) is a preset weight function, and β is a parameter for adjusting the relative weight.
Further, the apparatus for selecting a sample image may further include:
the image set updating module is used for transferring the sample image with the highest labeling value after manual labeling from the unlabeled image set to the labeled image set and updating the labeled image set;
the image classification model optimization module is used for optimizing and updating the image classification model by taking the updated labeled image set as a training set;
and the image classification model determining module is used for determining the current image classification model as the final image classification model if the optimization updating times of the image classification model reach the set iteration times or the accuracy of the image classification model reaches the set threshold value.
Embodiments of the present application further provide a computer-readable storage medium, which stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the steps of any one of the methods for selecting a sample image as shown in fig. 1 or fig. 2.
Embodiments of the present application further provide a server, which includes a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, and the processor executes the computer readable instructions to implement the steps of any one of the methods for selecting a sample image as shown in fig. 1 or fig. 2.
Embodiments of the present application further provide a computer program product, which when run on a server, causes the server to execute the steps of implementing any one of the methods for selecting a sample image as shown in fig. 1 or fig. 2.
Fig. 4 is a schematic diagram of a server according to an embodiment of the present application. As shown in fig. 4, the server 4 of this embodiment includes: a processor 40, a memory 41, and computer readable instructions 42 stored in the memory 41 and executable on the processor 40. The processor 40, when executing the computer readable instructions 42, performs the steps in the various sample image selection method embodiments described above, such as steps 101-105 shown in fig. 1. Alternatively, the processor 40, when executing the computer readable instructions 42, implements the functions of the modules/units in the above device embodiments, such as the functions of the modules 301 to 305 shown in fig. 3.
Illustratively, the computer readable instructions 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer-readable instructions 42 in the server 4.
The server 4 may be a computing device such as a smart phone, a notebook, a palm computer, and a cloud server. The server 4 may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a server 4 and does not constitute a limitation of server 4 and may include more or fewer components than shown, or some components in combination, or different components, e.g., server 4 may also include input output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the server 4, such as a hard disk or a memory of the server 4. The memory 41 may also be an external storage device of the server 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the server 4. Further, the memory 41 may also include both an internal storage unit of the server 4 and an external storage device. The memory 41 is used to store the computer readable instructions and other programs and data required by the server. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, Read-Only Memory (ROM), random-access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of selecting a sample image, comprising:
acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
training to obtain an image classification model by taking the marked image set as a training set;
classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
for each unlabeled sample image, respectively calculating an uncertainty index and a representative index according to its classification result, and determining its labeling value by combining the uncertainty index and the representative index, wherein the uncertainty index is used for measuring the uncertainty of the image classification result of the sample, and the representative index is used for measuring the probability that the sample can serve as a representative sample of the unlabeled image set;
and selecting and outputting the sample image with the highest labeling value from the unlabeled sample images.
2. The method of selecting a sample image as claimed in claim 1, wherein the uncertainty indicator for any one unlabeled target sample image in the set of unlabeled images is calculated by the following formula:
f(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
wherein f(x, L, u) represents the uncertainty indicator of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-constructed set of label categories.
3. The method of selecting a sample image as claimed in claim 2, wherein the representative index of the target sample image is calculated by the following formula:
Rep(x) = (1/n) * ∑_{i=1}^{n} sim(x, x_i)
wherein Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, and sim(x, x_i) represents the similarity between the target sample image x and a sample image x_i of the unlabeled image set; assuming that the target sample image x is expressed in the attribute space as x = {x_1, x_2, ..., x_j, ..., x_m} and the sample image x_i is expressed as x_i = {x_i1, x_i2, ..., x_ij, ..., x_im}, sim(x, x_i) is computed over the m attribute components according to the similarity expression given by the original formula image;
the annotation value Value(x) of the target sample image is calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x).
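A brief illustrative sketch of how Rep(x) and Value(x) as combined above could be evaluated on attribute vectors; cosine similarity is used for sim(x, x_i) purely as an assumed stand-in (the claim's exact similarity expression appears only as a formula image), and the function names are likewise assumptions:

```python
import numpy as np

def representativeness(x, unlabeled_X):
    """Rep(x): average similarity of x to the n samples of the unlabeled set."""
    eps = 1e-12
    x = np.asarray(x, dtype=float)
    U = np.asarray(unlabeled_X, dtype=float)
    # Assumed cosine similarity between attribute vectors x and each x_i.
    sims = (U @ x) / (np.linalg.norm(U, axis=1) * np.linalg.norm(x) + eps)
    return float(sims.mean())

def labeling_value(uncertainty, rep):
    """Value(x) = f(x, L, u) * Rep(x)."""
    return uncertainty * rep
```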
4. the method of selecting a sample image as claimed in claim 1, wherein the uncertainty indicator for any one unlabeled target sample image in the set of unlabeled images is determined by:
calculating an information entropy index of the target sample image;
counting, according to the classification result of the target sample image, the number of labels assigned to the target sample image;
and calculating to obtain an uncertainty index of the target sample image by combining the information entropy index and the number of the labels.
5. The method of selecting a sample image as claimed in claim 4, wherein the information entropy index of the target sample image is calculated by the following formula:
Ent(x, L, u) = -∑_{y∈Y} p_θ(y|x) * log(p_θ(y|x))
wherein Ent(x, L, u) represents the information entropy index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, and p_θ(y|x) represents the probability that the target sample image x belongs to a label y, Y being a pre-constructed label category set;
the uncertainty indicator of the target sample image is calculated by the following formula:
f(x, L, u) = Ent(x, L, u) * Mul(x)^a
wherein f(x, L, u) represents the uncertainty indicator of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the relative weight.
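An illustrative sketch of the uncertainty indicator of claim 5, combining the information entropy with the label count Mul(x); the function name and the default value of a are assumptions of this sketch:

```python
import numpy as np

def uncertainty_with_label_count(probs, num_labels, a=1.0):
    """f(x, L, u) = Ent(x, L, u) * Mul(x)**a for one sample.

    probs:      predicted probability of each label y in Y
    num_labels: Mul(x), the number of labels assigned to the sample
    a:          parameter controlling the relative weight of Mul(x)
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps))  # Ent(x, L, u)
    return float(entropy * (num_labels ** a))

# Example: with equal entropy, a sample that received more labels scores higher.
print(uncertainty_with_label_count([0.4, 0.3, 0.3], num_labels=3, a=0.5))
```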
6. The method of selecting a sample image as claimed in claim 5, wherein the representative index of the target sample image is calculated by the following kernel density estimation formula:
Rep(x) = (1/(n*h)) * ∑_{i=1}^{n} K((x - x_i)/h)
the annotation value Value(x) of the target sample image is calculated by the following formula:
Value(x) = f(x, L, u) * Rep(x)^β;
wherein Rep(x) represents the representative index of the target sample image x, n represents the number of sample images in the unlabeled image set, h is the bandwidth of the kernel density estimation, the sample images of the unlabeled image set are represented by {x_1, x_2, ..., x_i, ..., x_n}, K(·) is a preset weight function, and β is a parameter for adjusting the relative weight.
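An illustrative sketch of a kernel density estimate for Rep(x) and the β-weighted value of claim 6; the Gaussian choice of the weight function K, the use of the Euclidean norm for multi-dimensional attribute vectors, and the function names are assumptions of this sketch:

```python
import numpy as np

def kde_representativeness(x, unlabeled_X, h=1.0):
    """Rep(x) as a kernel density estimate over the unlabeled set.

    A Gaussian kernel is used here as one illustrative choice; the claim only
    requires K to be a preset weight function and h to be the bandwidth.
    """
    x = np.asarray(x, dtype=float)
    U = np.asarray(unlabeled_X, dtype=float)
    n = U.shape[0]
    d = np.linalg.norm((U - x) / h, axis=1)         # scaled distances (x - x_i) / h
    k = np.exp(-0.5 * d ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel values K(.)
    return float(k.sum() / (n * h))

def labeling_value(uncertainty, rep, beta=1.0):
    """Value(x) = f(x, L, u) * Rep(x)**beta."""
    return uncertainty * (rep ** beta)
```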
7. The method of selecting sample images according to any one of claims 1 to 6, further comprising, after selecting and outputting a sample image with the highest labeling value from among the unlabeled sample images:
transferring the sample image with the highest labeling value, after it has been manually labeled, from the unlabeled image set to the labeled image set, so as to update the labeled image set;
taking the updated labeled image set as a training set, and performing optimization updating on the image classification model;
and if the optimization updating times of the image classification model reach the set iteration times or the accuracy of the image classification model reaches the set threshold value, determining the current image classification model as the final image classification model.
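An illustrative sketch of the iterative procedure of claim 7, stopping when either the set number of iterations or the accuracy threshold is reached; all function and parameter names (train, select, oracle, evaluate, max_iters, target_accuracy) are assumptions of this sketch:

```python
def active_learning_loop(labeled, unlabeled, train, select, oracle,
                         max_iters=20, target_accuracy=0.95, evaluate=None):
    """Iterate claim 7: label the most valuable sample, move it, retrain.

    train(labeled)      -> a new image classification model
    select(model, pool) -> index of the most valuable unlabeled sample
    oracle(sample)      -> the manual annotation for that sample
    evaluate(model)     -> optional accuracy on a held-out set
    """
    model = train(labeled)
    for _ in range(max_iters):
        if not unlabeled:
            break
        idx = select(model, unlabeled)
        sample = unlabeled.pop(idx)               # remove from the unlabeled set
        labeled.append((sample, oracle(sample)))  # add the manually labeled sample
        model = train(labeled)                    # optimize/update the model
        if evaluate is not None and evaluate(model) >= target_accuracy:
            break                                 # accuracy threshold reached
    return model
```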
8. An apparatus for selecting a sample image, comprising:
the image set acquisition module is used for acquiring an unlabeled image set and a labeled image set, wherein the unlabeled image set comprises a plurality of unlabeled sample images, and the labeled image set comprises a plurality of labeled sample images;
the classification model training module is used for training to obtain an image classification model by taking the labeled image set as a training set;
the sample image classification module is used for classifying each unlabeled sample image in the unlabeled image set by adopting the image classification model to obtain a classification result of each unlabeled sample image;
a sample annotation value determining module, configured to calculate, for each unlabeled sample image, an uncertainty index and a representative index according to its classification result, and to determine its annotation value by combining the uncertainty index and the representative index, wherein the uncertainty index is used to measure the uncertainty of the image classification result of the sample, and the representative index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
and the sample image selecting module is used for selecting and outputting the sample image with the highest labeling value from the unlabeled sample images.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of selecting a sample image according to any one of claims 1 to 7.
10. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements a method of selecting a sample image as claimed in any one of claims 1 to 7.
CN202010127598.6A 2020-02-28 2020-02-28 Method, device, storage medium and server for selecting sample image Active CN111310846B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010127598.6A CN111310846B (en) 2020-02-28 2020-02-28 Method, device, storage medium and server for selecting sample image
PCT/CN2020/119302 WO2021169301A1 (en) 2020-02-28 2020-09-30 Method and device for selecting sample image, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010127598.6A CN111310846B (en) 2020-02-28 2020-02-28 Method, device, storage medium and server for selecting sample image

Publications (2)

Publication Number Publication Date
CN111310846A true CN111310846A (en) 2020-06-19
CN111310846B CN111310846B (en) 2024-07-02

Family

ID=71145364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010127598.6A Active CN111310846B (en) 2020-02-28 2020-02-28 Method, device, storage medium and server for selecting sample image

Country Status (2)

Country Link
CN (1) CN111310846B (en)
WO (1) WO2021169301A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793604B (en) * 2021-09-14 2024-01-05 思必驰科技股份有限公司 Speech recognition system optimization method and device
CN114139726A (en) * 2021-12-01 2022-03-04 北京欧珀通信有限公司 Data processing method and device, electronic equipment and storage medium
CN116246756B (en) * 2023-01-06 2023-12-22 浙江医准智能科技有限公司 Model updating method, device, electronic equipment and medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002100B2 (en) * 2008-04-02 2015-04-07 Xerox Corporation Model uncertainty visualization for active learning
CN110689038B (en) * 2019-06-25 2024-02-02 深圳市腾讯计算机系统有限公司 Training method and device for neural network model and medical image processing system
CN111310846B (en) * 2020-02-28 2024-07-02 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100217732A1 (en) * 2009-02-24 2010-08-26 Microsoft Corporation Unbiased Active Learning
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly
WO2019232853A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese model training method, chinese image recognition method, device, apparatus and medium
CN109299668A (en) * 2018-08-30 2019-02-01 中国科学院遥感与数字地球研究所 A kind of hyperspectral image classification method based on Active Learning and clustering
CN109388784A (en) * 2018-09-12 2019-02-26 深圳大学 Minimum entropy Density Estimator device generation method, device and computer readable storage medium
CN109886925A (en) * 2019-01-19 2019-06-14 天津大学 A kind of aluminium material surface defect inspection method that Active Learning is combined with deep learning
CN110766080A (en) * 2019-10-24 2020-02-07 腾讯科技(深圳)有限公司 Method, device and equipment for determining labeled sample and storage medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169301A1 (en) * 2020-02-28 2021-09-02 平安科技(深圳)有限公司 Method and device for selecting sample image, storage medium and server
WO2021164306A1 (en) * 2020-09-17 2021-08-26 平安科技(深圳)有限公司 Image classification model training method, apparatus, computer device, and storage medium
CN112614570A (en) * 2020-12-16 2021-04-06 上海壁仞智能科技有限公司 Sample set labeling method, pathological image classification method and classification model construction method and device
CN112614570B (en) * 2020-12-16 2022-11-25 上海壁仞智能科技有限公司 Sample set labeling method, pathological image classification method, classification model construction method and device
CN112785585A (en) * 2021-02-03 2021-05-11 腾讯科技(深圳)有限公司 Active learning-based training method and device for image video quality evaluation model
CN112785585B (en) * 2021-02-03 2023-07-28 腾讯科技(深圳)有限公司 Training method and device for image video quality evaluation model based on active learning
CN113064973A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Text classification method, device, equipment and storage medium
CN113706448B (en) * 2021-05-11 2022-07-12 腾讯医疗健康(深圳)有限公司 Method, device and equipment for determining image and storage medium
CN113706448A (en) * 2021-05-11 2021-11-26 腾讯科技(深圳)有限公司 Method, device and equipment for determining image and storage medium
CN113435540A (en) * 2021-07-22 2021-09-24 中国人民大学 Image classification method, system, medium, and device when class distribution is mismatched
CN113487617A (en) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN113657510A (en) * 2021-08-19 2021-11-16 支付宝(杭州)信息技术有限公司 Method and device for determining data sample with marked value
CN113590764B (en) * 2021-09-27 2021-12-21 智者四海(北京)技术有限公司 Training sample construction method and device, electronic equipment and storage medium
CN113590764A (en) * 2021-09-27 2021-11-02 智者四海(北京)技术有限公司 Training sample construction method and device, electronic equipment and storage medium
CN114141382A (en) * 2021-12-10 2022-03-04 厦门影诺医疗科技有限公司 Digestive endoscopy video data screening and labeling method, system and application
CN114120048A (en) * 2022-01-26 2022-03-01 中兴通讯股份有限公司 Image processing method, electronic device and computer storage medium
WO2023143038A1 (en) * 2022-01-26 2023-08-03 中兴通讯股份有限公司 Image processing method, electronic device, and computer-readable storage medium
CN116994085A (en) * 2023-06-27 2023-11-03 中电金信软件有限公司 Image sample screening method, model training method, device and computer equipment

Also Published As

Publication number Publication date
CN111310846B (en) 2024-07-02
WO2021169301A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
CN111310846B (en) Method, device, storage medium and server for selecting sample image
TWI689871B (en) Gradient lifting decision tree (GBDT) model feature interpretation method and device
WO2018121690A1 (en) Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN111079780B (en) Training method for space diagram convolution network, electronic equipment and storage medium
CN112862093B (en) Graphic neural network training method and device
CN111127364B (en) Image data enhancement strategy selection method and face recognition image data enhancement method
WO2019015246A1 (en) Image feature acquisition
CN112231592B (en) Graph-based network community discovery method, device, equipment and storage medium
CN115439887A (en) Pedestrian re-identification method and system based on pseudo label optimization and storage medium
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
CN116451081A (en) Data drift detection method, device, terminal and storage medium
CN112115996A (en) Image data processing method, device, equipment and storage medium
CN111062406B (en) Heterogeneous domain adaptation-oriented semi-supervised optimal transmission method
CN110059743B (en) Method, apparatus and storage medium for determining a predicted reliability metric
CN111612021B (en) Error sample identification method, device and terminal
WO2012077818A1 (en) Method for determining conversion matrix for hash function, hash-type approximation nearest neighbour search method using said hash function, and device and computer program therefor
CN116229172A (en) Federal few-sample image classification model training method, classification method and equipment based on comparison learning
CN115982351A (en) Test question evaluation method and related device, electronic equipment and storage medium
CN117454375A (en) Malicious encryption traffic identification model training method and device and electronic equipment
CN113673583A (en) Image recognition method, recognition network training method and related device
CN113283598A (en) Model training method and device, storage medium and electronic equipment
CN111797905A (en) Target detection optimization method based on positive and negative sample sampling ratio and model fine tuning
CN117058498B (en) Training method of segmentation map evaluation model, and segmentation map evaluation method and device
CN113672783B (en) Feature processing method, model training method and media resource processing method
CN110890978B (en) Cross-region communication quality prediction method with privacy protection based on model reuse

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40032370; country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant