WO2021169301A1 - Method and device for selecting sample images, storage medium, and server - Google Patents

Method and device for selecting sample images, storage medium, and server

Info

Publication number
WO2021169301A1
WO2021169301A1 (PCT/CN2020/119302)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sample
unlabeled
index
sample image
Prior art date
Application number
PCT/CN2020/119302
Other languages
English (en)
Chinese (zh)
Inventor
王俊
高鹏
谢国彤
柳杨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021169301A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • This application belongs to the field of computer technology, and in particular relates to a method, device, storage medium, and server for selecting sample images.
  • In the prior art, an image annotation method based on active learning is usually used.
  • The image annotation method mainly includes: acquiring a part of annotated images and a part of unlabeled images; using the annotated images as a training set to train an initial classification model; using the initial classification model to classify and predict the unlabeled images to obtain a prediction result for each image; calculating the reliability of each image's prediction result separately and selecting the images with the greatest uncertainty to be handed over to experts for manual annotation; adding the manually annotated images to the training set and retraining the classification model to optimize it; and repeating the above steps until the accuracy of the classification model meets the requirements or the number of iterations reaches the specified number.
  • In this way, a subset of samples can be selected from a large number of unlabeled images and handed over for manual labeling, thereby reducing the workload of manual labeling.
  • To this end, this application proposes a method for selecting sample images that can select the sample images with high annotation value for manual annotation, so as to better optimize the performance of the image classification model.
  • an embodiment of the present application provides a method for selecting sample images, including:
  • acquiring an unlabeled image set and a labeled image set, the unlabeled image set including a plurality of unlabeled sample images, and the labeled image set including a plurality of labeled sample images;
  • using the labeled image set as a training set, an image classification model is obtained through training;
  • using the image classification model to respectively classify each unlabeled sample image in the unlabeled image set to obtain a classification result of each unlabeled sample image;
  • for each unlabeled sample image, the respective uncertainty indices and representativeness indices are calculated according to the respective classification results, and the respective uncertainty indices and representativeness indices are combined to determine the respective annotation values, where the uncertainty index is used to measure the uncertainty of the sample's image classification result, and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
  • selecting and outputting the sample image with the highest annotation value from the unlabeled sample images.
  • an apparatus for selecting sample images including:
  • An image collection acquisition module configured to acquire an unlabeled image set and a labeled image set, the unlabeled image set includes a plurality of unlabeled sample images, and the labeled image set includes a plurality of labeled sample images;
  • the classification model training module is configured to use the labeled image set as a training set to obtain an image classification model through training;
  • a sample image classification module configured to use the image classification model to respectively classify each unlabeled sample image in the unlabeled image set to obtain a classification result of each of the unlabeled sample images
  • the sample labeling value determination module is configured to calculate, for each of the unlabeled sample images, an uncertainty index and a representativeness index according to its classification result, and to combine the respective uncertainty index and representativeness index to determine the respective annotation value; the uncertainty index is used to measure the uncertainty of the sample's image classification result, and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
  • the sample image selection module is configured to select and output the sample image with the highest annotation value from each of the unlabeled sample images.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • acquiring an unlabeled image set and a labeled image set, the unlabeled image set including a plurality of unlabeled sample images, and the labeled image set including a plurality of labeled sample images;
  • using the labeled image set as a training set, an image classification model is obtained through training;
  • using the image classification model to respectively classify each unlabeled sample image in the unlabeled image set to obtain a classification result of each unlabeled sample image;
  • for each unlabeled sample image, the respective uncertainty indices and representativeness indices are calculated according to the respective classification results, and the respective uncertainty indices and representativeness indices are combined to determine the respective annotation values, where the uncertainty index is used to measure the uncertainty of the sample's image classification result, and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
  • selecting and outputting the sample image with the highest annotation value from the unlabeled sample images.
  • an embodiment of the present application provides a server, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the following steps are implemented when the processor executes the computer-readable instructions:
  • acquiring an unlabeled image set and a labeled image set, the unlabeled image set including a plurality of unlabeled sample images, and the labeled image set including a plurality of labeled sample images;
  • using the labeled image set as a training set, an image classification model is obtained through training;
  • using the image classification model to respectively classify each unlabeled sample image in the unlabeled image set to obtain a classification result of each unlabeled sample image;
  • for each unlabeled sample image, the respective uncertainty indices and representativeness indices are calculated according to the respective classification results, and the respective uncertainty indices and representativeness indices are combined to determine the respective annotation values, where the uncertainty index is used to measure the uncertainty of the sample's image classification result, and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
  • selecting and outputting the sample image with the highest annotation value from the unlabeled sample images.
  • Embodiments of the present application further provide a computer program product which, when run on a terminal device, causes the terminal device to execute the method for selecting sample images described in the first aspect.
  • Compared with the prior art, the embodiments of the present application have the beneficial effect that, when selecting sample images for manual annotation, the above process calculates the uncertainty index and the representativeness index of each unlabeled sample image separately and combines the two indices to determine the annotation value of each sample image. Since highly representative samples are unlikely to be outliers and better reflect the characteristics of the samples in the sample set, such samples, like samples with high uncertainty, all have high annotation value. When measuring the annotation value of a sample image, this application therefore considers both the uncertainty and the representativeness of the sample, so that the sample images with high annotation value can be selected for manual annotation and the performance of the image classification model can be better optimized.
  • Fig. 1 is a flowchart of a first embodiment of a method for selecting a sample image provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a second embodiment of a method for selecting a sample image provided by an embodiment of the present application
  • Fig. 3 is a structural diagram of an embodiment of a device for selecting sample images provided by an embodiment of the present application
  • Fig. 4 is a schematic diagram of a server provided by an embodiment of the present application.
  • This application proposes a method for selecting sample images that can select the sample images with high annotation value for manual annotation, so as to better optimize the performance of the image classification model.
  • a first embodiment of a method for selecting a sample image in an embodiment of the present application includes:
  • the unlabeled image set contains multiple unlabeled sample images
  • the labeled image set contains multiple labeled sample images.
  • These sample images can be multi-label images, that is, one image contains multiple different category labels.
  • For example, an ophthalmic OCT image may simultaneously contain labels for one to six types of lesions (vitreomacular traction, macular epiretinal membrane, macular hole, intraretinal fluid, pigment epithelial detachment, drusen or retinal atrophy, etc.).
  • The image classification model can use various types of deep models, such as DenseNet, ResNet, ResNeXt, MobileNet, and NASNet. Among them, DenseNet is preferred: it realizes feature reuse through the concatenation of features along the channel dimension, and can achieve better performance than ResNet with fewer parameters and lower computational cost.
  • the image classification model is used to classify each unlabeled sample image in the unlabeled image set, and the classification result of each unlabeled sample image is obtained.
  • the classification result of a sample image is specifically the probability that the sample image belongs to different preset category labels, such as category label A-90%, category label B-20%, and so on.
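  • As a concrete illustration of the model and its per-label outputs, the following is a minimal sketch. It assumes a PyTorch/torchvision environment, ImageNet-pretrained weights, sigmoid outputs for multi-label classification, and a hypothetical `num_labels`; none of these choices are prescribed by this application.

```python
# Minimal sketch (assumption: PyTorch/torchvision; the application does not
# prescribe a framework). A DenseNet backbone is adapted for multi-label
# classification: one sigmoid output per category label.
import torch
import torch.nn as nn
from torchvision import models

num_labels = 6  # hypothetical label count, e.g. the six OCT lesion types above

model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, num_labels)

# Multi-label training uses independent per-label binary losses.
criterion = nn.BCEWithLogitsLoss()

def predict_probabilities(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return p(y|x) for each label, shape (batch, num_labels),
    e.g. label A: 0.90, label B: 0.20 as in the example above."""
    model.eval()
    with torch.no_grad():
        return torch.sigmoid(model(images))
```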
  • In this application, an Uncertainty + Representativeness strategy is selected to measure the annotation value of a sample image.
  • That is, the uncertainty index and the representativeness index of the sample image are calculated according to the classification result of the sample image, and the two indices are then combined to determine the annotation value of the sample image.
  • The uncertainty index is used to measure the uncertainty of the sample's image classification result,
  • and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set.
  • The uncertainty index of any unlabeled target sample image in the unlabeled image set may be calculated by the following formula (1-1):
  • f(x, L, u) = −Σ_{y∈Y} p(y|x) log p(y|x)   (1-1)
  • where f(x, L, u) represents the uncertainty index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-built label category set.
  • As for the representativeness of a sample, it can be measured by how many samples are similar to it. A sample with high representativeness is unlikely to be an outlier sample. The higher the similarity between samples, the more consistent their characteristics; if the similarity is greater than a certain set threshold, it is regarded as redundancy of sample information. Specifically, a similarity coefficient can be used to calculate the similarity between sample points.
  • The representativeness index of the target sample image x can be calculated as the mean of the similarities, as shown in the following formula (1-3):
  • Rep(x) = (1/n) Σ_{i=1}^{n} sim(x, x_i)   (1-3)
  • where Rep(x) represents the representativeness index of the target sample image x, n represents the number of sample images in the unlabeled image set, and sim(x, x_i) represents the similarity between x and the i-th sample image x_i of the unlabeled image set.
  • After the uncertainty index and the representativeness index are obtained, the annotation value of the target sample image can be determined based on these two indices.
  • The annotation value Value(x) can be calculated by the following formula (1-4):
  • Value(x) = f(x, L, u) + Rep(x)   (1-4)
  • where the first term f(x, L, u) represents the amount of information (the uncertainty index) of sample x under the current query strategy, and the second term Rep(x) is the representativeness index.
  • For each sample, the similarity to the other samples is calculated and expressed as the average similarity between sample x and all samples in the sample space. The denser the region of the space in which the sample lies and the higher the information content of the sample, the greater the sample's chance of being selected for labeling.
  • Meanwhile, samples with high mutual similarity are removed to reduce the addition of redundant samples.
  • In this way, the sample uncertainty index and representativeness index are integrated to retain representative samples with high information content, which effectively mitigates the impact of outliers on the quality of sample selection.
  • Here, argmax denotes the function that returns the argument at which a function attains its maximum; that is, the selected sample is the one maximizing Value(x).
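  • The scoring of this first approach can be condensed into a short NumPy sketch. It assumes the entropy form of formula (1-1) reconstructed above, cosine similarity as the "similarity coefficient" of formula (1-3) (the application does not name one), and the additive combination of formula (1-4); all three are stated assumptions rather than a definitive implementation.

```python
import numpy as np

def uncertainty(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """f(x, L, u) per the reconstructed formula (1-1): -sum_y p(y|x) log p(y|x).
    probs: (n_samples, n_labels) probabilities p(y|x). (For sigmoid multi-label
    outputs one might instead sum Bernoulli entropies; the text does not say.)"""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def mean_similarity(features: np.ndarray) -> np.ndarray:
    """Rep(x) per formula (1-3): mean similarity to the unlabeled samples.
    Assumption: cosine similarity as the similarity coefficient."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T       # sim(x, x_i) for all pairs
    return sim.mean(axis=1)       # (1/n) * sum_i sim(x, x_i)

def select_most_valuable(probs: np.ndarray, features: np.ndarray) -> int:
    """x* = argmax_x Value(x), with Value(x) = f(x, L, u) + Rep(x) (1-4)."""
    value = uncertainty(probs) + mean_similarity(features)
    return int(np.argmax(value))
```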
  • Alternatively, the uncertainty index of any unlabeled target sample image in the unlabeled image set may be determined through the following steps:
  • Step (1): the uncertainty of the current sample is measured by information entropy. The information entropy index of the sample image x is defined as Ent(x, L, u), where L represents the samples of the labeled set and u represents the samples of the unlabeled set.
  • Ent(x, L, u) can be calculated by the following formula (1-5):
  • Ent(x, L, u) = −Σ_{y∈Y} p_θ(y|x) log p_θ(y|x)   (1-5)
  • where p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-built label category set.
  • Step (2): according to the classification result of the target sample image, count the number of labels Mul(x) obtained by classifying it. Step (3): combine the information entropy index and the number of labels to calculate the uncertainty index, as in the following formula (1-6):
  • f(x, L, u) = Ent(x, L, u) + a · Mul(x)   (1-6)
  • where f(x, L, u) represents the uncertainty index of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the proportion of the two terms.
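  • A sketch of this entropy-plus-label-count uncertainty follows, assuming the additive form of formula (1-6) reconstructed above; the 0.5 decision threshold used to count predicted labels and the value of a are illustrative assumptions.

```python
import numpy as np

def multilabel_uncertainty(probs: np.ndarray, a: float = 0.5,
                           threshold: float = 0.5, eps: float = 1e-12) -> np.ndarray:
    """f(x, L, u) = Ent(x, L, u) + a * Mul(x), per the reconstructed formula (1-6).
    probs: (n_samples, n_labels) per-label probabilities p_theta(y|x)."""
    p = np.clip(probs, eps, 1.0)
    ent = -(p * np.log(p)).sum(axis=1)        # information entropy index, formula (1-5)
    mul = (probs >= threshold).sum(axis=1)    # Mul(x): number of labels assigned to x
    return ent + a * mul                      # a adjusts the proportion of the two terms
```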
  • As for the representativeness index of a sample image, it can be measured by the similarity between the sample and other samples.
  • A sample with high representativeness is unlikely to be an outlier sample.
  • The LargeVis method can be used to reduce the dimensionality of the extracted high-dimensional feature vectors.
  • The distribution density at the location of each sample point is then calculated by the kernel density method: the higher the kernel density, the denser the region and the more representative the sample. That is, the representativeness of a sample point can be characterized by its kernel density, and the process of calculating the representativeness index can be converted into the process of calculating the kernel density.
  • The representativeness index of the target sample image can be calculated by the following kernel density estimation formula (1-7):
  • Rep(x) = (1/(n·h)) Σ_{i=1}^{n} K((x − x_i)/h)   (1-7)
  • where Rep(x) represents the representativeness index of the target sample image x; n represents the number of sample points, that is, the number of sample images in the unlabeled image set; h is the bandwidth of the kernel density estimation; the sample images of the unlabeled image set are denoted {x_1, x_2, ..., x_i, ..., x_n}; and K(*) is a preset weight function.
  • Formula (1-7) is a weighted average calculation, and the kernel function K(*) is a weight function.
  • The shape and range of the kernel function control the number of data points used to estimate the value of Rep(x) at point x and the degree to which each point is utilized. Intuitively speaking, the effect of kernel density estimation depends on the choice of the kernel function and the bandwidth h.
  • A typical weight function is symmetric about the origin and integrates to 1; commonly used examples are the Uniform, Epanechnikov, Quartic, and Gaussian functions.
  • For kernel functions such as Epanechnikov and Quartic, not only is there truncation (that is, points whose distance from x is greater than the bandwidth h contribute nothing), but the weight of the contributing points also becomes smaller as their distance from x increases.
  • the choice of the kernel function has much less influence on the kernel estimation than the choice of the bandwidth h.
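  • The kernels named above can be written down directly; the sketch below gives the standard textbook forms of the Gaussian and Epanechnikov kernels, the latter showing the truncation behavior just described. The application names these kernels but does not restate their formulas, so the forms here are the conventional ones.

```python
import numpy as np

def gaussian_kernel(u: np.ndarray) -> np.ndarray:
    """Gaussian kernel: symmetric about the origin, integrates to 1, no truncation."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u: np.ndarray) -> np.ndarray:
    """Epanechnikov kernel: truncated at |u| = 1 (points farther than the
    bandwidth contribute nothing), with weights shrinking as |u| grows."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
```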
  • the bandwidth h can be selected in the following two ways.
  • The integrated mean squared error (MISE) is usually used as the criterion for judging the quality of a density estimator.
  • The expression of the integrated mean squared error is:
  • MISE(h) = E ∫ ( f̂_h(x) − f(x) )² dx
  • where f̂_h is the kernel density estimate with bandwidth h and f is the true density. Its leading-order expansion AMISE(h) is called the asymptotic mean integrated squared error. To minimize the error, h must be set at an intermediate value, so as to avoid an excessive bias or an excessive variance of Rep(x). Finding the h that minimizes AMISE(h) amounts to exactly balancing the orders of the bias and variance terms in AMISE(h), which gives the optimal bandwidth
  • h_opt = [ R(K) / ( m₂(K)² · R(f'') · n ) ]^(1/5), where R(g) = ∫ g(u)² du and m₂(K) = ∫ u² K(u) du, so that h_opt is of order n^(−1/5).
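  • Combining formula (1-7) with an n^(−1/5)-order bandwidth gives the following sketch of the representativeness index. It assumes Scott's rule of thumb for h and a multivariate product-kernel extension of the one-dimensional formula (1-7) applied to dimensionality-reduced feature vectors; neither choice is prescribed by the application.

```python
import numpy as np

def kde_representativeness(features: np.ndarray, kernel) -> np.ndarray:
    """Rep(x) = (1/(n*h)) * sum_i K((x - x_i) / h), formula (1-7), extended
    to d dimensions with a product kernel.
    features: (n, d) dimensionality-reduced feature vectors."""
    n, d = features.shape
    # Scott's rule of thumb: h_j = sigma_j * n^(-1/(d+4)), a standard bandwidth
    # of the n^(-1/5) order required by the AMISE analysis above (assumption).
    h = features.std(axis=0) * n ** (-1.0 / (d + 4))
    weights = np.ones((n, n))
    for j in range(d):
        u = (features[:, j][:, None] - features[:, j][None, :]) / h[j]
        weights *= kernel(u) / h[j]   # product kernel across dimensions
    return weights.mean(axis=1)       # (1/n) * sum_i prod_j K(.)/h_j

# usage sketch: rep = kde_representativeness(reduced_features, epanechnikov_kernel)
```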
  • The annotation value Value(x) of the target sample image can then be calculated by the following formula (1-12):
  • Value(x) = f(x, L, u) + β · Rep(x)   (1-12)
  • where the first term f(x, L, u) is the uncertainty index of the target sample image x, reflecting the amount of information (including the uncertainty) of sample x under the current query strategy; the second term Rep(x) represents the representativeness index of x, calculated from the kernel density of each instance in the dimensionality-reduced feature space and expressed as the average similarity between x and the other samples in the sample space;
  • and β is a parameter that adjusts the proportion of the two terms. The denser the region of the space in which a sample lies and the higher its information content, the greater its chance of being selected for labeling.
  • Finally, the sample image with the highest annotation value is selected, output, and handed over for manual labeling.
  • The method in the embodiments of this application, which fuses deep learning and active learning (AL, Active Learning), can build on the strong feature representation ability of the deep model.
  • In each round, the active selection strategy selects from the current unlabeled sample set the sample images that contribute most to classification and submits them for labeling. From a large number of never-labeled original images, only some high-value samples are selected for expert labeling; it is not necessary to label all samples. Lower-quality samples are filtered out, and the samples most valuable for improving the deep learning model are added to training in each round, which can effectively reduce the manual labeling workload while ensuring task accuracy.
  • To summarize, when selecting sample images for manual annotation, the uncertainty index and the representativeness index of each unlabeled sample image are calculated separately, and the two indices are combined to determine the annotation value of each sample image. Since a highly representative sample is unlikely to be an outlier and better reflects the characteristics of the samples in the sample set, such samples, like samples with high uncertainty, are samples with high annotation value.
  • When measuring the annotation value of a sample image, this application therefore considers both the uncertainty and the representativeness of the sample, so that the sample images with high annotation value can be selected for manual annotation and the performance of the image classification model can be better optimized.
  • a second embodiment of a method for selecting a sample image in an embodiment of the present application includes:
  • Steps 201-205 are the same as steps 101-105. For details, please refer to the relevant descriptions of steps 101-105.
  • this part of the sample images will be handed over to experts for manual annotation.
  • the manually labeled sample images are transferred from the unlabeled image set to the labeled image set, and the labeled image set is updated.
  • the image classification model is retrained by using the updated labeled image set as a training set to optimize the updated model parameters and improve the performance of the model.
  • After the image classification model is optimized and updated, it is determined whether the current number of optimization updates has reached the set number of iterations, or whether the accuracy of the image classification model has reached the set threshold. If yes, step 209 is executed; if not, the process returns to step 203 and the iterative optimization of the classification model is repeated until the condition is met.
  • In step 209, the number of optimization updates of the image classification model has reached the set number of iterations, or the accuracy of the image classification model has reached the set threshold. The optimization and updating of the image classification model is then complete, so the current model can be determined as the final image classification model, which is used to classify the images to be classified.
  • On the basis of the first method embodiment, this embodiment manually annotates the selected sample images and then transfers the manually annotated sample images from the unlabeled image set to the labeled image set to obtain an updated labeled image set.
  • The updated labeled image set is used as a training set to optimize and update the image classification model, thereby improving the performance of the image classification model.
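  • The overall iteration of this second embodiment (steps 201 to 209) can be sketched as the loop below, wiring together the earlier sketches. The functions train, predict, extract_features, oracle (the human expert), and evaluate are hypothetical placeholders for components the application leaves to the implementer, and the batch size, round budget, and accuracy target are illustrative assumptions.

```python
import numpy as np

def active_learning_loop(labeled, unlabeled, train, predict, extract_features,
                         oracle, evaluate, beta=1.0, max_rounds=10,
                         target_accuracy=0.95, batch=8):
    """Sketch of steps 201-209: train, score, select, annotate, retrain,
    stopping once the iteration budget or the accuracy threshold is reached."""
    model = train(labeled)                                   # steps 201-202
    for _ in range(max_rounds):                              # iteration budget
        probs = predict(model, unlabeled)                    # step 203
        feats = extract_features(model, unlabeled)
        value = (multilabel_uncertainty(probs)               # f(x, L, u)
                 + beta * kde_representativeness(feats, epanechnikov_kernel))
        picked = set(np.argsort(value)[-batch:].tolist())    # highest Value(x)
        labeled += [(unlabeled[i], oracle(unlabeled[i])) for i in picked]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picked]
        model = train(labeled)                               # retrain, step 208
        if evaluate(model) >= target_accuracy:               # step 209 condition
            break
    return model                                             # final classification model
```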
  • Corresponding to the method described in the above embodiments, FIG. 3 shows a structural block diagram of the apparatus for selecting sample images provided by an embodiment of the present application; for ease of description, only the parts related to this embodiment are shown.
  • the device includes:
  • An image collection acquisition module 301, configured to acquire an unlabeled image set and a labeled image set, the unlabeled image set including a plurality of unlabeled sample images, and the labeled image set including a plurality of labeled sample images;
  • the classification model training module 302 is configured to use the labeled image set as a training set to obtain an image classification model through training;
  • the sample image classification module 303 is configured to use the image classification model to separately classify each unlabeled sample image in the unlabeled image set to obtain a classification result of each of the unlabeled sample images;
  • the sample labeling value determination module 304 is configured to calculate, for each of the unlabeled sample images, an uncertainty index and a representativeness index according to its classification result, and to combine the respective uncertainty index and representativeness index to determine the respective annotation value; the uncertainty index is used to measure the uncertainty of the sample's image classification result, and the representativeness index is used to measure the probability that the sample can serve as a representative sample of the unlabeled image set;
  • the sample image selection module 305 is configured to select and output the sample image with the highest annotation value from each of the unlabeled sample images.
  • Further, the sample labeling value determination module may include a first uncertainty index calculation unit, configured to calculate the uncertainty index of the target sample image by the following formula:
  • f(x, L, u) = −Σ_{y∈Y} p(y|x) log p(y|x)
  • where f(x, L, u) represents the uncertainty index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-built label category set.
  • Further, the sample labeling value determination module may include a first representative index calculation unit, configured to calculate the representativeness index of the target sample image by the following formula:
  • Rep(x) = (1/n) Σ_{i=1}^{n} sim(x, x_i)
  • where Rep(x) represents the representativeness index of the target sample image x, n represents the number of sample images in the unlabeled image set, and sim(x, x_i) represents the similarity between the target sample image x and the i-th sample image x_i of the unlabeled image set.
  • Correspondingly, the annotation value Value(x) of the target sample image can be calculated by the following formula:
  • Value(x) = f(x, L, u) + Rep(x)
  • Further, the sample labeling value determination module may include:
  • an information entropy calculation unit, configured to calculate the information entropy index of the target sample image;
  • a label quantity counting unit, configured to count, according to the classification result of the target sample image, the number of labels obtained by classifying the target sample image;
  • and an uncertainty index determination unit, configured to calculate the uncertainty index of the target sample image by combining the information entropy index and the number of labels.
  • The information entropy calculation unit is specifically configured to calculate the information entropy index of the target sample image by the following formula:
  • Ent(x, L, u) = −Σ_{y∈Y} p_θ(y|x) log p_θ(y|x)
  • where Ent(x, L, u) represents the information entropy index of the target sample image x, L represents the samples of the labeled image set, u represents the samples of the unlabeled image set, p_θ(y|x) represents the probability that the target sample image x belongs to label y, and Y is a pre-built label category set.
  • Further, the sample labeling value determination module may include a second uncertainty index calculation unit, configured to calculate the uncertainty index of the target sample image by the following formula:
  • f(x, L, u) = Ent(x, L, u) + a · Mul(x)
  • where f(x, L, u) represents the uncertainty index of the target sample image x, Mul(x) represents the number of labels, and a is a parameter for adjusting the proportion of the two terms.
  • Further, the sample labeling value determination module may include a second representative index calculation unit, configured to calculate the representativeness index of the target sample image by the following kernel density estimation formula:
  • Rep(x) = (1/(n·h)) Σ_{i=1}^{n} K((x − x_i)/h)
  • where Rep(x) represents the representativeness index of the target sample image x, n represents the number of sample images in the unlabeled image set, h is the bandwidth of the kernel density estimation, the sample images of the unlabeled image set are denoted {x_1, x_2, ..., x_i, ..., x_n}, and K(*) is a preset weight function.
  • Correspondingly, the annotation value Value(x) of the target sample image can be calculated by the following formula:
  • Value(x) = f(x, L, u) + β · Rep(x)
  • where β is the parameter for adjusting the proportion of the two terms.
  • the device for selecting sample images may further include:
  • an image collection update module, configured to transfer the manually annotated sample images with the highest annotation value from the unlabeled image set to the labeled image set, and to update the labeled image set;
  • An image classification model optimization module configured to use the updated labeled image set as a training set to optimize and update the image classification model
  • an image classification model determination module, configured to determine the current image classification model as the final image classification model if the number of optimization updates of the image classification model reaches the set number of iterations, or the accuracy of the image classification model reaches a set threshold.
  • An embodiment of the present application also provides a computer-readable storage medium that stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the steps of any method for selecting sample images shown in FIG. 1 or FIG. 2 are implemented.
  • the computer-readable storage medium may be non-volatile or volatile.
  • An embodiment of the present application further provides a server, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of any method for selecting sample images shown in FIG. 1 or FIG. 2 are implemented.
  • An embodiment of the present application also provides a computer program product; when the computer program product runs on a server, the server executes the steps of any method for selecting sample images shown in FIG. 1 or FIG. 2.
  • Fig. 4 is a schematic diagram of a server provided by an embodiment of the present application.
  • the server 4 in this embodiment includes a processor 40, a memory 41, and computer-readable instructions 42 stored in the memory 41 and running on the processor 40.
  • the processor 40 executes the computer-readable instructions 42, the steps in the above-mentioned method embodiments for selecting sample images are implemented, for example, steps 101 to 105 shown in FIG. 1.
  • the processor 40 implements the functions of the modules/units in the foregoing device embodiments when executing the computer-readable instructions 42, for example, the functions of the modules 301 to 305 shown in FIG. 3.
  • the computer-readable instructions 42 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 41 and executed by the processor 40, To complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 42 in the server 4.
  • The server 4 may be a computing device such as a smart phone, a notebook computer, a palmtop computer, or a cloud server.
  • the server 4 may include, but is not limited to, a processor 40 and a memory 41.
  • FIG. 4 is only an example of the server 4 and does not constitute a limitation on the server 4; it may include more or fewer components than shown, combine certain components, or use different components.
  • For example, the server 4 may also include input and output devices, network access devices, buses, and the like.
  • The processor 40 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the storage 41 may be an internal storage unit of the server 4, such as a hard disk or a memory of the server 4.
  • The memory 41 may also be an external storage device of the server 4, for example, a plug-in hard disk equipped on the server 4, a smart media card (SMC), a secure digital (SD) card, a flash card, etc.
  • the memory 41 may also include both an internal storage unit of the server 4 and an external storage device.
  • the memory 41 is used to store the computer readable instructions and other programs and data required by the server.
  • the memory 41 can also be used to temporarily store data that has been output or will be output.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the above-mentioned embodiments of this application can be completed by instructing the relevant hardware through a computer program.
  • The computer program can be stored in a storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunications signal, and a software distribution medium, for example, a U disk, a mobile hard disk, a floppy disk, or a CD-ROM.
  • In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electric carrier signals and telecommunications signals.

Abstract

The present invention is applicable to the technical field of computers, and relates to a method and device for selecting a sample image, a storage medium, and a server. With this method, when sample images are selected for manual annotation, the uncertainty index and the representativeness index of each unannotated sample image are calculated respectively, and the annotation value of each sample image is determined by combining the uncertainty index and the representativeness index. Since a sample with high representativeness is unlikely to be an outlier sample and better reflects the characteristics of the samples in a sample set, such samples, like samples with high uncertainty, are all samples with high annotation value. According to the present invention, when the annotation value of a sample image is measured, the uncertainty and the representativeness of the sample are considered at the same time, so that the sample images with high annotation value can be selected for manual annotation, which makes it possible to better optimize the performance of an image classification model. The present invention further relates to the field of artificial intelligence.
PCT/CN2020/119302 2020-02-28 2020-09-30 Method and device for selecting sample images, storage medium, and server WO2021169301A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010127598.6A CN111310846A (zh) 2020-02-28 2020-02-28 Method, apparatus, storage medium and server for selecting sample images
CN202010127598.6 2020-02-28

Publications (1)

Publication Number Publication Date
WO2021169301A1 true WO2021169301A1 (fr) 2021-09-02

Family

ID=71145364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119302 WO2021169301A1 (fr) 2020-02-28 2020-09-30 Method and device for selecting sample images, storage medium, and server

Country Status (2)

Country Link
CN (1) CN111310846A (fr)
WO (1) WO2021169301A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487617A (zh) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Data processing method and apparatus, electronic device, and storage medium
CN113793604A (zh) * 2021-09-14 2021-12-14 思必驰科技股份有限公司 Speech recognition system optimization method and apparatus
CN114120048A (zh) * 2022-01-26 2022-03-01 中兴通讯股份有限公司 Image processing method, electronic device, and computing storage medium
CN116246756A (zh) * 2023-01-06 2023-06-09 北京医准智能科技有限公司 Model updating method and apparatus, electronic device, and medium

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310846A (zh) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, apparatus, storage medium and server for selecting sample images
CN111931865B (zh) * 2020-09-17 2021-01-26 平安科技(深圳)有限公司 Training method and apparatus for an image classification model, computer device, and storage medium
CN112614570B (zh) * 2020-12-16 2022-11-25 上海壁仞智能科技有限公司 Sample set labeling, pathological image classification, and classification model construction methods and apparatuses
CN112785585B (zh) * 2021-02-03 2023-07-28 腾讯科技(深圳)有限公司 Training method and apparatus for an active-learning-based image and video quality evaluation model
CN113064973A (zh) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Text classification method, apparatus, device, and storage medium
CN113706448B (zh) * 2021-05-11 2022-07-12 腾讯医疗健康(深圳)有限公司 Method, apparatus, device, and storage medium for determining an image
CN113435540A (zh) * 2021-07-22 2021-09-24 中国人民大学 Image classification method, system, medium, and device for mismatched class distributions
CN113657510A (zh) * 2021-08-19 2021-11-16 支付宝(杭州)信息技术有限公司 Method and apparatus for determining data samples with annotation value
CN113590764B (zh) * 2021-09-27 2021-12-21 智者四海(北京)技术有限公司 Training sample construction method and apparatus, electronic device, and storage medium
CN114141382A (zh) * 2021-12-10 2022-03-04 厦门影诺医疗科技有限公司 Digestive endoscopy video data screening and labeling method, system, and application
CN116994085A (zh) * 2023-06-27 2023-11-03 中电金信软件有限公司 Image sample screening method, model training method, apparatus, and computer device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090252404A1 (en) * 2008-04-02 2009-10-08 Xerox Corporation Model uncertainty visualization for active learning
CN109886925A (zh) * 2019-01-19 2019-06-14 天津大学 Aluminum surface defect detection method combining active learning and deep learning
CN110689038A (zh) * 2019-06-25 2020-01-14 深圳市腾讯计算机系统有限公司 Neural network model training method and apparatus, and medical image processing system
CN111310846A (zh) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, apparatus, storage medium and server for selecting sample images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090252404A1 (en) * 2008-04-02 2009-10-08 Xerox Corporation Model uncertainty visualization for active learning
CN109886925A (zh) * 2019-01-19 2019-06-14 天津大学 Aluminum surface defect detection method combining active learning and deep learning
CN110689038A (zh) * 2019-06-25 2020-01-14 深圳市腾讯计算机系统有限公司 Neural network model training method and apparatus, and medical image processing system
CN111310846A (zh) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, apparatus, storage medium and server for selecting sample images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BURR SETTLES, MARK CRAVEN: "An analysis of active learning strategies for sequence labeling tasks", PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP '08, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, MORRISTOWN, NJ, USA, 1 October 2008 (2008-10-01) - 27 October 2008 (2008-10-27), Morristown, NJ, USA, pages 1070 - 1079, XP055166517, DOI: 10.3115/1613715.1613855 *
JINGBO ZHU, WANG HUIZHEN, YAO TIANSHUN, TSOU BENJAMIN K.: "Active learning with sampling by uncertainty and density for word sense disambiguation and text classification", PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, COLING '08, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, MORRISTOWN, NJ, USA, vol. 1, 18 August 2008 (2008-08-18) - 22 August 2008 (2008-08-22), Morristown, NJ, USA, pages 1137 - 1144, XP055275602, ISBN: 978-1-905593-44-6, DOI: 10.3115/1599081.1599224 *
LIU PENG, ZHANG HUI, EOM KIE B.: "Active Deep Learning for Classification of Hyperspectral Images", IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, IEEE, USA, vol. 10, no. 2, 1 February 2017 (2017-02-01), USA, pages 712 - 724, XP055841689, ISSN: 1939-1404, DOI: 10.1109/JSTARS.2016.2598859 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487617A (zh) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Data processing method and apparatus, electronic device, and storage medium
CN113793604A (zh) * 2021-09-14 2021-12-14 思必驰科技股份有限公司 Speech recognition system optimization method and apparatus
CN113793604B (zh) * 2021-09-14 2024-01-05 思必驰科技股份有限公司 Speech recognition system optimization method and apparatus
CN114120048A (zh) * 2022-01-26 2022-03-01 中兴通讯股份有限公司 Image processing method, electronic device, and computing storage medium
CN114120048B (zh) * 2022-01-26 2022-05-13 中兴通讯股份有限公司 Image processing method, electronic device, and computing storage medium
CN116246756A (zh) * 2023-01-06 2023-06-09 北京医准智能科技有限公司 Model updating method and apparatus, electronic device, and medium
CN116246756B (zh) * 2023-01-06 2023-12-22 浙江医准智能科技有限公司 Model updating method and apparatus, electronic device, and medium

Also Published As

Publication number Publication date
CN111310846A (zh) 2020-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922119

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922119

Country of ref document: EP

Kind code of ref document: A1