WO2023280229A1 - Image processing method, electronic device, and storage medium - Google Patents

Image processing method, electronic device, and storage medium

Info

Publication number
WO2023280229A1
WO2023280229A1 PCT/CN2022/104184 CN2022104184W WO2023280229A1 WO 2023280229 A1 WO2023280229 A1 WO 2023280229A1 CN 2022104184 W CN2022104184 W CN 2022104184W WO 2023280229 A1 WO2023280229 A1 WO 2023280229A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
sample
classification
difficulty
Prior art date
Application number
PCT/CN2022/104184
Other languages
French (fr)
Chinese (zh)
Inventor
张连文
谢伟雁
林志
董冠方
李小慧
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023280229A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene

Definitions

  • Fig. 1 shows a schematic diagram of samples and labels of different difficulties in the related art.
  • the program code provided by the embodiment of the present application runs on the host memory 41 (including database 220), the CPU 42, and the fast hardware GPU 43 of the server 40
  • the image analysis system 240 can also be called a cloud server image analysis platform
  • the image verification system 260 can also be called a cloud service image verification platform
  • the image visualization system 280 can also be called a cloud service image visualization platform
  • the sample classification difficulty threshold is set by default, or is set by a user. This embodiment of the present application does not limit it.
  • the data set is screened according to the confusion perplexity and false perplexity corresponding to each of the multiple sample images in the data set, a preset confusion perplexity threshold, and a preset false perplexity threshold; in the filtered data set, the confusion perplexity of each sample image is less than the preset confusion perplexity threshold, and the false perplexity of each sample image is greater than the preset false perplexity threshold.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of image processing, and in particular to an image processing method, an electronic device, and a storage medium. The method comprises: obtaining a dataset and a model pool, the dataset comprising a plurality of sample images, the model pool comprising a plurality of pretrained first image classification models; respectively inputting the dataset into each of the first image classification models of the model pool to obtain predicted distribution results respectively corresponding to the plurality of first image classification models, each predicted distribution result being used for indicating probability distribution of each sample image on a plurality of image tags corresponding to the dataset; and according to the plurality of predicted distribution results, determining sample classification difficulty corresponding to at least one sample image in the dataset, the sample classification difficulty being used for indicating the difficulty of classification of the sample image by the model. The method provided by embodiments of the present application can automatically evaluate sample classification difficulty without manual participation, can be applied to large-scale data, and provides the basic capability for subsequent applications such as classification design and data verification.

Description

Image processing method, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202110767141.6, entitled "Image processing method, electronic device, and storage medium" and filed with the China Patent Office on July 7, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular to an image processing method, an electronic device, and a storage medium.
Background
With the development of artificial intelligence and image processing technologies, image classification is increasingly performed with artificial intelligence models. To implement image classification, the related art generally trains an image classification model on a large number of sample images and then invokes the trained model to classify images.
However, different sample images contain different amounts of information, so they differ in how easily an image classification model can learn them. The related art does not distinguish between sample images; it trains the image classification model directly on samples of unbalanced training difficulty. As a result, the knowledge learned by the model is limited, and its final classification ability is poor.
Summary
In view of this, embodiments of the present application propose an image processing method, an electronic device, and a storage medium that address the problem in the related art that sample images have unbalanced training difficulty, which degrades the training result when an image classification model is trained.
In a first aspect, an embodiment of the present application provides an image processing method, the method including:
obtaining a data set and a model pool, where the data set includes a plurality of sample images and the model pool includes a plurality of pre-trained first image classification models;
inputting the data set into each first image classification model in the model pool to obtain predicted distribution results respectively corresponding to the plurality of first image classification models, where each predicted distribution result indicates the probability distribution of each sample image over a plurality of image labels corresponding to the data set; and
determining, according to the plurality of predicted distribution results, a sample classification difficulty corresponding to at least one sample image in the data set, where the sample classification difficulty indicates how difficult it is for a model to classify the sample image.
In this implementation, a data set and a model pool are obtained, the data set is input into each first image classification model in the model pool to obtain the predicted distribution result of each model, and the sample classification difficulty of at least one sample image in the data set is determined from the plurality of predicted distribution results. The sample classification difficulty measures how difficult it is for a model to classify the sample image. An image classification model can subsequently be trained based on the sample classification difficulty of the at least one sample image, which addresses the poor training results caused by unbalanced training difficulty among sample images. The image processing method provided by the embodiments of the present application can be applied to large-scale data and provides a basic capability for subsequent applications such as the design of new classification methods and data verification.
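By way of a non-limiting illustration, the step of feeding the data set to every model in the model pool can be sketched as follows. The sketch assumes that each model is a callable returning logits and that a softmax converts them to probabilities; the names `predict_distributions`, `model_pool`, and the random linear "models" are hypothetical stand-ins and are not taken from the application.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def predict_distributions(model_pool, images):
    """Return an array of shape (M, N, K): for each of M models, the
    probability of each of N sample images over K image labels.

    Assumption: every entry of model_pool is a callable mapping an
    (N, D) array of images to an (N, K) array of logits.
    """
    return np.stack([softmax(model(images)) for model in model_pool])

# Toy stand-ins for pre-trained classifiers: random linear models.
rng = np.random.default_rng(0)
N, D, K, M = 5, 8, 3, 4
images = rng.normal(size=(N, D))
weights = [rng.normal(size=(D, K)) for _ in range(M)]
model_pool = [lambda x, w=w: x @ w for w in weights]

probs = predict_distributions(model_pool, images)
```

The resulting `probs` array holds one predicted distribution result per model, which is the only input the difficulty measures described in this section require.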
In a possible implementation, the determining, according to the plurality of predicted distribution results, of the sample classification difficulty corresponding to at least one sample image in the data set includes:
for each of the at least one sample image, determining the confusion perplexity of the sample image according to the plurality of predicted distribution results and the number of first image classification models in the model pool, where the confusion perplexity indicates the sample classification difficulty corresponding to the sample image.
In this implementation, confusion perplexity is introduced as a measure of sample classification difficulty, so that sample classification difficulty can be evaluated on labeled or unlabeled data, which further supports the subsequent design of new classification methods, data verification, and the like.
In another possible implementation, the confusion perplexity of the sample image is positively correlated with the entropy of the predicted distribution result of the sample image corresponding to each first image classification model, and negatively correlated with the number of first image classification models. The entropy of the predicted distribution result corresponding to a first image classification model is determined from the probabilities, output by that model, of the sample image on each of the image labels.
In this implementation, one possible way of calculating the confusion perplexity of a sample image is provided, which further ensures the feasibility and reliability of automatically evaluating sample classification difficulty.
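This passage states only how the confusion perplexity varies with the per-model entropies and with the number of models; it does not give a formula. One formula consistent with those two relationships is the sum of the per-model entropies divided by the number of models. The sketch below uses that form as an assumption; the function name `confusion_perplexity` and the `eps` smoothing term are hypothetical.

```python
import numpy as np

def confusion_perplexity(probs, eps=1e-12):
    """Average entropy of the predicted distributions over the model pool.

    probs: array of shape (M, N, K) with M models, N samples, K labels.
    Returns an array of shape (N,), one score per sample image.
    Assumption: sum of per-model entropies divided by M.
    """
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)  # shape (M, N)
    return entropy.sum(axis=0) / probs.shape[0]

# Two models, two samples, two labels. Both models are certain about
# sample 0 and maximally uncertain about sample 1.
probs = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[1.0, 0.0], [0.5, 0.5]],
])
scores = confusion_perplexity(probs)
```

Under this assumed form, the first sample scores approximately 0 and the second approximately ln 2, so the sample on which the models hesitate is ranked as the more difficult one. No ground-truth label is used, which matches the statement that the measure also applies to unlabeled data.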
In another possible implementation, the method further includes:
determining, according to the plurality of predicted distribution results, a label discrimination difficulty between any two of the plurality of image labels, where the label discrimination difficulty indicates how difficult it is for a model to distinguish the two image labels.
In this implementation, the label discrimination difficulty between any two of the plurality of image labels is determined from the plurality of predicted distribution results. The label discrimination difficulty measures how difficult it is for a model to distinguish any two image labels, and an image classification model can subsequently be trained based on it, which improves the classification performance of the image classification model. Moreover, the method can be applied to large-scale data and provides a basic capability for subsequent applications such as the design of new classification methods and data verification.
In another possible implementation, the determining, according to the plurality of predicted distribution results, of the label discrimination difficulty between any two of the plurality of image labels includes:
for any two of the plurality of image labels, determining a label confusion index between the two image labels according to the plurality of predicted distribution results and the number of first image classification models in the model pool, where the label confusion index indicates the label discrimination difficulty between the two image labels.
In this implementation, the label confusion index is introduced as a measure of label discrimination difficulty, which evaluates label discrimination difficulty effectively and efficiently and further supports the subsequent design of new classification methods, data verification, and the like.
In another possible implementation, the label confusion index between any two image labels is determined according to the similarity of the probability distributions of the data set on the two image labels.
In this implementation, one possible way of calculating the label confusion index between any two image labels is provided, which further ensures the feasibility and reliability of automatically evaluating label discrimination difficulty.
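As a hedged illustration of a similarity-based label confusion index, the sketch below treats the probabilities that one model assigns to a given label across all sample images as a vector, compares two labels by the cosine similarity of their vectors, and averages the result over the model pool. Cosine similarity and averaging are assumptions made for this example; this passage does not specify the similarity measure. The name `label_confusion_index` is hypothetical.

```python
import numpy as np

def label_confusion_index(probs, eps=1e-12):
    """Pairwise label similarity averaged over the model pool.

    probs: array of shape (M, N, K) with M models, N samples, K labels.
    Returns a symmetric (K, K) matrix.
    Assumption: cosine similarity between per-label probability columns.
    """
    num_models = probs.shape[0]
    # L2 norm of each label column (length N), per model: shape (M, 1, K).
    norms = np.linalg.norm(probs, axis=1, keepdims=True)
    unit = probs / (norms + eps)
    # Cosine similarity between label columns for each model: (M, K, K).
    sims = np.einsum("mni,mnj->mij", unit, unit)
    return sims.sum(axis=0) / num_models

# One model, four samples, three labels. Labels 0 and 1 receive
# probability on the same samples; label 2 is cleanly separated.
probs = np.array([[
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]])
lci = label_confusion_index(probs)
```

In this toy case the index between labels 0 and 1 is close to 1 (hard to distinguish), and the index between label 2 and either of the others is close to 0 (easy to distinguish).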
In another possible implementation, before the obtaining of the data set and the model pool, the method further includes:
training an original second image classification model on a training set to obtain the first image classification model, where the training set includes a plurality of training images;
where at least two of the first image classification models in the model pool have different model training parameters, and the model training parameters include at least one of the training set, the type of the second image classification model, and the model training duration.
In this implementation, the model pool is constructed as the basis for difficulty evaluation, which avoids the need for human subjects as in the related art and enables large-scale automated evaluation. In addition, because at least two first image classification models in the model pool have different model training parameters, the model pool is more representative, and the distributions obtained with it avoid bias introduced by any single model, so that the final result is not affected by model selection bias.
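The diversity requirement on the model pool can be pictured with a small configuration sketch. The class `TrainingConfig`, its field names, the example values, and the use of an epoch count to stand for training duration are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical record of how one model in the pool was trained."""
    training_set: str     # identifier of the training set used
    model_type: str       # type of the second image classification model
    training_epochs: int  # stand-in for the model training duration

def is_diverse(pool):
    """True if at least two configurations in the pool differ in any field."""
    return len(set(pool)) >= 2

pool = [
    TrainingConfig("train_split_a", "resnet", 30),
    TrainingConfig("train_split_a", "vgg", 30),
    TrainingConfig("train_split_b", "resnet", 90),
]
```

A pool built from a single repeated configuration would fail this check, which is the situation the implementation above is meant to exclude.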
In another possible implementation, the method further includes:
screening the data set according to the sample classification difficulty corresponding to each of the plurality of sample images in the data set and a preset sample classification difficulty threshold; and
for each sample image in the screened data set, correcting the image label of the sample image according to the target image label output by the largest proportion of models in the model pool.
In this implementation, the data set is screened according to the sample classification difficulty of each sample image and the preset sample classification difficulty threshold, and the image label of each sample image in the screened data set is corrected according to the target image label output by the largest proportion of models in the model pool. In this way, after the sample classification difficulty has been evaluated automatically, mislabeled data in the data set can be checked and corrected according to the evaluated difficulty.
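The screening and correction steps can be sketched as below. Two points are assumptions of this example rather than statements of the application: samples are retained when their difficulty score is below the threshold, and "the label output by the largest proportion of the model pool" is implemented as a majority vote over each model's top-1 prediction. The function name `correct_labels` is hypothetical.

```python
import numpy as np

def correct_labels(probs, labels, difficulty, threshold):
    """Screen samples by difficulty and relabel them by majority vote.

    probs: (M, N, K) predicted distributions of the model pool.
    labels: (N,) current integer labels.
    difficulty: (N,) sample classification difficulty scores.
    Returns (corrected_labels, keep_mask); the input labels are not modified.
    """
    keep = difficulty < threshold      # assumption: keep low-difficulty samples
    votes = probs.argmax(axis=-1)      # (M, N): top-1 label per model
    num_labels = probs.shape[-1]
    # Number of models voting for each label, per sample: (N, K).
    counts = np.stack([(votes == k).sum(axis=0) for k in range(num_labels)], axis=1)
    majority = counts.argmax(axis=1)
    corrected = labels.copy()
    corrected[keep] = majority[keep]
    return corrected, keep

# Three models, two samples, two labels.
probs = np.array([
    [[0.1, 0.9], [0.6, 0.4]],
    [[0.2, 0.8], [0.4, 0.6]],
    [[0.1, 0.9], [0.5, 0.5]],
])
labels = np.array([0, 0])
difficulty = np.array([0.1, 0.9])
corrected, keep = correct_labels(probs, labels, difficulty, threshold=0.5)
```

Here every model assigns the first sample to label 1 although it is annotated as label 0, so its label is corrected; the second sample exceeds the threshold and is left unchanged.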
In another possible implementation, the method further includes:
constructing a third image classification model according to the label discrimination difficulty between any two of the image labels in the data set, where the third image classification model is a multi-level classification model;
training the third image classification model on a training set to obtain a fourth image classification model; and
when a target image to be classified is obtained, invoking the fourth image classification model to classify the target image and obtain a classification result for the target image.
In this implementation, after the label discrimination difficulty has been evaluated automatically, a third image classification model can be constructed according to it, and training the third image classification model on the training set yields a fourth image classification model with higher accuracy. When the fourth image classification model is invoked for inference, the average model complexity can be lower than that of the original large, complex model, so inference can be accelerated, which facilitates deployment on terminal devices.
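One way to picture how a multi-level classifier could be derived from label discrimination difficulty is to merge labels that are hard to tell apart into a coarse group, so that a first level chooses the group and a second level resolves labels within it. The sketch below performs only the grouping step with a union-find structure. The threshold rule, the function name `group_confusable_labels`, and the numeric values are illustrative assumptions, not details disclosed in this passage.

```python
def group_confusable_labels(confusion, threshold):
    """Merge labels whose pairwise confusion index exceeds the threshold.

    confusion: K x K symmetric matrix (nested lists or array).
    Returns a sorted list of label groups, each a list of label indices.
    """
    k = len(confusion)
    parent = list(range(k))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if confusion[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Labels: 0 = guitar, 1 = violin, 2 = cello (made-up index values).
confusion = [
    [1.0, 0.1, 0.1],
    [0.1, 1.0, 0.8],
    [0.1, 0.8, 1.0],
]
groups = group_confusable_labels(confusion, threshold=0.5)
```

With these made-up values, "violin" and "cello" fall into one group while "guitar" remains alone, so a lightweight first-level model would only need to separate the two groups and a dedicated second-level model would handle the harder violin/cello distinction.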
In a second aspect, an embodiment of the present application provides an image processing apparatus. The apparatus includes at least one unit configured to implement the image processing method provided in the first aspect or any possible implementation of the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including:
a processor; and
a memory for storing processor-executable instructions;
where the processor is configured to implement, when executing the instructions, the image processing method provided in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer program product including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device performs the image processing method provided in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the image processing method provided in the first aspect or any possible implementation of the first aspect.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present application and, together with the specification, serve to explain the principles of the present application.
Fig. 1 shows a schematic diagram of samples and labels of different difficulties in the related art.
Fig. 2 shows a schematic structural diagram of an image processing system provided by an exemplary embodiment of the present application.
Fig. 3 shows a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.
Fig. 4 shows a schematic structural diagram of an electronic device provided by another exemplary embodiment of the present application.
Fig. 5 shows a flowchart of an image processing method provided by an exemplary embodiment of the present application.
Fig. 6 shows a schematic diagram of the principle of the method for calculating the label confusion index provided by an exemplary embodiment of the present application.
Fig. 7 shows a schematic diagram of the distribution of confusion perplexity on the ImageNet data set provided by an exemplary embodiment of the present application.
Fig. 8 shows a schematic diagram of calculation results for some sample images obtained by the image processing method provided by an exemplary embodiment of the present application.
Fig. 9 shows a schematic diagram of the distribution of the label confusion index on the ImageNet data set provided by an exemplary embodiment of the present application.
Fig. 10 shows a schematic diagram of calculation results for some label pairs obtained by the image processing method provided by an exemplary embodiment of the present application.
Fig. 11 shows a flowchart of an image processing method provided by another exemplary embodiment of the present application.
Fig. 12 shows a schematic diagram of the principle of correcting mislabeled data provided by another exemplary embodiment of the present application.
Fig. 13 shows a flowchart of an image processing method provided by another exemplary embodiment of the present application.
Fig. 14 shows a schematic diagram of the principle of a multi-level image classification method provided by another exemplary embodiment of the present application.
Fig. 15 shows a flowchart of an image processing method provided by another exemplary embodiment of the present application.
Fig. 16 shows a block diagram of an image processing apparatus provided by another exemplary embodiment of the present application.
Detailed Description
Various exemplary embodiments, features, and aspects of the present application are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
In addition, numerous specific details are given in the following detailed description to better explain the present application. Those skilled in the art will understand that the present application can be practiced without some of these details. In some instances, methods, means, components, and circuits that are well known to those skilled in the art are not described in detail, so as to highlight the gist of the present application.
At present, in an increasing number of application scenarios, the performance of artificial intelligence technology is comparable to that of humans. However, there are still some significant differences between the current learning process of artificial intelligence and the human learning process. As shown in Fig. 1, the hourglass is an easy sample and the racket is a difficult sample; the labels "guitar" and "violin" are easy to distinguish, whereas the labels "violin" and "cello" are difficult to distinguish. In current artificial intelligence research, although samples differ in difficulty and labels differ in difficulty, the prevailing practice is to train a single black-box model that treats all samples alike and all labels alike. In addition, current methods for interpreting artificial intelligence models do not take sample classification difficulty or label discrimination difficulty into account either. That is, the complexity and granularity of the explanation are the same for easy and difficult samples, and different labels are rarely compared, which does not match what humans need in order to understand a model.
Sample classification difficulty and label discrimination difficulty are basic properties of data. If these two properties are left unexplored, model developers may be unable to "teach according to aptitude" and thereby develop more efficient and more accurate models. Therefore, the embodiments of the present application provide a solution for automatically evaluating sample classification difficulty and label discrimination difficulty. The solution aims to provide a basic capability for exploring artificial intelligence classification methods and model interpretation methods that conform to human cognitive processes, and thereby to offer a new direction for the subsequent development of the artificial intelligence industry. In addition, the solution can also be applied to annotation verification and data cleaning.
In the related art, the time a subject takes to visually search for a specified picture can be measured, and this response time can be used as a measure of the sample classification difficulty of the sample image in the visual search task: the longer the subject's response time, the harder the sample image is to find, and the higher its sample classification difficulty. However, this method requires human participation, cannot be applied to large-scale data, and is easily affected by the subject's condition, such as the subject's mental state, which introduces statistical bias.
In addition, the related art may use the performance of top models as a measure of the difficulty of a data set. Optionally, multiple models are trained on each data set to obtain the accuracy of each model; the higher the accuracy of the best-performing model, the lower the classification difficulty of the data set. However, this method evaluates the classification difficulty of the data set as a whole; it can evaluate neither the classification difficulty of a single sample image nor the discrimination difficulty of image labels.
The embodiments of this application propose an image processing method, an electronic device, and a storage medium. A data set and a model pool are obtained, and the data set is input into each first image classification model in the model pool to obtain the prediction distribution result corresponding to each first image classification model. According to the multiple prediction distribution results, the sample classification difficulty of at least one sample image in the data set is determined; the sample classification difficulty measures how hard the sample image is for a model to classify. An image classification model can subsequently be trained based on the sample classification difficulty of the at least one sample image, which addresses the poor training results caused by unbalanced training difficulty among sample images. In addition, the embodiments determine, according to the multiple prediction distribution results, the label discrimination difficulty between any two of the multiple image labels; the label discrimination difficulty measures how hard the two image labels are for a model to distinguish. An image classification model can subsequently be trained based on the label discrimination difficulty between any two of the image labels. The solution can therefore be applied to large-scale data, and it provides a basic capability for subsequent applications such as the design of new classification methods and data verification.
First, the application scenario involved in this application is introduced.
Please refer to FIG. 2, which shows a schematic structural diagram of an image processing system provided by an exemplary embodiment of this application.
The image processing system includes a database 220, an image analysis system 240, an image verification system 260, and an image visualization system 280.
The database 220 is configured to provide the image analysis system 240 with a data set, and the data set includes multiple sample images.
The image analysis system 240 includes a difficulty evaluation apparatus 242, which is configured to obtain the data set from the database and to calculate the sample classification difficulty of at least one sample image in the data set.
Optionally, the difficulty evaluation apparatus 242 is further configured to calculate the label discrimination difficulty between any two of the multiple image labels corresponding to the data set.
The difficulty evaluation apparatus 242 is further configured to provide the calculation results to the subsequent image verification system 260 and image visualization system 280.
The image verification system 260 is configured to verify and correct incorrectly labeled data in the data set according to the calculation results of the difficulty evaluation apparatus 242.
The image visualization system 280 is configured to display content related to the calculation results of the difficulty evaluation apparatus 242.
The image processing method provided in the embodiments of this application is executed by an electronic device. Please refer to FIG. 3, which shows a schematic structural diagram of an electronic device provided by an exemplary embodiment of this application.
The electronic device may be a terminal or a server. The terminal may be a mobile terminal or a fixed terminal, for example a mobile phone, a tablet computer, a laptop computer, or a desktop computer. The server may be a single server, a server cluster composed of several servers, or a cloud computing service center.
As shown in FIG. 3, the electronic device includes a processor 310, a memory 320, and a communication interface 330. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device, which may include more or fewer components than shown, combine some components, or arrange the components differently. Specifically:
The processor 310 is the control center of the electronic device. It connects the parts of the electronic device through various interfaces and lines, and it performs the functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 320 and by calling the data stored in the memory 320, thereby controlling the electronic device as a whole. The processor 310 may be implemented by a central processing unit (CPU) or by a graphics processing unit (GPU).
The memory 320 may be used to store software programs and modules. The processor 310 executes various functional applications and data processing by running the software programs and modules stored in the memory 320. The memory 320 may mainly include a program storage area and a data storage area. The program storage area may store an operating system 321, an acquisition module 322, an input module 323, a determination module 324, and an application program 325 required by at least one function (for example, neural network model training). The data storage area may store data created through use of the electronic device. The memory 320 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. Correspondingly, the memory 320 may further include a memory controller to provide the processor 310 with access to the memory 320.
The processor 310 runs the acquisition module 322 to obtain a data set and a model pool, where the data set includes multiple sample images and the model pool includes multiple pre-trained first image classification models. The processor 310 runs the input module 323 to input the data set into each first image classification model in the model pool and obtain the prediction distribution result corresponding to each first image classification model, where a prediction distribution result indicates the probability distribution of each sample image over the multiple image labels corresponding to the data set. The processor 310 runs the determination module 324 to determine, according to the multiple prediction distribution results, the sample classification difficulty of at least one sample image in the data set, where the sample classification difficulty indicates how hard the sample image is for a model to classify.
In a possible implementation, as shown in FIG. 4, the electronic device may be a server 40. The product form of this solution is a neural network model and program code that are included in a big data analysis platform and deployed on the hardware of the server 40. The neural network model and program code involved in the embodiments of this application are deployed in the image analysis system 240. The image analysis system 240 includes the difficulty evaluation apparatus 242, which is further configured to provide the calculation results to the subsequent image verification system 260 and/or image visualization system 280. At run time, the program code provided in the embodiments runs on the host memory 41 (including the database 220), the CPU 42, and the acceleration hardware GPU 43 of the server 40. The image analysis system 240 may also be called a cloud server image analysis platform, the image verification system 260 may also be called a cloud service image verification platform, and the image visualization system 280 may also be called a cloud service image visualization platform. For the database 220, the image analysis system 240, the image verification system 260, and the image visualization system 280, refer to the descriptions in the foregoing embodiments; details are not repeated here.
The image processing method provided in the embodiments of this application is described below through several exemplary embodiments.
Please refer to FIG. 5, which shows a flowchart of an image processing method provided by an exemplary embodiment of this application. This embodiment is described using the example of the method being applied to the electronic device shown in FIG. 3 or FIG. 4. The method includes the following steps.
Step 501: Build a model pool, where the model pool includes multiple first image classification models.
Optionally, the electronic device builds the model pool in a manner modeled on multi-criterion stratified sampling, so that the model pool includes multiple first image classification models of differing classification strength. Stratified sampling divides the population into several strata according to certain criteria and draws a certain number of samples from each stratum. In a sampling survey of sample classification difficulty, the criteria to consider include a person's knowledge level (for example, education) and life experience (for example, age). Similarly, when constructing the model pool, the criteria to consider include the model structure, the number of training images, and the model training duration. This makes the model pool more representative and keeps the bias in the subsequent calculation of sample classification difficulty as small as possible.
In a possible implementation, the electronic device selects multiple types of original second image classification models and trains each of them on training sets of different sizes. In each training run, the model is collected at different stages of training (early, middle, and near convergence), which yields multiple first image classification models of differing classification strength.
Optionally, at least two first image classification models in the model pool have different model training parameters, where the model training parameters include at least one of the training set, the type of the second image classification model, and the model training duration.
Illustratively, the types of the second image classification model include but are not limited to at least one of the model types shown in Table 1.
Table 1
Model | Number of parameters (millions) | Memory (MB)
VGG16 | 138 | 500
ResNET50 | 25 | 98
ResNET101 | 44 | 174
InceptionV3 | 24 | 92
Xception | 23 | 88
DenseNet121 | 8 | 33
DenseNet169 | 14 | 57
DenseNet201 | 20 | 80
EfficientNetB0 | 5.3 | 20
EfficientNetB2 | 9 | 36
Illustratively, the electronic device obtains a target data set (for example, the ImageNet data set) and randomly samples from it to obtain sub-data sets whose sizes are 25%, 50%, and 75% of the target data set, which serve as three training sets of different sizes. The embodiments of this application do not limit this.
The electronic device builds the model pool and stores the built model pool in the electronic device.
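The stratified construction in step 501 can be sketched as a grid over the three criteria named above: model type, training set size, and training stage. The following is only an illustrative sketch. The names `ARCHITECTURES`, `SUBSET_FRACTIONS`, `STAGES`, `sample_subset`, and `model_pool_configs` are hypothetical, the architecture list is an arbitrary selection from Table 1, and the actual training of each model is omitted.

```python
import itertools
import random

# Hypothetical axes of variation; the text only fixes the three criteria.
ARCHITECTURES = ["VGG16", "ResNET50", "DenseNet121", "EfficientNetB0"]
SUBSET_FRACTIONS = [0.25, 0.50, 0.75]
STAGES = ["early", "middle", "near_convergence"]


def sample_subset(indices, fraction, seed=0):
    """Randomly draw the given fraction of sample indices from the target data set."""
    rng = random.Random(seed)
    k = int(len(indices) * fraction)
    return sorted(rng.sample(indices, k))


def model_pool_configs():
    """Enumerate one training configuration per (architecture, subset size, stage)."""
    return [
        {"architecture": a, "subset_fraction": f, "stage": s}
        for a, f, s in itertools.product(ARCHITECTURES, SUBSET_FRACTIONS, STAGES)
    ]
```

Each configuration would correspond to one first image classification model: the named architecture, trained on the sampled subset, and saved at the named training stage.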
Step 502: Input the data set into each first image classification model in the model pool to obtain the prediction distribution result corresponding to each first image classification model, where a prediction distribution result indicates the probability distribution of each sample image over the multiple image labels corresponding to the data set.
The electronic device obtains the data set and the model pool. The data set includes multiple sample images, and the model pool includes multiple pre-trained first image classification models. The electronic device inputs the data set into each first image classification model in the model pool and obtains the prediction distribution result corresponding to each model.
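As a minimal sketch of step 502, the prediction distribution results can be held in one array of shape (N, S, K) for N models, S sample images, and K image labels. The code assumes, purely for illustration, that each model is a callable returning unnormalized scores (logits) of shape (S, K); `softmax` and `collect_predictions` are hypothetical helper names and are not taken from the application.

```python
import numpy as np


def softmax(logits):
    """Convert unnormalized scores into probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def collect_predictions(models, images):
    """Run every model on the data set and stack the results.

    models: iterable of callables mapping images -> logits of shape (S, K)
            (an assumed interface).
    Returns an array of shape (N, S, K) whose last axis sums to 1.
    """
    return np.stack([softmax(np.asarray(m(images), dtype=float)) for m in models])
```

Viewed along the sample axis, each slice gives one sample image's distribution over labels; viewed along the label axis, each slice gives the probabilities of all samples on one label.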
Step 503: According to the multiple prediction distribution results, determine the sample classification difficulty of at least one sample image in the data set, and determine the label discrimination difficulty between any two of the multiple image labels.
The sample classification difficulty indicates how hard a sample image is for a model to classify. The label discrimination difficulty indicates how hard any two image labels are for a model to distinguish.
In a possible implementation, a metric called confusion perplexity (C-Perplexity) is used to evaluate the sample classification difficulty. The confusion perplexity of a single sample is obtained by calculating the average uncertainty of its multiple prediction distributions. Confusion perplexity applies to both labeled and unlabeled sample images. For a sample image with low classification difficulty, most models should show low uncertainty in their predictions; for a sample image with high classification difficulty, most models should show high uncertainty.
Optionally, for each of the at least one sample image, the electronic device determines the confusion perplexity of the sample image according to the multiple prediction distribution results and the number of first image classification models in the model pool, where the confusion perplexity indicates the sample classification difficulty of the sample image.
Optionally, the confusion perplexity of a sample image is positively correlated with its sample classification difficulty: the greater the confusion perplexity, the higher the sample classification difficulty, that is, the harder the sample image is for a model to classify.
Optionally, the confusion perplexity of a sample image is positively correlated with the entropy of the prediction distribution result of the sample image for each first image classification model and negatively correlated with the number of first image classification models. The entropy of the prediction distribution result for a first image classification model is determined according to the probabilities, output by that model, of the sample image on each image label.
Illustratively, the electronic device determines the confusion perplexity φ_c(x) of the sample image according to the multiple prediction distribution results and the number of first image classification models in the model pool by the following formula:
Figure PCTCN2022104184-appb-000001
where $H_i(x) = -\sum_y P_i(y \mid x)\,\log_2 P_i(y \mid x)$;
Here x is a sample image, y is an image label, N is the number of first image classification models in the model pool, H_i(x) is the entropy of the prediction distribution result for the i-th first image classification model in the model pool, P_i(y|x) is the probability, output by the i-th first image classification model, of sample image x on image label y, and i and N are positive integers.
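The formula itself is given only as a figure. The sketch below therefore assumes one form consistent with the description above: 2 raised to the mean entropy over the N models, which equals 1 when every model is fully certain and matches the statement elsewhere in this description that easy samples have a confusion perplexity close to 1. The function names and the `eps` clipping constant are illustrative assumptions.

```python
import numpy as np


def prediction_entropy(probs, eps=1e-12):
    """Base-2 entropy H_i(x) along the label axis; 0 * log(0) is treated as 0."""
    safe = np.clip(probs, eps, 1.0)
    return -(probs * np.log2(safe)).sum(axis=-1)


def confusion_perplexity(pred):
    """Assumed form: 2 ** ((1/N) * sum_i H_i(x)).

    pred: array of shape (N, S, K) holding P_i(y|x).
    Returns an array of shape (S,), one value per sample image.
    """
    mean_entropy = prediction_entropy(pred).mean(axis=0)  # average over the N models
    return 2.0 ** mean_entropy
```

Under this assumed form the value ranges from 1 (all models certain) to K (all models uniform over K labels), and no ground-truth label is needed, which is why the metric also applies to unlabeled images.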
In a possible implementation, a metric called the label confusion index (Confusion Index, CI) is used to evaluate the discrimination difficulty between two image labels. For any two image labels in the data set, the label confusion index measures how much the first image classification models in the model pool confuse the two labels. It is obtained by calculating the similarity of the prediction distributions of the whole data set on the two image labels. If the probabilities of the two image labels are both high or both low, the two labels are relatively hard to distinguish; if one probability is high and the other is low, the two labels are relatively easy to distinguish.
When divided by sample image, the prediction distribution result for each first image classification model includes the probability distribution of each sample image over the multiple image labels. When divided by image label, the prediction distribution result for each first image classification model includes the prediction distribution for each image label, which indicates the probabilities of the multiple sample images on that image label.
For any two of the multiple image labels (that is, a label pair), the electronic device determines the label confusion index between the two image labels according to the multiple prediction distribution results and the number of first image classification models in the model pool, where the label confusion index indicates the label discrimination difficulty between the two image labels.
Optionally, the label confusion index between any two image labels is positively correlated with the label discrimination difficulty between them: the greater the label confusion index, the higher the label discrimination difficulty, that is, the harder the two image labels are for a model to distinguish.
Optionally, the label confusion index between any two image labels is determined according to the similarity of the probability distributions of the data set on the two image labels.
Illustratively, the electronic device determines the label confusion index φ_c(y_1, y_2) between any two image labels according to the multiple prediction distribution results and the number of first image classification models in the model pool by the following formula:
Figure PCTCN2022104184-appb-000002
Here x is a sample image, y_1 and y_2 are any two image labels, N is the number of first image classification models in the model pool, P_i(y_1|x) is the probability, output by the i-th first image classification model, of sample image x on image label y_1, P_i(y_2|x) is the probability, output by the i-th first image classification model, of sample image x on image label y_2, E_x is the average over the multiple sample images in the data set, and i and N are positive integers.
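The exact formula for the label confusion index is available only as a figure, so the sketch below does not reproduce it. It substitutes one simple similarity, 1 minus the mean absolute difference of the two label probabilities, chosen only because it matches the qualitative description (both probabilities high or both low gives a high value; one high and one low gives a low value). The function name and the choice of similarity are assumptions made for illustration.

```python
import numpy as np


def label_confusion_index(pred, y1, y2):
    """Stand-in similarity between the prediction distributions of two labels.

    pred: array of shape (N, S, K) holding P_i(y|x).
    y1, y2: integer indices of the two image labels.
    NOTE: 1 - E_x|P_i(y1|x) - P_i(y2|x)| is an assumed similarity, not the
    formula from the application.
    """
    p1 = pred[:, :, y1]  # shape (N, S): per-model probabilities on label y1
    p2 = pred[:, :, y2]  # shape (N, S): per-model probabilities on label y2
    similarity_per_model = 1.0 - np.abs(p1 - p2).mean(axis=1)  # E_x over samples
    return float(similarity_per_model.mean())  # average over the N models
```

Whatever similarity is used, the structure follows the symbol definitions above: an average over sample images (E_x) inside an average over the N models.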
In an illustrative example, as shown in FIG. 6, the electronic device inputs the data set 61 into each first image classification model in the model pool 62 to obtain the prediction distribution for image label 1 and the prediction distribution for image label 2, performs a similarity calculation on the two prediction distributions, and determines the label confusion index between the two image labels.
Step 504: Display the sample classification difficulty of the at least one sample image and/or the label discrimination difficulty between any two of the multiple image labels.
Optionally, the electronic device displays, in a preset display form and according to the above calculation results, the overall distribution of the sample classification difficulty and the label discrimination difficulty. For example, the preset display form is a data dashboard.
Optionally, the electronic device interactively displays the sample classification difficulty of a single sample image in the data set and/or the label discrimination difficulty between two of the image labels. The embodiments of this application do not limit this.
Step 505: Perform difficulty analysis according to the sample classification difficulty of the at least one sample image and/or the label discrimination difficulty between any two of the multiple image labels, and output cause information.
Optionally, the electronic device analyzes the sample classification difficulty and the label discrimination difficulty based on the above data dashboard and outputs information about the causes of high sample classification difficulty and high label discrimination difficulty.
It should be noted that steps 504 and 505 are optional: both may be performed, neither may be performed, or only one of them may be performed. The embodiments of this application do not limit this.
In summary, the image processing method provided in the embodiments of this application evaluates sample classification difficulty and label discrimination difficulty automatically, without human participation, and can therefore be applied to large-scale data. The sample classification difficulty can be calculated for both labeled and unlabeled sample images. For a data set with a large number of labels, the label discrimination difficulty between any two of the image labels corresponding to the data set can be calculated.
The embodiments of this application also introduce confusion perplexity as the measure of sample classification difficulty and the label confusion index as the measure of label discrimination difficulty. On the one hand, this allows sample classification difficulty to be evaluated on labeled or unlabeled data. On the other hand, it evaluates label discrimination difficulty effectively and efficiently, which provides a basic capability for subsequent applications such as the design of new classification methods and data verification.
The embodiments of this application also use the constructed model pool as the basis for difficulty evaluation, which avoids the subject participation required in the related art and enables large-scale automated evaluation. Because the models are selected according to the stratified sampling principle, the distribution obtained with the model pool built by this solution reduces model-induced bias, which helps ensure that the final calculation results are not affected by model selection bias.
In an illustrative example, the data set is the ImageNet data set, and the model pool includes 500 first image classification models of differing classification strength, built with a method similar to stratified sampling in a sampling survey. Applying the confusion perplexity calculation proposed in the embodiments of this application to the ImageNet data set gives the overall distribution shown in FIG. 7. To show the differences caused by the model pool, the distribution obtained with a model pool containing only strong classifiers is computed for comparison. The abscissa is the confusion perplexity and the ordinate is the density, that is, the proportion of the total samples per unit of confusion perplexity at the current position. Expressed mathematically, the density is δratio(CP)/δCP, where ratio(CP) is the proportion of the total samples whose confusion perplexity is CP. FIG. 7 shows that the distribution obtained with the model pool built according to the embodiments of this application is more uniform than the distribution obtained with the model pool of strong classifiers only. Moreover, most sample images in the data set are easy samples (confusion perplexity close to 1).
In another illustrative example, the calculation results for some sample images obtained with the image processing method provided in the embodiments of this application are shown in FIG. 8. For comparison, FIG. 8 also shows the results of the misclassification perplexity (X-Perplexity) calculated in the related art. The misclassification perplexity is the rate at which the models misclassify a sample image, so it can evaluate classification difficulty only for labeled data, whereas in real scenarios labeled data is costly to obtain and unlabeled data is easier to obtain. FIG. 8 shows two things. On the one hand, images with high confusion perplexity tend to contain many objects and are hard to classify, while images with low confusion perplexity tend to contain a single object and are easy to classify, which agrees with human perception. On the other hand, on this data set the confusion perplexity proposed by this solution matches the misclassification perplexity of the related art in most cases. It is worth noting, however, that some data has high misclassification perplexity but relatively low confusion perplexity. This is usually because the label itself is wrong, and this property can subsequently be used to verify labeled data.
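The observation above, high misclassification perplexity together with low confusion perplexity, can be turned into a simple screening rule for suspected label errors. This is a minimal sketch: the function names are hypothetical, and the thresholds `x_min` and `c_max` are arbitrary illustrative values that the application does not specify.

```python
import numpy as np


def misclassification_perplexity(pred, labels):
    """Fraction of models whose top prediction differs from the given label.

    pred: array of shape (N, S, K); labels: integer array of shape (S,).
    """
    predicted = pred.argmax(axis=-1)  # shape (N, S): top label per model and sample
    return (predicted != labels[None, :]).mean(axis=0)


def suspected_label_errors(x_perplexity, c_perplexity, x_min=0.9, c_max=1.5):
    """Indices of samples that the models agree on confidently but that
    contradict the annotated label (thresholds are assumed values)."""
    mask = (x_perplexity >= x_min) & (c_perplexity <= c_max)
    return np.flatnonzero(mask)
```

A sample flagged this way is one that nearly every model classifies with low uncertainty as something other than its annotated label, which is the pattern the text associates with a wrong annotation.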
With the same experimental setup, the label confusion index is calculated as described in the foregoing embodiments. Its overall distribution is shown in FIG. 9, and the results for some label pairs are shown in FIG. 10. As can be seen from FIG. 10, label pairs that are easily confused are usually similar, and label pairs that are hard to confuse usually differ considerably, which matches intuition.
In investigating why some samples are hard to classify, it was found that when a sample's error perplexity is extremely high while its confusion perplexity is low, the image label was very likely assigned incorrectly during manual annotation. Without the confusion perplexity metric, such samples cannot be located quickly. Therefore, the image processing method provided in the embodiments of the present application can be applied to verifying and correcting mislabeled data in a data set. In a possible implementation, the image processing method further includes, but is not limited to, the following steps, as shown in FIG. 11:
Step 1101: filter the data set according to the sample classification difficulty corresponding to each of the plurality of sample images in the data set and a preset sample classification difficulty threshold.
The sample classification difficulty of each sample image in the filtered data set is less than the preset sample classification difficulty threshold.
Optionally, the sample classification difficulty threshold is set by default or is user-defined. This is not limited in the embodiments of the present application.
Optionally, the confusion perplexity is used as the measure of sample classification difficulty, and the sample classification difficulty threshold is a confusion perplexity threshold. The data set is filtered according to the confusion perplexity corresponding to each of the plurality of sample images in the data set and the preset confusion perplexity threshold, and the confusion perplexity of each sample image in the filtered data set is less than the preset confusion perplexity threshold.
Optionally, the data set is filtered according to the confusion perplexity and the error perplexity corresponding to each of the plurality of sample images in the data set, together with a preset confusion perplexity threshold and a preset error perplexity threshold. In the filtered data set, the confusion perplexity of each sample image is less than the preset confusion perplexity threshold, and the error perplexity of each sample image is greater than the preset error perplexity threshold.
Illustratively, the error perplexity XP(x) of a sample image can be determined by the following formula:
$$XP(x) = \frac{1}{N}\sum_{i=1}^{N} I\big(C_i(x) \neq y_{gt}\big)$$
where x is any input sample image, N is the number of first image classification models in the model pool, C_i(x) is the label predicted by the i-th first image classification model for the sample image x (usually the label with the highest probability), y_gt is the true label of the sample image x, and I(Ω) is the indicator function, defined as follows:
$$I(\Omega) = \begin{cases} 1, & \text{if } \Omega \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$
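As a minimal sketch, the error perplexity can be read as the fraction of models in the pool whose top-1 prediction differs from the true label, which is the misclassification rate described above. The function name `error_perplexity` and the toy labels are hypothetical and are not taken from the application.

```python
from typing import Sequence


def error_perplexity(predicted_labels: Sequence[str], true_label: str) -> float:
    """Fraction of pool models whose top-1 label differs from the true label.

    predicted_labels holds one top-1 label per model in the pool (hypothetical
    representation; the application does not prescribe a data layout).
    """
    if not predicted_labels:
        raise ValueError("the model pool must contain at least one model")
    # The indicator function contributes 1 for every wrong prediction.
    wrong = sum(1 for label in predicted_labels if label != true_label)
    return wrong / len(predicted_labels)


# Toy pool of four models; three of them miss the true label "cat".
print(error_perplexity(["dog", "dog", "cat", "fox"], "cat"))  # 0.75
```

A value of 0 means every model in the pool predicts the annotated label, and a value of 1 means none does.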
Optionally, the confusion perplexity threshold and/or the error perplexity threshold are set by default or are user-defined. This is not limited in the embodiments of the present application.
Step 1102: for each sample image in the filtered data set, correct the image label of the sample image according to the target image label output by the largest proportion of the model pool.
Each sample image in the filtered data set is input into each first image classification model in the model pool to obtain the highest-probability label of each of the multiple first image classification models. The highest-probability label of a first image classification model is the image label with the highest probability in that model's prediction distribution result for the sample image.
Optionally, for each sample image in the filtered data set, the label that occurs most often among the highest-probability labels of the multiple first image classification models is determined as the target image label output by the largest proportion of the model pool, and the image label of the sample image is corrected to this target image label.
Illustratively, the model pool includes three first image classification models. The sample image is input into each of the three models to obtain their highest-probability labels, for example ("dog", "cat", "dog"). Because "dog" occurs twice and "cat" occurs once, the target image label output by the largest proportion of the model pool is "dog".
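The vote in this example can be sketched with the standard library. The helper name `majority_label` is hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the application does not say how ties are resolved.

```python
from collections import Counter
from typing import Sequence


def majority_label(top1_labels: Sequence[str]) -> str:
    """Return the label predicted by the largest share of the model pool."""
    if not top1_labels:
        raise ValueError("at least one prediction is required")
    # most_common(1) yields the (label, count) pair with the highest count;
    # on a tie, the label encountered first is returned (assumed rule).
    label, _count = Counter(top1_labels).most_common(1)[0]
    return label


print(majority_label(["dog", "cat", "dog"]))  # dog
```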
In an illustrative example, as shown in FIG. 12, the error perplexity threshold is preset to θ_X and the confusion perplexity threshold is preset to θ_C. The data set is filtered according to the condition "S = {XP_i > θ_X} ∩ {CP_i < θ_C} for i ∈ T" (where T is the full data set) to obtain the filtered data set S. For each sample image in the filtered data set, the original image label of the sample image is corrected as "L_i = TVL_i for i ∈ S", where L_i is the corrected label and TVL_i is the target image label output by the largest proportion of the model pool.
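The filter-then-correct procedure can be sketched as follows, assuming the per-sample error perplexity and confusion perplexity have already been computed. The function name `correct_labels`, the list-based layout, and the toy values are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def correct_labels(
    labels: Sequence[str],
    xp: Sequence[float],
    cp: Sequence[float],
    pool_predictions: Sequence[Sequence[str]],
    theta_x: float,
    theta_c: float,
) -> Tuple[List[int], List[str]]:
    """Select samples with XP_i > theta_x and CP_i < theta_c, then relabel them.

    pool_predictions[i] holds the top-1 label of every pool model for sample i.
    Returns the selected indices S and the corrected label list.
    """
    corrected = list(labels)
    selected = [i for i in range(len(labels)) if xp[i] > theta_x and cp[i] < theta_c]
    for i in selected:
        # TVL_i: the label output by the largest share of the model pool.
        corrected[i] = Counter(pool_predictions[i]).most_common(1)[0][0]
    return selected, corrected


labels = ["cat", "dog", "violin"]
xp = [1.0, 0.0, 0.67]   # toy error perplexities
cp = [1.1, 1.0, 3.5]    # toy confusion perplexities
preds = [["dog"] * 3, ["dog"] * 3, ["cello", "cello", "violin"]]
print(correct_labels(labels, xp, cp, preds, theta_x=0.9, theta_c=1.5))
# ([0], ['dog', 'dog', 'violin'])
```

In this toy run only the first sample is relabeled: the third sample is frequently misclassified, but its confusion perplexity is high, so it is treated as genuinely hard rather than mislabeled.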
Image classification models tend to confuse certain classes with others. For example, they readily confuse cellos with violins, and confuse acoustic guitars, electric guitars, and banjos with one another. At the same time, some classes, such as "hourglass", are easy to distinguish. On this basis, the embodiments of the present application provide a multi-level image classification method. In a possible implementation, the image processing method further includes, but is not limited to, the following steps, as shown in FIG. 13:
Step 1301: construct a third image classification model according to the label discrimination difficulty between any two image labels in the data set, where the third image classification model is a multi-level classification model; and train the third image classification model on the training set to obtain a fourth image classification model.
Optionally, an innovative multi-level classification method is derived from the label discrimination difficulty between image labels. First, the labels are grouped into clusters according to discrimination difficulty, so that labels in different clusters are easy to distinguish and labels within the same cluster are hard to distinguish. Next, a relatively simple model (which may be called a cluster model) is built across the different clusters, and a moderately complex model (which may be called a class model) is built within each cluster, which yields the multi-level classification model, that is, the third image classification model. Finally, the third image classification model is trained on the training set to obtain the fourth image classification model. The training procedure is the same as for ordinary deep learning, except that the loss function additionally includes the loss term corresponding to the cluster model. When the fourth image classification model is invoked for inference, its average complexity can be lower than that of the original large, complex model, so inference can be accelerated, which facilitates deployment on terminals. The fourth image classification model is the classification model obtained by training the third image classification model, so the fourth image classification model is also a multi-level classification model.
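The application does not name a clustering algorithm. As one possible sketch, labels can be merged into a cluster whenever their pairwise confusion score reaches a threshold, using a small union-find structure. The function name `cluster_labels`, the scores, and the threshold below are hypothetical values chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_labels(
    labels: Sequence[str],
    confusion: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group labels so that pairs scoring >= threshold share a cluster."""
    parent = {label: label for label in labels}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in confusion.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for label in labels:
        clusters.setdefault(find(label), []).append(label)
    return list(clusters.values())


labels = ["violin", "cello", "acoustic guitar", "electric guitar", "banjo", "hourglass"]
scores = {
    ("violin", "cello"): 0.8,
    ("acoustic guitar", "electric guitar"): 0.7,
    ("electric guitar", "banjo"): 0.6,
    ("violin", "hourglass"): 0.01,
}
print(cluster_labels(labels, scores, threshold=0.5))
```

With these toy scores, the violin and cello form one cluster, the three plucked instruments form another, and "hourglass" stays in a cluster of its own.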
Optionally, the third image classification model and the fourth image classification model each include a first neural network and a second neural network. The first neural network is configured to perform first classification processing on the input target image to obtain a first classification result, where the first classification result corresponds to multiple second classification results. The second neural network is configured to perform, based on the first classification result, second classification processing on the target image to obtain a second classification result.
The first classification result is a coarse-grained classification result and the second classification result is a fine-grained classification result; that is, the multiple second classification results corresponding to a first classification result all belong to that first classification result. Optionally, the first classification result indicates the image type and the second classification result indicates the image label. For example, if the first classification result is the violin family, it corresponds to two possible second classification results: violin and cello. As another example, if the first classification result is the guitar family, it corresponds to three possible second classification results: acoustic guitar, electric guitar, and banjo.
Step 1302: when a target image to be classified is acquired, invoke the fourth image classification model to classify the target image and obtain a classification result of the target image.
When a target image to be classified is acquired, the target image is input into the fourth image classification model, which outputs the classification result of the target image. The classification result indicates the image label of the target image.
In an illustrative example, as shown in FIG. 14, the target image 141 is input into the first neural network 142 of the fourth image classification model to obtain an intermediate result, that is, the first classification result, which may be the violin family, the guitar family, or another type. Based on the first classification result, the target image is input into the second neural network 143 of the fourth image classification model, which outputs the final result, that is, the second classification result. For example, the second classification result corresponding to the first classification result "violin family" is "violin" or "cello", and the second classification result corresponding to the first classification result "guitar family" is "acoustic guitar", "electric guitar", or "banjo". This is not limited in the embodiments of the present application.
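The coarse-to-fine inference flow can be sketched with plain callables standing in for the two neural networks. Everything below (the function names, the dictionary-based stand-in for an image, and the toy decision rules) is hypothetical and only illustrates the control flow, not the networks themselves.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, str]  # toy stand-in for an image


def classify_two_stage(
    image: Features,
    coarse_model: Callable[[Features], str],
    fine_model: Callable[[Features, str], str],
) -> Tuple[str, str]:
    """Run the first (coarse) stage, then the second (fine) stage conditioned on it."""
    coarse = coarse_model(image)      # first classification result
    fine = fine_model(image, coarse)  # second classification result
    return coarse, fine


def toy_coarse(image: Features) -> str:
    return image["family"]


def toy_fine(image: Features, coarse: str) -> str:
    if coarse == "violin family":
        return "cello" if image["size"] == "large" else "violin"
    if coarse == "guitar family":
        by_body = {"wood": "acoustic guitar", "solid": "electric guitar"}
        return by_body.get(image["body"], "banjo")
    return coarse  # a cluster with a single label needs no second decision


print(classify_two_stage({"family": "violin family", "size": "large"}, toy_coarse, toy_fine))
# ('violin family', 'cello')
```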
In summary, compared with general neural network methods, the image processing method provided in this embodiment has the following advantages. On the one hand, it follows the gradual cognitive process of humans, solving easy problems first and hard problems afterwards, so a more accurate model can be learned. On the other hand, because the average complexity of the model can be reduced, inference can be accelerated, which facilitates deployment on electronic devices (such as terminals).
Refer to FIG. 15, which shows a flowchart of an image processing method provided in an exemplary embodiment of the present application. This embodiment is described using an example in which the method is applied to the electronic device shown in FIG. 3 or FIG. 4. The method includes the following steps.
Step 1501: acquire a data set and a model pool, where the data set includes a plurality of sample images and the model pool includes a plurality of pre-trained first image classification models.
Step 1502: input the data set into each first image classification model in the model pool to obtain a prediction distribution result corresponding to each of the multiple first image classification models, where a prediction distribution result indicates the probability distribution of each sample image over the multiple image labels corresponding to the data set.
Step 1503: determine, according to the multiple prediction distribution results, the sample classification difficulty corresponding to at least one sample image in the data set, where the sample classification difficulty indicates how difficult it is for a model to classify the sample image.
It should be noted that, for implementation details of each step in this embodiment, reference may be made to the related descriptions in the foregoing embodiments. Details are not repeated here.
Refer to FIG. 16, which shows a block diagram of an image processing apparatus provided in another exemplary embodiment of the present application. The apparatus may be implemented as all or part of an electronic device by software, hardware, or a combination of the two. The apparatus may include an acquisition unit 1610, an input unit 1620, and a determining unit 1630.
The acquisition unit 1610 is configured to acquire a data set and a model pool, where the data set includes a plurality of sample images and the model pool includes a plurality of pre-trained first image classification models.
The input unit 1620 is configured to input the data set into each first image classification model in the model pool to obtain a prediction distribution result corresponding to each of the multiple first image classification models, where a prediction distribution result indicates the probability distribution of each sample image over the multiple image labels corresponding to the data set.
The determining unit 1630 is configured to determine, according to the multiple prediction distribution results, the sample classification difficulty corresponding to at least one sample image in the data set, where the sample classification difficulty indicates how difficult it is for a model to classify the sample image.
In a possible implementation, the determining unit 1630 is further configured to:
for each sample image in the at least one sample image, determine the confusion perplexity of the sample image according to the multiple prediction distribution results and the number of first image classification models in the model pool, where the confusion perplexity indicates the sample classification difficulty corresponding to the sample image.
In another possible implementation, the confusion perplexity of a sample image is positively correlated with the entropy of the prediction distribution result that each first image classification model produces for the sample image, and is negatively correlated with the number of first image classification models. The entropy of the prediction distribution result corresponding to a first image classification model is determined according to the probabilities, output by that model, of the sample image on the respective image labels.
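One form consistent with this description is the exponential of the mean per-model entropy, which equals 1 when every model is fully confident. This specific formula is an assumption made for illustration, and the function names are hypothetical; the exact definition is the one given in the foregoing method embodiments.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confusion_perplexity(distributions: Sequence[Sequence[float]]) -> float:
    """exp of the entropy averaged over the N models in the pool (assumed form).

    distributions[i] is the probability distribution that model i assigns to
    the sample over all image labels.
    """
    if not distributions:
        raise ValueError("the model pool must contain at least one model")
    mean_entropy = sum(entropy(d) for d in distributions) / len(distributions)
    return math.exp(mean_entropy)


# Every model is certain: the sample is as easy as possible.
print(confusion_perplexity([[1.0, 0.0, 0.0]] * 3))  # 1.0
```

Under this assumed form, a sample for which every model outputs a uniform distribution over K labels gets a value of K (up to floating-point rounding), so larger values indicate samples that confuse the pool more.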
In another possible implementation, the determining unit 1630 is further configured to determine, according to the multiple prediction distribution results, the label discrimination difficulty between any two image labels among the multiple image labels, where the label discrimination difficulty indicates how difficult it is for a model to distinguish the two image labels.
In another possible implementation, the determining unit 1630 is further configured to:
for any two image labels among the multiple image labels, determine the label confusion index between the two image labels according to the multiple prediction distribution results and the number of first image classification models in the model pool, where the label confusion index indicates the label discrimination difficulty between the two image labels.
In another possible implementation, the label confusion index between any two image labels is determined according to the similarity between the probability distributions of the data set on the two image labels.
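This paragraph does not fix the similarity measure. Purely as an illustration, the sketch below uses cosine similarity between the two labels' probability columns over the data set, averaged over the models in the pool. The choice of cosine similarity, the function names, and the nested-list layout are all assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_confusion_index(
    pool_probs: Sequence[Sequence[Sequence[float]]], a: int, b: int
) -> float:
    """Average, over models, of the similarity between label columns a and b.

    pool_probs[m][s][k] is the probability model m gives label k for sample s.
    """
    sims = []
    for model_probs in pool_probs:
        col_a = [row[a] for row in model_probs]  # label a across all samples
        col_b = [row[b] for row in model_probs]  # label b across all samples
        sims.append(cosine(col_a, col_b))
    return sum(sims) / len(sims)


# One toy model, three samples, three labels: labels 0 and 1 share probability
# mass on the same samples, while label 2 never co-occurs with label 0.
pool = [[[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.0, 1.0]]]
print(label_confusion_index(pool, 0, 1) > label_confusion_index(pool, 0, 2))  # True
```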
In another possible implementation, the apparatus further includes a training unit.
The training unit is configured to train an original second image classification model on a training set to obtain a first image classification model, where the training set includes a plurality of training images.
At least two first image classification models in the model pool differ in their model training parameters, where the model training parameters include at least one of the training set, the type of the second image classification model, and the model training duration.
In another possible implementation, the apparatus further includes a correction unit. The correction unit is configured to:
filter the data set according to the sample classification difficulty corresponding to each of the plurality of sample images in the data set and a preset sample classification difficulty threshold; and
for each sample image in the filtered data set, correct the image label of the sample image according to the target image label output by the largest proportion of the model pool.
In another possible implementation, the apparatus further includes a classification unit. The classification unit is configured to:
construct a third image classification model according to the label discrimination difficulty between any two image labels in the data set, where the third image classification model is a multi-level classification model;
train the third image classification model on the training set to obtain a fourth image classification model; and
when a target image to be classified is acquired, invoke the fourth image classification model to classify the target image and obtain a classification result of the target image.
It should be noted that, when the apparatus provided in the foregoing embodiment implements its functions, the division into the foregoing functional modules is merely used as an example. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to implement all or some of the functions described above. In addition, the apparatus provided in the foregoing embodiment and the method embodiments belong to the same concept. For the specific implementation process, refer to the method embodiments. Details are not repeated here.
An embodiment of the present application provides an image processing apparatus. The apparatus includes a processor and a memory for storing instructions executable by the processor, where the processor is configured to implement, when executing the instructions, the foregoing method performed by the electronic device.
An embodiment of the present application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the foregoing method performed by the electronic device.
An embodiment of the present application provides a non-volatile computer-readable storage medium storing computer program instructions. When the computer program instructions are executed by a processor, the foregoing method performed by the electronic device is implemented.
A computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM, or flash memory), a static random access memory (Static Random-Access Memory, SRAM), a portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a digital versatile disc (Digital Versatile Disc, DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
The computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or an external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (Instruction Set Architecture, ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (Local Area Network, LAN) or a wide area network (Wide Area Network, WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA), is personalized by using state information of the computer-readable program instructions. The electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present application.
Aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium. The instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the architecture, functions, and operations of possible implementations of apparatuses, systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, may be implemented by hardware that performs the corresponding function or act (for example, a circuit or an application-specific integrated circuit (ASIC)), or by a combination of hardware and software, such as firmware.
Although the present application has been described herein in conjunction with various embodiments, those skilled in the art, in practicing the claimed application, can understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to advantage.
Various embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

  1. An image processing method, characterized in that the method comprises:
    obtaining a data set and a model pool, wherein the data set comprises a plurality of sample images and the model pool comprises a plurality of pre-trained first image classification models;
    inputting the data set into each first image classification model in the model pool to obtain a prediction distribution result corresponding to each of the plurality of first image classification models, wherein the prediction distribution result indicates a probability distribution of each sample image over a plurality of image labels corresponding to the data set; and
    determining, according to the plurality of prediction distribution results, a sample classification difficulty corresponding to at least one sample image in the data set, wherein the sample classification difficulty indicates how difficult it is for a model to classify the sample image.
  2. The method according to claim 1, characterized in that the determining, according to the plurality of prediction distribution results, of the sample classification difficulty corresponding to at least one sample image in the data set comprises:
    for each sample image of the at least one sample image, determining a confusion perplexity of the sample image according to the plurality of prediction distribution results and the number of first image classification models in the model pool, wherein the confusion perplexity indicates the sample classification difficulty corresponding to the sample image.
  3. The method according to claim 2, characterized in that the confusion perplexity of the sample image is positively correlated with the entropy of the prediction distribution result of the sample image corresponding to each first image classification model, and negatively correlated with the number of first image classification models, wherein the entropy of the prediction distribution result corresponding to a first image classification model is determined according to the probabilities, output by that first image classification model, of the sample image on each of the image labels.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    determining, according to the plurality of prediction distribution results, a label discrimination difficulty between any two image labels of the plurality of image labels, wherein the label discrimination difficulty indicates how difficult it is for a model to distinguish the two image labels.
  5. The method according to claim 4, characterized in that the determining, according to the plurality of prediction distribution results, of the label discrimination difficulty between any two image labels of the plurality of image labels comprises:
    for any two image labels of the plurality of image labels, determining a label confusion index between the two image labels according to the plurality of prediction distribution results and the number of first image classification models in the model pool, wherein the label confusion index indicates the label discrimination difficulty between the two image labels.
  6. The method according to claim 5, characterized in that the label confusion index between any two image labels is determined according to the similarity between the probability distributions of the data set on the two image labels.
  7. The method according to any one of claims 1 to 6, characterized in that, before the obtaining of the data set and the model pool, the method further comprises:
    training an original second image classification model on a training set to obtain the first image classification model, wherein the training set comprises a plurality of training images;
    wherein at least two of the first image classification models in the model pool have different model training parameters, and the model training parameters comprise at least one of the training set, the type of the second image classification model, and the model training duration.
  8. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    filtering the data set according to the sample classification difficulty corresponding to each of the plurality of sample images in the data set and a preset sample classification difficulty threshold; and
    for each sample image in the filtered data set, correcting the image label of the sample image according to the target image label output by the largest proportion of models in the model pool.
  9. The method according to any one of claims 4 to 6, characterized in that the method further comprises:
    constructing a third image classification model according to the label discrimination difficulty between any two image labels in the data set, wherein the third image classification model is a multi-level classification model;
    training the third image classification model on a training set to obtain a fourth image classification model; and
    when a target image to be classified is obtained, invoking the fourth image classification model to classify the target image and obtain a classification result of the target image.
  10. An electronic device, characterized in that the electronic device comprises:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1 to 9 when executing the instructions.
  11. A non-volatile computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
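
As an illustration only, the quantities named in claims 2 to 6 and the label correction of claim 8 could be sketched as follows. The claims state only correlations (entropy, number of models, similarity of probability distributions), not closed-form formulas, so the exponential of the mean entropy and the averaged cosine similarity used here are assumptions. The function names and the `preds[m][i][c]` layout are hypothetical and do not come from the application.

```python
import math
from collections import Counter

# Hypothetical layout: preds[m][i][c] is the probability that pool model m
# assigns image label c to sample image i.

def entropy(dist):
    """Shannon entropy (natural log) of one probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def confusion_perplexity(preds, i):
    """Assumed form: exp of the entropy of sample i averaged over the K pool models."""
    k = len(preds)
    return math.exp(sum(entropy(model[i]) for model in preds) / k)

def majority_label(preds, i):
    """Label chosen (by argmax) by the largest share of pool models for sample i."""
    votes = Counter(
        max(range(len(model[i])), key=model[i].__getitem__) for model in preds
    )
    return votes.most_common(1)[0][0]

def label_confusion(preds, a, b):
    """Assumed form: cosine similarity between the per-sample probabilities on
    labels a and b, averaged over the K pool models."""
    total = 0.0
    for model in preds:
        u = [dist[a] for dist in model]
        v = [dist[b] for dist in model]
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        if norm:
            total += sum(x * y for x, y in zip(u, v)) / norm
    return total / len(preds)

# Two pool models, two sample images, two image labels (made-up numbers).
pool = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.8, 0.2], [0.4, 0.6]],
]
# Sample indices ordered from hardest to easiest under the assumed measure.
hard_first = sorted(range(2), key=lambda i: confusion_perplexity(pool, i), reverse=True)
```

Under these assumptions, a sample on which every model is certain has a perplexity of 1, and a sample on which every model is uniform over C labels has a perplexity of C. Claim 8 does not say which side of the threshold is kept, so the filtering step is left out of the sketch.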
PCT/CN2022/104184 2021-07-07 2022-07-06 Image processing method, electronic device, and storage medium WO2023280229A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110767141.6 2021-07-07
CN202110767141.6A CN115661502A (en) 2021-07-07 2021-07-07 Image processing method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023280229A1 (en) 2023-01-12

Family

ID=84801318

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104184 WO2023280229A1 (en) 2021-07-07 2022-07-06 Image processing method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115661502A (en)
WO (1) WO2023280229A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483881A (en) * 2023-04-26 2023-07-25 北京远舢智能科技有限公司 Data sampling method, device, electronic equipment and medium based on Latin hypercube

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948683A (en) * 2019-03-12 2019-06-28 百度在线网络技术(北京)有限公司 Difficulty division methods, device and its relevant device of point cloud data
CN111598168A (en) * 2020-05-18 2020-08-28 腾讯科技(深圳)有限公司 Image classification method, device, computer equipment and medium
US20200342328A1 (en) * 2019-04-26 2020-10-29 Naver Corporation Training a convolutional neural network for image retrieval with a listwise ranking loss function
CN112101162A (en) * 2020-09-04 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Image recognition model generation method and device, storage medium and electronic equipment
CN113569894A (en) * 2021-02-09 2021-10-29 腾讯科技(深圳)有限公司 Training method of image classification model, image classification method, device and equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483881A (en) * 2023-04-26 2023-07-25 北京远舢智能科技有限公司 Data sampling method, device, electronic equipment and medium based on Latin hypercube
CN116483881B (en) * 2023-04-26 2024-05-03 北京远舢智能科技有限公司 Data sampling method and device based on Latin hypercube, electronic equipment and medium

Also Published As

Publication number Publication date
CN115661502A (en) 2023-01-31

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE