WO2022188080A1 - Image classification network model training method, image classification method, and related device - Google Patents

Image classification network model training method, image classification method, and related device Download PDF

Info

Publication number
WO2022188080A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
class
category
image classification
image
Prior art date
Application number
PCT/CN2021/080087
Other languages
French (fr)
Chinese (zh)
Inventor
王蕊
童学智
曲强
姜青山
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2022188080A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a training method of an image classification network model, an image classification method and related equipment.
  • Image classification is one of the most basic problems in the field of image processing technology.
  • a deep neural network image classification method is mainly used, and specifically, the image to be classified and the category label of the to-be-classified image are input into the deep neural network model to train the deep neural network model.
  • the predicted classification label of the image to be classified outputted by the deep neural network model obtained in the above manner may be wrong and unexplainable.
  • the present application provides a training method for an image classification network model, an image classification method and related equipment.
  • the present application provides a training method of an image classification network model, the method comprising:
  • the external knowledge base including the true category label of the training image
  • the image classification network model is trained based on the target loss function.
  • the present application provides an image classification method, the image classification method includes:
  • the class labels of the images to be classified are evaluated to obtain an interpretability score.
  • the present application provides a terminal device, the device includes a memory and a processor coupled to the memory;
  • the memory is used to store program data
  • the processor is used to execute the program data to implement the above-mentioned training method for an image classification network model and/or the above-mentioned image classification method.
  • the present application also provides a computer storage medium, which is used for storing program data.
  • a computer storage medium which is used for storing program data.
  • the beneficial effects of the present application are: acquiring training images and an external knowledge base, where the external knowledge base includes the real category labels of the training images; encoding the external knowledge base to obtain a category distance matrix; inputting the training images together with their real category labels and the category distance matrix into the image classification network model to obtain the predicted class probability distribution of the training images, wherein the predicted class probability distribution includes the gap probability between the predicted class label output by the image classification network model and the real class label; calculating the target loss function using the depth distance between the real class label and the predicted class label in the class distance matrix together with the predicted class probability distribution; and training the network model based on the target loss function.
  • This application cites an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the predicted results.
  • FIG. 1 is a schematic flowchart of an embodiment of a training method for an image classification network model provided by the present application
  • FIG. 2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the application;
  • FIG. 3 is a schematic flowchart of an embodiment of S102 in the training method of the image classification network model shown in FIG. 1;
  • FIG. 4 is a schematic flowchart of an embodiment of S104 in the training method of the image classification network model shown in FIG. 1;
  • FIG. 5 is a schematic flowchart of an embodiment of an image classification method provided by the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • FIG. 1 is a schematic flowchart of an embodiment of the training method for an image classification network model provided by the present application.
  • the training method of the image classification network model in this embodiment can be applied to an image classification apparatus.
  • the image classification apparatus of the present application may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other.
  • each part included in the mobile device such as each unit, subunit, module, and submodule, may all be provided in the server, or in the mobile device, or in the server and the mobile device, respectively.
  • the above server may be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module, which is not specifically limited here.
  • S101 Acquire training images and an external knowledge base, where the external knowledge base includes ground-truth class labels of the training images.
  • When the image classification network model is trained simply with the training images and their real class labels, the predicted classification labels output by the resulting image classification network model may be wrong and lack interpretability.
  • the image classification apparatus of the present application refers to an external knowledge base to constrain the prediction results of the image classification network model.
  • FIG. 2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the present application.
  • the external knowledge base is a tree-like structure composed of multiple category labels; each node in the tree-like structure represents a category label, and the closer two nodes are in the tree-like structure, the more similar their category labels are.
  • the external knowledge base in this embodiment should include all the class labels that the image classification network model can distinguish.
  • the external knowledge base at least includes the real category labels of the training images.
  • a single external knowledge base may not be able to include ground-truth class labels for all training images.
  • the training method of the image classification network model of this embodiment can supplement the missing category labels in the single external knowledge base by manually extracting the category labels from the additional knowledge base.
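As a hedged illustration of this supplementing step (the label names and sets below are invented for the example, not taken from the patent), one might check which of the model's class labels are missing from a single external knowledge base and therefore need to be added manually from an additional knowledge base:

```python
# Hypothetical sketch: verify that the external knowledge base covers every
# class label the image classification network model can distinguish, and
# report labels that must be supplemented from an additional knowledge base.

knowledge_base_labels = {"entity", "animal", "dog", "cat"}  # made-up example
model_labels = {"dog", "cat", "car"}                        # made-up example

# Labels the model uses but the knowledge base lacks.
missing = sorted(model_labels - knowledge_base_labels)
```

Here `missing` would list `"car"`, signaling that its category label must be extracted manually from another knowledge base.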
  • the number of training images required in this embodiment should be as large as possible.
  • the number of training images is at least 1000.
  • the image classification apparatus of this embodiment should unify the pixel size of the training images, for example, uniformly scaling them to 256×256, so that training images of the same pixel size can conveniently be used to train the image classification network model.
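The scaling step above can be sketched as follows. This is a minimal nearest-neighbor resize over a plain list-of-lists "image", purely illustrative: the patent does not prescribe a resampling method, and a real pipeline would use an image library.

```python
# Minimal sketch (assumption: nearest-neighbor sampling is acceptable):
# unify training images to a common pixel size, e.g. 256x256, before training.

def resize_nearest(image, out_h=256, out_w=256):
    """Resize a 2-D grid of pixel values with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Example: scale a tiny 2x2 "image" up to 4x4; each pixel is duplicated.
tiny = [[0, 1],
        [2, 3]]
scaled = resize_nearest(tiny, out_h=4, out_w=4)
```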
  • the external knowledge base needs to be encoded to obtain a category distance matrix.
  • the category distance matrix includes the depth distance between any two category labels in the external knowledge base, that is, the semantic distance.
  • this embodiment may adopt the embodiment in FIG. 3 to implement S102, which specifically includes S201 to S203:
  • the image classification apparatus of this embodiment needs to know in advance the class distance matrix that includes the depth distance between the real class label and the predicted class label. Specifically, to obtain the category distance matrix, the image classification device first needs to acquire any two category labels in the external knowledge base.
  • the image classification apparatus obtains, in the external knowledge base, the common class label between the two class labels, that is, their nearest common ancestor.
  • the nearest common ancestor of two category labels is the node that is an ancestor of both category labels and whose depth is as large as possible.
  • S203 Calculate the depth distance of any two class labels based on the common class label, so as to obtain a class distance matrix including the depth distance of any two class labels.
  • the image classification apparatus of this embodiment uses the common class label to calculate the depth distance of any two class labels, so as to obtain a class distance matrix including the depth distance of any two class labels.
  • Specifically, the image classification device obtains the depth of the common class label, the depth of one class label, and the depth of the other class label; it then calculates the sum of the two label depths and uses the ratio of the depth of the common class label to that sum to calculate the depth distance between the two category labels.
  • the depth distance satisfies the following formulas, where Wup denotes the Wu-Palmer semantic similarity:
  • Wup(c_1, c_2) = 2·depth(lcs(c_1, c_2)) / (depth(c_1) + depth(c_2))
  • d(c_1, c_2) = 1 − Wup(c_1, c_2)
  • where c_1 and c_2 are two category labels in the external knowledge base, depth(c_1) and depth(c_2) are the depths of the category labels c_1 and c_2, lcs(c_1, c_2) is the common class label (nearest common ancestor) of c_1 and c_2, depth(lcs(c_1, c_2)) is the depth of the common class label, and d(c_1, c_2) is the depth distance between the category labels c_1 and c_2.
  • the image classification apparatus of this embodiment locates the label position of the common class label in the external knowledge base to obtain its number of layers, and uses that label position to determine the depth of the common class label.
  • the depth acquisition method of the category label c 1 and the category label c 2 refers to the depth acquisition method of the common category label, which will not be repeated here.
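The lowest-common-ancestor and depth-distance computations described above can be sketched on a tiny, made-up taxonomy. The Wu-Palmer form with the factor 2 is one standard reading of the similarity named in this section; the taxonomy itself is invented for illustration, since the patent does not specify one.

```python
# Hedged sketch of S201-S203: encode an illustrative label taxonomy as
# parent pointers, locate the nearest common ancestor of two labels, and
# derive the depth distance d(c1, c2) = 1 - Wup(c1, c2), with
# Wup(c1, c2) = 2*depth(lcs(c1, c2)) / (depth(c1) + depth(c2)).

PARENT = {              # child -> parent; "entity" is the root
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def depth(label):
    """Number of layers from the root to the label (root has depth 0)."""
    d = 0
    while label in PARENT:
        label, d = PARENT[label], d + 1
    return d

def ancestors(label):
    """The label itself followed by its ancestors up to the root."""
    chain = [label]
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain

def lcs(c1, c2):
    """Nearest (deepest) common ancestor of two labels."""
    a1 = set(ancestors(c1))
    # Walking upward from c2, the first label also above c1 is the deepest.
    return next(a for a in ancestors(c2) if a in a1)

def wup_distance(c1, c2):
    # Note: undefined for the root paired with itself (zero total depth).
    return 1.0 - 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))
```

With this toy tree, `wup_distance("dog", "cat")` is 0.5 (siblings under "animal"), while `wup_distance("dog", "car")` is 1.0 (their only common ancestor is the root). The full class distance matrix is then just this function evaluated over all label pairs.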
  • S103 Input the training image and its true category label and category distance matrix into an image classification network model to obtain the predicted category probability distribution of the training image.
  • the image classification apparatus of this embodiment inputs the training image, its real category label and category distance matrix into the image classification network model, and obtains the predicted category probability distribution of the training image.
  • the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label.
  • S104 Calculate the target loss function by using the depth distance between the true category label and the predicted category label in the category distance matrix and the predicted category probability distribution.
  • the loss function used in the existing image classification network model training method takes a value only with respect to the real class label: its terms for class labels other than the real class label are 0. Therefore, the existing loss function ignores the impact on the training of the image classification network model when the predicted class label of the training image is inconsistent with the real class label, resulting in a predicted class probability distribution output by the image classification network model that is inconsistent with common sense.
  • the image classification network model training method of the present embodiment expands the loss function, and takes into account the influence of the inconsistency between the predicted category label of the training image and the real category label on the image classification network model.
  • the image classification apparatus of this embodiment uses the depth distance between the real class label and the predicted class label in the class distance matrix and the predicted class probability distribution to calculate the target loss function.
  • this embodiment may adopt the embodiment of FIG. 4 to implement S104, which specifically includes S301 to S304:
  • the target loss function in the image classification network model in this embodiment includes a first loss function and a second loss function.
  • the first loss function and the second loss function respectively represent different aspects of the network model.
  • the first loss function represents the loss between the predicted class probability distribution output by the image classification network model and the preset class probability distribution when the predicted class of the training image is consistent with the real class.
  • the second loss function represents, when the predicted category of the training image is inconsistent with the real category, the loss between the predicted category probability distribution output by the image classification network model and the depth distance, that is, the semantic distance, between the predicted category and the real category of the training image.
  • S302 Calculate a first loss function by using the predicted category probability distribution and the depth distance.
  • the image classification apparatus calculates the first loss function using the predicted class probability distribution. The first loss function may be a cross-entropy loss function, satisfying the following formula:
  • L_CE(k) = −Σ_l I(k, l)·log p(k, l)
  • where p(k, l) is the predicted class probability output by the image classification network model, and I(k, l) is an indicator function: I(k, l) is 1 when the class label l is consistent with the real class label of image k, and 0 when they are inconsistent.
  • S303 Calculate the second loss function by using the predicted class probability distribution and the true class label.
  • the image classification apparatus of this embodiment extends the first loss function to constrain the prediction results of other category labels except the true category label. Specifically, the image classification apparatus calculates the second loss function using the predicted class probability distribution and the true class label.
  • the second loss function satisfies the following formula:
  • L_Sem(k) = Σ_l d(t_k, l)·p(k, l)
  • where L_Sem(k) is the second loss function, t_k is the true class label corresponding to the training image, d(t_k, l) is the depth distance between the class label l and the true class label t_k taken from the class distance matrix, and p(k, l) is the predicted class probability output by the image classification network model.
  • S304 Calculate the target loss function based on the first loss function and the second loss function.
  • the image classification apparatus uses the first loss function and the second loss function to calculate the target loss function.
  • the objective loss function satisfies the following formula:
  • L(k) = L_CE(k) + λ·L_Sem(k)
  • where L(k) is the target loss function and λ is the weight coefficient, which is used to balance the first loss function and the second loss function to optimize the training of the image classification network model.
  • the image classification apparatus may use a grid search method to determine the weight coefficient λ.
  • the image classification apparatus of this embodiment trains the image classification network model with the objective loss function. Specifically, the image classification apparatus of this embodiment can use gradient descent to minimize the target loss function.
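Assuming the target loss takes the combined form described above (a cross-entropy term plus a distance-weighted semantic term balanced by a coefficient; the exact formula is a reconstruction, not a quotation from the patent), a minimal per-image sketch is:

```python
import math

# Illustrative sketch of S301-S304 (assumed form): the target loss combines
# cross-entropy on the true class with a semantic term that weights every
# wrong-class probability by its depth distance to the true label.

def target_loss(p, t, dist, lam=0.5):
    """p: predicted class probabilities p(k, l) for one image k,
    t: index of the true class label t_k,
    dist: dist[t][l] = depth distance d(t_k, l) from the class distance matrix,
    lam: weight coefficient balancing the two terms (e.g. found by grid search)."""
    l_ce = -math.log(p[t])  # first loss: cross-entropy on the true class
    # second loss: semantically distant wrong classes are penalized more
    l_sem = sum(dist[t][l] * p[l] for l in range(len(p)) if l != t)
    return l_ce + lam * l_sem
```

Note how the semantic term distinguishes between wrong predictions: moving probability mass from a semantically distant class to a closer one lowers the loss, which the plain cross-entropy term alone cannot express.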
  • In the above solution, the image classification device refers to an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the prediction results. Calculating the target loss function using the predicted category probability distribution and the depth distance extends the existing loss function, avoiding the situation in which the existing loss function ignores the inconsistency between the predicted category label of the training image and the real category label and thereby causes the predicted category probability distribution output by the image classification network model to be inconsistent with common sense.
  • FIG. 5 is a schematic flowchart of an embodiment of an image classification method provided by the present application.
  • the image classification method in this embodiment can be applied to the image classification network model trained in the above-mentioned training method of the image classification network model, so as to improve the accuracy of image classification and the interpretability of prediction results.
  • Taking a server executing the image classification method as an example, the image classification method provided by the present application is introduced below.
  • the image classification method in this embodiment specifically includes the following steps:
  • S401 Acquire an image to be classified.
  • the acquisition of the image to be classified in this embodiment is similar to the acquisition of the training image in the above-mentioned embodiment S101, and details are not repeated here.
  • S402 Input the image to be classified into the image classification network model to obtain the class label of the image to be classified.
  • the image classification apparatus of this embodiment inputs the image to be classified into the image classification network model, and obtains the class label of the image to be classified.
  • S403 Evaluate the category labels of the images to be classified to obtain an interpretability score.
  • In order to improve the accuracy of image classification and enhance the interpretability of the prediction results, this embodiment evaluates the class labels of the images to be classified output by the image classification network model to obtain an interpretability score.
  • Specifically, the image to be classified is input into the image classification network model to obtain class label ranking values and a class probability distribution, and the interpretability score is calculated using the class label ranking values, which include a first class label ranking value and a second class label ranking value.
  • the first category label ranking value is the difference probability ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category probability distribution.
  • the second category label ranking value is the depth distance ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category distance matrix.
  • r_{k,l} is the ranking value, in the category probability distribution, of the probability that the image k to be classified belongs to the category label l.
  • s_{k,l} is the ranking value, in the category distance matrix, of the depth distance between the category label l and the real category label t_k of the image to be classified; that is, s_{k,l} is the corresponding sort value.
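A hedged sketch of this evaluation step: the two ranking values are computed as described, and since the exact score formula is not reproduced here, Spearman rank correlation between the two rankings is used as one plausible agreement measure (an assumption, not the patent's definition).

```python
# Assumed sketch of S403: r[k][l] ranks the predicted probability p(k, l),
# s[k][l] ranks the depth distance d(t_k, l); the interpretability score is
# taken here as the Spearman rank correlation between the two rankings,
# high when more-probable labels are also semantically closer to the truth.

def ranks(values):
    """Rank positions (0 = smallest) of each entry in `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def interpretability_score(probs, dists):
    """probs: predicted probabilities p(k, l) over all class labels l,
    dists: depth distances d(t_k, l) to the real class label t_k."""
    n = len(probs)
    r = ranks([-p for p in probs])  # most probable label gets rank 0
    s = ranks(dists)                # semantically closest label gets rank 0
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))  # Spearman's rho, in [-1, 1]
```

A score of 1 means the model's probability ordering exactly mirrors the semantic ordering from the knowledge base; a score near -1 means the model assigns its highest probabilities to the semantically most distant labels.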
  • In the above solution, the image to be classified is acquired and input into the image classification network model to obtain its class label, and the class label of the image to be classified is evaluated to obtain an interpretability score, thereby improving the accuracy of image classification and enhancing the interpretability of the prediction results.
  • FIG. 6 is a schematic structural diagram of an embodiment of the terminal device provided by the present application.
  • the terminal device 600 includes a memory 61 and a processor 62, wherein the memory 61 and the processor 62 are coupled.
  • the memory 61 is used for storing program data
  • the processor 62 is used for executing the program data to implement the image classification network model training method and/or the image classification method in the above-mentioned embodiments.
  • the processor 62 may also be referred to as a CPU (Central Processing Unit).
  • the processor 62 may be an integrated circuit chip with signal processing capability.
  • the processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor or the like.
  • the present application also provides a computer storage medium 700.
  • the computer storage medium 700 is used to store program data 71; when the program data 71 is executed by a processor, it implements the image classification network model training method and/or the image classification method described in the embodiments of the present application.
  • When the methods involved in the embodiments of the image classification network model training method and/or the image classification method of the present application are implemented in the form of software functional units and sold or used as independent products, they can be stored in a device, for example a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an image classification network model training method, an image classification method, and a related device. The image classification network model training method comprises: acquiring a training image and an external knowledge base, wherein the external knowledge base comprises a true class label of the training image; encoding the external knowledge base to obtain a class distance matrix; inputting the training image and the true class label thereof and the class distance matrix into an image classification network model to obtain predicted class probability distribution of the training image, wherein the predicted class probability distribution comprises a difference probability between a predicted class label output by the image classification network model and the true class label; calculating a target loss function by using a depth distance, in the class distance matrix, between the true class label and the predicted class label and the predicted class probability distribution; and training the network model on the basis of the target loss function. The present application is used for obtaining an image classification network model for both improving image classification accuracy and enhancing prediction result interpretability.

Description

图像分类网络模型训练方法、图像分类方法及相关设备Image classification network model training method, image classification method and related equipment 【技术领域】【Technical field】
本申请涉及图像处理技术领域,特别是涉及一种图像分类网络模型的训练方法、图像分类方法及相关设备。The present application relates to the technical field of image processing, and in particular, to a training method of an image classification network model, an image classification method and related equipment.
【背景技术】【Background technique】
图像分类是图像处理技术领域最为基础的一类问题。现有技术中主要采用深度神经网络的图像分类方法,具体为将待分类图像及待分类图像的类别标签输入深度神经网络模型,以对深度神经网络模型进行训练。但上述方式所得深度神经网络模型输出的待分类图像的预测分类标签存在错误的可能性,且存在不可解释性。Image classification is one of the most basic problems in the field of image processing technology. In the prior art, a deep neural network image classification method is mainly used, and specifically, the image to be classified and the category label of the to-be-classified image are input into the deep neural network model to train the deep neural network model. However, the predicted classification label of the image to be classified outputted by the deep neural network model obtained in the above manner may be wrong and unexplainable.
【发明内容】[Content of the invention]
本申请提供了一种图像分类网络模型的训练方法、图像分类方法及相关设备。The present application provides a training method for an image classification network model, an image classification method and related equipment.
为解决上述技术问题,本申请提供了一种图像分类网络模型的训练方法,所述方法包括:In order to solve the above technical problems, the present application provides a training method of an image classification network model, the method comprising:
获取训练图像和外部知识库,所述外部知识库包括所述训练图像的真实类别标签;acquiring a training image and an external knowledge base, the external knowledge base including the true category label of the training image;
对所述外部知识库进行编码处理,得到类别距离矩阵;Encoding the external knowledge base to obtain a category distance matrix;
将所述训练图像及其真实类别标签和所述类别距离矩阵输入所述图像分类网络模型,得到所述训练图像的预测类别概率分布,其中,所述预测类别概率分布包括所述图像分类网络模型输出的预测类别标签与所述真实类别标签之间的差距概率;Input the training image and its true category label and the category distance matrix into the image classification network model to obtain the predicted category probability distribution of the training image, wherein the predicted category probability distribution includes the image classification network model The gap probability between the output predicted class label and the true class label;
利用所述类别距离矩阵中所述真实类别标签与所述预测类别标签之间的深度距离以及所述预测类别概率分布计算目标损失函数;Calculate the target loss function by using the depth distance between the true class label and the predicted class label in the class distance matrix and the predicted class probability distribution;
基于所述目标损失函数训练所述图像分类网络模型。The image classification network model is trained based on the target loss function.
为解决上述技术问题,本申请提供了一种图像分类方法,所述图像分类方法包括:In order to solve the above technical problems, the present application provides an image classification method, the image classification method includes:
获取待分类图像;Get images to be classified;
将所述待分类图像输入到图像分类网络模型,得到所述待分类图像的类别标签,其中,所述图像分类网络模型为利用上述方法训练所得的图像分类网络模型;Inputting the to-be-classified image into an image classification network model to obtain a class label of the to-be-classified image, wherein the image classification network model is an image classification network model trained by the above method;
对所述待分类图像的类别标签进行评价,得到可解释性评分。The class labels of the images to be classified are evaluated to obtain an interpretability score.
为解决上述技术问题,本申请提供了一种终端设备,所述设备包括存储器 以及与所述存储器耦接的处理器;In order to solve the above-mentioned technical problems, the present application provides a terminal device, the device includes a memory and a processor coupled to the memory;
所述存储器用于存储程序数据,所述处理器用于执行所述程序数据以实现如上述的图像分类网络模型的训练方法和/或上述的图像分类方法。The memory is used to store program data, and the processor is used to execute the program data to implement the above-mentioned training method for an image classification network model and/or the above-mentioned image classification method.
为解决上述技术问题,本申请还提供了一种计算机存储介质,所述计算机存储介质用于存储程序数据,所述程序数据在被处理器执行时,用以实现如上述的图像分类网络模型的训练方法和/或上述的图像分类方法。In order to solve the above technical problems, the present application also provides a computer storage medium, which is used for storing program data. The training method and/or the image classification method described above.
本申请的有益效果是:获取训练图像和外部知识库,外部知识库包括训练图像的真实类别标签;对外部知识库进行编码处理,得到类别距离矩阵;将训练图像及其真实类别标签和类别距离矩阵输入图像分类网络模型,得到训练图像的预测类别概率分布,其中,预测类别概率分布包括图像分类网络模型输出的预测类别标签与真实类别标签之间的差距概率;利用类别距离矩阵中真实类别标签与预测类别标签之间的深度距离以及预测类别概率分布计算目标损失函数;基于目标损失函数训练网络模型。本申请引用外部知识库对图像分类网络模型输出的预测类别概率分布进行约束,兼顾提高了图像分类的准确性及增强了预测结果的可解释性。The beneficial effects of the present application are: acquiring training images and an external knowledge base, where the external knowledge base includes the real category labels of the training images; encoding the external knowledge base to obtain a category distance matrix; combining the training images and their real category labels and category distances The matrix is input to the image classification network model, and the predicted class probability distribution of the training image is obtained, wherein the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label; the real class label in the class distance matrix is used. Calculate the target loss function based on the depth distance from the predicted class label and the predicted class probability distribution; train the network model based on the target loss function. This application cites an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the predicted results.
【附图说明】【Description of drawings】
为了更清楚地说明本发明实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。其中：In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort. Wherein:
图1是本申请提供的图像分类网络模型的训练方法一实施例的流程示意图;1 is a schematic flowchart of an embodiment of a training method for an image classification network model provided by the present application;
图2是本申请提供的图像分类网络模型的训练方法中外部知识库的简易示意图;2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the application;
图3是图1所示的图像分类网络模型的训练方法中S102一实施例的流程示意图;3 is a schematic flowchart of an embodiment of S102 in the training method of the image classification network model shown in FIG. 1;
图4是图1所示的图像分类网络模型的训练方法中S104一实施例的流程示意图;4 is a schematic flowchart of an embodiment of S104 in the training method of the image classification network model shown in FIG. 1;
图5是本申请提供的图像分类方法的一实施例的流程示意图;5 is a schematic flowchart of an embodiment of an image classification method provided by the present application;
图6是本申请提供的终端设备一实施例的结构示意图;6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application;
图7是本申请提供的计算机存储介质一实施例的结构示意图。FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
【具体实施方式】【Detailed ways】
下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅是本申请的一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without making creative efforts shall fall within the scope of protection of this application.
本申请提出了一种图像分类网络模型的训练方法,具体请参阅图1,图1是本申请提供的图像分类网络模型的训练方法一实施例的流程示意图。本实施例中图像分类网络模型的训练方法可以应用于图像分类装置,本申请的图像分类装置可以为服务器,也可以为移动设备,还可以为由服务器和移动设备相互配合的系统。相应地,移动设备包括的各个部分,例如各个单元、子单元、模块、子模块可以全部设置于服务器中,也可以全部设置于移动设备中,还可以分别设置于服务器和移动设备中。The present application proposes a training method for an image classification network model. Please refer to FIG. 1 for details. FIG. 1 is a schematic flowchart of an embodiment of the training method for an image classification network model provided by the present application. The training method of the image classification network model in this embodiment can be applied to an image classification apparatus. The image classification apparatus of the present application may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other. Correspondingly, each part included in the mobile device, such as each unit, subunit, module, and submodule, may all be provided in the server, or in the mobile device, or in the server and the mobile device, respectively.
进一步地,上述服务器可以是硬件,也可以是软件。当服务器为硬件时,可以实现成多个服务器组成的分布式服务器集群,也可以实现成单个服务器。当服务器为软件时,可以实现成多个软件或软件模块,例如用来提供分布式服务器的软件或软件模块,也可以实现成单个软件或软件模块,在此不做具体限定。Further, the above server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server. When the server is software, it can be implemented as multiple software or software modules, such as software or software modules for providing distributed servers, or can be implemented as a single software or software module, which is not specifically limited here.
本实施例的图像分类网络模型的训练方法具体包括以下步骤:The training method of the image classification network model of this embodiment specifically includes the following steps:
S101:获取训练图像和外部知识库,外部知识库包括训练图像的真实类别标签。S101: Acquire training images and an external knowledge base, where the external knowledge base includes ground-truth class labels of the training images.
本公开实施例中，考虑到现有技术中单纯利用训练图像和训练图像的真实类别标签对图像分类网络模型进行训练，所得的图像分类网络模型输出的预测分类标签存在错误的可能性，并且存在不可解释性。为避免上述问题，本申请的图像分类装置引用外部知识库对图像分类网络模型的预测结果进行约束。In the embodiments of the present disclosure, considering that in the prior art the image classification network model is trained simply using training images and their real category labels, the predicted classification labels output by the resulting image classification network model may be erroneous and lack interpretability. To avoid these problems, the image classification apparatus of the present application introduces an external knowledge base to constrain the prediction results of the image classification network model.
可参阅图2,图2是本申请提供的图像分类网络模型的训练方法中外部知识库的简易示意图。由图可知,外部知识库为由多个类别标签构成的树状结构,树状结构中的每个节点表示类别标签,且树状结构中位置越接近的节点之间的类别标签越相似。为了利用外部知识库对图像分类网络模型输出的预测类别概率分布进行约束,本实施例的外部知识库应包括图像分类网络模型能够区分的所有类别标签。进一步地,在对图像分类网络模型进行训练的过程中,外部知识库至少包括训练图像的真实类别标签。Please refer to FIG. 2 , which is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the present application. It can be seen from the figure that the external knowledge base is a tree-like structure composed of multiple category labels, each node in the tree-like structure represents a category label, and the closer the position in the tree-like structure is, the more similar the category labels are between nodes. In order to use the external knowledge base to constrain the predicted class probability distribution output by the image classification network model, the external knowledge base in this embodiment should include all the class labels that the image classification network model can distinguish. Further, in the process of training the image classification network model, the external knowledge base at least includes the real category labels of the training images.
进一步地,考虑到训练图像的多类性,单一的外部知识库可能无法包括所有训练图像的真实类别标签。为此,本实施例的图像分类网络模型的训练方法可通过人工从额外的知识库中提取类别标签对单一外部知识库中的缺失类别标签进行补充。Further, considering the multi-class nature of training images, a single external knowledge base may not be able to include ground-truth class labels for all training images. To this end, the training method of the image classification network model of this embodiment can supplement the missing category labels in the single external knowledge base by manually extracting the category labels from the additional knowledge base.
考虑到训练图像的数量对图像分类网络模型输出的预测结果的影响,本实施例所需训练图像的数量应尽可能地多。在具体实施例中,训练图像的数量至少为1000张。Considering the influence of the number of training images on the prediction result output by the image classification network model, the number of training images required in this embodiment should be as large as possible. In a specific embodiment, the number of training images is at least 1000.
需要说明的是，本实施例的图像分类装置在利用训练图像对图像分类网络模型进行训练前，应统一训练图像的像素大小，例如，统一缩放为256x256，方便利用相同像素大小的训练图像对图像分类网络模型进行训练。It should be noted that, before using the training images to train the image classification network model, the image classification apparatus of this embodiment should unify the pixel size of the training images, for example, uniformly scale them to 256x256, so that training images of the same pixel size can be conveniently used to train the image classification network model.
S102:对外部知识库进行编码处理,得到类别距离矩阵。S102: Encode the external knowledge base to obtain a category distance matrix.
可继续参阅图2，为了利用外部知识库中类别标签之间的深度距离对网络模型输出的预测类别概率分布进行约束，本实施例需对外部知识库进行编码处理，得到类别距离矩阵。其中，类别距离矩阵包括外部知识库中任意两类别标签之间的深度距离，也即语义距离。Continuing to refer to FIG. 2, in order to constrain the predicted category probability distribution output by the network model using the depth distance between category labels in the external knowledge base, this embodiment encodes the external knowledge base to obtain a category distance matrix. The category distance matrix includes the depth distance, that is, the semantic distance, between any two category labels in the external knowledge base.
可选地,本实施例可采用图3实施例实现S102,具体包括S201至S203:Optionally, this embodiment may adopt the embodiment in FIG. 3 to implement S102, which specifically includes S201 to S203:
S201:获取外部知识库中任意两个类别标签。S201: Obtain any two category labels in the external knowledge base.
为了方便从类别距离矩阵中获取真实类别标签和预测类别标签之间的深度距离,本实施例的图像分类装置可以预先获知包括真实类别标签和预测类别标签之间的深度距离的类别距离矩阵。具体地,对于类别距离矩阵的获取,图像分类装置首先需获取外部知识库中任意两个类别标签。In order to conveniently obtain the depth distance between the real class label and the predicted class label from the class distance matrix, the image classification apparatus of this embodiment may know in advance a class distance matrix including the depth distance between the real class label and the predicted class label. Specifically, for the acquisition of the category distance matrix, the image classification device first needs to acquire any two category labels in the external knowledge base.
S202:获取任意两个类别标签之间的公共类别标签。S202: Obtain a common class label between any two class labels.
进一步地，图像分类装置在外部知识库中获取任意两个类别标签之间的公共类别标签，也即最近公共祖先。其中，最近公共祖先为任意两个类别标签中一类别标签和另一类别标签的公共祖先，且该祖先深度尽可能大。Further, the image classification apparatus obtains, in the external knowledge base, the common category label between any two category labels, that is, their lowest common ancestor. The lowest common ancestor is a node that is an ancestor of both of the two category labels and whose depth is as large as possible.
S203:基于公共类别标签计算任意两个类别标签的深度距离,以得到包括任意两个类别标签的深度距离的类别距离矩阵。S203: Calculate the depth distance of any two class labels based on the common class label, so as to obtain a class distance matrix including the depth distance of any two class labels.
其中,本实施例的图像分类装置利用公共类别标签计算任意两个类别标签的深度距离,以得到包括任意两个类别标签的深度距离的类别距离矩阵。Wherein, the image classification apparatus of this embodiment uses the common class label to calculate the depth distance of any two class labels, so as to obtain a class distance matrix including the depth distance of any two class labels.
具体地，图像分类装置分别获取公共类别标签、两个类别标签中一类别标签的深度和另一类别标签的深度；计算一类别标签的深度与另一类别标签的深度之和；利用公共类别标签的深度与上述一类别标签的深度与另一类别标签的深度之和的比值计算这任意两个类别标签的深度距离。Specifically, the image classification apparatus obtains the depths of the common category label, of one of the two category labels, and of the other category label; calculates the sum of the depth of the one category label and the depth of the other category label; and calculates the depth distance between the two category labels using the ratio of the depth of the common category label to this sum.
其中,采用Wup(Wu-Palmer)语义相似度计算外部知识库中任意两个类别标签之间的深度距离。深度距离的具体计算公式如下:Among them, Wup (Wu-Palmer) semantic similarity is used to calculate the depth distance between any two category labels in the external knowledge base. The specific calculation formula of the depth distance is as follows:
d(c₁, c₂) = 2·depth(lcs(c₁, c₂)) / (depth(c₁) + depth(c₂))
其中，c₁及c₂为外部知识库中的两个类别标签，depth(c₁)为类别标签c₁的深度，depth(c₂)为类别标签c₂的深度，lcs(c₁, c₂)为类别标签c₁和类别标签c₂的公共类别标签，depth(lcs(c₁, c₂))为公共类别标签的深度，d(c₁, c₂)为类别标签c₁和类别标签c₂之间的深度距离。Wherein, c₁ and c₂ are two category labels in the external knowledge base, depth(c₁) is the depth of category label c₁, depth(c₂) is the depth of category label c₂, lcs(c₁, c₂) is the common category label of category labels c₁ and c₂, depth(lcs(c₁, c₂)) is the depth of the common category label, and d(c₁, c₂) is the depth distance between category labels c₁ and c₂.
进一步地，本实施例的图像分类装置通过定位公共类别标签在外部知识库中的标签位置，利用该标签位置获取公共类别标签的层数，并以此确定公共类别标签的深度。在具体实施例中，类别标签c₁和类别标签c₂的深度获取方式参照公共类别标签的深度获取方式，在此不进行重复赘述。Further, the image classification apparatus of this embodiment locates the label position of the common category label in the external knowledge base, uses that label position to obtain the layer number of the common category label, and thereby determines the depth of the common category label. In a specific embodiment, the depths of category labels c₁ and c₂ are obtained in the same way as the depth of the common category label, which will not be repeated here.
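As a concrete illustration of S201 to S203, the following Python sketch computes the Wu-Palmer depth ratio over a toy tree-structured knowledge base. The tree, the node names, and the helper functions are hypothetical examples for illustration only and are not part of the original disclosure; note that the Wup ratio grows as two labels become more similar, and whether it is used directly as the "depth distance" or inverted first is an assumption here.

```python
def build_depths(edges, root):
    """edges: dict child -> parent. Returns a depth map (root depth = 1)."""
    depth = {root: 1}
    def get_depth(node):
        if node not in depth:
            depth[node] = get_depth(edges[node]) + 1
        return depth[node]
    for child in edges:
        get_depth(child)
    return depth

def ancestors(node, edges, root):
    """Path from the node up to the root, including the node itself."""
    chain = [node]
    while node != root:
        node = edges[node]
        chain.append(node)
    return chain

def lcs(c1, c2, edges, root):
    """Lowest common ancestor: deepest node shared by both root paths."""
    path1 = ancestors(c1, edges, root)          # ordered from c1 up to root
    path2 = set(ancestors(c2, edges, root))
    for node in path1:
        if node in path2:
            return node
    return root

def wup_distance(c1, c2, edges, root):
    """2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))."""
    depth = build_depths(edges, root)
    common = lcs(c1, c2, edges, root)
    return 2.0 * depth[common] / (depth[c1] + depth[c2])

# Toy knowledge base (hypothetical): entity -> {animal, vehicle},
# animal -> {dog, cat}, vehicle -> {car}
edges = {"dog": "animal", "cat": "animal", "car": "vehicle",
         "animal": "entity", "vehicle": "entity"}
```

With this toy tree, `wup_distance("dog", "cat", edges, "entity")` is 2·2/(3+3) ≈ 0.667, while the less related pair `("dog", "car")` yields 2·1/(3+3) ≈ 0.333, matching the intuition that nodes closer in the tree carry more similar labels. Computing this value for every pair of labels fills in the category distance matrix.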
S103:将训练图像及其真实类别标签和类别距离矩阵输入图像分类网络模型,得到训练图像的预测类别概率分布。S103: Input the training image and its true category label and category distance matrix into an image classification network model to obtain the predicted category probability distribution of the training image.
本实施例的图像分类装置将训练图像及其真实类别标签和类别距离矩阵输入图像分类网络模型,得到训练图像的预测类别概率分布。其中,预测类别概率分布包括图像分类网络模型输出的预测类别标签与真实类别标签之间的差距概率。The image classification apparatus of this embodiment inputs the training image, its real category label and category distance matrix into the image classification network model, and obtains the predicted category probability distribution of the training image. Among them, the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label.
S104:利用类别距离矩阵中真实类别标签与预测类别标签之间的深度距离以及预测类别概率分布计算目标损失函数。S104: Calculate the target loss function by using the depth distance between the true category label and the predicted category label in the category distance matrix and the predicted category probability distribution.
由于现有图像分类网络模型训练方法中所用损失函数在训练图像的预测类别标签与真实类别标签一致时，存在损失函数值。在训练图像的预测类别标签与真实类别标签不一致时，损失函数值为0。所以，现有损失函数忽略了训练图像的预测类别标签与真实类别标签不一致的情况下对图像分类网络模型训练的影响，导致图像分类网络模型输出的预测类别概率分布与常识不符。为解决上述问题，本实施例的图像分类网络模型训练方法通过拓展损失函数，兼顾考虑训练图像的预测类别标签与真实类别标签不一致情况对图像分类网络模型的影响。具体地，本实施例的图像分类装置利用类别距离矩阵中真实类别标签与预测类别标签之间的深度距离以及预测类别概率分布计算目标损失函数。In the loss function used by existing training methods for image classification network models, a loss value exists only when the predicted category label of a training image is consistent with the real category label; when the predicted category label is inconsistent with the real category label, the loss value is 0. The existing loss function therefore ignores the influence of such inconsistent cases on the training of the image classification network model, causing the predicted category probability distribution output by the model to be inconsistent with common sense. To solve this problem, the training method of this embodiment extends the loss function so as to also take into account the influence of cases where the predicted category label of the training image is inconsistent with the real category label. Specifically, the image classification apparatus of this embodiment calculates the target loss function using the depth distance between the real category label and the predicted category label in the category distance matrix together with the predicted category probability distribution.
可选地,本实施例可采用图4实施例实现S104,具体包括S301至S304:Optionally, this embodiment may adopt the embodiment of FIG. 4 to implement S104, which specifically includes S301 to S304:
S301:获取类别距离矩阵中真实类别标签与预测类别标签之间的深度距离。S301: Obtain the depth distance between the true category label and the predicted category label in the category distance matrix.
由于本实施例拓展了现有图像分类网络模型训练方法中的损失函数，因此，本实施例图像分类网络模型中的目标损失函数包括第一损失函数和第二损失函数。第一损失函数和第二损失函数分别表征网络模型的不同方面特征。具体地，第一损失函数表征在训练图像的预测类别与真实类别之间一致性时，图像分类网络模型输出的预测类别概率分布与预设类别概率分布之间的损失。第二损失函数表征在训练图像的预测类别与真实类别不一致性时，利用训练图像的预测类别与真实类别之间的深度距离，即语义距离，获取图像分类网络模型输出的预测类别概率分布与深度距离之间的损失。Since this embodiment extends the loss function used in existing training methods for image classification network models, the target loss function of the image classification network model in this embodiment includes a first loss function and a second loss function, which characterize different aspects of the network model. Specifically, the first loss function characterizes, when the predicted category of the training image is consistent with the real category, the loss between the predicted category probability distribution output by the image classification network model and the preset category probability distribution. The second loss function characterizes, when the predicted category of the training image is inconsistent with the real category, the loss between the predicted category probability distribution output by the image classification network model and the depth distance, i.e. the semantic distance, between the predicted category and the real category of the training image.
S302:利用预测类别概率分布与深度距离计算第一损失函数。S302: Calculate a first loss function by using the predicted category probability distribution and the depth distance.
图像分类装置利用预测类别概率分布与深度距离计算第一损失函数。The image classification apparatus calculates a first loss function using the predicted class probability distribution and the depth distance.
L_CE(k) = -∑_l I(k, l)·log p(k, l)
其中，L_CE(k)为第一损失函数，p(k,l)为图像分类网络模型输出的预测类别概率。I(k,l)为指示函数，当l与k的真实类别标签一致时指示函数为1，当l与k的真实类别标签不一致时指示函数为0。Wherein, L_CE(k) is the first loss function, and p(k, l) is the predicted category probability output by the image classification network model. I(k, l) is an indicator function, which is 1 when l is consistent with the real category label of k, and 0 when l is inconsistent with the real category label of k.
需要说明的是,在具体实施例中,第一损失函数可以为交叉熵损失函数。It should be noted that, in a specific embodiment, the first loss function may be a cross-entropy loss function.
S303:利用预测类别概率分布与真实类别标签计算第二损失函数。S303: Calculate the second loss function by using the predicted class probability distribution and the true class label.
基于S302的第一损失函数可知，当l与k的真实类别标签不一致时，指示函数为0，导致第一损失函数为0，图像分类网络模型输出的预测结果忽视了训练图像的真实类别标签与预测类别标签不一致的情况。为解决上述问题，本实施例的图像分类装置拓展第一损失函数，对除真实类别标签的其他类别标签的预测结果进行约束。具体地，图像分类装置利用预测类别概率分布与真实类别标签计算第二损失函数。From the first loss function in S302, it can be seen that when l is inconsistent with the real category label of k, the indicator function is 0 and hence the corresponding loss term is 0, so the prediction result output by the image classification network model ignores cases where the predicted category label is inconsistent with the real category label of the training image. To solve this problem, the image classification apparatus of this embodiment extends the first loss function to constrain the prediction results on category labels other than the real category label. Specifically, the image classification apparatus calculates the second loss function using the predicted category probability distribution and the real category label.
具体地,第二损失函数满足下式:Specifically, the second loss function satisfies the following formula:
L_Sem(k) = ∑_l d(t_k, l)·p(k, l)
其中，L_Sem(k)为第二损失函数，t_k为训练图像对应的真实类别标签，d(t_k, l)为类别距离矩阵中类别标签l与真实类别标签t_k之间的深度距离，p(k,l)为图像分类网络模型输出的预测类别概率。Wherein, L_Sem(k) is the second loss function, t_k is the real category label corresponding to the training image, d(t_k, l) is the depth distance between category label l and the real category label t_k in the category distance matrix, and p(k, l) is the predicted category probability output by the image classification network model.
S304:基于第一损失函数和第二损失函数计算目标损失函数。S304: Calculate the target loss function based on the first loss function and the second loss function.
其中,图像分类装置利用第一损失函数和第二损失函数计算目标损失函数。Wherein, the image classification apparatus uses the first loss function and the second loss function to calculate the target loss function.
具体地,目标损失函数满足下式:Specifically, the objective loss function satisfies the following formula:
L(k) = L_CE(k) + α·L_Sem(k)
其中,L(k)为目标损失函数,α为权重系数,用于平衡第一损失函数和第二损失函数以最优化图像分类网络模型的训练。Among them, L(k) is the target loss function, and α is the weight coefficient, which is used to balance the first loss function and the second loss function to optimize the training of the image classification network model.
在具体实施例中,图像分类装置可采用网格搜索法确定权重系数α。In a specific embodiment, the image classification apparatus may use a grid search method to determine the weight coefficient α.
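The target loss above can be sketched in a few lines of Python for a single training image. Because the original formula appears only as an image placeholder, the exact form of the semantic term is an assumption: it is written here as the distance-weighted sum of the predicted probabilities, matching the description that probability mass on labels far from the true label should be penalised.

```python
import math

def combined_loss(probs, true_idx, dist_row, alpha):
    """Sketch of L(k) = L_CE(k) + alpha * L_Sem(k) for one training image.

    probs:    predicted category probability distribution p(k, l) over labels l
    true_idx: index of the true category label t_k
    dist_row: d(t_k, l) for every label l (one row of the class distance
              matrix; the exact weighting is an assumed form, see lead-in)
    alpha:    weight coefficient balancing the two terms
    """
    # First loss: cross-entropy; the indicator I(k, l) keeps only l == t_k.
    l_ce = -math.log(probs[true_idx])
    # Second loss: probability mass on each label, weighted by the
    # class-distance entry between that label and the true label.
    l_sem = sum(d * p for d, p in zip(dist_row, probs))
    return l_ce + alpha * l_sem
```

For example, with `probs = [0.7, 0.2, 0.1]`, true label index 0, a (hypothetical) distance row `[0.0, 0.5, 1.0]`, and `alpha = 0.5`, the second term adds a penalty of 0.5·(0.5·0.2 + 1.0·0.1) on top of the usual cross-entropy, so misplacing probability on the semantically distant third label costs twice as much as on the second.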
S105:基于目标损失函数训练图像分类网络模型。S105: Train an image classification network model based on the target loss function.
本实施例的图像分类装置以目标损失函数训练图像分类网络模型。具体地,本实施例的图像分类装置可利用梯度下降技术对目标损失函数进行训练。The image classification apparatus of this embodiment trains an image classification network model with an objective loss function. Specifically, the image classification apparatus of this embodiment can use the gradient descent technique to train the target loss function.
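To illustrate S105, the following is a minimal, self-contained sketch of one gradient-descent loop on the logits of a single sample. It uses finite-difference gradients so that no deep-learning library is required; the loss form, learning rate, and all numbers are illustrative assumptions, not the disclosed implementation.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_from_logits(z, true_idx, dist_row, alpha):
    """Combined loss L(k) evaluated on raw logits z (assumed form)."""
    p = softmax(z)
    return -math.log(p[true_idx]) + alpha * sum(d * q for d, q in zip(dist_row, p))

def gd_step(z, true_idx, dist_row, alpha, lr=0.1, eps=1e-5):
    """One gradient-descent update via central finite differences."""
    grad = []
    for i in range(len(z)):
        zp = list(z); zp[i] += eps
        zm = list(z); zm[i] -= eps
        grad.append((loss_from_logits(zp, true_idx, dist_row, alpha)
                     - loss_from_logits(zm, true_idx, dist_row, alpha)) / (2 * eps))
    return [v - lr * g for v, g in zip(z, grad)]
```

Iterating `gd_step` drives the loss down, which is the essence of training the image classification network model with the target loss; in practice the same descent direction would be obtained analytically by backpropagation through the network's parameters rather than by finite differences.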
上述方案中，图像分类装置引用外部知识库对图像分类网络模型输出的预测类别概率分布进行约束，兼顾提高了图像分类的准确性及增强了预测结果的可解释性；利用预测类别概率分布与深度距离计算目标损失函数，扩展了现有损失函数，避免因现有损失函数忽视训练图像的预测类别标签与真实类别标签不一致的情况，导致图像分类网络模型输出的预测类别概率分布与常识不符。In the above solution, the image classification apparatus introduces an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which both improves the accuracy of image classification and enhances the interpretability of the prediction results; calculating the target loss function using the predicted category probability distribution and the depth distance extends the existing loss function, and avoids the situation in which the existing loss function ignores inconsistencies between the predicted category label and the real category label of a training image, causing the predicted category probability distribution output by the image classification network model to contradict common sense.
可参阅图5,图5是本申请提供的图像分类方法的一实施例的流程示意图。本实施例图像分类方法可应用于上述图像分类网络模型的训练方法中训练所得的图像分类网络模型,从而兼顾提高图像分类的准确性及预测结果的可解释性。下面以用于图像分类方法的服务器为例,介绍本申请提供的图像分类方法,本实施例图像分类方法具体包括以下步骤:Please refer to FIG. 5 , which is a schematic flowchart of an embodiment of an image classification method provided by the present application. The image classification method in this embodiment can be applied to the image classification network model trained in the above-mentioned training method of the image classification network model, so as to improve the accuracy of image classification and the interpretability of prediction results. Taking the server used for the image classification method as an example below, the image classification method provided by the present application is introduced. The image classification method in this embodiment specifically includes the following steps:
S401:获取待分类图像。S401: Acquire an image to be classified.
本实施例获取待分类图像与上述实施例S101中训练图像获取相似，在此不再赘述。The acquisition of the image to be classified in this embodiment is similar to the acquisition of the training image in the above-mentioned embodiment S101, and details are not repeated here.
S402:将待分类图像输入到图像分类网络模型,得到待分类图像的类别标签。S402: Input the image to be classified into the image classification network model to obtain the class label of the image to be classified.
其中,本实施例的图像分类装置将待分类图像输入到图像分类网络模型中,得到待分类图像的类别标签。Wherein, the image classification apparatus of this embodiment inputs the image to be classified into the image classification network model, and obtains the class label of the image to be classified.
S403:对待分类图像的类别标签进行评价,得到可解释性评分。S403: Evaluate the category labels of the images to be classified to obtain an interpretability score.
为了兼顾提高图像分类准确性及增强预测结果可解释性,本实施例需对图像分类网络模型输出的待分类图像的类别标签进行评价,得到可解释性评分。具体地,本实施例将待分类图像输入图像分类网络模型,得到类别标签排序值及类别概率分布,利用包括第一类别标签排序值和第二类别标签排序值的类别标签排序值计算可解释性评分。In order to improve the accuracy of image classification and enhance the interpretability of the prediction results, in this embodiment, it is necessary to evaluate the class labels of the images to be classified outputted by the image classification network model to obtain an interpretability score. Specifically, in this embodiment, the images to be classified are input into the image classification network model, the class label ranking value and the class probability distribution are obtained, and the interpretability is calculated by using the class label ranking value including the first class label ranking value and the second class label ranking value. score.
其中,第一类别标签排序值为类别概率分布中待分类图像的类别标签与待分类图像的真实类别标签之间的差距概率排序值。第二类别标签排序值为类别距离矩阵中待分类图像的类别标签与待分类图像的真实类别标签之间的深度距离排序值。Wherein, the first category label ranking value is the difference probability ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category probability distribution. The second category label ranking value is the depth distance ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category distance matrix.
其中,可解释性评价满足下式:Among them, the interpretability evaluation satisfies the following formula:
Score(k) = (2 / (C(C-1)))·∑_{l<l′} 1[(r_{k,l} - r_{k,l′})·(s_{k,l} - s_{k,l′}) > 0]
其中，C为类别标签总数；r_{k,l}为待分类图像k属于类别标签l的类别概率在类别概率分布中的类别概率排序值，r_{k,l′}为待分类图像k属于类别标签l′的类别概率在类别概率分布中的类别概率排序值；s_{k,l}为类别标签l与待分类图像真实类别标签t_k之间的深度距离在类别距离矩阵中的排序值，即s_{k,l}为d(t_k, l)对应的排序值；s_{k,l′}为类别标签l′与待分类图像真实类别标签t_k之间的深度距离在类别距离矩阵中的排序值，即s_{k,l′}为d(t_k, l′)对应的排序值。Wherein, C is the total number of category labels; r_{k,l} is the ranking value, in the category probability distribution, of the category probability that the image k to be classified belongs to category label l, and r_{k,l′} is the corresponding ranking value for category label l′; s_{k,l} is the ranking value, in the category distance matrix, of the depth distance between category label l and the real category label t_k of the image to be classified, i.e. the ranking value corresponding to d(t_k, l); s_{k,l′} is the ranking value, in the category distance matrix, of the depth distance between category label l′ and the real category label t_k, i.e. the ranking value corresponding to d(t_k, l′).
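A minimal sketch of one plausible form of this interpretability evaluation: since the original score formula is rendered only as image placeholders, this assumes a rank-concordance score that compares the probability ranking values r_{k,l} with the distance ranking values s_{k,l} described in the text, counting the fraction of label pairs ranked in the same order by both.

```python
def rank_concordance(prob_ranks, dist_ranks):
    """Fraction of label pairs ranked in the same order by predicted
    probability and by semantic distance to the true label (a
    Kendall-tau-style concordance; an assumed form of the score)."""
    n = len(prob_ranks)
    agree, total = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (prob_ranks[i] - prob_ranks[j]) * (dist_ranks[i] - dist_ranks[j]) > 0:
                agree += 1
    return agree / total
```

Under this assumed form, a score of 1.0 means the model's probability ordering exactly follows the knowledge base's semantic ordering (fully interpretable predictions), while 0.0 means the two orderings are completely reversed.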
本实施例，获取待分类图像，将待分类图像输入到图像分类网络模型，得到待分类图像的类别标签，对待分类图像的类别标签进行评价，得到可解释性评分，实现了兼顾提高图像分类准确性及增强预测结果可解释性。In this embodiment, the image to be classified is acquired, the image to be classified is input into the image classification network model to obtain its category label, and the category label of the image to be classified is evaluated to obtain an interpretability score, thereby both improving the accuracy of image classification and enhancing the interpretability of the prediction results.
为实现上述实施例的图像分类网络模型训练方法和/或图像分类方法,本申请提出了一种终端设备,具体请参阅图6,图6是本申请提供的终端设备一实施例的结构示意图。In order to implement the image classification network model training method and/or the image classification method in the above embodiment, the present application proposes a terminal device. Please refer to FIG. 6 for details. FIG. 6 is a schematic structural diagram of an embodiment of the terminal device provided by the present application.
终端设备600包括存储器61和处理器62,其中,存储器61和处理器62耦接。The terminal device 600 includes a memory 61 and a processor 62, wherein the memory 61 and the processor 62 are coupled.
存储器61用于存储程序数据,处理器62用于执行程序数据以实现上述实施例的图像分类网络模型训练方法和/或图像分类方法。The memory 61 is used for storing program data, and the processor 62 is used for executing the program data to implement the image classification network model training method and/or the image classification method in the above-mentioned embodiments.
在本实施例中，处理器62还可以称为CPU(Central Processing Unit，中央处理单元)。处理器62可能是一种集成电路芯片，具有信号的处理能力。处理器62还可以是通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器62也可以是任何常规的处理器等。In this embodiment, the processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip with signal processing capability. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor or the like.
本申请还提供一种计算机存储介质700，如图7所示，计算机存储介质700用于存储程序数据71，程序数据71在被处理器执行时，用以实现如本申请方法实施例中所述的图像分类网络模型训练方法和/或图像分类方法。The present application further provides a computer storage medium 700. As shown in FIG. 7, the computer storage medium 700 is used to store program data 71, and the program data 71, when executed by a processor, is used to implement the image classification network model training method and/or the image classification method described in the method embodiments of the present application.
本申请图像分类网络模型训练方法和/或图像分类方法实施例中所涉及到的方法,在实现时以软件功能单元的形式存在并作为独立的产品销售或使用时,可以存储在装置中,例如一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本发明各个实施方式所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。The methods involved in the embodiments of the image classification network model training method and/or the image classification method of the present application exist in the form of software functional units when implemented and are sold or used as independent products, and can be stored in the device, for example a computer-readable storage medium. Based on this understanding, the technical solutions of the present application can be embodied in the form of software products in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, and the computer software products are stored in a storage medium , including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .
以上所述仅为本申请的实施方式，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above description is only an embodiment of the present application and is not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (10)

  1. A training method for an image classification network model, characterized in that the method comprises:
    acquiring a training image and an external knowledge base, the external knowledge base comprising a true category label of the training image;
    encoding the external knowledge base to obtain a category distance matrix;
    inputting the training image, its true category label, and the category distance matrix into the image classification network model to obtain a predicted category probability distribution of the training image, wherein the predicted category probability distribution comprises the gap probability between the predicted category label output by the image classification network model and the true category label;
    calculating a target loss function using the depth distance between the true category label and the predicted category label in the category distance matrix, together with the predicted category probability distribution; and
    training the image classification network model based on the target loss function.
  2. The training method according to claim 1, characterized in that the step of encoding the external knowledge base to obtain a category distance matrix comprises:
    acquiring any two category labels in the external knowledge base;
    acquiring the common category label of the two category labels; and
    calculating the depth distance between the two category labels based on the common category label, so as to obtain a category distance matrix comprising the depth distance between the two category labels.
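The pairwise construction of claim 2 can be sketched as follows. This is a hedged illustration, not the patent's implementation: the parent-pointer encoding of the knowledge base (`parents`), the ancestor lookup, and the exact distance formula (a Wu–Palmer-style ratio of the common label's depth to the sum of the two labels' depths, anticipating claim 3) are all assumptions.

```python
def depth(parents, x):
    """Number of layers between a label's node and the root (cf. claim 4)."""
    d = 0
    while parents[x] is not None:
        x, d = parents[x], d + 1
    return d

def common_label(parents, a, b):
    """Deepest label that is an ancestor (or self) of both a and b."""
    seen = set()
    while a is not None:          # collect a's chain up to the root
        seen.add(a)
        a = parents[a]
    while b not in seen:          # climb from b until the chains meet
        b = parents[b]
    return b

def category_distance_matrix(parents, labels):
    """D[i][j]: depth distance between labels i and j; 0 on the diagonal.
    Assumed formula: 1 - 2*depth(common) / (depth(i) + depth(j))."""
    n = len(labels)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                c = common_label(parents, labels[i], labels[j])
                s = depth(parents, labels[i]) + depth(parents, labels[j])
                D[i][j] = 1.0 - 2.0 * depth(parents, c) / s
    return D

# Toy taxonomy: root -> {animal -> {dog, cat}, vehicle}
parents = {"root": None, "animal": "root", "vehicle": "root",
           "dog": "animal", "cat": "animal"}
D = category_distance_matrix(parents, ["dog", "cat", "vehicle"])
# dog and cat share "animal", so they are closer than dog and vehicle,
# which share only the root
```

With this instantiation, semantically related labels (sharing a deep common ancestor) receive a small distance, and labels related only through the root receive the maximum distance.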
  3. The training method according to claim 2, characterized in that the step of calculating the depth distance between the two category labels based on the common category label comprises:
    separately acquiring the depths of the common category label, one of the two category labels, and the other category label;
    calculating the sum of the depth of the one category label and the depth of the other category label; and
    calculating the depth distance between the two category labels using the ratio of the depth of the common category label to the sum.
  4. The training method according to claim 3, characterized in that the external knowledge base is a tree structure, and the step of acquiring the depth of the common category label comprises:
    locating the label position of the common category label in the tree structure;
    acquiring, based on the label position, the number of layers between the node corresponding to the label position and the root node of the tree structure; and
    determining the depth of the common category label using the number of layers.
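The depth lookup of claim 4 — counting the layers between a label's node and the root — can be sketched with a hypothetical parent-pointer encoding of the tree; the patent does not prescribe a particular data structure.

```python
def label_depth(parents, label):
    """Depth of a category label = number of layers between its node and the
    root of the tree-structured knowledge base, found by walking parent
    pointers. `parents` maps each label to its parent; the root maps to
    None. (An assumed encoding, e.g. of a WordNet-like taxonomy.)"""
    layers = 0
    while parents[label] is not None:
        label = parents[label]
        layers += 1
    return layers

# Toy taxonomy: root -> animal -> dog -> husky
parents = {"root": None, "animal": "root", "dog": "animal", "husky": "dog"}
husky_depth = label_depth(parents, "husky")  # 3 layers from the root
```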
  5. The training method according to claim 1, characterized in that the target loss function comprises a first loss function and a second loss function, and the step of calculating the target loss function using the depth distance between the true category label and the predicted category label in the category distance matrix and the predicted category probability distribution comprises:
    acquiring the depth distance between the true category label and the predicted category label from the category distance matrix;
    calculating the first loss function using the predicted category probability distribution and the depth distance;
    calculating the second loss function using the predicted category probability distribution and the true category label; and
    calculating the target loss function based on the first loss function and the second loss function.
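One plausible instantiation of claim 5 is sketched below. The claim only names the ingredients, so the specific forms here are assumptions: the first loss is taken as the expected depth distance under the predicted distribution, the second as standard cross-entropy, and the combination as a weighted sum with an illustrative weight `lam`.

```python
import math

def target_loss(probs, true_idx, dist_row, lam=1.0):
    """Hedged sketch of the target loss of claim 5.
    probs:    predicted category probability distribution
    true_idx: index of the true category label
    dist_row: the true label's row of the category distance matrix
    first loss  = sum_k probs[k] * dist_row[k]   (assumed form)
    second loss = -log probs[true_idx]           (cross-entropy)
    target      = second + lam * first           (lam is illustrative)"""
    first = sum(p * d for p, d in zip(probs, dist_row))
    second = -math.log(probs[true_idx])
    return second + lam * first

# 3-class example: the model puts 70% of its mass on the true class (index 0)
loss = target_loss([0.7, 0.2, 0.1], 0, [0.0, 0.5, 1.0])
```

Under this reading, a prediction that spreads probability onto semantically distant classes is penalised more heavily than one that errs toward nearby classes, even at equal cross-entropy.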
  6. The training method according to claim 1, characterized in that the step of training the image classification network model based on the target loss function comprises:
    optimizing the target loss function using a gradient descent technique.
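Claim 6 names plain gradient descent as the optimisation technique. A minimal, self-contained illustration on a one-parameter quadratic (not the patent's actual training loop; the learning rate and step count are arbitrary choices):

```python
def gradient_step(params, grads, lr=0.1):
    """One vanilla gradient-descent update: w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(params, grads)]

# Minimise f(w) = w^2, whose gradient is 2w: the parameter decays toward 0
w = [4.0]
for _ in range(50):
    w = gradient_step(w, [2.0 * w[0]])
```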
  7. An image classification method, characterized in that the image classification method comprises:
    acquiring an image to be classified;
    inputting the image to be classified into an image classification network model to obtain a category label of the image to be classified, wherein the image classification network model is trained using the method according to any one of claims 1-6; and
    evaluating the category label of the image to be classified to obtain an interpretability score.
  8. The method according to claim 7, characterized in that the step of evaluating the category label of the image to be classified to obtain an interpretability score comprises:
    inputting the image to be classified into the image classification network model to obtain a category label ranking value and a category probability distribution;
    wherein the category label ranking value comprises a first category label ranking value and a second category label ranking value, the first category label ranking value being the ranking value of the gap probability between the category label of the image to be classified and the true category label of the image to be classified in the category probability distribution, and the second category label ranking value being the ranking value of the depth distance between the category label of the image to be classified and the true category label of the image to be classified in the category distance matrix; and
    calculating the interpretability score using the first ranking value and the second ranking value.
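The claims specify only that the interpretability score is computed from the two ranking values, not the formula itself. The sketch below is therefore a hypothetical instantiation: it assumes a prediction is judged more interpretable when its rank by gap probability agrees with its rank by depth distance to the true label, and scores the normalised rank agreement.

```python
def interpretability_score(first_rank, second_rank, num_classes):
    """Illustrative score in [0, 1] from claim 8's two ranking values.
    first_rank:  rank of a category label by gap probability
    second_rank: rank of the same label by depth distance to the truth
    Assumed formula (not disclosed in the claims): normalised agreement
    between the two rankings."""
    return 1.0 - abs(first_rank - second_rank) / (num_classes - 1)

# Perfect agreement between the probability ranking and the distance ranking
score = interpretability_score(2, 2, 10)  # 1.0
```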
  9. A terminal device, characterized in that the device comprises a memory and a processor coupled to the memory;
    wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the training method for an image classification network model according to any one of claims 1-6 and/or the image classification method according to either of claims 7-8.
  10. A computer storage medium, characterized in that the computer storage medium is configured to store program data which, when executed by a processor, implements the training method for an image classification network model according to any one of claims 1-6 and/or the image classification method according to either of claims 7-8.
PCT/CN2021/080087 2021-03-08 2021-03-10 Image classification network model training method, image classification method, and related device WO2022188080A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110249741.3A CN112949724A (en) 2021-03-08 2021-03-08 Training method of image classification network model, image classification method and related equipment
CN202110249741.3 2021-03-08

Publications (1)

Publication Number Publication Date
WO2022188080A1 (en)

Family

ID=76229599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080087 WO2022188080A1 (en) 2021-03-08 2021-03-10 Image classification network model training method, image classification method, and related device

Country Status (2)

Country Link
CN (1) CN112949724A (en)
WO (1) WO2022188080A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147700A (en) * 2018-05-18 2019-08-20 腾讯科技(深圳)有限公司 Video classification methods, device, storage medium and equipment
CN110929807A (en) * 2019-12-06 2020-03-27 腾讯科技(深圳)有限公司 Training method of image classification model, and image classification method and device
CN111353542A (en) * 2020-03-03 2020-06-30 腾讯科技(深圳)有限公司 Training method and device of image classification model, computer equipment and storage medium
WO2020185198A1 (en) * 2019-03-08 2020-09-17 Google Llc Noise tolerant ensemble rcnn for semi-supervised object detection


Also Published As

Publication number Publication date
CN112949724A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
US10438091B2 (en) Method and apparatus for recognizing image content
Mak et al. Empirical evaluation of hierarchical ground‐motion models: Score uncertainty and model weighting
CN110569322A (en) Address information analysis method, device and system and data acquisition method
KR20180011221A (en) Select representative video frames for videos
CN106919957B (en) Method and device for processing data
CN107209861A (en) Use the data-optimized multi-class multimedia data classification of negative
CN110674312B (en) Method, device and medium for constructing knowledge graph and electronic equipment
WO2023115761A1 (en) Event detection method and apparatus based on temporal knowledge graph
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN108959474B (en) Entity relation extraction method
WO2020232898A1 (en) Text classification method and apparatus, electronic device and computer non-volatile readable storage medium
WO2020007177A1 (en) Quotation method executed by computer, quotation device, electronic device and storage medium
CN105989001B (en) Image search method and device, image search system
CN116049412B (en) Text classification method, model training method, device and electronic equipment
CN112131322B (en) Time sequence classification method and device
JP7259935B2 (en) Information processing system, information processing method and program
WO2022188080A1 (en) Image classification network model training method, image classification method, and related device
CN110959157B (en) Accelerating large-scale similarity computation
CN114372518B (en) Test question similarity calculation method based on solving thought and knowledge points
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN114511715A (en) Driving scene data mining method
JP2023536773A (en) Text quality evaluation model training method and text quality determination method, device, electronic device, storage medium and computer program
CN114528908A (en) Network request data classification model training method, classification method and storage medium
JP5824429B2 (en) Spam account score calculation apparatus, spam account score calculation method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21929558; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21929558; Country of ref document: EP; Kind code of ref document: A1)