WO2022188080A1

WO2022188080A1 - Image classification network model training method, image classification method, and related device

Info

Publication number: WO2022188080A1
Application number: PCT/CN2021/080087
Authority: WO
Inventors: 王蕊; 童学智; 曲强; 姜青山
Original assignee: 深圳先进技术研究院
Priority date: 2021-03-08
Filing date: 2021-03-10
Publication date: 2022-09-15
Also published as: CN112949724A

Abstract

The present application provides an image classification network model training method, an image classification method, and a related device. The image classification network model training method comprises: acquiring a training image and an external knowledge base, wherein the external knowledge base comprises a true class label of the training image; encoding the external knowledge base to obtain a class distance matrix; inputting the training image and the true class label thereof and the class distance matrix into an image classification network model to obtain predicted class probability distribution of the training image, wherein the predicted class probability distribution comprises a difference probability between a predicted class label output by the image classification network model and the true class label; calculating a target loss function by using a depth distance, in the class distance matrix, between the true class label and the predicted class label and the predicted class probability distribution; and training the network model on the basis of the target loss function. The present application is used for obtaining an image classification network model for both improving image classification accuracy and enhancing prediction result interpretability.

Description

Image classification network model training method, image classification method and related equipment

【Technical field】

The present application relates to the technical field of image processing, and in particular, to a training method of an image classification network model, an image classification method and related equipment.

【Background technique】

Image classification is one of the most basic problems in the field of image processing technology. In the prior art, a deep neural network image classification method is mainly used, and specifically, the image to be classified and the category label of the to-be-classified image are input into the deep neural network model to train the deep neural network model. However, the predicted classification label of the image to be classified outputted by the deep neural network model obtained in the above manner may be wrong and unexplainable.

[Content of the invention]

The present application provides a training method for an image classification network model, an image classification method and related equipment.

In order to solve the above technical problems, the present application provides a training method of an image classification network model, the method comprising:

acquiring a training image and an external knowledge base, the external knowledge base including the true category label of the training image;

Encoding the external knowledge base to obtain a category distance matrix;

Input the training image and its true category label and the category distance matrix into the image classification network model to obtain the predicted category probability distribution of the training image, wherein the predicted category probability distribution includes the image classification network model The gap probability between the output predicted class label and the true class label;

Calculate the target loss function by using the depth distance between the true class label and the predicted class label in the class distance matrix and the predicted class probability distribution;

The image classification network model is trained based on the target loss function.

In order to solve the above technical problems, the present application provides an image classification method, the image classification method includes:

Get images to be classified;

Inputting the to-be-classified image into an image classification network model to obtain a class label of the to-be-classified image, wherein the image classification network model is an image classification network model trained by the above method;

The class labels of the images to be classified are evaluated to obtain an interpretability score.

In order to solve the above-mentioned technical problems, the present application provides a terminal device, the device includes a memory and a processor coupled to the memory;

The memory is used to store program data, and the processor is used to execute the program data to implement the above-mentioned training method for an image classification network model and/or the above-mentioned image classification method.

In order to solve the above technical problems, the present application also provides a computer storage medium, which is used for storing program data. The training method and/or the image classification method described above.

The beneficial effects of the present application are: acquiring training images and an external knowledge base, where the external knowledge base includes the real category labels of the training images; encoding the external knowledge base to obtain a category distance matrix; combining the training images and their real category labels and category distances The matrix is input to the image classification network model, and the predicted class probability distribution of the training image is obtained, wherein the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label; the real class label in the class distance matrix is used. Calculate the target loss function based on the depth distance from the predicted class label and the predicted class probability distribution; train the network model based on the target loss function. This application cites an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the predicted results.

【Description of drawings】

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort. in:

1 is a schematic flowchart of an embodiment of a training method for an image classification network model provided by the present application;

2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the application;

3 is a schematic flowchart of an embodiment of S102 in the training method of the image classification network model shown in FIG. 1;

4 is a schematic flowchart of an embodiment of S104 in the training method of the image classification network model shown in FIG. 1;

5 is a schematic flowchart of an embodiment of an image classification method provided by the present application;

6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application;

FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.

【Detailed ways】

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without making creative efforts shall fall within the scope of protection of this application.

The present application proposes a training method for an image classification network model. Please refer to FIG. 1 for details. FIG. 1 is a schematic flowchart of an embodiment of the training method for an image classification network model provided by the present application. The training method of the image classification network model in this embodiment can be applied to an image classification apparatus. The image classification apparatus of the present application may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other. Correspondingly, each part included in the mobile device, such as each unit, subunit, module, and submodule, may all be provided in the server, or in the mobile device, or in the server and the mobile device, respectively.

Further, the above server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server. When the server is software, it can be implemented as multiple software or software modules, such as software or software modules for providing distributed servers, or can be implemented as a single software or software module, which is not specifically limited here.

The training method of the image classification network model of this embodiment specifically includes the following steps:

S101: Acquire training images and an external knowledge base, where the external knowledge base includes ground-truth class labels of the training images.

In the embodiment of the present disclosure, considering that in the prior art, the image classification network model is trained by simply using the training image and the real class label of the training image, the predicted classification label output by the obtained image classification network model has the possibility of error, and there is a possibility of error. inexplicability. In order to avoid the above problems, the image classification apparatus of the present application refers to an external knowledge base to constrain the prediction results of the image classification network model.

Please refer to FIG. 2 , which is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the present application. It can be seen from the figure that the external knowledge base is a tree-like structure composed of multiple category labels, each node in the tree-like structure represents a category label, and the closer the position in the tree-like structure is, the more similar the category labels are between nodes. In order to use the external knowledge base to constrain the predicted class probability distribution output by the image classification network model, the external knowledge base in this embodiment should include all the class labels that the image classification network model can distinguish. Further, in the process of training the image classification network model, the external knowledge base at least includes the real category labels of the training images.

Further, considering the multi-class nature of training images, a single external knowledge base may not be able to include ground-truth class labels for all training images. To this end, the training method of the image classification network model of this embodiment can supplement the missing category labels in the single external knowledge base by manually extracting the category labels from the additional knowledge base.

Considering the influence of the number of training images on the prediction result output by the image classification network model, the number of training images required in this embodiment should be as large as possible. In a specific embodiment, the number of training images is at least 1000.

It should be noted that, before using the training images to train the image classification network model, the image classification apparatus of this embodiment should unify the pixel size of the training images, for example, uniformly scale them to 256×256, so that it is convenient to use the training images of the same pixel size to compare the images. The classification network model is trained.

S102: Encode the external knowledge base to obtain a category distance matrix.

Continuing to refer to Figure 2, in order to use the depth distance between the category labels in the external knowledge base to constrain the predicted category probability distribution output by the network model. In this embodiment, the external knowledge base needs to be encoded to obtain a category distance matrix. Among them, the category distance matrix includes the depth distance between any two category labels in the external knowledge base, that is, the semantic distance.

Optionally, this embodiment may adopt the embodiment in FIG. 3 to implement S102, which specifically includes S201 to S203:

S201: Obtain any two category labels in the external knowledge base.

In order to conveniently obtain the depth distance between the real class label and the predicted class label from the class distance matrix, the image classification apparatus of this embodiment may know in advance a class distance matrix including the depth distance between the real class label and the predicted class label. Specifically, for the acquisition of the category distance matrix, the image classification device first needs to acquire any two category labels in the external knowledge base.

S202: Obtain a common class label between any two class labels.

Further, the image classification apparatus obtains the common class label between any two class labels, that is, the most recent common ancestor, in the external knowledge base. Among them, the nearest common ancestor is the ancestor of one category label and the other category label in any two category labels, and the ancestor depth is as large as possible.

S203: Calculate the depth distance of any two class labels based on the common class label, so as to obtain a class distance matrix including the depth distance of any two class labels.

Wherein, the image classification apparatus of this embodiment uses the common class label to calculate the depth distance of any two class labels, so as to obtain a class distance matrix including the depth distance of any two class labels.

Specifically, the image classification device obtains the common class label, the depth of one class label and the depth of the other class label respectively; calculates the sum of the depth of one class label and the depth of the other class label; uses the common class label The ratio of the depth to the sum of the depth of one category label and the depth of the other category label above calculates the depth distance between any two category labels.

Among them, Wup (Wu-Palmer) semantic similarity is used to calculate the depth distance between any two category labels in the external knowledge base. The specific calculation formula of the depth distance is as follows:

Among them, c ₁ and c ₂ are two category labels in the external knowledge base, depth(c ₁ ) is the depth of the category label c ₁ , depth(c ₂ ) is the depth of the category label c ₂ , lcs(c ₁ , c ₂ ) is the common class label of class label c ₁ and class label c ₂ , depth(lcs(c ₁ , c ₂ )) is the depth of the common class label, d(c ₁ , c ₂ ) is the class label c ₁ and the class Depth distance between labels c ₂ .

Further, the image classification apparatus of this embodiment obtains the number of layers of the common class label by locating the label position of the common class label in the external knowledge base, and uses the label position in the external knowledge base to determine the depth of the common class label. In a specific embodiment, the depth acquisition method of the category label c ₁ and the category label c ₂ refers to the depth acquisition method of the common category label, which will not be repeated here.

S103: Input the training image and its true category label and category distance matrix into an image classification network model to obtain the predicted category probability distribution of the training image.

The image classification apparatus of this embodiment inputs the training image, its real category label and category distance matrix into the image classification network model, and obtains the predicted category probability distribution of the training image. Among them, the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label.

S104: Calculate the target loss function by using the depth distance between the true category label and the predicted category label in the category distance matrix and the predicted category probability distribution.

Because the loss function used in the existing image classification network model training method has a loss function value when the predicted class label of the training image is consistent with the real class label. When the predicted class label of the training image is inconsistent with the true class label, the loss function value is 0. Therefore, the existing loss function ignores the impact on the training of the image classification network model when the predicted class label of the training image is inconsistent with the real class label, resulting in the predicted class probability distribution output by the image classification network model inconsistent with common sense. In order to solve the above problem, the image classification network model training method of the present embodiment expands the loss function, and takes into account the influence of the inconsistency between the predicted category label of the training image and the real category label on the image classification network model. Specifically, the image classification apparatus of this embodiment uses the depth distance between the real class label and the predicted class label in the class distance matrix and the predicted class probability distribution to calculate the target loss function.

Optionally, this embodiment may adopt the embodiment of FIG. 4 to implement S104, which specifically includes S301 to S304:

S301: Obtain the depth distance between the true category label and the predicted category label in the category distance matrix.

Since this embodiment expands the loss function in the existing image classification network model training method, the target loss function in the image classification network model in this embodiment includes a first loss function and a second loss function. The first loss function and the second loss function respectively represent different aspects of the network model. Specifically, the first loss function represents the loss between the predicted class probability distribution output by the image classification network model and the preset class probability distribution when the predicted class of the training image is consistent with the real class. The second loss function indicates that when the predicted category of the training image is inconsistent with the real category, the depth distance between the predicted category of the training image and the real category, that is, the semantic distance, is used to obtain the predicted category probability distribution and depth output by the image classification network model. loss between distances.

S302: Calculate a first loss function by using the predicted category probability distribution and the depth distance.

The image classification apparatus calculates a first loss function using the predicted class probability distribution and the depth distance.

p(k, l) is the predicted class probability output by the image classification network model. I(k, l) is an indicator function. The indicator function is 1 when the real class labels of l and k are consistent, and the indicator function is 0 when the real class labels of l and k are inconsistent.

It should be noted that, in a specific embodiment, the first loss function may be a cross-entropy loss function.

S303: Calculate the second loss function by using the predicted class probability distribution and the true class label.

Based on the first loss function in S302, it can be seen that when the real category labels of l and k are inconsistent, the indicator function is 0, resulting in the first loss function being 0, and the prediction result output by the image classification network model ignores the real category label and the training image. Predict cases where class labels are inconsistent. In order to solve the above problem, the image classification apparatus of this embodiment extends the first loss function to constrain the prediction results of other category labels except the true category label. Specifically, the image classification apparatus calculates the second loss function using the predicted class probability distribution and the true class label.

Specifically, the second loss function satisfies the following formula:

where L _Sem (k) is the second loss function, t _k is the true class label corresponding to the training image, d(t _k , l) is the class distance matrix including the depth distance between the predicted class label and the true class label, p(k, l) is the predicted class probability output by the image classification network model.

S304: Calculate the target loss function based on the first loss function and the second loss function.

Wherein, the image classification apparatus uses the first loss function and the second loss function to calculate the target loss function.

Specifically, the objective loss function satisfies the following formula:

Among them, L(k) is the target loss function, and α is the weight coefficient, which is used to balance the first loss function and the second loss function to optimize the training of the image classification network model.

In a specific embodiment, the image classification apparatus may use a grid search method to determine the weight coefficient α.

S105: Train an image classification network model based on the target loss function.

The image classification apparatus of this embodiment trains an image classification network model with an objective loss function. Specifically, the image classification apparatus of this embodiment can use the gradient descent technique to train the target loss function.

In the above solution, the image classification device refers to an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the predicted results; using the predicted category probability distribution and depth. The distance calculation target loss function extends the existing loss function to avoid the fact that the existing loss function ignores the inconsistency between the predicted category label of the training image and the real category label, which causes the predicted category probability distribution output by the image classification network model to be inconsistent with common sense.

Please refer to FIG. 5 , which is a schematic flowchart of an embodiment of an image classification method provided by the present application. The image classification method in this embodiment can be applied to the image classification network model trained in the above-mentioned training method of the image classification network model, so as to improve the accuracy of image classification and the interpretability of prediction results. Taking the server used for the image classification method as an example below, the image classification method provided by the present application is introduced. The image classification method in this embodiment specifically includes the following steps:

S401: Acquire an image to be classified.

The acquisition of the image to be classified in this embodiment is similar to the acquisition of the training image in the above-mentioned embodiment S101, and details are not repeated here.

S402: Input the image to be classified into the image classification network model to obtain the class label of the image to be classified.

Wherein, the image classification apparatus of this embodiment inputs the image to be classified into the image classification network model, and obtains the class label of the image to be classified.

S403: Evaluate the category labels of the images to be classified to obtain an interpretability score.

In order to improve the accuracy of image classification and enhance the interpretability of the prediction results, in this embodiment, it is necessary to evaluate the class labels of the images to be classified outputted by the image classification network model to obtain an interpretability score. Specifically, in this embodiment, the images to be classified are input into the image classification network model, the class label ranking value and the class probability distribution are obtained, and the interpretability is calculated by using the class label ranking value including the first class label ranking value and the second class label ranking value. score.

Wherein, the first category label ranking value is the difference probability ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category probability distribution. The second category label ranking value is the depth distance ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category distance matrix.

Among them, the interpretability evaluation satisfies the following formula:

Among them, r _{k, l} is the category probability ranking value of the category probability that the image k to be classified belongs to the category label l in the category probability distribution.

For the image to be classified k belongs to the class label

The class probability ranking value of the class probability in the class probability distribution. s _{k, l} is the ranking value of the depth distance between the class label l and the real class label t _k of the image to be classified in the class distance matrix, that is, s _{k, l} is

the corresponding sort value.

label for category

The ranking value of the depth distance between the real class label t _k of the image to be classified in the class distance matrix, i.e.

for

the corresponding sort value.

In this embodiment, the images to be classified are acquired, the images to be classified are input into the image classification network model, the class labels of the images to be classified are obtained, the class labels of the images to be classified are evaluated, and the interpretability score is obtained, so as to improve the accuracy of image classification. and enhance the interpretability of prediction results.

In order to implement the image classification network model training method and/or the image classification method in the above embodiment, the present application proposes a terminal device. Please refer to FIG. 6 for details. FIG. 6 is a schematic structural diagram of an embodiment of the terminal device provided by the present application.

The terminal device 600 includes a memory 61 and a processor 62, wherein the memory 61 and the processor 62 are coupled.

The memory 61 is used for storing program data, and the processor 62 is used for executing the program data to implement the image classification network model training method and/or the image classification method in the above-mentioned embodiments.

In this embodiment, the processor 62 may also be referred to as a CPU (Central Processing Unit, central processing unit). The processor 62 may be an integrated circuit chip with signal processing capability. The processor 62 may also be a general purpose processor, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components . A general purpose processor may be a microprocessor or the processor 62 may be any conventional processor or the like.

The present application also provides a computer storage medium 700. As shown in FIG. 7, the computer storage medium 700 is used to store program data 71, and when the program data 71 is executed by the processor, it is used to realize the method described in the embodiment of the present application. The image classification network model training method and/or the image classification method.

The methods involved in the embodiments of the image classification network model training method and/or the image classification method of the present application exist in the form of software functional units when implemented and are sold or used as independent products, and can be stored in the device, for example a computer-readable storage medium. Based on this understanding, the technical solutions of the present application can be embodied in the form of software products in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, and the computer software products are stored in a storage medium , including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .

The above description is only an embodiment of the present application, and is not intended to limit the scope of the patent of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, or directly or indirectly applied to other related technologies Fields are similarly included within the scope of patent protection of this application.

Claims

A training method for an image classification network model, characterized in that the method comprises:

acquiring a training image and an external knowledge base, the external knowledge base including the true category label of the training image;

Encoding the external knowledge base to obtain a category distance matrix;

Input the training image and its true category label and the category distance matrix into the image classification network model to obtain the predicted category probability distribution of the training image, wherein the predicted category probability distribution includes the image classification network model The gap probability between the output predicted class label and the true class label;

Calculate the target loss function by using the depth distance between the true class label and the predicted class label in the class distance matrix and the predicted class probability distribution;

The image classification network model is trained based on the target loss function.
The training method according to claim 1, wherein the step of encoding the external knowledge base to obtain a category distance matrix comprises:

Obtain any two category labels in the external knowledge base;

Obtain the common class label between any two class labels;

Calculate the depth distances of the any two class labels based on the common class labels to obtain a class distance matrix including the depth distances of the any two class labels.
The training method according to claim 2, wherein the step of calculating the depth distance of any two class labels based on the common class label comprises:

respectively acquiring the depth of the public category label, one of the any two category labels, and the other category label;

calculating the sum of the depth of the one class label and the depth of the other class label;

The depth distance of any two class labels is calculated using the ratio of the depth of the common class label to the sum.
The training method according to claim 3, wherein the external knowledge base is a tree structure, and the step of acquiring the depth of the public category label comprises:

locate the label position of the public category label in the tree structure;

Obtain the number of layers between the node corresponding to the label position and the root node in the tree structure based on the label position;

The depth of the common class label is determined using the number of layers.
The training method according to claim 1, wherein the target loss function includes a first loss function and a second loss function, and the difference between the real class label and the predicted class label in the class distance matrix is used. The steps of calculating the target loss function of the depth distance and the predicted category probability distribution, including:

Obtain the depth distance between the true class label and the predicted class label in the class distance matrix;

Calculate the first loss function using the predicted class probability distribution and the depth distance;

Calculate a second loss function using the predicted class probability distribution and the true class label;

The target loss function is calculated based on the first loss function and the second loss function.
The training method according to claim 1, wherein the step of training the image classification network model based on the target loss function comprises:

The objective loss function is trained using gradient descent techniques.
An image classification method, characterized in that the image classification method comprises:

Get images to be classified;

Input the image to be classified into an image classification network model to obtain the class label of the image to be classified, wherein the image classification network model is an image obtained by training the method according to any one of the above claims 1-6 Classification network model;

The class labels of the images to be classified are evaluated to obtain an interpretability score.
The method according to claim 7, wherein the step of evaluating the category label of the to-be-classified image to obtain an interpretability score comprises:

Inputting the image to be classified into the image classification network model to obtain a category label ranking value and a category probability distribution;

The category label ranking value includes a first category label ranking value and a second category label ranking value, and the first category label ranking value is the category label of the image to be classified in the category probability distribution and the image to be classified. The difference probability sorting value between the real category labels, the second category label sorting value is the depth distance sorting value between the category label of the image to be classified and the real category label of the image to be classified in the category distance matrix ;

An interpretability score is calculated using the first ranking value and the second ranking value.
A terminal device, characterized in that the device includes a memory and a processor coupled to the memory;

Wherein, the memory is used to store program data, and the processor is used to execute the program data to realize the training method of the image classification network model according to any one of claims 1 to 6 and/or claims 7 to 8 The image classification method of any one.
A computer storage medium, characterized in that the computer storage medium is used for storing program data, and when the program data is executed by a processor, the program data is used to implement the image classification network according to any one of claims 1 to 6 A model training method and/or the image classification method according to any one of claims 7 to 8.