CN112766389A - Image classification method, training method, device and equipment of image classification model - Google Patents


Info

Publication number
CN112766389A
CN112766389A (application CN202110101459.0A)
Authority
CN
China
Prior art keywords
image
classification model
category
class
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110101459.0A
Other languages
Chinese (zh)
Other versions
CN112766389B (en)
Inventor
朱理
魏晓明
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202110101459.0A priority Critical patent/CN112766389B/en
Publication of CN112766389A publication Critical patent/CN112766389A/en
Application granted granted Critical
Publication of CN112766389B publication Critical patent/CN112766389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24323 — Pattern recognition; Classification techniques relating to the number of classes; Tree-organised classifiers

Abstract

The application discloses an image classification method and a training method, apparatus, and device for an image classification model, belonging to the technical field of computers. The method comprises the following steps: acquiring a target image to be classified and a target image classification model, wherein the target image classification model is obtained by training with a category hierarchical tree, training samples, and a shortest-path-based loss function; the category hierarchical tree and the training samples are used to train an initial image classification model to obtain a first image classification model, and the shortest-path-based loss function is used to update the first image classification model to obtain the target image classification model; and calling the target image classification model to identify the target image to obtain the image category corresponding to the target image. The target image classification model obtained by this method has high classification accuracy and rationality, so when it is used to classify the target image, the resulting image category is more accurate.

Description

Image classification method, training method, device and equipment of image classification model
Technical Field
The embodiments of the application relate to the technical field of computers, and in particular to an image classification method and a training method, apparatus, and device for an image classification model.
Background
With the rapid development of computer technology, electronic devices have become increasingly powerful. Users can acquire various images through many channels, and a large number of images of different types therefore accumulate on electronic devices.
In the related art, a target image to be classified is acquired, the target image is input into a target image classification model, and the image is classified by the model to obtain the category corresponding to the image. The target image classification model is obtained by training an initial image classification model, and the training process is as follows: the initial image classification model is trained on a reference image and the category of that reference image to obtain the target image classification model.
However, in the above training process, a reference image whose image content is inconsistent with its image category may occur, and training on such a reference image is prone to overfitting, which results in low accuracy of the categories determined when the target image classification model is used to determine image categories.
Disclosure of Invention
The embodiment of the application provides an image classification method, an image classification model training device and image classification model training equipment, which can be used for solving the problems in the related art. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides an image classification method, where the method includes:
obtaining a target image to be classified and a target image classification model, wherein the target image classification model is obtained by training a class hierarchy tree, a training sample and a loss function based on the shortest path, the class hierarchy tree and the training sample are used for training an initial image classification model to obtain a first image classification model, and the loss function based on the shortest path is used for updating the first image classification model to obtain the target image classification model;
and calling the target image classification model to identify the target image to obtain the image category corresponding to the target image.
In a possible implementation manner, the invoking the target image classification model to identify the target image to obtain an image category corresponding to the target image includes:
calling the target image classification model to identify the target image to obtain a plurality of reference classes corresponding to the target image and the probability of each reference class;
and determining the reference category with the probability meeting the target requirement as the image category corresponding to the target image.
In a second aspect, an embodiment of the present application provides a method for training an image classification model, where the method includes:
obtaining a class hierarchy tree comprising a plurality of image classes and associations between the image classes, and a training sample comprising a first image and a reference class for the first image;
performing class smoothing processing on the first image based on the reference class of the first image and the class hierarchical tree to obtain the probability that the first image belongs to each first class, wherein the first class is any image class included in the class hierarchical tree;
training an initial image classification model based on the first image and the probability that the first image belongs to each first category to obtain a first image classification model;
and updating the first image classification model based on the first image, the reference category of the first image and the first image classification model through a loss function based on the shortest path to obtain a target image classification model, wherein the target image classification model is used for image classification.
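The first training phase described above fits the initial model to the smoothed target distribution rather than to a one-hot label. A minimal sketch of that training signal is a cross-entropy between the model's softmax output and the smoothed distribution; this is an illustrative assumption, not the patent's exact procedure, and all names below are hypothetical.

```python
import math

def cross_entropy_to_smoothed(logits, smoothed_target):
    """Cross-entropy H(target, prediction) between the class-smoothed
    target distribution and the model's softmax output (toy sketch of
    the phase-one training signal; lower is better)."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    softmax = [e / total for e in exps]
    return -sum(p * math.log(q) for p, q in zip(smoothed_target, softmax))
```

With a uniform prediction over two classes and a one-hot target, the loss is ln 2; predictions that put more mass on the target's dominant class score lower.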
In a possible implementation manner, the updating the first image classification model based on the first image, the reference category of the first image, and the first image classification model by using a shortest path-based loss function to obtain a target image classification model includes:
determining a prediction class of the first image based on the first image and the first image classification model;
determining a target path loss value through the shortest path-based loss function based on the reference class of the first image and the prediction class of the first image;
and updating the first image classification model based on the target path loss value to obtain the target image classification model.
In one possible implementation, the determining, by the shortest-path-based loss function, a target path loss value based on the reference category of the first image and the prediction category of the first image includes:
determining a target path loss value loss' based on the reference class of the first image and the prediction class of the first image according to the following formula by the shortest path based loss function:
[Formula image (BDA0002916105670000031) not reproduced: it gives the piecewise definition of loss' in terms of pred, gt, lca, root, SP, and SP_max.]
wherein pred is the prediction category of the first image, gt is the reference category of the first image, loss is the original path loss value obtained based on the prediction category of the first image and the reference category of the first image, lca is the common target category corresponding to both the prediction category of the first image and the reference category of the first image, root is the root category of the category hierarchical tree, SP_max is a first path value corresponding to the category hierarchical tree, SP is a second path value between the prediction category of the first image and the reference category of the first image, and otherwise denotes the remaining cases of the piecewise formula.
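The variable definitions above describe a loss scaled by the shortest path between the predicted and reference categories in the category hierarchical tree (via their lowest common ancestor). Because the formula itself appears only as an image in the source, the sketch below is a hypothetical reconstruction: the `1 + SP/SP_max` scaling rule and all function names are assumptions, not the patent's exact formula.

```python
def shortest_path_len(parent, a, b):
    """Shortest-path length between categories a and b in the category
    hierarchical tree, passing through their lowest common ancestor.
    `parent` maps each category to its parent (root maps to None)."""
    def ancestors(n):
        chain = []
        while n is not None:
            chain.append(n)
            n = parent.get(n)
        return chain
    pa, pb = ancestors(a), ancestors(b)
    in_pb = set(pb)
    lca = next(n for n in pa if n in in_pb)  # lowest common ancestor
    return pa.index(lca) + pb.index(lca), lca

def shortest_path_loss(loss, parent, pred, gt, sp_max):
    """Hypothetical shortest-path-based loss: keep the original loss for
    a correct prediction, otherwise scale it up by the tree distance
    between the prediction and the reference category."""
    if pred == gt:
        return loss
    sp, _ = shortest_path_len(parent, pred, gt)
    return loss * (1.0 + sp / sp_max)
```

On a toy tree with root → {A, B} and A → {A1, A2}, predicting A2 for a reference A1 (distance 2) is penalized less than predicting B (distance 3), which is the intended effect of a tree-aware loss.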
In a possible implementation manner, the performing a class smoothing process on the first image based on the reference class of the first image and the class hierarchical tree to obtain a probability that the first image belongs to each first class includes:
based on the reference category of the first image and the category hierarchical tree, performing category smoothing processing on the first image according to the following formula to obtain the probability P_i that the first image belongs to each first category:
[Formula image (BDA0002916105670000032) not reproduced: it gives the piecewise definition of P_i in terms of gt, α, β, ε, N_1, and N_2.]
wherein i denotes the i-th category, gt is the reference category of the first image, α is the assignment proportion of the categories associated with the reference category of the first image, β is the assignment proportion of the categories not associated with the reference category of the first image, ε is a penalty value with a value in [0.1, 0.2], N_1 is the number of categories in the category hierarchical tree that are associated with the reference category of the first image, and N_2 is the number of categories in the category hierarchical tree that are not associated with the reference category of the first image.
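The definitions above (assignment proportions α and β, penalty ε, counts N_1 and N_2) describe a hierarchy-aware label-smoothing scheme. Because the formula is reproduced only as an image in the source, the split below — the reference category keeps 1 − ε, the associated categories share ε·α, and the remaining categories share ε·β, with α + β = 1 — is a plausible reconstruction, and all names are assumptions.

```python
def smooth_labels(classes, gt, associated, eps=0.1, alpha=0.5, beta=0.5):
    """Hierarchical label smoothing (hypothetical reconstruction):
    gt keeps most of the probability mass, the N_1 categories associated
    with gt split eps*alpha, and the N_2 unassociated categories split
    eps*beta."""
    n1 = len(associated)
    n2 = len(classes) - n1 - 1  # everything else, gt excluded
    probs = {}
    for c in classes:
        if c == gt:
            probs[c] = 1.0 - eps
        elif c in associated:
            probs[c] = eps * alpha / n1
        else:
            probs[c] = eps * beta / n2
    return probs
```

The resulting distribution sums to 1 and gives categories near the reference category a larger share of the smoothed mass than distant ones.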
In a third aspect, an embodiment of the present application provides an image classification apparatus, including:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring a target image to be classified and a target image classification model, the target image classification model is obtained by training a class hierarchical tree, a training sample and a loss function based on the shortest path, the class hierarchical tree and the training sample are used for training an initial image classification model to obtain a first image classification model, and the loss function based on the shortest path is used for updating the first image classification model to obtain the target image classification model;
and the identification module is used for calling the target image classification model to identify the target image so as to obtain the image category corresponding to the target image.
In a possible implementation manner, the identification module is configured to invoke the target image classification model to identify the target image, so as to obtain a plurality of reference categories corresponding to the target image and a probability of each reference category;
and determining the reference category with the probability meeting the target requirement as the image category corresponding to the target image.
In a fourth aspect, an embodiment of the present application provides an apparatus for training an image classification model, where the apparatus includes:
an obtaining module configured to obtain a category-level tree and a training sample, the category-level tree including a plurality of image categories and associations between the image categories, the training sample including a first image and a reference category of the first image;
a processing module, configured to perform class smoothing on the first image based on a reference class of the first image and the class hierarchical tree, to obtain a probability that the first image belongs to each first class, where the first class is any image class included in the class hierarchical tree;
the training module is used for training an initial image classification model based on the first image and the probability that the first image belongs to each first category to obtain a first image classification model;
and the updating module is used for updating the first image classification model based on the first image, the reference category of the first image and the first image classification model through a loss function based on the shortest path to obtain a target image classification model, and the target image classification model is used for image classification.
In a possible implementation, the updating module is configured to determine a prediction category of the first image based on the first image and the first image classification model;
determining a target path loss value through the shortest path-based loss function based on the reference class of the first image and the prediction class of the first image;
and updating the first image classification model based on the target path loss value to obtain the target image classification model.
In a possible implementation manner, the updating module is configured to determine, based on the reference category of the first image and the prediction category of the first image, a target path loss value loss' according to the following formula through the shortest path-based loss function:
[Formula image (BDA0002916105670000051) not reproduced: it gives the same piecewise definition of loss' as in the second aspect.]
wherein pred is the prediction category of the first image, gt is the reference category of the first image, loss is the original path loss value obtained based on the prediction category of the first image and the reference category of the first image, lca is the common target category corresponding to both the prediction category of the first image and the reference category of the first image, root is the root category of the category hierarchical tree, SP_max is a first path value corresponding to the category hierarchical tree, SP is a second path value between the prediction category of the first image and the reference category of the first image, and otherwise denotes the remaining cases of the piecewise formula.
In a possible implementation manner, the processing module is configured to perform category smoothing processing on the first image according to the following formula based on the reference category of the first image and the category hierarchical tree, so as to obtain the probability P_i that the first image belongs to each first category:
[Formula image (BDA0002916105670000052) not reproduced: it gives the same piecewise definition of P_i as in the second aspect.]
wherein i denotes the i-th category, gt is the reference category of the first image, α is the assignment proportion of the categories associated with the reference category of the first image, β is the assignment proportion of the categories not associated with the reference category of the first image, ε is a penalty value with a value in [0.1, 0.2], N_1 is the number of categories in the category hierarchical tree that are associated with the reference category of the first image, and N_2 is the number of categories in the category hierarchical tree that are not associated with the reference category of the first image.
In a fifth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the image classification method according to the first aspect or any one of the possible implementations of the first aspect, or to implement the training method for the image classification model according to the second aspect or any one of the possible implementations of the second aspect.
In a sixth aspect, a computer-readable storage medium is further provided, where at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded into and executed by a processor to implement the image classification method according to the first aspect or any one of the possible implementations of the first aspect, or to implement the training method for the image classification model according to the second aspect or any one of the possible implementations of the second aspect.
In a seventh aspect, a computer program or a computer program product is further provided, where at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor to implement the image classification method according to the first aspect or any one of the possible implementations of the first aspect, or to implement the training method for the image classification model according to the second aspect or any one of the possible implementations of the second aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
according to the technical scheme provided by the embodiment of the application, the initial image classification model is trained on the basis of the class hierarchy tree and the training samples to obtain the first image classification model, so that the generalization capability of the first image classification model is better, and the first image classification model can effectively prevent the condition of overfitting training; and updating the first image classification model according to the loss function based on the shortest path to obtain a target image classification model, so that the classification accuracy of the target image classification model is better. When the target image classification model is adopted to determine the image category, the accuracy and the reasonability of the determined image category can be improved. Therefore, when the target image classification model is called to determine the image category corresponding to the target image, the determined image category of the target image is more accurate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of an image classification method and a training method of an image classification model according to an embodiment of the present application;
fig. 2 is a flowchart of an image classification method provided in an embodiment of the present application;
FIG. 3 is a flowchart of a training method of an image classification model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a category hierarchy tree provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for training an image classification model according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a training method of an image classification model and an image classification method provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment includes: an electronic device 101.
The electronic device 101 may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The electronic device 101 is configured to execute the image classification method and the training method of the image classification model provided in the embodiments of the present application.
The electronic device 101 may be generally referred to as one of a plurality of electronic devices, and the embodiment is only illustrated by the electronic device 101. Those skilled in the art will appreciate that the number of electronic devices 101 described above may be greater or fewer. For example, the number of the electronic devices 101 may be only one, or the number of the electronic devices 101 may be tens or hundreds, or more, and the number of the electronic devices and the device types are not limited in the embodiment of the present application.
Based on the foregoing implementation environment, an embodiment of the present application provides an image classification method, which may be executed by the electronic device 101 in fig. 1, taking a flowchart of the image classification method provided in the embodiment of the present application as shown in fig. 2 as an example. As shown in fig. 2, the method includes the following steps 201 to 202:
in step 201, a target image to be classified and a target image classification model are acquired.
In the exemplary embodiment of the application, the target image classification model is obtained by training a class hierarchical tree, training samples and a loss function based on the shortest path, wherein the class hierarchical tree and the training samples are used for training an initial image classification model to obtain a first image classification model, and the loss function based on the shortest path is used for updating the first image classification model to obtain the target image classification model.
In a possible implementation manner, the target image to be classified is an image of any type, any format, and any size, which is not limited in the embodiments of the present application. At least one image is stored in the electronic device, and the electronic device can directly acquire an image from its own storage space and determine it as the target image to be classified. The electronic device can also provide an image-upload entry; a user uploads an image through this entry, and the electronic device determines the uploaded image as the target image to be classified. Of course, the target image to be classified may also be acquired in other ways, which is not limited in the embodiments of the present application.
In a possible implementation manner, the electronic device trains an initial image classification model in advance to obtain a target image classification model, and stores the obtained target image classification model in a storage space of the electronic device, wherein the target image classification model is used for image classification. After the electronic device receives the target image to be classified, the target image classification model is extracted from the storage space of the electronic device, that is, the electronic device acquires the target image classification model. The training process of the target image classification model is described with the embodiment shown in fig. 3, and is not repeated herein.
In step 202, a target image classification model is called to process the target image, and an image category corresponding to the target image is obtained.
In a possible implementation manner, a process of calling a target image classification model to process a target image to obtain an image class corresponding to the target image is as follows:
calling a target image classification model to identify a target image to obtain a plurality of reference classes corresponding to the target image and the probability of each reference class; and determining the reference category with the probability meeting the target requirement as the image category corresponding to the target image.
The process of calling a target image classification model to identify a target image and obtaining a plurality of reference categories corresponding to the target image and the probability of each reference category is as follows: and inputting the target image into a target image classification model, and identifying the target image based on the target image classification model to obtain a plurality of reference classes corresponding to the target image and the probability of each reference class. The reference category with the probability meeting the target requirement may be the reference category with the maximum probability, or may be another reference category, which is not limited in the embodiment of the present application.
For example, the target image is input into the target image classification model, and the target image is identified by the target image classification model; the plurality of reference categories corresponding to the target image and their probabilities are: first reference category, 80%; second reference category, 15%; third reference category, 5%. Since the probability of the first reference category is the highest, the first reference category is determined as the image category corresponding to the target image.
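The selection rule in the example above — take the reference category whose probability meets the target requirement, here the maximum-probability category — can be sketched in a few lines (the dictionary representation of the model's output is an assumption for illustration):

```python
def classify(reference_probs):
    """Return the reference category with the highest probability,
    i.e. the category meeting the 'target requirement' used in the
    example above."""
    return max(reference_probs, key=reference_probs.get)
```

Applied to the example's output, `{"first": 0.80, "second": 0.15, "third": 0.05}`, this selects the first reference category.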
According to the above method, the initial image classification model is trained on the basis of the category hierarchical tree and the training samples to obtain the first image classification model, so that the first image classification model generalizes well and overfitting during training is effectively avoided; the first image classification model is then updated with the shortest-path-based loss function to obtain the target image classification model, which improves its classification accuracy. When the target image classification model is used to determine the image category, the accuracy and rationality of the determined category are improved, so the category determined for the target image is more accurate.
An embodiment of the present application provides a training method for an image classification model, which is exemplified by a flowchart of the training method for an image classification model provided in the embodiment of the present application shown in fig. 3, and the method can be executed by the electronic device 101 in fig. 1. As shown in fig. 3, the method includes the following steps 301 to 304:
in step 301, a class-level tree comprising a plurality of image classes and associations between the respective image classes and a training sample comprising a first image and a reference class of the first image are obtained.
In a possible implementation manner, an application program for acquiring resources is installed and runs on the electronic device. The application program may acquire resources of different categories; its program type is not limited in the embodiments of the present application. The application establishes a corresponding category hierarchical tree for each category of resource. For example, if the application provides game resources, beauty resources, and education resources, a category hierarchical tree corresponding to the beauty resources, a category hierarchical tree corresponding to the game resources, and a category hierarchical tree corresponding to the education resources may be established respectively.
In one possible implementation, the category hierarchy tree includes a plurality of resource categories and associations between the resource categories. In the category hierarchical tree, categories (upper-level category, same-level category, and lower-level category) adjacent to the reference category of the image are taken as categories associated with the reference category of the image, and categories not adjacent to the reference category of the image are taken as categories not associated with the reference category of the image.
Fig. 4 is a schematic diagram of a category hierarchical tree according to an embodiment of the present application. The category hierarchical tree corresponds to beauty resources, its root category is beauty, and it includes 12 categories: hairdressing, eyelash beautification, nail art, tattooing, cosmetic tattooing, body beautification, yoga, dance, health care, medical aesthetic shaping, store environment, and promotional images. That is, the beauty resources include 12 kinds of resources, one for each of these categories. A resource may be a voucher, such as a haircut voucher, or another type of resource, which is not limited in the embodiments of the present application.
Hairdressing includes: haircut, hair dyeing and perming, hair washing, hair extension, hair transplantation, and hair styling; haircut further includes children's haircut and adult haircut; hair styling further includes braids, slicked-back styles, and updos. The categories included in the other categories are shown in fig. 4 and are not described again here. The category hierarchical tree further includes the association relationships between categories. Taking haircut as an example: since hairdressing is the upper-level category of haircut; hair dyeing and perming, hair washing, hair extension, hair transplantation, and hair styling are same-level categories of haircut; and children's haircut and adult haircut are lower-level categories of haircut, all of these are determined as the categories associated with haircut. All other categories in the category hierarchical tree are determined as the categories not associated with haircut.
It should be noted that the categories associated with, and not associated with, each of the other categories are determined in the same manner as those for haircut, and details are not described herein again.
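As a rough illustration (not part of the patent), the haircut example above can be sketched as a small tree in which the categories associated with a node are its parent, its siblings, and its children. All names and the interface below are illustrative translations, not the patent's implementation:

```python
# Hypothetical sketch of the category hierarchy tree and of the rule
# "associated categories = parent + siblings + children" described above.

class CategoryNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

def associated_categories(node):
    """Return the parent, siblings, and children of a node, per the rule in the text."""
    related = []
    if node.parent:
        related.append(node.parent.name)                        # upper-level category
        related += [c.name for c in node.parent.children if c is not node]  # same-level
    related += [c.name for c in node.children]                  # lower-level categories
    return related

root = CategoryNode("beauty")
hairdressing = CategoryNode("hairdressing", root)
haircut = CategoryNode("haircut", hairdressing)
for sibling in ["hair coloring and perming", "hair washing", "hair extensions",
                "hair transplant", "hair styling"]:
    CategoryNode(sibling, hairdressing)
CategoryNode("children's haircut", haircut)
CategoryNode("adults' haircut", haircut)

print(associated_categories(haircut))  # 8 associated categories in total
```

Every other node in the tree would then fall into the "not associated with haircut" set.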
In a possible implementation manner, the training sample is any image set. The training sample includes a plurality of first images and the reference category of each first image, where the reference category of a first image is a category determined by a user according to the image content of the first image. However, the reference category may not reflect the content of the first image with complete accuracy; that is, the reference category of the first image may be an incorrect category.
It should be noted that the training sample is any image set stored in the electronic device or any image set acquired by the electronic device from the internet, which is not limited in the embodiment of the present application.
It should be further noted that the category hierarchical tree is a category hierarchical tree corresponding to the training sample, and if the image included in the training sample is an image corresponding to the korean resource, the category hierarchical tree is a category hierarchical tree corresponding to the korean resource; if the images included in the training sample are images corresponding to the game resources, the category hierarchical tree is a category hierarchical tree corresponding to the game resources.
In step 302, based on the reference category and the category hierarchical tree of the first image, a category smoothing process is performed on the first image to obtain a probability that the first image belongs to each first category, where the first category is any image category included in the category hierarchical tree.
In a possible implementation manner, based on the reference category and the category hierarchical tree of the first image, the process of performing category smoothing on the first image to obtain the probability that the first image belongs to each first category is as follows:
Based on the reference category of the first image and the category hierarchical tree, category smoothing processing is carried out on the first image according to the following formula (1) to obtain the probability P_i that the first image belongs to each first category:
P_i = 1 − e,            if i = gt
P_i = (α · e) / N_1,    if i is a category associated with gt
P_i = (β · e) / N_2,    if i is a category not associated with gt        (1)
In the above formula (1), i is the i-th class, gt is the reference class of the first image, α is the assignment proportion of the classes associated with the reference class of the first image, β is the assignment proportion of the classes not associated with the reference class of the first image, e is a penalty value whose value is in [0.1, 0.2], N_1 is the number of classes in the class hierarchical tree associated with the reference class of the first image, and N_2 is the number of classes in the class hierarchical tree not associated with the reference class of the first image.
It should be noted that the assignment proportion (value of α) of the class associated with the reference class of the first image is greater than the assignment proportion (value of β) of the class not associated with the reference class of the first image, and the sum of the assignment proportion of the class associated with the reference class of the first image and the assignment proportion of the class not associated with the reference class of the first image is 1.
It should be further noted that the value of the penalty value e, the assignment proportion (α value) of the class associated with the reference class of the first image, and the assignment proportion (β value) of the class not associated with the reference class of the first image are all determined in advance by the user.
In one possible implementation, when the first class is a class consistent with a reference class of the first image, determining a probability that the first image belongs to the first class according to a first formula in formula (1) above; determining a probability that the first image belongs to the first class according to a second formula of the above formula (1) when the first class is a class associated with a reference class of the first image; when the first class is a class that is not associated with the reference class of the first image, the probability that the first image belongs to the first class is determined according to the third formula in formula (1) above.
The class smoothing processing is carried out on the first image based on the formula (1), and the probability that the first image belongs to each class in the class hierarchical tree can be determined, so that when the image classification model is trained based on the first image and the probability that the first image belongs to each class, the condition of model overfitting can be effectively avoided, and the accuracy of the image classification model can be improved.
Illustratively, taking the reference category of the first image as haircut, the number of categories associated with haircut in the category hierarchical tree is determined to be N_1 = 8, and the number of categories in the category hierarchical tree not associated with haircut is N_2 = 53. With the value of e being 0.2, the assignment proportion α of the categories associated with the reference category of the first image being 0.7, and the assignment proportion β of the categories not associated with the reference category of the first image being 0.3: when the first category is the reference category of the first image, the probability that the first image belongs to the first category is determined to be 0.8; when the first category is a category associated with the reference category of the first image, the probability that the first image belongs to the first category is determined to be 0.0175; and when the first category is a category not associated with the reference category of the first image, the probability that the first image belongs to the first category is determined to be approximately 0.0011.
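The computation in formula (1) with the example's values can be sketched as follows; the function name and interface are illustrative, not from the patent:

```python
# Minimal sketch of the class-smoothing rule in formula (1), using the
# example values from the text: e = 0.2, alpha = 0.7, beta = 0.3,
# N1 = 8 associated categories, N2 = 53 unassociated categories.

def smoothed_probability(relation, e=0.2, alpha=0.7, beta=0.3, n1=8, n2=53):
    """relation: 'reference', 'associated', or 'unassociated' (w.r.t. gt)."""
    if relation == "reference":
        return 1 - e              # the class equal to the reference class gt
    if relation == "associated":
        return alpha * e / n1     # a class associated with gt
    return beta * e / n2          # a class not associated with gt

print(smoothed_probability("reference"))               # 0.8
print(round(smoothed_probability("associated"), 4))    # 0.0175
print(round(smoothed_probability("unassociated"), 4))  # 0.0011
```

Note that the three cases together distribute a total probability mass of 1 over the whole category hierarchical tree, which is what makes the smoothed values usable as a target distribution.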
In step 303, the initial image classification model is trained based on the first image and the probability that the first image belongs to each first category, so as to obtain a first image classification model.
In a possible implementation manner, the initial image classification model is an image classification model of any type, which is not limited in the embodiment of the present application. Illustratively, the initial image classification model is a Residual Network (ResNet) model or a Visual Geometry Group (VGG) model.
In one possible implementation manner, based on the first image and the probability that the first image belongs to each first class, the process of training the initial image classification model is as follows: the first image and the probabilities that the first image belongs to the respective first classes are input into the initial image classification model, and the initial image classification model is trained based on them, thereby obtaining the first image classification model.
In the training process of the first image classification model, the probability that the first image belongs to each first category is considered, so that the condition of over-fitting training of the first image classification model can be effectively prevented, and the classification accuracy of the first image classification model is improved. When the image category is determined based on the first image classification model, a plurality of categories corresponding to the image and the probability of each category are output, and then the category (the category with the highest probability) with the probability meeting the target requirement is determined as the image category, so that the determination process of the image category is more accurate.
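The patent does not spell out the loss used when training on these soft labels; a common choice, shown here as an assumption, is the cross-entropy between the smoothed target distribution and the model's predicted distribution:

```python
# Assumed sketch (not from the patent): soft-target cross-entropy between
# the smoothed label distribution and the model's predicted probabilities.

import math

def soft_cross_entropy(target_probs, predicted_probs):
    """H(target, predicted) = -sum_i target_i * log(predicted_i).

    Zero-probability target entries are skipped so log(0) is never evaluated.
    """
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

# An over-confident prediction is penalized more under the smoothed target
# than under the hard (one-hot) target:
hard_target_loss = soft_cross_entropy([1.0, 0.0, 0.0], [0.9, 0.05, 0.05])
smooth_target_loss = soft_cross_entropy([0.8, 0.1, 0.1], [0.9, 0.05, 0.05])
print(hard_target_loss, smooth_target_loss)
```

Even a prediction that exactly matches the smoothed target keeps a nonzero loss floor (the entropy of the target), which is the mechanism by which smoothing discourages over-confident, over-fitted outputs.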
In step 304, the first image classification model is updated based on the first image, the reference category of the first image, and the first image classification model by a loss function based on the shortest path, so as to obtain a target image classification model.
In one possible implementation, based on the first image, the reference class of the first image, and the first image classification model, the process of updating the first image classification model by the shortest path-based loss function is as follows: determining a prediction class of the first image based on the first image and the first image classification model; determining a target path loss value through a loss function based on a shortest path based on a reference category of the first image and a prediction category of the first image; and updating the first image classification model based on the target path loss value to obtain a target image classification model.
Wherein, based on the first image and the first image classification model, the process of determining the prediction category of the first image is as follows: the method comprises the steps of inputting a first image into a first image classification model, obtaining a plurality of classes corresponding to the first image and the probability of each class based on the output result of the first image classification model, and determining the class with the highest probability as the prediction class of the first image.
Illustratively, a first image is input into the first image classification model, the first image is recognized based on the first image classification model, and the recognition result is: the probability of haircut is 80 percent, the probability of hair coloring is 10 percent, and the probability of hair perming is 10 percent. Since the probability of haircut is the highest, haircut is determined as the prediction category of the first image.
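The argmax step above, sketched with the example's numbers (the category names are illustrative):

```python
# Pick the class with the highest predicted probability as the prediction
# category of the first image, as described in the text.
probs = {"haircut": 0.80, "hair coloring": 0.10, "hair perming": 0.10}
prediction_category = max(probs, key=probs.get)
print(prediction_category)  # haircut
```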
The process of determining the target path loss value by the shortest path-based loss function based on the reference class of the first image and the prediction class of the first image is as follows: determining a target path loss value loss' by a shortest path-based loss function based on the reference category of the first image and the prediction category of the first image according to the following formula (2):
loss′ = loss,                  if pred = gt
loss′ = loss · log(SP_max),    if lca(pred, gt) = root
loss′ = loss · log(SP),        otherwise        (2)
In the above formula (2), pred is the prediction category of the first image, gt is the reference category of the first image, loss is the original path loss value obtained based on the prediction category of the first image and the reference category of the first image, lca is the target category commonly corresponding to the prediction category of the first image and the reference category of the first image (their nearest common ancestor category), root is the root category of the category hierarchical tree, SP_max is the first path value corresponding to the category hierarchical tree, and SP is the second path value between the prediction category of the first image and the reference category of the first image.
In one possible implementation, when the reference class of the first image and the prediction class of the first image are the same class, the target path loss value is determined according to the first formula in the above formula (2).
When the original path loss value is determined based on the prediction category of the first image and the reference category of the first image, it is determined based on the cross entropy loss function. Of course, the original path loss value between the prediction category of the first image and the reference category of the first image may also be determined based on other loss functions, which is not limited in the embodiment of the present application.
Illustratively, the reference category of the first image is hair-cut, the prediction category of the first image determined based on the first image classification model is also hair-cut, and since the reference category of the first image and the prediction category of the first image are the same category, it is determined that the original path loss value between the reference category of the first image and the prediction category of the first image is 0. Further, the target path loss value between the reference class of the first image and the prediction class of the first image is also determined to be 0 according to the first formula in the above formula (2).
In one possible implementation, in response to the reference category of the first image and the prediction category of the first image being different categories, a loss value between the reference category of the first image and the prediction category of the first image is calculated based on the cross entropy loss function and determined as the original path loss value. The target category commonly corresponding to the reference category of the first image and the prediction category of the first image is then determined, where the target category is the nearest common category of the two. In response to this target category being consistent with the root category of the category hierarchical tree, the first path value of the category hierarchical tree is determined, where the first path value is the farthest path value in the category hierarchical tree, and the target path loss value is determined based on the original path loss value and the first path value.
Illustratively, the reference category of the first image is haircut, the prediction category of the first image is eyelash beautification, and the original path loss value is determined to be 1 based on the reference category of the first image, the prediction category of the first image, and the cross entropy loss function. The root category of the category hierarchical tree is beauty, and the target category commonly corresponding to the prediction category of the first image and the reference category of the first image is also beauty. Since this target category is consistent with the root category of the category hierarchical tree, the first path value of the category hierarchical tree is determined to be 4, and the target path loss value is determined according to the second formula of the above formula (2) as loss′ = loss · log(SP_max) = 1 × log 4 ≈ 0.6.
In one possible implementation manner, in response to the reference category of the first image and the prediction category of the first image being different categories, and the target category commonly corresponding to the reference category of the first image and the prediction category of the first image being inconsistent with the root category of the category hierarchical tree, a loss value between the reference category of the first image and the prediction category of the first image is calculated based on the cross entropy loss function and determined as the original path loss value; the path value from the reference category of the first image to the prediction category of the first image is determined and taken as the second path value; and the target path loss value is determined based on the original path loss value and the second path value.
Illustratively, the reference category of the first image is haircut, the prediction category of the first image is hair washing, and the original path loss value is determined to be 0.6 based on the reference category of the first image, the prediction category of the first image, and the cross entropy loss function. The root category of the category hierarchical tree is beauty, and the target category commonly corresponding to the reference category of the first image and the prediction category of the first image is hairdressing, which is not consistent with the root category of the category hierarchical tree. Therefore, based on the reference category of the first image and the prediction category of the first image, the second path value from the reference category of the first image to the prediction category of the first image is determined to be 4, and according to the third formula of the above formula (2), the target path loss value is determined as loss′ = loss · log(SP) = 0.6 × log 4 ≈ 0.36.
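Formula (2) can be sketched as follows. The base-10 logarithm and the argument interface are assumptions inferred from the worked examples in the text (1 × log 4 ≈ 0.6 and 0.6 × log 4 ≈ 0.36); path values are taken as given rather than recomputed from a tree:

```python
# Sketch of the shortest-path-based loss in formula (2); interface and
# log base are assumptions inferred from the text's worked examples.

import math

def shortest_path_loss(loss, pred, gt, lca, root, sp_max, sp):
    if pred == gt:
        return loss                       # same category: keep the raw loss
    if lca == root:                       # nearest common category is the root
        return loss * math.log10(sp_max)  # penalize with the farthest path value
    return loss * math.log10(sp)          # otherwise use the actual path value

# First example from the text: lca = root ("beauty"), sp_max = 4, raw loss = 1
print(round(shortest_path_loss(1.0, "eyelash beautification", "haircut",
                               "beauty", "beauty", 4, None), 2))   # 0.6

# Second example: lca = "hairdressing" != root, sp = 4, raw loss = 0.6
print(round(shortest_path_loss(0.6, "hair washing", "haircut",
                               "hairdressing", "beauty", 4, 4), 2))  # 0.36
```

Scaling the raw loss by the (log of the) path distance makes a prediction in a far-away subtree cost more than a near-miss within the same subtree, which is the "rationality of the classification result" the text refers to.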
After the target path loss value is determined based on the above process, inputting the target path loss value into the first image classification model, updating parameters in the first image classification model based on the target path loss value so as to obtain updated parameters, and determining the target image classification model based on the updated parameters, wherein the target image classification model is used for image classification.
According to the method, the first image is subjected to smoothing processing based on the class hierarchical tree to obtain the probability that the first image belongs to each class, the initial image classification model is trained based on the first image and the probability that the first image belongs to each class to obtain the first image classification model, and the first image is subjected to smoothing processing, so that the obtained first image classification model can effectively prevent the condition of over-training fitting, and the accuracy of the first image classification model is improved. And updating the first image classification model according to the loss function based on the shortest path to obtain a target image classification model with higher accuracy, so that the accuracy of the target image classification model and the rationality of a classification result can be further improved when the target image classification model is used for image classification.
Fig. 5 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus includes:
an obtaining module 501, configured to obtain a target image to be classified and a target image classification model, where the target image classification model is obtained through training based on a category hierarchical tree, a training sample, and a shortest path-based loss function; the category hierarchical tree and the training sample are used to train an initial image classification model to obtain a first image classification model, and the shortest path-based loss function is used to update the first image classification model to obtain the target image classification model;
the identifying module 502 is configured to invoke the target image classification model to identify the target image, so as to obtain an image category corresponding to the target image.
In a possible implementation manner, the identifying module 502 is configured to invoke the target image classification model to identify the target image, so as to obtain a plurality of reference categories corresponding to the target image and probabilities of the reference categories;
and determining the reference category with the probability meeting the target requirement as the image category corresponding to the target image.
The device trains the initial image classification model based on the class hierarchy tree and the training samples to obtain a first image classification model, so that the generalization capability of the first image classification model is better, and the first image classification model can effectively prevent the condition of over-fitting training; and updating the first image classification model according to the loss function based on the shortest path to obtain a target image classification model, so that the classification accuracy of the target image classification model is better. When the target image classification model is adopted to determine the image category, the accuracy and the reasonability of the determined image category can be improved. Therefore, when the target image classification model is called to determine the image category corresponding to the target image, the determined category of the target image is more accurate.
Fig. 6 is a schematic structural diagram of a training apparatus for an image classification model according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:
an obtaining module 601, configured to obtain a category hierarchical tree and a training sample, where the category hierarchical tree includes a plurality of image categories and associations between the image categories, and the training sample includes a first image and a reference category of the first image;
a processing module 602, configured to perform class smoothing on the first image based on a reference class of the first image and a class hierarchical tree, to obtain a probability that the first image belongs to each first class, where the first class is any image class included in the class hierarchical tree;
a training module 603, configured to train an initial image classification model based on the first image and probabilities that the first image belongs to each first category, to obtain a first image classification model;
an updating module 604, configured to update the first image classification model based on the first image, the reference category of the first image, and the first image classification model through a loss function based on a shortest path, to obtain a target image classification model, where the target image classification model is used for image classification.
In a possible implementation, the updating module 604 is configured to determine a prediction category of the first image based on the first image and the first image classification model;
determining a target path loss value through the shortest path-based loss function based on the reference category of the first image and the prediction category of the first image;
and updating the first image classification model based on the target path loss value to obtain the target image classification model.
In a possible implementation manner, the updating module 604 is configured to determine a target path loss value loss′ according to the following formula through the shortest path-based loss function based on the reference category of the first image and the prediction category of the first image:
loss′ = loss,                  if pred = gt
loss′ = loss · log(SP_max),    if lca(pred, gt) = root
loss′ = loss · log(SP),        otherwise
Wherein pred is the prediction category of the first image, gt is the reference category of the first image, loss is the original path loss value obtained based on the prediction category of the first image and the reference category of the first image, lca is the target category commonly corresponding to the prediction category of the first image and the reference category of the first image, root is the root category of the category hierarchical tree, SP_max is the first path value corresponding to the category hierarchical tree, and SP is the second path value between the prediction category of the first image and the reference category of the first image.
In a possible implementation manner, the processing module 602 is configured to perform class smoothing on the first image according to the following formula based on the reference class of the first image and the class hierarchical tree, so as to obtain a probability P_i that the first image belongs to each first class:
P_i = 1 − e,            if i = gt
P_i = (α · e) / N_1,    if i is a category associated with gt
P_i = (β · e) / N_2,    if i is a category not associated with gt
Wherein i is the i-th class, gt is the reference class of the first image, α is the assignment proportion of the classes associated with the reference class of the first image, β is the assignment proportion of the classes not associated with the reference class of the first image, e is a penalty value whose value is in [0.1, 0.2], N_1 is the number of categories in the category hierarchical tree associated with the reference category of the first image, and N_2 is the number of categories in the category hierarchical tree not associated with the reference category of the first image.
According to the device, the first image is subjected to smoothing processing based on the class hierarchy tree to obtain the probability that the first image belongs to each class, the initial image classification model is trained based on the first image and the probability that the first image belongs to each class to obtain the first image classification model, and due to the fact that the first image is subjected to smoothing processing, the obtained first image classification model can effectively prevent the situation of over-training fitting, and the accuracy of the first image classification model is improved. And updating the first image classification model according to the loss function based on the shortest path to obtain a target image classification model with higher accuracy, so that the accuracy of the target image classification model and the rationality of a classification result can be further improved when the target image classification model is used for image classification.
It should be understood that the apparatuses provided in fig. 5 and fig. 6 are only illustrated by dividing the functional modules when implementing the functions thereof, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus provided in the above embodiments and the corresponding method embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 7 shows a block diagram of an electronic device 700 according to an exemplary embodiment of the present application. The electronic device 700 may be a portable mobile terminal, such as: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The electronic device 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, the electronic device 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 702 is used to store at least one instruction for execution by the processor 701 to implement the image classification method, the training method of the image classification model provided by the method embodiments in the present application.
In some embodiments, the electronic device 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, disposed on the front panel of the electronic device 700; in other embodiments, the number of the display screens 705 may be at least two, and the at least two display screens are respectively disposed on different surfaces of the electronic device 700 or are in a folding design; in other embodiments, the display 705 may be a flexible display disposed on a curved surface or on a folded surface of the electronic device 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used to collect sound waves of a user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 701 for processing or to the radio frequency circuit 704 to realize voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones disposed at different locations of the electronic device 700. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into a sound wave audible to humans, or convert an electrical signal into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the electronic device 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
The power supply 709 is used to supply power to various components in the electronic device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 709 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the electronic device 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the user with respect to the electronic device 700. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 713 may be disposed on a side frame of the electronic device 700 and/or under the display screen 705. When the pressure sensor 713 is disposed on a side frame of the electronic device 700, a holding signal of the user on the electronic device 700 may be detected, and the processor 701 may perform left-right hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed under the display screen 705, the processor 701 controls an operability control on the UI according to the pressure operation of the user on the display screen 705. The operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the electronic device 700. When a physical button or vendor Logo is provided on the electronic device 700, the fingerprint sensor 714 may be integrated with the physical button or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is adjusted down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the electronic device 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually decreases, the processor 701 controls the display screen 705 to switch from the bright screen state to the dark screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually increases, the processor 701 controls the display screen 705 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device 700 and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the image classification method or the training method of the image classification model described above.
Alternatively, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program or a computer program product is further provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor to implement the image classification method or the training method of the image classification model described above.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of image classification, the method comprising:
obtaining a target image to be classified and a target image classification model, wherein the target image classification model is obtained through training based on a class hierarchy tree, a training sample, and a shortest-path-based loss function, the class hierarchy tree and the training sample are used for training an initial image classification model to obtain a first image classification model, and the shortest-path-based loss function is used for updating the first image classification model to obtain the target image classification model;
and calling the target image classification model to identify the target image to obtain the image category corresponding to the target image.
2. The method according to claim 1, wherein the calling the target image classification model to identify the target image to obtain the image class corresponding to the target image comprises:
calling the target image classification model to identify the target image to obtain a plurality of reference classes corresponding to the target image and the probability of each reference class;
and determining the reference category with the probability meeting the target requirement as the image category corresponding to the target image.
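Claims 1–2 describe a standard predict-then-select inference step. The sketch below is non-authoritative: the toy model, the class names, and reading "the probability meeting the target requirement" as "the highest probability" are illustrative assumptions, not taken from the patent.

```python
# Sketch of the inference step in claims 1-2: invoke the trained target image
# classification model on a target image to get one probability per reference
# category, then select the category with the highest probability.

def classify(model, image, class_names):
    """Return (category, probability) for the class with the highest score."""
    probs = model(image)  # one probability per reference category
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

def toy_model(image):
    # Stand-in for the trained target image classification model.
    return [0.1, 0.7, 0.2]

label, p = classify(toy_model, None, ["cat", "dog", "bird"])
print(label, p)  # dog 0.7
```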
3. A method for training an image classification model, the method comprising:
obtaining a class hierarchy tree comprising a plurality of image classes and associations between the image classes, and a training sample comprising a first image and a reference class for the first image;
performing class smoothing processing on the first image based on the reference class of the first image and the class hierarchical tree to obtain the probability that the first image belongs to each first class, wherein the first class is any image class included in the class hierarchical tree;
training an initial image classification model based on the first image and the probability that the first image belongs to each first category to obtain a first image classification model;
and updating the first image classification model based on the first image, the reference category of the first image and the first image classification model through a loss function based on the shortest path to obtain a target image classification model, wherein the target image classification model is used for image classification.
4. The method of claim 3, wherein the updating the first image classification model based on the first image, the reference class of the first image, and the first image classification model by a shortest path-based loss function to obtain a target image classification model comprises:
determining a prediction class of the first image based on the first image and the first image classification model;
determining a target path loss value through the shortest path-based loss function based on the reference class of the first image and the prediction class of the first image;
and updating the first image classification model based on the target path loss value to obtain the target image classification model.
5. The method of claim 4, wherein the determining a target path loss value through the shortest-path-based loss function based on the reference class of the first image and the prediction class of the first image comprises:
determining a target path loss value loss' based on the reference class of the first image and the prediction class of the first image according to the following formula by the shortest path based loss function:
Figure FDA0002916105660000021
wherein pred is the prediction category of the first image, gt is the reference category of the first image, loss is the original path loss value obtained based on the prediction category of the first image and the reference category of the first image, lca is a target category corresponding to both the prediction category of the first image and the reference category of the first image, root is the root category of the category hierarchical tree, SPmax is a first path value corresponding to the category hierarchical tree, SP is a second path value between the prediction category of the first image and the reference category of the first image, and otherwise denotes the remaining case of the piecewise formula.
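The claim-5 formula itself appears in the source only as a figure reference, so the following is a hedged sketch of one plausible shortest-path-based loss consistent with the variables the claim names: SP is the tree distance between the prediction category (pred) and the reference category (gt), measured through their lowest common ancestor (lca), and SPmax normalises it. The child-to-parent dict encoding and the exact scaling rule are illustrative assumptions, not the patent's formula.

```python
# Shortest path between two categories in a class hierarchy tree, via their
# lowest common ancestor, then a loss scaled by that path value.

def ancestors(parent, node):
    """Path from node up to the root of the category hierarchy tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def shortest_path(parent, a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    up_a = ancestors(parent, a)
    b_depth = {node: i for i, node in enumerate(ancestors(parent, b))}
    for depth, node in enumerate(up_a):
        if node in b_depth:          # first shared ancestor is the lca
            return depth + b_depth[node]
    raise ValueError("categories are not in the same tree")

def path_loss(parent, pred, gt, loss, sp_max):
    """Leave loss unchanged when pred == gt; otherwise scale it by SP/SPmax."""
    if pred == gt:
        return loss
    return loss * (1 + shortest_path(parent, pred, gt) / sp_max)

# Tiny hierarchy: root -> animal -> {cat, dog}, root -> plant
parent = {"animal": "root", "plant": "root", "cat": "animal", "dog": "animal"}
print(shortest_path(parent, "cat", "dog"))    # 2
print(shortest_path(parent, "cat", "plant"))  # 3
```

Under this reading, a mistake between sibling leaves ("cat" vs "dog") is penalised less than one across distant branches ("cat" vs "plant"), which is the intuition behind a shortest-path-based loss.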
6. The method according to any one of claims 3 to 5, wherein the performing class smoothing on the first image based on the reference class of the first image and the class hierarchical tree to obtain the probability that the first image belongs to each first class comprises:
based on the reference category of the first image and the category hierarchical tree, performing category smoothing processing on the first image according to the following formula to obtain the probability Pi that the first image belongs to each first category:
Figure FDA0002916105660000031
Wherein i is the ith class, gt is the reference class of the first image, α is a valuation proportion of the classes associated with the reference class of the first image, β is a valuation proportion of the classes not associated with the reference class of the first image, e is a penalty value whose value lies in [0.1, 0.2], N1 is the number of categories in the category hierarchy tree associated with the reference class of the first image, and N2 is the number of categories in the category hierarchy tree that are not associated with the reference class of the first image.
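The claim-6 formula is likewise present only as a figure reference, so this sketches one plausible category smoothing consistent with the variables the claim names: the reference class gt keeps probability 1 − e, the N1 associated classes share a proportion α of the penalty e, and the N2 unassociated classes share the remaining proportion β. The exact split (here α + β = 1) is an illustrative assumption.

```python
# Hierarchical label smoothing over a category hierarchy: the hard label gt is
# softened so that categories near gt in the tree receive more probability
# mass than distant ones.

def smooth_labels(classes, associated, gt, e=0.1, alpha=0.7, beta=0.3):
    """Return {category: probability}; `associated` excludes gt itself."""
    n1 = len(associated)              # N1: classes associated with gt
    n2 = len(classes) - n1 - 1        # N2: classes not associated with gt
    probs = {}
    for c in classes:
        if c == gt:
            probs[c] = 1.0 - e
        elif c in associated:
            probs[c] = alpha * e / n1
        else:
            probs[c] = beta * e / n2
    return probs

p = smooth_labels(["cat", "dog", "plant", "rock"], {"dog"}, gt="cat")
print(round(sum(p.values()), 6))  # 1.0
```

With e = 0.1, "cat" keeps 0.9, the associated "dog" gets 0.07, and the two unassociated classes get 0.015 each, so the distribution still sums to 1.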
7. An image classification apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a target image to be classified and a target image classification model, wherein the target image classification model is obtained through training based on a class hierarchical tree, a training sample, and a shortest-path-based loss function, the class hierarchical tree and the training sample are used for training an initial image classification model to obtain a first image classification model, and the shortest-path-based loss function is used for updating the first image classification model to obtain the target image classification model;
and the identification module is used for calling the target image classification model to identify the target image so as to obtain the image category corresponding to the target image.
8. An apparatus for training an image classification model, the apparatus comprising:
an obtaining module, configured to obtain a category hierarchical tree and a training sample, the category hierarchical tree including a plurality of image categories and associations between the image categories, the training sample including a first image and a reference category of the first image;
a processing module, configured to perform class smoothing on the first image based on a reference class of the first image and the class hierarchical tree, to obtain a probability that the first image belongs to each first class, where the first class is any image class included in the class hierarchical tree;
the training module is used for training an initial image classification model based on the first image and the probability that the first image belongs to each first category to obtain a first image classification model;
and the updating module is used for updating the first image classification model based on the first image, the reference category of the first image and the first image classification model through a loss function based on the shortest path to obtain a target image classification model, and the target image classification model is used for image classification.
9. An electronic device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and wherein the at least one program code is loaded into and executed by the processor to implement the method for image classification according to claim 1 or 2, or to implement the method for training an image classification model according to any one of claims 3 to 6.
10. A computer-readable storage medium, having stored therein at least one program code, which is loaded and executed by a processor, to implement the method for image classification according to claim 1 or 2, or to implement the method for training an image classification model according to any one of claims 3 to 6.
CN202110101459.0A 2021-01-26 2021-01-26 Image classification method, training method, device and equipment of image classification model Active CN112766389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110101459.0A CN112766389B (en) 2021-01-26 2021-01-26 Image classification method, training method, device and equipment of image classification model


Publications (2)

Publication Number Publication Date
CN112766389A true CN112766389A (en) 2021-05-07
CN112766389B CN112766389B (en) 2022-11-29

Family

ID=75707394


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419378A (en) * 2022-03-28 2022-04-29 杭州未名信科科技有限公司 Image classification method and device, electronic equipment and medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170193328A1 (en) * 2015-12-31 2017-07-06 Microsoft Technology Licensing, Llc Structure and training for image classification
CN108090562A (en) * 2017-11-10 2018-05-29 华南师范大学 Deep learning method and nerve network system based on classification tree
CN108171254A (en) * 2017-11-22 2018-06-15 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108460428A (en) * 2018-04-11 2018-08-28 波奇(上海)信息科技有限公司 A kind of method and apparatus of pet image recognition
CN108764283A (en) * 2018-04-20 2018-11-06 北京达佳互联信息技术有限公司 A kind of the loss value-acquiring method and device of disaggregated model
CN110245700A (en) * 2019-06-10 2019-09-17 深圳前海达闼云端智能科技有限公司 Classification model construction method, classification model and object identification method
CN110378400A (en) * 2019-07-08 2019-10-25 北京三快在线科技有限公司 A kind of model training method and device for image recognition
GB201916689D0 (en) * 2019-11-15 2020-01-01 Five Ai Ltd Structure detection models
CN111160411A (en) * 2019-12-11 2020-05-15 东软集团股份有限公司 Classification model training method, image processing method, device, medium, and apparatus
CN111191741A (en) * 2020-01-15 2020-05-22 中国地质调查局发展研究中心 Rock classification constraint inheritance loss method of rock recognition deep learning model
CN111192262A (en) * 2020-01-03 2020-05-22 腾讯云计算(北京)有限责任公司 Product defect classification method, device, equipment and medium based on artificial intelligence
CN111325726A (en) * 2020-02-19 2020-06-23 腾讯医疗健康(深圳)有限公司 Model training method, image processing method, device, equipment and storage medium
US20200372400A1 (en) * 2019-05-22 2020-11-26 The Regents Of The University Of California Tree alternating optimization for learning classification trees
CN112015966A (en) * 2020-10-19 2020-12-01 北京神州泰岳智能数据技术有限公司 Image searching method and device, electronic equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI SHEN ET AL: "Multi-Level Discriminative Dictionary Learning With Application to Large Scale Image Classification", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
YUNTAO LIU ET AL: "Visual Tree Convolutional Neural Network in Image Classification", 《2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)》 *
韩超: "面向多义异构数据的分类算法研究", 《中国博士学位论文全文数据库信息科技辑》 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant