WO2022188080A1 - Image classification network model training method, image classification method, and related device - Google Patents

Image classification network model training method, image classification method, and related device

Info

Publication number
WO2022188080A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
class
category
image classification
image
Prior art date
Application number
PCT/CN2021/080087
Other languages
English (en)
Chinese (zh)
Inventor
王蕊
童学智
曲强
姜青山
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2022188080A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a training method of an image classification network model, an image classification method and related equipment.
  • Image classification is one of the most basic problems in the field of image processing technology.
  • In the related art, a deep neural network image classification method is mainly used; specifically, the image to be classified and the category label of the image to be classified are input into a deep neural network model to train the deep neural network model.
  • However, the predicted classification label of the image to be classified output by a deep neural network model obtained in the above manner may be erroneous and lacks interpretability.
  • the present application provides a training method for an image classification network model, an image classification method and related equipment.
  • the present application provides a training method of an image classification network model, the method comprising:
  • acquiring a training image and an external knowledge base, the external knowledge base including the true category label of the training image;
  • encoding the external knowledge base to obtain a category distance matrix;
  • inputting the training image, its true category label, and the category distance matrix into an image classification network model to obtain a predicted category probability distribution of the training image;
  • calculating a target loss function using the predicted category probability distribution and the depth distance, in the category distance matrix, between the true category label and the predicted category label; and
  • training the image classification network model based on the target loss function.
  • the present application provides an image classification method, the image classification method includes:
  • acquiring an image to be classified; inputting the image to be classified into the image classification network model to obtain a category label of the image to be classified; and evaluating the category label of the image to be classified to obtain an interpretability score.
  • the present application provides a terminal device, the device includes a memory and a processor coupled to the memory;
  • the memory is used to store program data
  • the processor is used to execute the program data to implement the above-mentioned training method for an image classification network model and/or the above-mentioned image classification method.
  • the present application also provides a computer storage medium, which is used for storing program data.
  • The beneficial effects of the present application are as follows: a training image and an external knowledge base are acquired, the external knowledge base including the real category label of the training image; the external knowledge base is encoded to obtain a category distance matrix; the training image, its real category label, and the category distance matrix are input into the image classification network model to obtain a predicted category probability distribution of the training image, the predicted category probability distribution including the difference probability between the predicted category label output by the image classification network model and the real category label; a target loss function is calculated using the predicted category probability distribution and the depth distance, in the category distance matrix, between the real category label and the predicted category label; and the network model is trained based on the target loss function.
  • The present application references an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the prediction results.
  • FIG. 1 is a schematic flowchart of an embodiment of a training method for an image classification network model provided by the present application
  • FIG. 2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the application;
  • FIG. 3 is a schematic flowchart of an embodiment of S102 in the training method of the image classification network model shown in FIG. 1;
  • FIG. 4 is a schematic flowchart of an embodiment of S104 in the training method of the image classification network model shown in FIG. 1;
  • FIG. 5 is a schematic flowchart of an embodiment of an image classification method provided by the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • FIG. 1 is a schematic flowchart of an embodiment of the training method for an image classification network model provided by the present application.
  • the training method of the image classification network model in this embodiment can be applied to an image classification apparatus.
  • the image classification apparatus of the present application may be a server, a mobile device, or a system in which a server and a mobile device cooperate with each other.
  • Each part included in the apparatus, such as each unit, subunit, module, and submodule, may be provided entirely in the server, entirely in the mobile device, or distributed between the server and the mobile device.
  • the above server may be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules, for example software or software modules for providing distributed servers, or as a single piece of software or software module, which is not specifically limited here.
  • S101 Acquire training images and an external knowledge base, where the external knowledge base includes ground-truth class labels of the training images.
  • If the image classification network model is trained simply using the training image and the real class label of the training image, the predicted classification label output by the resulting image classification network model may be erroneous and lacks interpretability.
  • the image classification apparatus of the present application refers to an external knowledge base to constrain the prediction results of the image classification network model.
  • FIG. 2 is a simple schematic diagram of an external knowledge base in the training method of the image classification network model provided by the present application.
  • The external knowledge base is a tree-like structure composed of multiple category labels; each node in the tree-like structure represents a category label, and the closer two nodes are in the tree-like structure, the more similar their category labels are.
  • the external knowledge base in this embodiment should include all the class labels that the image classification network model can distinguish.
  • the external knowledge base at least includes the real category labels of the training images.
  • a single external knowledge base may not be able to include ground-truth class labels for all training images.
  • the training method of the image classification network model of this embodiment can supplement the missing category labels in the single external knowledge base by manually extracting the category labels from the additional knowledge base.
  • the number of training images required in this embodiment should be as large as possible.
  • the number of training images is at least 1000.
  • The image classification apparatus of this embodiment should unify the pixel size of the training images, for example uniformly scaling them to 256 × 256, so that training images of the same pixel size can conveniently be used to train the image classification network model.
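The uniform-scaling step can be sketched in plain Python. This is a minimal nearest-neighbour rescale of a 2-D pixel grid, for illustration only; a real pipeline would use a library resampler (for example PIL or torchvision), and the function name and toy image below are not from the patent.

```python
def resize_nearest(img, size=256):
    """Nearest-neighbour rescale of a 2-D pixel grid (H x W) to size x size."""
    h, w = len(img), len(img[0])
    # Each output pixel (r, c) samples the nearest source pixel.
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

# Toy 2x2 "image", upscaled to 4x4.
thumb = resize_nearest([[1, 2], [3, 4]], size=4)
```

With this input, `thumb` is `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`: every training image ends up with the same pixel dimensions before being fed to the network.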
  • the external knowledge base needs to be encoded to obtain a category distance matrix.
  • the category distance matrix includes the depth distance between any two category labels in the external knowledge base, that is, the semantic distance.
  • this embodiment may adopt the embodiment in FIG. 3 to implement S102, which specifically includes S201 to S203:
  • S201: The image classification apparatus of this embodiment needs to obtain a class distance matrix including the depth distance between the real class label and the predicted class label. Specifically, to acquire the category distance matrix, the image classification device first needs to acquire any two category labels in the external knowledge base.
  • S202: The image classification apparatus obtains the common class label between the two class labels in the external knowledge base, that is, their lowest common ancestor.
  • The lowest common ancestor is the node that is an ancestor of both of the two category labels and whose depth is as large as possible.
  • S203 Calculate the depth distance of any two class labels based on the common class label, so as to obtain a class distance matrix including the depth distance of any two class labels.
  • the image classification apparatus of this embodiment uses the common class label to calculate the depth distance of any two class labels, so as to obtain a class distance matrix including the depth distance of any two class labels.
  • Specifically, the image classification device obtains the depth of the common class label, the depth of one class label, and the depth of the other class label; calculates the sum of the depths of the two class labels; and calculates the depth distance between the two class labels from the ratio of the depth of the common class label to that sum.
  • Specifically, the depth distance is based on the Wu-Palmer semantic similarity Wup and satisfies the following formulas:
  • Wup(c 1 , c 2 ) = 2 · depth(lcs(c 1 , c 2 )) / (depth(c 1 ) + depth(c 2 ))
  • d(c 1 , c 2 ) = 1 − Wup(c 1 , c 2 )
  • where c 1 and c 2 are two category labels in the external knowledge base; depth(c 1 ) is the depth of the category label c 1 ; depth(c 2 ) is the depth of the category label c 2 ; lcs(c 1 , c 2 ) is the common class label (lowest common ancestor) of the class labels c 1 and c 2 ; depth(lcs(c 1 , c 2 )) is the depth of the common class label; and d(c 1 , c 2 ) is the depth distance between the class labels c 1 and c 2 .
  • Specifically, the image classification apparatus of this embodiment determines the depth of the common class label, that is, its number of layers, by locating the label position of the common class label in the external knowledge base.
  • the depth acquisition method of the category label c 1 and the category label c 2 refers to the depth acquisition method of the common category label, which will not be repeated here.
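The steps S201 to S203 above can be sketched as follows. The toy taxonomy, label names, and function names are illustrative assumptions; only the distance formula d(c1, c2) = 1 − 2·depth(lcs(c1, c2)) / (depth(c1) + depth(c2)) follows the description above.

```python
# Toy taxonomy as a child -> parent map (the root has parent None).
# The tree shape and label names are illustrative, not from the patent.
PARENT = {
    "entity": None,
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal",
    "car": "vehicle",
}

def ancestors(label):
    """Chain from `label` up to the root, `label` included (deepest first)."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain

def depth(label):
    """Node-counting depth: the root has depth 1."""
    return len(ancestors(label))

def lcs(c1, c2):
    """Common class label: the lowest (deepest) common ancestor."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):   # walks upward, so the first hit is deepest
        if node in seen:
            return node

def wup_distance(c1, c2):
    """d(c1, c2) = 1 - Wup = 1 - 2*depth(lcs)/(depth(c1)+depth(c2))."""
    return 1.0 - 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

# Class distance matrix over the leaf labels.
labels = ["dog", "cat", "car"]
D = [[wup_distance(a, b) for b in labels] for a in labels]
```

Here d("dog", "cat") = 1 − 4/6 ≈ 0.33 while d("dog", "car") = 1 − 2/6 ≈ 0.67, so sibling labels in the taxonomy end up semantically closer than labels that share only the root.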
  • S103 Input the training image and its true category label and category distance matrix into an image classification network model to obtain the predicted category probability distribution of the training image.
  • the image classification apparatus of this embodiment inputs the training image, its real category label and category distance matrix into the image classification network model, and obtains the predicted category probability distribution of the training image.
  • the predicted class probability distribution includes the difference probability between the predicted class label output by the image classification network model and the real class label.
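The predicted class probability distribution p(k, l) is typically produced by applying a softmax to the network's raw class scores. The patent does not state this explicitly; the sketch below assumes the standard softmax formulation.

```python
import math

def softmax(logits):
    """Map raw network scores to a class probability distribution p(k, l)."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([2.0, 1.0, 0.1])  # one probability per class label
```

The result sums to 1 and preserves the ordering of the raw scores, which is what the ranking-based interpretability evaluation later in the document relies on.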
  • S104 Calculate the target loss function by using the depth distance between the true category label and the predicted category label in the category distance matrix and the predicted category probability distribution.
  • The loss function used in the existing image classification network model training method only has a loss value for the case where the predicted class label of the training image is consistent with the real class label; when the predicted class label is inconsistent with the real class label, the corresponding loss function value is 0. Therefore, the existing loss function ignores the impact of the inconsistent case on the training of the image classification network model, resulting in a predicted class probability distribution output by the image classification network model that is inconsistent with common sense.
  • the image classification network model training method of the present embodiment expands the loss function, and takes into account the influence of the inconsistency between the predicted category label of the training image and the real category label on the image classification network model.
  • the image classification apparatus of this embodiment uses the depth distance between the real class label and the predicted class label in the class distance matrix and the predicted class probability distribution to calculate the target loss function.
  • this embodiment may adopt the embodiment of FIG. 4 to implement S104, which specifically includes S301 to S304:
  • the target loss function in the image classification network model in this embodiment includes a first loss function and a second loss function.
  • the first loss function and the second loss function respectively represent different aspects of the network model.
  • the first loss function represents the loss between the predicted class probability distribution output by the image classification network model and the preset class probability distribution when the predicted class of the training image is consistent with the real class.
  • The second loss function represents, when the predicted category of the training image is inconsistent with the real category, the loss between the predicted category probability distribution output by the image classification network model and the depth distance, that is, the semantic distance, between the predicted category of the training image and the real category.
  • S302 Calculate a first loss function by using the predicted category probability distribution and the true category label.
  • Specifically, the image classification apparatus calculates the first loss function using the predicted class probability distribution and the true class label. The first loss function may be a cross-entropy loss function, which satisfies the following formula:
  • L CE (k) = −Σ l I(k, l) · log p(k, l)
  • where L CE (k) is the first loss function; p(k, l) is the predicted class probability, output by the image classification network model, that training image k belongs to class label l; and I(k, l) is an indicator function, which is 1 when the class label l is consistent with the real class label of training image k, and 0 when they are inconsistent.
  • S303 Calculate the second loss function by using the predicted class probability distribution and the true class label.
  • the image classification apparatus of this embodiment extends the first loss function to constrain the prediction results of other category labels except the true category label. Specifically, the image classification apparatus calculates the second loss function using the predicted class probability distribution and the true class label.
  • The second loss function satisfies the following formula:
  • L Sem (k) = Σ l d(t k , l) · p(k, l)
  • where L Sem (k) is the second loss function; t k is the true class label corresponding to training image k; d(t k , l) is the depth distance, in the class distance matrix, between the class label l and the true class label t k ; and p(k, l) is the predicted class probability output by the image classification network model.
  • S304 Calculate the target loss function based on the first loss function and the second loss function.
  • the image classification apparatus uses the first loss function and the second loss function to calculate the target loss function.
  • The objective loss function satisfies the following formula:
  • L(k) = L CE (k) + λ · L Sem (k)
  • where L(k) is the target loss function and λ is the weight coefficient, which is used to balance the first loss function and the second loss function to optimize the training of the image classification network model.
  • The image classification apparatus may use a grid search method to determine the weight coefficient λ.
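A per-image sketch of the target loss in plain Python. The additive combination of the two terms with weight λ is an assumption (the formula image is not reproduced in the text); the cross-entropy term and the semantic term follow the symbol definitions above.

```python
import math

def target_loss(p, true_idx, dist_row, lam=0.5):
    """Per-image target loss L(k) = L_CE(k) + lam * L_Sem(k).

    p        -- predicted class probability distribution p(k, l), sums to 1
    true_idx -- index of the true class label t_k
    dist_row -- d(t_k, l): depth distance from t_k to every label l
    lam      -- weight coefficient balancing the two loss terms
    NOTE: the additive form is an assumption, not the patent's verbatim formula.
    """
    ce = -math.log(p[true_idx])                     # first loss: cross-entropy
    sem = sum(d * q for d, q in zip(dist_row, p))   # second loss: expected depth distance
    return ce + lam * sem

loss = target_loss([0.7, 0.2, 0.1], true_idx=0, dist_row=[0.0, 1 / 3, 2 / 3])
```

Because the semantic term weights each wrong class by its depth distance, putting probability mass on a semantically near class is penalized less than putting it on a distant one; λ would be tuned by grid search as described above.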
  • The image classification apparatus of this embodiment trains the image classification network model based on the target loss function. Specifically, the image classification apparatus of this embodiment can use gradient descent to minimize the target loss function.
  • In the above solution, the image classification device references an external knowledge base to constrain the predicted category probability distribution output by the image classification network model, which improves the accuracy of image classification and enhances the interpretability of the prediction results. Calculating the target loss function using the predicted category probability distribution and the depth distance extends the existing loss function, and avoids the situation where the existing loss function ignores the inconsistency between the predicted category label of the training image and the real category label, which causes the predicted category probability distribution output by the image classification network model to be inconsistent with common sense.
  • FIG. 5 is a schematic flowchart of an embodiment of an image classification method provided by the present application.
  • the image classification method in this embodiment can be applied to the image classification network model trained in the above-mentioned training method of the image classification network model, so as to improve the accuracy of image classification and the interpretability of prediction results.
  • Taking a server used for the image classification method as an example, the image classification method provided by the present application is introduced below.
  • the image classification method in this embodiment specifically includes the following steps:
  • S401 Acquire an image to be classified.
  • the acquisition of the image to be classified in this embodiment is similar to the acquisition of the training image in the above-mentioned embodiment S101, and details are not repeated here.
  • S402 Input the image to be classified into the image classification network model to obtain the class label of the image to be classified.
  • the image classification apparatus of this embodiment inputs the image to be classified into the image classification network model, and obtains the class label of the image to be classified.
  • S403 Evaluate the category labels of the images to be classified to obtain an interpretability score.
  • In order to improve the accuracy of image classification and enhance the interpretability of the prediction results, in this embodiment it is necessary to evaluate the class labels of the images to be classified output by the image classification network model to obtain an interpretability score.
  • Specifically, the image to be classified is input into the image classification network model to obtain the class label ranking values and the class probability distribution, and the interpretability score is calculated using the class label ranking values, which include a first class label ranking value and a second class label ranking value.
  • the first category label ranking value is the difference probability ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category probability distribution.
  • the second category label ranking value is the depth distance ranking value between the category label of the image to be classified and the real category label of the image to be classified in the category distance matrix.
  • where r k, l is the ranking value, in the category probability distribution, of the category probability that the image to be classified k belongs to the category label l, and s k, l is the ranking value, in the category distance matrix, of the depth distance between the category label l and the real category label t k of the image to be classified.
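The text does not reproduce the exact interpretability-score formula, so the sketch below uses one plausible instantiation: the Spearman rank correlation between the probability ranking r k, l and the depth-distance ranking s k, l. The function names and the choice of Spearman correlation are assumptions, not the patent's definition.

```python
def ranks(values):
    """1-based ranks: the smallest value gets rank 1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def interpretability_score(probs, depth_dists):
    """Spearman correlation between the probability ranking (most probable
    class first) and the depth-distance ranking (semantically nearest first).
    A score of +1 means the model's confidence ordering matches the taxonomy.
    ASSUMPTION: Spearman is one plausible stand-in for the patent's score."""
    n = len(probs)
    r = ranks([-p for p in probs])   # r_{k,l}: higher probability -> lower rank
    s = ranks(depth_dists)           # s_{k,l}: smaller depth distance -> lower rank
    d2 = sum((a - b) ** 2 for a, b in zip(r, s))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A prediction whose confidence ordering agrees with the taxonomy (for example, the true class most probable and its taxonomic siblings next) scores near +1, while a common-sense-violating ordering scores near −1.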
  • In the above solution, the image to be classified is acquired and input into the image classification network model to obtain the class label of the image to be classified, and the class label of the image to be classified is evaluated to obtain an interpretability score, so as to improve the accuracy of image classification and enhance the interpretability of the prediction results.
  • FIG. 6 is a schematic structural diagram of an embodiment of the terminal device provided by the present application.
  • the terminal device 600 includes a memory 61 and a processor 62, wherein the memory 61 and the processor 62 are coupled.
  • the memory 61 is used for storing program data
  • the processor 62 is used for executing the program data to implement the image classification network model training method and/or the image classification method in the above-mentioned embodiments.
  • The processor 62 may also be referred to as a CPU (Central Processing Unit).
  • the processor 62 may be an integrated circuit chip with signal processing capability.
  • The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • A general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor or the like.
  • the present application also provides a computer storage medium 700.
  • The computer storage medium 700 is used to store program data 71, and when the program data 71 is executed by a processor, it is used to implement the image classification network model training method and/or the image classification method described in the embodiments of the present application.
  • The methods involved in the embodiments of the image classification network model training method and/or the image classification method of the present application, when implemented in the form of software functional units and sold or used as independent products, may be stored in a device, for example a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the present application, in essence, or the parts contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an image classification network model training method, an image classification method, and a related device. The image classification network model training method comprises the steps of: acquiring a training image and an external knowledge base, the external knowledge base comprising a real class label of the training image; encoding the external knowledge base to obtain a class distance matrix; inputting the training image and its real class label and the class distance matrix into an image classification network model to obtain a predicted class probability distribution of the training image, the predicted class probability distribution comprising a difference probability between a predicted class label output by the image classification network model and the real class label; calculating a target loss function using the depth distance, in the class distance matrix, between the real class label and the predicted class label and the predicted class probability distribution; and training the network model based on the target loss function. The present invention is used to obtain an image classification network model that both improves image classification accuracy and improves the interpretability of prediction results.
PCT/CN2021/080087 2021-03-08 2021-03-10 Procédé d'entraînement de modèle de réseau de classification d'images, procédé de classification d'images et dispositif associé WO2022188080A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110249741.3A CN112949724A (zh) 2021-03-08 2021-03-08 图像分类网络模型的训练方法、图像分类方法及相关设备
CN202110249741.3 2021-03-08

Publications (1)

Publication Number Publication Date
WO2022188080A1 true WO2022188080A1 (fr) 2022-09-15

Family

ID=76229599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080087 WO2022188080A1 (fr) 2021-03-08 2021-03-10 Procédé d'entraînement de modèle de réseau de classification d'images, procédé de classification d'images et dispositif associé

Country Status (2)

Country Link
CN (1) CN112949724A (fr)
WO (1) WO2022188080A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901647A (zh) * 2021-09-24 2022-01-07 成都飞机工业(集团)有限责任公司 一种零件工艺规程编制方法、装置、存储介质及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147700A (zh) * 2018-05-18 2019-08-20 腾讯科技(深圳)有限公司 视频分类方法、装置、存储介质以及设备
CN110929807A (zh) * 2019-12-06 2020-03-27 腾讯科技(深圳)有限公司 图像分类模型的训练方法、图像分类方法及装置
CN111353542A (zh) * 2020-03-03 2020-06-30 腾讯科技(深圳)有限公司 图像分类模型的训练方法、装置、计算机设备和存储介质
WO2020185198A1 (fr) * 2019-03-08 2020-09-17 Google Llc Ensemble rcnn insensible au bruit pour la détection d'objets semi-supervisés

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147700A (zh) * 2018-05-18 2019-08-20 腾讯科技(深圳)有限公司 视频分类方法、装置、存储介质以及设备
WO2020185198A1 (fr) * 2019-03-08 2020-09-17 Google Llc Ensemble rcnn insensible au bruit pour la détection d'objets semi-supervisés
CN110929807A (zh) * 2019-12-06 2020-03-27 腾讯科技(深圳)有限公司 图像分类模型的训练方法、图像分类方法及装置
CN111353542A (zh) * 2020-03-03 2020-06-30 腾讯科技(深圳)有限公司 图像分类模型的训练方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN112949724A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2020182019A1 (fr) Procédé de recherche d'image, appareil, dispositif, et support de stockage lisible par ordinateur
US20170372169A1 (en) Method and apparatus for recognizing image content
CN103678702B (zh) 视频去重方法及装置
KR20180011221A (ko) 비디오들에 대해 대표 비디오 프레임들 선택
CN106919957B (zh) 处理数据的方法及装置
CN107209861A (zh) 使用否定数据优化多类别多媒体数据分类
CN110674312B (zh) 构建知识图谱方法、装置、介质及电子设备
WO2023115761A1 (fr) Procédé et appareil de détection d'événement basés sur un graphe de connaissances temporelles
CN112329460B (zh) 文本的主题聚类方法、装置、设备及存储介质
CN108959474B (zh) 实体关系提取方法
CN111612041A (zh) 异常用户识别方法及装置、存储介质、电子设备
CN105989001B (zh) 图像搜索方法及装置、图像搜索系统
CN116049412B (zh) 文本分类方法、模型训练方法、装置及电子设备
CN112131322B (zh) 时间序列分类方法及装置
JP2023536773A (ja) テキスト品質評価モデルのトレーニング方法及びテキスト品質の決定方法、装置、電子機器、記憶媒体およびコンピュータプログラム
WO2022188080A1 (fr) Procédé d'entraînement de modèle de réseau de classification d'images, procédé de classification d'images et dispositif associé
JP7259935B2 (ja) 情報処理システム、情報処理方法およびプログラム
CN110959157B (zh) 加速大规模相似性计算
CN114372518B (zh) 一种基于解题思路和知识点的试题相似度计算方法
US9122705B1 (en) Scoring hash functions
CN114511715A (zh) 一种驾驶场景数据挖掘方法
CN114528908A (zh) 网络请求数据分类模型训练方法、分类方法及存储介质
JP5824429B2 (ja) スパムアカウントスコア算出装置、スパムアカウントスコア算出方法、及びプログラム
CN111984812A (zh) 一种特征提取模型生成方法、图像检索方法、装置及设备
CN116228484B (zh) 基于量子聚类算法的课程组合方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21929558

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21929558

Country of ref document: EP

Kind code of ref document: A1