WO2020047921A1 - Deep metric learning method based on hierarchical triplet loss function and device thereof - Google Patents

Deep metric learning method based on hierarchical triplet loss function and device thereof Download PDF

Info

Publication number
WO2020047921A1
WO2020047921A1 PCT/CN2018/108405 CN2018108405W WO2020047921A1 WO 2020047921 A1 WO2020047921 A1 WO 2020047921A1 CN 2018108405 W CN2018108405 W CN 2018108405W WO 2020047921 A1 WO2020047921 A1 WO 2020047921A1
Authority
WO
WIPO (PCT)
Prior art keywords
hierarchical
loss function
triplet loss
category
distance
Prior art date
Application number
PCT/CN2018/108405
Other languages
English (en)
French (fr)
Inventor
黄伟林
戈维峰
董登科
斯科特·马修·罗伯特
Original Assignee
深圳码隆科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳码隆科技有限公司 filed Critical 深圳码隆科技有限公司
Publication of WO2020047921A1 publication Critical patent/WO2020047921A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks

Definitions

  • the present application relates to the field of image recognition technology, and more particularly, to a deep metric learning method and device based on a hierarchical triplet loss function.
  • in statistics, statistical decision theory and economics, a loss function is a function that maps an event (an element of a sample space) onto a real number expressing the economic cost or opportunity cost associated with that event. More generally, in statistics a loss function measures the degree of loss or error (where such losses relate to a "wrong" estimate, such as a cost or a loss of equipment).
  • in training deep convolutional neural networks, the triplet loss method is generally adopted: the final classification layer of the convolutional neural network is removed, and the feature codes normalized by the triplet loss function are used directly.
  • the present application provides a deep metric learning method and device based on the hierarchical triplet loss function to solve the shortcomings of the prior art.
  • this application provides a deep metric learning method based on a hierarchical triplet loss function, including:
  • the inter-class distance between any two classes is obtained based on the triplet loss function, and a hierarchical category tree is constructed;
  • based on the hierarchical category tree, the triplet loss function is hierarchized through the inter-class distances to obtain a hierarchical triplet loss function;
  • a neural network is trained based on the hierarchical triplet loss function, target image extraction features are extracted, and an image search is performed according to the target image extraction features to obtain a target search image.
  • the obtaining the inter-class distance between any two classes based on the triplet loss function, and constructing a hierarchical category tree includes:
  • Calculation is performed through the data hierarchical structure to obtain the inter-class distance between any two classes, and a hierarchical category tree is constructed based on the inter-class distance.
  • the calculating through the data hierarchical structure to obtain an inter-class distance between any two classes includes:
  • any two categories are defined as the p-th category and the q-th category, and the inter-class distance between them is calculated by the formula $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$ (where $n_p$ and $n_q$ are the numbers of samples in the two categories, $C_p$ and $C_q$ are their sample sets, and $f(\cdot)$ is the deep feature normalized to unit length); this formula represents the average distance between the p-th category and the q-th category, that is, the inter-class distance;
  • the range of the distance between classes is 0 to 4.
  • the hierarchical category tree includes multiple levels; wherein the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes a plurality of leaf nodes; each leaf node is an image category of the corresponding level;
  • the constructing a hierarchical category tree by using the distance between the classes includes:
  • the merge threshold is set according to the formula $d_l=d_0+\frac{l\,(4-d_0)}{L}$, where $d_l$ is the threshold for merging any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged; and $d_0$ is the average intra-class distance.
  • the hierarchizing of the triplet loss function through the inter-class distances based on the hierarchical category tree, to obtain a hierarchical triplet loss function, includes:
  • searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures to form the hierarchical triplet loss function includes: for each triplet formed by the training pictures, calculating the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the present application also provides a deep metric learning device based on a hierarchical triplet loss function, including: a construction module, a hierarchy module, and a training module;
  • the building module is configured to obtain a distance between any two classes based on a triplet loss function, and construct a hierarchical category tree;
  • the hierarchical module is configured to hierarchize the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function;
  • the training module is configured to train a neural network based on the hierarchical triplet loss function, extract target image extraction features, and perform an image search according to the target image extraction features to obtain a target search image.
  • the building module includes:
  • a training unit configured to use a standard triplet loss function to obtain a triplet neural network model
  • a hierarchical unit configured to obtain a hierarchical structure of data according to the triplet neural network model
  • a construction unit is configured to perform calculation through the data hierarchical structure to obtain an inter-class distance between any two classes, and construct a hierarchical category tree by using the inter-class distance.
  • the building unit includes:
  • the calculation subunit is configured to define any two categories, namely the p-th category and the q-th category, the inter-class distance between which is calculated by the formula $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$;
  • this formula represents the average distance between the p-th category and the q-th category, which is the inter-class distance.
  • the range of the distance between classes is 0 to 4.
  • the hierarchical category tree includes multiple levels; wherein the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes a plurality of leaf nodes; each leaf node is an image category of the corresponding level;
  • the building unit further includes:
  • a merge subunit configured to merge the leaf nodes according to the inter-class distance; wherein the merge of the leaf nodes is performed by setting the merge threshold to construct the hierarchical category tree;
  • the merge threshold is set according to the formula $d_l=d_0+\frac{l\,(4-d_0)}{L}$, where $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
  • the hierarchical module includes:
  • a first extraction unit configured to extract a leaf node at level 0 in the hierarchical category tree as a target leaf node
  • a selection unit configured to select, based on the inter-class distance, the nearest-neighbor categories corresponding to the target leaf nodes as anchor categories;
  • a second extraction unit configured to randomly extract pictures from each of the anchor categories to form training pictures;
  • a constituent unit configured to search for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and constitute the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the constituent unit is specifically configured to: for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary and constitute the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the present application also provides a user terminal including a memory and a processor, where the memory is configured to store a deep metric learning program based on a hierarchical triplet loss function, and the processor runs the deep metric learning program based on the hierarchical triplet loss function to cause the user terminal to execute the deep metric learning method based on the hierarchical triplet loss function as described above.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a deep metric learning program based on a hierarchical triplet loss function, and the deep metric learning program based on the hierarchical triplet loss function, when executed by a processor, implements the deep metric learning method based on the hierarchical triplet loss function as described above.
  • a deep metric learning method and device based on a hierarchical triplet loss function provided by the present application.
  • the method provided in the present application includes: obtaining the inter-class distance between any two classes based on a triplet loss function to construct a hierarchical category tree; based on the hierarchical category tree, hierarchizing the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function; and training a neural network based on the hierarchical triplet loss function to extract target image extraction features, and performing an image search according to the target image extraction features so as to obtain a target search image.
  • a hierarchical category tree is constructed in advance, and a hierarchical triplet loss function is obtained based on the hierarchical category tree.
  • the neural network is then trained by using the hierarchical triplet loss function, features are extracted, and an image search is performed, which overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application;
  • FIG. 4 is a detailed schematic flowchart of step S130 in the second embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application;
  • FIG. 5 is a schematic flowchart of a third embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application;
  • FIG. 6 is a detailed schematic flowchart of step S240 in the third embodiment of a deep metric learning method based on a hierarchical triplet loss function of the present application;
  • FIG. 7 is a schematic diagram of functional modules of a deep metric learning device based on a hierarchical triplet loss function of the present application.
  • first and second are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, the features defined as “first” and “second” may explicitly or implicitly include one or more of the features. In the description of the present application, the meaning of "a plurality” is two or more, unless specifically defined otherwise.
  • the terms "installed", "connected", "coupled" and "fixed" should be understood broadly unless otherwise specified and limited; for example, they may be fixed connections, removable connections, or integral connections; they may be mechanical connections or electrical connections; they may be direct connections or indirect connections through an intermediate medium, and may be the internal communication between two elements or the interaction between two elements.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment of a terminal involved in a solution according to an embodiment of the present application.
  • the terminal may be a PC, or a mobile terminal device with a certain computing capability, such as a smart phone, a tablet computer, or a portable computer.
  • the terminal may include a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen, an input unit such as a keyboard, a remote control, and the optional user interface 1003 may further include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a stable memory, such as a magnetic disk memory.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • the terminal may further include an RF (Radio Frequency) circuit, an audio circuit, a WiFi module, and the like.
  • the mobile terminal may be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which will not be repeated here.
  • the terminal shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or fewer components than those shown in the figure, or combine some components, or arrange different components.
  • the memory 1005 as a computer-readable storage medium may include an operating system, a data interface control program, a network connection program, and a deep metric learning program based on a hierarchical triplet loss function.
  • the running environment of the deep metric learning method based on the hierarchical triplet loss function may also be the following environment: the deep learning software Caffe running on an NVIDIA TITAN X GPU with 12 GB of GPU memory, using the GoogleNet V2 deep neural network structure, where the neural network is first pre-trained on the ImageNet training set.
  • a deep metric learning method and device based on a hierarchical triplet loss function provided by the present application.
  • the method constructs a hierarchical category tree in advance, and obtains a hierarchical triplet loss function based on the hierarchical category tree.
  • the neural network training is performed through the hierarchical triplet loss function, features are extracted, and an image search is performed, which overcomes the shortcoming of the samples in the existing triplet loss function algorithm being too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
  • a first embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function, including:
  • Step S100 The processor obtains the distance between any two classes based on the triplet loss function, and constructs a hierarchical category tree;
  • the deep metric learning method based on the hierarchical triplet loss function provided in this application is applied to deep learning for image search tasks or face recognition tasks.
  • This algorithm can encode the global context information through a predefined hierarchical tree and collect representative training samples (triples), thereby overcoming the main shortcoming of the triple loss function, that is, the selection of training samples is too random .
  • the triple loss function is the standard triple loss function.
  • the loss function refers to a function that maps an event (an element in a sample space) to a real number expressing the economic cost or opportunity cost associated with its event.
  • the corresponding library for deep learning may be a picture library.
  • the picture library may contain different sets of categories, and each category contains different pictures.
  • the data of all categories is hierarchically formed by the triplet loss function, and then the distance between each two classes is obtained, that is, the distance between the classes, and a hierarchical category tree is constructed.
  • Step S200 Based on the hierarchical category tree, the processor hierarchizes a triplet loss function through the distance between the classes to obtain a hierarchical triplet loss function;
  • the triplet loss function is hierarchically layered in the hierarchical category tree, and then a hierarchical triplet loss function is obtained, which can be used for further training of the neural network.
  • step S300 the processor trains a neural network based on the hierarchical triplet loss function, extracts and obtains target image extraction features, and performs image search according to the target image extraction features to obtain a target search image.
  • the neural network is trained by using the hierarchical triplet loss function to obtain the target image extraction features of the images in the image library, and then an image search based on the features can be performed to obtain the target search image.
  • the target image extraction features are compared with the image features of the images in the image library to obtain the similarity values corresponding to the images in the image library; all pictures in the image library are then sorted according to the similarity values, so as to find the target result to be found in the image search task or image recognition task, that is, the target search image.
  • a hierarchical category tree is constructed in advance, and a hierarchical triplet loss function is obtained based on the hierarchical category tree. Then, the neural network is trained by using the hierarchical triplet loss function, features are extracted and an image search is performed.
  • this overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
  • a second embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function.
  • in step S100, the processor obtains the inter-class distance between any two classes based on the triplet loss function and constructs a hierarchical category tree, including:
  • step S110 the processor uses a standard triplet loss function to train to obtain a triplet neural network model.
  • Step S120 The processor obtains a hierarchical data structure according to the triplet neural network model.
  • a globalized hierarchical structure at the category level is constructed; given a neural network model trained in advance with a standard triplet loss function, a hierarchical structure of the data is then obtained through the neural network model (according to a specific rule, namely a rule in which any two categories are continuously and recursively merged through a set threshold).
  • step S130 the processor performs calculations through the data hierarchical structure to obtain the distance between any two classes, and constructs a hierarchical category tree based on the distance between the classes.
  • step S130 the processor performs calculation through the data hierarchical structure to obtain an inter-class distance between any two classes, including:
  • Step S131 any two categories are defined as the p-th and q-th categories, and the inter-class distance between them is calculated by the formula $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$;
  • this formula represents the average distance between the p-th and q-th categories, which is the inter-class distance;
  • the range of the distance between classes is 0-4.
  • as defined above, the distance between the p-th category and the q-th category is calculated by the above formula, which represents the average distance between the training samples of the p-th and q-th categories.
  • because the deep features (the features of the deep neural network, which in this embodiment are the features extracted for the target images) have been normalized to unit length, the inter-class distance between any two classes takes values in the range 0-4.
  • the inter-class distance obtained through the foregoing steps constitutes a hierarchical category tree.
  • the hierarchical category tree includes multiple levels; wherein the average intra-class distance is used as the merge threshold of the 0th level (the initial level of the hierarchical category tree is 0); the hierarchical category tree further includes a plurality of leaf nodes; each leaf node is an image category of the corresponding level;
  • step S130 constructing a hierarchical category tree by using the distance between classes includes:
  • step S132 the leaf nodes are merged according to the distance between the classes; wherein the merge of the leaf nodes is performed by setting the merge threshold to construct the hierarchical category tree;
  • the hierarchical category tree includes multiple leaf nodes, and each leaf node is a corresponding initial image category, where each image category represents a leaf node at level 0; further, all leaf nodes are continuously and recursively merged by using the inter-class distances obtained in the foregoing steps, so as to build the hierarchical category tree.
  • the hierarchical category tree can be divided into L levels, and the average intra-class distance $d_0$ is used as the merge threshold for merging at level 0. Specifically, the average intra-class distance is computed through the formula $d_0=\frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c(n_c-1)}\sum_{i,j\in C_c,\,i\neq j}\left\|f(x_i)-f(x_j)\right\|_2^2$, which is used to compute the merge threshold in the recursive merging.
  • these leaf nodes are continuously merged according to the merge threshold, where the merge threshold at the l-th level of the hierarchical category tree is set to $d_l=d_0+\frac{l\,(4-d_0)}{L}$; if the distance between any two classes is less than $d_l$, the two classes are further merged.
  • the number of nodes in the l layer is N l .
  • Nodes are continuously merged from level 0 to level L.
  • the construction generates a hierarchical category tree. This constructed hierarchical category tree obtains the relationships between different object categories in the entire data set and is updated after the appropriate number of iterations.
  • calculation is performed through a data hierarchical structure to obtain an inter-class distance between any two classes, and a hierarchical category tree is constructed based on the inter-class distance.
  • constructing a hierarchical tree can provide a global distribution of training data, guide the selection of training samples and training rules, and greatly improve convergence speed and accuracy.
  • a fourth embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function.
  • in step S200, based on the hierarchical category tree, the processor hierarchizes the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function, including:
  • Step S210 The processor extracts a leaf node at level 0 in the hierarchical category tree as a target leaf node;
  • the collection of triplet functions is converted into a hierarchical triplet loss function.
  • Anchor-neighbor sampling is required, and l′ target leaf nodes are randomly selected at the 0th level of the constructed hierarchical tree, where each target leaf node represents an initial category.
  • the selection at level 0 of the hierarchical category tree is to maintain the diversity of images in each small batch. In this way, batch normalization will be more stable and accurate.
  • Step S220 Based on the distance between the classes, the processor selects a corresponding nearest neighbor category as the anchor category for the target leaf node;
  • based on the inter-class distances, m-1 nearest-neighbor classes are additionally selected for each anchor category chosen in the preceding step, so that similar classes are guaranteed to be placed in the same mini-batch, thereby enhancing the discriminative ability of the neural network.
  • Step S230 The processor randomly extracts pictures from each of the anchor categories to form training pictures; specifically, t pictures are randomly selected from each anchor category, finally forming n = l′mt training pictures.
  • Step S240 The processor searches for a dynamic loss boundary corresponding to the triple formed by the training picture, and forms a hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • step S240 the processor searches for a dynamic loss boundary corresponding to the triple formed by the training picture to form the hierarchical triple loss function, including:
  • Step S241 For each triple formed by the training pictures, calculate a category relationship between the anchor category and the negative sample category through the hierarchical category tree, and obtain the dynamic loss boundary to be related to the dynamic The hierarchical triplet loss function corresponding to the loss boundary.
  • This dynamic loss boundary is the difference between this algorithm and the existing fixed boundary algorithm.
  • the hierarchical triplet loss function can be of the form $\mathcal{L}_H=\frac{1}{2|\mathcal{T}|}\sum_{T_z\in\mathcal{T}}\left[\left\|f(x_a)-f(x_p)\right\|_2^2-\left\|f(x_a)-f(x_n)\right\|_2^2+\alpha_z\right]_+$.
  • $\alpha_z$ in the formula is the dynamic loss boundary, which is substantially different from the fixed loss boundary in the existing traditional triplet loss function. It is obtained by calculating the category relationship between the anchor category $y_a$ and the negative sample category $y_n$ on the constructed hierarchical category tree. In particular, for a triplet $T_z$, the loss boundary is calculated as $\alpha_z=\beta+d_{\mathcal{H}(y_a,y_n)}-s_{y_a}$, where $\beta$ (= 0.1) is a fixed parameter that encourages image categories to be separated more clearly in the current iteration than in the previous iteration, $\mathcal{H}(y_a,y_n)$ is the level of the tree at which the two categories are merged, $d_{\mathcal{H}(y_a,y_n)}$ is the merge threshold used to merge the two categories, and $s_{y_a}$ is the average distance within the samples of class $y_a$.
  • in the hierarchical triplet loss function of this embodiment, considered in terms of distance, each sample is encouraged to push away nearby samples of different categories and to pull closer samples of the same category.
  • the training picture is obtained by sampling the nearest neighbors of the anchor points, and then the dynamic loss boundary is obtained for the triples of the training picture, so as to obtain the hierarchical triplet loss function corresponding to each triplet.
  • the algorithm of the dynamic loss boundary in the embodiment makes the obtained hierarchical triplet loss function more accurate and has higher calculation efficiency.
  • the present application also provides a deep metric learning device based on a hierarchical triplet loss function, including: a construction module 10, a hierarchy module 20, and a training module 30;
  • the building module 10 is configured to obtain a distance between any two classes based on a triplet loss function, and construct a hierarchical category tree;
  • the hierarchical module 20 is configured to hierarchize the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function;
  • the training module 30 is configured to train a neural network based on the hierarchical triplet loss function, extract target image extraction features, and perform an image search according to the target image extraction features to obtain a target search image.
  • the building module includes: a training unit, a hierarchical unit, and a building unit.
  • the training unit is configured to use a standard triplet loss function to train to obtain a triplet neural network model;
  • the hierarchical unit is configured to obtain a data hierarchical structure based on the triplet neural network model;
  • the construction unit is configured to perform calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and construct a hierarchical category tree based on the inter-class distances.
  • the construction unit includes a calculation subunit configured to define any two categories, namely a p-th category and a q-th category, the inter-class distance between which is calculated by the formula $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$; this formula represents the average distance between the p-th category and the q-th category, which is the inter-class distance.
  • the value of the distance between classes ranges from 0 to 4.
  • the hierarchical category tree includes multiple levels; wherein the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes multiple leaf nodes; each leaf node is an image category of the corresponding level;
  • the construction unit further includes a merging sub-unit configured to merge leaf nodes according to the distance between classes; wherein the merging of the leaf nodes is performed by setting a merge threshold to construct a hierarchical category tree;
  • the merge threshold is set according to the formula $d_l=d_0+\frac{l\,(4-d_0)}{L}$, where $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
  • the hierarchical module includes a first extraction unit, a second extraction unit, a selection unit, and a constituent unit.
  • the first extraction unit is configured to extract the 0th-level leaf nodes in the hierarchical category tree as the target leaf nodes; the selection unit is configured to select, based on the inter-class distances, the nearest-neighbor categories corresponding to the target leaf nodes as anchor categories; the second extraction unit is configured to randomly extract pictures from each anchor category to form training pictures; the constituent unit is configured to search for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and constitute the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the constituent unit is specifically configured to: for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary and constitute the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  • the present application also provides a user terminal including a memory and a processor, where the memory is configured to store a deep metric learning program based on a hierarchical triplet loss function, and the processor runs the deep metric learning program based on the hierarchical triplet loss function to cause the user terminal to execute the deep metric learning method based on the hierarchical triplet loss function as described above.
  • the present application also provides a computer-readable storage medium storing a deep metric learning program based on a hierarchical triplet loss function, where the deep metric learning program based on the hierarchical triplet loss function, when executed by a processor, implements the deep metric learning method based on the hierarchical triplet loss function described above.
  • a horizontal comparison test experiment is performed using a clothing image retrieval library and a fine-grained classification test library.
  • experimental subjects: 1. existing algorithms used for comparison: FashionNet+Joints, FashionNet+Poselets, FashionNet, HDC, BIER, LiftedStruct, Binomial Deviance; 2. the baseline of the algorithm provided in this application: Ours Baseline (Semi Hard Negative Mining); 3. a part of the algorithm provided in this application, the anchor-neighbor sampling algorithm: A-N Sampling; 4. the deep metric learning method based on a hierarchical triplet loss function provided in this application: HTL.
  • the recall rate R@# is used (# represents the specified rank number, covering that rank and all earlier ranks; for example, R@10 represents the recall rate within the top 10, and R@30 represents the recall rate within the top 30) to measure the accuracy with which the target search image appears in the top 1, 10, 20, 30, 40 and 50 of the search results (comparison of the accuracy of the latest algorithms on the sold-clothing retrieval data set), or in the top 1, 2, 4, 8, 16 and 32 (comparison of image retrieval accuracy on the CUB-200-2011 fine-grained bird classification data set).
  • HTL improves performance by introducing global data distribution. HTL is 18.8% higher than HDC on Recall @ 1 and 4% higher than BIER.
  • the deep metric learning method (algorithm) based on the hierarchical triplet loss function provided in this application overcomes the shortcomings of too random samples in the existing triplet loss function algorithm, and performs learning, searching, and recognition tasks. Is fast, efficient, and greatly improves accuracy.
  • the methods in the above embodiments can be implemented by means of software plus a necessary universal hardware platform, and of course, also by hardware, but in many cases the former is better.
  • based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application.
  • a hierarchical category tree is constructed in advance, and a hierarchical triplet loss function is obtained based on the hierarchical category tree. Then, the neural network is trained by using the hierarchical triplet loss function, features are extracted and an image search is performed, which overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a deep metric learning method based on a hierarchical triplet loss function and a device thereof. The method includes: constructing a hierarchical category tree based on the triplet loss function; hierarchizing the triplet loss function to obtain a hierarchical triplet loss function; training a deep neural network with the hierarchical triplet loss function; and extracting target image extraction features and performing an image search so as to obtain a target search image. By constructing the hierarchical category tree in advance, obtaining the hierarchical triplet loss function based on the hierarchical category tree, and then training the neural network with the hierarchical triplet loss function to extract features and perform image search, the present application overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.

Description

Deep metric learning method based on hierarchical triplet loss function and device thereof
This application claims priority to Chinese patent application No. 201811044820.5, filed with the Chinese Patent Office on September 7, 2018 and entitled "Deep metric learning method based on hierarchical triplet loss function and device thereof", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image recognition technology, and more particularly, to a deep metric learning method based on a hierarchical triplet loss function and a device thereof.
Background
In statistics, statistical decision theory and economics, a loss function is a function that maps an event (an element of a sample space) onto a real number expressing the economic cost or opportunity cost associated with that event. More generally, in statistics a loss function measures the degree of loss or error (where such losses relate to a "wrong" estimate, such as a cost or a loss of equipment). In training deep convolutional neural networks, the triplet loss method is generally adopted: the final classification layer of the convolutional neural network is removed, and the feature codes normalized by the triplet loss function are used directly.
Existing deep learning methods that apply the triplet loss function to convolutional neural network learning suffer, when applied to image search tasks and face recognition tasks, from the drawback that the samples are too random, which results in slow speed, low efficiency and poor accuracy.
Summary
In view of this, the present application provides a deep metric learning method based on a hierarchical triplet loss function and a device thereof to remedy the deficiencies of the prior art.
To solve the above problem, the present application provides a deep metric learning method based on a hierarchical triplet loss function, including:
obtaining the inter-class distance between any two classes based on a triplet loss function, and constructing a hierarchical category tree;
based on the hierarchical category tree, hierarchizing the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function;
training a neural network based on the hierarchical triplet loss function, extracting target image extraction features, and performing an image search according to the target image extraction features to obtain a target search image.
Preferably, obtaining the inter-class distance between any two classes based on the triplet loss function and constructing the hierarchical category tree includes:
training with a standard triplet loss function to obtain a triplet neural network model;
obtaining a data hierarchical structure according to the triplet neural network model;
performing calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and constructing the hierarchical category tree by using the inter-class distances.
Preferably, performing calculation through the data hierarchical structure to obtain the inter-class distance between any two classes includes:
defining any two categories as the p-th category and the q-th category, the inter-class distance between which is calculated by the following formula:
$d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$
where this formula represents the average distance between the p-th category and the q-th category, that is, the inter-class distance;
Preferably, the inter-class distance takes values in the range 0 to 4.
Preferably, the hierarchical category tree includes multiple levels, where the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes multiple leaf nodes, each leaf node being an image category of the corresponding level;
constructing the hierarchical category tree by using the inter-class distances includes:
merging the leaf nodes according to the inter-class distances, where the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
the merge threshold being set according to the following formula:
$d_l=d_0+\frac{l\,(4-d_0)}{L}$
where $d_l$ is the threshold for merging any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged; and $d_0$ is the average intra-class distance,
$d_0=\frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c(n_c-1)}\sum_{i,j\in C_c,\,i\neq j}\left\|f(x_i)-f(x_j)\right\|_2^2$
Preferably, hierarchizing the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain the hierarchical triplet loss function includes:
extracting the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes;
based on the inter-class distances, selecting for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories;
randomly extracting pictures from each of the anchor categories to form training pictures;
searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
Preferably, searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures and forming the hierarchical triplet loss function includes:
for each triplet formed by the training pictures, calculating the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
In addition, to solve the above problem, the present application further provides a deep metric learning device based on a hierarchical triplet loss function, including: a construction module, a hierarchy module and a training module;
the construction module is configured to obtain the inter-class distance between any two classes based on a triplet loss function and construct a hierarchical category tree;
the hierarchy module is configured to hierarchize the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function;
the training module is configured to train a neural network based on the hierarchical triplet loss function, extract target image extraction features, and perform an image search according to the target image extraction features so as to obtain a target search image.
Preferably, the construction module includes:
a training unit configured to train with a standard triplet loss function to obtain a triplet neural network model;
a hierarchy unit configured to obtain a data hierarchical structure according to the triplet neural network model;
a construction unit configured to perform calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and construct the hierarchical category tree by using the inter-class distances.
Preferably, the construction unit includes:
a calculation subunit configured to define any two categories as the p-th category and the q-th category, the inter-class distance between which is calculated by the following formula:
$d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$
where this formula represents the average distance between the p-th category and the q-th category, that is, the inter-class distance.
Preferably, the inter-class distance takes values in the range 0 to 4.
Preferably, the hierarchical category tree includes multiple levels, where the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes multiple leaf nodes, each leaf node being an image category of the corresponding level;
the construction unit further includes:
a merging subunit configured to merge the leaf nodes according to the inter-class distances, where the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
the merge threshold being set according to the following formula:
$d_l=d_0+\frac{l\,(4-d_0)}{L}$
where $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
Preferably, the hierarchy module includes:
a first extraction unit configured to extract the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes;
a selection unit configured to select, based on the inter-class distances, for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories;
a second extraction unit configured to randomly extract pictures from each of the anchor categories to form training pictures;
a forming unit configured to search for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
Preferably, the forming unit is specifically configured to:
for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
In addition, to solve the above problem, the present application further provides a user terminal including a memory and a processor, where the memory is configured to store a deep metric learning program based on a hierarchical triplet loss function, and the processor runs the deep metric learning program based on the hierarchical triplet loss function to cause the user terminal to execute the deep metric learning method based on the hierarchical triplet loss function described above.
In addition, to solve the above problem, the present application further provides a computer-readable storage medium storing a deep metric learning program based on a hierarchical triplet loss function, where the deep metric learning program based on the hierarchical triplet loss function, when executed by a processor, implements the deep metric learning method based on the hierarchical triplet loss function described above.
The present application provides a deep metric learning method based on a hierarchical triplet loss function and a device thereof. The method provided by the present application includes: obtaining the inter-class distance between any two classes based on a triplet loss function and constructing a hierarchical category tree; based on the hierarchical category tree, hierarchizing the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function; training a neural network based on the hierarchical triplet loss function, extracting target image extraction features, and performing an image search according to the target image extraction features so as to obtain a target search image. By constructing the hierarchical category tree in advance, obtaining the hierarchical triplet loss function based on the hierarchical category tree, and then training the neural network with the hierarchical triplet loss function to extract features and perform image search, the present application overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the solution of an embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 3 is a schematic flowchart of a second embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 4 is a detailed schematic flowchart of step S130 in the second embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 5 is a schematic flowchart of a third embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 6 is a detailed schematic flowchart of step S240 in the third embodiment of the deep metric learning method based on a hierarchical triplet loss function of the present application;
FIG. 7 is a schematic diagram of the functional modules of the deep metric learning device based on a hierarchical triplet loss function of the present application.
The realization of the objectives, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
Embodiments of the present application are described in detail below, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "a plurality" means two or more, unless otherwise specifically defined.
In the present application, unless otherwise expressly specified and limited, the terms "installed", "connected", "coupled", "fixed" and the like should be understood broadly; for example, they may be a fixed connection, a detachable connection, or an integral connection; they may be a mechanical connection or an electrical connection; they may be a direct connection or an indirect connection through an intermediate medium, and may be the internal communication between two elements or the interaction between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to the specific circumstances.
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit the present application.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of the hardware operating environment of the terminal involved in the solution of an embodiment of the present application.
The terminal in the embodiments of the present application may be a PC, or a mobile terminal device with a certain computing capability, such as a smart phone, a tablet computer or a portable computer.
As shown in FIG. 1, the terminal may include a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard or a remote control; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable memory such as a magnetic disk memory; optionally, the memory 1005 may also be a storage device independent of the foregoing processor 1001. Optionally, the terminal may further include an RF (Radio Frequency) circuit, an audio circuit, a WiFi module, and the like. In addition, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which will not be repeated here.
Those skilled in the art will understand that the terminal shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or fewer components than shown, or combine certain components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a data interface control program, a network connection program, and a deep metric learning program based on a hierarchical triplet loss function.
In addition, the running environment of the deep metric learning method based on the hierarchical triplet loss function provided in the present application may also be the following environment:
the deep learning software Caffe, running on an NVIDIA TITAN X GPU with 12 GB of GPU memory; the deep neural network structure used is GoogleNet V2, and the neural network is first pre-trained on the ImageNet training set.
The present application provides a deep metric learning method based on a hierarchical triplet loss function and a device thereof. The method constructs a hierarchical category tree in advance, obtains a hierarchical triplet loss function based on the hierarchical category tree, and then trains the neural network with the hierarchical triplet loss function to extract features and perform image search, which overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
Embodiment 1:
Referring to FIG. 2, a first embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function, including:
Step S100: the processor obtains the inter-class distance between any two classes based on a triplet loss function, and constructs a hierarchical category tree;
In the present application, the provided deep metric learning method based on a hierarchical triplet loss function is applied to deep learning for image search tasks or face recognition tasks. The algorithm can encode global context information through a predefined hierarchical tree and collect representative training samples (triplets), thereby overcoming the main drawback of the triplet loss function, namely that the selection of training samples is too random.
The triplet loss function above is the standard triplet loss function. A loss function is a function that maps an event (an element of a sample space) onto a real number expressing the economic cost or opportunity cost associated with that event.
The library used for deep learning may be a picture library. In this embodiment, the picture library may contain different sets of categories, and each category contains different pictures.
All category data are hierarchized through the triplet loss function, the distance between every two classes, that is, the inter-class distance, is then obtained, and the hierarchical category tree is constructed from it.
Step S200: based on the hierarchical category tree, the processor hierarchizes the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function;
According to the inter-class distances, the triplet loss function is hierarchized in the hierarchical category tree to obtain the hierarchical triplet loss function, which can be used for further training of the neural network.
Step S300: the processor trains a neural network based on the hierarchical triplet loss function, extracts target image extraction features, and performs an image search according to the target image extraction features so as to obtain a target search image.
By training the neural network with the hierarchical triplet loss function, the target image extraction features of the images in the image library are obtained, and an image search based on these features can then be performed so as to obtain the target search image.
Specifically, after the target image extraction features are obtained, they are compared with the image features of the images in the image library to obtain the similarity values corresponding to the images in the image library; all pictures in the image library are then sorted according to the similarity values, and the target result to be found in the image search task or image recognition task, that is, the target search image, is found according to the ranking.
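As an illustration of this retrieval step, the following is a minimal sketch in Python, assuming that the target image extraction feature and the library features have already been produced by the trained network and normalized to unit length; the function and variable names are illustrative and not part of the present application:

```python
import numpy as np

def search_images(query_feat, gallery_feats, top_k=10):
    """Rank library images by similarity to the query feature.

    query_feat:    (D,) L2-normalized feature of the query image.
    gallery_feats: (N, D) L2-normalized features of the image library.
    Returns the indices and similarity values of the top_k most similar images.
    """
    # For unit-length features, the squared Euclidean distance equals 2 - 2*cosine,
    # so ranking by cosine similarity is equivalent to ranking by distance.
    sims = gallery_feats @ query_feat          # (N,) cosine similarities
    order = np.argsort(-sims)                  # most similar first
    return order[:top_k], sims[order[:top_k]]
```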
In this embodiment, a hierarchical category tree is constructed in advance, a hierarchical triplet loss function is obtained based on the hierarchical category tree, and the neural network is then trained with the hierarchical triplet loss function to extract features and perform image search, which overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
Embodiment 2:
Referring to FIG. 3 and FIG. 4, a second embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function. Based on the first embodiment shown in FIG. 2, step S100, in which the processor obtains the inter-class distance between any two classes based on the triplet loss function and constructs a hierarchical category tree, includes:
Step S110: the processor trains with a standard triplet loss function to obtain a triplet neural network model.
Step S120: the processor obtains a data hierarchical structure according to the triplet neural network model.
In this embodiment, a globalized hierarchical structure at the category level is constructed; given a neural network model trained in advance with the standard triplet loss function, the hierarchical structure of the data is then obtained through this neural network model (according to a specific rule, namely the rule by which any two categories are continuously and recursively merged through a set threshold).
Step S130: the processor performs calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and constructs the hierarchical category tree by using the inter-class distances.
In step S130, the processor performing calculation through the data hierarchical structure to obtain the inter-class distance between any two classes includes:
Step S131: define any two categories as the p-th and q-th categories, the inter-class distance between which is calculated by the following formula:
$d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$
This formula represents the average distance between the p-th and q-th categories, that is, the inter-class distance.
Further, the inter-class distance takes values in the range 0-4.
As defined above, the distance between the p-th category and the q-th category is calculated by the above formula, which represents the average distance between the training samples of the p-th and q-th categories. Because the deep features (the features of the deep neural network, which in this embodiment are the features extracted for the target images) have been normalized to unit length, the inter-class distance between any two classes takes values in the range 0-4. The inter-class distances obtained through the foregoing steps then constitute the hierarchical category tree.
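A minimal sketch of this inter-class distance computation follows, assuming the unit-length deep features of each class are stored in separate arrays; since ‖a − b‖² = 2 − 2a·b for unit vectors, the values lie in the stated range 0-4 (the names below are illustrative):

```python
import numpy as np

def inter_class_distance(feats_p, feats_q):
    """Average squared Euclidean distance between class p and class q.

    feats_p: (n_p, D) L2-normalized deep features of class p.
    feats_q: (n_q, D) L2-normalized deep features of class q.
    """
    sq_dists = 2.0 - 2.0 * feats_p @ feats_q.T     # (n_p, n_q), each entry in [0, 4]
    return float(sq_dists.mean())

def distance_matrix(class_feats):
    """Pairwise inter-class distance matrix from a list of per-class feature arrays."""
    C = len(class_feats)
    D = np.zeros((C, C))
    for p in range(C):
        for q in range(p + 1, C):
            D[p, q] = D[q, p] = inter_class_distance(class_feats[p], class_feats[q])
    return D
```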
The hierarchical category tree includes multiple levels, where the average intra-class distance is used as the merge threshold of the 0th level (the initial level of the hierarchical category tree is 0); the hierarchical category tree further includes multiple leaf nodes, each leaf node being an image category of the corresponding level;
In step S130, constructing the hierarchical category tree by using the inter-class distances includes:
Step S132: merge the leaf nodes according to the inter-class distances, where the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
the merge threshold being set according to the following formula: $d_l=d_0+\frac{l\,(4-d_0)}{L}$, where $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
The hierarchical category tree includes multiple leaf nodes, and each leaf node is a corresponding initial image category, where each image category represents one leaf node at level 0; all leaf nodes are then continuously and recursively merged by using the inter-class distances obtained in the foregoing steps, so as to construct the hierarchical category tree.
The hierarchical category tree can be divided into L levels, and the average intra-class distance $d_0$ is used as the merge threshold for merging at level 0. Specifically, the average intra-class distance is calculated through the formula $d_0=\frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c(n_c-1)}\sum_{i,j\in C_c,\,i\neq j}\left\|f(x_i)-f(x_j)\right\|_2^2$, which is used within the recursive merging to compute the merge threshold.
These leaf nodes are then continuously merged according to the merge threshold, which at the l-th level of the hierarchical category tree is set to $d_l=d_0+\frac{l\,(4-d_0)}{L}$; if the distance between any two classes is less than $d_l$, the two classes are further merged.
The number of nodes at level l is $N_l$. Nodes are continuously merged from level 0 to level L. Finally, the hierarchical category tree is generated. This constructed hierarchical category tree captures the relationships between the different object categories in the entire data set and is updated after an appropriate number of iterations.
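One possible realization of this recursive merging is sketched below; it assumes the reconstructed threshold rule d_l = d_0 + l(4 − d_0)/L and average-linkage distances between merged nodes, which are assumptions made for illustration rather than a definitive implementation of the patented method:

```python
import numpy as np

def build_hierarchy(D, d0, num_levels):
    """Build a hierarchical category tree by recursive threshold merging.

    D:          (C, C) inter-class distance matrix (values in [0, 4]).
    d0:         average intra-class distance, the level-0 merge threshold.
    num_levels: number of levels L in the tree.
    Returns one list of nodes per level; each node is a set of original class ids.
    """
    nodes = [{c} for c in range(len(D))]           # level-0 leaf nodes (one per class)
    levels = [list(nodes)]
    for l in range(1, num_levels + 1):
        d_l = d0 + l * (4.0 - d0) / num_levels     # merge threshold at level l
        merged = True
        while merged:                              # keep merging until no pair is below d_l
            merged = False
            for i in range(len(nodes)):
                for j in range(i + 1, len(nodes)):
                    # average-linkage distance between the two candidate nodes
                    dist = np.mean([D[a, b] for a in nodes[i] for b in nodes[j]])
                    if dist < d_l:
                        nodes[i] = nodes[i] | nodes[j]
                        del nodes[j]
                        merged = True
                        break
                if merged:
                    break
        levels.append(list(nodes))
    return levels
```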
In this embodiment, calculation is performed through the data hierarchical structure to obtain the inter-class distance between any two classes, and the hierarchical category tree is constructed based on the inter-class distances. Constructing the hierarchical tree can provide the global distribution of the training data, guide the selection of training samples and the training rules, and greatly improve the convergence speed and accuracy.
Embodiment 3:
Referring to FIG. 5 and FIG. 6, a fourth embodiment of the present application provides a deep metric learning method based on a hierarchical triplet loss function. Based on the third embodiment shown in FIG. 4, step S200, in which the processor hierarchizes the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function, includes:
Step S210: the processor extracts the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes;
In this embodiment, the collection of triplets is converted into the hierarchical triplet loss function. Anchor-neighbor sampling is required: l′ target leaf nodes are randomly selected at the 0th level of the constructed hierarchical tree, where each target leaf node represents an initial category. Selecting at level 0 of the hierarchical category tree is intended to maintain the diversity of images in each mini-batch, so that batch normalization will be more stable and accurate.
Step S220: based on the inter-class distances, the processor selects for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories;
Based on the inter-class distances, for each anchor category selected in the preceding step, the m-1 nearest-neighbor categories are additionally selected, so that similar categories are guaranteed to be placed in the same mini-batch, thereby enhancing the discriminative ability of the neural network.
Step S230: the processor randomly extracts pictures from each of the anchor categories to form training pictures;
For each anchor category, t pictures are randomly selected from it, finally forming n = l′mt training pictures.
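A sketch of this anchor-neighbor sampling step is given below, assuming the inter-class distance matrix from the previous steps and a mapping from class ids to image ids; l′, m and t follow the text above, while the helper names are illustrative:

```python
import random
import numpy as np

def sample_batch(D, class_to_images, l_prime, m, t):
    """Anchor-neighbor sampling of one mini-batch of roughly n = l' * m * t pictures.

    D:               (C, C) inter-class distance matrix.
    class_to_images: dict mapping class id -> list of image ids of that class.
    """
    C = D.shape[0]
    anchors = random.sample(range(C), l_prime)      # l' level-0 leaf nodes (anchor categories)
    batch = []
    for a in anchors:
        # the anchor class itself (distance 0) plus its m-1 nearest-neighbor classes
        chosen = np.argsort(D[a])[:m]
        for c in chosen:
            imgs = class_to_images[int(c)]
            batch.extend(random.sample(imgs, min(t, len(imgs))))
    return batch
```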
Step S240: the processor searches for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and forms the hierarchical triplet loss function corresponding to the dynamic loss boundary.
Further, in step S240, the processor searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures and forming the hierarchical triplet loss function includes:
Step S241: for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
In this embodiment, a dynamic loss boundary is introduced; this dynamic loss boundary is what distinguishes this algorithm from existing fixed-boundary algorithms.
For triplets generated with the dynamic loss boundary, the hierarchical triplet loss function may take the following form:
$\mathcal{L}_H=\frac{1}{2|\mathcal{T}|}\sum_{T_z\in\mathcal{T}}\left[\left\|f(x_a)-f(x_p)\right\|_2^2-\left\|f(x_a)-f(x_n)\right\|_2^2+\alpha_z\right]_+$
In this formula, $\alpha_z$ is the dynamic loss boundary, which is substantially different from the fixed loss boundary in the existing traditional triplet loss function. It is obtained by calculating, on the constructed hierarchical category tree, the category relationship between the anchor category $y_a$ and the negative sample category $y_n$. In particular, for a triplet $T_z$, the loss boundary $\alpha_z$ is calculated as
$\alpha_z=\beta+d_{\mathcal{H}(y_a,y_n)}-s_{y_a}$
where $\beta$ (= 0.1) is a fixed parameter that encourages image categories to be separated more clearly in the current iteration than in the previous iteration, $\mathcal{H}(y_a,y_n)$ is the level height value of the tree at which the two categories are merged, $d_{\mathcal{H}(y_a,y_n)}$ is the merge threshold used to merge the two categories, and $s_{y_a}=\frac{1}{n_{y_a}(n_{y_a}-1)}\sum_{i,j\in C_{y_a},\,i\neq j}\left\|f(x_i)-f(x_j)\right\|_2^2$ is the average distance within the samples of the $y_a$-th class.
In the hierarchical triplet loss function of this embodiment, considered in terms of distance, each sample is encouraged to push away nearby samples of different categories and to pull closer samples of the same category.
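The following numpy sketch shows how the dynamic loss boundary and the per-triplet hinge term described above could be computed, under the reconstructed form α_z = β + d_H(y_a, y_n) − s_{y_a}; the precomputed level thresholds, merge levels and intra-class distances are assumed inputs, and the names are illustrative:

```python
import numpy as np

def dynamic_margin(level_thresholds, merge_level, intra_class_dist, y_a, y_n, beta=0.1):
    """alpha_z = beta + d_{H(y_a, y_n)} - s_{y_a} for one triplet.

    level_thresholds: sequence whose entry l is the merge threshold d_l of level l.
    merge_level:      (C, C) integer array; entry [y_a, y_n] is the tree level at
                      which classes y_a and y_n are merged.
    intra_class_dist: (C,) array of average intra-class distances s_y.
    """
    h = merge_level[y_a, y_n]
    return beta + level_thresholds[h] - intra_class_dist[y_a]

def hierarchical_triplet_term(f_a, f_p, f_n, alpha_z):
    """Hinge term on squared distances with the dynamic margin alpha_z."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(0.0, d_ap - d_an + alpha_z)
```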
In this embodiment, the training pictures are obtained through anchor-neighbor sampling, and the dynamic loss boundary is then obtained for the triplets of the training pictures, so that the hierarchical triplet loss function corresponding to each triplet is obtained. The dynamic-loss-boundary algorithm of this embodiment makes the resulting hierarchical triplet loss function more accurate and the computation more efficient.
In addition, referring to FIG. 7, the present application further provides a deep metric learning device based on a hierarchical triplet loss function, including: a construction module 10, a hierarchy module 20 and a training module 30;
the construction module 10 is configured to obtain the inter-class distance between any two classes based on a triplet loss function and construct a hierarchical category tree;
the hierarchy module 20 is configured to hierarchize the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function;
the training module 30 is configured to train a neural network based on the hierarchical triplet loss function, extract target image extraction features, and perform an image search according to the target image extraction features so as to obtain a target search image.
As a preferred solution, the construction module includes a training unit, a hierarchy unit and a construction unit. The training unit is configured to train with a standard triplet loss function to obtain a triplet neural network model; the hierarchy unit is configured to obtain a data hierarchical structure according to the triplet neural network model; the construction unit is configured to perform calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and construct the hierarchical category tree by using the inter-class distances.
Preferably, the construction unit includes a calculation subunit configured to define any two categories as the p-th category and the q-th category, the inter-class distance between which is calculated by the formula $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$, which represents the average distance between the p-th category and the q-th category, that is, the inter-class distance.
Preferably, the inter-class distance takes values in the range 0 to 4. Further, the hierarchical category tree includes multiple levels, where the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further includes multiple leaf nodes, each leaf node being an image category of the corresponding level;
The construction unit further includes a merging subunit configured to merge the leaf nodes according to the inter-class distances, where the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
the merge threshold being set according to the formula $d_l=d_0+\frac{l\,(4-d_0)}{L}$, where $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
As a preferred solution, the hierarchy module includes a first extraction unit, a second extraction unit, a selection unit and a forming unit.
The first extraction unit is configured to extract the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes; the selection unit is configured to select, based on the inter-class distances, for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories; the second extraction unit is configured to randomly extract pictures from each anchor category to form training pictures; the forming unit is configured to search for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
Preferably, the forming unit is specifically configured to: for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
In addition, the present application further provides a user terminal including a memory and a processor, where the memory is configured to store a deep metric learning program based on a hierarchical triplet loss function, and the processor runs the deep metric learning program based on the hierarchical triplet loss function to cause the user terminal to execute the deep metric learning method based on the hierarchical triplet loss function described above.
In addition, the present application further provides a computer-readable storage medium storing a deep metric learning program based on a hierarchical triplet loss function, where the deep metric learning program based on the hierarchical triplet loss function, when executed by a processor, implements the deep metric learning method based on the hierarchical triplet loss function described above.
Horizontal comparison test experiments:
Based on the deep metric learning method based on the hierarchical triplet loss function provided in the present application, horizontal comparison test experiments are carried out using a clothing image retrieval library and a fine-grained classification test library.
Experimental subjects:
1. Existing algorithms used for comparison: FashionNet+Joints, FashionNet+Poselets, FashionNet, HDC, BIER, LiftedStruct, Binomial Deviance;
2. The baseline of the algorithm provided in the present application: Ours Baseline (Semi Hard Negative Mining);
3. A part of the algorithm provided in the present application, the anchor-neighbor sampling algorithm: A-N Sampling;
4. The deep metric learning method based on the hierarchical triplet loss function provided in the present application: HTL.
Experimental method:
Based on the existing algorithms for comparison, the baseline algorithm, the anchor-neighbor sampling algorithm and the deep metric learning method based on the hierarchical triplet loss function listed above, searches are performed on different data sets containing test image libraries, and it is then compared whether the target image ranks near the top of the obtained search results. The recall rate R@# is used (# denotes the specified rank number, covering that rank and all earlier ranks; for example, R@10 denotes the recall rate within the top 10, and R@30 the recall rate within the top 30) to measure the accuracy with which the target search image appears in the top 1, 10, 20, 30, 40 and 50 of the search results (comparison of the accuracy of the latest algorithms on the sold-clothing retrieval data set), or in the top 1, 2, 4, 8, 16 and 32 (comparison of image retrieval accuracy on the fine-grained bird classification data set CUB-200-2011).
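A sketch of the Recall@K measure used here, assuming query and gallery features are unit-normalized and labels are available as integer arrays (leave-one-out retrieval within a single test set is handled analogously); the names are illustrative:

```python
import numpy as np

def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels,
                ks=(1, 10, 20, 30, 40, 50)):
    """Fraction of queries whose top-k retrieved images contain at least one correct match."""
    sims = query_feats @ gallery_feats.T                 # cosine similarity for unit features
    ranked = np.argsort(-sims, axis=1)                   # best match first, per query
    hits = gallery_labels[ranked] == query_labels[:, None]
    return {k: float(np.mean(hits[:, :k].any(axis=1))) for k in ks}
```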
Results and discussion
1. Referring to Table 1 below, the accuracy of the latest algorithm results on the sold-clothing retrieval data set is compared for the existing algorithms used for comparison, the baseline algorithm, the anchor-neighbor sampling algorithm and the deep metric learning method based on the hierarchical triplet loss function; in the table, the R@ column lists the different compared algorithms, and the row gives the rank of the target image in the search results.
HTL outperforms the existing algorithms used for comparison by 18.6% on Recall@1, which demonstrates that the triplet loss function can greatly improve the discriminative ability of deep features. Unlike the current state-of-the-art algorithms HDC and BIER, which improve performance through feature ensembling, HTL improves performance by introducing the global data distribution. HTL is 18.8% higher than HDC on the Recall@1 metric and 4% higher than BIER.
2. Referring to Table 2 below, the accuracy of image retrieval results on the fine-grained bird classification data set CUB-200-2011 is compared for the existing algorithms used for comparison, the baseline algorithm, the anchor-neighbor sampling algorithm and the deep metric learning method based on the hierarchical triplet loss function; in the table, the R@ column lists the different compared algorithms, and the row gives the rank of the target image in the search results.
On the Caltech-UCSD Birds 200-2011 fine-grained bird classification data set, the Semi Hard Negative Mining triplet loss function implemented here obtains the current best results; with the HTL algorithm, the result reaches 57.1% Recall@1, which is 3.5% higher than HDC and 1.8% higher than BIER.
In summary, the deep metric learning method (algorithm) based on the hierarchical triplet loss function provided in the present application overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.
Table 1. Comparison of algorithm result accuracy on the sold-clothing retrieval data set
R@ 1 10 20 30 40 50
FashionNet+Joints 41.0 64.0 68.0 71.0 73.0 73.5
FashionNet+Poselets 42.0 65.0 70.0 72.0 72.0 75.0
FashionNet 53.0 73.0 76.0 77.0 79.0 80.0
HDC 62.1 84.9 89.0 91.2 92.3 93.1
BIER 76.9 92.8 95.2 96.2 96.7 97.1
Ours Baseline 62.3 85.1 89.0 91.1 92.4 93.4
A-N Sampling 75.3 91.8 94.3 96.2 96.7 97.5
HTL 80.9 94.3 95.8 97.2 97.4 97.8
Table 2. Comparison of image retrieval result accuracy on the fine-grained bird classification data set (CUB-200-2011)
R@ 1 2 4 8 16 32
LiftedStruct 47.2 58.9 70.2 80.2 89.3 93.2
Binomial Deviance 52.8 64.4 74.7 83.9 90.4 94.3
Histogram Loss 50.3 61.9 72.6 82.4 88.8 93.7
N-Pair-Loss 51.0 63.3 74.3 83.2 - -
HDC 53.6 65.7 77.0 85.6 91.5 95.5
BIER 55.3 67.2 76.9 85.1 91.7 95.5
Our Baseline 55.9 68.4 78.2 86.0 92.2 95.5
HTL 57.1 68.8 78.7 86.5 92.5 95.5
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and are not intended to limit the present application; for those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made by using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.
Industrial applicability:
By constructing a hierarchical category tree in advance, obtaining a hierarchical triplet loss function based on the hierarchical category tree, and then training the neural network with the hierarchical triplet loss function to extract features and perform image search, the present application overcomes the shortcoming of the existing triplet loss function algorithm that the samples are too random; learning, search and recognition tasks are performed quickly and efficiently, and the accuracy is greatly improved.

Claims (16)

  1. A deep metric learning method based on a hierarchical triplet loss function, characterized by comprising:
    obtaining the inter-class distance between any two classes based on a triplet loss function, and constructing a hierarchical category tree;
    based on the hierarchical category tree, hierarchizing the triplet loss function through the inter-class distances to obtain a hierarchical triplet loss function;
    training a neural network based on the hierarchical triplet loss function, extracting target image extraction features, and performing an image search according to the target image extraction features to obtain a target search image.
  2. The deep metric learning method based on a hierarchical triplet loss function according to claim 1, characterized in that obtaining the inter-class distance between any two classes based on the triplet loss function and constructing the hierarchical category tree comprises:
    training with a standard triplet loss function to obtain a triplet neural network model;
    obtaining a data hierarchical structure according to the triplet neural network model;
    performing calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and constructing the hierarchical category tree by using the inter-class distances.
  3. The deep metric learning method based on a hierarchical triplet loss function according to claim 2, characterized in that performing calculation through the data hierarchical structure to obtain the inter-class distance between any two classes comprises:
    defining any two categories as the p-th category and the q-th category, the inter-class distance between which is calculated by the following formula:
    $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$
    wherein this formula represents the average distance between the p-th category and the q-th category, that is, the inter-class distance.
  4. The deep metric learning method based on a hierarchical triplet loss function according to claim 2 or 3, characterized in that the inter-class distance takes values in the range 0 to 4.
  5. The deep metric learning method based on a hierarchical triplet loss function according to claim 4, characterized in that the hierarchical category tree comprises multiple levels, wherein the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further comprises multiple leaf nodes, each leaf node being an image category of the corresponding level;
    constructing the hierarchical category tree by using the inter-class distances comprises:
    merging the leaf nodes according to the inter-class distances, wherein the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
    the merge threshold being set according to the following formula:
    $d_l=d_0+\frac{l\,(4-d_0)}{L}$
    wherein $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
  6. The deep metric learning method based on a hierarchical triplet loss function according to claim 4 or 5, characterized in that hierarchizing the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain the hierarchical triplet loss function comprises:
    extracting the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes;
    based on the inter-class distances, selecting for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories;
    randomly extracting pictures from each of the anchor categories to form training pictures;
    searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  7. The deep metric learning method based on a hierarchical triplet loss function according to claim 6, characterized in that searching for the dynamic loss boundary corresponding to the triplets formed by the training pictures and forming the hierarchical triplet loss function comprises:
    for each triplet formed by the training pictures, calculating the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and forming the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  8. A deep metric learning device based on a hierarchical triplet loss function, characterized by comprising: a construction module, a hierarchy module and a training module;
    the construction module being configured to obtain the inter-class distance between any two classes based on a triplet loss function and construct a hierarchical category tree;
    the hierarchy module being configured to hierarchize the triplet loss function through the inter-class distances based on the hierarchical category tree to obtain a hierarchical triplet loss function;
    the training module being configured to train a neural network based on the hierarchical triplet loss function, extract target image extraction features, and perform an image search according to the target image extraction features so as to obtain a target search image.
  9. The deep metric learning device based on a hierarchical triplet loss function according to claim 8, characterized in that the construction module comprises:
    a training unit configured to train with a standard triplet loss function to obtain a triplet neural network model;
    a hierarchy unit configured to obtain a data hierarchical structure according to the triplet neural network model;
    a construction unit configured to perform calculation through the data hierarchical structure to obtain the inter-class distance between any two classes, and construct the hierarchical category tree by using the inter-class distances.
  10. The deep metric learning device based on a hierarchical triplet loss function according to claim 9, characterized in that the construction unit comprises:
    a calculation subunit configured to define any two categories as the p-th category and the q-th category, the inter-class distance between which is calculated by the following formula:
    $d(p,q)=\frac{1}{n_p n_q}\sum_{i\in C_p}\sum_{j\in C_q}\left\|f(x_i)-f(x_j)\right\|_2^2$
    wherein this formula represents the average distance between the p-th category and the q-th category, that is, the inter-class distance.
  11. The deep metric learning device based on a hierarchical triplet loss function according to claim 10 or 9, characterized in that the inter-class distance takes values in the range 0 to 4.
  12. The deep metric learning device based on a hierarchical triplet loss function according to claim 11, characterized in that the hierarchical category tree comprises multiple levels, wherein the average intra-class distance is used as the merge threshold of the 0th level; the hierarchical category tree further comprises multiple leaf nodes, each leaf node being an image category of the corresponding level;
    the construction unit further comprises:
    a merging subunit configured to merge the leaf nodes according to the inter-class distances, wherein the leaf nodes are merged by setting the merge threshold so as to construct the hierarchical category tree;
    the merge threshold being set according to the following formula:
    $d_l=d_0+\frac{l\,(4-d_0)}{L}$
    wherein $d_l$ is the distance threshold for any two classes: in the hierarchical category tree, if the distance between any two classes at the l-th level is less than $d_l$, the two classes are merged.
  13. The deep metric learning device based on a hierarchical triplet loss function according to claim 11 or 12, characterized in that the hierarchy module comprises:
    a first extraction unit configured to extract the leaf nodes at the 0th level of the hierarchical category tree as target leaf nodes;
    a selection unit configured to select, based on the inter-class distances, for the target leaf nodes their corresponding nearest-neighbor categories as anchor categories;
    a second extraction unit configured to randomly extract pictures from each of the anchor categories to form training pictures;
    a forming unit configured to search for the dynamic loss boundary corresponding to the triplets formed by the training pictures, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  14. The deep metric learning device based on a hierarchical triplet loss function according to claim 13, characterized in that the forming unit is specifically configured to:
    for each triplet formed by the training pictures, calculate the category relationship between the anchor category and the negative sample category through the hierarchical category tree to obtain the dynamic loss boundary, and form the hierarchical triplet loss function corresponding to the dynamic loss boundary.
  15. A user terminal, characterized by comprising a memory and a processor, wherein the memory is configured to store a deep metric learning program based on a hierarchical triplet loss function, and the processor runs the deep metric learning program based on the hierarchical triplet loss function to cause the user terminal to execute the deep metric learning method based on the hierarchical triplet loss function according to any one of claims 1 to 7.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a deep metric learning program based on a hierarchical triplet loss function, and the deep metric learning program based on the hierarchical triplet loss function, when executed by a processor, implements the deep metric learning method based on the hierarchical triplet loss function according to any one of claims 1 to 7.
PCT/CN2018/108405 2018-09-07 2018-09-28 Deep metric learning method based on hierarchical triplet loss function and device thereof WO2020047921A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811044820.5A CN109145129B (zh) 2018-09-07 2018-09-07 Deep metric learning method based on hierarchical triplet loss function and device thereof
CN201811044820.5 2018-09-07

Publications (1)

Publication Number Publication Date
WO2020047921A1 true WO2020047921A1 (zh) 2020-03-12

Family

ID=64823882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108405 WO2020047921A1 (zh) 2018-09-07 2018-09-28 Deep metric learning method based on hierarchical triplet loss function and device thereof

Country Status (2)

Country Link
CN (1) CN109145129B (zh)
WO (1) WO2020047921A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009013B (zh) * 2019-03-21 2021-04-27 腾讯科技(深圳)有限公司 Encoder training and representation information extraction method and device
CN110059604B (zh) * 2019-04-10 2021-04-27 清华大学 Network training method and device for deep uniform face feature extraction
CN110889348A (zh) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for improving the success rate of face recognition under complex lighting
CN111310054B (zh) * 2020-03-06 2022-05-31 中国科学院信息工程研究所 Recommendation method and device based on adaptive-margin symmetric metric learning
CN111667050B (zh) * 2020-04-21 2021-11-30 佳都科技集团股份有限公司 Metric learning method, device, equipment and storage medium
CN111861909B (zh) * 2020-06-29 2023-06-16 南京理工大学 Network-based fine-grained image classification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678504A (zh) * 2013-11-19 2014-03-26 西安华海盈泰医疗信息技术有限公司 Similarity-based matching image retrieval method and retrieval system for breast images
CN103902689A (zh) * 2014-03-26 2014-07-02 小米科技有限责任公司 Clustering method, incremental clustering method and related device
US20160180151A1 (en) * 2014-12-17 2016-06-23 Google Inc. Generating numeric embeddings of images
CN107480785A (zh) * 2017-07-04 2017-12-15 北京小米移动软件有限公司 Training method and device for convolutional neural network
CN108009531A (zh) * 2017-12-28 2018-05-08 北京工业大学 Multi-strategy anti-fraud face recognition method
CN108197538A (zh) * 2017-12-21 2018-06-22 浙江银江研究院有限公司 Checkpoint vehicle retrieval system and method based on local features and deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399428B (zh) * 2018-02-09 2020-04-10 哈尔滨工业大学深圳研究生院 Triplet loss function design method based on the trace ratio criterion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678504A (zh) * 2013-11-19 2014-03-26 西安华海盈泰医疗信息技术有限公司 Similarity-based matching image retrieval method and retrieval system for breast images
CN103902689A (zh) * 2014-03-26 2014-07-02 小米科技有限责任公司 Clustering method, incremental clustering method and related device
US20160180151A1 (en) * 2014-12-17 2016-06-23 Google Inc. Generating numeric embeddings of images
CN107480785A (zh) * 2017-07-04 2017-12-15 北京小米移动软件有限公司 Training method and device for convolutional neural network
CN108197538A (zh) * 2017-12-21 2018-06-22 浙江银江研究院有限公司 Checkpoint vehicle retrieval system and method based on local features and deep learning
CN108009531A (zh) * 2017-12-28 2018-05-08 北京工业大学 Multi-strategy anti-fraud face recognition method

Also Published As

Publication number Publication date
CN109145129A (zh) 2019-01-04
CN109145129B (zh) 2020-03-31

Similar Documents

Publication Publication Date Title
WO2020047921A1 (zh) 基于层次三元组损失函数的深度度量学习方法及其装置
US11238065B1 (en) Systems and methods for generating and implementing knowledge graphs for knowledge representation and analysis
US9454580B2 (en) Recommendation system with metric transformation
CN108287864B (zh) 一种兴趣群组划分方法、装置、介质及计算设备
EP2612263B1 (en) Sketch-based image search
US9348898B2 (en) Recommendation system with dual collaborative filter usage matrix
KR20200094627A (ko) 텍스트 관련도를 확정하기 위한 방법, 장치, 기기 및 매체
WO2019041521A1 (zh) 用户关键词提取装置、方法及计算机可读存储介质
US20220075838A1 (en) Taxonomy-based system for discovering and annotating geofences from geo-referenced data
US20140172767A1 (en) Budget optimal crowdsourcing
WO2021027595A1 (zh) 用户画像生成方法、装置、计算机设备和计算机可读存储介质
CN110674312B (zh) 构建知识图谱方法、装置、介质及电子设备
WO2019205373A1 (zh) 相似用户查找装置、方法及计算机可读存储介质
US11200444B2 (en) Presentation object determining method and apparatus based on image content, medium, and device
CN111460234B (zh) 图查询方法、装置、电子设备及计算机可读存储介质
CN104077723B (zh) 一种社交网络推荐系统及方法
US10268655B2 (en) Method, device, server and storage medium of searching a group based on social network
US11475059B2 (en) Automated image retrieval with graph neural network
EP3731239A1 (en) Polypharmacy side effect prediction with relational representation learning
US20150278910A1 (en) Directed Recommendations
CN110968802B (zh) 一种用户特征的分析方法、分析装置及可读存储介质
WO2018227773A1 (zh) 地点推荐方法、装置、计算机设备和存储介质
Chen et al. From tie strength to function: Home location estimation in social network
CN111797620A (zh) 识别专有名词的系统和方法
CN116503679B (zh) 一种基于迁移性图谱的图像分类方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18932674

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18932674

Country of ref document: EP

Kind code of ref document: A1