WO2022185529A1 - Learning device, learning method, inference device, inference method, and recording medium - Google Patents

Learning device, learning method, inference device, inference method, and recording medium

Info

Publication number
WO2022185529A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature representation
class
hierarchical
input data
feature
Application number
PCT/JP2021/008691
Other languages
French (fr)
Japanese (ja)
Inventor
周平 吉田
Original Assignee
日本電気株式会社 (NEC Corporation)
Application filed by 日本電気株式会社 (NEC Corporation)
Priority to PCT/JP2021/008691 (WO2022185529A1)
Priority to JP2023503320A (JPWO2022185529A5)
Publication of WO2022185529A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This disclosure relates to a learning method for a machine learning model.
  • Patent Literature 1 discloses a learning method for identifying categories having a hierarchical structure.
  • One purpose of the present disclosure is to generate a highly accurate machine learning model at low cost.
  • a learning device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class; loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • a learning method comprises: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the possibility that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • the recording medium records a program for causing a computer to execute processing of: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the possibility that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • an inference device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and classification means for performing classification based on the second feature representation and using knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
  • an inference method comprises: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation and calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class using knowledge of the hierarchical structure to which each class belongs.
  • the recording medium records a program for causing a computer to execute processing of: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation and calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class using knowledge of the hierarchical structure to which each class belongs.
  • FIG. 9 shows another example of sharing by a plurality of classifiers forming a hierarchical hyperbolic classifier.
  • FIG. 10 is a flowchart of learning processing by the learning device of the second embodiment.
  • FIG. 11 is a block diagram showing the functional configuration of the inference device of the second embodiment.
  • FIG. 12 is a flowchart of inference processing by the inference device of the second embodiment.
  • FIG. 13 is a block diagram showing the functional configuration of the learning device of the third embodiment.
  • FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit.
  • FIG. 15 is a diagram conceptually explaining feature representations and differences.
  • FIG. 16 is a flowchart of learning processing by the learning device of the third embodiment.
  • FIG. 17 is a block diagram showing the functional configuration of the inference device of the third embodiment.
  • FIG. 18 is a flowchart of inference processing by the inference device of the third embodiment.
  • FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment.
  • FIG. 20 is a flowchart of learning processing by the learning device of the fourth embodiment.
  • FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment.
  • FIG. 22 is a flowchart of inference processing by the inference device of the fifth embodiment.
  • FIG. 1 is a block diagram showing the hardware configuration of the learning device 100 of the first embodiment.
  • the learning device 100 includes an interface (I/F) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
  • the interface 11 performs data input/output with an external device. Specifically, the data with correct answers used for learning is input through the interface 11.
  • the processor 12 is a computer such as a CPU (Central Processing Unit), and controls the entire learning device 100 by executing a program prepared in advance.
  • the processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array).
  • the processor 12 executes learning processing, which will be described later.
  • the memory 13 is composed of ROM (Read Only Memory), RAM (Random Access Memory), and the like. The memory 13 is also used as a working memory during execution of various processes by the processor 12.
  • the recording medium 14 is a non-volatile, non-temporary recording medium such as a disk-shaped recording medium or semiconductor memory, and is configured to be detachable from the learning device 100 .
  • the recording medium 14 records various programs executed by the processor 12 .
  • when the learning device 100 executes various processes, the program recorded on the recording medium 14 is loaded into the memory 13 and executed by the processor 12. The DB 15 stores the data with correct answers used for learning and the like, as needed.
  • FIG. 2 is a block diagram showing the functional configuration of the learning device 100 of the first embodiment.
  • the learning device 100 includes a feature extraction unit 21, a hyperbolic projection unit 22, a hyperbolic classification unit 23, a hierarchical loss calculation unit 24, a gradient calculation unit 25, and an update unit 26.
  • Data with correct answers include input data and correct labels corresponding to the input data.
  • the input data is an image used for learning
  • the correct label is information indicating the class of the object included in the image.
  • of the data with correct answers, the input data is input to the feature extraction unit 21 and the correct labels are input to the hierarchical loss calculation unit 24.
  • the feature extraction unit 21 converts the input data into a pre-feature representation.
  • the feature representation output by the feature extraction unit 21 is called a "pre-feature representation" to distinguish it from the feature representation output by the hyperbolic projection unit 22, described later.
  • both the "pre-feature representation" and the "feature representation" are information representing features of the input data.
  • the feature extraction unit 21 is configured by a deep convolutional neural network (CNN) or the like, and outputs a sequence (vector) of real numbers representing the features of the input image to the hyperbolic projection unit 22 as the pre-feature representation.
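  • As a rough illustration only (the publication does not fix an architecture), the feature extraction unit 21 could be a small CNN like the following PyTorch sketch; the layer sizes and the 64-dimensional output are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Toy CNN backbone standing in for the feature extraction unit 21."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> pre-feature representation: (batch, feat_dim)
        h = self.backbone(image).flatten(1)
        return self.fc(h)
```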
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation.
  • the feature representation is a point on some manifold, and selecting a specific projection unit is equivalent to selecting the manifold (feature space) to which the feature representation belongs.
  • typically, a linear space (Euclidean space) is used as the feature space together with a linear projection, or a high-dimensional hypersphere is used together with a spherical projection. In contrast, this embodiment uses a hyperbolic space as the feature space.
  • this embodiment obtains a highly accurate model from a small amount of training data by using knowledge about the hierarchical structure of classes; however, a hierarchical structure (tree structure) has the property of expanding exponentially.
  • Euclidean space and the hypersphere, which are commonly used as feature spaces, expand only polynomially, so they are not suited to embedding tree structures. That is, when a hierarchical structure is represented in Euclidean space or on a hypersphere, distortion is inevitable in low dimensions. Therefore, representing a hierarchical structure (tree structure) without distortion in Euclidean space or on a hypersphere would require a feature space whose dimension grows exponentially with the number of classes.
  • in contrast, a tree structure can be embedded efficiently in a hyperbolic space, which is why this embodiment uses a hyperbolic space as the feature space.
  • a hyperbolic space, which itself expands exponentially, can embed a tree structure without distortion even in two dimensions. Therefore, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23.
  • the feature representation is also a sequence (vector) of real numbers, but it can be regarded as coordinate values on the hyperbolic space, which is the feature space.
  • the hyperbolic projection unit 22 can use a Poincaré projection, a Lorentz projection, or the like, according to the specific model of hyperbolic space.
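  • As a concrete illustration (not specified in the publication), the projection onto the Poincaré ball model can be realized by the exponential map at the origin; the following is a minimal Python/PyTorch sketch, with the function name and the numerical stabilization being assumptions.

```python
import torch

def project_to_poincare_ball(v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map a Euclidean pre-feature vector onto the Poincare ball (curvature -1)
    via the exponential map at the origin: exp_0(v) = tanh(||v||) * v / ||v||."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    x = torch.tanh(norm) * v / norm
    # Keep the result strictly inside the unit ball for numerical stability.
    max_norm = 1.0 - eps
    x_norm = x.norm(dim=-1, keepdim=True)
    return torch.where(x_norm >= max_norm, x / x_norm * max_norm, x)
```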
  • the hyperbolic classification unit 23 classifies the feature representation on the hyperbolic space output by the hyperbolic projection unit 22, and outputs the score of each class obtained for that feature representation to the hierarchical loss calculation unit 24. Note that the hyperbolic classification unit 23 outputs only the scores of the terminal classes in the hierarchical structure of classes.
  • as the hyperbolic classification unit 23, a hyperbolic hyperplane classifier or a hyperbolic nearest-neighbor classifier can be used.
  • a hyperbolic hyperplane classifier is a classifier that extends a linear classifier to the hyperbolic space and uses a hyperplane in the hyperbolic space as its decision surface.
  • a hyperbolic nearest neighbor classifier is a classifier that follows the nearest neighbor rule on hyperbolic space.
  • the concrete form of the hyperbolic classification unit 23 is determined by the hyperbolic space model selected by the hyperbolic projection unit 22.
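  • One hedged sketch of a hyperbolic nearest-neighbor classifier on the Poincaré ball follows; the learnable class prototypes and their initialization are assumptions, not taken from the publication, and in practice the prototypes would be kept inside the ball, e.g. by Riemannian optimization or re-projection.

```python
import torch
import torch.nn as nn

def poincare_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Geodesic distance on the Poincare ball:
    d(x, y) = arcosh(1 + 2 ||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))."""
    sq = (x - y).pow(2).sum(-1)
    denom = (1 - x.pow(2).sum(-1)).clamp_min(eps) * (1 - y.pow(2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / denom + eps)

class HyperbolicNearestNeighborClassifier(nn.Module):
    """Scores each terminal class by negative hyperbolic distance to a prototype."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        # Small initialization keeps the prototypes inside the ball at the start.
        self.prototypes = nn.Parameter(0.01 * torch.randn(num_classes, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim) points on the ball -> scores: (batch, num_classes)
        d = poincare_distance(z.unsqueeze(1), self.prototypes.unsqueeze(0))
        return -d  # a higher score means a closer prototype
```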
  • the hierarchical loss calculation unit 24 calculates a loss function from the score of each class input from the hyperbolic classification unit 23 and the correct label included in the data with correct answers. In doing so, the hierarchical loss calculation unit 24 uses knowledge of the hierarchical structure of the classes to be classified. Specifically, it computes a score for each hierarchy from the per-class scores output by the hyperbolic classification unit 23, and calculates a loss for each hierarchy such that the score for that hierarchy predicts the correct class at that hierarchy. Note that the hierarchical loss calculation unit 24 can use a general loss function for multi-class classification, such as the cross-entropy loss.
  • FIG. 3 shows an example of a hierarchical structure of classes.
  • This example shows a hierarchical structure (tree structure) with a root node of "merchandise” and has first to third hierarchies.
  • the first hierarchy includes three classes “food”, “beverage” and “pharmaceutical” as child nodes of "merchandise”.
  • the second hierarchy includes three classes "Bento", "Bread", and "Rice ball" as child nodes of "Food", and three classes "Tea", "Juice", and "Water" as child nodes of "Beverage".
  • the third hierarchy includes "Bento A" to "Bento C" as child nodes of "Bento", "Bread A" to "Bread C" as child nodes of "Bread", and "Rice ball A" to "Rice ball C" as child nodes of "Rice ball".
  • illustration of the second and third hierarchies under "Pharmaceutical" and of the third hierarchy under "Beverage" is omitted.
  • as described above, the hyperbolic classification unit 23 outputs only the scores of the terminal classes in the hierarchical structure of classes.
  • in the example of FIG. 3, the terminal classes are "Bento A" to "Bento C", "Bread A" to "Bread C", and "Rice ball A" to "Rice ball C", so only their scores are output.
  • the hierarchical loss calculation unit 24 calculates, for the third hierarchy, which is the hierarchy of the terminal classes, a loss that maximizes the score of the correct class "Bento B", and sets it as the loss of the third hierarchy.
  • the hierarchical loss calculation unit 24 integrates the scores of the terminal classes that are descendants of each node and uses them for loss calculation when calculating the loss of the hierarchy higher than the terminal class. Specifically, if the score output by the hyperbolic classifier 23 is the probability of the terminal class, the score of each class in the higher hierarchy is the sum of the probabilities of the terminal classes that are descendants of the class.
  • the score of "lunch box” in the second layer is the sum of the scores of its child nodes “lunch box A” to “lunch box C”.
  • the score of "bread” in the second layer is the sum of the scores of its child nodes “bread A” to “bread C”
  • the score of "rice ball” in the second layer is the sum of the scores of its child nodes “rice ball A” to “rice ball C”.
  • the score of "Food” in the first layer is the terminal class "Bento A” to “Bento C", “Bread A” to “Bread C”, and “Rice ball A” to “Rice ball C”, which are the grandchild nodes of terminal classes. is the sum of the scores of Similarly, the score of "beverage” and "pharmaceutical” in the first hierarchy is also the sum of the scores of terminal classes that are grandchild nodes.
  • the hierarchical loss calculation unit 24 similarly calculates, for the first hierarchy, a loss that maximizes the score of "Food", which has the correct class "Bento B" as a descendant node. Then, the hierarchical loss calculation unit 24 calculates a weighted sum of the losses calculated for the respective hierarchies, and outputs it to the gradient calculation unit 25 as the hierarchical loss.
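  • The per-hierarchy score aggregation and the weighted loss sum described above can be sketched as follows; the mapping tables and weights are hypothetical names introduced here, and cross-entropy is used as the per-hierarchy loss, as the text suggests.

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(terminal_scores, ancestor_index, correct_labels, layer_weights):
    """terminal_scores: (batch, num_terminal) raw scores for the terminal classes.
    ancestor_index[l]: LongTensor (num_terminal,) mapping each terminal class to its
        ancestor class index at hierarchy l (the identity at the terminal hierarchy).
    correct_labels[l]: LongTensor (batch,) correct class index at hierarchy l.
    layer_weights: one weight per hierarchy for the weighted sum of losses."""
    p_terminal = F.softmax(terminal_scores, dim=-1)  # terminal-class probabilities
    total = terminal_scores.new_zeros(())
    for l, w in enumerate(layer_weights):
        num_classes_l = int(ancestor_index[l].max()) + 1
        # Score of each class at hierarchy l = sum of its descendant terminal probs.
        p_l = p_terminal.new_zeros(p_terminal.size(0), num_classes_l)
        p_l.index_add_(1, ancestor_index[l], p_terminal)
        # Cross-entropy: minimizing it maximizes the correct class's aggregated score.
        total = total + w * F.nll_loss(torch.log(p_l + 1e-9), correct_labels[l])
    return total
```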
  • the gradient calculation unit 25 calculates the gradient of the hierarchical loss input from the hierarchical loss calculation unit 24, and outputs it to the update unit 26.
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 using the gradient.
  • FIG. 4 is a flowchart of learning processing by the learning device 100 of the first embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 2.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S11).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S12).
  • the hyperbolic classifier 23 calculates the score of each class from the feature representation (step S13).
  • the hierarchical loss calculator 24 uses the knowledge of the hierarchical structure of classes to calculate the hierarchical loss from the score and correct label of each class (step S14).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S15).
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 based on the gradient (step S16). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
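  • Wiring the sketches above together gives an illustrative version of steps S11 to S16. The dummy hierarchy follows the FIG. 3 subset (9 terminal classes), while the data loader, the label encoding, and the use of the Adam optimizer are stand-in assumptions, not part of the publication.

```python
import torch

# Terminal classes 0..8 = Bento A-C, Bread A-C, Rice ball A-C (FIG. 3 subset).
ancestor_index = [
    torch.zeros(9, dtype=torch.long),           # hierarchy 1: all under "Food"
    torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 2]),  # hierarchy 2: Bento/Bread/Rice ball
    torch.arange(9),                            # hierarchy 3: terminal (identity)
]
# One dummy batch of 4 images, all labelled "Bento B" at every hierarchy.
loader = [(torch.randn(4, 3, 32, 32),
           [torch.zeros(4, dtype=torch.long),         # "Food"
            torch.zeros(4, dtype=torch.long),         # "Bento"
            torch.full((4,), 1, dtype=torch.long)])]  # "Bento B"

extractor = FeatureExtractor()
classifier = HyperbolicNearestNeighborClassifier(num_classes=9, dim=64)
params = list(extractor.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for image, labels_per_layer in loader:
    z = project_to_poincare_ball(extractor(image))      # S11-S12
    scores = classifier(z)                              # S13
    loss = hierarchical_loss(scores, ancestor_index,    # S14
                             labels_per_layer, layer_weights=[1.0, 1.0, 1.0])
    optimizer.zero_grad()
    loss.backward()                                     # S15: gradient of the loss
    optimizer.step()                                    # S16: parameter update
```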
  • according to the learning device 100 of the first embodiment, it is possible to learn a highly accurate model from a small amount of learning data by using the knowledge of the hierarchical structure of classes.
  • the hardware configuration of the inference device 200 of the first embodiment is the same as that of the learning device 100 shown in FIG. 1, so its description is omitted.
  • FIG. 5 is a block diagram showing the functional configuration of the inference device 200 of the first embodiment.
  • the inference device 200 includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hyperbolic classification unit 23. The parameters obtained by the above learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23.
  • input data is input to the feature extraction unit 21.
  • this input data, such as an image, is the data actually subject to classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22.
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23.
  • the hyperbolic classifier 23 calculates scores for terminal classes in the hierarchical structure of classes and outputs them as inference results. Classification of the input data is thus performed.
  • FIG. 6 is a flowchart of inference processing by the inference device 200 of the first embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 5.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S21).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S22).
  • the hyperbolic classifier 23 calculates the score of each terminal class from the feature representation and outputs it as an inference result (step S23). The above processing is performed for each input data.
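  • Reusing the same sketches, inference (steps S21 to S23) reduces to a forward pass and an argmax over the terminal-class scores; the input tensor here is a stand-in for a real image.

```python
import torch

extractor.eval()
classifier.eval()
with torch.no_grad():
    image = torch.randn(1, 3, 32, 32)                   # stand-in input image
    z = project_to_poincare_ball(extractor(image))      # S21-S22
    scores = classifier(z)                              # S23: terminal-class scores
    predicted_terminal_class = scores.argmax(dim=-1)
```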
  • in the second embodiment, the hyperbolic classification unit is hierarchized as well, using the knowledge of the hierarchical structure of classes.
  • FIG. 7 is a block diagram showing the functional configuration of the learning device 100a of the second embodiment.
  • the learning device 100a of the second embodiment has a hierarchical hyperbolic classifier 23x instead of the hyperbolic classifier 23.
  • the hierarchical hyperbolic classification unit 23x uses knowledge of the hierarchical structure of classes to output a score in each layer of the hierarchical structure from one feature representation in the hyperbolic space input from the hyperbolic projection unit 22.
  • FIG. 8 shows an example of a method of sharing by a plurality of classifiers forming the hierarchical hyperbolic classifier 23x.
  • Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier.
  • one classifier is provided for each layer in the hierarchical structure of classes. That is, the hierarchical hyperbolic classifier 23x is composed of three classifiers respectively corresponding to the first to third hierarchies.
  • Each classifier is a classifier that identifies nodes belonging to the same hierarchy across subtrees.
  • the hierarchical hyperbolic classifier 23x outputs classification results for each layer by three classifiers.
  • FIG. 9 shows another example of a method of sharing by a plurality of classifiers forming the hierarchical hyperbolic classifier 23x.
  • Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier.
  • in this example, the third hierarchy is provided with a plurality of classifiers, each identifying sibling nodes belonging to the same parent node. That is, one classifier is prepared for the nodes "Bento A" to "Bento C" belonging to the same parent node "Bento", another for the nodes "Bread A" to "Bread C" belonging to the same parent node "Bread", and another for the nodes "Rice ball A" to "Rice ball C" belonging to the same parent node "Rice ball".
  • the hierarchical hyperbolic classification unit 23x outputs a classification result from each of the plurality of classifiers. That is, it outputs the classification result corresponding to the frame 91 for the first hierarchy, the classification result corresponding to the frame 92 for the second hierarchy, and, for the third hierarchy, the plurality of classification results corresponding to the frames 93.
  • the hierarchical hyperbolic classifier 23x calculates a classification result (score) for each layer and outputs it to the hierarchical loss calculator 24.
  • the hierarchical loss calculator 24 calculates a loss for the classification result of each hierarchy input from the hierarchical hyperbolic classifier 23x, and outputs a weighted sum of them to the gradient calculator 25 as a hierarchical loss.
  • the hierarchical hyperbolic classification unit 23x outputs not only the scores of the terminal classes but also the scores of the higher classes, so there is no need to integrate the scores of the terminal classes to calculate the scores of the higher hierarchies as in the first embodiment.
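  • One hedged way to realize the per-hierarchy classifiers of FIG. 8 is a separate nearest-neighbor head per hierarchy, reusing the HyperbolicNearestNeighborClassifier sketch above; for the FIG. 3 subset, classes_per_layer could be [3, 6, 9]. The class name and constructor arguments are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalHyperbolicClassifier(nn.Module):
    """One classifier head per hierarchy; each head scores all classes of its
    hierarchy directly, so no aggregation of terminal-class scores is needed."""
    def __init__(self, classes_per_layer, dim: int):
        super().__init__()
        self.heads = nn.ModuleList(
            HyperbolicNearestNeighborClassifier(n, dim) for n in classes_per_layer
        )

    def forward(self, z: torch.Tensor):
        # Returns one score tensor per hierarchy: [(batch, n_1), (batch, n_2), ...]
        return [head(z) for head in self.heads]
```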
  • the configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100a of the second embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
  • FIG. 10 is a flowchart of learning processing by the learning device 100a of the second embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 7.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S31).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S32).
  • the hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score of each class for each layer from the feature representation (step S33).
  • the hierarchical loss calculator 24 calculates the hierarchical loss from the score of each class and the correct label for each hierarchy (step S34).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S35).
  • the updating unit 26 updates the parameters of the feature extracting unit 21, the hyperbolic projecting unit 22, and the hierarchical hyperbolic classifying unit 23x based on the gradient (step S36). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
  • the hardware configuration of the inference device 200a is the same as that of the learning device 100 shown in FIG. 1, so its description is omitted.
  • FIG. 11 is a block diagram showing the functional configuration of the inference device 200a of the second embodiment.
  • the inference device 200a includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hierarchical hyperbolic classification unit 23x. The parameters obtained by the above learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hierarchical hyperbolic classification unit 23x.
  • input data is input to the feature extraction unit 21.
  • this input data, such as an image, is the data actually subject to classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22 .
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs it to the hierarchical hyperbolic classification unit 23x.
  • the hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score for each class in each hierarchy and output it as an inference result. Classification of the input data is thus performed.
  • FIG. 12 is a flowchart of inference processing by the inference device 200a of the second embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 11.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S41).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S42).
  • the hierarchical hyperbolic classification unit 23x uses the knowledge of the hierarchical structure of the classes to calculate the score of each class for each hierarchy from the feature representation, and outputs the scores as the inference result (step S43). The above processing is performed for each input data.
  • in the third embodiment, the hyperbolic projection unit is hierarchized as well, using knowledge of the hierarchical structure of classes.
  • FIG. 13 is a block diagram showing the functional configuration of the learning device 100b of the third embodiment.
  • the learning device 100b of the third embodiment has a hierarchical hyperbolic projection unit 22x instead of the hyperbolic projection unit 22.
  • the hierarchical hyperbolic projection unit 22x uses knowledge of the hierarchical structure of classes to output feature representations in each layer of the hierarchical structure from the pre-feature representation input from the feature extraction unit 21.
  • FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit 22x.
  • the hierarchical hyperbolic projection unit 22x includes first to third embedding networks (NWs) and adders 31 and 32.
  • the pre-feature representation is input from the feature extraction unit 21 to each of the first to third embedding NWs.
  • the first embedding NW uses knowledge of the hierarchical structure of the classes and outputs, as the feature representation C1, a vector indicating a point on the hyperbolic space for the class corresponding to a node of the first hierarchy.
  • for a node of the second hierarchy, the second embedding NW outputs the difference D1 between the feature representation C1 of the class corresponding to that node's parent and the feature representation of the node itself.
  • the adder 31 then outputs the sum of the parent node's feature representation C1 and the difference D1 as the feature representation C2 corresponding to that node of the second hierarchy.
  • the feature representation C2 is a vector indicating a point on the hyperbolic space.
  • for a node of the third hierarchy, the third embedding NW outputs the difference D2 between the feature representation C2 of the class corresponding to that node's parent and the feature representation of the node itself.
  • the adder 32 then outputs the sum of the parent node's feature representation C2 and the difference D2 as the feature representation C3 corresponding to that node of the third hierarchy.
  • the feature representation C3 is a vector indicating a point on the hyperbolic space.
  • FIG. 15 is a diagram conceptually explaining the feature representations C1 to C3 and the differences D1 to D2.
  • FIG. 15 shows the hyperbolic space as a two-dimensional space for convenience. Assuming the hierarchical structure of the classes shown in FIG. 3, circles (○) indicate the feature representations C1 of the classes in the first hierarchy, squares (□) indicate the feature representations C2 of the classes in the second hierarchy, and triangles (△) indicate the feature representations C3 of the classes in the third hierarchy.
  • the difference D1 can be considered as a vector pointing from the first layer class "food” indicated by circles to the second layer classes "bento", "bread", and "rice ball” indicated by squares.
  • the difference D2 can be considered as a vector pointing from the second-layer class "Bread” indicated by squares to the third-layer classes "Bread A" to "Bread C” indicated by triangles.
  • the "difference” is the tangent vector of the hyperbolic space in the feature representation of the class of the parent node, and the "sum” is realized by exponential mapping.
  • the hierarchical hyperbolic projection unit 22x outputs the feature representations C1 to C3 for each layer for one input data to the hierarchical hyperbolic classification unit 23x.
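  • As a hedged sketch, the hierarchical hyperbolic projection unit 22x of FIG. 14 could then be realized as below, reusing project_to_poincare_ball and exp_map from the earlier sketches; treating C1 to C3 as the per-input feature for each hierarchy and using linear embedding NWs are assumptions made here.

```python
import torch
import torch.nn as nn

class HierarchicalHyperbolicProjection(nn.Module):
    """Sketch of the unit 22x: the first embedding NW places the layer-1 feature
    C1 on the ball, and each later embedding NW predicts a tangent 'difference'
    that is added to the previous layer's feature via the exponential map."""
    def __init__(self, in_dim: int, dim: int):
        super().__init__()
        self.embed1 = nn.Linear(in_dim, dim)  # first embedding NW
        self.embed2 = nn.Linear(in_dim, dim)  # second embedding NW (difference D1)
        self.embed3 = nn.Linear(in_dim, dim)  # third embedding NW (difference D2)

    def forward(self, pre_feature: torch.Tensor):
        c1 = project_to_poincare_ball(self.embed1(pre_feature))  # feature C1
        c2 = exp_map(c1, self.embed2(pre_feature))               # C2 = C1 "+" D1
        c3 = exp_map(c2, self.embed3(pre_feature))               # C3 = C2 "+" D2
        return [c1, c2, c3]
```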
  • the hierarchical hyperbolic classification unit 23x receives the feature representation for each hierarchy, performs classification for each hierarchy, and outputs the classification results to the hierarchical loss calculation unit 24.
  • the configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100b of the third embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
  • FIG. 16 is a flowchart of learning processing by the learning device 100b of the third embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 13.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S51).
  • the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy (step S52).
  • the hierarchical hyperbolic classification unit 23x calculates the score of each class for each layer from the feature representation for each layer input from the hierarchical hyperbolic projection unit 22x (step S53).
  • the hierarchical loss calculator 24 calculates a hierarchical loss from the score of each class for each hierarchy and the correct label (step S54).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S55).
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x based on the gradient (step S56). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
  • next, the inference device 200b of the third embodiment will be described. The hardware configuration of the inference device 200b is the same as that of the learning device 100 shown in FIG. 1, so its description is omitted.
  • FIG. 17 is a block diagram showing the functional configuration of the inference device 200b of the third embodiment.
  • the inference device 200b includes a feature extraction unit 21, a hierarchical hyperbolic projection unit 22x, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x.
  • input data is input to the feature extraction unit 21.
  • this input data, such as an image, is the data actually subject to classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hierarchical hyperbolic projection unit 22x.
  • the hierarchical hyperbolic projection unit 22x uses the knowledge of the hierarchical structure of classes to convert the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy, and outputs the feature representations to the hierarchical hyperbolic classification unit 23x.
  • the hierarchical hyperbolic classifier 23x calculates a score for each class in each layer based on the feature representation for each layer, and outputs it as an inference result. Classification of the input data is thus performed.
  • FIG. 18 is a flowchart of inference processing by the inference device 200b of the third embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 17.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S61).
  • the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy (step S62).
  • the hierarchical hyperbolic classifier 23x calculates the score of each class for each layer from the feature representation of each layer, and outputs it as an inference result (step S63). The above processing is performed for each input data.
  • FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment.
  • the learning device 70 includes feature extraction means 71, projection means 72, classification means 73, loss calculation means 74, and update means 75.
  • FIG. 20 is a flowchart of learning processing by the learning device 70 of the fourth embodiment.
  • the feature extraction means 71 converts the input data into the first feature representation (step S71).
  • the projection means 72 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S72).
  • the classification means 73 performs classification based on the second feature representation, and outputs a score indicating the possibility that the input data belongs to each class (step S73).
  • the loss calculation means 74 calculates the hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score (step S74).
  • the updating means 75 updates the parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss (step S75).
  • according to the fourth embodiment, by using the knowledge of the hierarchical structure of classes, it is possible to generate a highly accurate model even from a small amount of input data.
  • FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment.
  • the inference device 80 includes feature extraction means 81, projection means 82, and classification means 83.
  • FIG. 22 is a flowchart of inference processing by the inference device 80 of the fifth embodiment.
  • the feature extraction means 81 converts the input data into the first feature representation (step S81).
  • the projection means 82 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S82).
  • the classification means 83 performs classification based on the second feature representation, and uses knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class (step S83). According to the fifth embodiment, it is possible to perform highly accurate inference using a model learned with knowledge of the hierarchical structure of classes.
  • (Appendix 1) A learning device comprising: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class; loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • (Appendix 2) The learning device according to Appendix 1, wherein the classification means outputs scores for the terminal classes of the hierarchical structure, and the loss calculation means integrates the scores of the terminal classes to calculate the losses of the hierarchies higher than the hierarchy of the terminal classes, and calculates the weighted sum of the losses of the respective hierarchies as the hierarchical loss.
  • (Appendix 3) The learning device according to Appendix 2, wherein the loss calculation means calculates, for the hierarchy of the terminal classes, a loss that maximizes the score of the correct class, and calculates, for each hierarchy higher than the hierarchy of the terminal classes, a loss that maximizes the score of the class to which the correct class belongs among the classes of that hierarchy.
  • (Appendix 4) The learning device according to any one of Appendices 1 to 3, wherein the classification means outputs the score for each hierarchy using the knowledge of the hierarchical structure, and the loss calculation means calculates the hierarchical loss based on the scores output for each hierarchy.
  • (Appendix 5) The learning device according to Appendix 4, wherein the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure, and the classification means outputs the score for each hierarchy based on the second feature representation output for each hierarchy.
  • (Appendix 7) A recording medium recording a program for causing a computer to execute processing of: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the possibility that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • (Appendix 8) An inference device comprising: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and classification means for performing classification based on the second feature representation and using knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
  • (Appendix 9) The inference device according to Appendix 8, wherein the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure, and the classification means outputs the score for each hierarchy based on the second feature representation output for each hierarchy.

Abstract

In this learning device, a feature extraction means converts input data into a first feature representation. A projection means converts the first feature representation into a second feature representation representing a point in a hyperbolic space. A classification means performs classification on the basis of the second feature representation, and outputs a score indicating the likelihood that the input data belongs to each class. A loss calculation means calculates a hierarchical loss on the basis of knowledge of the hierarchical structure to which each class belongs, a correct answer label assigned to the input data, and the score. An update means updates the parameters of the feature extraction means, the projection means, and the classification means on the basis of the hierarchical loss.

Description

Learning device, learning method, inference device, inference method, and recording medium
 This disclosure relates to a learning method for a machine learning model.
 In recent years, recognition technology based on machine learning has shown extremely high performance, mainly in the field of image recognition. The high accuracy of such machine-learning-based recognition technology is supported by a large amount of data with correct answers; that is, high accuracy is achieved by preparing a large amount of data with correct answers and performing learning on it. For example, Patent Literature 1 discloses a method of learning to identify categories having a hierarchical structure.
Patent Literature 1: International Publication No. WO2006/073081
 On the other hand, depending on the application of image recognition technology and the like, it is required to realize highly accurate machine learning at low cost, without preparing a large amount of data with correct answers.
 One purpose of the present disclosure is to generate a highly accurate machine learning model at low cost.
 In one aspect of the present disclosure, a learning device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class; loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
 In another aspect of the present disclosure, a learning method comprises: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the possibility that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
 In yet another aspect of the present disclosure, a recording medium records a program for causing a computer to execute processing of: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the possibility that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
 In still another aspect of the present disclosure, an inference device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and classification means for performing classification based on the second feature representation and using knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
 In yet another aspect of the present disclosure, an inference method comprises: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation and calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class using knowledge of the hierarchical structure to which each class belongs.
 In yet another aspect of the present disclosure, a recording medium records a program for causing a computer to execute processing of: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation and calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class using knowledge of the hierarchical structure to which each class belongs.
 According to the present disclosure, it is possible to generate a highly accurate machine learning model at low cost by using knowledge of the class structure.
FIG. 1 is a block diagram showing the hardware configuration of the learning device of the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the learning device of the first embodiment.
FIG. 3 shows an example of a hierarchical structure of classes.
FIG. 4 is a flowchart of learning processing by the learning device of the first embodiment.
FIG. 5 is a block diagram showing the functional configuration of the inference device of the first embodiment.
FIG. 6 is a flowchart of inference processing by the inference device of the first embodiment.
FIG. 7 is a block diagram showing the functional configuration of the learning device of the second embodiment.
FIG. 8 shows an example of sharing by a plurality of classifiers forming a hierarchical hyperbolic classifier.
FIG. 9 shows another example of sharing by a plurality of classifiers forming a hierarchical hyperbolic classifier.
FIG. 10 is a flowchart of learning processing by the learning device of the second embodiment.
FIG. 11 is a block diagram showing the functional configuration of the inference device of the second embodiment.
FIG. 12 is a flowchart of inference processing by the inference device of the second embodiment.
FIG. 13 is a block diagram showing the functional configuration of the learning device of the third embodiment.
FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit.
FIG. 15 is a diagram conceptually explaining feature representations and differences.
FIG. 16 is a flowchart of learning processing by the learning device of the third embodiment.
FIG. 17 is a block diagram showing the functional configuration of the inference device of the third embodiment.
FIG. 18 is a flowchart of inference processing by the inference device of the third embodiment.
FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment.
FIG. 20 is a flowchart of learning processing by the learning device of the fourth embodiment.
FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment.
FIG. 22 is a flowchart of inference processing by the inference device of the fifth embodiment.
 Preferred embodiments of the present disclosure will be described below with reference to the drawings.
 <Concept explanation>
 As mentioned above, a highly accurate recognition model can be obtained by training on a large amount of training data with correct answers, but in some cases it is required to generate a highly accurate model at low cost from a small amount of data. To learn a highly accurate model from a small amount of data, it is essential to use information other than the training data. When performing multi-class classification, knowledge about the hierarchical structure of classes is highly versatile and is often easily available. Therefore, the following embodiments provide a learning method that can obtain a highly accurate classification model even with a small amount of data by using knowledge indicating the hierarchical structure of the classes to be classified.
 <First embodiment>
 [Learning device]
 First, the learning device of the first embodiment will be described.
 (Hardware configuration)
 FIG. 1 is a block diagram showing the hardware configuration of the learning device 100 of the first embodiment. As illustrated, the learning device 100 includes an interface (I/F) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
 The interface 11 performs data input/output with external devices. Specifically, the data with correct answers used for learning is input through the interface 11.
 The processor 12 is a computer such as a CPU (Central Processing Unit), and controls the entire learning device 100 by executing a program prepared in advance. The processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 12 executes the learning processing described later.
 The memory 13 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 13 is also used as a working memory while the processor 12 executes various processes.
 The recording medium 14 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the learning device 100. The recording medium 14 records various programs executed by the processor 12. When the learning device 100 executes various processes, a program recorded on the recording medium 14 is loaded into the memory 13 and executed by the processor 12. The DB 15 stores the data with correct answers used for learning and the like, as needed.
 (機能構成)
 図2は、第1実施形態の学習装置100の機能構成を示すブロック図である。学習装置100は、特徴抽出部21と、双曲射影部22と、双曲分類部23と、階層的損失計算部24と、勾配計算部25と、更新部26とを備える。
(Functional configuration)
FIG. 2 is a block diagram showing the functional configuration of the learning device 100 of the first embodiment. The learning device 100 includes a feature extraction unit 21, a hyperbolic projection unit 22, a hyperbolic classification unit 23, a hierarchical loss calculation unit 24, a gradient calculation unit 25, and an update unit 26.
The data with correct answers includes input data and a correct label corresponding to the input data. For example, when an image recognition model is learned, the input data is an image used for learning, and the correct label is information indicating the class of the object included in the image. Of the data with correct answers, the input data is input to the feature extraction unit 21, and the correct label is input to the hierarchical loss calculation unit 24.
The feature extraction unit 21 converts the input data into a pre-feature representation. The feature representation output by the feature extraction unit 21 is called a "pre-feature representation" to distinguish it from the feature representation output by the hyperbolic projection unit 22, which will be described later. Both the "pre-feature representation" and the "feature representation" are information representing features of the input data. Specifically, when an image recognition model is learned, the feature extraction unit 21 is configured by a deep convolutional neural network (CNN) or the like, and outputs a sequence of real values (a vector) representing the features of the input image to the hyperbolic projection unit 22 as the pre-feature representation.
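As one concrete illustration, the feature extraction unit 21 could be realized as a small convolutional network. The following is a minimal Python (PyTorch) sketch, not part of this disclosure; the class name FeatureExtractor, the layer sizes, and the output dimension are illustrative assumptions:

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Maps an input image to a real-valued vector (the pre-feature representation).
    def __init__(self, out_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling to a 64-dim vector
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)  # shape (batch, 64)
        return self.fc(h)            # pre-feature representation, shape (batch, out_dim)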
The hyperbolic projection unit 22 converts the pre-feature representation into a feature representation. Here, the "feature representation" is a point on some manifold, and selecting a specific projection unit is equivalent to selecting the manifold (feature space) to which the feature representation belongs. In general, a linear space (Euclidean space) is often used as the feature space together with a linear projection unit, or a high-dimensional hypersphere is used as the feature space together with a spherical projection unit. In contrast, this embodiment uses a hyperbolic space as the feature space.
As described above, this embodiment obtains a highly accurate model from a small amount of training data by using knowledge about the hierarchical structure of the classes, and a hierarchical structure (tree structure) has the property of expanding exponentially. A Euclidean space or a hypersphere is generally used as the feature space, but since these expand only polynomially, they are not suited to embedding tree structures. That is, when a hierarchical structure is expressed on a Euclidean space or a hypersphere, distortion is unavoidable in low dimensions. Accordingly, to express a hierarchical structure (tree structure) without distortion on a Euclidean space or a hypersphere, a feature space whose dimension grows exponentially with the number of classes would be required.
For this reason, this embodiment uses a hyperbolic space as the feature space. A tree structure can be embedded efficiently in a hyperbolic space: because hyperbolic space expands exponentially, it can embed a tree structure without distortion even in two dimensions. The hyperbolic projection unit 22 therefore converts the pre-feature representation into a feature representation on the hyperbolic space and outputs it to the hyperbolic classification unit 23. Like the pre-feature representation, the feature representation is a sequence of real values (a vector), but it can be regarded as a coordinate value on the hyperbolic space serving as the feature space. Depending on the specific model of hyperbolic space, the hyperbolic projection unit 22 can use a Poincaré projection, a Lorentz projection, or the like.
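For the Poincaré-ball model, for example, the projection can be realized by the exponential map at the origin. The following Python sketch assumes curvature -c and follows the standard formula tanh(√c‖v‖)·v/(√c‖v‖); the function name expmap0 is an illustrative assumption:

import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    # Exponential map at the origin of the Poincare ball with curvature -c.
    # Maps a Euclidean pre-feature vector v to a point strictly inside the ball.
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)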
The hyperbolic classification unit 23 performs classification from the single feature representation on the hyperbolic space output by the hyperbolic projection unit 22, and outputs the score obtained for each class to the hierarchical loss calculation unit 24. Note that the hyperbolic classification unit 23 outputs scores only for the terminal classes in the hierarchical structure of the classes. As the hyperbolic classification unit 23, a hyperbolic hyperplane classifier or a hyperbolic nearest-neighbor classifier can be used. A hyperbolic hyperplane classifier extends a linear classifier to hyperbolic space and uses a hyperplane in the hyperbolic space as its decision surface. A hyperbolic nearest-neighbor classifier follows the nearest-neighbor rule on the hyperbolic space. The specific formulation of the hyperbolic classification unit 23 is determined by the model of hyperbolic space selected by the hyperbolic projection unit 22.
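As an illustration of the nearest-neighbor variant, the score of each terminal class could be the negated Poincaré distance between the feature representation and a learnable prototype of that class. This is a sketch under the assumption of curvature -1; the prototype parametrization and the class name are illustrative, and in practice the prototypes would also have to be kept inside the ball during optimization:

import torch
import torch.nn as nn

def poincare_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Geodesic distance on the Poincare ball with curvature -1.
    sq = ((x - y) ** 2).sum(-1)
    denom = (1 - (x ** 2).sum(-1)).clamp_min(eps) * (1 - (y ** 2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / denom)

class HyperbolicNearestClassifier(nn.Module):
    # Scores each terminal class by the negated distance to its prototype.
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        # Small initial norms keep the prototypes inside the unit ball.
        self.prototypes = nn.Parameter(0.01 * torch.randn(num_classes, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim); prototypes: (num_classes, dim) -> scores: (batch, num_classes)
        d = poincare_distance(z.unsqueeze(1), self.prototypes.unsqueeze(0))
        return -d  # a higher score means a closer prototype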
The hierarchical loss calculation unit 24 computes a loss function from the score of each class input from the hyperbolic classification unit 23 and the correct label included in the data with correct answers. In doing so, the hierarchical loss calculation unit 24 uses knowledge of the hierarchical structure of the classes to be classified. Specifically, the hierarchical loss calculation unit 24 computes a score for each layer from the per-class scores output by the hyperbolic classification unit 23, and computes a loss for each layer so that the per-layer scores predict the correct class at each layer. The hierarchical loss calculation unit 24 can use a general loss function for multi-class classification, such as the cross-entropy loss.
Here, the loss calculation method used by the hierarchical loss calculation unit 24 will be described with an example. FIG. 3 shows an example of a hierarchical structure of classes. This example shows a hierarchical structure (tree structure) whose root node is "merchandise" and which has first to third layers. The first layer includes three classes, "food", "beverage", and "pharmaceutical", as child nodes of "merchandise". The second layer includes three classes, "bento", "bread", and "rice ball", as child nodes of "food", and three classes, "tea", "juice", and "water", as child nodes of "beverage". The third layer includes "bento A" to "bento C" as child nodes of "bento", "bread A" to "bread C" as child nodes of "bread", and "rice ball A" to "rice ball C" as child nodes of "rice ball". For convenience, the second and third layers under "pharmaceutical" and the third layer under "beverage" are not illustrated.
As described above, the hyperbolic classification unit 23 outputs scores only for the terminal classes in the hierarchical structure of the classes. In the example of FIG. 3, the hyperbolic classification unit 23 outputs only the scores of terminal classes such as "bento A" to "bento C", "bread A" to "bread C", and "rice ball A" to "rice ball C". Suppose now that certain input data is input and its correct label is "bento B". In this case, the hierarchical loss calculation unit 24 computes, for the third layer, which is the layer of terminal classes, a loss that maximizes the score of the correct class "bento B", and uses it as the loss of the third layer.
When computing the loss of a layer above the terminal classes, the hierarchical loss calculation unit 24 integrates the scores of the terminal classes that are descendants of each node and uses them for the loss computation. Specifically, if the scores output by the hyperbolic classification unit 23 are probabilities of the terminal classes, the score of each class in an upper layer is the sum of the probabilities of its descendant terminal classes.
For example, in FIG. 3, the score of "bento" in the second layer is the sum of the scores of its child nodes "bento A" to "bento C". Similarly, the score of "bread" in the second layer is the sum of the scores of its child nodes "bread A" to "bread C", and the score of "rice ball" in the second layer is the sum of the scores of its child nodes "rice ball A" to "rice ball C". For the second layer, the hierarchical loss calculation unit 24 then computes a loss that maximizes the score of "bento", which has the correct class "bento B" as a descendant node.
Likewise, the score of "food" in the first layer is the sum of the scores of its grandchild terminal classes "bento A" to "bento C", "bread A" to "bread C", and "rice ball A" to "rice ball C". Similarly, the scores of "beverage" and "pharmaceutical" in the first layer are the sums of the scores of their descendant terminal classes. For the first layer, the hierarchical loss calculation unit 24 computes a loss that maximizes the score of "food", which has the correct class "bento B" as a descendant node. The hierarchical loss calculation unit 24 then computes a weighted sum of the losses computed for the layers and outputs it to the gradient calculation unit 25 as the hierarchical loss.
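For instance, given raw scores for the terminal classes and, for every layer, an index mapping each terminal class to its ancestor at that layer, the hierarchical loss could be computed as follows. This Python sketch is only one possible reading of the above description; the ancestor-index encoding, the per-layer weights, and the function name are illustrative assumptions:

import torch
import torch.nn.functional as F

def hierarchical_loss(leaf_scores, ancestors, targets, level_weights):
    # leaf_scores:   (batch, num_leaves) raw scores for the terminal classes.
    # ancestors[l]:  LongTensor (num_leaves,) mapping each leaf to its ancestor
    #                class index at layer l (at the leaf layer, each leaf maps to itself).
    # targets[l]:    LongTensor (batch,) correct class index at layer l.
    # level_weights: one weight per layer for the weighted sum.
    leaf_prob = F.softmax(leaf_scores, dim=-1)
    total = leaf_scores.new_zeros(())
    for l, w in enumerate(level_weights):
        num_cls = int(ancestors[l].max()) + 1
        # The score of an upper-layer class is the sum of its descendant leaf probabilities.
        prob_l = leaf_prob.new_zeros(leaf_prob.size(0), num_cls)
        prob_l.index_add_(1, ancestors[l], leaf_prob)
        total = total + w * F.nll_loss(torch.log(prob_l + 1e-9), targets[l])
    return total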
The gradient calculation unit 25 computes the gradient of the hierarchical loss input from the hierarchical loss calculation unit 24 and outputs it to the update unit 26. The update unit 26 uses the gradient to update the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23.
(Learning process)
FIG. 4 is a flowchart of learning processing by the learning device 100 of the first embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S11). Next, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space (step S12). Next, the hyperbolic classification unit 23 calculates the score of each class from the feature representation (step S13). Next, the hierarchical loss calculation unit 24 uses the knowledge of the hierarchical structure of the classes to calculate the hierarchical loss from the per-class scores and the correct label (step S14). Next, the gradient calculation unit 25 calculates the gradient of the hierarchical loss (step S15). Next, the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 based on the gradient (step S16). The above processing is repeated until a predetermined learning end condition is satisfied, at which point the learning processing ends.
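Putting the pieces together, one iteration of steps S11 to S16 could look like the following sketch. It builds on the hypothetical FeatureExtractor, expmap0, HyperbolicNearestClassifier, and hierarchical_loss sketched above; the data loader, the optimizer choice, and the variables ancestors and level_weights are likewise assumptions:

import torch

# feature_extractor and classifier are the hypothetical modules sketched above.
params = list(feature_extractor.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for images, level_targets in loader:        # data with correct answers
    pre_feat = feature_extractor(images)    # step S11: pre-feature representation
    z = expmap0(pre_feat)                   # step S12: project onto the Poincare ball
    scores = classifier(z)                  # step S13: terminal-class scores
    loss = hierarchical_loss(scores, ancestors, level_targets, level_weights)  # step S14
    optimizer.zero_grad()
    loss.backward()                         # step S15: gradient of the hierarchical loss
    optimizer.step()                        # step S16: parameter update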
As described above, the learning device 100 of the first embodiment can learn a highly accurate model even from a small amount of training data by using knowledge of the hierarchical structure of the classes.
[Inference device]
Next, the inference device of the first embodiment will be described.
(Hardware configuration)
The hardware configuration of the inference device 200 of the first embodiment is the same as that of the learning device 100 shown in FIG. 1, so the explanation is omitted.
(Functional configuration)
FIG. 5 is a block diagram showing the functional configuration of the inference device 200 of the first embodiment. The inference device 200 includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hyperbolic classification unit 23. The parameters obtained by the learning processing described above are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23.
Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subject to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22. The hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space and outputs it to the hyperbolic classification unit 23. The hyperbolic classification unit 23 calculates scores for the terminal classes in the hierarchical structure of the classes and outputs them as the inference result. In this way, the input data is classified.
(Inference processing)
FIG. 6 is a flowchart of inference processing by the inference device 200 of the first embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S21). Next, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space (step S22). Next, the hyperbolic classification unit 23 calculates the score of each terminal class from the feature representation and outputs it as the inference result (step S23). The above processing is performed for each piece of input data.
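An inference pass reuses the trained components without any loss computation. The sketch below again relies on the hypothetical modules from the learning-side sketches:

import torch

with torch.no_grad():                                  # no gradients at inference time
    pre_feat = feature_extractor(image.unsqueeze(0))   # step S21: pre-feature representation
    z = expmap0(pre_feat)                              # step S22: project onto the ball
    scores = classifier(z)                             # step S23: terminal-class scores
    predicted_leaf = scores.argmax(dim=-1)             # most likely terminal class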
<Second embodiment>
Next, a second embodiment will be described. In the second embodiment, the hyperbolic classification unit is also hierarchized using knowledge of the hierarchical structure of the classes.
[Learning device]
First, the learning device of the second embodiment will be described.
(Hardware configuration)
The hardware configuration of the learning device 100a of the second embodiment is the same as that of the learning device 100 shown in FIG. 1, so the description is omitted.
(Functional configuration)
FIG. 7 is a block diagram showing the functional configuration of the learning device 100a of the second embodiment. As can be seen from a comparison with the learning device 100 of the first embodiment shown in FIG. 2, the learning device 100a of the second embodiment has a hierarchical hyperbolic classification unit 23x instead of the hyperbolic classification unit 23.
The hierarchical hyperbolic classification unit 23x uses knowledge of the hierarchical structure of the classes to output a score at each layer of the hierarchical structure from the single feature representation on the hyperbolic space input from the hyperbolic projection unit 22. FIG. 8 shows one example of how the plurality of classifiers constituting the hierarchical hyperbolic classification unit 23x divide the work. Each of the frames 91 to 93 drawn with thick lines indicates the portion corresponding to one classifier. In the example of FIG. 8, one classifier is provided for each layer of the hierarchical structure of the classes. That is, the hierarchical hyperbolic classification unit 23x is composed of three classifiers corresponding to the first to third layers, respectively. Each classifier identifies the nodes belonging to the same layer across subtrees. In this example, the hierarchical hyperbolic classification unit 23x outputs the classification results of the three classifiers, one for each layer.
FIG. 9 shows another example of how the plurality of classifiers constituting the hierarchical hyperbolic classification unit 23x divide the work. Each of the frames 91 to 93 drawn with thick lines indicates the portion corresponding to one classifier. In the example of FIG. 9, a plurality of classifiers that identify sibling nodes belonging to the same parent node are provided in the third layer, as indicated by the frames 93. That is, one classifier is prepared for the nodes "bento A" to "bento C" belonging to the same parent node "bento", and one classifier is prepared for the nodes "bread A" to "bread C" belonging to the same parent node "bread". For the third layer, one classifier is likewise prepared for the sibling nodes belonging to every parent node other than "bento" and "bread" in the second layer, but these are not illustrated for convenience. In this example, the hierarchical hyperbolic classification unit 23x outputs the classification result of each of the plurality of classifiers. That is, it outputs the classification result corresponding to the frame 91 for the first layer, the classification result corresponding to the frame 92 for the second layer, and the classification results corresponding to the plurality of frames 93 for the third layer.
With either of the above configurations, the hierarchical hyperbolic classification unit 23x calculates a classification result (score) for each layer and outputs it to the hierarchical loss calculation unit 24. The hierarchical loss calculation unit 24 calculates a loss for the classification result of each layer input from the hierarchical hyperbolic classification unit 23x, and outputs the weighted sum of these losses to the gradient calculation unit 25 as the hierarchical loss. As described above, in the second embodiment the hierarchical hyperbolic classification unit 23x outputs not only the scores of the terminal classes but also the scores of the classes in the upper layers, so the hierarchical loss calculation unit 24 does not need to integrate the terminal-class scores to compute the upper-layer scores as in the first embodiment.
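As one possible realization of the per-layer arrangement of FIG. 8, an independent set of prototypes could be kept for each layer, with every classifier reading the same hyperbolic feature representation. This sketch reuses the hypothetical HyperbolicNearestClassifier from the first embodiment; the list of class counts per layer is an assumption:

import torch.nn as nn

class HierarchicalHyperbolicClassifier(nn.Module):
    # One classifier per layer of the class hierarchy (cf. FIG. 8).
    def __init__(self, classes_per_level, dim):
        super().__init__()
        self.heads = nn.ModuleList(
            HyperbolicNearestClassifier(n, dim) for n in classes_per_level
        )

    def forward(self, z):
        # Returns one score tensor per layer, e.g. for classes_per_level = [3, 6, 9].
        return [head(z) for head in self.heads]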
The configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100a of the second embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
(Learning process)
FIG. 10 is a flowchart of learning processing by the learning device 100a of the second embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S31). Next, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space (step S32). Next, the hierarchical hyperbolic classification unit 23x uses the knowledge of the hierarchical structure of the classes to calculate the score of each class at each layer from the feature representation (step S33). Next, the hierarchical loss calculation unit 24 calculates the hierarchical loss from the per-class scores at each layer and the correct label (step S34). Next, the gradient calculation unit 25 calculates the gradient of the hierarchical loss (step S35). Next, the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hierarchical hyperbolic classification unit 23x based on the gradient (step S36). The above processing is repeated until a predetermined learning end condition is satisfied, at which point the learning processing ends.
[Inference device]
Next, the inference device of the second embodiment will be described.
(Hardware configuration)
The hardware configuration of the inference device 200 is the same as that of the learning device 100 shown in FIG. 1, so description thereof will be omitted.
(Functional configuration)
FIG. 11 is a block diagram showing the functional configuration of the inference device 200a of the second embodiment. The inference device 200a includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the previous learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hierarchical hyperbolic classification unit 23x.
Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subject to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22. The hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space and outputs it to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classification unit 23x uses the knowledge of the hierarchical structure of the classes to calculate the score of each class at each layer and outputs the scores as the inference result. In this way, the input data is classified.
(Inference processing)
FIG. 12 is a flowchart of inference processing by the inference device 200a of the second embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S41). Next, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space (step S42). Next, the hierarchical hyperbolic classification unit 23x uses the knowledge of the hierarchical structure of the classes to calculate the score of each class at each layer from the feature representation and outputs the scores as the inference result (step S43). The above processing is performed for each piece of input data.
<Third embodiment>
Next, a third embodiment will be described. In the third embodiment, the hyperbolic projection unit 22 is also hierarchized using knowledge of the hierarchical structure of classes.
[Learning device]
First, the learning device of the third embodiment will be described.
(Hardware configuration)
The hardware configuration of the learning device 100b of the third embodiment is the same as that of the learning device 100 of the first embodiment shown in FIG. 1, so description thereof will be omitted.
(Functional configuration)
FIG. 13 is a block diagram showing the functional configuration of the learning device 100b of the third embodiment. As can be seen from a comparison with the learning device 100a of the second embodiment shown in FIG. 7, the learning device 100b of the third embodiment has a hierarchical hyperbolic projection unit 22x instead of the hyperbolic projection unit 22.
The hierarchical hyperbolic projection unit 22x uses knowledge of the hierarchical structure of the classes to output a feature representation at each layer of the hierarchical structure from the pre-feature representation input from the feature extraction unit 21. FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit 22x. The hierarchical hyperbolic projection unit 22x includes first to third embedding networks (NWs) and adders 31 and 32.
The pre-feature representation is input from the feature extraction unit 21 to each of the first to third embedding NWs. The first embedding NW uses the knowledge of the hierarchical structure of the classes and outputs, as the feature representation C1, a vector indicating the point on the hyperbolic space of the class corresponding to a node of the first layer.
For a node of the second layer, the second embedding NW outputs the difference D1 between the feature representation C1 of the class corresponding to the parent node of that node and the feature representation of that node. The adder 31 then outputs the sum of the parent node's feature representation C1 and the difference D1 as the feature representation C2 corresponding to that node of the second layer. Like the feature representation C1, the feature representation C2 is a vector indicating a point on the hyperbolic space.
Similarly, for a node of the third layer, the third embedding NW outputs the difference D2 between the feature representation C2 of the class corresponding to the parent node of that node and the feature representation of that node. The adder 32 then outputs the sum of the parent node's feature representation C2 and the difference D2 as the feature representation C3 corresponding to that node of the third layer. Like the feature representation C1, the feature representation C3 is a vector indicating a point on the hyperbolic space.
FIG. 15 conceptually illustrates the feature representations C1 to C3 and the differences D1 and D2. In FIG. 15, the hyperbolic space is drawn as a two-dimensional space for convenience. Assuming the hierarchical structure of classes shown in FIG. 3, the circles (●) indicate the feature representations C1 of the first-layer classes, the squares (■) indicate the feature representations C2 of the second-layer classes, and the triangles (▲) indicate the feature representations C3 of the third-layer classes. In this case, the difference D1 can be regarded as a vector pointing from the first-layer class "food" (circle) to the second-layer classes "bento", "bread", and "rice ball" (squares). Similarly, the difference D2 can be regarded as a vector pointing from the second-layer class "bread" (square) to the third-layer classes "bread A" to "bread C" (triangles). Mathematically, the "difference" is a tangent vector of the hyperbolic space at the feature representation of the parent node's class, and the "sum" is realized by the exponential map.
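On the Poincaré ball, the "sum" of a parent embedding and a tangent-vector "difference" can be written with Möbius addition and the exponential map. The following sketch assumes curvature -1 and follows the standard formulas from the hyperbolic neural network literature; the function names are illustrative:

import torch

def mobius_add(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Mobius addition on the Poincare ball with curvature -1.
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2).clamp_min(eps)

def expmap(x: torch.Tensor, v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Exponential map at point x: moves from the parent embedding x along the
    # tangent vector v (the "difference"), yielding the child embedding (the "sum").
    lam = 2 / (1 - (x * x).sum(-1, keepdim=True)).clamp_min(eps)
    nv = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return mobius_add(x, torch.tanh(lam * nv / 2) * v / nv)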
In this way, the hierarchical hyperbolic projection unit 22x outputs, for one piece of input data, the per-layer feature representations C1 to C3 to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classification unit 23x receives the feature representation of each layer, performs classification for each layer, and outputs the classification results to the hierarchical loss calculation unit 24.
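Combining the embedding networks with the exponential map, the hierarchical hyperbolic projection unit 22x could be sketched as follows. This builds on the hypothetical expmap0 and expmap above; using one linear layer per embedding NW and applying expmap0 to obtain C1 are illustrative assumptions, not the only possible reading:

import torch
import torch.nn as nn

class HierarchicalHyperbolicProjection(nn.Module):
    # Emits one embedding per layer: C1, then C2 = exp_{C1}(D1), C3 = exp_{C2}(D2).
    def __init__(self, in_dim: int, dim: int, num_levels: int = 3):
        super().__init__()
        self.nets = nn.ModuleList(nn.Linear(in_dim, dim) for _ in range(num_levels))

    def forward(self, pre_feat: torch.Tensor):
        z = expmap0(self.nets[0](pre_feat))  # C1: first-layer embedding on the ball
        outs = [z]
        for net in self.nets[1:]:
            z = expmap(z, net(pre_feat))     # child = exp map at the parent of the "difference"
            outs.append(z)
        return outs                           # [C1, C2, C3]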
The configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100b of the third embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
(Learning process)
FIG. 16 is a flowchart of learning processing by the learning device 100b of the third embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S51). Next, the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each layer (step S52). Next, the hierarchical hyperbolic classification unit 23x calculates the score of each class at each layer from the per-layer feature representations input from the hierarchical hyperbolic projection unit 22x (step S53). Next, the hierarchical loss calculation unit 24 calculates the hierarchical loss from the per-class scores at each layer and the correct label (step S54). Next, the gradient calculation unit 25 calculates the gradient of the hierarchical loss (step S55). Next, the update unit 26 updates the parameters of the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x based on the gradient (step S56). The above processing is repeated until a predetermined learning end condition is satisfied, at which point the learning processing ends.
[Inference device]
Next, the inference device 200b of the third embodiment will be described.
(Hardware configuration)
Since the hardware configuration of the inference device 200b is the same as that of the learning device 100 shown in FIG. 1, the description thereof is omitted.
(Functional configuration)
FIG. 17 is a block diagram showing the functional configuration of the inference device 200b of the third embodiment. The inference device 200b includes a feature extraction unit 21, a hierarchical hyperbolic projection unit 22x, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x.
Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subject to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hierarchical hyperbolic projection unit 22x. The hierarchical hyperbolic projection unit 22x uses the knowledge of the hierarchical structure of the classes to convert the pre-feature representation into a feature representation on the hyperbolic space for each layer and outputs them to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classification unit 23x calculates the score of each class at each layer based on the per-layer feature representations and outputs the scores as the inference result. In this way, the input data is classified.
(Inference processing)
FIG. 18 is a flowchart of inference processing by the inference device 200b of the third embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.
First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S61). Next, the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each layer (step S62). Next, the hierarchical hyperbolic classification unit 23x calculates the score of each class at each layer from the per-layer feature representations and outputs the scores as the inference result (step S63). The above processing is performed for each piece of input data.
<Fourth embodiment>
FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment. The learning device 70 includes a feature extraction means 71, a projection means 72, a classification means 73, a loss calculation means 74, and an updating means 75.
FIG. 20 is a flowchart of the learning processing performed by the learning device 70 of the fourth embodiment. First, the feature extraction means 71 converts input data into a first feature representation (step S71). Next, the projection means 72 converts the first feature representation into a second feature representation indicating a point on a hyperbolic space (step S72). Next, the classification means 73 performs classification based on the second feature representation and outputs a score indicating the possibility that the input data belongs to each class (step S73). Next, the loss calculation means 74 calculates a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score (step S74). Next, the updating means 75 updates the parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss (step S75). According to the fourth embodiment, using knowledge of the hierarchical structure of the classes makes it possible to generate a highly accurate model even from a small amount of input data.
<Fifth embodiment>
FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment. The inference device 80 includes a feature extraction means 81, a projection means 82, and a classification means 83.
FIG. 22 is a flowchart of the inference processing performed by the inference device 80 of the fifth embodiment. First, the feature extraction means 81 converts input data into a first feature representation (step S81). Next, the projection means 82 converts the first feature representation into a second feature representation indicating a point on a hyperbolic space (step S82). Next, the classification means 83 performs classification based on the second feature representation and, using knowledge of the hierarchical structure to which each class belongs, calculates a score for each layer indicating the possibility that the input data belongs to each class (step S83). According to the fifth embodiment, inference can be performed with high accuracy using a model learned with knowledge of the hierarchical structure of the classes.
Some or all of the above embodiments can also be described as the following additional remarks, but are not limited to the following.
(Appendix 1)
A learning device comprising:
a feature extraction means for converting input data into a first feature representation;
a projection means for converting the first feature representation into a second feature representation indicating a point on a hyperbolic space;
a classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class;
a loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
an updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
(Appendix 2)
The learning device according to appendix 1, wherein the classification means outputs scores for terminal classes of the hierarchical structure, and
the loss calculation means integrates the scores of the terminal classes to calculate losses for the layers above the layer of the terminal classes, and calculates a weighted sum of the losses of the layers as the hierarchical loss.
(Appendix 3)
The learning device according to appendix 2, wherein the loss calculation means calculates, for the layer of the terminal classes, a loss that maximizes the score of the correct class, and calculates, for each layer above the layer of the terminal classes, a loss that maximizes the score of the class in that layer to which the correct class belongs.
(Appendix 4)
The learning device according to any one of appendices 1 to 3, wherein the classification means outputs the score for each layer using the knowledge of the hierarchical structure, and
the loss calculation means calculates the hierarchical loss based on the scores output for the layers.
(Appendix 5)
The learning device according to appendix 4, wherein the projection means outputs the second feature representation for each layer based on the knowledge of the hierarchical structure, and
the classification means outputs the score for each layer based on the second feature representation output for that layer.
(Appendix 6)
A learning method comprising:
converting input data into a first feature representation using a feature extraction means;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space using a projection means;
performing classification based on the second feature representation using a classification means and outputting a score indicating the possibility that the input data belongs to each class;
calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
(Appendix 7)
A recording medium recording a program for causing a computer to execute processing comprising:
converting input data into a first feature representation using a feature extraction means;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space using a projection means;
performing classification based on the second feature representation using a classification means and outputting a score indicating the possibility that the input data belongs to each class;
calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
(Appendix 8)
An inference device comprising:
a feature extraction means for converting input data into a first feature representation;
a projection means for converting the first feature representation into a second feature representation indicating a point on a hyperbolic space; and
a classification means for performing classification based on the second feature representation and calculating, for each layer, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs.
(Appendix 9)
The learning device according to any one of appendices 1 to 4, wherein the projection means outputs the second feature representation for each layer based on the knowledge of the hierarchical structure, and
the classification means outputs the score for each layer based on the second feature representation output for that layer.
(Appendix 10)
An inference method comprising:
converting input data into a first feature representation;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space; and
performing classification based on the second feature representation and calculating, for each layer, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs.
(Appendix 11)
A recording medium recording a program for causing a computer to execute processing comprising:
converting input data into a first feature representation;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space; and
performing classification based on the second feature representation and calculating, for each layer, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs.
Although the present disclosure has been described above with reference to the embodiments and examples, the present disclosure is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present disclosure within the scope of the present disclosure.
21 Feature extraction unit
22 Hyperbolic projection unit
22x Hierarchical hyperbolic projection unit
23 Hyperbolic classification unit
23x Hierarchical hyperbolic classification unit
24 Hierarchical loss calculation unit
25 Gradient calculation unit
26 Update unit
100, 100a, 100b Learning device
200, 200a, 200b Inference device

Claims (11)

1. A learning device comprising:
a feature extraction means for converting input data into a first feature representation;
a projection means for converting the first feature representation into a second feature representation indicating a point on a hyperbolic space;
a classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class;
a loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
an updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
2. The learning device according to claim 1, wherein the classification means outputs scores for terminal classes of the hierarchical structure, and
the loss calculation means integrates the scores of the terminal classes to calculate losses for the layers above the layer of the terminal classes, and calculates a weighted sum of the losses of the layers as the hierarchical loss.
3. The learning device according to claim 2, wherein the loss calculation means calculates, for the layer of the terminal classes, a loss that maximizes the score of the correct class, and calculates, for each layer above the layer of the terminal classes, a loss that maximizes the score of the class in that layer to which the correct class belongs.
4. The learning device according to any one of claims 1 to 3, wherein the classification means outputs the score for each layer using the knowledge of the hierarchical structure, and
the loss calculation means calculates the hierarchical loss based on the scores output for the layers.
5. The learning device according to claim 4, wherein the projection means outputs the second feature representation for each layer based on the knowledge of the hierarchical structure, and
the classification means outputs the score for each layer based on the second feature representation output for that layer.
6. A learning method comprising:
converting input data into a first feature representation using a feature extraction means;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space using a projection means;
performing classification based on the second feature representation using a classification means and outputting a score indicating the possibility that the input data belongs to each class;
calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
7. A recording medium recording a program for causing a computer to execute processing comprising:
converting input data into a first feature representation using a feature extraction means;
converting the first feature representation into a second feature representation indicating a point on a hyperbolic space using a projection means;
performing classification based on the second feature representation using a classification means and outputting a score indicating the possibility that the input data belongs to each class;
calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score; and
updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  8.  An inference device comprising:
     a feature extraction means for converting input data into a first feature representation;
     a projection means for transforming the first feature representation into a second feature representation representing a point in hyperbolic space; and
     a classification means for performing classification based on the second feature representation and, using knowledge of the hierarchical structure to which each class belongs, calculating for each hierarchy level a score indicating the likelihood that the input data belongs to each class.
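     At inference time (claims 8 and 10), the same forward pass can yield a score for every hierarchy level at once by reusing the ancestor maps; a minimal sketch under the hypothetical names introduced above:

    import torch

    @torch.no_grad()
    def infer(x, ancestors):
        z = expmap0(encoder(x))
        leaf_probs = classifier(z).softmax(dim=-1)
        scores = {"leaf": leaf_probs}
        for l, level_map in enumerate(ancestors):
            n_coarse = int(level_map.max()) + 1
            # Sum leaf probabilities sharing an ancestor to score this level.
            coarse = torch.zeros(x.size(0), n_coarse).index_add_(
                1, level_map, leaf_probs)
            scores[f"level_{l}"] = coarse
        return scores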
  9.  The learning device according to any one of claims 1 to 4, wherein
     the projection means outputs the second feature representation for each hierarchy level based on the knowledge of the hierarchical structure, and
     the classification means outputs the score for each hierarchy level based on the second feature representation output for each level.
  10.  An inference method comprising:
     converting input data into a first feature representation;
     transforming the first feature representation into a second feature representation representing a point in hyperbolic space; and
     performing classification based on the second feature representation and, using knowledge of the hierarchical structure to which each class belongs, calculating for each hierarchy level a score indicating the likelihood that the input data belongs to each class.
  11.  A recording medium recording a program for causing a computer to execute processing comprising:
     converting input data into a first feature representation;
     transforming the first feature representation into a second feature representation representing a point in hyperbolic space; and
     performing classification based on the second feature representation and, using knowledge of the hierarchical structure to which each class belongs, calculating for each hierarchy level a score indicating the likelihood that the input data belongs to each class.
PCT/JP2021/008691 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium WO2022185529A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/008691 WO2022185529A1 (en) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium
JP2023503320A JPWO2022185529A5 (en) 2021-03-05 Learning device, learning method, inference device, inference method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/008691 WO2022185529A1 (en) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium

Publications (1)

Publication Number Publication Date
WO2022185529A1 (en) 2022-09-09

Family

ID=83154123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008691 WO2022185529A1 (en) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium

Country Status (1)

Country Link
WO (1) WO2022185529A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020053073A (en) * 2014-03-28 2020-04-02 日本電気株式会社 Learning method, learning system, and learning program
JP2020042403A (en) * 2018-09-07 2020-03-19 Zホールディングス株式会社 Information processing apparatus, information processing method and program
JP2020091846A (en) * 2018-10-19 2020-06-11 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Systems and methods for conversation-based ticket logging
JP2020091813A (en) * 2018-12-07 2020-06-11 公立大学法人会津大学 Learning method for neural network, computer program and computer device
WO2020162294A1 (en) * 2019-02-07 2020-08-13 株式会社Preferred Networks Conversion method, training device, and inference device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIGASHIYAMA, SHOHEI; BLONDEL, MATHIEU; SEKI, KAZUHIRO; UEHARA, KUNIAKI: "Named Entity Recognition Exploiting Category Hierarchy Using Structured Perceptron", IPSJ SIG TECHNICAL REPORTS, vol. 2012-BIO-32, no. 25, 30 November 2011 (2011-11-30), pages 1 - 6, XP009539751 *
MAXIMILIAN NICKEL; DOUWE KIELA: "Poincaré Embeddings for Learning Hierarchical Representations", arXiv.org, Cornell University Library, Ithaca, NY, 23 May 2017 (2017-05-23), XP080949506 *

Also Published As

Publication number Publication date
JPWO2022185529A1 (en) 2022-09-09

Similar Documents

Publication Publication Date Title
Sarker Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions
CN112560432B (en) Text emotion analysis method based on graph attention network
CN111353373B (en) Related alignment domain adaptive fault diagnosis method
Naz et al. Intelligent routing between capsules empowered with deep extreme machine learning technique
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN112199536A (en) Cross-modality-based rapid multi-label image classification method and system
CN115661550A (en) Graph data class imbalance classification method and device based on generation countermeasure network
CN113254675A (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
Jiang et al. An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing
Jia et al. Imbalanced disk failure data processing method based on CTGAN
Dinov et al. Black box machine-learning methods: Neural networks and support vector machines
CN113849653A (en) Text classification method and device
WO2022185529A1 (en) Learning device, learning method, inference device, inference method, and recording medium
Wang et al. Interpret neural networks by extracting critical subnetworks
Jiang et al. A massive multi-modal perception data classification method using deep learning based on internet of things
CN111259938A (en) Manifold learning and gradient lifting model-based image multi-label classification method
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN114492386A (en) Combined detection method for drug name and adverse drug reaction in web text
Aziz Deep learning: an overview of Convolutional Neural Network (CNN)
Li et al. PointSmile: Point self-supervised learning via curriculum mutual information
CN113032565B (en) Cross-language supervision-based superior-inferior relation detection method
CN112927248B (en) Point cloud segmentation method based on local feature enhancement and conditional random field
CN115952259B (en) Intelligent generation method of enterprise portrait tag
US20240020553A1 (en) Interactive electronic device for performing functions of providing responses to questions from users and real-time conversation with the users using models learned by deep learning technique and operating method thereof
Dinov Black Box Machine Learning Methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21929101

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023503320

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21929101

Country of ref document: EP

Kind code of ref document: A1