WO2022185529A1

WO2022185529A1 - Learning device, learning method, inference device, inference method, and recording medium

Info

Publication number: WO2022185529A1
Application number: PCT/JP2021/008691
Authority: WO
Inventors: 周平吉田
Original assignee: 日本電気株式会社
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2022-09-09
Also published as: JPWO2022185529A1

Abstract

In this learning device, a feature extraction means converts input data into a first feature representation. A projection means converts the first feature representation into a second feature representation representing a point in a hyperbolic space. A classification means performs classification on the basis of the second feature representation, and outputs a score indicating the likelihood that the input data belongs to each class. A loss calculation means calculates a hierarchical loss on the basis of knowledge of the hierarchical structure to which each class belongs, a correct answer label assigned to the input data, and the score. An update means updates the parameters of the feature extraction means, the projection means, and the classification means on the basis of the hierarchical loss.

Description

Learning device, learning method, reasoning device, reasoning method, and recording medium

This disclosure relates to a learning method for a machine learning model.

In recent years, recognition technology based on machine learning has shown extremely high performance, mainly in the field of image recognition. The high accuracy of such recognition technology based on machine learning is supported by a large amount of data with correct answers. That is, high accuracy is achieved by preparing a large amount of data with correct answers and performing learning. For example, Patent Literature 1 discloses a learning method for identifying categories having a hierarchical structure.

International publication WO2006/073081

On the other hand, depending on the application of image recognition technology, it is required to realize low-cost and highly accurate machine learning without preparing a large amount of data with correct answers.

One purpose of the present disclosure is to generate a low-cost and highly accurate machine learning model.

In one aspect of the present disclosure, a learning device includes:
a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class;
loss calculation means for calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
updating means for updating parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss.

In another aspect of the disclosure, a learning method comprises:
converting the input data into a first feature representation using the feature extraction means;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
Based on the hierarchical loss, parameters of the feature extraction means, the projection means and the classifier are updated.

In yet another aspect of the present disclosure, the recording medium comprises
transforming the input data into a first feature representation using the classifier;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
A program for causing a computer to execute processing for updating parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss is recorded.

In still another aspect of the present disclosure, an inference device includes:
a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
a classification means for performing classification based on the second feature representation, and using knowledge of a hierarchical structure to which each class belongs to calculate a score indicating a possibility that the input data belongs to each class for each hierarchy. .

In yet another aspect of the disclosure, an inference method includes:
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification is performed based on the second feature representation, and a score indicating the possibility that the input data belongs to each class is calculated for each hierarchy using knowledge of the hierarchical structure to which each class belongs.

In yet another aspect of the present disclosure, the recording medium comprises
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification is performed based on the second feature representation, and a computer is caused to execute processing for calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs. Record the program.

According to the present disclosure, it is possible to generate a low-cost and highly accurate machine learning model by using knowledge of the class structure.

2 is a block diagram showing the hardware configuration of the learning device of the first embodiment; FIG. 2 is a block diagram showing the functional configuration of the learning device of the first embodiment; FIG. Here is an example of a class hierarchy. 4 is a flowchart of learning processing by the learning device of the first embodiment; 2 is a block diagram showing the functional configuration of the inference device of the first embodiment; FIG. 4 is a flowchart of inference processing by the inference device of the first embodiment; FIG. 11 is a block diagram showing the functional configuration of a learning device according to a second embodiment; FIG. An example of sharing by multiple classifiers forming a hierarchical hyperbolic classifier is shown. Fig. 2 shows another example of division by multiple classifiers forming a hierarchical hyperbolic classifier; 9 is a flowchart of learning processing by the learning device of the second embodiment; FIG. 12 is a block diagram showing the functional configuration of the inference device of the second embodiment; FIG. 9 is a flowchart of inference processing by the inference device of the second embodiment; FIG. 11 is a block diagram showing the functional configuration of a learning device according to a third embodiment; FIG. 4 shows a schematic configuration of a hierarchical hyperbolic projection unit; FIG. 2 is a diagram conceptually explaining feature representations and differences; 10 is a flowchart of learning processing by the learning device of the third embodiment; FIG. 11 is a block diagram showing the functional configuration of an inference device according to a third embodiment; FIG. 10 is a flowchart of inference processing by the inference device of the third embodiment; FIG. 11 is a block diagram showing the functional configuration of a learning device according to a fourth embodiment; FIG. 10 is a flowchart of learning processing by the learning device of the fourth embodiment; FIG. 14 is a block diagram showing the functional configuration of an inference device according to a fifth embodiment; FIG. 12 is a flowchart of inference processing by the inference device of the fifth embodiment; FIG.

Preferred embodiments of the present disclosure will be described below with reference to the drawings.
<Concept explanation>
As mentioned above, it is possible to obtain a highly accurate recognition model by training using a large amount of training data with correct answers, but there are also cases where it is required to generate a highly accurate model at low cost from a small amount of data. be. In order to learn a highly accurate model from a small amount of data, it is essential to use information other than training data. When performing multi-class classification, knowledge about the hierarchical structure of classes is highly versatile and easily available in many cases. Therefore, the following embodiments provide a learning method that can obtain a highly accurate classification model even with a small amount of data by using knowledge indicating the hierarchical structure of classes to be classified.

<First embodiment>
[Learning device]
First, the learning device of the first embodiment will be described.
(Hardware configuration)
FIG. 1 is a block diagram showing the hardware configuration of the learning device 100 of the first embodiment. As illustrated, the learning device 100 includes an interface (I/F) 11 , a processor 12 , a memory 13 , a recording medium 14 and a database (DB) 15 .

The interface 11 performs data input/output with an external device. Specifically, data with correct answers used for learning is input through the interface 11 .

The processor 12 is a computer such as a CPU (Central Processing Unit), and controls the entire study device 100 by executing a program prepared in advance. The processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 12 executes learning processing, which will be described later.

The memory 13 is composed of ROM (Read Only Memory), RAM (Random Access Memory), and the like. Memory 13 is also used as a working memory during execution of various processes by processor 12 .

The recording medium 14 is a non-volatile, non-temporary recording medium such as a disk-shaped recording medium or semiconductor memory, and is configured to be detachable from the learning device 100 . The recording medium 14 records various programs executed by the processor 12 . When the learning device 100 executes various processes, a program recorded on the recording medium 14 is loaded into the memory 13 and executed by the processor 12 . DB15 memorize|stores the data with a correct answer for learning, etc. as needed.

(Functional configuration)
FIG. 2 is a block diagram showing the functional configuration of the learning device 100 of the first embodiment. The learning device 100 includes a feature extraction unit 21 , a hyperbolic projection unit 22 , a hyperbolic classification unit 23 , a hierarchical loss calculation unit 24 , a gradient calculation unit 25 and an update unit 26 .

Data with correct answers include input data and correct labels corresponding to the input data. For example, when learning an image recognition model, the input data is an image used for learning, and the correct label is information indicating the class of the object included in the image. Of the data with correct answers, input data is input to the feature extraction unit 21 and correct labels are input to the hierarchical loss calculation unit 24 .

The feature extraction unit 21 converts the input data into a pre-feature representation. Note that the feature representation output by the feature extraction unit 21 is called a "pre-feature representation" in the sense of distinguishing it from the feature representation output by the hyperbolic projection unit 22, which will be described later. Both the “previous feature representation” and the “feature representation” are information representing features of the input data. Specifically, when learning an image recognition model, the feature extraction unit 21 is configured by a deep convolutional neural network (CNN) or the like, and uses a sequence (vector) of real numbers representing features of an input image as a pre-feature representation. Output to the hyperbolic projection unit 22 .

The hyperbolic projection unit 22 converts the pre-feature representation into a feature representation. Here, the "feature representation" is a point on some manifold, and selecting a specific projection part is equivalent to selecting a manifold (feature space) to which the feature representation belongs. In general, use a linear space (Euclidean space) as the feature space and a linear projection part as the projection part, or use a high-dimensional hypersphere as the feature space and a spherical projection part as the projection part. There are many. On the other hand, this embodiment uses a hyperbolic space as the feature amount space.

As described above, this embodiment obtains a highly accurate model with a small amount of training data by using knowledge about the hierarchical structure of classes, but the hierarchical structure (tree structure) expands exponentially. have the property In general, Euclidean space and hypersphere are used as feature space, but Euclidean space and hypersphere expand only polynomially, so they are not suitable for embedding tree structures. That is, when a hierarchical structure is expressed on Euclidean space or hypersphere, distortion is inevitable in low dimensions. Therefore, in order to express a hierarchical structure (tree structure) without distortion on the Euclidean space or hypersphere, it is necessary to use a feature amount space exponentially higher in terms of the number of classes.

From this point, in this embodiment, the hyperbolic space is used as the feature amount space. A tree structure can be embedded efficiently in a hyperbolic space. A hyperbolic space that expands exponentially can embed a tree structure without distortion even in two dimensions. Therefore, the hyperbolic projection unit 22 converts the previous feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23 . Like the previous feature representation, the feature representation is also a sequence (vector) of real numbers, but can be regarded as coordinate values on the hyperbolic space, which is the feature quantity space. The hyperbolic projection unit 22 can use Poincaré projection, Lorentz projection, etc. according to a specific hyperbolic space model.

The hyperbolic classifier 23 classifies one feature expression on the hyperbolic space output by the hyperbolic projection unit 22, and outputs the score of each class obtained for the feature expression to the hierarchical loss calculator 24. do. Note that the hyperbolic classifier 23 outputs only the scores of terminal classes in the hierarchical structure of classes. As the hyperbolic classifier 23, a hyperbolic hyperplane classifier or a hyperbolic nearest neighbor classifier can be used. A hyperbolic hyperplane classifier is a classifier that extends a linear classifier to a hyperbolic space and uses a hyperbolic plane in the hyperbolic space as a discrimination plane. A hyperbolic nearest neighbor classifier is a classifier that follows the nearest neighbor rule on hyperbolic space. The specific expression of the hyperbolic classification unit 23 is determined by the hyperbolic space model selected by the hyperbolic projection unit 22 .

The hierarchical loss calculator 24 calculates a loss function from the score of each class input from the hyperbolic classifier 23 and the correct labels included in the data with correct answers. At this time, the hierarchical loss calculator 24 uses knowledge of the hierarchical structure of the class to be classified. Specifically, the hierarchical loss calculation unit 24 calculates the score for each layer from the score for each class output by the hyperbolic classifier 23, so that the score for each layer can predict the correct class for each layer. Calculate the loss for each layer. Note that the hierarchical loss calculator 24 can use a general loss function for multilevel classification, such as cross-entropy loss.

Here, the loss calculation method by the hierarchical loss calculation unit 24 will be described with an example. FIG. 3 shows an example of a hierarchical structure of classes. This example shows a hierarchical structure (tree structure) with a root node of "merchandise" and has first to third hierarchies. The first hierarchy includes three classes "food", "beverage" and "pharmaceutical" as child nodes of "merchandise". The second hierarchy contains three classes "Bento", "Bread" and "Rice ball" as child nodes of "Food" and three classes "Tea", "Juice" and "Water" as child nodes of "Beverage". including. In addition, the third layer includes “Bento A” to “Bento C” as child nodes of “Bento”, “Bread A” to “Bread C” as child nodes of “Bread”, and “Onigiri” as child nodes. "Rice ball A" to "Rice ball C" are included as nodes. For the sake of convenience, illustration of the second and third layers of pharmaceuticals and the third layer of beverages is omitted.

As described above, the hyperbolic classifier 23 outputs only the scores of terminal classes in the hierarchical structure of classes. In the example of FIG. 3, the hyperbolic classifying unit 23 selects only terminal class scores such as "lunch box A" to "lunch box C", "bread A" to "bread C", and "rice ball A" to "rice ball C". Output. Suppose now that certain input data is input and its correct label is "Bento B". In this case, the hierarchical loss calculation unit 24 calculates the loss that maximizes the score of the correct class "Bento B" for the third layer, which is the layer of the terminal class, and sets it as the loss of the third layer. .

In addition, the hierarchical loss calculation unit 24 integrates the scores of the terminal classes that are descendants of each node and uses them for loss calculation when calculating the loss of the hierarchy higher than the terminal class. Specifically, if the score output by the hyperbolic classifier 23 is the probability of the terminal class, the score of each class in the higher hierarchy is the sum of the probabilities of the terminal classes that are descendants of the class.

For example, in the example of FIG. 3, the score of "lunch box" in the second layer is the sum of the scores of its child nodes "lunch box A" to "lunch box C". Similarly, the score of "bread" in the second layer is the sum of the scores of its child nodes "bread A" to "bread C", and the score of "rice ball" in the second layer is the sum of the scores of its child nodes "rice ball A” to “rice ball C”. Then, the hierarchical loss calculation unit 24 calculates a loss that maximizes the score of "Bento" having the correct class "Bento B" as a descendant node for the second layer.

In addition, the score of "Food" in the first layer is the terminal class "Bento A" to "Bento C", "Bread A" to "Bread C", and "Rice ball A" to "Rice ball C", which are the grandchild nodes of terminal classes. is the sum of the scores of Similarly, the score of "beverage" and "pharmaceutical" in the first hierarchy is also the sum of the scores of terminal classes that are grandchild nodes. The hierarchical loss calculation unit 24 calculates a loss that maximizes the score of "food" having the correct class "lunch box B" as a descendant node for the first layer. Then, the hierarchical loss calculator 24 calculates a weighted sum of the losses calculated for each hierarchy, and outputs it as a hierarchical loss to the gradient calculator 25 .

The gradient calculator 25 calculates the gradient of the hierarchical loss input from the hierarchical loss calculator 24 and outputs it to the updater 26 . The update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 using the gradients.

(learning process)
FIG. 4 is a flowchart of learning processing by the learning device 100 of the first embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S11). Next, the hyperbolic projection unit 22 transforms the previous feature representation into a feature representation on the hyperbolic space (step S12). Next, the hyperbolic classifier 23 calculates the score of each class from the feature representation (step S13). Next, the hierarchical loss calculator 24 uses the knowledge of the hierarchical structure of classes to calculate the hierarchical loss from the score and correct label of each class (step S14). Next, the gradient calculator 25 calculates the gradient of the hierarchical loss (step S15). Next, the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 based on the gradient (step S16). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.

As described above, according to the learning device 100 of the first embodiment, it is possible to learn a highly accurate model with a small amount of learning data by using the knowledge of the hierarchical structure of classes.

[Inference device]
Next, the inference device of the first embodiment will be described.
(Hardware configuration)
The hardware configuration of the inference device 200 of the first embodiment is the same as that of the learning device 100 shown in FIG. 1, so the explanation is omitted.

(Functional configuration)
FIG. 5 is a block diagram showing the functional configuration of the inference device 200 of the first embodiment. The inference device 200 includes a feature extraction unit 21 , a hyperbolic projection unit 22 and a hyperbolic classification unit 23 . Parameters obtained by the above learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23. FIG.

Input data is input to the feature extraction unit 21 . This input data is data such as images that are actually subjected to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22 . The hyperbolic projection unit 22 converts the previous feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23 . The hyperbolic classifier 23 calculates scores for terminal classes in the hierarchical structure of classes and outputs them as inference results. Classification of the input data is thus performed.

(inference processing)
FIG. 6 is a flowchart of inference processing by the inference device 200 of the first embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S21). Next, the hyperbolic projection unit 22 transforms the previous feature representation into a feature representation on the hyperbolic space (step S22). Next, the hyperbolic classifier 23 calculates the score of each terminal class from the feature representation and outputs it as an inference result (step S23). The above processing is performed for each input data.

<Second embodiment>
Next, a second embodiment will be described. In the second embodiment, the hyperbolic classifier is also hierarchized using the knowledge of the hierarchical structure of classes.

[Learning device]
First, the learning device of the second embodiment will be described.
(Hardware configuration)
The hardware configuration of the learning device 100a of the second embodiment is the same as that of the learning device 100 shown in FIG. 1, so the description is omitted.

(Functional configuration)
FIG. 7 is a block diagram showing the functional configuration of the learning device 100a of the second embodiment. As can be seen from a comparison with the learning device 100 of the first embodiment shown in FIG. 2, the learning device 100a of the second embodiment has a hierarchical hyperbolic classifier 23x instead of the hyperbolic classifier 23.

The hierarchical hyperbolic classification unit 23x uses knowledge of the hierarchical structure of classes to output a score in each layer of the hierarchical structure from one feature representation in the hyperbolic space input from the hyperbolic projection unit 22. FIG. 8 shows an example of a method of sharing by a plurality of classifiers forming the hierarchical hyperbolic classifier 23x. Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier. In the example of FIG. 8, one classifier is provided for each layer in the hierarchical structure of classes. That is, the hierarchical hyperbolic classifier 23x is composed of three classifiers respectively corresponding to the first to third hierarchies. Each classifier is a classifier that identifies nodes belonging to the same hierarchy across subtrees. In this example, the hierarchical hyperbolic classifier 23x outputs classification results for each layer by three classifiers.

FIG. 9 shows another example of a method of sharing by a plurality of classifiers forming the hierarchical hyperbolic classifier 23x. Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier. In the example of FIG. 9, in the third hierarchy, as indicated by frame 93, a plurality of classifiers for identifying sibling nodes belonging to the same parent node are provided. That is, one classifier is prepared corresponding to the nodes ``Bento A'' to ``Bento C'' belonging to the same parent node ``Bento'', and the nodes ``Bread A'' to ``Bread C'' belonging to the same parent node ``Bread''. One classifier is prepared corresponding to . As for the third layer, one classifier is similarly prepared for sibling nodes belonging to all parent nodes other than "lunch box" and "bread" in the second layer, but illustration is omitted for convenience. . In this example, the hierarchical hyperbolic classifier 23x outputs classification results by each of the plurality of classifiers. That is, the hierarchical hyperbolic classifier 23x outputs the classification result corresponding to the frame 91 for the first hierarchy, outputs the classification result corresponding to the frame 92 for the second hierarchy, and outputs a plurality of classification results for the third hierarchy. A classification result corresponding to the frame 93 is output.

With any of the above configurations, the hierarchical hyperbolic classifier 23x calculates a classification result (score) for each layer and outputs it to the hierarchical loss calculator 24. The hierarchical loss calculator 24 calculates a loss for the classification result of each hierarchy input from the hierarchical hyperbolic classifier 23x, and outputs a weighted sum of them to the gradient calculator 25 as a hierarchical loss. As described above, in the second embodiment, the hierarchical hyperbolic classifier 23x outputs not only the score of the terminal class but also the score of the upper class. It is not necessary to integrate the scores of the terminal classes to calculate the scores of the upper hierarchy as in the case of the embodiment.

The configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100a of the second embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.

(learning process)
FIG. 10 is a flowchart of learning processing by the learning device 100a of the second embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S31). Next, the hyperbolic projection unit 22 transforms the previous feature representation into a feature representation on the hyperbolic space (step S32). Next, the hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score of each class for each layer from the feature representation (step S33). Next, the hierarchical loss calculator 24 calculates the hierarchical loss from the score of each class and the correct label for each hierarchy (step S34). Next, the gradient calculator 25 calculates the gradient of the hierarchical loss (step S35). Next, the updating unit 26 updates the parameters of the feature extracting unit 21, the hyperbolic projecting unit 22, and the hierarchical hyperbolic classifying unit 23x based on the gradient (step S36). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.

[Inference device]
Next, the inference device of the second embodiment will be described.
(Hardware configuration)
The hardware configuration of the inference device 200 is the same as that of the learning device 100 shown in FIG. 1, so description thereof will be omitted.

(Functional configuration)
FIG. 11 is a block diagram showing the functional configuration of the inference device 200a of the second embodiment. The inference device 200a includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the previous learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hierarchical hyperbolic classification unit 23x.

Input data is input to the feature extraction unit 21 . This input data is data such as images that are actually subjected to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22 . The hyperbolic projection unit 22 converts the previous feature representation into a feature representation on the hyperbolic space, and outputs it to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score for each class in each hierarchy and output it as an inference result. Classification of the input data is thus performed.

(inference processing)
FIG. 12 is a flowchart of inference processing by the inference device 200a of the second embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S41). Next, the hyperbolic projection unit 22 transforms the previous feature representation into a feature representation on the hyperbolic space (step S42). Next, the hierarchical hyperbolic classifier 23x utilizes the knowledge of the hierarchical structure of the classes, calculates the score of each class for each class from the feature representation, and outputs it as an inference result (step S43). The above processing is performed for each input data.

<Third Embodiment>
Next, a third embodiment will be described. In the third embodiment, the hyperbolic projection unit 22 is also hierarchized using knowledge of the hierarchical structure of classes.

[Learning device]
First, the learning device of the third embodiment will be described.
(Hardware configuration)
The hardware configuration of the learning device 100b of the third embodiment is the same as that of the learning device 100 of the first embodiment shown in FIG. 1, so description thereof will be omitted.

(Functional configuration)
FIG. 13 is a block diagram showing the functional configuration of the learning device 100b of the third embodiment. As can be seen from a comparison with the learning device 100a of the second embodiment shown in FIG. 7, the learning device 100b of the second embodiment has a hierarchical hyperbolic projection unit 22x instead of the hyperbolic projection unit 22.

The hierarchical hyperbolic projection unit 22x uses knowledge of the hierarchical structure of classes to output feature representations in each layer of the hierarchical structure from the pre-feature representation input from the feature extraction unit 21. FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit 22x. The hierarchical hyperbolic projection unit 22 x includes first to third embedding networks (NW) and

adders

31 and 32 .

Pre-feature expressions are input from the feature extraction unit 21 to the first to third embedding NWs. The first embedding NW uses knowledge of the hierarchical structure of the classes and outputs a vector indicating a point on the hyperbolic space of the class corresponding to the node of the first hierarchy as the feature representation C1.

The second embedding NW outputs the difference D1 between the feature representation C1 of the class corresponding to the parent node of the node and the feature representation of the node for the second layer node. The adder 31 then outputs the sum of the characteristic representation C1 of the parent node and the difference D1 as the characteristic representation C2 corresponding to that node in the second layer. Like the feature representation C1, the feature representation C2 is a vector indicating a point on the hyperbolic space.

Similarly, the third embedding NW outputs the difference D2 between the feature representation C2 of the class corresponding to the parent node of the node and the feature representation of the node for the node of the third layer. The adder 32 then outputs the sum of the characteristic representation C2 of the parent node and the difference D2 as the characteristic representation C3 corresponding to that node in the third layer. Like the feature representation C1, the feature representation C3 is a vector indicating a point on the hyperbolic space.

FIG. 15 is a diagram conceptually explaining the feature representations C1 to C3 and the differences D1 to D2. FIG. 15 shows the hyperbolic space in a two-dimensional space for convenience. Assuming the hierarchical structure of the classes shown in FIG. 3, circles () indicate the feature representation C1 of the class in the first layer, squares (▪) indicate the feature representation C2 of the class in the second layer, and triangles ( ▲) indicates the feature representation C3 of the class in the third layer. In this case, the difference D1 can be considered as a vector pointing from the first layer class "food" indicated by circles to the second layer classes "bento", "bread", and "rice ball" indicated by squares. Similarly, the difference D2 can be considered as a vector pointing from the second-layer class "Bread" indicated by squares to the third-layer classes "Bread A" to "Bread C" indicated by triangles. Mathematically, the "difference" is the tangent vector of the hyperbolic space in the feature representation of the class of the parent node, and the "sum" is realized by exponential mapping.

Thus, the hierarchical hyperbolic projection unit 22x outputs the feature representations C1 to C3 for each layer for one input data to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classifier 23 x receives the feature representation for each layer, classifies it for each layer, and outputs the classification result to the hierarchical loss calculator 24 .

The configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100b of the third embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.

(learning process)
FIG. 16 is a flowchart of learning processing by the learning device 100b of the third embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S51). Next, the hierarchical hyperbolic projection unit 22x converts the previous feature representation into a feature representation on the hyperbolic space for each layer (step S52). Next, the hierarchical hyperbolic classification unit 23x calculates the score of each class for each layer from the feature representation for each layer input from the hierarchical hyperbolic projection unit 22x (step S53). Next, the hierarchical loss calculator 24 calculates a hierarchical loss from the score of each class for each hierarchy and the correct label (step S54). Next, the gradient calculator 25 calculates the gradient of the hierarchical loss (step S55). Next, the update unit 26 updates the parameters of the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x based on the gradient (step S56). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.

[Inference device]
Next, the inference device 200b of the third embodiment will be described.
(Hardware configuration)
Since the hardware configuration of the inference device 200b is the same as that of the learning device 100 shown in FIG. 1, the description thereof is omitted.

(Functional configuration)
FIG. 17 is a block diagram showing the functional configuration of the inference device 200b of the third embodiment. The inference device 200b includes a feature extraction unit 21, a hierarchical hyperbolic projection unit 22x, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x.

Input data is input to the feature extraction unit 21 . This input data is data such as images that are actually subjected to class classification. The feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hierarchical hyperbolic projection unit 22x. The hierarchical hyperbolic projection unit 22x uses the knowledge of the hierarchical structure of classes to convert the previous feature representation into a feature representation on the hyperbolic space for each layer, and outputs the feature representation to the hierarchical hyperbolic classification unit 23x. The hierarchical hyperbolic classifier 23x calculates a score for each class in each layer based on the feature representation for each layer, and outputs it as an inference result. Classification of the input data is thus performed.

(inference processing)
FIG. 18 is a flowchart of inference processing by the inference device 200b of the third embodiment. This processing is realized by executing a program prepared in advance by the processor 12 shown in FIG. 1 and operating as each element shown in FIG.

First, the feature extraction unit 21 converts the input data into a pre-feature representation (step S61). Next, the hierarchical hyperbolic projection unit 22x converts the previous feature representation into a feature representation on the hyperbolic space for each layer (step S62). Next, the hierarchical hyperbolic classifier 23x calculates the score of each class for each layer from the feature representation of each layer, and outputs it as an inference result (step S63). The above processing is performed for each input data.

<Fourth Embodiment>
FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment. The learning device 70 includes feature extraction means 71 , projection means 72 , classification means 73 , loss calculation means 74 and update means 75 .

FIG. 20 is a flowchart of learning processing by the learning device 70 of the fourth embodiment. First, the feature extraction means 71 converts the input data into the first feature representation (step S71). Next, the projection means 72 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S72). Next, the classification means 73 performs classification based on the second feature representation, and outputs a score indicating the possibility that the input data belongs to each class (step S73). Next, the loss calculation means 74 calculates the hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score (step S74). Next, the updating means 75 updates the parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss (step S75). According to the fourth embodiment, by using the knowledge of the hierarchical structure of classes, it is possible to generate a highly accurate model even with a small amount of input data.

<Fifth Embodiment>
FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment. The inference device 80 comprises feature extraction means 81 , projection means 82 and classification means 83 .

FIG. 22 is a flowchart of inference processing by the inference device 80 of the fifth embodiment. First, the feature extraction means 81 converts the input data into the first feature representation (step S81). Next, the projection means 82 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S82). Next, the classification means 83 performs classification based on the second feature representation, and uses knowledge of the hierarchical structure to which each class belongs to calculate a score indicating the possibility that the input data belongs to each class for each hierarchy. (step S83). According to the fourth embodiment, it is possible to perform inference with high accuracy using a model learned using knowledge of the hierarchical structure of classes.

Some or all of the above embodiments can also be described as the following additional remarks, but are not limited to the following.

(Appendix 1)
a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class;
loss calculation means for calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
updating means for updating parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss;
A learning device with

(Appendix 2)
The classification means outputs a score for the terminal class of the hierarchical structure,
The loss calculation means according to appendix 1, wherein the scores of the terminal classes are integrated to calculate the losses of the layers higher than the layer of the terminal class, and the weighted sum of the losses of each layer is calculated as the hierarchical loss. learning device.

(Appendix 3)
The loss calculation means calculates a loss that maximizes the score of the correct class for the hierarchy of the terminal class, and for the hierarchy higher than the hierarchy of the terminal class, the class to which the correct class belongs among the classes of the hierarchy 3. The learning device of claim 2, which calculates a loss that maximizes the score of .

(Appendix 4)
The classifying means outputs the score for each hierarchy using the knowledge of the hierarchical structure,
4. The learning device according to any one of supplementary notes 1 to 3, wherein the loss calculation means calculates the hierarchical loss based on the score output for each layer.

(Appendix 5)
the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure;
5. The learning device according to appendix 4, wherein the classifying means outputs the score for each layer based on the second feature representation output for each layer.

(Appendix 6)
converting the input data into a first feature representation using the feature extraction means;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
A learning method for updating parameters of the feature extraction means, the projection means and the classification means based on the hierarchical loss.

(Appendix 7)
converting the input data into a first feature representation using the feature extraction means;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
A recording medium recording a program for causing a computer to execute processing for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.

(Appendix 8)
a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification means for performing classification based on the second feature representation, and using knowledge of the hierarchical structure to which each class belongs to calculate a score indicating the possibility that the input data belongs to each class for each hierarchy;
A reasoning device with

(Appendix 9)
the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure;
5. The learning device according to any one of supplementary notes 1 to 4, wherein the classifying means outputs the score for each layer based on the second feature representation output for each layer.

(Appendix 10)
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
An inference method for performing classification based on the second feature representation, and using knowledge of a hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.

(Appendix 11)
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification is performed based on the second feature representation, and a computer is caused to execute processing for calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs. A recording medium that records a program.

Although the present disclosure has been described above with reference to the embodiments and examples, the present disclosure is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present disclosure within the scope of the present disclosure.

21 feature extraction unit 22 hyperbolic projection unit 22x hierarchical hyperbolic projection unit 23 hyperbolic classification unit 23x hierarchical hyperbolic classification unit 24 hierarchical loss calculation unit 25 gradient calculation unit 26

update unit

100, 100a, 100b

learning device

200, 200a, 200b inference device

Claims

a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class;
loss calculation means for calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
updating means for updating parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss;
A learning device with
The classification means outputs a score for the terminal class of the hierarchical structure,
2. The loss calculation means according to claim 1, wherein the scores of the terminal classes are integrated to calculate the losses of the layers higher than the layer of the terminal class, and the weighted sum of the losses of each layer is calculated as the hierarchical loss. learning device.
The loss calculation means calculates a loss that maximizes the score of the correct class for the hierarchy of the terminal class, and calculates the score of the class to which the correct class belongs in the hierarchy higher than the hierarchy of the terminal class. 3. The learning device according to claim 2, wherein the maximizing loss is calculated.
The classifying means outputs the score for each hierarchy using the knowledge of the hierarchical structure,
4. The learning device according to any one of claims 1 to 3, wherein said loss calculation means calculates said hierarchical loss based on said score output for each layer.
the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure;
5. The learning device according to claim 4, wherein the classifying means outputs the score for each layer based on the second feature representation output for each layer.
converting the input data into a first feature representation using the feature extraction means;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
A learning method for updating parameters of the feature extraction means, the projection means and the classification means based on the hierarchical loss.
converting the input data into a first feature representation using the feature extraction means;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space using a projection means;
Classifying based on the second feature representation using classifying means and outputting a score indicating the likelihood that the input data belongs to each class;
calculating a hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score;
A recording medium recording a program for causing a computer to execute a process of updating parameters of the feature extracting means, the projecting means and the classifying means based on the hierarchical loss.
a feature extraction means for converting input data into a first feature representation;
projection means for transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification means for performing classification based on the second feature representation, and using knowledge of the hierarchical structure to which each class belongs to calculate a score indicating the possibility that the input data belongs to each class for each hierarchy;
A reasoning device with
the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure;
5. The learning device according to any one of claims 1 to 4, wherein the classifying means outputs the score for each layer based on the second characteristic representation output for each layer.
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
An inference method for performing classification based on the second feature representation, and using knowledge of a hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
transforming the input data into a first feature representation;
transforming the first feature representation into a second feature representation representing a point on hyperbolic space;
Classification is performed based on the second feature representation, and a computer is caused to execute processing for calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs. A recording medium that records a program.