WO2022185529A1 - Learning device, learning method, inference device, inference method, and recording medium - Google Patents

Learning device, learning method, inference device, inference method, and recording medium

Info

Publication number
WO2022185529A1
WO2022185529A1 (PCT/JP2021/008691)
Authority
WO
WIPO (PCT)
Prior art keywords
feature representation
class
hierarchical
input data
feature
Prior art date
Application number
PCT/JP2021/008691
Other languages
English (en)
Japanese (ja)
Inventor
周平 吉田
Original Assignee
日本電気株式会社
Priority date
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to PCT/JP2021/008691 priority Critical patent/WO2022185529A1/fr
Priority to JP2023503320A priority patent/JPWO2022185529A5/ja
Publication of WO2022185529A1 publication Critical patent/WO2022185529A1/fr

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • This disclosure relates to a learning method for a machine learning model.
  • Patent Literature 1 discloses a learning method for identifying categories having a hierarchical structure.
  • One purpose of the present disclosure is to make it possible to generate a highly accurate machine learning model at low cost.
  • a learning device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class; loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • a learning method comprises: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the likelihood that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • the recording medium records a program for causing a computer to execute processing comprising: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the likelihood that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • an inference device includes: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and classification means for performing classification based on the second feature representation and using knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
  • an inference method includes: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation, calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class using knowledge of the hierarchical structure to which each class belongs.
  • the recording medium records a program for causing a computer to execute processing comprising: transforming input data into a first feature representation; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and performing classification based on the second feature representation and calculating, for each hierarchy, a score indicating the possibility that the input data belongs to each class, using knowledge of the hierarchical structure to which each class belongs.
  • FIG. 9 shows another example of the division of roles among the multiple classifiers forming a hierarchical hyperbolic classifier;
  • FIG. 10 is a flowchart of learning processing by the learning device of the second embodiment;
  • FIG. 11 is a block diagram showing the functional configuration of the inference device of the second embodiment;
  • FIG. 12 is a flowchart of inference processing by the inference device of the second embodiment;
  • FIG. 13 is a block diagram showing the functional configuration of a learning device according to a third embodiment;
  • FIG. 14 shows a schematic configuration of a hierarchical hyperbolic projection unit;
  • FIG. 15 is a diagram conceptually explaining feature representations and differences;
  • FIG. 16 is a flowchart of learning processing by the learning device of the third embodiment;
  • FIG. 17 is a block diagram showing the functional configuration of an inference device according to the third embodiment;
  • FIG. 18 is a flowchart of inference processing by the inference device of the third embodiment;
  • FIG. 19 is a block diagram showing the functional configuration of a learning device according to a fourth embodiment;
  • FIG. 20 is a flowchart of learning processing by the learning device of the fourth embodiment;
  • FIG. 21 is a block diagram showing the functional configuration of an inference device according to a fifth embodiment;
  • FIG. 22 is a flowchart of inference processing by the inference device of the fifth embodiment;
  • FIG. 1 is a block diagram showing the hardware configuration of the learning device 100 of the first embodiment.
  • the learning device 100 includes an interface (I/F) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
  • the interface 11 performs data input and output with external devices. Specifically, the data with correct answers used for learning is input through the interface 11.
  • the processor 12 is a computer such as a CPU (Central Processing Unit), and controls the entire learning device 100 by executing a program prepared in advance.
  • the processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array).
  • the processor 12 executes learning processing, which will be described later.
  • the memory 13 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 13 is also used as working memory during the execution of various processes by the processor 12.
  • the recording medium 14 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be detachable from the learning device 100.
  • the recording medium 14 records various programs executed by the processor 12 .
  • the DB 15 stores the data with correct answers input through the interface 11.
  • FIG. 2 is a block diagram showing the functional configuration of the learning device 100 of the first embodiment.
  • the learning device 100 includes a feature extraction unit 21, a hyperbolic projection unit 22, a hyperbolic classification unit 23, a hierarchical loss calculation unit 24, a gradient calculation unit 25, and an update unit 26.
  • Data with correct answers include input data and correct labels corresponding to the input data.
  • the input data is an image used for learning, and the correct label is information indicating the class of the object included in the image.
  • the input data is input to the feature extraction unit 21, and the correct labels are input to the hierarchical loss calculation unit 24.
  • the feature extraction unit 21 converts the input data into a pre-feature representation.
  • the feature representation output by the feature extraction unit 21 is called a "pre-feature representation" to distinguish it from the feature representation output by the hyperbolic projection unit 22, which will be described later.
  • Both the "pre-feature representation" and the "feature representation" are information representing features of the input data.
  • the feature extraction unit 21 is configured by a deep convolutional neural network (CNN) or the like, and outputs a sequence (vector) of real numbers representing the features of the input image to the hyperbolic projection unit 22 as the pre-feature representation.
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation.
  • the feature representation is a point on some manifold, and selecting a specific projection unit is equivalent to selecting the manifold (feature space) to which the feature representation belongs.
  • this embodiment uses a hyperbolic space as the feature space.
  • this embodiment obtains a highly accurate model from a small amount of training data by using knowledge about the hierarchical structure of classes; however, a hierarchical structure (tree structure) expands exponentially with depth.
  • Euclidean spaces and hyperspheres are commonly used as feature spaces, but they expand only polynomially, so they are not suited to embedding tree structures. That is, when a hierarchical structure is expressed in a Euclidean space or on a hypersphere, distortion is inevitable in low dimensions. To express a hierarchical structure (tree structure) without distortion in a Euclidean space or on a hypersphere, a feature space of exponentially high dimension with respect to the number of classes is required.
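  • To make this growth comparison concrete (an informal note, not taken from the publication): the volume of a geodesic ball grows polynomially with radius in Euclidean space but exponentially in hyperbolic space, which matches the exponential node count of a tree with branching factor b:

```latex
% Ball-volume growth, Euclidean R^n vs. hyperbolic H^n (curvature -1),
% compared with the node count of a b-ary tree at depth d:
V_{\mathbb{R}^n}(r) \propto r^{n}, \qquad
V_{\mathbb{H}^n}(r) \propto \int_0^r \sinh^{n-1}(t)\,dt \sim e^{(n-1)r}, \qquad
N_{\text{tree}}(d) = \Theta(b^{d}).
```

Because the tree and the hyperbolic ball both grow exponentially, the tree fits into a low-dimensional hyperbolic space with low distortion, which is the property the following passages exploit.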
  • in contrast, this embodiment uses a hyperbolic space as the feature space.
  • a tree structure can be embedded efficiently in a hyperbolic space.
  • a hyperbolic space, which expands exponentially, can embed a tree structure without distortion even in two dimensions. Therefore, the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23.
  • the feature representation is also a sequence (vector) of real numbers, but it can be regarded as coordinate values on the hyperbolic space, which is the feature space.
  • the hyperbolic projection unit 22 can use a Poincaré projection, a Lorentz projection, or the like, according to the specific hyperbolic space model.
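  • As a minimal sketch of what such a projection unit could compute (an assumption for illustration; the publication does not fix a formula), the exponential map at the origin of the Poincaré ball sends a Euclidean pre-feature vector strictly inside the unit ball. All code sketches below use Python with PyTorch; the function names are hypothetical.

```python
import torch

def project_to_poincare_ball(pre_feature: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Exponential map at the origin of the Poincare ball (curvature -1):
    exp_0(v) = tanh(||v||) * v / ||v||, which always lands inside the unit ball."""
    norm = torch.norm(pre_feature, dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * pre_feature / norm
```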
  • the hyperbolic classifier 23 classifies the feature representation on the hyperbolic space output by the hyperbolic projection unit 22, and outputs the score of each class obtained for the feature representation to the hierarchical loss calculator 24. Note that the hyperbolic classifier 23 outputs only the scores of the terminal classes in the hierarchical structure of classes.
  • as the hyperbolic classifier 23, a hyperbolic hyperplane classifier or a hyperbolic nearest neighbor classifier can be used.
  • a hyperbolic hyperplane classifier is a classifier that extends a linear classifier to hyperbolic space and uses a hyperplane in the hyperbolic space as the decision surface.
  • a hyperbolic nearest neighbor classifier is a classifier that follows the nearest neighbor rule on hyperbolic space.
  • the specific form of the hyperbolic classification unit 23 is determined by the hyperbolic space model selected by the hyperbolic projection unit 22.
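  • For illustration only (the publication names the classifier families but not their formulas), a hyperbolic nearest neighbor classifier can score each class by the negative Poincaré distance to a learned class prototype; `poincare_distance`, `nearest_neighbor_scores`, and `prototypes` are hypothetical names.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Geodesic distance on the Poincare ball:
    d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq_dist = torch.sum((u - v) ** 2, dim=-1)
    u_gap = (1 - torch.sum(u * u, dim=-1)).clamp_min(eps)
    v_gap = (1 - torch.sum(v * v, dim=-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq_dist / (u_gap * v_gap))

def nearest_neighbor_scores(feature: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Score each terminal class by the negative distance from the feature
    (shape (dim,)) to that class's prototype (prototypes: (num_classes, dim))."""
    return -poincare_distance(feature.unsqueeze(0), prototypes)
```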
  • the hierarchical loss calculator 24 calculates a loss function from the score of each class input from the hyperbolic classifier 23 and the correct label included in the data with correct answers. In doing so, the hierarchical loss calculator 24 uses knowledge of the hierarchical structure of the classes to be classified. Specifically, the hierarchical loss calculation unit 24 computes a score for each hierarchy from the per-class scores output by the hyperbolic classifier 23, and calculates a loss for each hierarchy such that the per-hierarchy scores predict the correct class at that hierarchy. Note that the hierarchical loss calculator 24 can use a general loss function for multi-class classification, such as the cross-entropy loss.
  • FIG. 3 shows an example of a hierarchical structure of classes.
  • This example shows a hierarchical structure (tree structure) whose root node is "merchandise", with first to third hierarchies.
  • the first hierarchy includes three classes “food”, “beverage” and “pharmaceutical” as child nodes of "merchandise”.
  • the second hierarchy contains three classes "Bento", "Bread", and "Rice ball" as child nodes of "Food", and three classes "Tea", "Juice", and "Water" as child nodes of "Beverage".
  • the third hierarchy includes "Bento A" to "Bento C" as child nodes of "Bento", "Bread A" to "Bread C" as child nodes of "Bread", and "Rice ball A" to "Rice ball C" as child nodes of "Rice ball".
  • illustration of the second and third hierarchies under "Pharmaceutical" and the third hierarchy under "Beverage" is omitted.
  • as described above, the hyperbolic classifier 23 outputs only the scores of the terminal classes in the hierarchical structure of classes. In this example, the hyperbolic classification unit 23 outputs only the scores of the terminal classes "Bento A" to "Bento C", "Bread A" to "Bread C", and "Rice ball A" to "Rice ball C".
  • for example, when the correct class is "Bento B", the hierarchical loss calculation unit 24 calculates a loss that maximizes the score of the correct class "Bento B" for the third hierarchy, which is the hierarchy of the terminal classes, and sets it as the loss of the third hierarchy.
  • when calculating the loss of a hierarchy higher than the terminal classes, the hierarchical loss calculation unit 24 integrates the scores of the terminal classes that are descendants of each node and uses them for the loss calculation. Specifically, if the scores output by the hyperbolic classifier 23 are the probabilities of the terminal classes, the score of each class in a higher hierarchy is the sum of the probabilities of the terminal classes that are its descendants.
  • the score of "lunch box” in the second layer is the sum of the scores of its child nodes “lunch box A” to “lunch box C”.
  • the score of "bread” in the second layer is the sum of the scores of its child nodes “bread A” to “bread C”
  • the score of "rice ball” in the second layer is the sum of the scores of its child nodes “rice ball A” to “rice ball C”.
  • the score of "Food” in the first layer is the terminal class "Bento A” to “Bento C", “Bread A” to “Bread C”, and “Rice ball A” to “Rice ball C”, which are the grandchild nodes of terminal classes. is the sum of the scores of Similarly, the score of "beverage” and "pharmaceutical” in the first hierarchy is also the sum of the scores of terminal classes that are grandchild nodes.
  • the hierarchical loss calculation unit 24 calculates, for the first hierarchy, a loss that maximizes the score of "Food", which has the correct class "Bento B" as a descendant node. The hierarchical loss calculator 24 then calculates a weighted sum of the losses calculated for each hierarchy, and outputs it to the gradient calculator 25 as the hierarchical loss.
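  • The per-hierarchy aggregation and the weighted sum just described could be sketched as follows (an assumption-laden illustration: the ancestor matrices, the layer weights, and the use of softmax probabilities are choices not fixed by the publication):

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(terminal_logits, ancestry, targets_per_layer, layer_weights, eps=1e-8):
    """terminal_logits: (batch, num_terminal) scores for the terminal classes.
    ancestry[h]: (num_terminal, num_classes_h) 0/1 matrix mapping each terminal
    class to its ancestor class in hierarchy h (identity for the terminal layer).
    targets_per_layer[h]: (batch,) index of the correct class in hierarchy h.
    layer_weights[h]: scalar weight of the hierarchy-h loss in the weighted sum."""
    probs = terminal_logits.softmax(dim=-1)
    total = terminal_logits.new_zeros(())
    for h, ancestor_matrix in enumerate(ancestry):
        layer_probs = probs @ ancestor_matrix          # sum descendant-terminal probabilities
        layer_loss = F.nll_loss(torch.log(layer_probs + eps), targets_per_layer[h])
        total = total + layer_weights[h] * layer_loss  # cross-entropy per hierarchy
    return total
```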
  • the gradient calculator 25 calculates the gradient of the hierarchical loss input from the hierarchical loss calculator 24, and outputs it to the updater 26.
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 using the gradients.
  • FIG. 4 is a flowchart of learning processing by the learning device 100 of the first embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 2.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S11).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S12).
  • the hyperbolic classifier 23 calculates the score of each class from the feature representation (step S13).
  • the hierarchical loss calculator 24 uses the knowledge of the hierarchical structure of classes to calculate the hierarchical loss from the score and correct label of each class (step S14).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S15).
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23 based on the gradient (step S16). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
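  • Steps S11 to S16 can be summarized as a single gradient step; the sketch below is an assumption about how the units could be wired together (reusing the hypothetical `hierarchical_loss` above), not the publication's implementation:

```python
import torch

def train_step(batch, targets_per_layer, extractor, projector, classifier,
               optimizer, ancestry, layer_weights):
    """One iteration of the learning processing of FIG. 4 (steps S11-S16)."""
    pre_feature = extractor(batch)                       # S11: pre-feature representation
    feature = projector(pre_feature)                     # S12: point on hyperbolic space
    terminal_logits = classifier(feature)                # S13: per-class scores
    loss = hierarchical_loss(terminal_logits, ancestry,
                             targets_per_layer, layer_weights)  # S14: hierarchical loss
    optimizer.zero_grad()
    loss.backward()                                      # S15: gradient of the loss
    optimizer.step()                                     # S16: parameter update
    return loss.item()
```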
  • according to the learning device 100 of the first embodiment, it is possible to learn a highly accurate model with a small amount of learning data by using the knowledge of the hierarchical structure of classes.
  • the hardware configuration of the inference device 200 of the first embodiment is the same as that of the learning device 100 shown in FIG. 1, so the explanation is omitted.
  • FIG. 5 is a block diagram showing the functional configuration of the inference device 200 of the first embodiment.
  • the inference device 200 includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hyperbolic classification unit 23. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hyperbolic classification unit 23.
  • Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subjected to class classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22.
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs the feature representation to the hyperbolic classification unit 23.
  • the hyperbolic classifier 23 calculates scores for terminal classes in the hierarchical structure of classes and outputs them as inference results. Classification of the input data is thus performed.
  • FIG. 6 is a flowchart of inference processing by the inference device 200 of the first embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 5.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S21).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S22).
  • the hyperbolic classifier 23 calculates the score of each terminal class from the feature representation and outputs it as an inference result (step S23). The above processing is performed for each input data.
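  • Steps S21 to S23 reduce to a forward pass; a minimal sketch under the same assumptions as the training sketches above:

```python
import torch

@torch.no_grad()
def infer(batch, extractor, projector, classifier):
    """Steps S21-S23: terminal-class scores for new input data;
    argmax over the scores yields the predicted terminal class."""
    scores = classifier(projector(extractor(batch)))
    return scores, scores.argmax(dim=-1)
```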
  • in the second embodiment, the hyperbolic classifier is also hierarchized using the knowledge of the hierarchical structure of classes.
  • FIG. 7 is a block diagram showing the functional configuration of the learning device 100a of the second embodiment.
  • the learning device 100a of the second embodiment has a hierarchical hyperbolic classifier 23x instead of the hyperbolic classifier 23.
  • the hierarchical hyperbolic classification unit 23x uses knowledge of the hierarchical structure of classes to output a score in each layer of the hierarchical structure from one feature representation in the hyperbolic space input from the hyperbolic projection unit 22.
  • FIG. 8 shows an example of how the classification task is divided among the plurality of classifiers forming the hierarchical hyperbolic classifier 23x.
  • Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier.
  • one classifier is provided for each layer in the hierarchical structure of classes. That is, the hierarchical hyperbolic classifier 23x is composed of three classifiers respectively corresponding to the first to third hierarchies.
  • Each classifier is a classifier that identifies nodes belonging to the same hierarchy across subtrees.
  • the hierarchical hyperbolic classifier 23x outputs classification results for each layer by three classifiers.
  • FIG. 9 shows another example of how the classification task is divided among the plurality of classifiers forming the hierarchical hyperbolic classifier 23x.
  • Each of frames 91 to 93 indicated by thick lines indicates a portion corresponding to one classifier.
  • in this example, a plurality of classifiers, each identifying sibling nodes belonging to the same parent node, are provided in the third hierarchy. That is, one classifier is prepared for the nodes "Bento A" to "Bento C" belonging to the same parent node "Bento", and another classifier is prepared for the nodes "Bread A" to "Bread C" belonging to the same parent node "Bread".
  • the hierarchical hyperbolic classifier 23x outputs the classification results of each of the plurality of classifiers. That is, it outputs the classification result corresponding to the frame 91 for the first hierarchy, the classification result corresponding to the frame 92 for the second hierarchy, and the classification results corresponding to the plurality of frames 93 for the third hierarchy.
  • the hierarchical hyperbolic classifier 23x calculates a classification result (score) for each layer and outputs it to the hierarchical loss calculator 24.
  • the hierarchical loss calculator 24 calculates a loss for the classification result of each hierarchy input from the hierarchical hyperbolic classifier 23x, and outputs a weighted sum of them to the gradient calculator 25 as a hierarchical loss.
  • the hierarchical hyperbolic classifier 23x outputs not only the scores of the terminal classes but also the scores of the upper classes. It is therefore unnecessary to integrate the scores of the terminal classes to calculate the scores of the higher hierarchies as in the first embodiment.
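  • One way to realize the one-classifier-per-hierarchy division of FIG. 8 (a sketch only; the class counts follow the illustrated part of the FIG. 3 example, and the linear heads are Euclidean stand-ins for the publication's hyperbolic classifiers):

```python
import torch.nn as nn

class HierarchicalClassifierPerLayer(nn.Module):
    """One classification head per hierarchy (FIG. 8 style): the first head
    scores the 3 first-hierarchy classes, the second 6, the third 9."""
    def __init__(self, dim: int, classes_per_layer=(3, 6, 9)):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in classes_per_layer)

    def forward(self, feature):
        # Returns one score vector per hierarchy, so no aggregation of
        # terminal-class scores is needed downstream.
        return [head(feature) for head in self.heads]
```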
  • the configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100a of the second embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
  • FIG. 10 is a flowchart of learning processing by the learning device 100a of the second embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 7.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S31).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S32).
  • the hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score of each class for each layer from the feature representation (step S33).
  • the hierarchical loss calculator 24 calculates the hierarchical loss from the score of each class and the correct label for each hierarchy (step S34).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S35).
  • the updating unit 26 updates the parameters of the feature extracting unit 21, the hyperbolic projecting unit 22, and the hierarchical hyperbolic classifying unit 23x based on the gradient (step S36). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
  • the hardware configuration of the inference device 200a is the same as that of the learning device 100 shown in FIG. 1, so description thereof will be omitted.
  • FIG. 11 is a block diagram showing the functional configuration of the inference device 200a of the second embodiment.
  • the inference device 200a includes a feature extraction unit 21, a hyperbolic projection unit 22, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hyperbolic projection unit 22, and the hierarchical hyperbolic classification unit 23x.
  • Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subjected to class classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hyperbolic projection unit 22.
  • the hyperbolic projection unit 22 converts the pre-feature representation into a feature representation on the hyperbolic space, and outputs it to the hierarchical hyperbolic classification unit 23x.
  • the hierarchical hyperbolic classifier 23x uses the knowledge of the hierarchical structure of classes to calculate the score for each class in each hierarchy and output it as an inference result. Classification of the input data is thus performed.
  • FIG. 12 is a flowchart of inference processing by the inference device 200a of the second embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 11.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S41).
  • the hyperbolic projection unit 22 transforms the pre-feature representation into a feature representation on the hyperbolic space (step S42).
  • the hierarchical hyperbolic classifier 23x utilizes the knowledge of the hierarchical structure of the classes, calculates the score of each class for each hierarchy from the feature representation, and outputs it as the inference result (step S43). The above processing is performed for each input data.
  • in the third embodiment, the hyperbolic projection unit 22 is also hierarchized using knowledge of the hierarchical structure of classes.
  • FIG. 13 is a block diagram showing the functional configuration of the learning device 100b of the third embodiment.
  • the learning device 100b of the third embodiment has a hierarchical hyperbolic projection unit 22x instead of the hyperbolic projection unit 22.
  • the hierarchical hyperbolic projection unit 22x uses knowledge of the hierarchical structure of classes to output feature representations in each layer of the hierarchical structure from the pre-feature representation input from the feature extraction unit 21.
  • FIG. 14 shows a schematic configuration of the hierarchical hyperbolic projection unit 22x.
  • the hierarchical hyperbolic projection unit 22x includes first to third embedding networks (NW) and adders 31 and 32.
  • the pre-feature representation is input from the feature extraction unit 21 to each of the first to third embedding NWs.
  • the first embedding NW uses knowledge of the hierarchical structure of the classes and outputs, as the feature representation C1, a vector indicating a point on the hyperbolic space for the class corresponding to a node of the first hierarchy.
  • the second embedding NW outputs, for a node of the second hierarchy, the difference D1 between the feature representation C1 of the class corresponding to the parent node of that node and the feature representation of that node.
  • the adder 31 then outputs the sum of the feature representation C1 of the parent node and the difference D1 as the feature representation C2 corresponding to that node in the second hierarchy.
  • the feature representation C2 is a vector indicating a point on the hyperbolic space.
  • the third embedding NW outputs, for a node of the third hierarchy, the difference D2 between the feature representation C2 of the class corresponding to the parent node of that node and the feature representation of that node.
  • the adder 32 then outputs the sum of the feature representation C2 of the parent node and the difference D2 as the feature representation C3 corresponding to that node in the third hierarchy.
  • the feature representation C3 is a vector indicating a point on the hyperbolic space.
  • FIG. 15 is a diagram conceptually explaining the feature representations C1 to C3 and the differences D1 to D2.
  • FIG. 15 shows the hyperbolic space as a two-dimensional space for convenience. Assuming the hierarchical structure of the classes shown in FIG. 3, circles (○) indicate the feature representations C1 of the first-hierarchy classes, squares (□) indicate the feature representations C2 of the second-hierarchy classes, and triangles (△) indicate the feature representations C3 of the third-hierarchy classes.
  • the difference D1 can be regarded as a vector pointing from the first-hierarchy class "Food", indicated by a circle, to the second-hierarchy classes "Bento", "Bread", and "Rice ball", indicated by squares.
  • the difference D2 can be regarded as a vector pointing from the second-hierarchy class "Bread", indicated by a square, to the third-hierarchy classes "Bread A" to "Bread C", indicated by triangles.
  • the "difference” is the tangent vector of the hyperbolic space in the feature representation of the class of the parent node, and the "sum” is realized by exponential mapping.
  • the hierarchical hyperbolic projection unit 22x outputs the feature representations C1 to C3 for each layer for one input data to the hierarchical hyperbolic classification unit 23x.
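  • Putting the pieces together, the hierarchical hyperbolic projection unit of FIG. 14 could be sketched as below (a hypothetical module reusing `project_to_poincare_ball` and `exp_map` from the earlier sketches; the linear embedding NWs are assumptions):

```python
import torch.nn as nn

class HierarchicalHyperbolicProjector(nn.Module):
    """Sketch of FIG. 14: the first embedding NW yields C1; the second and third
    yield the differences D1 and D2; exp_map realizes the adders 31 and 32."""
    def __init__(self, in_dim: int, ball_dim: int):
        super().__init__()
        self.embed = nn.ModuleList(nn.Linear(in_dim, ball_dim) for _ in range(3))

    def forward(self, pre_feature):
        c1 = project_to_poincare_ball(self.embed[0](pre_feature))  # layer-1 feature C1
        c2 = exp_map(c1, self.embed[1](pre_feature))               # C2 = C1 "+" D1
        c3 = exp_map(c2, self.embed[2](pre_feature))               # C3 = C2 "+" D2
        return [c1, c2, c3]
```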
  • the hierarchical hyperbolic classifier 23x receives the feature representation for each hierarchy, performs classification for each hierarchy, and outputs the classification results to the hierarchical loss calculator 24.
  • the configurations and operations of the feature extraction unit 21, the gradient calculation unit 25, and the update unit 26 in the learning device 100b of the third embodiment are the same as those of the first embodiment, so descriptions thereof will be omitted.
  • FIG. 16 is a flowchart of learning processing by the learning device 100b of the third embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 13.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S51).
  • the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy (step S52).
  • the hierarchical hyperbolic classification unit 23x calculates the score of each class for each layer from the feature representation for each layer input from the hierarchical hyperbolic projection unit 22x (step S53).
  • the hierarchical loss calculator 24 calculates a hierarchical loss from the score of each class for each hierarchy and the correct label (step S54).
  • the gradient calculator 25 calculates the gradient of the hierarchical loss (step S55).
  • the update unit 26 updates the parameters of the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x based on the gradient (step S56). The above processing is repeated until a predetermined learning termination condition is satisfied, and the learning processing ends.
  • Next, the inference device 200b of the third embodiment will be described. The hardware configuration of the inference device 200b is the same as that of the learning device 100 shown in FIG. 1, so the description thereof is omitted.
  • FIG. 17 is a block diagram showing the functional configuration of the inference device 200b of the third embodiment.
  • the inference device 200b includes a feature extraction unit 21, a hierarchical hyperbolic projection unit 22x, and a hierarchical hyperbolic classification unit 23x. Parameters obtained by the above learning process are set in the feature extraction unit 21, the hierarchical hyperbolic projection unit 22x, and the hierarchical hyperbolic classification unit 23x.
  • Input data is input to the feature extraction unit 21. This input data is data, such as an image, that is actually subjected to class classification.
  • the feature extraction unit 21 converts the input data into a pre-feature representation and outputs it to the hierarchical hyperbolic projection unit 22x.
  • the hierarchical hyperbolic projection unit 22x uses the knowledge of the hierarchical structure of classes to convert the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy, and outputs the feature representations to the hierarchical hyperbolic classification unit 23x.
  • the hierarchical hyperbolic classifier 23x calculates a score for each class in each layer based on the feature representation for each layer, and outputs it as an inference result. Classification of the input data is thus performed.
  • FIG. 18 is a flowchart of inference processing by the inference device 200b of the third embodiment. This processing is realized by the processor 12 shown in FIG. 1 executing a program prepared in advance and operating as each element shown in FIG. 17.
  • the feature extraction unit 21 converts the input data into a pre-feature representation (step S61).
  • the hierarchical hyperbolic projection unit 22x converts the pre-feature representation into a feature representation on the hyperbolic space for each hierarchy (step S62).
  • the hierarchical hyperbolic classifier 23x calculates the score of each class for each layer from the feature representation of each layer, and outputs it as an inference result (step S63). The above processing is performed for each input data.
  • FIG. 19 is a block diagram showing the functional configuration of the learning device of the fourth embodiment.
  • the learning device 70 includes feature extraction means 71, projection means 72, classification means 73, loss calculation means 74, and update means 75.
  • FIG. 20 is a flowchart of learning processing by the learning device 70 of the fourth embodiment.
  • the feature extraction means 71 converts the input data into the first feature representation (step S71).
  • the projection means 72 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S72).
  • the classification means 73 performs classification based on the second feature representation, and outputs a score indicating the possibility that the input data belongs to each class (step S73).
  • the loss calculation means 74 calculates the hierarchical loss based on the knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score (step S74).
  • the updating means 75 updates the parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss (step S75).
  • according to the fourth embodiment, by using the knowledge of the hierarchical structure of classes, it is possible to generate a highly accurate model even with a small amount of input data.
  • FIG. 21 is a block diagram showing the functional configuration of the inference device of the fifth embodiment.
  • the inference device 80 includes feature extraction means 81, projection means 82, and classification means 83.
  • FIG. 22 is a flowchart of inference processing by the inference device 80 of the fifth embodiment.
  • the feature extraction means 81 converts the input data into the first feature representation (step S81).
  • the projection means 82 transforms the first feature representation into a second feature representation indicating a point on the hyperbolic space (step S82).
  • the classification means 83 performs classification based on the second feature representation, and uses knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class (step S83). According to the fifth embodiment, it is possible to perform highly accurate inference using a model learned with knowledge of the hierarchical structure of classes.
  • (Appendix 1) A learning device including: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; classification means for performing classification based on the second feature representation and outputting a score indicating the possibility that the input data belongs to each class; loss calculation means for calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating means for updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • (Appendix 2) The learning device according to appendix 1, wherein the classification means outputs scores for the terminal classes of the hierarchical structure, and the loss calculation means integrates the scores of the terminal classes to calculate the losses of the hierarchies higher than the hierarchy of the terminal classes, and calculates the weighted sum of the losses of the respective hierarchies as the hierarchical loss.
  • (Appendix 3) The learning device according to appendix 2, wherein the loss calculation means calculates a loss that maximizes the score of the correct class for the hierarchy of the terminal classes and, for a hierarchy higher than the hierarchy of the terminal classes, calculates a loss that maximizes the score of the class to which the correct class belongs among the classes of that hierarchy.
  • (Appendix 4) The learning device according to any one of appendices 1 to 3, wherein the classification means outputs the score for each hierarchy using the knowledge of the hierarchical structure, and the loss calculation means calculates the hierarchical loss based on the score output for each hierarchy.
  • (Appendix 5) The learning device according to appendix 4, wherein the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure, and the classification means outputs the score for each hierarchy based on the second feature representation output for each hierarchy.
  • (Appendix 7) A recording medium recording a program for causing a computer to execute processing comprising: converting input data into a first feature representation using feature extraction means; transforming the first feature representation into a second feature representation representing a point on a hyperbolic space using projection means; performing classification based on the second feature representation using classification means and outputting a score indicating the likelihood that the input data belongs to each class; calculating a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, the correct label assigned to the input data, and the score; and updating parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
  • (Appendix 8) An inference device including: feature extraction means for converting input data into a first feature representation; projection means for transforming the first feature representation into a second feature representation representing a point on a hyperbolic space; and classification means for performing classification based on the second feature representation and using knowledge of the hierarchical structure to which each class belongs to calculate, for each hierarchy, a score indicating the possibility that the input data belongs to each class.
  • The learning device according to any one of appendices 1 to 4, wherein the projection means outputs the second feature representation for each hierarchy based on the knowledge of the hierarchical structure, and the classification means outputs the score for each hierarchy based on the second feature representation output for each hierarchy.

Abstract

In this learning device, a feature extraction means converts input data into a first feature representation. A projection means converts the first feature representation into a second feature representation representing a point in a hyperbolic space. A classification means performs classification based on the second feature representation and outputs a score indicating the probability that the input data belongs to each class. A loss calculation means calculates a hierarchical loss based on knowledge of the hierarchical structure to which each class belongs, a correct label assigned to the input data, and the score. An updating means updates the parameters of the feature extraction means, the projection means, and the classification means based on the hierarchical loss.
PCT/JP2021/008691 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium WO2022185529A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/008691 WO2022185529A1 (fr) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium
JP2023503320A JPWO2022185529A5 (ja) Learning device, learning method, inference device, inference method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/008691 WO2022185529A1 (fr) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium

Publications (1)

Publication Number Publication Date
WO2022185529A1 true WO2022185529A1 (fr) 2022-09-09

Family

ID=83154123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008691 WO2022185529A1 (fr) 2021-03-05 2021-03-05 Learning device, learning method, inference device, inference method, and recording medium

Country Status (1)

Country Link
WO (1) WO2022185529A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020053073A (ja) * 2014-03-28 2020-04-02 日本電気株式会社 Learning method, learning system, and learning program
JP2020042403A (ja) * 2018-09-07 2020-03-19 Zホールディングス株式会社 Information processing device, information processing method, and program
JP2020091846A (ja) * 2018-10-19 2020-06-11 TATA Consultancy Services Limited System and method for conversation-based ticket logging
JP2020091813A (ja) * 2018-12-07 2020-06-11 公立大学法人会津大学 Neural network learning method, computer program, and computer device
WO2020162294A1 (fr) * 2019-02-07 2020-08-13 株式会社Preferred Networks Conversion method, learning device, and inference device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIGASHIYAMA, SHOHEI; BLONDEL, MATHIEU; SEKI, KAZUHIRO; UEHARA, KUNIAKI: "Named Entity Recognition Exploiting Category Hierarchy Using Structured Perceptron", IPSJ SIG TECHNICAL REPORTS, vol. 2012-BIO-32, no. 25, 30 November 2011 (2011-11-30), pages 1 - 6, XP009539751 *
MAXIMILIAN NICKEL; DOUWE KIELA: "Poincaré Embeddings for Learning Hierarchical Representations", arXiv.org, 23 May 2017 (2017-05-23), XP080949506 *

Also Published As

Publication number Publication date
JPWO2022185529A1 (fr) 2022-09-09


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21929101; Country of ref document: EP; Kind code of ref document: A1.
WWE Wipo information: entry into national phase. Ref document number: 2023503320; Country of ref document: JP.
NENP Non-entry into the national phase. Ref country code: DE.
122 Ep: pct application non-entry in european phase. Ref document number: 21929101; Country of ref document: EP; Kind code of ref document: A1.