JP7440798B2

JP7440798B2 - Learning device, prediction device, learning method and program

Info

Publication number: JP7440798B2
Application number: JP2022530395A
Authority: JP
Inventors: 悠三鼓; 豪入江; 大貴伊神
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2020-06-09
Filing date: 2020-06-09
Publication date: 2024-02-29
Anticipated expiration: 2040-06-09
Also published as: WO2021250774A1; JPWO2021250774A1

Description

The present invention relates to a learning device, a prediction device, a learning method, and a program.

Predictive models are generally trained by machine learning within a framework called supervised learning. In supervised learning, a large number of pairs, each consisting of a data item and the correct class label for that data item, are prepared, and the relationship between data and class labels is learned from those pairs.

Supervised learning requires a large number of pairs of data and class labels, and creating such pairs is generally costly. For this reason, a model trained in an area where labeled data already exists (hereinafter referred to as a "domain") is sometimes reused in the domain of interest. For example, to recognize handwritten characters, a classifier may first be trained on digital font data, for which labeled data is relatively easy to obtain, and then retrained on handwritten character data, for which little (or no) labeled data is available.

However, the data generation distribution may differ between the domain in which training was performed (hereinafter referred to as the "source domain"; the digital font data in the example above) and the domain of interest (hereinafter referred to as the "target domain"; the handwritten character data in the example above). FIG. 6 is a diagram schematically showing this problem. In FIG. 6, the region enclosed by the solid line is the source domain 10, the region enclosed by the broken line is the target domain 20, and the straight line is the decision boundary 30. For example, the same character "あ" ("a") can look very different in a digital font and in handwriting. When the generation distributions differ, the decision boundary 30 learned in the source domain 10 may be unreliable for the target domain 20, as shown in FIG. 6. In such a case, the trained model cannot achieve the expected classification accuracy in the target domain 20. A learning problem in which the domains differ in this way is called a domain adaptation problem.

The following known techniques have been proposed to solve such domain adaptation problems. The technique disclosed in Patent Document 1 learns a transformation rule from the source domain to the target domain that minimizes the value of MMD (maximum mean discrepancy), which is a distance between the generation distribution of samples in the source domain and the generation distribution of samples in the target domain. The source domain data is then transformed using the learned transformation rule, and a model is trained by supervised learning on the transformed source domain data.

In Non-Patent Document 1, two things are learned simultaneously: a feature extractor that projects the source domain data and the target domain data into a feature space in which the domains are difficult to distinguish, and the relationship, in that feature space, between the source domain data and the class labels assigned to that data. Making the source domain data and the target domain data difficult to distinguish in the feature space means bringing their generation distributions closer together in the feature space. Such processing may correspond, for example, to changing the state of FIG. 6 into the state of FIG. 7. As a result, a model obtained by supervised learning on the source domain data predicts the target domain data more accurately.

In Non-Patent Document 2, the common feature space learned in Non-Patent Document 1 is instead learned using a feature extractor and two classifiers connected to it. The model trained according to Non-Patent Document 2 is known to predict target domain data more accurately than the model trained according to Non-Patent Document 1.

Depending on how the source domain and the target domain differ, various accompanying problems may arise. One of them occurs when the target domain contains data of classes other than those given in the source domain. In the handwritten character recognition example, this happens when the digital font data contains only "あ" ("a"), "い" ("i"), and "う" ("u"), whereas the handwritten character data also contains "え" ("e") and "お" ("o"). The classes that are labeled in the source domain are called known classes ("あ", "い", and "う" in the example above), and all other classes are called unknown classes ("え" and "お" in the example above). A classifier trained by supervised learning normally predicts that input data belongs to one of the known classes even when the data actually belongs to an unknown class. This behavior can lower the accuracy of character recognition.

There is also the following problem. The source domain and the target domain are normally each assumed to consist of a single domain. In some cases, however, either of them may be formed by multiple domains. For example, when handwritten character data has been written by several different individuals or with different writing instruments, the source domain or the target domain may be formed by multiple domains. Because the generation distribution changes in each of these cases, multiple domains can be regarded as being contained within the target domain. When a domain is formed by multiple domains, a method such as that of Non-Patent Document 1 cannot achieve the expected prediction accuracy.

Non-Patent Document 4 describes a technique for the case where the source domain contains multiple domains. In the technique disclosed in Non-Patent Document 4, features are learned that make it difficult to distinguish each domain contained in the source domain from the target domain. Conversely, Non-Patent Document 5 describes a technique for the case where the target domain contains multiple domains. In the technique disclosed in Non-Patent Document 5, features are learned that make it difficult to distinguish among the multiple domains contained in the target domain.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-101789

Non-Patent Document 1: Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In David Blei and Francis Bach, editors, Proceedings of the 32nd International Conference on Machine Learning (ICML15), pages 1180-1189. JMLR Workshop and Conference Proceedings, 2015.
Non-Patent Document 2: Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum Classifier Discrepancy for Unsupervised Domain Adaptation. The 31st IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018.
Non-Patent Document 3: K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada. Open set domain adaptation by backpropagation. In The European Conference on Computer Vision (ECCV), September 2018.
Non-Patent Document 4: Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, João P. Costeira, and Geoffrey J. Gordon. Adversarial multiple source domain adaptation. In Advances in Neural Information Processing Systems, pages 8568-8579, 2018.
Non-Patent Document 5: Z. Chen, J. Zhuang, X. Liang, and L. Lin. Blending-target domain adaptation by adversarial meta-adaptation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2248-2257, 2019.

Techniques such as those of Non-Patent Document 3, Non-Patent Document 4, and Non-Patent Document 5 have been proposed to solve the domain adaptation problem, in which the generation distribution differs between the source domain and the target domain, together with the various problems that accompany it. However, although each technique performs well on the accompanying problem it was designed for, it is not effective against the other accompanying problems.

For example, the technique of Non-Patent Document 4 addresses the case where the source domain contains multiple domains. Applying it to the case where the target domain contains multiple domains does not yield sufficient performance.

In general, it is rarely possible to know in advance which of these problems is present in the data to be processed. It is therefore difficult to decide which problem-specific technique should be applied. Moreover, when several of the problems described above are present at the same time, applying any of these techniques becomes difficult.

The present invention has been made in view of the above circumstances, and aims to provide a technique that achieves good performance on a wider range of domain-related problems.

One aspect of the present invention is a learning device comprising: a feature extractor that outputs a feature amount of input data; a plurality of classifiers that each obtain, based on the feature amount, the probabilities that the data belongs to the known classes and to the unknown class; an unknown class classifier that determines, based on the membership probabilities obtained by the classifiers, whether the data belongs to the unknown class; a classification discrepancy evaluation unit that outputs, for the data, a classification discrepancy value indicating the difference between the membership probabilities obtained by the plurality of classifiers; and a learning unit that, using data that does not belong to the unknown class and to which no teacher label is assigned, iteratively learns the parameters of the feature extractor and the plurality of classifiers such that the classification discrepancy value is decreased with respect to the feature extractor and increased with respect to the plurality of classifiers.

One aspect of the present invention is a prediction device comprising: a feature extractor that outputs a feature amount of input data based on parameters obtained by the above learning device; and a classifier that obtains, based on parameters obtained by the above learning device and on the feature amount, the probabilities that the data belongs to the known classes and to the unknown class.

One aspect of the present invention is a learning method comprising: a feature extraction step of outputting a feature amount of input data using a feature extractor; a classification step of obtaining, using each of a plurality of classifiers and based on the feature amount, the probabilities that the data belongs to the known classes and to the unknown class; an unknown class classification step of determining, based on the obtained membership probabilities, whether the data belongs to the unknown class; a classification discrepancy evaluation step of outputting, for the data, a classification discrepancy value indicating the difference between the membership probabilities obtained by the plurality of classifiers; and a learning step of iteratively learning, using data that does not belong to the unknown class and to which no teacher label is assigned, the parameters of the feature extractor and the plurality of classifiers such that the classification discrepancy value is decreased with respect to the feature extractor and increased with respect to the plurality of classifiers.

One aspect of the present invention is a program for causing a computer to operate as the learning device described above.

The present invention makes it possible to achieve good performance on a wider range of domain-related problems.

FIG. 1 is a diagram schematically showing the present embodiment.
FIG. 2 is a diagram schematically showing the present embodiment.
FIG. 3 is a functional block diagram showing an example of the learning device 100 according to the present embodiment.
FIG. 4 is a functional block diagram showing an example of the prediction device 200 according to the present embodiment.
FIG. 5 is a flowchart showing an example of the operation of the learning device 100.
FIG. 6 is a diagram showing an example of the conventional technology.
FIG. 7 is a diagram showing an example of the conventional technology.

<Overview>
First, an overview of the present embodiment is given. The present embodiment operates appropriately even when unknown classes are present (hereinafter referred to as the "first problem"). The present embodiment may further be configured to operate appropriately when the data in each domain is only partially labeled (hereinafter referred to as the "second problem") and when the domain to which each data item belongs is unknown (hereinafter referred to as the "third problem"). It may also be configured to operate appropriately when two or more of these three accompanying problems are present at the same time.

More specifically, the embodiment is as follows. FIGS. 1 and 2 are diagrams schematically showing the present embodiment. In FIGS. 1 and 2, the region enclosed by the solid line is the source domain 10, the region enclosed by the broken line is the target domain 20, and the straight line is the decision boundary 30. The line segment 40 indicates the boundary identified between the known classes and the unknown class. The arrow 50 indicates that domain adaptation is performed.

The present embodiment handles the first problem, the presence of unknown classes, by identifying which of the data without teacher labels belongs to the unknown class. It handles the second and third problems by setting up domain adaptation in which the labeled data is regarded as the source domain and the data that has no teacher label but belongs to a known class is regarded as the target domain.
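The way the data is divided for this domain adaptation can be illustrated with a minimal Python sketch. The function name `split_domains`, the sample identifiers, and the dictionary `unknown_class_info` are hypothetical names introduced only for illustration; the sketch assumes that a flag indicating whether each unlabeled sample was judged to belong to the unknown class is already available.

```python
def split_domains(samples, unknown_class_info):
    """Divide samples into source domain, target domain, and unknown-class data.

    samples: list of (sample_id, label) pairs; label is None for unlabeled data.
    unknown_class_info: dict mapping sample_id -> True if judged unknown class.
    """
    source, target, unknown = [], [], []
    for sample_id, label in samples:
        if label is not None:
            # Labeled data is regarded as the source domain.
            source.append(sample_id)
        elif unknown_class_info.get(sample_id, False):
            # Unlabeled data judged to be of the unknown class is set aside.
            unknown.append(sample_id)
        else:
            # Unlabeled data of a known class is regarded as the target domain.
            target.append(sample_id)
    return source, target, unknown


# Toy data: two labeled samples and two unlabeled samples.
samples = [("a", "あ"), ("b", None), ("c", None), ("d", "い")]
unknown_class_info = {"b": True, "c": False}
source, target, unknown = split_domains(samples, unknown_class_info)
```

In this toy example the labeled samples form the source domain, sample "c" forms the target domain, and sample "b" is excluded from the domain adaptation as unknown-class data.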

<Configuration example of learning device>
Next, the configuration of the learning device according to the present embodiment is described. FIG. 3 is a functional block diagram showing an example of the learning device 100 according to the present embodiment. The learning device 100 is configured using an information processing device such as a personal computer or a server device. The learning device 100 includes a control unit 90, an unknown class information storage unit 130, and a learning result storage unit 140. The control unit 90 is configured using a processor such as a CPU (Central Processing Unit) and a memory. When the processor executes a program, the control unit 90 functions as a feature extractor 101, a first classifier 102, a second classifier 103, a classification loss evaluation unit 104, an unknown class classifier 105, a classification discrepancy evaluation unit 106, and a learning unit 107. All or some of the functions of the control unit 90 may be implemented using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (for example, an SSD: Solid State Drive), or a storage device such as a hard disk or a semiconductor storage device built into a computer system. The program may also be transmitted via a telecommunication line.

The learning device 100 operates on data acquired from a supervised data storage unit 110 and an unsupervised data storage unit 120. The supervised data storage unit 110 is configured using a device or medium capable of storing data, such as a storage device (for example, a magnetic hard disk device or a semiconductor storage device) or a recording medium (for example, a CD-ROM). The supervised data storage unit 110 stores a supervised data set, which is a set of data to which the desired class labels have been assigned. The unsupervised data storage unit 120 is likewise configured using a device or medium capable of storing data, such as a storage device (for example, a magnetic hard disk device or a semiconductor storage device) or a recording medium (for example, a CD-ROM). The unsupervised data storage unit 120 stores an unsupervised data set, which is a set of data to which the desired class labels have not been assigned.

The feature extractor 101 receives the supervised data set and the unsupervised data set as input and extracts a feature vector from each data item. The feature extractor 101 outputs the extracted feature vectors to the first classifier 102 and the second classifier 103. The feature extractor 101 operates based on a parameterized function capable of extracting such feature vectors. A feature vector is, for example, a numerical vector that represents the features of the data. In other words, a feature vector represents the relevant features of the data as a vector with n elements, where n is an arbitrary integer, for example n = 512. Although the feature vector is described as a vector for convenience, its format is irrelevant to the essence of the present invention and may be chosen freely. Each time the feature extractor 101 outputs a feature vector, it reads the parameters stored in the learning result storage unit 140 and then outputs the feature vector.

The first classifier 102 receives as input the feature vector output by the feature extractor 101. For the data from which the input feature vector was extracted, the first classifier 102 outputs estimated values of the probabilities of belonging to each known class and to the unknown class (hereinafter referred to as "estimated membership probabilities"). An estimated membership probability represents how likely it is that the data belongs to the corresponding known class or to the unknown class. The first classifier 102 operates based on a parameterized function capable of outputting such estimated membership probabilities. Each time the first classifier 102 outputs estimated membership probabilities, it reads the parameters stored in the learning result storage unit 140 and then outputs the estimated membership probabilities.

The second classifier 103 receives as input the feature vector output by the feature extractor 101. For the data from which the input feature vector was extracted, the second classifier 103 outputs estimated values of the probabilities of belonging to each known class and to the unknown class (estimated membership probabilities). The second classifier 103 operates based on a parameterized function capable of outputting such estimated membership probabilities. Each time the second classifier 103 outputs estimated membership probabilities, it reads the parameters stored in the learning result storage unit 140 and then outputs the estimated membership probabilities. The same feature vector is input to both the first classifier 102 and the second classifier 103.
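The output format of the two classifiers can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each classifier is a single linear layer followed by a softmax over K known classes plus one unknown class; the class `LinearClassifier`, the seeds, and the toy feature vector are hypothetical and are not taken from the embodiment.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical stand-in for a classifier: linear layer + softmax."""

    def __init__(self, n_features, n_known, seed):
        rng = random.Random(seed)
        # One weight row per output: n_known known classes + 1 unknown class.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(n_known + 1)
        ]

    def __call__(self, feature):
        logits = [sum(w * x for w, x in zip(row, feature)) for row in self.weights]
        return softmax(logits)


feature = [0.5, -1.0, 2.0, 0.0]          # toy 4-dimensional feature vector
first = LinearClassifier(4, 3, seed=1)   # differently initialized classifiers
second = LinearClassifier(4, 3, seed=2)
p_first = first(feature)                 # both receive the same feature vector
p_second = second(feature)
```

Each output has K + 1 = 4 entries, with the last entry taken in this sketch to be the probability of the unknown class.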

Any function may be used for the feature extractor 101, the first classifier 102, and the second classifier 103, provided that it is differentiable with respect to its parameters. In the present embodiment, a CNN (Convolutional Neural Network) is used. The CNN is only an example, however, and the functions are not limited to it.

The classification loss evaluation unit 104 receives as input the data to be processed, information indicating whether the data to be processed belongs to the unknown class, the estimated membership probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed, and the desired membership probabilities for the data to be processed (hereinafter referred to as "teacher membership probabilities"). The classification loss evaluation unit 104 computes the value of a classification loss function, which is a first loss function representing the difference between the estimated and teacher membership probabilities (hereinafter referred to as the "classification loss evaluation value"). The teacher membership probabilities are the membership probabilities that correspond to the class label treated as the correct answer during learning.
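As a minimal sketch of such a loss, the following Python code uses cross-entropy summed over the two classifiers. The choice of cross-entropy, the one-hot form of the teacher probabilities, and the function names are assumptions made for illustration; this passage does not specify the exact form of the classification loss function or how the unknown-class information enters it.

```python
import math


def cross_entropy(teacher, estimated, eps=1e-12):
    """Cross-entropy between teacher and estimated membership probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(teacher, estimated))


def classification_loss(teacher, p_first, p_second):
    """Assumed form: sum of the losses of the first and second classifiers."""
    return cross_entropy(teacher, p_first) + cross_entropy(teacher, p_second)


def one_hot(index, size):
    """Teacher probabilities that put all mass on the correct class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


n_known = 3
teacher = one_hot(1, n_known + 1)   # correct class is known class 1
p_first = [0.1, 0.7, 0.1, 0.1]      # toy output of the first classifier
p_second = [0.2, 0.6, 0.1, 0.1]     # toy output of the second classifier
loss = classification_loss(teacher, p_first, p_second)
```

The loss shrinks as both classifiers assign more probability to the class indicated by the teacher probabilities.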

The unknown class classifier 105 receives as input the data to be processed and the estimated membership probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed. The unknown class classifier 105 determines whether the data to be processed belongs to the unknown class, and records information indicating the result (hereinafter referred to as "unknown class information") in the unknown class information storage unit 130. The information recorded in the unknown class information storage unit 130 is used by the classification loss evaluation unit 104 and the classification discrepancy evaluation unit 106.
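One conceivable decision rule is sketched below in Python. The rule itself (averaging the unknown-class probability of the two classifiers and comparing it with a threshold of 0.5), the function name `is_unknown`, and the dictionary standing in for the unknown class information storage unit are all assumptions for illustration; this passage states only that the decision is based on the estimated membership probabilities.

```python
def is_unknown(p_first, p_second, threshold=0.5):
    """Hypothetical rule: average the unknown-class probability (last entry)
    of both classifiers and compare it with a threshold."""
    unknown_prob = (p_first[-1] + p_second[-1]) / 2.0
    return unknown_prob >= threshold


# Toy estimated membership probabilities: 3 known classes + 1 unknown class.
estimates = {
    "x1": ([0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]),
    "x2": ([0.1, 0.1, 0.1, 0.7], [0.2, 0.1, 0.1, 0.6]),
}

# Stand-in for the unknown class information storage unit.
unknown_class_info = {}
for sample_id, (p_first, p_second) in estimates.items():
    unknown_class_info[sample_id] = is_unknown(p_first, p_second)
```

Here sample "x1" is recorded as belonging to a known class and sample "x2" as belonging to the unknown class.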

The classification discrepancy evaluation unit 106 receives as input the data to be processed and the estimated membership probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed. The classification discrepancy evaluation unit 106 obtains a value indicating the degree to which the estimated membership probabilities of the first classifier 102 and the second classifier 103 disagree (hereinafter referred to as the "classification discrepancy evaluation value").
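A minimal Python sketch of one possible discrepancy measure follows. It assumes the mean absolute difference between the two probability vectors, a measure commonly used with two-classifier approaches of the kind cited as Non-Patent Document 2; this passage does not fix the measure, and the function name is hypothetical.

```python
def classification_discrepancy(p_first, p_second):
    """Assumed measure: mean absolute difference of the two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p_first, p_second)) / len(p_first)


# The two classifiers agree: the discrepancy is zero.
agree = classification_discrepancy([0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1])

# The two classifiers prefer different classes: the discrepancy is larger.
disagree = classification_discrepancy([0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1])
```

With this measure the value is 0 when the two classifiers output identical probabilities and grows as their outputs diverge.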

The learning unit 107 receives as input the value of the classification loss function obtained by the classification loss evaluation unit 104 (the classification loss evaluation value) and the classification discrepancy evaluation value obtained by the classification discrepancy evaluation unit 106. Using these values, the learning unit 107 iteratively learns the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103, and records the resulting parameters in the learning result storage unit 140. The iterative learning of the feature extractor 101 is performed so that both the classification loss evaluation value and the classification discrepancy evaluation value become smaller. The iterative learning of the first classifier 102 and the second classifier 103 is performed so that the classification loss evaluation value becomes smaller and the classification discrepancy evaluation value becomes larger.
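The opposing roles of the discrepancy term can be written as two objectives, each to be minimized by a different group of parameters. The Python sketch below assumes an unweighted sum and difference; the function names, the absence of weighting coefficients, and the toy numbers are assumptions for illustration only.

```python
def feature_extractor_objective(classification_loss, discrepancy):
    """Minimized with respect to the feature extractor parameters:
    both the classification loss and the discrepancy should become small."""
    return classification_loss + discrepancy


def classifier_objective(classification_loss, discrepancy):
    """Minimized with respect to the classifier parameters:
    the classification loss should become small, the discrepancy large."""
    return classification_loss - discrepancy


# Two hypothetical states with the same classification loss.
loss = 0.9
low_disc, high_disc = 0.1, 0.3

# The feature extractor prefers the state with the lower discrepancy.
fe_low = feature_extractor_objective(loss, low_disc)
fe_high = feature_extractor_objective(loss, high_disc)

# The classifiers prefer the state with the higher discrepancy.
clf_low = classifier_objective(loss, low_disc)
clf_high = classifier_objective(loss, high_disc)
```

In an actual implementation, each objective would be minimized by gradient-based updates of the corresponding parameters, which is possible because the functions are required to be differentiable with respect to their parameters.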

＜予測装置の構成例＞ <Configuration example of the prediction device>
次に、本実施形態に係る予測装置の構成について説明する。図４は、本実施形態に係る予測装置２００の一例を示す機能ブロック図である。予測装置２００は、例えばパーソナルコンピューターやサーバー装置等の情報処理装置を用いて構成される。予測装置２００は、制御部９１及び記憶部２３０を備える。制御部９１は、ＣＰＵ等のプロセッサーとメモリーとを用いて構成される。制御部９１は、プロセッサーがプログラムを実行することによって、特徴抽出器２０１及び識別器２０２として機能する。なお、制御部９１の各機能の全て又は一部は、ＡＳＩＣやＰＬＤやＦＰＧＡ等のハードウェアを用いて実現されても良い。上記のプログラムは、コンピューター読み取り可能な記録媒体に記録されても良い。コンピューター読み取り可能な記録媒体とは、例えばフレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ、半導体記憶装置（例えばＳＳＤ）等の可搬媒体、コンピューターシステムに内蔵されるハードディスクや半導体記憶装置等の記憶装置である。上記のプログラムは、電気通信回線を介して送信されてもよい。 Next, the configuration of the prediction device according to this embodiment will be described. FIG. 4 is a functional block diagram showing an example of the prediction device 200 according to this embodiment. The prediction device 200 is configured using an information processing device such as a personal computer or a server device. The prediction device 200 includes a control unit 91 and a storage unit 230. The control unit 91 is configured using a processor such as a CPU and a memory. The control unit 91 functions as a feature extractor 201 and a classifier 202 when the processor executes a program. Note that all or part of the functions of the control unit 91 may be realized using hardware such as an ASIC, a PLD, or an FPGA. The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (for example, an SSD), or a storage device such as a hard disk or a semiconductor storage device built into a computer system. The program may also be transmitted via a telecommunication line.

記憶部２３０は、磁気ハードディスク装置や半導体記憶装置等の記憶装置を用いて構成される。記憶部２３０は、学習装置１００の学習部１０７によって行われた反復学習で得られた学習結果としてのパラメータを記憶する。 The storage unit 230 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 230 stores parameters as learning results obtained through iterative learning performed by the learning unit 107 of the learning device 100.

特徴抽出器２０１は、処理対象のデータ（予測対象のデータ）２４０を受けると、記憶部２３０からパラメータを読み出し、パラメータに基づき動作する。特徴抽出器２０１は、処理対象のデータ２４０について特徴ベクトルを出力する。識別器２０２は、記憶部２３０からパラメータを読み出し、パラメータに基づいて動作する。識別器２０２は、特徴抽出器２０１によって得られた特徴ベクトルに基づいて、処理対象のデータ２４０について推定帰属確率を求める。識別器２０２の出力は、処理対象のデータ２４０についての、各クラスに対する推定帰属確率そのものであってもよいし、どのクラスに属するかの予測結果を示す情報であってもよい。 When the feature extractor 201 receives data to be processed (data to be predicted) 240, it reads parameters from the storage unit 230 and operates based on them. The feature extractor 201 outputs a feature vector for the data 240 to be processed. The classifier 202 likewise reads parameters from the storage unit 230 and operates based on them. The classifier 202 obtains estimated attribution probabilities for the data 240 to be processed, based on the feature vector obtained by the feature extractor 201. The output of the classifier 202 may be the estimated attribution probabilities for each class themselves, or information indicating the predicted class to which the data 240 belongs.
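The prediction flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `predict`, `toy_extract`, and `toy_classify` are hypothetical names, the softmax output layer is an assumption, and the unknown class is assumed to be the last of the K + 1 outputs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def predict(x, extract, classify):
    """Return (estimated attribution probabilities, predicted class index).

    extract and classify are hypothetical callables standing in for the
    feature extractor 201 and the classifier 202 with learned parameters.
    """
    features = extract(x)
    probs = softmax(classify(features))
    return probs, max(range(len(probs)), key=probs.__getitem__)

# Toy stand-ins (assumptions): identity features and a fixed linear scorer
# with K = 2 known classes plus one unknown class (index 2).
def toy_extract(x):
    return x

def toy_classify(f):
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    return [sum(w * v for w, v in zip(row, f)) for row in weights]

probs, label = predict([2.0, 0.5], toy_extract, toy_classify)
```

Returning both the probability vector and the class index mirrors the two output options the text allows for the classifier 202.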

＜学習装置の動作例＞ <Operation example of the learning device>
図５は、学習装置１００の動作例を示すフローチャートである。次に、学習装置１００の動作例について説明する。学習装置１００は、教師ありデータ集合１１０及び教師なしデータ集合１２０を受けて、図５に示される学習処理ルーチンを実行する。 FIG. 5 is a flowchart showing an example of the operation of the learning device 100. Next, an example of the operation of the learning device 100 will be described. The learning device 100 receives the supervised data set 110 and the unsupervised data set 120, and executes the learning processing routine shown in FIG. 5.

まず、学習装置１００の制御部９０は、一つ以上の教師ありデータ集合１１０及び教師なしデータ集合１２０を読み込む（ステップＳ１０１）。次に、制御部９０は、学習の反復回数が予め定められた予定回数以下であるか否かの分岐判定を行う（ステップＳ１０２）。反復回数が予定回数以下であれば、ステップＳ１０３の処理が実行される。一方、反復回数が予定回数より多ければステップＳ１０４の処理が実行される。 First, the control unit 90 of the learning device 100 reads one or more supervised data sets 110 and unsupervised data sets 120 (step S101). Next, the control unit 90 makes a branch determination as to whether or not the number of repetitions of learning is equal to or less than a predetermined number of times (step S102). If the number of repetitions is less than or equal to the scheduled number of times, the process of step S103 is executed. On the other hand, if the number of repetitions is greater than the scheduled number of times, the process of step S104 is executed.

ここで、ステップＳ１０２における分岐処理の意義について説明する。この分岐処理によって、未知クラスの識別方法が変化する。第一識別器１０２及び第二識別器１０３が既知クラスＫ個と未知クラスとを合わせた（Ｋ＋１）個のクラスを識別できるように学習する。しかし、既知クラスに関しては教師ありデータとしてデータとその教師帰属確率の組が得られるのに対して、未知クラスに関してはどのデータが未知クラスであるかは不明である。そこで、反復回数が予定回数以下である場合は、教師なしデータに対して未知クラスの識別を行い、結果を識別履歴として記録する。一方、反復回数が予定回数より多い場合は、（Ｋ＋１）個のクラスを識別できるように第一識別器１０２及び第二識別器１０３を学習し、未知クラスの識別結果を識別履歴として記録する。 Here, the significance of the branch in step S102 will be explained. This branch changes how unknown classes are identified. The first classifier 102 and the second classifier 103 are trained so that they can distinguish (K+1) classes, namely the K known classes plus the unknown class. For the known classes, pairs of data and teacher attribution probabilities are available as supervised data, whereas for the unknown class it is not known which data belong to it. Therefore, while the number of iterations is less than or equal to the planned number, unknown class identification is performed on the unsupervised data and the results are recorded as an identification history. Once the number of iterations exceeds the planned number, the first classifier 102 and the second classifier 103 are trained to distinguish the (K+1) classes, and the unknown class identification results are recorded as an identification history.

反復回数が予定回数以下である場合には、未知クラスの識別を行うことで未知クラスの教師帰属確率を推定することができるが、誤りも含まれてしまう。そのため、識別履歴を記録しながら（Ｋ＋１）個のクラスの識別を学習することで、誤りの少ない未知クラス識別が可能になる。 While the number of iterations is less than or equal to the planned number, identifying the unknown class makes it possible to estimate teacher attribution probabilities for the unknown class, but the estimates contain errors. Therefore, learning to distinguish the (K+1) classes while recording the identification history enables unknown class identification with fewer errors.

ステップＳ１０３では、教師ありデータ集合１１０及び教師なしデータ集合１２０に対して特徴抽出器１０１、第一識別器１０２、第二識別器１０３及び未知クラス識別器１０５を適用して、識別損失評価値、識別不一致度評価値、未知クラスか否かの判定が得られる。 In step S103, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set 110 and the unsupervised data set 120, yielding the classification loss evaluation value, the identification discrepancy evaluation value, and a determination as to whether each item is of an unknown class.

ステップＳ１０４では、教師なしデータ集合１２０について未知クラス識別履歴を読み込む。そして、ステップＳ１０５において、教師ありデータ集合１１０及び教師なしデータ集合１２０、未知クラス識別履歴に対して、特徴抽出器１０１、第一識別器１０２、第二識別器１０３及び未知クラス識別器１０５を適用して、識別損失評価値、識別不一致度評価値、未知クラスか否かの判定が得られる。 In step S104, the unknown class identification history for the unsupervised data set 120 is read. Then, in step S105, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set 110, the unsupervised data set 120, and the unknown class identification history. As a result, an identification loss evaluation value, an identification inconsistency evaluation value, and a determination as to whether or not the class is an unknown class are obtained.

ステップＳ１０３又はステップＳ１０５の処理が終わると、学習部１０７は、識別損失評価値及び識別不一致度評価値に基づいて、特徴抽出器１０１、第一識別器１０２及び第二識別器１０３のパラメータの値（学習結果記憶部１４０に記録される値）をそれぞれ更新する（ステップＳ１０６）。 When the process of step S103 or step S105 is finished, the learning unit 107 determines the values of the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 based on the classification loss evaluation value and the classification inconsistency evaluation value. (values recorded in the learning result storage unit 140) are updated (step S106).

特徴抽出器１０１、第一識別器１０２、第二識別器１０３のパラメータを学習結果記憶部１４０に格納する。次に、未知クラス識別器１０５は、ステップＳ１０３又はＳ１０５で得られた未知クラスデータであるかの識別結果を未知クラス情報記憶部１３０に記録する（ステップＳ１０７）。 The parameters of the feature extractor 101, first classifier 102, and second classifier 103 are stored in the learning result storage unit 140. Next, the unknown class discriminator 105 records the identification result as to whether the data is unknown class data obtained in step S103 or S105 in the unknown class information storage unit 130 (step S107).

そして、制御部９０は、終了条件を満たすかを判定する（ステップＳ１０８）。終了条件を満たしている場合（ステップＳ１０８－ＹＥＳ）、制御部９０は処理を終了する。終了条件を満たしていない場合（ステップＳ１０８－ＮＯ）、制御部９０は、ステップＳ１０１に戻って処理を繰り返す。 Then, the control unit 90 determines whether the termination condition is satisfied (step S108). If the termination condition is satisfied (step S108-YES), the control unit 90 terminates the process. If the termination condition is not satisfied (step S108-NO), the control unit 90 returns to step S101 and repeats the process.
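The control flow of steps S102 to S108 can be sketched as below. This is an illustrative skeleton under stated assumptions: every callable passed in (`step_warmup`, `step_with_history`, `update_parameters`, `is_finished`) is a hypothetical placeholder, and the re-reading of data in step S101 on each pass is omitted for brevity.

```python
def run_training(max_iterations, warmup_iterations, step_warmup,
                 step_with_history, update_parameters, is_finished):
    """Run the branch-and-update loop with hypothetical callables."""
    history = []  # one entry per iteration: {data index: True if judged unknown}
    for iteration in range(1, max_iterations + 1):
        if iteration <= warmup_iterations:                      # S102
            losses, unknown_flags = step_warmup()                # S103
        else:
            losses, unknown_flags = step_with_history(history)   # S104, S105
        update_parameters(losses)                                # S106
        history.append(unknown_flags)                            # S107
        if is_finished(iteration, losses):                       # S108
            break
    return history

# Toy stubs (assumptions) that only record which branch was taken.
calls = []

def demo_warmup():
    calls.append("S103")
    return 1.0, {0: False}

def demo_with_history(history):
    calls.append("S105")
    return 0.5, {0: True}

history = run_training(5, 2, demo_warmup, demo_with_history,
                       update_parameters=lambda losses: None,
                       is_finished=lambda iteration, losses: iteration >= 4)
```

Keeping the history as one record per iteration makes the later majority vote over past identification results a simple slice.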

以上説明した反復学習により、特徴抽出器１０１、第一識別器１０２及び第二識別器１０３のパラメータが学習される。特徴抽出器１０１に関しては、識別損失評価値及び識別不一致度評価値を用いて、識別損失評価値と識別不一致度評価値が小さくなるように学習が行われる。識別損失関数は、第一識別器１０２及び第二識別器１０３が出力したデータの推定帰属確率と、データの所与の教師帰属確率と、の類似度が高いほど小さい値を出力する。識別不一致度評価値は、第一識別器１０２及び第二識別器１０３が出力したデータの推定帰属確率についての識別器間の差を示す。また、第一識別器１０２及び第二識別器１０３に関しては、識別損失評価値は小さく、識別不一致度評価値は大きくなるように学習が行われる。 Through the iterative learning described above, the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 are learned. The feature extractor 101 is trained, using the classification loss evaluation value and the identification discrepancy evaluation value, so that both values become small. The classification loss function outputs a smaller value as the similarity becomes higher between the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for the data and the given teacher attribution probabilities of the data. The identification discrepancy evaluation value indicates the difference between the classifiers in the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for the data. The first classifier 102 and the second classifier 103 are trained so that the classification loss evaluation value becomes small and the identification discrepancy evaluation value becomes large.

［各処理の詳細］ [Details of each process]
次に学習装置１００の各処理部の処理の詳細について説明する。 Next, the processing of each processing unit of the learning device 100 will be described in detail.
［反復回数が予定回数以下の場合］ [When the number of iterations is less than or equal to the planned number]
ステップＳ１０２において、反復回数が予定回数以下の場合における識別損失評価部１０４、未知クラス識別器１０５、識別不一致評価部１０６、の各処理について説明する。 The processing of the classification loss evaluation unit 104, the unknown class classifier 105, and the identification discrepancy evaluation unit 106 when the number of iterations is determined in step S102 to be less than or equal to the planned number will be described.

識別損失関数は、特徴抽出器１０１の出力した特徴ベクトルを入力として第一識別器１０２及び第二識別器１０３が出力したデータの推定帰属確率とデータの所与の教師帰属確率との類似度が高いほど小さい値を出力するものである。識別損失関数は、後述する式２及び式３に対応する。また、値は、識別損失評価値に対応するものである。 The classification loss function outputs a smaller value as the similarity becomes higher between the estimated attribution probabilities of the data, which the first classifier 102 and the second classifier 103 output from the feature vector produced by the feature extractor 101, and the given teacher attribution probabilities of the data. The classification loss function corresponds to Equations 2 and 3 described later, and its value corresponds to the classification loss evaluation value.

［識別損失評価部の処理］ [Processing of the classification loss evaluation unit]
特徴抽出器１０１は、データxを入力として特徴ベクトルfを出力しパラメータφを持つような関数Fを用いることで実現される。第一識別器１０２は、特徴ベクトルfを入力として推定帰属確率y1を出力するパラメータθ１を持つ関数として表現することができる。第二識別器１０３は、特徴ベクトルfを入力として推定帰属確率y2を出力するパラメータθ２を持つ関数として表現することができる。第一識別器１０２及び第二識別器１０３を実現する関数は、特徴抽出器１０１を実現する関数Fを用いて、確率関数として下記式１のように表すことができる。なお、iは2つの識別器を区別するための添え字として用いる。 The feature extractor 101 is realized by a function F that has a parameter φ, receives data x as input, and outputs a feature vector f. The first classifier 102 can be expressed as a function with a parameter θ1 that receives the feature vector f as input and outputs an estimated attribution probability y1. The second classifier 103 can be expressed as a function with a parameter θ2 that receives the feature vector f as input and outputs an estimated attribution probability y2. Using the function F that realizes the feature extractor 101, the functions that realize the first classifier 102 and the second classifier 103 can be expressed as probability functions as shown in Equation 1 below. Here, i is a subscript that distinguishes the two classifiers.

式はφ、θi、及びxが与えられた下でのyiが出現する確率である。望ましい特徴抽出器１０１、第一識別器１０２及び第二識別器１０３は、教師ありデータ集合からデータsが与えられた時、各クラスへの教師帰属確率tが出現するようなものである。すなわち、正解となるクラスが識別可能な帰属確率が求められる特徴抽出器１０１、第一識別器１０２及び第二識別器１０３である。データsと対応する教師帰属確率tの出現確率をp（s，t）とすると、学習は下記式２が小さくなるようにパラメータφ、θiを決定できれば良い。 The equation gives the probability that yi appears given φ, θi, and x. A desirable feature extractor 101, first classifier 102, and second classifier 103 are such that, when data s from the supervised data set is given, the teacher attribution probability t for each class appears. In other words, they yield attribution probabilities from which the correct class can be identified. Letting p(s, t) denote the probability of appearance of data s together with its teacher attribution probability t, learning only needs to determine the parameters φ and θi so that Equation 2 below becomes small.

Eb［a］は、aの確率bに対する期待値である。本実施形態の場合は、教師ありデータは教師ありデータ集合から取得されるので、期待値は下記式３のように総和の形で近似的に置き換えられる。 Eb[a] is the expected value of a with respect to the probability b. In this embodiment, the supervised data is drawn from the supervised data set, so the expected value is approximated by a sum, as shown in Equation 3 below.

なお、S、T、はそれぞれ１つ以上のデータと、対応する教師帰属確率の集合である。式３が本実施形態の一例における識別損失関数であり、これを任意のS、Tに対して評価した値が識別損失評価値である。 Note that S and T are each a set of one or more data and corresponding teacher attribution probabilities. Equation 3 is the discrimination loss function in an example of this embodiment, and the value obtained by evaluating this for arbitrary S and T is the discrimination loss evaluation value.
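The equations themselves do not survive in this text, so the following is only a sketch of a loss with the stated property (a smaller value when the estimated attribution probabilities are closer to the teacher attribution probabilities). It assumes a cross-entropy form summed over both classifiers and averaged over the samples; the function name `classification_loss` and the `eps` guard are illustrative, not taken from the source.

```python
import math

def classification_loss(probs_1, probs_2, teacher_probs, eps=1e-12):
    """Average cross-entropy of both classifiers against teacher probabilities.

    probs_1, probs_2: per-sample estimated attribution probabilities from the
    first and second classifier; teacher_probs: per-sample teacher
    attribution probabilities (for example one-hot vectors).
    """
    total = 0.0
    for p1, p2, t in zip(probs_1, probs_2, teacher_probs):
        for p in (p1, p2):
            # eps avoids log(0) when a classifier assigns zero probability.
            total -= sum(tk * math.log(pk + eps) for tk, pk in zip(t, p))
    return total / len(teacher_probs)

teacher = [[1.0, 0.0, 0.0]]
good = classification_loss([[0.9, 0.05, 0.05]], [[0.9, 0.05, 0.05]], teacher)
bad = classification_loss([[0.1, 0.45, 0.45]], [[0.1, 0.45, 0.45]], teacher)
```

Because every operation here is differentiable in the probabilities, a loss of this shape is compatible with the gradient-based training the description relies on.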

式３をφ、θ１、θ２について小さくすることで、sに対してtを出力できるような望ましい特徴抽出器１０１、第一識別器１０２及び第二識別器１０３を得ることができる。このようなφ、θ１、θ２を求める方法は様々ある。単純には、特徴抽出器を実現する関数Fと、第一識別器１０２及び第二識別器１０３と、を表す確率関数がそれぞれのパラメータφ、θ１、θ２に対して微分可能である場合、局所最小化できることが知られている。そのため、本実施形態の一例においては、特徴抽出器１０１として、データxを入力された下でそのデータの特徴ベクトルfを出力する関数であること、φについて微分可能であること、第一識別器１０２及び第二識別器１０３として特徴ベクトルfを入力として推定帰属確率y１、y２を出力する関数であること、それぞれθ１、θ２に対して微分可能であること、という条件を満たす関数を選んでもよい。 By making Equation 3 small with respect to φ, θ1, and θ2, a desirable feature extractor 101, first classifier 102, and second classifier 103 that can output t for s are obtained. There are various ways to find such φ, θ1, and θ2. Put simply, if the function F realizing the feature extractor and the probability functions representing the first classifier 102 and the second classifier 103 are differentiable with respect to their respective parameters φ, θ1, and θ2, it is known that Equation 3 can be locally minimized. Therefore, in an example of this embodiment, functions satisfying the following conditions may be chosen: the feature extractor 101 is a function that, given data x as input, outputs the feature vector f of that data and is differentiable with respect to φ; and the first classifier 102 and the second classifier 103 are functions that take the feature vector f as input, output the estimated attribution probabilities y1 and y2, and are differentiable with respect to θ1 and θ2, respectively.

［識別不一致評価部の処理］ [Processing of the identification discrepancy evaluation unit]
ある推定帰属確率p1,p2の識別不一致度評価値は、p1k,p2kをそれぞれ推定帰属確率p1,p2のクラスkに対する帰属確率を表すものとした時、下記式４のように表される。ここでKは識別すべき既知クラスの数、K+1は既知クラスのいずれにも該当しない未知クラスを表す。 The identification discrepancy evaluation value for estimated attribution probabilities p1 and p2 is expressed as in Equation 4 below, where p1k and p2k denote the attribution probabilities of p1 and p2 for class k, respectively. Here, K is the number of known classes to be identified, and K+1 denotes the unknown class that corresponds to none of the known classes.
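Equation 4 is not reproduced in this text. As a hedged illustration only, the sketch below assumes the discrepancy is the mean absolute difference between the two probability vectors over the K + 1 classes, which is one common choice for this kind of two-classifier training; the source may define it differently.

```python
def discrepancy(p1, p2):
    """Mean absolute difference between two probability vectors of length K + 1."""
    if len(p1) != len(p2):
        raise ValueError("both vectors must cover the same K + 1 classes")
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)

# K = 2 known classes plus the unknown class as the last entry (assumption).
agree = discrepancy([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
disagree = discrepancy([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The value is zero when the two classifiers agree exactly and grows as their outputs diverge, which is the behaviour the surrounding description requires.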

識別不一致評価部１０６は、教師なしデータ集合１２０のデータuに対して第一識別器１０２及び第二識別器１０３が出力する推定帰属確率y1、y2の不一致度を評価する。すなわち識別不一致評価部１０６は、式４の推定帰属確率の識別不一致度評価値を用いて、下記式５に示す、教師なしデータ集合のデータuの出現確率p(u)について、第一識別器１０２及び第二識別器１０３の推定帰属確率の識別不一致度評価値Ladvを出力する。 The identification discrepancy evaluation unit 106 evaluates the degree of discrepancy between the estimated attribution probabilities y1 and y2 that the first classifier 102 and the second classifier 103 output for data u of the unsupervised data set 120. That is, using the identification discrepancy evaluation value for estimated attribution probabilities given by Equation 4, the identification discrepancy evaluation unit 106 outputs the identification discrepancy evaluation value L_adv of the estimated attribution probabilities of the first classifier 102 and the second classifier 103, taken over the appearance probability p(u) of data u of the unsupervised data set, as shown in Equation 5 below.

Eb［a］は、aの確率bに対する期待値である。本実施形態の場合は、教師なしデータは教師なしデータ集合から取得されるので、期待値は下記式６のように総和の形で近似的に置き換えられる。 Eb[a] is the expected value of a with respect to the probability b. In this embodiment, the unsupervised data is drawn from the unsupervised data set, so the expected value is approximated by a sum, as shown in Equation 6 below.

なお、Uは１つ以上のデータである。式６が本実施形態の一例における識別不一致度であり、これを任意のUに対して評価した値が識別不一致度評価値である。 Note that U is one or more pieces of data. Equation 6 is the identification mismatch degree in an example of this embodiment, and the value obtained by evaluating this for any U is the identification mismatch degree evaluation value.
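Equations 5 and 6 are missing from this text. As a hedged reconstruction from the surrounding definitions, and not the published formulas, the expectation over p(u) and its empirical replacement might be written as follows. Here d(·,·) stands for the per-sample discrepancy of Equation 4, and the 1/|U| normalization is an assumption, since the text only says that the expectation is replaced by a sum.

```latex
% Assumed form of Equation 5: expectation of the discrepancy over p(u)
L_{adv} = \mathbb{E}_{u \sim p(u)}
  \Big[ d\big( p(y_1 \mid u; \phi, \theta_1),\; p(y_2 \mid u; \phi, \theta_2) \big) \Big]

% Assumed form of Equation 6: empirical approximation over the data U
L_{adv} \approx \frac{1}{|U|} \sum_{u \in U}
  d\big( p(y_1 \mid u; \phi, \theta_1),\; p(y_2 \mid u; \phi, \theta_2) \big)
```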

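［未知クラス識別器処理］ [Processing of the unknown class classifier]
データxに対する第一識別器１０２及び第二識別器１０３が出力する推定帰属確率y1、y2は上述の式１を用いて表すことができる。第一識別器１０２及び第二識別器１０３が出力する推定帰属確率y1、y2の平均推定帰属確率yについて出力された帰属確率yの曖昧性を示す情報エントロピーH（y｜x）は下記式７のように表される。 The estimated attribution probabilities y1 and y2 that the first classifier 102 and the second classifier 103 output for data x can be expressed using Equation 1 above. The information entropy H(y|x), which indicates the ambiguity of the average estimated attribution probability y obtained from the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103, is expressed as in Equation 7 below.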

教師なしデータ集合の教師なしデータuが未知クラスデータであるか否かの判別は、式７に示される情報エントロピーの値が予め定めた閾値σより大きいか否かによって決まる。すなわち、反復回数e回目における教師なしデータuが未知クラスデータであるか否かの識別y_u,eは下記式８のように表される。 Whether unsupervised data u of the unsupervised data set is unknown class data is determined by whether the information entropy given by Equation 7 is larger than a predetermined threshold σ. That is, the identification y_u,e of whether the unsupervised data u is unknown class data at the e-th iteration is expressed as in Equation 8 below.
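The entropy test described above can be sketched as follows. The sketch assumes the standard Shannon entropy with the natural logarithm and a simple element-wise mean of the two classifier outputs; the function names and the example threshold are illustrative.

```python
import math

def mean_probs(y1, y2):
    """Element-wise average of the two classifiers' probability vectors."""
    return [(a + b) / 2.0 for a, b in zip(y1, y2)]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def is_unknown_by_entropy(y1, y2, sigma):
    """True when the averaged prediction is more ambiguous than threshold sigma."""
    return entropy(mean_probs(y1, y2)) > sigma

confident = [0.98, 0.01, 0.01]
uniform = [1.0 / 3.0] * 3
flag_confident = is_unknown_by_entropy(confident, confident, sigma=0.5)
flag_uniform = is_unknown_by_entropy(uniform, uniform, sigma=0.5)
```

A confident prediction has low entropy and is kept as a known class candidate, while a near-uniform prediction exceeds the threshold and is flagged as unknown.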

［反復回数が一定より大きい場合］ [When the number of iterations is greater than a certain value]
ステップＳ１０５において教師なしデータ集合を既知クラスデータ集合U_Iと未知クラスデータ集合U_Oとに分割する処理について説明する。教師なしデータ集合のデータuについて、反復回数tの時の未知クラスデータであるか否かについての識別結果は、後述のステップＳ１０７でy_u,tとして未知クラス情報記憶部１３０に格納されている。ステップＳ１０４では、過去T回の識別結果を未知クラス情報記憶部１３０から読み出し、過去T/2回以上未知クラスデータであると識別された教師なしデータ集合のデータuについては、未知クラスデータ集合U_Oに属するもの、それ以外のデータは既知クラスデータ集合U_Iに属するものとする。すなわち、反復回数eにおいて教師なしデータ集合をUとした時、Uは下記式９及び式１０にしたがい、既知クラスデータ集合U_Iと未知クラスデータ集合U_Oとに分割される。 The process in step S105 of dividing the unsupervised data set into a known class data set U_I and an unknown class data set U_O will be described. For data u of the unsupervised data set, the result of identifying whether it is unknown class data at iteration t is stored in the unknown class information storage unit 130 as y_u,t in step S107, described later. In step S104, the identification results of the past T iterations are read from the unknown class information storage unit 130. Data u that was identified as unknown class data in at least T/2 of those iterations is assigned to the unknown class data set U_O, and all other data is assigned to the known class data set U_I. That is, with U denoting the unsupervised data set at iteration e, U is divided into the known class data set U_I and the unknown class data set U_O according to Equations 9 and 10 below.
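The history-based split can be illustrated with a small sketch. Equations 9 and 10 are not reproduced in this text, so the code follows only the prose rule (unknown if flagged in at least T/2 of the last T iterations); the data layout, a list of per-iteration dictionaries, and the name `split_by_history` are assumptions.

```python
def split_by_history(history, window):
    """Split data indices into (known_like, unknown) sets by majority vote.

    history: list of dicts, one per past iteration, mapping a data index to
    True when that item was judged to be unknown class data.
    window: number of most recent iterations to consult (T in the text).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of iterations")
    recent = history[-window:]
    indices = set().union(*recent) if recent else set()
    unknown = {
        u for u in indices
        if sum(1 for flags in recent if flags.get(u, False)) >= window / 2
    }
    return indices - unknown, unknown

past = [
    {0: True, 1: False},
    {0: True, 1: False},
    {0: False, 1: True},
    {0: True, 1: False},
]
known_like, unknown = split_by_history(past, window=4)
```

Voting over several iterations smooths out single-iteration misjudgements, which matches the stated aim of reducing errors in unknown class identification.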

次にステップＳ１０５に係る評価処理について説明する。識別損失評価部１０４及び識別不一致評価部１０６の処理については、反復回数が一定以下の場合の処理であるステップＳ１０３とほぼ同様の処理を行う。 Next, the evaluation process related to step S105 will be explained. The processing of the identification loss evaluation section 104 and the identification mismatch evaluation section 106 is substantially the same as step S103, which is the processing when the number of repetitions is less than a certain value.

［識別損失評価部の処理］ [Processing of the classification loss evaluation unit]
識別損失評価部１０４は、教師ありデータとその教師帰属確率の集合（S，T）と未知クラスデータ集合U_Oの和集合について総和を取ることにより、識別損失評価値を求める。すなわち、識別損失評価部１０４の評価値は下記式１１の形で表される。 The classification loss evaluation unit 104 obtains the classification loss evaluation value by taking the sum over the union of the set (S, T) of supervised data and their teacher attribution probabilities and the unknown class data set U_O. That is, the evaluation value of the classification loss evaluation unit 104 is expressed in the form of Equation 11 below.
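One way to picture the union used here is to attach a teacher probability vector of length K + 1 to every item. The sketch below assumes one-hot teacher vectors and assumes that items in U_O receive the unknown class (index K) as their target; Equation 11 is not reproduced in this text, so this pairing is an interpretation rather than the published formula.

```python
def one_hot(index, size):
    """Return a one-hot probability vector of the given size."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def build_loss_targets(supervised, unknown_items, num_known):
    """Pair every training item with a teacher vector over K + 1 classes.

    supervised: list of (item, known class index in 0..K-1).
    unknown_items: items currently assigned to the unknown class set U_O.
    """
    size = num_known + 1
    targets = [(item, one_hot(label, size)) for item, label in supervised]
    # Assumption: detected unknown items are taught as class index K.
    targets += [(item, one_hot(num_known, size)) for item in unknown_items]
    return targets

targets = build_loss_targets([("s0", 1)], ["u7"], num_known=2)
```

The resulting pairs can then be fed to whatever classification loss is in use, so that the classifiers learn the unknown class alongside the K known classes.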

［識別不一致評価部の処理］ [Processing of the identification discrepancy evaluation unit]
識別不一致評価部１０６の処理については、既知クラスデータ集合U_Iのデータに対して、ステップＳ１０３における識別不一致評価部１０６の式６の評価処理と同様の処理を行うことで、識別不一致度評価値を求める。すなわち、ステップＳ１０５に係る識別不一致評価部１０６の出力する識別不一致度評価値は下記式１２により求められる。 The identification discrepancy evaluation unit 106 obtains the identification discrepancy evaluation value by applying to the data of the known class data set U_I the same processing as its evaluation by Equation 6 in step S103. That is, the identification discrepancy evaluation value output by the identification discrepancy evaluation unit 106 in step S105 is obtained by Equation 12 below.

［未知クラス識別器の処理］ [Processing of the unknown class classifier]
データxに対する第一識別器１０２及び第二識別器１０３が出力する推定帰属確率y₁、y₂は上述の式１を用いて表すことができる。第一識別器１０２及び第二識別器１０３が出力する推定帰属確率y₁、y₂から平均推定帰属確率yを求めることができる。教師なしデータ集合の教師なしデータuが未知クラスデータであるかどうかの判別は、平均推定帰属確率yについて、各識別クラスに対する帰属確率のうち、未知クラスであるK+１クラスに対する帰属確率がもっとも高いデータであれば、未知クラスデータであるとし、そうでない場合は未知クラスデータではないとして判断を行う。すなわち反復回数e回目における教師なしデータuが未知クラスデータであるかどうかの識別y_u,eは下記式１３のように表される。 The estimated attribution probabilities y₁ and y₂ that the first classifier 102 and the second classifier 103 output for data x can be expressed using Equation 1 above. The average estimated attribution probability y can be obtained from the estimated attribution probabilities y₁ and y₂ output by the first classifier 102 and the second classifier 103. Unsupervised data u of the unsupervised data set is judged to be unknown class data if, in the average estimated attribution probability y, the attribution probability for the (K+1)-th class, which is the unknown class, is the highest among the attribution probabilities for all classes; otherwise it is judged not to be unknown class data. That is, the identification y_u,e of whether the unsupervised data u is unknown class data at the e-th iteration is expressed as in Equation 13 below.
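The argmax rule can be sketched in a few lines. The sketch assumes the unknown class is stored as the last entry of each probability vector and that ties are resolved in favour of the earlier (known) class; neither detail is stated in the source.

```python
def is_unknown_by_argmax(y1, y2):
    """True when the unknown class has the highest averaged probability."""
    mean = [(a + b) / 2.0 for a, b in zip(y1, y2)]
    unknown_index = len(mean) - 1  # assumption: class K + 1 is the last entry
    # max() returns the first maximum, so a tie goes to the earlier class.
    return max(range(len(mean)), key=mean.__getitem__) == unknown_index

flag_a = is_unknown_by_argmax([0.2, 0.1, 0.7], [0.3, 0.2, 0.5])
flag_b = is_unknown_by_argmax([0.6, 0.1, 0.3], [0.5, 0.2, 0.3])
```

Unlike the entropy threshold used in the early iterations, this rule needs no tuning parameter once the classifiers have learned an explicit unknown class output.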

［学習処理］ [Learning process]
ステップＳ１０６にかかる学習部１０７の学習処理について説明する。特徴抽出器１０１については識別損失評価値L_sと識別不一致度評価値L_advの値が小さくなるように学習処理を行う。第一識別器１０２及び第二識別器１０３については、識別損失評価値L_sは小さく、識別不一致度評価値は大きくなるように学習処理を行う。具体的には式１４、式１５及び式１６に示す問題を順次最適化するように行う。 The learning process of the learning unit 107 in step S106 will be described. The feature extractor 101 is trained so that the classification loss evaluation value L_s and the identification discrepancy evaluation value L_adv become small. The first classifier 102 and the second classifier 103 are trained so that the classification loss evaluation value L_s becomes small and the identification discrepancy evaluation value becomes large. Specifically, the problems shown in Equations 14, 15, and 16 are optimized in sequence.
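Equations 14 to 16 are not reproduced in this text. A hedged sketch of three sub-problems consistent with the stated directions (L_s small for all parameters, L_adv large for the classifiers, L_adv small for the feature extractor) is given below. The exact grouping and ordering of the three problems is an assumption modelled on common two-classifier discrepancy training, not a quotation of the published equations.

```latex
% Assumed form of Equation 14: fit the supervised targets
\min_{\phi,\, \theta_1,\, \theta_2} \; L_s

% Assumed form of Equation 15: classifiers keep L_s small while enlarging L_adv
\min_{\theta_1,\, \theta_2} \; L_s - L_{adv}

% Assumed form of Equation 16: feature extractor shrinks L_adv
\min_{\phi} \; L_{adv}
```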

ここで、識別損失評価値L_sと識別不一致度評価値L_advがパラメータθ₁、θ₂、φについて微分可能であるように特徴抽出器１０１、第一識別器１０２及び第二識別器１０３の関数を選んだため、誤差勾配降下法により学習することが可能である。 Here, because the functions of the feature extractor 101, the first classifier 102, and the second classifier 103 were chosen so that the classification loss evaluation value L_s and the identification discrepancy evaluation value L_adv are differentiable with respect to the parameters θ₁, θ₂, and φ, learning can be performed by gradient descent.

上記の学習により期待される効果を説明する。まずL_sについてはパラメータθ₁、θ₂、φについて最小化させることは、一般の識別学習と同様に、教師ありデータに基づいて認識精度を改善させる効果を生む。 The effects expected from the above learning will be explained. First, minimizing L_s with respect to the parameters θ₁, θ₂, and φ improves recognition accuracy based on the supervised data, as in ordinary classifier training.

L_advについては、パラメータθ₁、θ₂は値が大きくなるように、パラメータφについては最小化するように学習を行う。この学習の効果に関する詳細については非特許文献４に記載されているとおりである。特徴抽出器１０１が出力する特徴の空間における教師ありデータの分布と教師なしデータの分布とが近づくことになる。特徴空間における分布が近づくことによって、教師ありデータで学習した識別器によって、教師なしデータを認識した場合に高精度に認識することが可能になる。 For L_adv, learning is performed so that L_adv becomes large with respect to the parameters θ₁ and θ₂ and is minimized with respect to the parameter φ. Details of the effect of this learning are as described in Non-Patent Document 4. As a result, the distribution of the supervised data and the distribution of the unsupervised data approach each other in the space of features output by the feature extractor 101. As the distributions in the feature space approach each other, a classifier trained on the supervised data can recognize the unsupervised data with high accuracy.

しかし、単に非特許文献４と同様の学習により、教師ありデータの分布と教師なしデータの分布を特徴空間上において近づけると、教師なしデータのうち未知クラスデータも近づいてしまうことになる。この時、教師なし未知クラスデータは教師ありデータに近づいてしまい、本来不適当な既知クラスデータのクラスのいずれかに識別されることになる。本実施形態では、未知クラスデータの検出を行い、ステップＳ１０５において、未知クラスデータと検出されたデータに対しては、L_advの評価には用いないようにしている。これによって、上述した不適切な教師ありデータ分布と未知クラスデータ分布を近づけることを防ぎ、未知クラスデータは未知クラスデータであると検知するように学習することが可能になっている。 However, if the distribution of the supervised data and the distribution of the unsupervised data are simply brought closer together in the feature space by learning in the same way as in Non-Patent Document 4, the unknown class data contained in the unsupervised data is also pulled closer. The unlabeled unknown class data then approaches the supervised data and is classified into one of the known classes, which is inappropriate. In this embodiment, unknown class data is detected, and in step S105 the data detected as unknown class data is excluded from the evaluation of L_adv. This prevents the distribution of the supervised data and the distribution of the unknown class data from being inappropriately brought together, and makes it possible to learn to detect unknown class data as such.

［パラメータ格納処理］ [Parameter storage processing]
パラメータ学習後、ステップＳ１０７に係る処理にて、パラメータθ_１、θ_２、φを学習結果記憶部１４０に格納する。 After parameter learning, the parameters θ₁, θ₂, and φ are stored in the learning result storage unit 140 in the processing of step S107.

ステップＳ１０７にかかる処理における、教師なしデータが未知クラスデータであるかについての識別結果の保存処理について説明する。反復回数eにおける教師なしデータ集合のデータuが未知クラスデータであるかの識別履歴は、反復回数eが一定以下の場合、ステップＳ１０３の処理によって、y_u,eが得られている。また、反復回数eにおける教師なしデータ集合のデータuが未知クラスデータであるかの識別履歴は、反復回数eが一定よりも大きい場合、ステップＳ１０５の処理によって、y_u,eが得られている。ステップＳ１０７では教師なしデータ集合のデータuそれぞれについて、識別結果y_u,eを未知クラス情報記憶部１３０に格納する。 The process in step S107 of saving the identification result as to whether unsupervised data is unknown class data will be described. When the iteration count e is at or below the fixed number, the identification y_u,e of whether data u of the unsupervised data set is unknown class data at iteration e has been obtained by the processing of step S103. When the iteration count e is greater than the fixed number, y_u,e has been obtained by the processing of step S105. In step S107, the identification result y_u,e is stored in the unknown class information storage unit 130 for each data u of the unsupervised data set.

以上のステップＳ１０１からＳ１０８までの学習処理を、終了条件が満たされるまで繰り返せば良い。 The learning process from steps S101 to S108 described above may be repeated until the termination condition is satisfied.

終了条件については、任意の情報が用いられて良い。例えば、「所定の回数を繰り返すまで」、「目的関数の値が一定以上変化しなくなるまで」、「学習データとは別に用意した評価用データに対する精度が一定以上変化しなくなるまで」などとすればよい。 Any information may be used for the termination condition. For example, it may be "until a predetermined number of iterations has been performed", "until the value of the objective function no longer changes by more than a certain amount", or "until the accuracy on evaluation data prepared separately from the training data no longer changes by more than a certain amount".
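Two of the example termination conditions can be combined in a small check like the one below. This is only a sketch: the function name `should_stop`, the use of the last two objective values, and the `tolerance` parameter are assumptions made for illustration.

```python
def should_stop(iteration, objective_history, max_iterations, tolerance):
    """Stop after max_iterations, or when the objective has stopped changing."""
    if iteration >= max_iterations:
        return True
    if len(objective_history) >= 2:
        # Compare the two most recent objective values.
        return abs(objective_history[-1] - objective_history[-2]) < tolerance
    return False

stop_at_limit = should_stop(10, [1.0, 0.5], max_iterations=10, tolerance=1e-3)
stop_still_moving = should_stop(3, [1.0, 0.5], max_iterations=10, tolerance=1e-3)
stop_converged = should_stop(3, [0.5, 0.5001], max_iterations=10, tolerance=1e-3)
```

The third example condition, accuracy on held-out evaluation data, could be checked the same way by passing an accuracy history instead of an objective history.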

（変形例） (Modified example)
教師ありデータ記憶部１１０及び教師なしデータ記憶部１２０のいずれか一方又は双方は、学習装置１００に備えられてもよい。未知クラス情報記憶部１３０及び学習結果記憶部１４０のいずれか一方又は双方は、学習装置１００の外部に設けられてもよい。外部に設けられた場合には、例えばＴＣＰ／ＩＰ等の通信を行うことでデータが取得されてもよい。 Either or both of the supervised data storage unit 110 and the unsupervised data storage unit 120 may be included in the learning device 100. Either or both of the unknown class information storage unit 130 and the learning result storage unit 140 may be provided outside the learning device 100. When they are provided externally, data may be acquired through communication such as TCP/IP.

学習装置１００は、１台の情報処理装置を用いて実装されてもよいし、複数台の情報処理装置に分散して実装されてもよい。 The learning device 100 may be implemented using one information processing device, or may be distributed and implemented among multiple information processing devices.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 Although the embodiments of the present invention have been described above in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and includes designs within the scope of the gist of the present invention.

本発明は、学習装置に適用可能である。 The present invention is applicable to learning devices.

１００…学習装置、１０１…特徴抽出器、１０２…第一識別器、１０３…第二識別器、１０４…識別損失評価部、１０５…未知クラス識別器、１０６…識別不一致評価部、１０７…学習部、２００…予測装置 100...Learning device, 101...Feature extractor, 102...First classifier, 103...Second classifier, 104...Discrimination loss evaluation section, 105...Unknown class classifier, 106...Identification discrepancy evaluation section, 107...Learning section , 200...prediction device

Claims

A learning device comprising:
a feature extractor that outputs feature quantities of input data;
a plurality of classifiers that obtain, for the data, probabilities of belonging to a known class and an unknown class based on the feature quantities;
an unknown class classifier that determines whether the data is of the unknown class based on the belonging probabilities obtained by the classifiers;
an identification inconsistency evaluation unit that outputs, for the data, an identification inconsistency degree value indicating the difference between the belonging probabilities obtained by the plurality of classifiers;
a learning unit that, using data that is not of the unknown class and to which no teacher label has been assigned, iteratively learns parameters of the feature extractor and the plurality of classifiers so that the identification inconsistency degree value decreases with respect to the feature extractor and increases with respect to the plurality of classifiers; and
a discriminant loss evaluation unit that outputs, for the data, a value of a discriminant loss function that takes a smaller value as the degree of similarity between the belonging probability and a given teacher belonging probability of the data is higher,
wherein the learning unit further iteratively learns the parameters of the feature extractor and the plurality of classifiers, using data to which a teacher label is assigned and data that is of the unknown class and to which no teacher label is assigned, so as to reduce the value of the discriminant loss function.

2. A learning device comprising:
a feature extractor that outputs a feature amount of input data;
a plurality of classifiers that each obtain, based on the feature amount, attribution probabilities of the data for a known class and an unknown class;
an unknown class classifier that determines whether the data belongs to the unknown class based on the attribution probabilities obtained by the classifiers;
an identification inconsistency evaluation unit that outputs, for the data, a value of an identification inconsistency degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers; and
a learning unit that, using data that is not of the unknown class and to which no teacher label is assigned, iteratively learns parameters of the feature extractor and the plurality of classifiers such that, for the feature extractor, the value of the identification inconsistency degree decreases and, for the plurality of classifiers, the value of the identification inconsistency degree increases,
wherein the unknown class classifier makes the determination based on past determination results when the number of iterations of the iterative learning in the learning unit is greater than a predetermined number.

3. The learning device according to claim 1, wherein the unknown class classifier makes the determination based on the attribution probability when the number of iterations of the iterative learning in the learning unit is less than or equal to a predetermined number.
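The switching behavior of the unknown class classifier recited above can be sketched as follows. The claims state only that the determination uses the attribution probability up to a predetermined number of iterations and past determination results afterwards. Everything else here is an assumption for illustration: averaging the unknown-class probability over the classifiers, the threshold of 0.5, the majority vote over the recorded history, and the names `is_unknown`, `switch_iteration`, and `history`.

```python
def is_unknown(probs_per_classifier, unknown_index, iteration,
               switch_iteration, history, threshold=0.5):
    """Decide whether one sample belongs to the unknown class.

    probs_per_classifier: one attribution-probability vector per classifier.
    history: list of earlier boolean decisions for this same sample.
    """
    if iteration > switch_iteration and history:
        # Late phase: rely on past determination results
        # (assumed here to be a majority vote over the recorded decisions).
        return sum(history) * 2 > len(history)

    # Early phase: rely on the attribution probability
    # (assumed here to be the mean unknown-class probability).
    mean_unknown = sum(p[unknown_index] for p in probs_per_classifier)
    mean_unknown /= len(probs_per_classifier)
    decision = mean_unknown > threshold
    history.append(decision)  # record for use after the switch
    return decision


# Usage: three early decisions are recorded, then the fourth call
# (iteration 3 > switch_iteration 2) follows the recorded majority.
history = []
looks_unknown = [[0.1, 0.9], [0.2, 0.8]]
looks_known = [[0.9, 0.1], [0.8, 0.2]]
early = [
    is_unknown(looks_unknown, 1, 1, 2, history),
    is_unknown(looks_known, 1, 2, 2, history),
    is_unknown(looks_unknown, 1, 2, 2, history),
]
late = is_unknown(looks_known, 1, 3, 2, history)
```

In this sketch the late decision ignores the current probabilities, which keeps the set of samples treated as unknown stable once the classifiers are being trained adversarially.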

4. A prediction device comprising:
a feature extractor that outputs a feature amount of input data based on parameters obtained by the learning device according to any one of claims 1 to 3; and
a classifier that obtains, based on the feature amount and the parameters obtained by the learning device according to any one of claims 1 to 3, attribution probabilities of the data for a known class and an unknown class.
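The prediction device of the preceding claim can be pictured as a forward pass through a feature extractor and one classifier that reuse learned parameters. The sketch below is not the claimed implementation: the single linear layer with ReLU for the extractor, the linear-plus-softmax classifier, the parameter dictionary keys, and the convention that the last output index is the unknown class are all assumptions chosen to keep the example small.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict(x, params):
    """Return (attribution probabilities, index of the most probable class)."""
    # Feature extractor: hypothetical single linear layer followed by ReLU.
    feature = [max(0.0, h)
               for h in linear(params["extractor_w"], params["extractor_b"], x)]
    # Classifier: hypothetical linear layer followed by softmax.
    probs = softmax(linear(params["classifier_w"], params["classifier_b"], feature))
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label


# Hypothetical learned parameters: two known classes plus one unknown class.
params = {
    "extractor_w": [[1.0, 0.0], [0.0, 1.0]],
    "extractor_b": [0.0, 0.0],
    "classifier_w": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "classifier_b": [0.0, 0.0, 0.0],
}
probs, label = predict([2.0, 0.0], params)
unknown_detected = label == len(probs) - 1
```

Only one classifier is needed at prediction time in this sketch; the plurality of classifiers matters during learning, where their disagreement drives the adversarial updates.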

5. A learning method comprising:
a feature extraction step of outputting a feature amount of input data using a feature extractor;
an identification step of obtaining, using a plurality of classifiers and based on the feature amount, attribution probabilities of the data for a known class and an unknown class;
an unknown class identification step of determining whether the data belongs to the unknown class based on the obtained attribution probabilities;
an identification inconsistency evaluation step of outputting, for the data, a value of an identification inconsistency degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers;
a learning step of iteratively learning, using data that is not of the unknown class and to which no teacher label is assigned, parameters of the feature extractor and the plurality of classifiers such that, for the feature extractor, the value of the identification inconsistency degree decreases and, for the plurality of classifiers, the value of the identification inconsistency degree increases; and
a discrimination loss evaluation step of outputting, for the data, a value of a discrimination loss function that becomes smaller as the similarity between the attribution probability and a given teacher attribution probability of the data becomes higher,
wherein, in the learning step, the parameters of the feature extractor and the plurality of classifiers are further iteratively learned, using data to which a teacher label is assigned and data that is of the unknown class and to which no teacher label is assigned, so as to decrease the value of the discrimination loss function.

6. A learning method comprising:
a feature extraction step of outputting a feature amount of input data using a feature extractor;
an identification step of obtaining, using a plurality of classifiers and based on the feature amount, attribution probabilities of the data for a known class and an unknown class;
an unknown class identification step of determining whether the data belongs to the unknown class based on the obtained attribution probabilities;
an identification inconsistency evaluation step of outputting, for the data, a value of an identification inconsistency degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers; and
a learning step of iteratively learning, using data that is not of the unknown class and to which no teacher label is assigned, parameters of the feature extractor and the plurality of classifiers such that, for the feature extractor, the value of the identification inconsistency degree decreases and, for the plurality of classifiers, the value of the identification inconsistency degree increases,
wherein, in the unknown class identification step, the determination is made based on past determination results when the number of iterations of the iterative learning in the learning step is greater than a predetermined number.

7. A program for causing a computer to function as the learning device according to claim 1.