JP2010003106A

JP2010003106A - Classification model generation device, classification device, classification model generation method, classification method, classification model generation program, classification program and recording medium

Info

Publication number: JP2010003106A
Application number: JP2008161237A
Authority: JP
Inventors: Tomoharu Iwata; 具治岩田; Toshiyuki Tanaka; 利幸田中
Original assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Current assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Priority date: 2008-06-20
Filing date: 2008-06-20
Publication date: 2010-01-07
Anticipated expiration: 2028-06-20
Also published as: JP5164209B2

Abstract

PROBLEM TO BE SOLVED: To generate a high-accuracy classification model for a target classification system by efficiently using not only data of the target classification system but also data of an auxiliary classification system.

SOLUTION: The classification device 1 uses already-classified data in the auxiliary classification system in addition to already-classified data in the target classification system. It estimates weights so as to minimize the expected error, defined as the sum of the products of an error function and the weights, and generates the classification model from the estimated weights and the two kinds of already-classified data. Data of the auxiliary classification system is thereby used effectively alongside data of the target classification system, and a high-accuracy classification model for the target classification system can be generated.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for learning a classification model using not only data of the classification system in which classification target data is to be classified (hereinafter, "target classification system") but also data of another classification system (hereinafter, "auxiliary classification system"), and for classifying classification target data in the target classification system using the learned classification model.

When only a small amount of training data is available, the performance of a classification model is generally low. It would therefore be desirable to improve the performance of the classification model by using data to which class labels of an auxiliary classification system (hereinafter, "class labels" or simply "labels") have been assigned. Suppose, for example, that a Web page is to be classified into a class of some target classification system (hereinafter also "target class"). Many Web pages have already been classified, by large numbers of users of directory-type search engines and social bookmarking sites, into auxiliary classification systems that differ from the target classification system, and it would be desirable to make use of such information.

In addition, a technique is known for prediction (classification) that takes purchase order into account for products in, for example, online shopping (see Non-Patent Document 1).
Tomoharu Iwata, Takeshi Yamada, Naonori Ueda, "Collaborative Filtering Considering Purchase Order" (in Japanese), Technical Committee on Artificial Intelligence and Knowledge-Based Processing, AI2007-3, pp. 13-18, 2007

However, class labels generally differ between the auxiliary classification system and the target classification system, and even identical labels may carry different meanings. Consequently, data of the classes of the auxiliary classification system (hereinafter also "auxiliary classes") cannot be used with conventional supervised learning techniques (such as that of Non-Patent Document 1).

The present invention has been made in view of this problem. Its object is to generate a highly accurate classification model for the target classification system by making effective use of data of the auxiliary classification system. A further object is to classify classification target data in the target classification system with high accuracy using the generated classification model.

To solve the above problem, the present invention is a classification model generation device that generates a classification model for classifying classification target data into one of a plurality of classes of a target classification system, which is the classification system in which the classification target data is to be classified, by learning from one or more pieces of already-classified data in the target classification system and one or more pieces of already-classified data in an auxiliary classification system different from the target classification system. The device comprises: storage means for storing information; a weight estimation unit that estimates weights and stores them in the storage means; and a model construction unit that generates the classification model using the weights stored in the storage means and the two kinds of already-classified data. The weight estimation unit uses (i) an error function of the classification model for the case where each individual piece of the two kinds of already-classified data is predicted to belong to one of the classes of the target classification system, and (ii) weights indicating the degree of influence of each individual piece of already-classified data on the classification model under that prediction. It estimates the weights so as to minimize the expected error, which is the sum, over the individual pieces of already-classified data, of the product of the error function value and the weight.

According to this invention, already-classified data in the auxiliary classification system is used in addition to already-classified data in the target classification system; the weights are estimated so as to minimize the expected error, which is the sum of the products of the error function and the weights; and the classification model is generated from the estimated weights and the two kinds of already-classified data. Data of the auxiliary classification system is thereby used effectively, and a highly accurate classification model for the target classification system can be generated.

In the present invention, the weight estimation unit further comprises a posterior probability estimation unit and a mixture ratio estimation unit. The posterior probability estimation unit estimates posterior probabilities for the two kinds of already-classified data using mixture ratios and stores the posterior probabilities in the storage means. The mixture ratios indicate, for each class of the target and auxiliary classification systems, the proportion of its influence on the classification model, and serve to make the probability distribution model obtained by integrating the two classification systems approximate the probability distribution model of the target classification system. The mixture ratio estimation unit uses the posterior probabilities stored in the storage means to estimate the mixture ratios so as to maximize the likelihood of the already-classified data of the target classification system, estimates the weights from the mixture ratios at which the likelihood is maximized, and stores the weights in the storage means.

According to this invention, because the weight estimation unit includes the posterior probability estimation unit and the mixture ratio estimation unit, the posterior probability estimation unit can, for example, perform the E (Expectation) step of the EM (Expectation-Maximization) algorithm while the mixture ratio estimation unit performs the M (Maximization) step. A globally optimal solution for the mixture ratios is thereby obtained, and the weights can be determined (estimated) from the obtained mixture ratios.

In the present invention, the model construction unit further comprises a model parameter estimation unit that uses the weights stored in the storage means and the two kinds of already-classified data to estimate the model parameters with which the classification model classifies the classification target data in the target classification system, and that stores the model parameters in the storage means.

According to this invention, the model parameter estimation unit can estimate the model parameters using, for example, Equation (10) described later.

In the present invention, a classification device comprises a classification unit that classifies the classification target data into one of the plurality of classes of the target classification system using the model parameters stored in the storage means of the classification model generation device.

According to this invention, the classification unit classifies the classification target data into one of the plurality of classes of the target classification system using the estimated model parameters. In other words, highly accurate classification is achieved by using a highly accurate classification model.

The present invention is also a program for causing a computer to function as each unit of the classification model generation device or the classification device. A computer on which this program is installed can realize each function based on the program.

The present invention is also a computer-readable recording medium on which the program is recorded. A computer in which this recording medium is loaded can realize each function based on the program recorded on it.

According to the present invention, a highly accurate classification model for the target classification system can be generated by also making effective use of data of the auxiliary classification system. Furthermore, classification target data can be classified in the target classification system with high accuracy using the generated classification model.

The best mode for carrying out the present invention (hereinafter, "embodiment") is described in detail below. FIG. 1 is a block diagram showing the configuration of the classification device according to this embodiment. As shown in FIG. 1, the classification device 1 includes computing means 2, input means 3, storage means 4, and output means 5, each of which is connected to a bus line 11. The classification device 1 combines the function of a classification model generation device, which generates a classification model (hereinafter also simply "model"), with the function of a classification device, which classifies classification target data using the generated model. It may, however, be implemented with only one of these functions.

The computing means 2 is a main control device composed of, for example, a CPU (Central Processing Unit) and a RAM (Random Access Memory). As shown in FIG. 1, the computing means 2 includes a weight estimation unit 21, a model construction unit 22, a classification unit 23, and a memory 24. The units 21 to 23 are described later. Because the main feature of this embodiment relative to conventional methods is the weight estimation unit 21, that unit is described in particular detail. Conventional methods can be applied to the model construction unit 22 and the classification unit 23 without major changes, so a detailed description of them is omitted.

The input means 3 is composed of, for example, a keyboard, a mouse, and a disk drive device. The input means 3 receives various data and stores them in the storage means 4 (details are given later).

The storage means 4 is composed of, for example, a general-purpose hard disk device and stores the programs and data used by the computing means 2. As programs, the storage means 4 stores a weight estimation program 41, a model construction program 42, and a classification program 43 in a program storage unit 40a. The computing means 2 reads the programs 41 to 43 from the storage means 4, loads them into the memory 24, and executes them, thereby realizing the functions of the weight estimation unit 21, the model construction unit 22, and the classification unit 23.

The storage means 4 also stores input data 44, a weight 45, a model parameter 46, and test data 47 in a data storage unit 40b. The input data 44 is data entered through the input means 3 and consists of training samples. The weight 45 is data on the weights estimated by the weight estimation unit 21 of the computing means 2 (details are given later). The model parameter 46 is data calculated by the model construction unit 22 of the computing means 2 (details are given later). The test data 47 consists of test samples (details are given later). The reference numerals of the input data 44, the weight 45, the model parameter 46, and the test data 47 are omitted below where appropriate.

The output means 5 is, for example, a graphics board (output interface) and a monitor connected to it. The monitor is composed of, for example, a liquid crystal display and displays the results of computation (such as the classification results for the classification target data).

In this embodiment, a classifier (classification model) is learned using not only data of the target classification system (already-classified data; hereinafter also "target data") but also data of the auxiliary classification system (already-classified data; hereinafter also "auxiliary data"). Let the set of target classes be $Z$, the set of auxiliary classes be $A$, and the set of all classes be $Y = \{Z, A\}$.

Given, as training data, the target data $D_z = \{x_n, y_n\}_{n=1}^{N_z}$ and the auxiliary data $D_a = \{x_n, y_n\}_{n=N_z+1}^{N}$, a classification model is learned that predicts the class $y \in Z$ of a sample $x$ whose class is unknown (the classification target data; the test data 47 described later). In this specification, the notation $\{\cdot\}_{n=1}^{N_z}$ means that $n$ runs from $1$ to $N_z$; the same applies to other indices.

In the case of Web page data, a sample is represented by, for example, a word frequency vector $x_n = (x_{n1}, \ldots, x_{nw})$, where $x_{nw}$ is the number of times word $w$ appears in the $n$-th sample.

Furthermore, $y_n \in Z$ if $1 \le n \le N_z$ and $y_n \in A$ if $N_z + 1 \le n \le N$; for Web pages, $y_n$ is the category into which the $n$-th sample has been classified. Note that $y$ takes discrete values. Although $x$ is treated here as a discrete variable, the method extends readily to continuous variables.

In this embodiment, the model $M$ is learned by minimizing the weighted empirical error $E(M)$ (Equation (1)) over all data, that is, the target data together with the auxiliary data (data of the auxiliary classification system).

Here, $w(z \mid y)$ is a weight representing how informative a sample of class $y \in Y$ is for learning the model of target class $z \in Z$. In Equation (1), bold characters (here $x_n$ and $Z$) denote quantities with multiple components; the same applies to the other equations below. Characters in the running text are not set in bold, but they are to be read consistently with the equations.

$J(x_n, z; M)$ is the error function of the model $M$ when the class of sample $x$ is predicted to be $z$. Examples of the error function include the negative log-likelihood, $J(x, z; M) = -\log P(z \mid x; M)$, and the 0-1 loss, $J(x, z; M) = 0$ if $f(x) = y$ and $J(x, z; M) = 1$ otherwise. In this specification, logarithms are natural logarithms; that is, the base of $\log$ is $e$.
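As a minimal sketch of the weighted empirical error described above, the snippet below evaluates $E(M) = \sum_n \sum_{z \in Z} w(z \mid y_n)\, J(x_n, z; M)$ with the negative log-likelihood as $J$. The displayed form of Equation (1) is not reproduced in this text, so this double sum is an assumption inferred from the surrounding definitions; the function name and the array layout are hypothetical.

```python
import numpy as np

def weighted_empirical_error(log_probs, labels, weights):
    """Weighted empirical error with J(x, z; M) = -log P(z | x; M).

    log_probs: (N, |Z|) array of log P(z | x_n; M) for every sample and target class.
    labels:    (N,) integer array; labels[n] indexes y_n among all classes Y.
    weights:   (|Y|, |Z|) array; weights[y, z] = w(z | y).
    """
    per_sample_w = weights[labels]                   # (N, |Z|): w(z | y_n)
    return float(np.sum(per_sample_w * -log_probs))  # sum over n and z
```

A target sample whose weight row is one-hot contributes only the usual negative log-likelihood of its own class, while an auxiliary sample spreads its contribution over the target classes in proportion to its weights.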

The weights are determined as follows (the units that perform each operation are described later; the same applies below). First, a model distribution $\tilde{P}(x \mid y)$ that approximates the empirical distribution of class $y$ is estimated (Equation (2)). In this specification, the tilde denoting an empirical distribution is placed above the immediately preceding character; the same applies to the hat "^" used later. Here, $\delta(x, x_n)$ is the Kronecker delta and $N(y)$ is the number of samples whose class is $y$.
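The following sketch illustrates one reading of Equation (2), assuming it is the normalized count $\tilde{P}(x \mid y) = \frac{1}{N(y)} \sum_{n: y_n = y} \delta(x, x_n)$; the equation itself is not reproduced in this text. The function name is hypothetical, and samples are assumed to be hashable (for example, tuples of word counts).

```python
from collections import Counter

def empirical_distribution(samples, labels, y):
    """Return P~(x | y) as a dict: the fraction of class-y samples equal to x."""
    in_class = [x for x, label in zip(samples, labels) if label == y]
    n_y = len(in_class)                      # N(y), the number of samples in class y
    if n_y == 0:
        raise ValueError(f"no samples with class {y!r}")
    # Counting identical samples plays the role of the Kronecker delta.
    return {x: count / n_y for x, count in Counter(in_class).items()}
```

Any sample value that never occurs in class $y$ is simply absent from the returned dictionary, which corresponds to probability zero.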

Next, the mixture ratio $P_z(y)$ is estimated so that the mixture of the model distributions of all classes approximates the true distribution $P(x \mid z)$ of target class $z \in Z$ (Equation (3)). The mixture ratio indicates, for each class of the target and auxiliary classification systems, the proportion of its influence on the classification model; it is used to make the probability distribution model obtained by integrating the two classification systems approximate the probability distribution model of the target classification system.

The mixture ratios $P_z(y)$ and their set $P$ satisfy $P = \{\{P_z(y)\}_{y \in Y}\}_{z \in Z}$ with $0 \le P_z(y) \le 1$ and $\sum_{y \in Y} P_z(y) = 1$.

The weight $w(z \mid y)$ is then set (calculated) (Equation (4)). Here, $P(z)$ is the probability that class $z$ is chosen for a sample.
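The weight formula stated later in this description, $w(z \mid y) = P(z)\, P_z(y) / N(y)$, can be sketched as below. The function name and the array layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def weights_from_mixture(mix, class_prior, class_counts):
    """Compute w(z | y) = P(z) * P_z(y) / N(y).

    mix:          (|Z|, |Y|) array; mix[z, y] = P_z(y).
    class_prior:  (|Z|,) array; class_prior[z] = P(z).
    class_counts: (|Y|,) array; class_counts[y] = N(y).
    Returns a (|Y|, |Z|) array w with w[y, z] = w(z | y).
    """
    mix = np.asarray(mix, dtype=float)
    class_prior = np.asarray(class_prior, dtype=float)
    class_counts = np.asarray(class_counts, dtype=float)
    w_zy = class_prior[:, None] * mix / class_counts[None, :]
    return w_zy.T
```

Dividing by $N(y)$ means that a class with many samples does not dominate merely because of its size; only its mixture ratio determines its total influence.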

With this choice, the weighted error $E(M)$ approximates the expected error (Equation (5)).

In Equation (5), the step from the first to the second line of the right-hand side rewrites the sum over $n$ as sums over $x$ and $y$. The step from the second to the third line uses Equation (4). The step from the third to the fourth line uses Equation (2). The step from the fourth to the fifth line uses Equation (3). The step from the fifth to the sixth line uses the definition of conditional probability, where $P(x, z)$ is the probability that $x$ and $z$ occur together.

The step from the sixth to the seventh line of the right-hand side uses the definition of expectation, where $\varepsilon_z[J(x, z; M)]$ denotes the expected value of the error for target class $z$. It can therefore be expected that a robust (highly accurate) model is obtained by minimizing the weighted error $E(M)$, which also makes use of the auxiliary data.
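Because the displayed form of Equation (5) is not reproduced in this text, the chain below is a hedged reconstruction of the seven lines that the two preceding paragraphs walk through. It assumes that Equation (1) is the double sum over samples and target classes, that Equation (2) is the normalized count $\tilde{P}(x \mid y) = \frac{1}{N(y)} \sum_{n: y_n = y} \delta(x, x_n)$, and that Equation (3) is the mixture approximation $P(x \mid z) \approx \sum_{y \in Y} P_z(y)\, \tilde{P}(x \mid y)$. These forms are inferred from the surrounding definitions rather than copied from the original, and the last line simply reuses the notation $\varepsilon_z[\cdot]$ of the text.

```latex
\begin{align*}
E(M) &= \sum_{n=1}^{N} \sum_{z \in Z} w(z \mid y_n)\, J(x_n, z; M) \\
&= \sum_{x} \sum_{y \in Y} \sum_{z \in Z} \sum_{n:\, y_n = y} \delta(x, x_n)\, w(z \mid y)\, J(x, z; M) \\
&= \sum_{x} \sum_{y \in Y} \sum_{z \in Z} \sum_{n:\, y_n = y} \delta(x, x_n)\, \frac{P(z)\, P_z(y)}{N(y)}\, J(x, z; M) \\
&= \sum_{x} \sum_{y \in Y} \sum_{z \in Z} P(z)\, P_z(y)\, \tilde{P}(x \mid y)\, J(x, z; M) \\
&\approx \sum_{x} \sum_{z \in Z} P(z)\, P(x \mid z)\, J(x, z; M) \\
&= \sum_{x} \sum_{z \in Z} P(x, z)\, J(x, z; M) \\
&= \varepsilon_z\!\left[ J(x, z; M) \right]
\end{align*}
```

The only approximate step is the fifth line, so the quality of the weighted error as a proxy for the expected error depends on how well the estimated mixture reproduces $P(x \mid z)$.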

The set $P$ that satisfies the approximation of Equation (3) is estimated by maximizing the log-likelihood $L(P)$ of the target data with the EM (Expectation-Maximization) algorithm (Equation (6)). The EM algorithm performs maximum likelihood estimation of parameters (here, the set $P$) by repeating two procedures, the E (Expectation) step and the M (Maximization) step, until a convergence condition is satisfied.

Here, $\tilde{P}_{-n}(x \mid y)$ is the model distribution estimated from the data with the $n$-th sample removed. If the mixture ratios were estimated with the same samples that were used to estimate the model distributions, overfitting would occur and the trivial solution $P_z(z) = 1$, $P_z(y \ne z) = 0$ would be obtained; the leave-one-out (LOO) method is therefore used, as in Equation (6). When $\tilde{P}_{-n}(x \mid y)$ is estimated from the data of class $y$ and held fixed, $L(P)$ is concave in $P$, so global optimality of the solution is guaranteed. Let $P^{(\tau)}$ be the estimate at step $\tau$ of the EM algorithm, where $\tau$ is the number of times the E step and the M step have been repeated ($\tau = 0, 1, 2, \ldots$); $\tau = 0$ denotes the predetermined initial value of the estimate. The conditional expectation of the complete-data log-likelihood to be maximized, $Q(P \mid P^{(\tau)})$, can then be expressed as in Equation (7).
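To make the leave-one-out idea concrete, the sketch below computes $\tilde{P}_{-n}(x_n \mid y)$ for every sample and class using raw empirical frequencies. This is an illustrative assumption: the description only requires a model distribution that approximates the empirical distribution, and a smoothed model may be preferable in practice. The function name is hypothetical, and samples are assumed to be hashable.

```python
from collections import Counter

def loo_likelihoods(samples, labels, classes):
    """lik[n][y] = P~_{-n}(x_n | y), estimated without the n-th sample itself."""
    counts = {y: Counter() for y in classes}   # occurrences of each x within class y
    totals = Counter()                         # N(y)
    for x, y in zip(samples, labels):
        counts[y][x] += 1
        totals[y] += 1
    lik = []
    for x, y_n in zip(samples, labels):
        row = {}
        for y in classes:
            c, n = counts[y][x], totals[y]
            if y == y_n:                       # remove the sample from its own class
                c, n = c - 1, n - 1
            row[y] = c / n if n > 0 else 0.0
        lik.append(row)
    return lik
```

Without the removal step, each sample would always be explained perfectly by its own class, which is exactly the trivial solution the text warns about.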

The computation in the E step can be expressed as in Equation (8). The symbol $y'$ in the denominator on the right-hand side of Equation (8) has the same meaning as $y$; a different symbol is used only to distinguish it from the $y$ that appears elsewhere in Equation (8).

The computation in the M step can be expressed as in Equation (9).

An estimate of the set $P$ is obtained by repeating the E-step and M-step computations until the convergence condition is satisfied.
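The E/M iteration can be sketched as follows for a single target class $z$. Equations (8) and (9) are not reproduced in this text, so the sketch assumes the standard EM updates for mixture weights with fixed components: the E step computes the posterior of each class $y$ for each target sample, and the M step averages those posteriors. The function name, the iteration limit, and the tolerance are hypothetical, and every row of `lik` is assumed to contain at least one positive entry.

```python
import numpy as np

def estimate_mixture_ratio(lik, n_iter=200, tol=1e-10, seed=0):
    """Estimate P_z(y) for one target class z by EM.

    lik: (N(z), |Y|) array; lik[n, y] is the leave-one-out likelihood of the
         n-th class-z target sample under the model distribution of class y.
    """
    lik = np.asarray(lik, dtype=float)
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(lik.shape[1]))        # random initial mixture ratio
    prev = -np.inf
    for _ in range(n_iter):
        joint = p * lik                             # P_z(y) * P~_{-n}(x_n | y)
        evidence = joint.sum(axis=1, keepdims=True)
        resp = joint / evidence                     # E step: posterior of y per sample
        loglik = float(np.log(evidence).sum())      # log-likelihood under the current p
        p = resp.mean(axis=0)                       # M step: updated mixture ratio
        if loglik - prev < tol:                     # stop once the gain is negligible
            break
        prev = loglik
    return p
```

Because the components are held fixed, the log-likelihood is concave in the mixture ratio, so the random starting point affects only the number of iterations and not the value of the optimum.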

The set $P$ can also be estimated by maximizing Equation (6) with another optimization method, such as a quasi-Newton method, instead of the EM algorithm.

<Weight estimation>
The configuration of the weight estimation unit 21 is described with reference to FIG. 2, which includes a block diagram of the weight estimation unit according to this embodiment. As shown in FIG. 2, the weight estimation unit 21 includes an input data reading unit 211, a posterior probability estimation unit 212, a mixture ratio estimation unit 213, and a weight writing unit 214.

First, the input data reading unit 211 reads the input data 44. The posterior probability estimation unit 212 then estimates, using Equation (8), the posterior probabilities for all times of all training samples, and the mixture ratio estimation unit 213 estimates the mixture ratios using Equation (9). The posterior probability estimation and the mixture ratio estimation are repeated alternately until Equation (6) converges. The weight writing unit 214 then sets (calculates) the weights as $w(z \mid y) = P(z)\, P_z(y) / N(y)$ and stores them as the weight 45. The stored weight 45 is used by the model construction unit 22.

<Model construction>
The configuration of the model construction unit 22 is described with reference to FIG. 3, which includes a block diagram of the model construction unit according to this embodiment. As shown in FIG. 3, the model construction unit 22 includes an input data reading unit 221, a weight reading unit 222, a model parameter estimation unit 223, and a model parameter writing unit 224.

First, the input data reading unit 221 reads the input data 44, and the weight reading unit 222 reads the weight 45. The model parameter estimation unit 223 then estimates the model parameter $\hat{M}$ using Equation (10). The hat "^" attached to $M$ on the left-hand side of Equation (10) indicates that this $M$ is the one that minimizes the argument of the argmin function.
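As an illustration of estimating $\hat{M} = \arg\min_M E(M)$, the sketch below fits a softmax (multinomial logistic) model by gradient descent on the weighted negative log-likelihood. Both the assumed form of Equation (10) and the choice of a softmax model are assumptions made for this example; the description does not fix a particular model family. The function names, learning rate, and iteration count are hypothetical.

```python
import numpy as np

def fit_weighted_softmax(X, sample_w, n_iter=500, lr=0.5):
    """Minimize sum_n sum_z sample_w[n, z] * (-log P(z | x_n; M)).

    X:        (N, D) feature matrix.
    sample_w: (N, |Z|) array; sample_w[n, z] = w(z | y_n).
    Returns a (D, |Z|) parameter matrix standing in for the model M.
    """
    X = np.asarray(X, dtype=float)
    sample_w = np.asarray(sample_w, dtype=float)
    theta = np.zeros((X.shape[1], sample_w.shape[1]))
    row_w = sample_w.sum(axis=1, keepdims=True)      # total weight of each sample
    for _ in range(n_iter):
        logits = X @ theta
        logits -= logits.max(axis=1, keepdims=True)  # subtract max for stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)    # P(z | x_n; M)
        grad = X.T @ (row_w * probs - sample_w)      # gradient of the weighted loss
        theta -= lr * grad / row_w.sum()             # step scaled by total weight
    return theta

def predict(X, theta):
    """Assign each row of X to the target class with the highest score."""
    return np.argmax(np.asarray(X, dtype=float) @ theta, axis=1)
```

With one-hot weight rows this reduces to ordinary softmax regression on the target data; auxiliary samples enter through weight rows that spread over several target classes.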

The model parameter writing unit 224 stores the model parameters estimated by the model parameter estimation unit 223 as the model parameter 46. The stored model parameter 46 is used by the classification unit 23.

The configuration of the classification unit 23 is described with reference to FIG. 4, which includes a block diagram of the classification unit according to this embodiment. As shown in FIG. 4, the classification unit 23 includes a test data reading unit 231, a model parameter reading unit 232, and a classification result output unit 233.

First, the test data reading unit 231 reads the unclassified test data 47, and the model parameter reading unit 232 reads the model parameter 46. The classification result output unit 233 then computes the classification result from the test data and the model parameters and outputs it.

The operation of the classification device 1 shown in FIG. 1 is described with reference to FIG. 5 (and FIG. 1 as appropriate). FIG. 5 is an explanatory diagram showing the flow of processing in the classification device according to this embodiment.

First, the weight estimation unit 21 of the classification device 1 estimates the weights from the input data 44 stored in advance in the storage means 4 (see FIG. 1) (step S10: weight estimation step). The estimated weights are stored in the storage means 4 as the weight 45. Next, the model construction unit 22 constructs a model from the input data 44 and the weight 45 stored in the storage means 4 (see FIG. 1) (step S20: model construction step). The constructed model is stored in the storage means 4 as the model parameter 46. Steps S10 and S20 constitute the model learning process.

Subsequently, the classification device 1 uses the classification unit 23 to classify the unclassified test data 47 (classification target data) stored in advance in the storage means 4 (see FIG. 1) based on the model parameters 46 (step S30: classification step). Step S30 is the process of classifying the classification target data.

Next, the weight estimation step of step S10 will be described with reference to FIG. 6 (and FIGS. 1 to 5 as appropriate). FIG. 6 is a flowchart showing the processing of the weight estimation step.

First, as shown in FIG. 6, the weight estimation unit 21 uses the input data reading unit 211 to read the input data 44 from the storage means 4 (see FIG. 1) (step S1). Next, the weight estimation unit 21 uses the posterior probability estimation unit 212 to estimate the model distribution (step S2). Specifically, it estimates a model distribution that satisfies Equation (2) above.

Thereafter, the weight estimation unit 21 uses the posterior probability estimation unit 212 to perform initialization (step S3). Specifically, the posterior probability estimation unit 212 sets the iteration count τ of the E step and M step of the EM algorithm to 0, and sets the distribution of the mixture ratio P_z(y) at random.

Next, the weight estimation unit 21 uses the posterior probability estimation unit 212 to execute the E step of the EM algorithm (step S4). Specifically, the posterior probability estimation unit 212 estimates the posterior probability using Equation (8) above. Subsequently, the weight estimation unit 21 uses the mixture ratio estimation unit 213 to execute the M step of the EM algorithm (step S5). Specifically, the mixture ratio estimation unit 213 estimates the mixture ratio using Equation (9) above. Next, the weight estimation unit 21 uses the mixture ratio estimation unit 213 to determine whether the convergence condition is satisfied (step S6). Specifically, the mixture ratio estimation unit 213 determines whether the likelihood L(P) of Equation (6) above has converged. Convergence can be judged using, for example, a threshold or a rate of change.

When the convergence condition is satisfied, that is, when the likelihood L(P) of Equation (6) above has converged (step S6: Yes), the mixture ratio estimation unit 213 calculates the weights w(z|y) (step S8). Specifically, the mixture ratio estimation unit 213 calculates the weights using the formula
w(z|y) = P(z) P_z(y) / N(y).
The weight estimation unit 21 then uses the weight writing unit 214 to write the weights to the storage means 4 (see FIG. 1) as the weights 45, and the processing ends.

On the other hand, when the convergence condition is not satisfied in step S6, that is, when the likelihood L(P) of Equation (6) above has not converged (step S6: No), the weight estimation unit 21 adds 1 to the iteration count τ of the E step and M step (τ = τ + 1) (step S7) and returns to step S4.
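The loop of steps S3 to S8 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: Equations (6), (8), and (9) are not reproduced in this part of the text, so the sketch assumes the standard EM updates for the mixture ratios of a mixture whose component densities are fixed. The function names, the `log_dens` input (log model densities of the target samples of one class z under every class y), and the reading of N(y) as the number of samples in class y are all assumptions made for illustration.

```python
import math
import random


def estimate_mixture_ratio(log_dens, tol=1e-9, max_iter=1000, seed=0):
    """Estimate the mixture ratio P_z(y) for one target class z.

    log_dens[n][y] is an assumed, precomputed log model density of the
    n-th target sample of class z under class y (target or auxiliary).
    """
    rng = random.Random(seed)
    n_classes = len(log_dens[0])
    # Step S3: tau = 0 and a random mixture ratio that sums to 1.
    raw = [rng.random() + 1e-3 for _ in range(n_classes)]
    total = sum(raw)
    ratio = [r / total for r in raw]
    prev_ll = -math.inf
    for tau in range(max_iter):
        # Step S4 (E step): posterior probability of each class per sample.
        post, ll = [], 0.0
        for row in log_dens:
            joint = [math.log(p) + ld if p > 0 else -math.inf
                     for p, ld in zip(ratio, row)]
            top = max(joint)
            norm = top + math.log(sum(math.exp(j - top) for j in joint))
            post.append([math.exp(j - norm) for j in joint])
            ll += norm
        # Step S5 (M step): the new ratio is the average posterior.
        ratio = [sum(p[y] for p in post) / len(post)
                 for y in range(n_classes)]
        # Step S6: stop once the log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll  # step S7 is the increment of tau by the loop
    return ratio


def weights_for_class(prior_z, ratio, n_per_class):
    """Step S8: w(z|y) = P(z) * P_z(y) / N(y) for every class y."""
    return [prior_z * p / n for p, n in zip(ratio, n_per_class)]
```

For example, `estimate_mixture_ratio([[-1.0, -40.0], [-1.2, -35.0]])` concentrates almost all of the mixture ratio on the first class, because the second class explains the samples far worse.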

According to the present embodiment, the classification device 1 uses not only the already-classified data in the target classification system but also the already-classified data in the auxiliary classification system, estimates the weights so as to minimize the expected error (see Equation (5)), which is the sum of the products of the error function and the weights, and generates the classification model using the estimated weights and the two types of already-classified data. In this way the data of the auxiliary classification system is also used effectively, and a highly accurate classification model for the target classification system can be generated.

Further, since the weight estimation unit 21 includes the posterior probability estimation unit 212 and the mixture ratio estimation unit 213, the posterior probability estimation unit 212 can, for example, perform the E step of the EM algorithm while the mixture ratio estimation unit 213 performs the M step. This yields the globally optimal solution for the mixture ratio, from which the weights can be determined (estimated).

Further, for example, the model parameter estimation unit 223 can estimate the model parameters using Equation (10).

Further, the classification unit 23 classifies the classification target data into one of the plurality of classes in the target classification system using the estimated model parameters. In other words, highly accurate classification is achieved by using a highly accurate classification model.

The classification device 1 can also be realized by causing a general-purpose computer to execute programs for the processes described above. These programs can be distributed via a communication line, or written to a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) and distributed.

This concludes the description of the present embodiment, but the aspects of the present invention are not limited to it. For example, the present invention can use any error function and any model. In addition, specific configurations such as the hardware and the flowcharts can be changed as appropriate without departing from the spirit of the present invention.

<<Example with artificial data>>
To evaluate the classification device 1 of the present embodiment, a two-class classification experiment was performed using artificial data. In this experiment, test data is classified into one of two classes based on a classification model generated from target data and auxiliary data.

The target data is generated from two 100-dimensional normal distributions with different means. The means of classes c_1 and c_2 are
μ_1 = (-1, 0, 0, ..., 0), μ_2 = (1, 0, 0, ..., 0),
respectively, and both covariance matrices are the identity matrix. The following three patterns are considered as auxiliary data. In every pattern, the means in the third and later dimensions are all 0, as in the target data, and all covariance matrices are the identity matrix. FIG. 7(a) shows the first and second dimensions of the generation model of the target data, and FIGS. 7(b) to 7(d) show those of each auxiliary data set. Although FIGS. 7(a) to 7(d) show no axes or scales, each represents a two-dimensional coordinate plane whose center is the origin. Each circle represents the one-standard-deviation contour.

The identical auxiliary data shown in FIG. 7(b) is generated from the same generation model as the target data. The means of classes c_3 and c_4 are
μ_3 = (-1, 0, 0, ..., 0), μ_4 = (1, 0, 0, ..., 0),
respectively.

The correlated auxiliary data shown in FIG. 7(c) is generated from a generation model whose between-class relationship is correlated with that of the target data. The means of classes c_3 and c_4 are
μ_3 = (-√0.5, √0.5, 0, ..., 0),
μ_4 = (√0.5, -√0.5, 0, ..., 0),
respectively.

The mixed auxiliary data shown in FIG. 7(d) is a combination (mixture) of the identical auxiliary data and auxiliary data whose between-class relationship is orthogonal to that of the target data. The means of classes c_3, c_4, c_5, and c_6 are
μ_3 = (-1, 0, 0, ..., 0), μ_4 = (1, 0, 0, ..., 0),
μ_5 = (0, 1, 0, ..., 0), μ_6 = (0, -1, 0, ..., 0),
respectively. Among the auxiliary data sets, only the mixed auxiliary data has four auxiliary classes; the others have two.

As target data, 2, 4, 8, 16, 32, 64, 128, or 256 samples per class were generated (input data 44); as auxiliary data, 256 samples per class (input data 44); and as test data, 100 samples per class (test data 47). Classification models were generated from these, and the accuracy on the test data was calculated both without auxiliary data (target data only) and with each type of auxiliary data. The results are shown in Table 1. In Table 1, the numbers in the right four columns are the average accuracy in percent, and the numbers in parentheses are the standard deviations. Using auxiliary data according to the classification method of the classification device 1 of the present embodiment improves the accuracy compared with not using auxiliary data.
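The artificial data sets described above can be generated along the following lines. This is a sketch using only the Python standard library; the class means, the dimensionality, and the identity covariance come from the text, while the function names, dictionary keys, and random seed are illustrative assumptions.

```python
import random

DIM = 100  # dimensionality stated in the text


def mean_vector(first, second=0.0, dim=DIM):
    """Class mean that is zero from the third dimension onward."""
    return [first, second] + [0.0] * (dim - 2)


R = 0.5 ** 0.5  # sqrt(0.5), used by the correlated auxiliary classes

TARGET_MEANS = {"c1": mean_vector(-1.0), "c2": mean_vector(1.0)}
AUXILIARY_MEANS = {
    "identical": {"c3": mean_vector(-1.0), "c4": mean_vector(1.0)},
    "correlated": {"c3": mean_vector(-R, R), "c4": mean_vector(R, -R)},
    "mixed": {"c3": mean_vector(-1.0), "c4": mean_vector(1.0),
              "c5": mean_vector(0.0, 1.0), "c6": mean_vector(0.0, -1.0)},
}


def make_dataset(means, n_per_class, seed=0):
    """Draw n_per_class samples per class from N(mean, I)."""
    rng = random.Random(seed)
    return {label: [[rng.gauss(m, 1.0) for m in mean]
                    for _ in range(n_per_class)]
            for label, mean in means.items()}


# Example: 8 target samples per class and 256 mixed auxiliary samples per class.
target = make_dataset(TARGET_MEANS, 8)
auxiliary = make_dataset(AUXILIARY_MEANS["mixed"], 256, seed=1)
```

Because the covariance is the identity matrix, each coordinate can be drawn independently with unit standard deviation, which is why no matrix library is needed here.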

<<Example with text data>>
To evaluate the classification device 1 of the present embodiment, classification experiments were performed using text data.

<Model distribution>
Any distribution, such as a normal distribution or a multinomial distribution, can be assumed as the model distribution P̃(x|y). Here, text data is assumed as the input data 44 and the test data 47, x is regarded as a word frequency vector, and the multinomial distribution P̃(x_n|y) is used as the model distribution (Equation (11)).

Here, V is the vocabulary size, θ_yj is the probability that the j-th word appears in class y, and x_nj is the frequency of the j-th word in the n-th sample.

The leave-one-out (LOO) maximum likelihood estimate θ̂_{-n,yj} of the multinomial parameter θ_yj, obtained with the n-th sample removed, is given by Equation (12).

Here, to avoid the zero-probability problem, smoothing is performed using a linear combination of the LOO maximum likelihood estimate and the uniform distribution (Equation (13)).
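The LOO estimate and the smoothing step can be illustrated as follows. Equations (12) and (13) are not reproduced in this text, so this sketch rests on two assumptions: that the LOO estimate is the ordinary relative word frequency over the remaining samples of class y, and that the hyperparameter α weights the uniform distribution (the text only says a linear combination is used, so the roles of α and 1 - α could be reversed). The function names are hypothetical.

```python
def loo_mle(counts, labels, n, y):
    """Relative word frequencies of class y with the n-th sample left out.

    counts[m][j] is the frequency of word j in sample m, and labels[m]
    is the class of sample m.
    """
    vocab = len(counts[0])
    totals = [0.0] * vocab
    for m, (x, label) in enumerate(zip(counts, labels)):
        if label == y and m != n:
            for j, c in enumerate(x):
                totals[j] += c
    denom = sum(totals)
    if denom == 0:
        raise ValueError("class has no words once the sample is removed")
    return [t / denom for t in totals]


def smooth(theta, alpha):
    """Mix the estimate with the uniform distribution (assumed form)."""
    vocab = len(theta)
    return [(1.0 - alpha) * t + alpha / vocab for t in theta]


counts = [[2, 0, 1], [0, 0, 3], [1, 1, 0]]
labels = ["a", "a", "b"]
theta = loo_mle(counts, labels, n=0, y="a")  # only sample 1 remains
smoothed = smooth(theta, alpha=0.3)          # no zero probabilities left
```

Leaving out the n-th sample matters when the density of a sample is evaluated under its own class: without it, each sample would be explained best by the class it helped to estimate.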

Here, α (0 ≤ α ≤ 1) is a hyperparameter. The hyperparameter may be set manually, but by using the generalized EM algorithm, the set of mixture ratios P and the hyperparameter α can also be estimated simultaneously from the data so as to maximize Q(P, α | P^(τ), α^(τ)) (Equation (14)).

The E step is given by Equation (8) and the update of the mixture ratios in the M step by Equation (9), so both can be carried out as in the ordinary EM algorithm. The hyperparameter update in the M step is performed using Newton's method (Equation (15)).

Here, the first-order partial derivative of Equation (14) with respect to α, which appears in Equation (15), is given by Equation (16).

Likewise, the second-order partial derivative of Equation (14) with respect to α, which appears in Equation (15), is given by Equation (17).

As is clear from Equation (17), the second-order partial derivative is always negative, so Q(P, α | P^(τ), α^(τ)) is concave (convex upward) in α. In this experiment, the set of mixture ratios P and the hyperparameter α were estimated from the data using the generalized EM algorithm.
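A Newton update of this kind can be sketched on a simplified stand-in objective. Equations (14) to (17) are not reproduced in this text, so the code below does not implement them. It assumes a single-term objective Q(α) = Σ_j c_j log((1 - α)θ_j + α/V), where c_j are word counts and θ_j is a fixed estimate. Like the objective described above, this stand-in has a second derivative that is never positive, so a Newton step moves toward the maximum. The function name and the clipping to the open interval (0, 1) are assumptions.

```python
def newton_update_alpha(alpha, counts, theta, n_steps=20, eps=1e-6):
    """Maximize sum_j c_j * log((1 - alpha) * theta_j + alpha / V) over alpha."""
    vocab = len(theta)
    for _ in range(n_steps):
        grad = hess = 0.0
        for c, t in zip(counts, theta):
            d = 1.0 / vocab - t                    # derivative of the mixture in alpha
            s = (1.0 - alpha) * t + alpha / vocab  # smoothed probability
            grad += c * d / s                      # first derivative
            hess -= c * (d / s) ** 2               # second derivative, never positive
        if hess == 0.0:
            break
        # Newton step, kept strictly inside (0, 1).
        alpha = min(1.0 - eps, max(eps, alpha - grad / hess))
    return alpha


# With these inputs the maximizer can be found by hand: alpha = 3/8.
alpha = newton_update_alpha(0.5, counts=[3, 1, 0], theta=[1.0, 0.0, 0.0])
```

In a full generalized EM loop, such an update would alternate with the E step and the mixture ratio update rather than being run to convergence on its own.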

<Classification model>
The cases where the naive Bayes model and the logistic regression model, both representative text classification models, are used as the model M are described below.

(Naive Bayes model)
The naive Bayes model assumes that, given a class, each word in a document is generated independently, and the distribution P(x|z) of the word frequency vector x in class z is represented by a multinomial distribution (Equation (18)).

Here, φ_zj is the probability that the j-th word appears in a document of class z. When the negative log-likelihood is used as the error function and the Dirichlet distribution P(φ) ∝ Π_{z∈Z} Π_{j=1}^{V} φ_zj^β is used as the prior of φ = {{φ_zj}_{j=1}^{V}}_{z∈Z}, the weighted error function E(M_NB) is expressed as in Equation (19).

The estimate φ̂_zj of φ_zj that minimizes Equation (19) is given by Equation (20).
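A weighted naive Bayes estimate of this kind can be sketched as follows. Equation (20) is not reproduced in this text; the sketch assumes the closed form obtained by minimizing a weighted negative log-likelihood plus the Dirichlet prior term stated above, namely φ_zj proportional to Σ_n w(z|y_n) x_nj + β. The function name and the dictionary layout of the weights are illustrative assumptions.

```python
def weighted_naive_bayes(counts, labels, weight, target_classes, beta):
    """Estimate word probabilities phi[z][j] from weighted word counts.

    counts[n][j] is the frequency of word j in sample n, labels[n] is the
    (target or auxiliary) class y_n of sample n, and weight[(z, y)] is
    w(z|y). Missing weight entries are treated as 0.
    """
    vocab = len(counts[0])
    phi = {}
    for z in target_classes:
        totals = [float(beta)] * vocab       # contribution of the prior
        for x, y in zip(counts, labels):
            w = weight.get((z, y), 0.0)
            for j, c in enumerate(x):
                totals[j] += w * c           # weighted word counts
        denom = sum(totals)
        phi[z] = [t / denom for t in totals]
    return phi


counts = [[2, 0], [0, 2], [1, 1]]
labels = ["z1", "z2", "a1"]
# Auxiliary class a1 contributes to target class z1 with weight 0.5.
weight = {("z1", "z1"): 1.0, ("z2", "z2"): 1.0, ("z1", "a1"): 0.5}
phi = weighted_naive_bayes(counts, labels, weight, ["z1", "z2"], beta=1.0)
```

Setting w(z|z) = 1 and every other weight to 0 reduces this to an ordinary smoothed naive Bayes estimate from the target data alone.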

(Logistic regression model)
In the logistic regression model, given a word frequency vector x, the probability P(z|x) of belonging to class z is expressed as in Equation (21).

Here, λ_z is the unknown parameter vector for class z, and λ_z^T is the transpose of λ_z. When the negative log-likelihood is used as the error function and a normal distribution with mean 0 and covariance matrix γ^(-1) I (I is the identity matrix) is used as the prior of λ_z, the weighted error (expected error) E(M_LR) is expressed as in Equation (22).

The unknown parameter vectors {λ_z}_{z∈Z} can be estimated by minimizing the value of Equation (22) using, for example, a quasi-Newton method. As the logistic regression case shows, the method only attaches a weight to the error function of each sample, so it can be applied to many previously proposed classification models with only minor modification.
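The weighted error of the logistic regression model can be sketched as follows. Equations (21) and (22) are not reproduced in this text; the sketch assumes the usual softmax form for P(z|x) and takes the prior term to be (γ/2) Σ_z ||λ_z||², which is the negative log of the stated normal prior up to a constant. The function name and data layout are hypothetical, and the minimization itself (for example by a quasi-Newton method) is left out.

```python
import math


def weighted_lr_error(lam, counts, labels, weight, gamma):
    """Weighted negative log-likelihood plus the Gaussian prior penalty.

    lam[z] is the parameter vector of target class z, counts[n] the word
    frequency vector of sample n, labels[n] its class y_n, and
    weight[(z, y)] the weight w(z|y). Missing weights count as 0.
    """
    classes = list(lam)
    error = 0.0
    for x, y in zip(counts, labels):
        scores = {z: sum(l * v for l, v in zip(lam[z], x)) for z in classes}
        top = max(scores.values())  # subtract the maximum for stability
        log_norm = top + math.log(
            sum(math.exp(s - top) for s in scores.values()))
        for z in classes:
            # log P(z|x) = score of z minus the log normalizer
            error -= weight.get((z, y), 0.0) * (scores[z] - log_norm)
    error += 0.5 * gamma * sum(v * v for z in classes for v in lam[z])
    return error


lam = {"z1": [1.0, 0.0], "z2": [0.0, 0.0]}
err = weighted_lr_error(lam, [[1, 0]], ["z1"], {("z1", "z1"): 1.0}, gamma=2.0)
```

Only the inner weighting line differs from an ordinary regularized logistic regression loss, which reflects the remark above that existing models need little modification.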

<Compared methods>
Four methods were compared: the present method (the method of the classification device 1 of the present embodiment) with the naive Bayes model as the classification model (CA-NB), the present method with the logistic regression model as the classification model (CA-LR), the naive Bayes model without auxiliary data (NB), and the logistic regression model without auxiliary data (LR). The NB estimate is obtained by setting the weights in Equation (20) to
w(z|z) = 1, w(z|y≠z) = 0.
Similarly, the LR estimate is obtained by minimizing Equation (22), the weighted error of the present method, with the weights set to
w(z|z) = 1, w(z|y≠z) = 0.

In each experiment, 100 evaluation data sets were created, and the methods were evaluated by their average accuracy. In addition, one development data set was created separately from the evaluation data sets, and for each method the hyperparameter of the classification model (β or γ) was selected from the four candidates {10^-3, 10^-2, 10^-1, 1} so as to maximize the accuracy on the development data set.

<Toy data>
Using data sets created from 20Newsgroups (20news), the effect of the present method is evaluated for the case where the distribution of each auxiliary class is identical to that of some target class. 20news consists of about 20,000 English documents posted to 20 discussion groups. Word frequencies were used as the features of each document. Stop words (words such as prepositions and articles, generally called function words, that carry no semantic content and are not useful for retrieval) and words occurring at most once were removed, which left a vocabulary of 52,647 words.

For the problem of classifying documents into the five groups whose parent directory is computer (comp), namely graphics, os.ms-windows.misc, sys.ibm.pc.hardware, sys.mac.hardware, and windows.x, the target class set is
Z = {c_1, ..., c_5}
and the auxiliary class set is
A = {c_6, ..., c_10}.

Articles from graphics were randomly assigned to target class c_1 or auxiliary class c_6, articles from os.ms-windows.misc to target class c_2 or auxiliary class c_7, articles from sys.ibm.pc.hardware to target class c_3 or auxiliary class c_8, articles from sys.mac.hardware to target class c_4 or auxiliary class c_9, and articles from windows.x to target class c_5 or auxiliary class c_10, thereby creating the target data and the auxiliary data.

Here, 100 samples per class were used as test data, 2, 4, 8, 16, 32, 64, 128, or 256 samples per class as target data, and all remaining samples as auxiliary data. The total number of training samples was 4,363. The resulting accuracy is shown in Table 2. In Table 2, the numbers in the right four columns are the average accuracy in percent, and the numbers in parentheses are the standard deviations.

The accuracy of the present methods CA-NB and CA-LR is very high even when the number of training samples is small, which indicates that appropriate use of the auxiliary data yields robust (highly accurate) model estimation.

<20Newsgroups data>
Of the 20 groups in 20news, the four groups comp.graphics, rec.sport.baseball, sci.electronics, and talk.religion.misc were used as target classes and the other 16 groups as auxiliary classes to create the data with which the present method was evaluated. As test data 47, 100 samples per class were used; as target data (input data 44), 2, 4, 8, 16, 32, 64, 128, or 256 samples per class; and as auxiliary data (input data 44), all samples. The total number of auxiliary samples was 15,211. The resulting accuracy is shown in Table 3. In Table 3, the numbers in the right four columns are the average accuracy in percent, and the numbers in parentheses are the standard deviations. The present method CA-NB achieves the highest accuracy.

<Web page data>
The present method was evaluated using data from the Japanese directory-type search engines goo (registered trademark) category search (acquired in September 2003) and yahoo (registered trademark) category (acquired in March 2003). Words were extracted by morphological analysis, and words occurring 10 or more times in both categories were used as features. The resulting vocabulary size was 43,200. Some classes in goo (registered trademark) and yahoo (registered trademark) have identical class labels or appear to be related, but for others a clear correspondence is difficult to establish, and the number of classes also differs (goo (registered trademark): 13 classes, yahoo (registered trademark): 14 classes).

The classes of the goo (registered trademark) directory were used as target classes. As test data 47, 100 samples per class were used; as target data (input data 44), 2, 4, 8, 16, 32, 64, 128, or 256 samples per class; and as auxiliary data (input data 44), all samples in the yahoo (registered trademark) directory. The total number of auxiliary samples was 51,728. The resulting accuracy is shown in Table 4. In Table 4, the numbers in the right four columns are the average accuracy in percent, and the numbers in parentheses are the standard deviations. The present methods CA-NB and CA-LR achieve generally higher accuracy.

FIG. 1 is a block diagram showing the configuration of the classification device according to the present embodiment.
FIG. 2 is a diagram including a block diagram of the weight estimation unit according to the present embodiment.
FIG. 3 is a diagram including a block diagram of the model construction unit according to the present embodiment.
FIG. 4 is a diagram including a block diagram of the classification unit according to the present embodiment.
FIG. 5 is an explanatory diagram showing the processing flow of the classification device according to the present embodiment.
FIG. 6 is a flowchart showing the processing of the weight estimation step.
FIG. 7(a) shows the first and second dimensions of the generation model of the target data, and FIGS. 7(b) to 7(d) show those of each auxiliary data set.

Explanation of symbols

1 Classification device
2 Computation means
3 Input means
4 Storage means
5 Output means
11 Bus line
21 Weight estimation unit
22 Model construction unit
23 Classification unit
24 Memory
40a Program storage unit
41 Weight estimation program
42 Model construction program
43 Classification program
40b Data storage unit
44 Input data
45 Weights
46 Model parameters
47 Test data
211 Input data reading unit
212 Posterior probability estimation unit
213 Mixture ratio estimation unit
214 Weight writing unit
221 Input data reading unit
222 Weight reading unit
223 Model parameter estimation unit
224 Model parameter writing unit
231 Test data reading unit
232 Model parameter reading unit
233 Classification result output unit

Claims

A classification model generation device that generates a classification model for classifying classification target data into one of a plurality of classes in a target classification system, by learning from one or more already-classified data items classified in the target classification system, which is the classification system in which the classification target data is to be classified, and one or more already-classified data items classified in an auxiliary classification system, which is a classification system different from the target classification system, the device comprising:
storage means for storing information;
a weight estimation unit that, using an error function of the classification model for the case where each already-classified data item of the two types is predicted to be classified into any class of the target classification system, and weights each indicating the degree of influence that the corresponding already-classified data item of the two types has on the classification model in that prediction, estimates the weights so as to minimize an expected error that is the sum, over the already-classified data items of the two types, of the products of the value of the error function and the weight, and stores the weights in the storage means; and
a model construction unit that generates the classification model using the weights stored in the storage means and the two types of already-classified data.

The classification model generation device according to claim 1, wherein the weight estimation unit comprises:
a posterior probability estimation unit that estimates posterior probabilities for the two types of already-classified data using mixture ratios, and stores the posterior probabilities in the storage means, the mixture ratios indicating, for each class of the target classification system and the auxiliary classification system, the proportion of its influence on the classification model, and serving to approximate a probability distribution model in which the target classification system and the auxiliary classification system are integrated to the probability distribution model of the target classification system; and
a mixture ratio estimation unit that estimates the mixture ratios using the posterior probabilities stored in the storage means so as to maximize the likelihood of the already-classified data of the target classification system, estimates the weights from the mixture ratios at which the likelihood is maximized, and stores the weights in the storage means.

The classification model generation device according to claim 1, wherein the model construction unit comprises a model parameter estimation unit that, using the weights stored in the storage means and the two types of already-classified data, estimates model parameters of the classification model for classifying the classification target data in the target classification system, and stores the model parameters in the storage means.

A classification device comprising a classification unit that classifies the classification target data into one of a plurality of classes in the target classification system using the model parameters stored in the storage means of the classification model generation device according to claim 3.

A classification model generation method performed by a classification model generation device that generates a classification model for classifying classification target data into one of a plurality of classes in a target classification system, by learning from one or more already-classified data items classified in the target classification system, which is the classification system in which the classification target data is to be classified, and one or more already-classified data items classified in an auxiliary classification system, which is a classification system different from the target classification system, wherein
the classification model generation device comprises storage means for storing information, a weight estimation unit, and a model construction unit,
the weight estimation unit executes a weight estimation step of, using an error function of the classification model for the case where each already-classified data item of the two types is predicted to be classified into any class of the target classification system, and weights each indicating the degree of influence that the corresponding already-classified data item of the two types has on the classification model in that prediction, estimating the weights so as to minimize an expected error that is the sum, over the already-classified data items of the two types, of the products of the value of the error function and the weight, and storing the weights in the storage means, and
the model construction unit executes a model construction step of generating the classification model using the weights stored in the storage means and the two types of already-classified data.

The classification model generation method according to claim 5, wherein
the weight estimation unit includes a posterior probability estimation unit and a mixture ratio estimation unit, and
in the weight estimation step,
the posterior probability estimation unit estimates posterior probabilities for the two types of already-classified data and stores the posterior probabilities in the storage means, using a mixture ratio that indicates, for each class, the ratio between the degrees of influence of the target classification system and the auxiliary classification system on the classification model, the mixture ratio serving to make a probability distribution model that integrates the target classification system and the auxiliary classification system approximate the probability distribution model of the target classification system, and
the mixture ratio estimation unit estimates the mixture ratio, using the posterior probabilities stored in the storage means, so as to maximize the likelihood for the already-classified data of the target classification system, estimates the weights from the mixture ratio at which the likelihood is maximized, and stores the weights in the storage means.
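
As a rough illustration of the alternation between posterior probability estimation and mixture ratio estimation, the following sketch runs an EM-style loop for a two-component mixture whose component densities are held fixed. It is a simplification under stated assumptions: the claim recites a mixture ratio for each class, whereas the sketch uses a single scalar ratio `lam`, and the density values in `p_target` and `p_aux` are hypothetical numbers rather than outputs of any model described in this document.

```python
import math

def log_likelihood(p_target, p_aux, lam):
    """Log-likelihood of the target data under the two-component mixture."""
    return sum(
        math.log(lam * pt + (1.0 - lam) * pa)
        for pt, pa in zip(p_target, p_aux)
    )

def estimate_mixture_ratio(p_target, p_aux, n_iter=100, lam=0.5):
    """Alternate posterior estimation and mixture ratio updates.

    p_target[i] and p_aux[i] are the (fixed, positive) densities of target
    sample i under the target-system and auxiliary-system components.
    """
    post = []
    for _ in range(n_iter):
        # Posterior probability that each sample came from the target component.
        post = [
            lam * pt / (lam * pt + (1.0 - lam) * pa)
            for pt, pa in zip(p_target, p_aux)
        ]
        # New mixture ratio: the mean posterior, which does not decrease
        # the likelihood of the target data.
        lam = sum(post) / len(post)
    return lam, post

# Hypothetical density values for four target-system samples.
p_target = [0.9, 0.8, 0.2, 0.1]
p_aux = [0.1, 0.3, 0.6, 0.7]

lam, post = estimate_mixture_ratio(p_target, p_aux)
```

Weights for the individual pieces of data could then be derived from the converged ratio; how that is done is not specified in this claim and is left out of the sketch.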

The classification model generation method according to claim 5, wherein
the model construction unit includes a model parameter estimation unit, and
in the model construction step,
the model parameter estimation unit estimates, using the weights stored in the storage means and the two types of already-classified data, model parameters of the classification model for classifying the classification target data in the target classification system, and stores the model parameters in the storage means.

A classification method in which the classification unit of a classification device that classifies the classification target data,
using the model parameters stored in the storage means by the classification model generation method according to claim 7,
classifies the classification target data into one of a plurality of classes in the target classification system.
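
The classification recited above can be pictured with a minimal sketch that assigns the class with the highest score under stored model parameters. The naive-Bayes-style parameter layout (`prior`, `word_probs`), the class names, and the function `classify` are assumptions made for illustration; the claim does not specify the form of the model parameters.

```python
import math

def classify(features, params):
    """Return the target-system class with the highest log-score.

    features maps a feature name to its count; params maps each class to
    a prior and per-feature probabilities (a hypothetical layout).
    """
    def score(c):
        p = params[c]
        return math.log(p["prior"]) + sum(
            count * math.log(p["word_probs"][w])
            for w, count in features.items()
        )
    return max(params, key=score)

# Hypothetical stored model parameters for two target classes.
params = {
    "sports": {"prior": 0.5, "word_probs": {"ball": 0.7, "vote": 0.3}},
    "politics": {"prior": 0.5, "word_probs": {"ball": 0.2, "vote": 0.8}},
}

label = classify({"ball": 2, "vote": 1}, params)
```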

A classification model generation program for causing a computer to function as each unit of the classification model generation device according to any one of claims 1 to 3.

A classification program for causing a computer to function as the classification unit of the classification device according to claim 4.

A computer-readable recording medium in which the classification model generation program according to claim 9 or the classification program according to claim 10 is recorded.