JP2000293502A

JP2000293502A - Data sorting method and device and storage medium storing data sorting program

Info

Publication number: JP2000293502A
Application number: JP11098037A
Authority: JP
Inventors: Hiroyori Taira; 博順平; Takafumi Mukaiuchi; 隆文向内; Masahiko Haruno; 雅彦春野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-04-05
Filing date: 1999-04-05
Publication date: 2000-10-20

Abstract

PROBLEM TO BE SOLVED: To reduce the tendency of a support vector machine to assign data to the category that has more training data, and thereby improve classification accuracy, by using an objective function whose error term contains separate weighting parameters for positive and negative examples. SOLUTION: In a data classification method using a support vector machine, an objective function is prepared whose error term contains two weighting parameters, one for positive-example data and one for negative-example data. The parameter C of the objective function is distinguished according to whether training data x belongs to the positive examples or the negative examples, and the objective function is given by an expression in which Cp and Cn are non-negative real numbers. The objective function minimization means of the data classification device minimizes the objective function according to the training data x. The classification determination means constructs a separating hyperplane from the minimized objective function, classifies input test data into the positive-example or negative-example category of that hyperplane, and outputs the classification result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data classification method and apparatus and to a storage medium storing a data classification program, and more particularly to a data classification method and apparatus using a Support Vector Machine in which classification accuracy is improved by changing the objective function, and to a storage medium storing such a data classification program.

[0002] In the information industry, where large volumes of data circulate, efficient classification of data is required. Classification methods based on the Support Vector Machine are known to achieve high classification accuracy and are used in a wide range of fields. The present invention relates to a data classification method and apparatus using the Support Vector Machine, and to a storage medium storing a data classification program.

[0003]

[Prior Art] Data classification using a conventional Support Vector Machine (see V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995) is described below. Let (x_1, y_1), …, (x_r, y_r) be a set of r training data vectors, each belonging to one of two classes: positive examples (correct examples) and negative examples (incorrect examples). Here x_i is the n-dimensional feature vector of data item i, and y_i is a scalar variable that takes the value +1 when data item i is a positive example and −1 when it is a negative example. In data classification, each feature w_k (k an integer, 1 ≤ k ≤ n) is set to w_k = 1 if it appears in the text and to w_k = 0 if it does not, and a data item is represented by the vector x_i = (w_1, w_2, …, w_n). A Support Vector Machine is constructed for each category, with data that belongs to the category treated as positive examples and data that does not as negative examples.
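The representation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the toy vocabulary, and the toy documents are hypothetical and do not come from the patent.

```python
from typing import List, Sequence, Set


def binary_feature_vector(doc_terms: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Build x_i = (w_1, ..., w_n): w_k is 1 if feature k occurs in the document, else 0."""
    return [1 if term in doc_terms else 0 for term in vocabulary]


def one_vs_rest_labels(doc_categories: Sequence[Set[str]], category: str) -> List[int]:
    """Build y_i: +1 if document i belongs to `category` (positive example), else -1."""
    return [1 if category in cats else -1 for cats in doc_categories]


# Hypothetical toy data (not taken from the patent).
vocabulary = ["goal", "court", "election"]
docs = [{"goal", "match"}, {"court", "election"}]
doc_categories = [{"sports"}, {"criminal law", "government"}]

X = [binary_feature_vector(d, vocabulary) for d in docs]
y_sports = one_vs_rest_labels(doc_categories, "sports")
```

One label vector is built per category, so the same feature vectors can be reused for every per-category classifier.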

[0004] These data are separated by the hyperplane (w · x) + b = 0 in n-dimensional Euclidean space. The larger the distance between the neighboring positive and negative examples, the more accurately test data can be classified. Two hyperplanes are therefore defined: the separating hyperplane on the positive-example side, (w · x) + b = 1 (1), and the separating hyperplane on the negative-example side, (w · x) + b = −1 (2).

[0005] The distance between the two separating hyperplanes is given by

[0006]

(Equation 4)
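The equation image is not reproduced in this text. For the two parallel hyperplanes (1) and (2), the standard point-to-plane distance formula gives the following expression, which is presumably what Equation 4 states:

```latex
\[
  d \;=\; \frac{\lvert 1 - (-1) \rvert}{\lVert w \rVert} \;=\; \frac{2}{\lVert w \rVert}
\]
```

This is why maximizing the distance is equivalent to minimizing the norm of w.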

[0007] To maximize this distance, it suffices to minimize ‖w‖. A function whose minimum is attained when ‖w‖ takes its optimal value is called the objective function. In this case, the objective function Φ is as follows.

[0008]

(Equation 5)

[0009] Here, ξ_i is a non-negative variable representing the distance from the separating hyperplane when training datum x_i (i an integer from 1 to r) cannot be separated by the separating hyperplane for its class. The first term on the right-hand side reflects the distance between the two separating hyperplanes w · x + b = 1 and w · x + b = −1; the smaller the value of this term, the larger the distance between them. The second term, excluding C, is called the error term: it is the sum, over the training data that could not be separated, of the distance from w · x + b = 1 for positive examples and from w · x + b = −1 for negative examples. C is a non-negative parameter (a value of 0 or more) that determines the relative weight given to the first and second terms. When C is large, the errors of the training data with respect to the hyperplanes are weighted heavily; when C is small, the distance between the separating hyperplanes is emphasized relatively more.
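The image for Equation 5 is not reproduced in this text. A form consistent with the description above, and with the standard soft-margin formulation in the cited Vapnik reference, would be the following. The constraint lines are an assumption added here for completeness, and ξ_i is treated as the usual slack variable.

```latex
\[
  \Phi(w, \xi) \;=\; \frac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{r} \xi_i
\]
\[
  \text{subject to}\quad y_i\bigl((w \cdot x_i) + b\bigr) \;\ge\; 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, r
\]
```

The first term shrinks as the margin 2/‖w‖ grows, and the second term is C times the error term.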

[0010]

[Problems to Be Solved by the Invention] However, in data classification using the conventional Support Vector Machine described above, when learning is performed on data divided into the two categories of positive and negative examples, data whose class is unknown tends to be assigned to whichever category (positive or negative) has more training data. This is one cause of reduced classification accuracy.

[0011] The present invention has been made in view of the above, and its object is to provide a data classification method and apparatus, and a storage medium storing a data classification program, that can reduce the tendency to assign data to the category with more training data and thereby increase classification accuracy.

[0012]

[Means for Solving the Problems] The present invention (claim 1) is a data classification method using a Support Vector Machine, having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data.

[0013] In the present invention (claim 2), the parameter C of the objective function is distinguished according to whether training data x belongs to the positive examples or the negative examples, and the objective function, with C_p and C_n non-negative real numbers, is defined as

[0014]

(Equation 6)

[0015] FIG. 1 is a diagram for explaining the principle of the present invention. In the present invention (claim 3), when training data belonging to the two classes of positive and negative examples is input (step 1), the objective function is minimized according to the training data (step 2); a separating hyperplane is constructed using the minimized objective function (step 3); when test data is input, it is classified into either the positive-example or the negative-example category of the separating hyperplane (step 4); and the classification result is output (step 5).

[0016] The present invention (claim 4) is a data classification device using a Support Vector Machine, having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data. In the present invention (claim 5), the parameter C of the objective function is distinguished according to whether training data x belongs to the positive examples or the negative examples, and the objective function, with C_p and C_n non-negative real numbers, is defined as

[0017]

(Equation 7)

[0018] FIG. 2 is a diagram of the principle configuration of the present invention. The present invention (claim 6) comprises objective function minimization means 10, which, when training data belonging to the two classes of positive and negative examples is input, minimizes the objective function according to the training data, and classification determination means 20, which constructs a separating hyperplane using the minimized objective function and, when test data is input, classifies it into either the positive-example or the negative-example category of the separating hyperplane and outputs the classification result.

[0019] The present invention (claim 7) is a storage medium storing a data classification program to be installed in a data classification device using a Support Vector Machine, the program having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data. The present invention (claim 8) includes a process in which the parameter C of the objective function is distinguished according to whether training data x belongs to the positive examples or the negative examples, and the objective function, with C_p and C_n non-negative real numbers, is defined as

[0020]

(Equation 8)

[0021] The present invention (claim 9) comprises an objective function minimization process, which, when training data belonging to the two classes of positive and negative examples is input, minimizes the objective function according to the training data, and a classification determination process, which constructs a separating hyperplane using the minimized objective function and, when test data is input, classifies it into either the positive-example or the negative-example category of the separating hyperplane and outputs the classification result.

[0022] As described above, in the present invention, the two parameters C_p and C_n in the objective function are set so that C_p < C_n when the positive examples have more training data and C_p > C_n when the negative examples have more. This makes it easier to classify correctly the data of the category with less data, and improves the accuracy of data classification.
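The patent states only the ordering of C_p and C_n, not how to choose their values. One common heuristic that satisfies the ordering is to scale each penalty inversely to its class size. The sketch below is an assumption for illustration: the function name `class_weights`, the `base_c` parameter, and the example counts are hypothetical and not from the patent.

```python
from typing import Tuple


def class_weights(n_pos: int, n_neg: int, base_c: float = 1.0) -> Tuple[float, float]:
    """Return (C_p, C_n) scaled inversely to class size (a heuristic, not the patent's rule).

    The larger class gets the smaller penalty, so C_p < C_n when positives
    outnumber negatives and C_p > C_n in the opposite case.
    """
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes need at least one training example")
    total = n_pos + n_neg
    c_p = base_c * total / (2.0 * n_pos)
    c_n = base_c * total / (2.0 * n_neg)
    return c_p, c_n


# Hypothetical counts: 100 positive and 900 negative training examples.
c_p, c_n = class_weights(n_pos=100, n_neg=900)
```

With equal class sizes the heuristic returns C_p = C_n = base_c, which reduces to the conventional single-parameter objective.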

[0023]

[Embodiments of the Invention] In the present invention, in classification using a Support Vector Machine, the parameter C of the objective function is distinguished according to whether training data x belongs to the positive examples or the negative examples, and the objective function is defined as

[0024]

(Equation 9)

[0025] Here, C_p and C_n are both non-negative real numbers. The present invention is characterized by the use of the objective function described above. FIG. 3 shows the configuration of the data classification device of the present invention. The data classification device shown in the figure consists of an objective function minimization unit 10 and a classification determination unit 20.
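The equation images for the objective function of the invention are not reproduced in this text. A form consistent with the description (the single parameter C of the conventional objective split into C_p for positive examples and C_n for negative examples, both non-negative) would be the following. The constraint line is carried over from the standard soft-margin formulation and is an assumption.

```latex
\[
  \Phi(w, \xi) \;=\; \frac{1}{2}\lVert w \rVert^{2}
  \;+\; C_p \sum_{i \,:\, y_i = +1} \xi_i
  \;+\; C_n \sum_{i \,:\, y_i = -1} \xi_i ,
  \qquad C_p \ge 0,\; C_n \ge 0
\]
\[
  \text{subject to}\quad y_i\bigl((w \cdot x_i) + b\bigr) \;\ge\; 1 - \xi_i,
  \qquad \xi_i \ge 0
\]
```

Setting C_p = C_n = C recovers the conventional objective with a single error weight.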

[0026] When training data 30 is input, the objective function minimization unit 10 minimizes the objective function according to that training data and calculates the optimal w and b. The classification determination unit 20 constructs a separating hyperplane using the w and b calculated by the objective function minimization unit 10 and, for input data to be classified (test data), outputs the result of classifying it into either the positive-example or the negative-example category.
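The two units can be sketched as follows. The patent does not say which optimization algorithm the minimization unit uses, so this sketch substitutes plain subgradient descent on the equivalent class-weighted hinge-loss form of the objective; the function names, step size, iteration count, and toy data are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def objective(w, b, X, y, c_p: float, c_n: float) -> float:
    """0.5*||w||^2 + C_p*(sum of positive slacks) + C_n*(sum of negative slacks)."""
    value = 0.5 * dot(w, w)
    for x_i, y_i in zip(X, y):
        slack = max(0.0, 1.0 - y_i * (dot(w, x_i) + b))
        value += (c_p if y_i > 0 else c_n) * slack
    return value


def minimize_objective(X, y, c_p: float, c_n: float,
                       epochs: int = 2000,
                       learning_rate: float = 0.01) -> Tuple[List[float], float]:
    """Approximate the minimizing w and b by subgradient descent (assumed solver)."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5*||w||^2 is w
        for x_i, y_i in zip(X, y):
            if y_i * (dot(w, x_i) + b) < 1.0:  # margin violated: slack is active
                c = c_p if y_i > 0 else c_n
                for k in range(n):
                    grad_w[k] -= c * y_i * x_i[k]
                grad_b -= c * y_i
        w = [w_k - learning_rate * g for w_k, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def classify(w, b, x) -> int:
    """Return +1 (positive example) or -1 (negative example) by the side of w.x + b = 0."""
    return 1 if dot(w, x) + b >= 0.0 else -1


# Hypothetical toy data: feature 0 marks the positive class.
X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
y = [1, 1, -1, -1]
w, b = minimize_objective(X, y, c_p=3.0, c_n=0.8)
predictions = [classify(w, b, x) for x in X]
```

A production implementation would more likely solve the dual quadratic program; subgradient descent is used here only because it fits in a few self-contained lines.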

[0027]

[Examples] Examples of the present invention are described below with reference to the drawings. In this example, the RWCP text corpus (see Jun Toyoura, Takenobu Tokunaga, Hitoshi Isahara, Ryuichi Oka, Development of a text database with classification codes in RWC, Information Processing Society of Japan research report NLC96-13, IEICE, 1996) is used for the training data and the test data. This corpus assigns UDC codes, based on the Universal Decimal Classification (see Information Science and Technology Association, Universal Decimal Classification, Maruzen, 1994), to approximately 30,000 articles from the 1994 edition of the Mainichi Shimbun.

[0028] From these articles, 2000 articles carrying the ten most frequent classification codes (sports, criminal law, government, education, transportation, military affairs, international relations, language activities, drama, crops) were selected; 1000 articles were used as training data and the remaining 1000 as test data, that is, as the data to be classified. The numbers of training and test data are shown in the table below.

[0029]

[Table 1]

[0030] After morphological analysis of these articles, the presence or absence of specific nouns and proper nouns in each article was taken as the features of the article, and a feature vector was constructed. Ten classification devices are constructed, one for each of the ten classifications above. For example, in the classification device for sports, data carrying the sports classification code is treated as positive examples and data not carrying it as negative examples, and the device determines whether each test datum falls into the positive-example or the negative-example category. The scalar variable y_i is therefore also set separately for each classification device.

[0031] Next, the operation of the data classification device is described. First, when the 1000 training data are input, the objective function minimization unit 10 minimizes the objective function according to the training data and calculates the optimal w and b. Next, the classification determination unit 20 constructs a separating hyperplane using the w and b calculated by the objective function minimization unit 10, classifies each of the 1000 input test data into either the positive-example or the negative-example category, and outputs the classification result.

[0032] To evaluate classification accuracy, precision, recall, and the F value (see B. M. Sundheim, Overview of the Fourth Message Understanding Evaluation and Conference, Proceedings of Fourth Message Understanding Conference, pp. 3-29, 1992) were used. For each classification, the following counts are taken from the outputs of the classification model and the correct positive and negative labels: a, the number of items whose correct label is positive and which the classification model also judged positive; b, the number of items whose correct label is negative but which the classification model judged positive; c, the number of items whose correct label is positive but which the classification model judged negative. Precision P and recall R are then defined as follows.

[0033]

(Equation 10)

[0034] The F value is expressed in terms of precision and recall as

[0035]

(Equation 11)

[0036] Here, β is a weighting parameter, and β = 1 in this example. The results of comparing the present invention with C_p = 30 and C_n = 8 against the conventional method with C = 10¹² are shown below.

[0037]

[Table 2]
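The images for Equations 10 and 11 are not reproduced in this text. Given the counts a, b, and c defined above, the usual definitions are P = a / (a + b) and R = a / (a + c), and the F value from the cited MUC-4 overview is commonly written F = (β² + 1)PR / (β²P + R). The sketch below assumes those forms; the function name and the example counts are hypothetical and are not results from the patent.

```python
from typing import Tuple


def precision_recall_f(a: int, b: int, c: int, beta: float = 1.0) -> Tuple[float, float, float]:
    """Compute precision, recall, and F value from the counts a, b, c.

    a: correct label positive, model judged positive
    b: correct label negative, model judged positive
    c: correct label positive, model judged negative
    """
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + c) if (a + c) else 0.0
    denom = beta ** 2 * precision + recall
    f_value = (1.0 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


# Hypothetical counts for illustration only.
p, r, f = precision_recall_f(a=8, b=2, c=4)
```

With β = 1 the F value is the harmonic mean of precision and recall, so it is high only when both are high.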

[0038] The F value ranges from 0 to 1, and the closer it is to 1, the higher the accuracy. In every category, classification accuracy is higher when C is split, which shows that the present invention is effective in improving the accuracy of data classification. In addition, the objective function minimization unit 10, which uses the objective function of expression (5), and the classification determination unit 20 shown in FIG. 3 can be built as a program and stored on a disk device connected to the computer used as the data classification device, or on a portable storage medium such as a floppy (registered trademark) disk or a CD-ROM. Installing the program when the invention is to be practiced allows the invention to be realized easily.

[0039] The present invention is not limited to the above example, and various modifications and applications are possible within the scope of the claims.

[0040]

[Effects of the Invention] As described above, the present invention reduces the tendency to erroneously assign data whose positive/negative class is unknown to the category with more training data, and thereby improves data classification accuracy.

[Brief description of the drawings]

FIG. 1 is a diagram for explaining the principle of the present invention.

FIG. 2 is a diagram of the principle configuration of the present invention.

FIG. 3 is a configuration diagram of the data classification device of the present invention.

[Explanation of symbols]

10: objective function minimization means, objective function minimization unit
20: classification determination means, classification determination unit
30: training data

Continuation of front page: (72) Inventor Masahiko Haruno, c/o ATR Human Information Processing Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto. F-term (reference): 5B075 ND02 NR02 NR12 PR06 QM08

Claims

[Claims]

1. A data classification method using a Support Vector Machine, characterized by having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data.

2. The data classification method according to claim 1, wherein the parameter C of the objective function is distinguished between the case where training data x belongs to the positive examples and the case where it belongs to the negative examples, and the objective function is given by an expression in which C_p and C_n are non-negative real numbers.

3. The data classification method according to claim 1, wherein, when training data belonging to the two classes of positive and negative examples is input, the objective function is minimized according to the training data, a separating hyperplane is constructed using the minimized objective function, and, when test data is input, the test data is classified into either the positive-example or the negative-example category of the separating hyperplane and the classification result is output.

4. A data classification device using a Support Vector Machine, characterized by having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data.

5. The data classification device according to claim 4, wherein the parameter C of the objective function is distinguished between the case where training data x belongs to the positive examples and the case where it belongs to the negative examples, and the objective function is given by an expression in which C_p and C_n are non-negative real numbers.

6. The data classification device according to claim 4, comprising: objective function minimization means for minimizing the objective function according to training data when training data belonging to the two classes of positive and negative examples is input; and classification determination means for constructing a separating hyperplane using the minimized objective function and, when test data is input, classifying the test data into either the positive-example or the negative-example category of the separating hyperplane and outputting the classification result.

7. A storage medium storing a data classification program to be installed in a data classification device using a Support Vector Machine, the data classification program being characterized by having an objective function whose error term contains two types of weighting parameters, one for positive-example data and one for negative-example data.

8. The storage medium storing the data classification program according to claim 7, having a process in which the parameter C of the objective function is distinguished between the case where training data x belongs to the positive examples and the case where it belongs to the negative examples, and the objective function is given by an expression in which C_p and C_n are non-negative real numbers.

9. The storage medium storing the data classification program according to claim 7 or 8, comprising: an objective function minimization process for minimizing the objective function according to training data when training data belonging to the two classes of positive and negative examples is input; and a classification determination process for constructing a separating hyperplane using the minimized objective function and, when test data is input, classifying the test data into either the positive-example or the negative-example category of the separating hyperplane and outputting the classification result.