JP7359729B2

JP7359729B2 - Classification device and method

Info

Publication number: JP7359729B2
Application number: JP2020053210A
Authority: JP
Inventors: 和彦篠田; 洋隆梶; 将杉山
Original assignee: University of Tokyo NUC; Toyota Motor Corp
Current assignee: University of Tokyo NUC; Toyota Motor Corp
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2023-10-11
Anticipated expiration: 2040-03-24
Also published as: JP2021152799A

Description

Application of Article 30, Paragraph 2 of the Patent Act: published on November 15, 2019, in the preview slides for the poster session of the 22nd Information-Based Induction Sciences Workshop (IBIS 2019), made available on the website (https://drive.google.com/file/d/1sNNg39Q8bETtHc2Ld0orTdq00pPNG3SO/view).

Application of Article 30, Paragraph 2 of the Patent Act: presented on November 21, 2019, at the 22nd Information-Based Induction Sciences Workshop (IBIS 2019), held at WINC Aichi.

Application of Article 30, Paragraph 2 of the Patent Act: published on January 29, 2020, on the website (https://arxiv.org/abs/2001.10642).

The present invention relates to the technical field of classification devices and classification methods for classifying data.

Devices of this type that classify collected data into positive data and negative data are known. For example, Non-Patent Document 1 discloses a technique for learning the classification boundary between positive data and negative data using positive data and the reliability of the positive data.

Takashi Ishida, Gang Niu, and Masashi Sugiyama. 2018. Binary classification from positive-confidence data. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18).

However, when only positive data are available, it is not possible to determine whether the reliability is biased (that is, deviates from the true distribution) or to what degree. The technique of Non-Patent Document 1 can therefore suffer from the technical problem that the accuracy of data classification decreases.

The present invention has been made in view of, for example, the above problem, and its object is to provide a classification device and a classification method capable of appropriately classifying data using positive data and its reliability.

One aspect of the classification device according to the present invention is a classification device that classifies a plurality of data into positive examples and negative examples, the classification device comprising: learning means for learning a boundary that separates the positive examples from the negative examples, based on the positive examples and a positive reliability, which is the reliability of the positive examples; correction means for correcting the positive reliability using a correction parameter; and updating means for updating the correction parameter so as to reduce the difference, for a misclassification rate that is the probability that a positive example is erroneously classified as a negative example, between a predetermined value obtained in advance and the actual value obtained when the boundary is used.

One aspect of the classification method according to the present invention is a classification method for classifying a plurality of data into positive examples and negative examples, the classification method comprising: a learning step of learning a boundary that separates the positive examples from the negative examples, based on the positive examples and a positive reliability, which is the reliability of the positive examples; a correction step of correcting the positive reliability using a correction parameter; and an updating step of updating the correction parameter so as to reduce the difference, for a misclassification rate that is the probability that a positive example is erroneously classified as a negative example, between a predetermined value obtained in advance and the actual value obtained when the boundary is used.

According to one aspect of the classification device and classification method described above, data can be classified appropriately using positive data and its reliability.

FIG. 1 is a block diagram showing the configuration of the classification device according to the embodiment.
FIG. 2 is a flowchart showing the flow of operation of the classification device according to the embodiment.
FIG. 3 is a conceptual diagram showing the misclassification rate of positive examples.
FIG. 4 is a conceptual diagram showing an example of a classification boundary learned by the binary classifier.
FIG. 5 is a table showing the results of predicting driver drowsiness using the classification device according to the embodiment.

Embodiments of the classification device and the classification method are described below with reference to the drawings.

<Device configuration>
First, the configuration of the classification device according to this embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the classification device according to the embodiment.

In FIG. 1, the classification device 10 according to this embodiment is configured to be able to classify a plurality of collected data into positive examples (that is, positive data) and negative examples (that is, negative data). The classification device 10 includes, for example, an arithmetic circuit and a memory. As components for realizing its functions, the classification device 10 includes a recording unit 100 and a learning unit 200.

The recording unit 100 is configured to be able to record the various parameters used by the classification device 10. The recording unit 100 records the data x to be classified by the classification device 10, the positive reliability r, and the misclassification rate φ of positive examples, and is configured to be able to output them to the learning unit 200. The "positive reliability r" is a parameter indicating the reliability of a positive example included in the data x (that is, how correct the positive example is). The "misclassification rate φ" is a parameter indicating the probability that a positive example is erroneously classified as a negative example. The recording unit 100 is further configured to be able to record the binary classifier g learned by the learning unit 200. The "binary classifier g" is a parameter indicating the classification boundary that divides the data x into positive examples and negative examples.

The learning unit 200 includes a classifier learning unit 210 and a parameter adjustment unit 220. The classifier learning unit 210 learns the binary classifier g using the data x, the positive reliability r, and the misclassification rate φ of positive examples input from the recording unit 100. During learning, the classifier learning unit 210 also corrects the positive reliability r using the correction parameter k*. The classifier learning unit 210 is a specific example of the "learning means" and the "correction means" in the supplementary notes described later. The parameter adjustment unit 220 adjusts (in other words, updates) the correction parameter k* used in learning by the classifier learning unit 210. The parameter adjustment unit 220 is a specific example of the "updating means" in the supplementary notes described later. The specific operations of the classifier learning unit 210 and the parameter adjustment unit 220 are described in detail below.

<Description of operation>
Next, the flow of operation of the classification device 10 according to this embodiment is described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of operation of the classification device according to the embodiment.

As shown in FIG. 2, when the classification device 10 according to this embodiment operates, the classifier learning unit 210 first assigns initial values to the correction parameters k and k*, the counter m, and Δ* (step S11). The correction parameter k is a candidate value computed in order to update k*, the correction parameter that is actually used. The initial values of k and k* may be arbitrary; however, if an empirically appropriate initial value can be calculated, that value may be assigned. m is a parameter that counts the number of times the processing is repeated, and is initialized to, for example, 1. Δ* is a parameter corresponding to the squared error between the precomputed misclassification rate φ of positive examples and the actual misclassification rate, and is initialized to, for example, a sufficiently large value.

Next, the classifier learning unit 210 corrects the positive reliability r using the correction parameter k (step S12). Specifically, the positive reliability r is corrected to r^k.
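The effect of the correction r^k in step S12 can be illustrated with a short Python sketch. The function name `correct_reliability` and the restriction to r in (0, 1] and k > 0 are assumptions made here for illustration; the text only specifies that r is corrected to r^k.

```python
def correct_reliability(r, k):
    """Return the corrected positive reliability r**k (step S12)."""
    # Assumed domain: reliability in (0, 1], correction parameter positive.
    if not 0.0 < r <= 1.0:
        raise ValueError("positive reliability must lie in (0, 1]")
    if k <= 0.0:
        raise ValueError("correction parameter k is assumed positive here")
    return r ** k


reliabilities = [0.95, 0.80, 0.60]
for k in (0.5, 1.0, 2.0):
    corrected = [correct_reliability(r, k) for r in reliabilities]
    print(k, [round(c, 3) for c in corrected])
```

Under these assumptions, k < 1 pushes every reliability toward 1, k > 1 pushes it toward 0, and k = 1 leaves it unchanged, so a single scalar can compensate for reliabilities that are systematically too low or too high.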

Next, the classifier learning unit 210 learns the binary classifier g by classification risk minimization (step S13). Specifically, the classifier learning unit 210 learns the binary classifier g so as to reduce the loss function l, as expressed by equation (1).
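Equation (1) is not reproduced in this text. As a hedged reconstruction only, assuming that the learner minimizes the empirical positive-confidence risk of Non-Patent Document 1 with each reliability r_i replaced by its corrected value r_i^k, the objective might take the following form. The sample size n, the index i, and the margin-loss notation ℓ are introduced here for illustration and may differ from the original equation.

```latex
\hat{g} = \operatorname*{arg\,min}_{g} \sum_{i=1}^{n}
\left[ \ell\bigl(g(x_i)\bigr)
+ \frac{1 - r_i^{k}}{r_i^{k}}\, \ell\bigl(-g(x_i)\bigr) \right]
```

In this form, a positive example with a low corrected reliability receives a large weight on the term that treats it as negative, which is why a biased r can shift the learned boundary.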

Next, the classifier learning unit 210 computes the squared error between the precomputed misclassification rate φ of positive examples and the actual misclassification rate, and sets it as Δ (step S14). The misclassification rate φ of positive examples and the squared error Δ can be calculated using equations (2) and (3), respectively.

The squared error Δ is calculated here in order to measure the difference between the precomputed misclassification rate φ of positive examples and the actual misclassification rate of the binary classifier g. The calculated Δ is therefore not limited to a squared error; any value that indicates this difference may be used.
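Equations (2) and (3) are not reproduced in this text. The following Python sketch shows one plausible reading of step S14, assuming that the actual misclassification rate is estimated as the fraction of positive examples for which g(x) < 0 and that Δ is the squared difference from φ. The names `empirical_miss_rate` and `squared_gap`, the sign convention, and the sample values are illustrative assumptions.

```python
def empirical_miss_rate(g, positives):
    """Fraction of positive examples that g places on the negative side."""
    # Assumed convention: g(x) < 0 means "classified as negative".
    missed = sum(1 for x in positives if g(x) < 0.0)
    return missed / len(positives)


def squared_gap(phi_target, phi_actual):
    """Squared error between the precomputed and the actual rate (Delta)."""
    return (phi_target - phi_actual) ** 2


def g(x):
    # Hypothetical one-dimensional classifier with its boundary at x = 1.
    return x - 1.0


positives = [0.5, 1.5, 2.0, 3.0]
phi_actual = empirical_miss_rate(g, positives)
delta = squared_gap(0.1, phi_actual)
print(phi_actual, delta)
```

As the text notes, the squared error is only one choice; an absolute difference would serve the same purpose in this sketch.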

Next, the parameter adjustment unit 220 determines whether Δ* is larger than Δ (step S15). If Δ* is larger than Δ (step S15: YES), the parameter adjustment unit 220 updates Δ* with the value of Δ and k* with the value of k (step S16). If Δ* is not larger than Δ (step S15: NO), the parameter adjustment unit 220 does not update Δ* or k* (that is, step S16 is skipped).

The parameter adjustment unit 220 then increments m by 1 (step S17) and determines whether m exceeds a predetermined upper limit M or Δ is smaller than a threshold ε (step S18). The threshold ε is set in advance for determining that the squared error has become sufficiently small.

If m does not exceed the upper limit M and Δ is not smaller than the threshold ε (step S18: NO), the parameter adjustment unit 220 updates the correction parameter k by a predetermined procedure (step S19). Specifically, k is updated so that a smaller Δ is obtained. Once k has been updated, the processing from step S12 onward is repeated with the updated k.

If, on the other hand, m exceeds M or Δ is smaller than ε (step S18: YES), the classifier learning unit 210 corrects the positive reliability r using the correction parameter k* (step S20). Specifically, the positive reliability r is corrected to r^(k*).

Next, the classifier learning unit 210 learns the binary classifier g by classification risk minimization (step S21). Specifically, the classifier learning unit 210 learns the binary classifier g so as to reduce the loss function l, as expressed by equation (4).
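Steps S11 to S21 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the classifier is a one-dimensional linear model g(x) = w·x + b, the risk is an assumed positive-confidence risk with logistic loss, the "predetermined procedure" of step S19 is realized as a scan over a fixed candidate list (whose length plays the role of M), and the data are synthetic. All function names, hyperparameters, and the value φ = 0.16 are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs, rs, k, steps=300, lr=0.1):
    """S12-S13: correct r to r**k, then fit g(x) = w*x + b by gradient
    descent on an assumed positive-confidence risk with logistic loss."""
    weights = [(1.0 - r ** k) / (r ** k) for r in rs]
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, c in zip(xs, weights):
            z = w * x + b
            dz = -sigmoid(-z) + c * sigmoid(z)  # d/dz [l(z) + c * l(-z)]
            gw += dz * x
            gb += dz
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def miss_rate(w, b, xs):
    """Fraction of positive examples with g(x) < 0."""
    return sum(1 for x in xs if w * x + b < 0.0) / len(xs)


def fit_with_correction(xs, rs, phi, candidates, eps=1e-4):
    best_k, best_delta = candidates[0], float("inf")  # S11: k*, Delta*
    for k in candidates:                              # S17/S19: next candidate k
        w, b = train(xs, rs, k)                       # S12, S13
        delta = (phi - miss_rate(w, b, xs)) ** 2      # S14
        if best_delta > delta:                        # S15
            best_delta, best_k = delta, k             # S16
        if delta < eps:                               # S18: small enough, stop
            break
    return best_k, best_delta, train(xs, rs, best_k)  # S20, S21


# Synthetic positives with a deliberately skewed reliability (illustrative only).
rng = random.Random(0)
xs = [rng.gauss(1.0, 1.0) for _ in range(200)]
rs = [max(sigmoid(2.0 * x) ** 2, 0.01) for x in xs]  # floor avoids extreme weights
candidates = [0.25, 0.5, 0.75, 1.0, 1.5]
best_k, best_delta, (w, b) = fit_with_correction(xs, rs, 0.16, candidates)
print(best_k, best_delta, miss_rate(w, b, xs))
```

A candidate scan is used here only because it is the simplest procedure that makes Δ smaller over the iterations; the text leaves the update rule for k open, so a line search or gradient-free optimizer would fit the same loop.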

<Technical effect>
Next, the technical effect of the classification device 10 according to this embodiment is described with reference to FIGS. 3 and 4. FIG. 3 is a conceptual diagram showing the misclassification rate of positive examples. FIG. 4 is a conceptual diagram showing an example of a classification boundary learned by the binary classifier.

As shown in FIG. 3, the misclassification rate φ of positive examples is the proportion of positive examples that, when classified by the classification boundary, are erroneously classified as negative (that is, fall on the opposite side of the classification boundary). According to the research of the present inventors, it has been found that learning the binary classifier g so as to reduce the difference between the precomputed misclassification rate φ of positive examples and the actual misclassification rate of the binary classifier g reduces the influence of bias in the positive reliability r. The classification device 10 according to this embodiment can therefore learn the binary classifier g more appropriately.

In FIG. 4, the classification boundary of the comparative example is the boundary obtained when learning is performed without using the misclassification rate φ of positive examples, unlike this embodiment (for example, when the technique of Non-Patent Document 1 is used as is). As the figure shows, the boundary of the comparative example classifies some of the positive examples (black circles in the figure) as negative. In contrast, the classification boundary according to this embodiment (that is, the boundary learned using the misclassification rate φ of positive examples) separates the positive examples from the negative examples accurately. In this way, the classification device 10 according to this embodiment can improve the accuracy of classification into positive and negative examples.

<Specific application example>
Next, a specific application example of the classification device 10 according to this embodiment is described with reference to FIG. 5. FIG. 5 is a table showing the results of predicting driver drowsiness using the classification device according to the embodiment.

The classification device 10 according to this embodiment can be applied to a device that predicts the drowsiness of a vehicle driver (specifically, a device that classifies the driver's state as not drowsy or drowsy).

In FIG. 5, the device predicts the drowsiness of drivers 1 to 3 using seven features calculated from the R-R intervals of the electrocardiogram. The comparative example applies a device that, unlike this embodiment, does not use the misclassification rate φ of positive examples (for example, a device to which Non-Patent Document 1 is applied as is); it predicts the awake state (that is, no drowsiness) in every case and cannot produce a value indicating drowsiness. In contrast, the device to which the classification device 10 according to this embodiment is applied produces a value indicating drowsiness for each driver. Moreover, the calculated values do not differ greatly from those obtained by supervised learning with labeled training data. These results show that the classification device 10 according to this embodiment provides a beneficial effect compared with the comparative example.

In the drowsiness prediction device described above, positive examples (data indicating a non-drowsy state) are easy to collect as training data, whereas negative examples (data indicating a drowsy state) are difficult to collect. This is because collecting data indicating a drowsy state would require a drowsy driver to drive the vehicle, which may cause a safety problem. With the classification device 10 according to this embodiment, however, an appropriate classification boundary can be learned without using negative examples, as already described.

The classification device 10 according to this embodiment is particularly effective in situations where, as in the above example, positive examples are easy to obtain but negative examples are difficult to obtain. For example, it is also useful when a company wants to evaluate the retention probability of its new users but holds only the data and loyalty scores of its own users. In this case, if the proportion of users who leave each year is known, applying the classification device 10 according to this embodiment enables accurate binary classification.

<Supplementary notes>
The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary note 1)
The classification device according to Supplementary note 1 is a classification device that classifies a plurality of data into positive examples and negative examples, comprising: learning means for learning a boundary that separates the positive examples from the negative examples, based on the positive examples and a positive reliability, which is the reliability of the positive examples; correction means for correcting the positive reliability using a correction parameter; and updating means for updating the correction parameter so as to reduce the difference, for a misclassification rate that is the probability that a positive example is erroneously classified as a negative example, between a predetermined value obtained in advance and the actual value obtained when the boundary is used.

According to the classification device of Supplementary note 1, the positive reliability is corrected by a correction parameter that is updated based on the misclassification rate. The influence of bias and the like in the positive reliability can therefore be reduced, and the boundary between positive and negative examples can be learned appropriately.

(Supplementary note 2)
The classification device according to Supplementary note 2 is the classification device according to Supplementary note 1, wherein the updating means calculates the actual value using cross validation.

According to the classification device of Supplementary note 2, the correction parameter can be updated appropriately using cross validation.
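The cross-validated estimate of the actual misclassification rate can be sketched as follows. This is a minimal illustration, assuming an interleaved K-fold split over the positive examples and a caller-supplied learner; the names `cross_validated_miss_rate`, `train_fn`, and `fixed_boundary` are hypothetical, and the fixed-boundary learner exists only to make the example self-contained.

```python
def cross_validated_miss_rate(xs, rs, train_fn, n_folds=5):
    """Average held-out rate at which positive examples fall on the negative side."""
    n = len(xs)
    if n < n_folds:
        raise ValueError("need at least one example per fold")
    rates = []
    for fold in range(n_folds):
        # Interleaved split: example i belongs to fold i % n_folds.
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        g = train_fn([xs[i] for i in kept], [rs[i] for i in kept])
        missed = sum(1 for i in held_out if g(xs[i]) < 0.0)
        rates.append(missed / len(held_out))
    return sum(rates) / len(rates)


def fixed_boundary(train_xs, train_rs):
    # Hypothetical learner that ignores its inputs and returns g(x) = x - 1.
    return lambda x: x - 1.0


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
rs = [0.9] * len(xs)
print(cross_validated_miss_rate(xs, rs, fixed_boundary))
```

Evaluating on held-out folds keeps the estimate of the actual rate from being computed on the same examples that shaped the boundary, which is the usual motivation for cross validation.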

(Supplementary note 3)
The classification device according to Supplementary note 3 is the classification device according to Supplementary note 1 or 2, further comprising a predetermined value calculation unit that calculates the predetermined value.

According to the classification device of Supplementary note 3, the predetermined value of the misclassification rate can be calculated in advance.

(Supplementary note 4)
The classification method according to Supplementary note 4 is a classification method for classifying a plurality of data into positive examples and negative examples, comprising: a learning step of learning a boundary that separates the positive examples from the negative examples, based on the positive examples and a positive reliability, which is the reliability of the positive examples; a correction step of correcting the positive reliability using a correction parameter; and an updating step of updating the correction parameter so as to reduce the difference, for a misclassification rate that is the probability that a positive example is erroneously classified as a negative example, between a predetermined value obtained in advance and the actual value obtained when the boundary is used.

According to the classification method of Supplementary note 4, as with the classification device of Supplementary note 1, the influence of bias and the like in the positive reliability can be reduced. The boundary between positive and negative examples can therefore be learned appropriately.

The present invention is not limited to the embodiments described above and may be modified as appropriate within a range that does not contradict the gist or idea of the invention that can be read from the claims and the specification as a whole. Classification devices and classification methods involving such modifications are also included in the technical scope of the present invention.

10 Classification device
100 Recording unit
200 Learning unit
210 Classifier learning unit
220 Parameter adjustment unit
x Data
r Positive reliability
φ Misclassification rate of positive examples
g Binary classifier
k Correction parameter
Δ Squared error

Claims

A classification device that classifies a plurality of data into positive examples and negative examples, comprising:
learning means for learning a boundary for classifying the positive example and the negative example based on the positive example and a positive reliability that is a reliability of the positive example;
correction means for correcting the positive reliability using a correction parameter; and
updating means for updating the correction parameter so that, for a misclassification rate that is the probability that the positive example is erroneously classified as the negative example, the difference between a predetermined value obtained in advance and an actual value obtained when the boundary is used becomes small.

The classification device according to claim 1, wherein the updating means calculates the actual value using cross validation.

The classification device according to claim 1 or 2, further comprising a predetermined value calculation unit that calculates the predetermined value.

A classification method for classifying a plurality of data into positive examples and negative examples, comprising:
a learning step of learning a boundary for classifying the positive example and the negative example based on the positive example and a positive reliability that is a reliability of the positive example;
a correction step of correcting the positive reliability using a correction parameter; and
an updating step of updating the correction parameter so that, for a misclassification rate that is the probability that the positive example is erroneously classified as the negative example, the difference between a predetermined value obtained in advance and an actual value obtained when the boundary is used becomes small.