CN104820839A - Controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates

Info

Publication number: CN104820839A
Application number: CN201510202168.5A
Authority: CN
Inventors: 蒋方纯
Original assignee: Shenzhen Institute of Information Technology
Current assignee: Shenzhen Institute of Information Technology
Priority date: 2015-04-24
Filing date: 2015-04-24
Publication date: 2015-08-05

Abstract

The invention belongs to the field of machine learning and provides a controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates. The algorithm includes the following steps: a binary classifier is trained on a sample training set, the training set is classified with the binary classifier, and each classification result is converted into an output score; starting from the origin, a candidate threshold is increased in preset equidistant steps, the positive-example accuracy rate is calculated and compared with a preset positive-example accuracy rate to obtain a positive-example threshold t1, and the negative-example accuracy rate is likewise calculated and compared with a preset negative-example accuracy rate to obtain a negative-example threshold -t2, the two thresholds forming a threshold range (-t2, t1); and unknown samples are assigned to a rejection region or an acceptance region according to whether their output scores fall within the threshold range. The algorithm offers precise and flexible confidence control.

Description

Controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates

Technical field

The invention belongs to the field of machine learning and relates in particular to a controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates.

Background

A confidence machine is a learner that, during machine learning, also attaches a degree of confidence to each learning result, or allows the quality of the classification to be preset. Confidence machines have significant practical value in high-risk applications such as medical diagnosis. They are a comparatively young branch of machine learning, and the theoretical foundations and methods for realizing them are still few. Some methods construct the confidence directly; others construct it indirectly; still others add a reject option so that a preset classification quality can be met: the low-confidence part is excluded, which raises the confidence of the remainder, achieves confident classification, and keeps the misclassification rate under control.

In 2005, Vladimir Vovk, Alexander Gammerman and Glenn Shafer published the monograph "Algorithmic Learning in a Random World" on confidence machine learning. In 2004, Qiu Dehong et al. published "Confidence learning machine based on algorithmic randomness theory and strangeness description" in the Journal of Computer Research and Development, Vol. 41, No. 9; drawing on Kolmogorov's theory of algorithmic randomness, it establishes a confidence mechanism for learning machines and describes the algorithm of a confidence learning machine.

Existing schemes have the following problems:

(1) The precision of confidence control is insufficient. The confidence learning machine methods mentioned above calculate the misclassification rate by setting up bins and set the thresholds from the result, but the positive- and negative-example accuracy finally achieved can differ considerably from the preset values.

(2) Confidence control is not flexible enough. The bin-based method is restrictive: values cannot be set arbitrarily, so the control cannot be adapted to different requirements.

Summary of the invention

The object of the embodiments of the present invention is to provide a controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates, which addresses the insufficient precision and insufficient flexibility of confidence control in the prior art.

In one aspect, the embodiments of the present invention provide a controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates, the method comprising the following steps:

Receive a training set (Train Set) formed from binary training data samples and binary training sample labels;

Train a binary classifier on the training set to obtain the binary classifier's parameter values;

Classify the training set with the binary classifier, and convert each classification result into an output score;

Starting from the origin and increasing in preset equidistant steps, calculate the positive-example accuracy rate and compare it with the preset positive-example accuracy rate to obtain the positive-example threshold t1; likewise calculate the negative-example accuracy rate and compare it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);

Obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output score;

If the output score of the unknown binary sample falls within the threshold range, assign the sample to the rejection region; if it does not, assign the sample to the acceptance region.

Optionally, the step of starting from the origin, increasing in preset equidistant steps, calculating the positive-example accuracy rate and comparing it with the preset positive-example accuracy rate to obtain the positive-example threshold t1 comprises:

11. Starting from the origin, calculate the positive-example accuracy rate from the output scores converted from the classification results;

12. If the calculated positive-example accuracy rate is greater than or equal to the preset positive-example accuracy rate, take the point at which it was calculated as the positive-example threshold t1;

13. If the calculated positive-example accuracy rate is less than the preset positive-example accuracy rate, advance by the preset positive-example step and return to step 11.
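The positive-example search above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the source does not define the accuracy formula, so the sketch assumes it is the fraction of true positives among training samples whose score is at or above the candidate threshold. The function name `find_positive_threshold`, the label encoding (+1 / -1) and the stopping rule for an empty accepted set are all hypothetical.

```python
from typing import Sequence


def find_positive_threshold(scores: Sequence[float], labels: Sequence[int],
                            target: float, step: float) -> float:
    """Return the first multiple of `step` whose accepted samples reach `target`."""
    if step <= 0:
        raise ValueError("step must be positive")
    k = 0
    while True:
        t = k * step  # step 11: the search starts at the origin (k = 0)
        # Assumption: a sample counts as an accepted positive when score >= t.
        accepted = [y for s, y in zip(scores, labels) if s >= t]
        if not accepted:
            return t  # assumption: stop once no training sample lies at or above t
        accuracy = sum(1 for y in accepted if y == 1) / len(accepted)
        if accuracy >= target:  # step 12: the preset rate is met
            return t
        k += 1  # step 13: advance by one step and recompute


# Toy output scores and labels (+1 = positive example, -1 = negative example).
scores = [-2.0, -1.0, -0.5, 0.2, 0.4, 1.0, 2.0]
labels = [-1, -1, 1, -1, 1, 1, 1]
t1 = find_positive_threshold(scores, labels, target=1.0, step=0.25)
```

Computing the candidate as `k * step` rather than adding `step` repeatedly avoids accumulating floating-point error over many iterations; a smaller step gives a threshold closer to the point where the preset rate is first met, at the cost of more iterations.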

Optionally, the step of starting from the origin, increasing in preset equidistant steps, calculating the negative-example accuracy rate and comparing it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2 comprises:

21. Starting from the origin, calculate the negative-example accuracy rate from the possible cases of the output scores converted from the classification results;

22. If the calculated negative-example accuracy rate is greater than or equal to the preset negative-example accuracy rate, take the point at which it was calculated as the negative-example threshold -t2;

23. If the calculated negative-example accuracy rate is less than the preset negative-example accuracy rate, advance by the preset negative-example step and return to step 21.
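The source does not write out formulas for the two accuracy rates. One plausible reading, stated here as an assumption, takes each rate as the fraction of correctly labelled samples among those a candidate threshold would accept. The symbols below are notation introduced for this sketch only: s_i is the output score of training sample i, y_i its label (+1 or -1), p_p and n_p the preset positive- and negative-example accuracy rates, and p_s and n_s the corresponding steps.

```latex
P_{+}(t) = \frac{\left|\{\, i : s_i \ge t,\ y_i = +1 \,\}\right|}{\left|\{\, i : s_i \ge t \,\}\right|},
\qquad
P_{-}(t) = \frac{\left|\{\, i : s_i \le -t,\ y_i = -1 \,\}\right|}{\left|\{\, i : s_i \le -t \,\}\right|}

t_1 = \min\{\, k\,p_s : k = 0, 1, 2, \dots,\ P_{+}(k\,p_s) \ge p_p \,\},
\qquad
t_2 = \min\{\, k\,n_s : k = 0, 1, 2, \dots,\ P_{-}(k\,n_s) \ge n_p \,\}
```

Under this reading both searches move outward from the origin in opposite directions, and the two rates are undefined once no training sample remains beyond the candidate threshold, so an implementation needs an explicit stopping rule for that case.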

In the embodiments of the present invention, the positive- and negative-example accuracy rates and the equidistant steps are set separately, so values can be chosen as needed and control is flexible; adjusting the equidistant step allows higher-precision confidence control.

Brief description of the drawings

Fig. 1 is a flow chart of the controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates provided by the invention.

Embodiment

To make the object, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawing and the embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

A specific embodiment of the invention provides a controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates. The method is performed by a confidence machine and, as shown in Fig. 1, comprises the following steps:

In step S101, receive a training set (Train Set) formed from binary training data samples and binary training sample labels;

In step S102, train a binary classifier on the training set to obtain the binary classifier's parameter values;

In step S103, classify the training set with the binary classifier, and convert each classification result into an output score;

In step S104, starting from the origin and increasing in preset equidistant steps, calculate the positive-example accuracy rate and compare it with the preset positive-example accuracy rate to obtain the positive-example threshold t1; likewise calculate the negative-example accuracy rate and compare it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);

In step S105, obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output score;

In step S106, if the output score of the unknown binary sample falls within the threshold range, assign the sample to the rejection region; if it does not, assign the sample to the acceptance region.

Because this scheme sets the positive- and negative-example accuracy rates and the equidistant steps separately, values can be chosen as needed and control is flexible; adjusting the equidistant step allows higher-precision confidence control.

Preferably, the negative-example threshold -t2 is obtained, starting from the origin and increasing in preset equidistant steps, by calculating the negative-example accuracy rate and comparing it with the preset negative-example accuracy rate, following steps 21 to 23 described above.

The technical scheme provided by the invention improves control precision and realizes a flexibly controllable confidence machine for binary (two-class) problems. The algorithm adapts to different settings and application requirements, and is also easy to generalize to multi-class classification problems. The method has been verified on several experimental data sets, such as heart disease and diabetes, with good experimental results.

The software algorithm flow of the technical scheme provided by the invention is as follows:

Training algorithm flow:

Input

X: binary training data samples

Y: binary training sample labels

Train Set: (X, Y)

pp: preset positive-example accuracy rate

ps: positive-example step

np: preset negative-example accuracy rate

ns: negative-example step

Output

Positive-example threshold: t1

Negative-example threshold: -t2

Process

1. Train the binary classifier on Train Set to obtain its parameter values

2. Classify the training set Train Set with the binary classifier

3. Convert the classification results into output scores

4. Evaluate the four cases

5. Calculate the positive-example accuracy rate

6. if positive-example accuracy rate >= pp

goto 9

endif

7. Advance by the step ps

8. goto 4

9. Evaluate the four cases

10. Calculate the negative-example accuracy rate

11. if negative-example accuracy rate >= np

goto 14

endif

12. Advance by the step ns

13. goto 9

14. Output t1 and -t2

15. End
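The training flow can be sketched end to end in Python. This is an illustrative sketch under several assumptions, not the patent's implementation: the patent does not name a classifier, so a toy one-dimensional midpoint classifier stands in for it; the accuracy rate is assumed to be the fraction of correctly labelled samples among those beyond the candidate threshold; and every function name here (`train_midpoint_classifier`, `output_score`, `search`, `train_confidence_machine`) is hypothetical. The parameter `np_` carries a trailing underscore only to avoid confusion with the common NumPy alias.

```python
from statistics import mean


def train_midpoint_classifier(xs, ys):
    """Toy stand-in classifier: returns (midpoint of class means, direction)."""
    pos = mean(x for x, y in zip(xs, ys) if y == 1)
    neg = mean(x for x, y in zip(xs, ys) if y == -1)
    return (pos + neg) / 2, (1.0 if pos >= neg else -1.0)


def output_score(model, x):
    """Signed distance from the midpoint; positive means 'looks positive'."""
    mid, direction = model
    return direction * (x - mid)


def search(scores, ys, wanted, target, step):
    """Walk outward from the origin in equal steps until `target` is met.

    wanted = +1 searches the positive side, wanted = -1 the negative side
    (the scores are mirrored so that one loop serves both directions).
    """
    k = 0
    while True:
        t = k * step
        accepted = [y for s, y in zip(scores, ys) if wanted * s >= t]
        # Assumption: stop when nothing is left beyond t or the rate is met.
        if not accepted or sum(y == wanted for y in accepted) / len(accepted) >= target:
            return t
        k += 1


def train_confidence_machine(xs, ys, pp, ps, np_, ns):
    model = train_midpoint_classifier(xs, ys)        # step 1
    scores = [output_score(model, x) for x in xs]    # steps 2-3
    t1 = search(scores, ys, 1, pp, ps)               # steps 4-8
    t2 = search(scores, ys, -1, np_, ns)             # steps 9-13
    return model, t1, -t2                            # step 14


xs = [0.0, 1.0, 2.0, 5.0, 3.0, 6.0, 7.0, 8.0]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
model, t1, neg_t2 = train_confidence_machine(xs, ys, pp=1.0, ps=0.5, np_=1.0, ns=1.0)
```

On this toy data the two class means are 6.0 and 2.0, so the midpoint is 4.0; one negative sample scores +1.0 and one positive sample scores -1.0, which forces both thresholds away from the origin. The coarser negative step (1.0 versus 0.5) makes the negative threshold land further out than the positive one, which illustrates how the step size trades precision for fewer iterations.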

Classification algorithm flow:

Input

x: unknown sample

Output

The class of the unknown sample

or

Manual handling of the unknown sample

Process

1. Classify the unknown sample x with the binary classifier

2. Convert the classification result into an output score

3. if score <= -t2 or score >= t1

Output the class of the unknown sample

else

Manual handling

endif

4. End
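The decision in step 3 can be written as a small Python function. This is a minimal sketch: the function name `classify_with_rejection`, the +1 / -1 class encoding and the use of `None` to signal manual handling are illustrative assumptions, not part of the source.

```python
from typing import Optional


def classify_with_rejection(score: float, t1: float, neg_t2: float) -> Optional[int]:
    """Return +1 or -1 for an accepted sample, or None when it is rejected."""
    if score >= t1:
        return 1       # acceptance region on the positive side
    if score <= neg_t2:
        return -1      # acceptance region on the negative side
    return None        # score lies inside (-t2, t1): hand over for manual handling


# Example thresholds t1 = 1.5 and -t2 = -2.0 (illustrative values).
decisions = [classify_with_rejection(s, 1.5, -2.0) for s in (2.0, 0.3, -2.0)]
```

Scores exactly equal to a threshold are accepted, matching the `<=` and `>=` comparisons in step 3; only scores strictly inside the open range (-t2, t1) are rejected.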

Experimental data obtained with the scheme of the present invention are as follows:

Table 1: Information on the data sets used in the experiments

The results of running the algorithm are shown in Table 2.

Table 2: Algorithm execution results (unit: %)

After the thresholds are found by this algorithm, the resulting misclassification rate is closer to, or consistent with, the set value.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A controllable confidence machine algorithm based on separately set positive-example and negative-example accuracy rates, characterized in that the method comprises the following steps:

2. The method according to claim 1, characterized in that the step of starting from the origin, increasing in preset equidistant steps, calculating the positive-example accuracy rate and comparing it with the preset positive-example accuracy rate to obtain the positive-example threshold t1 comprises:

3. The method according to claim 1, characterized in that the step of starting from the origin, increasing in preset equidistant steps, calculating the negative-example accuracy rate and comparing it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2 comprises: