CN104820839A - Respective positive and negative example correct rate setting-based controllable confidence machine algorithm - Google Patents
Respective positive and negative example correct rate setting-based controllable confidence machine algorithm
- Publication number
- CN104820839A (application CN201510202168.5A)
- Authority
- CN
- China
- Prior art keywords
- accuracy rate
- negative
- threshold value
- positive example
- positive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2111—Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Physiology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of machine learning and provides a controllable confidence machine algorithm based on separately setting positive-example and negative-example accuracy rates. The algorithm comprises the following steps: a binary classifier is trained on a training sample set, the training set is classified with this binary classifier, and the classification results are converted into output values (scores); starting from the origin and advancing by a preset equidistant step length, the positive-example accuracy rate is calculated and compared with a preset positive-example accuracy rate to obtain a positive-example threshold t1, and the negative-example accuracy rate is calculated and compared with a preset negative-example accuracy rate to obtain a negative-example threshold -t2; the two thresholds form the threshold range (-t2, t1); unknown samples are then assigned according to this threshold range. The technical scheme offers precise and flexible control.
Description
Technical field
The invention belongs to the field of machine learning and relates in particular to a controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates.
Background technology
A confidence machine is a learning process that, while producing a learning result, also provides a credible degree of confidence for that result, or that can classify at a preset level of confidence in the learning outcome. Confidence machines are of great practical significance in high-risk applications such as medical diagnosis. They form a relatively young branch of machine learning, and the theoretical foundations and methods for realizing confidence learning machines are still few. Some methods construct the confidence directly, others construct it indirectly, and others introduce a reject option: classification is carried out at a preset level, the low-confidence part is excluded so that the confidence of the remainder rises, and confidence classification with control of the misclassification rate is thereby achieved.
In 2005, Vladimir Vovk, Alexander Gammerman and Glenn Shafer published the monograph "Algorithmic Learning in a Random World" on trustworthy machine learning. In 2004, Qiu Dehong et al. published a paper on confidence learning machines based on algorithmic randomness theory and the description of unusualness in the Journal of Computer Research and Development, Vol. 41, No. 9. Building on Kolmogorov's algorithmic randomness theory, that paper establishes a confidence mechanism for learning machines and describes the algorithm of the confidence learning machine.
The existing schemes have the following problems:
(1) The precision of confidence control is insufficient. The confidence learning machine methods described above compute the misclassification rate by setting bins and choose thresholds from the result, but the positive- and negative-example accuracy rates finally achieved can sometimes differ greatly from the values originally preset.
(2) Confidence control is not flexible enough. The bin-setting method has inherent restrictions: values cannot be set arbitrarily, so it cannot provide the flexible, varied control needed to meet different requirements.
Summary of the invention
The object of the embodiments of the present invention is to provide a controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates, which solves the prior-art problems that confidence control is insufficiently precise and insufficiently flexible.
The embodiments of the present invention are implemented as follows. In one aspect, a controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates is provided; the method comprises the following steps:
Receive a training set Train Set formed from binary training data samples and binary training sample labels;
Train a binary classifier on the training set Train Set to obtain the binary classifier's parameter values;
Classify the training set Train Set with the binary classifier and convert the classification results into output values (Output score);
Starting from the origin and increasing step by step by a preset equidistant step length, calculate the positive-example accuracy rate and compare it with the preset positive-example accuracy rate to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy rate and compare it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
Obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output value (Output score);
If the Output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if it falls outside the threshold range, assign the unknown sample to the acceptance region.
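The last step reduces to a simple decision rule on the Output score. The following is a minimal sketch, assuming labels are encoded as +1/-1 and that the threshold range (-t2, t1) is an open interval, so a score lying exactly on t1 or -t2 is accepted; the function name `assign` is hypothetical and this is an illustrative reading, not code from the patent.

```python
from typing import Optional


def assign(score: float, t1: float, t2: float) -> Optional[int]:
    """Hypothetical helper: map an Output score to a region.

    The threshold range (-t2, t1) is treated as an open interval, so a score
    equal to t1 or -t2 is accepted. Returns +1 or -1 for the acceptance
    region and None for the rejection region.
    """
    if score >= t1:
        return 1      # acceptance region, positive side
    if score <= -t2:
        return -1     # acceptance region, negative side
    return None       # rejection region: -t2 < score < t1


if __name__ == "__main__":
    for s in (0.7, 0.1, -0.4):
        print(s, assign(s, t1=0.5, t2=0.3))
```

Returning None rather than a third label keeps the rejected samples easy to route elsewhere, for example to manual review.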
Optionally, the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the positive-example accuracy rate, comparing it with the preset positive-example accuracy rate and obtaining the positive-example threshold t1 comprises:
11. Starting from the origin, calculate the positive-example accuracy rate from the Output scores converted from the classification results;
12. If the calculated positive-example accuracy rate is greater than or equal to the preset positive-example accuracy rate, take the point corresponding to the currently calculated positive-example accuracy rate as the positive-example threshold t1;
13. If the calculated positive-example accuracy rate is less than the preset positive-example accuracy rate, increase the threshold by the predetermined positive-example step length and return to step 11.
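The patent does not define the positive-example accuracy rate explicitly. One plausible reading, stated here as an assumption, is the fraction of training samples whose score is at or above the current threshold that are truly positive. With scores s_i, labels y_i in {+1, -1}, preset accuracy p_p and step length p_s (the pp and ps of the training flow), steps 11 to 13 then select:

```latex
A^{+}(t) = \frac{\lvert\{\, i : s_i \ge t,\ y_i = +1 \,\}\rvert}{\lvert\{\, i : s_i \ge t \,\}\rvert},
\qquad
t_1 = \min\{\, k\,p_s : k = 0, 1, 2, \dots,\ A^{+}(k\,p_s) \ge p_p \,\}
```

If the target is never reached before no sample remains above the threshold, the loop as written does not terminate, so a practical implementation needs an upper bound on k.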
Optionally, the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the negative-example accuracy rate, comparing it with the preset negative-example accuracy rate and obtaining the negative-example threshold -t2 comprises:
21. Starting from the origin, calculate the negative-example accuracy rate from the possible cases of the Output scores converted from the classification results;
22. If the calculated negative-example accuracy rate is greater than or equal to the preset negative-example accuracy rate, determine the negative-example threshold -t2 from the point corresponding to the currently calculated negative-example accuracy rate;
23. If the calculated negative-example accuracy rate is less than the preset negative-example accuracy rate, increase the threshold by the predetermined negative-example step length and return to step 21.
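Under the same assumption as for the positive side, the negative-example accuracy rate is read as the fraction of training samples with score at or below -t that are truly negative. With preset accuracy n_p and step length n_s (the np and ns of the training flow), steps 21 to 23 select t_2, and the rejection region is the interval between the two thresholds:

```latex
A^{-}(t) = \frac{\lvert\{\, i : s_i \le -t,\ y_i = -1 \,\}\rvert}{\lvert\{\, i : s_i \le -t \,\}\rvert},
\qquad
t_2 = \min\{\, k\,n_s : k = 0, 1, 2, \dots,\ A^{-}(k\,n_s) \ge n_p \,\},
\qquad
\text{rejection region} = (-t_2,\ t_1)
```

Because the two searches are independent, t_1 and t_2 generally differ, which is what allows the two sides to be controlled separately.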
In the embodiments of the present invention, separately setting the positive- and negative-example accuracy rates and the equidistant step lengths allows the values to be chosen freely as required, giving flexible control, and adjusting the equidistant step length allows more precise confidence control.
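The claim that a smaller step length gives more precise control can be illustrated on toy data. This sketch assumes the positive-example accuracy rate is the fraction of true positives among samples with score >= t; the data, the name `positive_threshold`, and the empty-side stopping rule are hypothetical additions for illustration.

```python
from typing import List, Optional, Tuple

# Toy, made-up training scores and labels (not data from the patent).
SCORES: List[float] = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
LABELS: List[int] = [-1, 1, -1, 1, -1, 1, 1, 1, 1, 1]


def positive_threshold(scores: List[float], labels: List[int],
                       target: float, step: float) -> Tuple[float, Optional[float]]:
    """Walk t = 0, step, 2*step, ... until the accuracy among samples with
    score >= t reaches target. Returns (t1, achieved accuracy); the accuracy
    is None if the positive side became empty first."""
    if step <= 0:
        raise ValueError("step must be positive")
    k = 0
    while True:
        # Computing t from k (instead of repeatedly adding step) and rounding
        # avoids accumulated floating-point drift.
        t = round(k * step, 10)
        side = [y for s, y in zip(scores, labels) if s >= t]
        if not side:
            return t, None
        acc = sum(1 for y in side if y == 1) / len(side)
        if acc >= target:
            return t, acc
        k += 1


if __name__ == "__main__":
    for step in (0.5, 0.1):
        t1, acc = positive_threshold(SCORES, LABELS, target=0.8, step=step)
        print(f"step={step}: t1={t1}, accuracy={acc:.3f}")
```

With the coarse step the search jumps from 0 straight to 0.5 and stops at an accuracy of 1.0, well above the 0.8 target, accepting only 5 samples on the positive side. With the finer step it stops at t1 = 0.3 with accuracy 6/7 (about 0.857), closer to the target and accepting 7 samples.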
Accompanying drawing explanation
Fig. 1 is a flowchart of the controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates provided by the invention.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
A specific embodiment of the invention provides a controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates. The method is executed by a confidence machine and, as shown in Fig. 1, comprises the following steps:
In step S101, receive a training set Train Set formed from binary training data samples and binary training sample labels;
In step S102, train a binary classifier on the training set Train Set to obtain the binary classifier's parameter values;
In step S103, classify the training set Train Set with the binary classifier and convert the classification results into output values (Output score);
In step S104, starting from the origin and increasing step by step by a preset equidistant step length, calculate the positive-example accuracy rate and compare it with the preset positive-example accuracy rate to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy rate and compare it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
In step S105, obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output value (Output score);
In step S106, if the Output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if it falls outside the threshold range, assign the unknown sample to the acceptance region.
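The patent does not say which binary classifier is used in steps S102, S103 and S105, only that its result is converted into a real-valued Output score. As a swapped-in, hypothetical choice, the sketch below uses a nearest-centroid classifier whose signed score is the difference of distances to the two class means; any classifier with a signed real-valued output (for example an SVM decision value) could play the same role. The names `train_centroids` and `output_score` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(X: List[Vector], y: List[int]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for step S102: the 'parameters' are the mean
    vectors of the positive (+1) and negative (-1) training samples."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    if not pos or not neg:
        raise ValueError("the training set must contain both classes")

    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    return mean(pos), mean(neg)


def output_score(x: Vector, params: Tuple[List[float], List[float]]) -> float:
    """Steps S103/S105: signed Output score, > 0 when x is nearer the positive
    centroid, < 0 when nearer the negative one, 0 on the decision boundary."""
    mu_pos, mu_neg = params
    return math.dist(x, mu_neg) - math.dist(x, mu_pos)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
    y = [-1, -1, 1, 1]
    params = train_centroids(X, y)
    train_scores = [output_score(x, params) for x in X]  # S103 on Train Set
    print(train_scores)
    print(output_score([2.0, 2.5], params))              # S105 on an unknown sample
```

Scores centred on zero fit the threshold search well, since it starts at the origin and moves outward on each side. `math.dist` requires Python 3.8 or later.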
In the specific scheme provided by the invention, separately setting the positive- and negative-example accuracy rates and the equidistant step lengths allows the values to be chosen freely as required, giving flexible control, and adjusting the equidistant step length allows more precise confidence control.
Optionally, the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the positive-example accuracy rate, comparing it with the preset positive-example accuracy rate and obtaining the positive-example threshold t1 comprises:
11. Starting from the origin, calculate the positive-example accuracy rate from the Output scores converted from the classification results;
12. If the calculated positive-example accuracy rate is greater than or equal to the preset positive-example accuracy rate, take the point corresponding to the currently calculated positive-example accuracy rate as the positive-example threshold t1;
13. If the calculated positive-example accuracy rate is less than the preset positive-example accuracy rate, increase the threshold by the predetermined positive-example step length and return to step 11.
Preferably, the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the negative-example accuracy rate, comparing it with the preset negative-example accuracy rate and obtaining the negative-example threshold -t2 comprises:
21. Starting from the origin, calculate the negative-example accuracy rate from the possible cases of the Output scores converted from the classification results;
22. If the calculated negative-example accuracy rate is greater than or equal to the preset negative-example accuracy rate, determine the negative-example threshold -t2 from the point corresponding to the currently calculated negative-example accuracy rate;
23. If the calculated negative-example accuracy rate is less than the preset negative-example accuracy rate, increase the threshold by the predetermined negative-example step length and return to step 21.
The technical scheme provided by the invention improves control accuracy and realizes a flexibly controllable confidence machine for two-class problems. The algorithm adapts to the needs of different settings and meets different application requirements, and it is also easy to generalize to multi-class classification problems. The method was validated on several experimental data sets, such as heart disease and diabetes, with good experimental results.
The software flow of the technical scheme provided by the invention is as follows:
Training algorithm flow:
Input
X: binary training data samples
Y: binary training sample labels
Train Set: (X, Y)
pp: preset positive-example accuracy rate
ps: positive-example step length
np: preset negative-example accuracy rate
ns: negative-example step length
Output
Positive-example threshold: t1
Negative-example threshold: -t2
Process
1. Train the binary classifier on Train Set and obtain its parameter values
2. Classify Train Set with the binary classifier
3. Convert the classification results to score outputs
4. Determine which of the four possible cases each sample falls into
5. Calculate the positive-example accuracy rate
6. if positive-example accuracy rate >= pp
  goto 9
endif
7. Increase the threshold by step length ps
8. goto 4
9. Determine which of the four possible cases each sample falls into
10. Calculate the negative-example accuracy rate
11. if negative-example accuracy rate >= np
  goto 14
endif
12. Increase the threshold by step length ns
13. goto 9
14. Output t1 and -t2
15. End
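The training flow above can be sketched as runnable Python, starting from the scores produced by steps 1 to 3. This is a minimal sketch under stated assumptions: the positive-example accuracy rate is the share of true positives among samples with score >= t, the negative-example accuracy rate is the share of true negatives among samples with score <= -t, and the search stops when a side becomes empty (a guard the pseudocode lacks). The names `side_accuracy`, `search` and `train_thresholds` are hypothetical.

```python
from typing import List, Optional, Tuple


def side_accuracy(scores: List[float], labels: List[int], t: float,
                  positive: bool) -> Optional[float]:
    """Accuracy on one side of the origin at threshold t (assumed definition).

    Positive side: samples with score >= t, correct if label == +1.
    Negative side: samples with score <= -t, correct if label == -1.
    Returns None when no sample falls on that side.
    """
    if positive:
        side = [y for s, y in zip(scores, labels) if s >= t]
    else:
        side = [y for s, y in zip(scores, labels) if s <= -t]
    if not side:
        return None
    wanted = 1 if positive else -1
    return sum(1 for y in side if y == wanted) / len(side)


def search(scores: List[float], labels: List[int], target: float,
           step: float, positive: bool) -> float:
    """Steps 4-8 (positive side) or 9-13 (negative side): grow t from the origin."""
    if step <= 0:
        raise ValueError("step length must be positive")
    k = 0
    while True:
        t = round(k * step, 10)  # equidistant grid 0, step, 2*step, ...
        acc = side_accuracy(scores, labels, t, positive)
        # Stopping on an empty side is an added guard; the pseudocode would loop forever.
        if acc is None or acc >= target:
            return t
        k += 1


def train_thresholds(scores: List[float], labels: List[int], pp: float, ps: float,
                     np_: float, ns: float) -> Tuple[float, float]:
    """Steps 4-14 given the Output scores of steps 1-3. Returns (t1, -t2).

    np_ is the preset negative-example accuracy rate (underscore avoids the
    common numpy alias)."""
    t1 = search(scores, labels, pp, ps, positive=True)
    t2 = search(scores, labels, np_, ns, positive=False)
    return t1, -t2


if __name__ == "__main__":
    # Made-up scores and labels, for illustration only.
    scores = [-0.95, -0.75, -0.55, -0.35, -0.15, 0.15, 0.35, 0.55, 0.75, 0.95]
    labels = [-1, -1, -1, 1, -1, -1, 1, 1, 1, 1]
    print(train_thresholds(scores, labels, pp=0.9, ps=0.1, np_=0.9, ns=0.1))
```

On this toy data the positive side reaches 100% accuracy at t1 = 0.2, while the negative side needs t2 = 0.4 because of the positive sample scored at -0.35, which shows how the two thresholds can end up asymmetric.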
Classification algorithm flow:
Input
x: unknown sample
Output
The class of the unknown sample
or
Manual handling of the unknown sample
Process
1. Classify the unknown sample x with the binary classifier
2. Convert the classification result to a score output
3. if score <= -t2 or score >= t1
  Output the class of the unknown sample
else
  Manual handling
endif
4. End
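Applied to a batch of unknown samples, the classification flow splits the input into automatically labelled samples and a queue for manual handling. In this sketch, `score_fn` is a hypothetical callable standing in for steps 1 and 2 (classifier plus score conversion), `neg_t2` is the value -t2 exactly as the training flow outputs it, and `classify_batch` is an illustrative name.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def classify_batch(samples: Sequence, score_fn: Callable[[object], float],
                   t1: float, neg_t2: float) -> Tuple[List[Optional[int]], List[int]]:
    """Classification flow applied to a batch.

    Returns one label per sample (+1, -1, or None when the sample needs
    manual handling) and the indices queued for manual handling.
    """
    labels: List[Optional[int]] = []
    manual: List[int] = []
    for i, x in enumerate(samples):
        score = score_fn(x)
        if score >= t1:
            labels.append(1)           # accepted as positive
        elif score <= neg_t2:
            labels.append(-1)          # accepted as negative
        else:                          # -t2 < score < t1: rejection region
            labels.append(None)
            manual.append(i)
    return labels, manual


if __name__ == "__main__":
    # Identity score function: the samples here are already scores (toy values).
    print(classify_batch([0.5, 0.1, -0.6, -0.2], lambda x: x, t1=0.2, neg_t2=-0.4))
```

Returning the manual-handling indices separately keeps the review queue explicit instead of forcing callers to scan for None labels.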
The experimental data obtained with the scheme of the invention are as follows:
Table 1 Information on the data sets used in the experiments
The results of executing the algorithm are shown in Table 2.
Table 2 Algorithm execution results (unit: %)
After the algorithm finds the thresholds, the resulting misclassification rates are close to, or equal to, the set values.
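Comparing achieved rates with the set values requires measuring, on labelled data, the misclassification rate on each accepted side together with the share of samples sent to the rejection region. The following is a minimal sketch; the function name `evaluate` and the choice to report the rejection rate alongside the two error rates are assumptions, not taken from the patent.

```python
from typing import Dict, List, Optional


def evaluate(scores: List[float], labels: List[int], t1: float,
             neg_t2: float) -> Dict[str, Optional[float]]:
    """Misclassification rates on the two accepted sides plus the rejection rate.

    A sample is accepted as positive if score >= t1, as negative if
    score <= neg_t2 (neg_t2 is -t2), and rejected otherwise.
    """
    if not scores:
        raise ValueError("need at least one sample")
    pos = [y for s, y in zip(scores, labels) if s >= t1]
    neg = [y for s, y in zip(scores, labels) if s < t1 and s <= neg_t2]
    rejected = len(scores) - len(pos) - len(neg)

    def error(side: List[int], wanted: int) -> Optional[float]:
        # None when nothing was accepted on this side.
        return sum(1 for y in side if y != wanted) / len(side) if side else None

    return {
        "positive_error": error(pos, 1),
        "negative_error": error(neg, -1),
        "rejection_rate": rejected / len(scores),
    }


if __name__ == "__main__":
    # Made-up held-out scores and labels, for illustration only.
    print(evaluate([0.5, 0.3, 0.1, -0.1, -0.5, -0.7], [1, 1, 1, -1, -1, 1],
                   t1=0.2, neg_t2=-0.4))
```

Each side's error can then be compared with one minus its preset accuracy rate (1 - pp and 1 - np).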
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (3)
1. A controllable confidence machine algorithm based on separately setting positive- and negative-example accuracy rates, characterized in that the method comprises the following steps:
Receive a training set Train Set formed from binary training data samples and binary training sample labels;
Train a binary classifier on the training set Train Set to obtain the binary classifier's parameter values;
Classify the training set Train Set with the binary classifier and convert the classification results into output values (Output score);
Starting from the origin and increasing step by step by a preset equidistant step length, calculate the positive-example accuracy rate and compare it with the preset positive-example accuracy rate to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy rate and compare it with the preset negative-example accuracy rate to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
Obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output value (Output score);
If the Output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if it falls outside the threshold range, assign the unknown sample to the acceptance region.
2. The method according to claim 1, characterized in that the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the positive-example accuracy rate, comparing it with the preset positive-example accuracy rate and obtaining the positive-example threshold t1 comprises:
11. Starting from the origin, calculate the positive-example accuracy rate from the Output scores converted from the classification results;
12. If the calculated positive-example accuracy rate is greater than or equal to the preset positive-example accuracy rate, take the point corresponding to the currently calculated positive-example accuracy rate as the positive-example threshold t1;
13. If the calculated positive-example accuracy rate is less than the preset positive-example accuracy rate, increase the threshold by the predetermined positive-example step length and return to step 11.
3. The method according to claim 1, characterized in that the step of, starting from the origin and increasing step by step by the preset equidistant step length, calculating the negative-example accuracy rate, comparing it with the preset negative-example accuracy rate and obtaining the negative-example threshold -t2 comprises:
21. Starting from the origin, calculate the negative-example accuracy rate from the possible cases of the Output scores converted from the classification results;
22. If the calculated negative-example accuracy rate is greater than or equal to the preset negative-example accuracy rate, determine the negative-example threshold -t2 from the point corresponding to the currently calculated negative-example accuracy rate;
23. If the calculated negative-example accuracy rate is less than the preset negative-example accuracy rate, increase the threshold by the predetermined negative-example step length and return to step 21.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510202168.5A CN104820839A (en) | 2015-04-24 | 2015-04-24 | Respective positive and negative example correct rate setting-based controllable confidence machine algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510202168.5A CN104820839A (en) | 2015-04-24 | 2015-04-24 | Respective positive and negative example correct rate setting-based controllable confidence machine algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104820839A true CN104820839A (en) | 2015-08-05 |
Family
ID=53731128
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510202168.5A Pending CN104820839A (en) | 2015-04-24 | 2015-04-24 | Respective positive and negative example correct rate setting-based controllable confidence machine algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104820839A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2797721B2 (en) * | 1991-01-08 | 1998-09-17 | 日本電気株式会社 | Character recognition device |
CN102542291A (en) * | 2011-12-23 | 2012-07-04 | 国网电力科学研究院 | Hyperspectral remote sensing image classification method based on binary decision tree |
CN102722726A (en) * | 2012-06-05 | 2012-10-10 | 江苏省电力公司南京供电公司 | Multi-class support vector machine classification method based on dynamic binary tree |
- 2015-04-24: application CN201510202168.5A filed in CN (publication CN104820839A, pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689034A (en) * | 2018-07-06 | 2020-01-14 | 阿里巴巴集团控股有限公司 | Classifier optimization method and device |
CN110689034B (en) * | 2018-07-06 | 2023-04-07 | 阿里巴巴集团控股有限公司 | Classifier optimization method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
TW201947510A (en) | Insurance service risk prediction processing method, device and processing equipment | |
CN106533742B (en) | Weighting directed complex networks networking method based on time sequence model characterization | |
WO2017124930A1 (en) | Method and device for feature data processing | |
JP2018511870A (en) | Big data processing method for segment-based two-stage deep learning model | |
Li et al. | Improved sparse least-squares support vector machine classifiers | |
JP6172317B2 (en) | Method and apparatus for mixed model selection | |
CN106547899B (en) | Intermittent process time interval division method based on multi-scale time-varying clustering center change | |
Jie et al. | Naive Bayesian classifier based on genetic simulated annealing algorithm | |
CN104820839A (en) | Respective positive and negative example correct rate setting-based controllable confidence machine algorithm | |
CN108021985A (en) | A kind of model parameter training method and device | |
CN115935212A (en) | Adjustable load clustering method and system based on longitudinal trend prediction | |
CN104820838A (en) | Positive and negative example misclassification value percentage setting-based controllable confidence machine algorithm | |
CN104573709A (en) | Controllable confidence machine algorithm based on set total error rate | |
JP2007257295A (en) | Pattern recognition method | |
Li et al. | Parameters selection for support vector machine based on particle swarm optimization | |
CN104598923B (en) | Controllable confidence machine classification process based on score output valve percentages | |
CN108804807B (en) | Method and device for acquiring surface potential | |
CN112365363A (en) | Calculation method for similarity of power load curves | |
CN107958695B (en) | High-precision medicine quantification method based on machine learning | |
Zhao et al. | FCM algorithm based on the optimization parameters of objective function point | |
CN109903753A (en) | More human speech sentence classification methods, equipment, medium and system based on sound source angle | |
CN104317824A (en) | Initial clustering center optimization algorithm based on outlier indexes | |
RU2014130519A (en) | METHOD FOR AUTOMATIC CLUSTERING OBJECTS | |
CN105389359A (en) | Search method and system | |
CN112861130B (en) | Multi-class conversion malicious software detection method from N to N +1 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
EXSB | Decision made by SIPO to initiate substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150805 |