CN104820839A - Respective positive and negative example correct rate setting-based controllable confidence machine algorithm - Google Patents

Info

Publication number
CN104820839A
Authority
CN
China
Prior art keywords
accuracy rate
negative
threshold value
positive example
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510202168.5A
Other languages
Chinese (zh)
Inventor
蒋方纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Information Technology filed Critical Shenzhen Institute of Information Technology
Priority to CN201510202168.5A priority Critical patent/CN104820839A/en
Publication of CN104820839A publication Critical patent/CN104820839A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2111Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physiology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of machine learning and provides a controllable confidence machine algorithm based on separately set accuracy rates for positive and negative examples. The algorithm comprises the following steps: a binary classifier is trained on a training set, the training set is classified with that classifier, and each classification result is converted into an output score; starting from the origin, a threshold is increased step by step in preset equidistant steps, the positive-example accuracy is calculated at each step and compared with a preset positive-example accuracy to obtain a positive-example threshold t1, and the negative-example accuracy is likewise calculated and compared with a preset negative-example accuracy to obtain a negative-example threshold -t2; the two thresholds form a threshold range (-t2, t1); and unknown samples are assigned to a rejection or acceptance region according to that range. The technical scheme offers precise and flexible control of confidence.

Description

Controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates
Technical field
The invention belongs to the field of machine learning and in particular relates to a controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates.
Background art
A confidence machine is a learner that, during machine learning, also attaches a degree of credibility to its results so that they can be judged, or that allows a requirement on the classification of the learning results to be preset. Confidence machines have important practical value in high-risk applications such as medical diagnosis. They are a relatively young branch of machine learning, and the theoretical foundations and methods for building them are still few. Some methods construct the confidence directly and others construct it indirectly; still others add a reject option so that classification can proceed to a preset requirement: low-credibility cases are excluded, which raises the credibility of the remaining cases, achieves confident classification, and keeps the misclassification rate under control.
In 2005, Vladimir Vovk, Alexander Gammerman, and Glenn Shafer published the monograph "Algorithmic Learning in a Random World" on confidence machine learning. In 2004, Qiu Dehong et al. published the paper "Confidence learning machine based on algorithmic randomness theory and strangeness description" in the Journal of Computer Research and Development, Vol. 41, No. 9. Based on Kolmogorov's theory of algorithmic randomness, the paper establishes a confidence mechanism for learning machines and describes the algorithm of the confidence learning machine.
Existing schemes have the following problems:
(1) Confidence control is not precise enough. The confidence learning machine methods above calculate the misclassification rate by setting bins and set thresholds from the result, but the positive- and negative-example accuracy finally achieved can differ considerably from the preset values.
(2) Confidence control is not flexible enough. Bin-based methods are restrictive: values cannot be set arbitrarily, so control cannot be varied flexibly to meet different requirements.
Summary of the invention
The object of the embodiments of the invention is to provide a controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates, which solves the prior-art problems that confidence control is not precise enough and not flexible enough.
The embodiments of the invention are realized as follows. In one aspect, a controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates comprises the following steps:
Receive the training set Train Set formed from the binary training data samples and the binary training sample labels;
Train a binary classifier on the training set Train Set and obtain the parameter values of the binary classifier;
Classify the training set Train Set with the binary classifier and convert the classification results into output scores (Output score);
Starting from the origin and increasing a preset equidistant step length step by step, calculate the positive-example accuracy and compare it with the preset positive-example accuracy to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy and compare it with the preset negative-example accuracy to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
Obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output score (Output score);
If the output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if the output score does not fall within the threshold range, assign the unknown sample to the acceptance region.
Optionally, the step of obtaining the positive-example threshold t1 by increasing a preset equidistant step length step by step from the origin, calculating the positive-example accuracy, and comparing it with the preset positive-example accuracy is:
11. Starting from the origin, calculate the positive-example accuracy from the output scores (Output score) converted from the classification results;
12. If the calculated positive-example accuracy is greater than or equal to the preset positive-example accuracy, take the point corresponding to the currently calculated positive-example accuracy as the positive-example threshold t1;
13. If the calculated positive-example accuracy is less than the preset positive-example accuracy, increase the threshold by the predetermined positive-example step length and return to step 11.
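Steps 11 to 13 can be sketched as a short loop. This is only an illustrative reading, not the patent's implementation: the function name `find_positive_threshold`, the `max_steps` guard, the +1/-1 label encoding, and the definition of positive-example accuracy as the share of true positives among training samples whose score is at least the candidate threshold are all assumptions, since the text does not spell out the formula.

```python
def find_positive_threshold(scores, labels, target_accuracy, step, max_steps=10_000):
    """Scan t = 0, step, 2*step, ... and return the first t whose
    positive-example accuracy reaches target_accuracy (None if never reached).

    Assumption: labels are +1 / -1, and positive-example accuracy at t is the
    fraction of true positives among samples with score >= t.
    """
    for k in range(max_steps + 1):
        t = k * step  # multiply instead of accumulating to limit float drift
        # Labels of the training samples that would be accepted as positive at t.
        accepted = [y for s, y in zip(scores, labels) if s >= t]
        if not accepted:
            return None  # moved past every score without reaching the target
        accuracy = sum(1 for y in accepted if y == 1) / len(accepted)
        if accuracy >= target_accuracy:
            return t
    return None


# Hypothetical toy data: output scores and true labels of a training set.
scores = [0.2, 0.4, 0.9, 1.3, -0.5, -1.1]
labels = [-1, 1, 1, 1, -1, -1]
t1 = find_positive_threshold(scores, labels, target_accuracy=1.0, step=0.25)
```

A smaller step length gives a finer grid of candidate thresholds, which is the sense in which the step length controls precision in this sketch.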
Optionally, the step of obtaining the negative-example threshold -t2 by increasing a preset equidistant step length step by step from the origin, calculating the negative-example accuracy, and comparing it with the preset negative-example accuracy is:
21. Starting from the origin, calculate the negative-example accuracy from the possible cases of the output scores (Output score) converted from the classification results;
22. If the calculated negative-example accuracy is greater than or equal to the preset negative-example accuracy, take the point corresponding to the currently calculated negative-example accuracy as the negative-example threshold -t2;
23. If the calculated negative-example accuracy is less than the preset negative-example accuracy, increase the threshold by the predetermined negative-example step length and return to step 21.
In the embodiments of the invention, the positive- and negative-example accuracy rates and the equidistant step lengths are set separately, so values can be chosen as required and control is flexible; by adjusting the equidistant step length, more precise confidence control can be achieved.
Brief description of the drawings
Fig. 1 is a flowchart of the controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates provided by the invention.
Detailed description
To make the object, technical scheme, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawing and embodiments. The specific embodiments described here only explain the invention and are not intended to limit it.
A specific embodiment of the invention provides a controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates. The method is executed by a confidence machine and, as shown in Fig. 1, comprises the following steps:
In step S101, receive the training set Train Set formed from the binary training data samples and the binary training sample labels;
In step S102, train a binary classifier on the training set Train Set and obtain the parameter values of the binary classifier;
In step S103, classify the training set Train Set with the binary classifier and convert the classification results into output scores (Output score);
In step S104, starting from the origin and increasing a preset equidistant step length step by step, calculate the positive-example accuracy and compare it with the preset positive-example accuracy to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy and compare it with the preset negative-example accuracy to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
In step S105, obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output score (Output score);
In step S106, if the output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if the output score does not fall within the threshold range, assign the unknown sample to the acceptance region.
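Steps S101 to S103 leave the choice of binary classifier and the score conversion open. As a purely illustrative stand-in, the sketch below uses a nearest-centroid scorer; the classifier, the function names `train_centroid_scorer` and `output_score`, and the +1/-1 label encoding are assumptions and are not taken from the patent. Any classifier that yields a signed score (for example a signed distance to a decision boundary) could play the same role.

```python
def train_centroid_scorer(X, y):
    """Fit a toy binary classifier: one centroid per class (labels +1 / -1)."""
    def centroid(rows):
        # Column-wise mean of a list of feature vectors.
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == -1])
    return pos, neg  # the "parameter values" of this toy classifier


def output_score(model, x):
    """Signed score: positive when x is closer to the positive centroid."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return d_neg - d_pos


# Hypothetical training set (S101), training (S102), and scoring (S103).
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
model = train_centroid_scorer(X, y)
train_scores = [output_score(model, x) for x in X]
```

With a signed score of this kind, the origin (score 0) is the ordinary decision boundary, which is why the threshold searches in step S104 start there.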
In the scheme provided by the invention, the positive- and negative-example accuracy rates and the equidistant step lengths are set separately, so values can be chosen as required and control is flexible; by adjusting the equidistant step length, more precise confidence control can be achieved.
Optionally, the step of obtaining the positive-example threshold t1 by increasing a preset equidistant step length step by step from the origin, calculating the positive-example accuracy, and comparing it with the preset positive-example accuracy is:
11. Starting from the origin, calculate the positive-example accuracy from the output scores (Output score) converted from the classification results;
12. If the calculated positive-example accuracy is greater than or equal to the preset positive-example accuracy, take the point corresponding to the currently calculated positive-example accuracy as the positive-example threshold t1;
13. If the calculated positive-example accuracy is less than the preset positive-example accuracy, increase the threshold by the predetermined positive-example step length and return to step 11.
Preferably, the step of obtaining the negative-example threshold -t2 by increasing a preset equidistant step length step by step from the origin, calculating the negative-example accuracy, and comparing it with the preset negative-example accuracy is:
21. Starting from the origin, calculate the negative-example accuracy from the possible cases of the output scores (Output score) converted from the classification results;
22. If the calculated negative-example accuracy is greater than or equal to the preset negative-example accuracy, take the point corresponding to the currently calculated negative-example accuracy as the negative-example threshold -t2;
23. If the calculated negative-example accuracy is less than the preset negative-example accuracy, increase the threshold by the predetermined negative-example step length and return to step 21.
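The text does not give formulas for the two accuracy rates. One plausible formalization, stated here only as an assumption, treats each rate as the share of correctly labelled samples among the training samples whose score lies beyond the candidate threshold. With training scores $s_i$, labels $y_i \in \{+1, -1\}$, preset accuracies $pp$ and $np$, and step lengths $ps$ and $ns$:

```latex
\begin{align*}
\mathrm{Acc}^{+}(t) &= \frac{\lvert\{\, i : s_i \ge t,\ y_i = +1 \,\}\rvert}{\lvert\{\, i : s_i \ge t \,\}\rvert},
&
\mathrm{Acc}^{-}(t) &= \frac{\lvert\{\, i : s_i \le -t,\ y_i = -1 \,\}\rvert}{\lvert\{\, i : s_i \le -t \,\}\rvert},
\\[4pt]
t_1 &= \min\{\, k \cdot ps : k = 0, 1, 2, \dots,\ \mathrm{Acc}^{+}(k \cdot ps) \ge pp \,\},
&
t_2 &= \min\{\, k \cdot ns : k = 0, 1, 2, \dots,\ \mathrm{Acc}^{-}(k \cdot ns) \ge np \,\}.
\end{align*}
```

Under this reading, a sample with score $s$ is rejected when $-t_2 < s < t_1$ and accepted otherwise, which matches the open threshold range $(-t_2, t_1)$ used in the steps above.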
The technical scheme provided by the invention improves control precision and realizes a flexibly controllable confidence machine for binary classification problems. The algorithm adapts to different settings and meets different application requirements, and it is also easy to generalize to multi-class classification problems. The method was verified on several experimental data sets, such as heart disease and diabetes data sets, and achieved good experimental results.
The software algorithm flow of the technical scheme provided by the invention is as follows:
Training algorithm flow:
Input
X: binary training data samples
Y: binary training sample labels
Train Set: (X, Y)
pp: preset positive-example accuracy
ps: positive-example step length
np: preset negative-example accuracy
ns: negative-example step length
Output
Positive-example threshold: t1
Negative-example threshold: -t2
Process
1. Train the binary classifier on Train Set and obtain its parameter values
2. Classify the training set Train Set with the binary classifier
3. Convert the classification results into output scores
4. Determine the four cases
5. Calculate the positive-example accuracy
6. if positive-example accuracy >= pp
       goto 9
   endif
7. Increase the threshold by step length ps
8. goto 4
9. Determine the four cases
10. Calculate the negative-example accuracy
11. if negative-example accuracy >= np
       goto 14
    endif
12. Increase the threshold by step length ns
13. goto 9
14. Output t1 and -t2
15. End
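The training flow can be sketched in Python as two step-wise searches run one after the other, first on the positive side and then on the negative side. This is a sketch under assumptions rather than the patent's implementation: the names `selective_accuracy` and `fit_thresholds`, the `max_steps` guard, the +1/-1 labels, and the accuracy definition (share of correct labels among samples beyond the candidate threshold) are all hypothetical. The parameter `np_` stands for the preset negative-example accuracy and carries a trailing underscore only to avoid confusion with the common NumPy alias.

```python
def selective_accuracy(scores, labels, keep, target_label):
    """Accuracy among samples selected by keep(score); None if none selected."""
    chosen = [y for s, y in zip(scores, labels) if keep(s)]
    if not chosen:
        return None
    return sum(1 for y in chosen if y == target_label) / len(chosen)


def fit_thresholds(scores, labels, pp, ps, np_, ns, max_steps=10_000):
    """Return (t1, -t2) from training scores, searching each side separately."""

    def search(target, step, side):
        for k in range(max_steps + 1):
            t = k * step  # candidate threshold on an equidistant grid
            if side == +1:
                acc = selective_accuracy(scores, labels, lambda s: s >= t, +1)
            else:
                acc = selective_accuracy(scores, labels, lambda s: s <= -t, -1)
            if acc is None:
                raise ValueError("target accuracy not reachable on this training set")
            if acc >= target:
                return t
        raise ValueError("max_steps exhausted before reaching the target accuracy")

    t1 = search(pp, ps, +1)   # steps 4 to 8 of the flow
    t2 = search(np_, ns, -1)  # steps 9 to 13 of the flow
    return t1, -t2


# Hypothetical training scores and labels.
scores = [0.2, 0.4, 0.9, 1.3, -0.3, -0.6, -1.1, -0.1]
labels = [-1, 1, 1, 1, 1, -1, -1, -1]
t1, neg_t2 = fit_thresholds(scores, labels, pp=1.0, ps=0.25, np_=1.0, ns=0.5)
```

Because the two sides have their own target accuracy and step length, the resulting rejection band need not be symmetric around the origin.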
Classification algorithm flow:
Input
x: unknown sample
Output
The class of the unknown sample
or
Manual handling of the unknown sample
Process
1. Classify the unknown sample x with the binary classifier
2. Convert the classification result into an output score
3. if score <= -t2 or score >= t1
       output the class of the unknown sample
   else
       manual handling
   endif
4. End
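The classification flow amounts to a classifier with a reject option. A minimal sketch follows; the function name `classify_with_reject`, the use of `None` to signal manual handling, and the +1/-1 class encoding are assumptions made for illustration.

```python
def classify_with_reject(score, t1, neg_t2):
    """Return +1 or -1 for confident scores, or None to route the sample
    to manual handling (score strictly inside the range (-t2, t1))."""
    if score >= t1:
        return 1
    if score <= neg_t2:
        return -1
    return None


# Hypothetical thresholds and output scores of four unknown samples.
decisions = [classify_with_reject(s, t1=0.25, neg_t2=-0.5) for s in (0.9, -0.7, 0.1, -0.5)]
```

A score exactly equal to -t2 or t1 is accepted, following the `<=` and `>=` comparisons in step 3 of the flow.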
Experimental data obtained with the scheme of the invention are as follows:
Table 1: Data sets used in the experiments
The results of running the algorithm are shown in Table 2.
Table 2: Algorithm execution results (unit: %)
The misclassification rates obtained after the algorithm finds the thresholds are closer to, or consistent with, the preset values.
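To compare an achieved misclassification rate with a preset value, one can measure the error rate on accepted samples together with the share of samples rejected. The sketch below shows one way to do this; the function name `evaluate`, the data, and the thresholds are hypothetical and do not reproduce the patent's experiments.

```python
def evaluate(scores, labels, t1, neg_t2):
    """Return (error rate on accepted samples, fraction of samples rejected).

    The error rate is None when every sample is rejected.
    """
    accepted = [(s, y) for s, y in zip(scores, labels) if s >= t1 or s <= neg_t2]
    rejected_fraction = 1 - len(accepted) / len(scores)
    if not accepted:
        return None, rejected_fraction
    # An accepted sample is predicted +1 above t1 and -1 below -t2.
    errors = sum(1 for s, y in accepted if (1 if s >= t1 else -1) != y)
    return errors / len(accepted), rejected_fraction


# Hypothetical held-out scores and labels, evaluated with assumed thresholds.
scores = [0.9, 0.3, -0.7, -0.6, 0.1, -0.2]
labels = [1, -1, -1, 1, 1, -1]
error_rate, rejected = evaluate(scores, labels, t1=0.25, neg_t2=-0.5)
```

Reporting the rejected fraction alongside the error rate matters because a wider rejection band lowers the error on accepted samples at the cost of more manual handling.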
The foregoing are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (3)

1. A controllable confidence machine algorithm based on separately set positive- and negative-example accuracy rates, characterized in that the method comprises the following steps:
Receive the training set Train Set formed from the binary training data samples and the binary training sample labels;
Train a binary classifier on the training set Train Set and obtain the parameter values of the binary classifier;
Classify the training set Train Set with the binary classifier and convert the classification results into output scores (Output score);
Starting from the origin and increasing a preset equidistant step length step by step, calculate the positive-example accuracy and compare it with the preset positive-example accuracy to obtain the positive-example threshold t1; likewise, calculate the negative-example accuracy and compare it with the preset negative-example accuracy to obtain the negative-example threshold -t2; the positive-example threshold t1 and the negative-example threshold -t2 form the threshold range (-t2, t1);
Obtain an unknown binary sample, classify it with the binary classifier, and convert the classification result into an output score (Output score);
If the output score of the unknown binary sample falls within the threshold range, assign the unknown sample to the rejection region; if the output score does not fall within the threshold range, assign the unknown sample to the acceptance region.
2. The method according to claim 1, characterized in that the step of obtaining the positive-example threshold t1 by increasing a preset equidistant step length step by step from the origin, calculating the positive-example accuracy, and comparing it with the preset positive-example accuracy is:
11. Starting from the origin, calculate the positive-example accuracy from the output scores (Output score) converted from the classification results;
12. If the calculated positive-example accuracy is greater than or equal to the preset positive-example accuracy, take the point corresponding to the currently calculated positive-example accuracy as the positive-example threshold t1;
13. If the calculated positive-example accuracy is less than the preset positive-example accuracy, increase the threshold by the predetermined positive-example step length and return to step 11.
3. The method according to claim 1, characterized in that the step of obtaining the negative-example threshold -t2 by increasing a preset equidistant step length step by step from the origin, calculating the negative-example accuracy, and comparing it with the preset negative-example accuracy is:
21. Starting from the origin, calculate the negative-example accuracy from the possible cases of the output scores (Output score) converted from the classification results;
22. If the calculated negative-example accuracy is greater than or equal to the preset negative-example accuracy, take the point corresponding to the currently calculated negative-example accuracy as the negative-example threshold -t2;
23. If the calculated negative-example accuracy is less than the preset negative-example accuracy, increase the threshold by the predetermined negative-example step length and return to step 21.
CN201510202168.5A 2015-04-24 2015-04-24 Respective positive and negative example correct rate setting-based controllable confidence machine algorithm Pending CN104820839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510202168.5A CN104820839A (en) 2015-04-24 2015-04-24 Respective positive and negative example correct rate setting-based controllable confidence machine algorithm

Publications (1)

Publication Number Publication Date
CN104820839A true CN104820839A (en) 2015-08-05

Family

ID=53731128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510202168.5A Pending CN104820839A (en) 2015-04-24 2015-04-24 Respective positive and negative example correct rate setting-based controllable confidence machine algorithm

Country Status (1)

Country Link
CN (1) CN104820839A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689034A (en) * 2018-07-06 2020-01-14 阿里巴巴集团控股有限公司 Classifier optimization method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2797721B2 (en) * 1991-01-08 1998-09-17 日本電気株式会社 Character recognition device
CN102542291A (en) * 2011-12-23 2012-07-04 国网电力科学研究院 Hyperspectral remote sensing image classification method based on binary decision tree
CN102722726A (en) * 2012-06-05 2012-10-10 江苏省电力公司南京供电公司 Multi-class support vector machine classification method based on dynamic binary tree

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689034A (en) * 2018-07-06 2020-01-14 阿里巴巴集团控股有限公司 Classifier optimization method and device
CN110689034B (en) * 2018-07-06 2023-04-07 阿里巴巴集团控股有限公司 Classifier optimization method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150805