CN104537157A - Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor) - Google Patents
- Publication number
- CN104537157A (this publication; application number CN201410767787.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- unknown
- regression
- recurrence
- sample set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention is applicable to the field of machine learning and provides a confidence regression algorithm based on KNN (K-Nearest-Neighbor). The algorithm comprises the following steps: determining a sample set comprising a known regression sample set and an unknown regression sample set; selecting an unknown sample from the unknown regression sample set and calculating the Euclidean distance between the unknown sample and each sample in the known regression sample set; querying the known regression sample set for the K samples closest to the unknown sample in Euclidean distance; calculating the mean of the regression values of the K samples; predicting the regression value of the unknown sample with a regression model; calculating the difference T between the regression value and the mean; and dividing an acceptance region and a rejection region according to the difference T. The confidence regression algorithm based on KNN has the advantage that the regression value is accurate.
Description
Technical field
The invention belongs to the field of machine learning, and in particular relates to a confidence regression algorithm and device based on KNN.
Background technology
In the research field of machine learning, besides classification, another important direction is regression prediction. Correspondingly, research on confidence learning machines should comprise both confidence classification and confidence regression. At present, research on confidence learning machines concentrates mainly on classification problems, and research on confidence regression is comparatively scarce; yet confidence regression has important practical significance in high-risk applications such as medical diagnosis prediction.
The prior art provides support vector machine methods for confidence regression, such as the document "Support vector regression modeling and application" (Wang Dingcheng et al., Control and Decision, 2003.1), and the document (Zidelmal Z, Amirou A, Belouchrani A. Heartbeat classification using support vector machines (SVMs) with an embedded reject option [J]. International Journal of Pattern Recognition and Artificial Intelligence, 2012, 26(1)), which studies classifiers with a rejection option: following the Bayesian learning method, for two-class classification problems, it proposes an optimal classifier and a rejection rule, and sets thresholds accordingly.
In realizing the prior art schemes, the following technical problem was found:
The technical schemes provided by the prior art do not carry out classification pre-processing, and the resulting confidence regression is inaccurate.
Summary of the invention
The object of the embodiments of the present invention is to provide a confidence regression algorithm based on KNN, which solves the problem that the confidence regression of the prior art is inaccurate.
The embodiments of the present invention are achieved as follows. In one aspect, a confidence regression algorithm based on KNN is provided, the method comprising the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample x_p from the unknown regression sample set;
103. Calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), x_i^p denotes the i-th element of x_p, and z_i^q denotes the i-th element of Z_q;
104. Query the known regression sample set for the K samples nearest to x_p in Euclidean distance, and calculate the mean ȳ of the regression values of these K samples;
105. Predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t. If -t ≤ T ≤ t, assign x_p to the regression acceptance region and accept ŷ; if T > t or T < -t, assign x_p to the regression rejection region and leave ŷ undetermined.
In another aspect, a confidence regression device based on KNN is provided, the device comprising:
a determining unit, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
a sampling unit, configured to select an unknown sample x_p from the unknown regression sample set;
a computing unit, configured to calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), and x_i^p and z_i^q denote the i-th elements of x_p and Z_q respectively; and to query the known regression sample set for the K samples nearest to x_p in Euclidean distance and calculate the mean ȳ of their regression values;
a division unit, configured to predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t: if -t ≤ T ≤ t, x_p is assigned to the regression acceptance region and ŷ is accepted; if T > t or T < -t, x_p is assigned to the regression rejection region and ŷ is left undetermined.
In the embodiments of the present invention, the provided technical scheme performs classification processing by setting a concrete threshold. The invention uses the KNN (k-Nearest-Neighbor) algorithm as a tool to perform error judgment on the result of regression learning, realizing the division of the acceptance region and the rejection region and thereby realizing confidence regression; by setting a concrete error threshold, the confidence regression is made controllable. The method was verified on multiple experimental data sets such as body fat, achieving good experimental results.
Brief description of the drawings
Fig. 1 is a flow chart of the confidence regression algorithm based on KNN provided by the invention;
Fig. 2 is a structural diagram of the confidence regression device based on KNN provided by the invention;
Fig. 3 is a schematic diagram of the operational flow of the confidence regression algorithm based on KNN provided by the invention.
Detailed description of the embodiments
In order to make the object, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described herein are only intended to explain the present invention, not to limit it.
A specific embodiment of the invention provides a confidence regression algorithm based on KNN. The method is performed by a confidence machine and, as shown in Figure 1, comprises the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample x_p from the unknown regression sample set;
103. Calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), x_i^p denotes the i-th element of x_p, and z_i^q denotes the i-th element of Z_q;
104. Query the known regression sample set for the K samples nearest to x_p in Euclidean distance, and calculate the mean ȳ of the regression values of these K samples;
105. Predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t. If -t ≤ T ≤ t, assign x_p to the regression acceptance region and accept ŷ; if T > t or T < -t, assign x_p to the regression rejection region and leave ŷ undetermined.
The present invention sets a concrete threshold and carries out classification processing. Using the KNN algorithm as a tool, the invention performs error judgment on the result of regression learning and realizes the division of the acceptance region and the rejection region, thereby realizing confidence regression. By setting a concrete error threshold, the confidence regression is made controllable. The method was verified on multiple experimental data sets such as body fat, achieving good experimental results.
As shown in Figure 3, the operational flow of the confidence regression algorithm based on KNN provided by the invention corresponds to the following software description:
(1) The CR-KNN algorithm flow is as follows:
Input:
x: feature data value of the unknown sample
K: number of nearest neighbors
t: error boundary (division threshold)
Output:
ŷ: predicted value, or hand over to manual processing
Process:
1. Perform regression prediction on the unknown sample feature data value x with the regression machine model, obtaining the regression value ŷ;
2. Calculate ȳ with the KNN algorithm;
3. Calculate T = ŷ - ȳ;
4. If -t ≤ T ≤ t, output ŷ; else hand over to manual processing;
5. End.
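The CR-KNN flow above can be sketched end-to-end as follows. This is an illustrative Python/NumPy sketch, not the patent's implementation; `regressor` stands in for any fitted model exposing a `predict` method, since the concrete regression machine is not specified here:

```python
import numpy as np

def cr_knn(x, known_X, known_y, regressor, k, t):
    """CR-KNN: return (y_hat, accepted). accepted=False means the sample
    falls in the rejection region and is handed over to manual processing."""
    x = np.asarray(x, dtype=float)
    y_hat = float(regressor.predict(x.reshape(1, -1))[0])  # step 1: model prediction
    dists = np.linalg.norm(known_X - x, axis=1)            # distances to known samples
    y_bar = known_y[np.argsort(dists)[:k]].mean()          # step 2: KNN mean (y-bar)
    T = y_hat - y_bar                                      # step 3: difference T
    return y_hat, bool(-t <= T <= t)                       # step 4: accept iff -t <= T <= t
```

A prediction whose KNN neighborhood disagrees with the model by more than t is rejected rather than reported, which is the controllability the text describes.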
(2) Experimental conditions
Table 1: Information on the data sets used in the experiments
Table 2: Experimental results of LIBSVM on bodyfat
k | t | ReR(%) | MSE | MSE-A |
---|---|---|---|---|
1 | 0.01 | 25 | 5.322046e-006 | 2.224215e-006 |
2 | 0.01 | 14.81 | 5.322046e-006 | 1.399697e-006 |
3 | 0.01 | 10.77 | 5.322046e-006 | 5.495417e-007 |
4 | 0.01 | 10.96 | 5.322046e-006 | 5.356577e-007 |
5 | 0.01 | 13.27 | 5.322046e-006 | 5.057531e-007 |
6 | 0.01 | 15.38 | 5.322046e-006 | 4.768819e-007 |
7 | 0.01 | 15.19 | 5.322046e-006 | 4.633491e-007 |
8 | 0.01 | 16.15 | 5.322046e-006 | 4.748111e-007 |
9 | 0.01 | 16.15 | 5.322046e-006 | 4.517478e-007 |
10 | 0.01 | 16.73 | 5.322046e-006 | 4.589683e-007 |
Table 3: Ten-run average results on the bodyfat data set
k | t | ReR(%) | MSE | MSE-A |
---|---|---|---|---|
3 | 0.5 | 0 | 5.322046e-006 | 5.322046e-006 |
3 | 0.00001 | 100 | 5.322046e-006 | NaN |
3 | 0.005 | 41.54 | 5.322046e-006 | 5.711509e-007 |
3 | 0.01 | 10.77 | 5.322046e-006 | 5.495417e-007 |
3 | 0.02 | 1.15 | 5.322046e-006 | 2.295326e-006 |
Table 4: Experimental results on five data sets
Data set | k | t | ReR(%) | MSE | MSE-A |
---|---|---|---|---|---|
bodyfat | 3 | 0.01 | 10.77 | 5.322046e-006 | 5.495417e-007 |
housing | 3 | 5 | 15 | 9.794674e+000 | 5.034802e+000 |
pyrim | 3 | 0.1 | 20.83 | 8.303049e-003 | 2.991800e-003 |
triazines | 3 | 0.2 | 13.06 | 1.945802e-002 | 7.178456e-003 |
cpusmall | 3 | 8 | 16.67 | 1.878886e+002 | 8.909445e+001 |
Here ReR is the reject rate, and MSE stands for Mean Square Error, a convenient measure of average error. The MSE column gives the mean square error without dividing acceptance and rejection regions, while MSE-A gives the mean square error over the acceptance region alone. Clearly, MSE-A is significantly smaller than MSE, which demonstrates the validity of the algorithm: dividing acceptance and rejection regions greatly reduces the error on the acceptance region and improves the accuracy of regression prediction.
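The ReR, MSE and MSE-A columns can be reproduced from per-sample predictions and acceptance flags with a short helper. This is a sketch with illustrative names, not code from the patent:

```python
import numpy as np

def evaluate(y_true, y_pred, accepted):
    """Reject rate ReR (%), MSE over all samples, and MSE-A over
    the acceptance region only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    accepted = np.asarray(accepted, dtype=bool)
    mse = np.mean((y_true - y_pred) ** 2)                  # no acceptance/rejection division
    mse_a = (np.mean((y_true[accepted] - y_pred[accepted]) ** 2)
             if accepted.any() else float("nan"))          # NaN when everything is rejected
    rer = 100.0 * (1.0 - accepted.mean())                  # percentage of rejected samples
    return rer, mse, mse_a
```

The NaN branch matches the t = 0.00001 row of Table 3, where a 100% reject rate leaves no acceptance region to average over.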
A specific embodiment of the invention also provides a confidence regression device based on KNN. As shown in Figure 2, the device comprises:
a determining unit 21, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
a sampling unit 22, configured to select an unknown sample x_p from the unknown regression sample set;
a computing unit 23, configured to calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), and x_i^p and z_i^q denote the i-th elements of x_p and Z_q respectively; and to query the known regression sample set for the K samples nearest to x_p in Euclidean distance and calculate the mean ȳ of their regression values;
a division unit 24, configured to predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t: if -t ≤ T ≤ t, x_p is assigned to the regression acceptance region and ŷ is accepted; if T > t or T < -t, x_p is assigned to the regression rejection region and ŷ is left undetermined.
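The four units can be mirrored as methods of a single class. This is a purely illustrative structure; the class and method names are assumptions, not the patent's nomenclature:

```python
import numpy as np

class ConfidenceRegressionDevice:
    """Determining, sampling, computing, and division units (21-24)
    sketched as methods of one class."""
    def __init__(self, regressor, k, t):
        self.regressor, self.k, self.t = regressor, k, t

    def determine(self, known_X, known_y, unknown_X):  # determining unit 21
        self.known_X = np.asarray(known_X, dtype=float)
        self.known_y = np.asarray(known_y, dtype=float)
        self.unknown_X = np.asarray(unknown_X, dtype=float)

    def sample(self, p):                               # sampling unit 22
        return self.unknown_X[p]

    def compute_mean(self, x_p):                       # computing unit 23
        dists = np.linalg.norm(self.known_X - x_p, axis=1)
        return self.known_y[np.argsort(dists)[:self.k]].mean()

    def divide(self, x_p):                             # division unit 24
        y_hat = float(self.regressor.predict(x_p.reshape(1, -1))[0])
        T = y_hat - self.compute_mean(x_p)
        return y_hat, bool(-self.t <= T <= self.t)
```

The division of units follows functional logic, as the text below notes, so any decomposition realizing the same functions would serve equally well.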
It should be noted that in the above embodiment, the included units are divided according to functional logic, but the division is not limited thereto, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are merely for ease of mutual distinction and do not limit the protection scope of the present invention.
In addition, those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments can be completed by related hardware under the instruction of a program; the corresponding program can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (2)
1. A confidence regression algorithm based on KNN, characterized in that the method comprises the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample x_p from the unknown regression sample set;
103. Calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), x_i^p denotes the i-th element of x_p, and z_i^q denotes the i-th element of Z_q;
104. Query the known regression sample set for the K samples nearest to x_p in Euclidean distance, and calculate the mean ȳ of the regression values of these K samples;
105. Predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t. If -t ≤ T ≤ t, assign x_p to the regression acceptance region and accept ŷ; if T > t or T < -t, assign x_p to the regression rejection region and leave ŷ undetermined.
2. A confidence regression device based on KNN, characterized in that the device comprises:
a determining unit, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
a sampling unit, configured to select an unknown sample x_p from the unknown regression sample set;
a computing unit, configured to calculate the Euclidean distance D_E(x_p, Z_q) between x_p and each sample Z_q in the known regression sample set, where x_p = (x_1^p, x_2^p, ..., x_n^p), Z_q = (z_1^q, z_2^q, ..., z_n^q), and x_i^p and z_i^q denote the i-th elements of x_p and Z_q respectively; and to query the known regression sample set for the K samples nearest to x_p in Euclidean distance and calculate the mean ȳ of their regression values;
a division unit, configured to predict the regression value ŷ of x_p with the regression model, calculate the difference T = ŷ - ȳ, and set a division threshold t: if -t ≤ T ≤ t, x_p is assigned to the regression acceptance region and ŷ is accepted; if T > t or T < -t, x_p is assigned to the regression rejection region and ŷ is left undetermined.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410767787.4A CN104537157A (en) | 2014-12-12 | 2014-12-12 | Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor) |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104537157A true CN104537157A (en) | 2015-04-22 |
Family
ID=52852684
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410767787.4A Pending CN104537157A (en) | 2014-12-12 | 2014-12-12 | Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor) |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104537157A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109396956A (en) * | 2018-11-06 | 2019-03-01 | 重庆大学 | A kind of chain digital control gear hobbing machine hobboing cutter state intelligent monitoring method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20150422 |