CN104537157A - Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor) - Google Patents


Info

Publication number
CN104537157A
Authority
CN
China
Prior art keywords
sample
unknown
regression
recurrence
sample set
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410767787.4A
Other languages
Chinese (zh)
Inventor
蒋方纯
田盛丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-12
Filing date
2014-12-12
Publication date
2015-04-22
Application filed by Shenzhen Institute of Information Technology
Priority to CN201410767787.4A
Publication of CN104537157A
Legal status: Pending (current)


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of machine learning and provides a confidence regression algorithm based on KNN (K-Nearest-Neighbor). The algorithm comprises the following steps: determining a sample set consisting of a known regression sample set and an unknown regression sample set; selecting an unknown sample from the unknown regression sample set and calculating the Euclidean distance between the unknown sample and each sample in the known regression sample set; finding the K samples in the known regression sample set that are closest to the unknown sample in Euclidean distance; calculating the average of the regression values of these K samples; predicting the regression value of the unknown sample with a regression model; calculating the difference T between the predicted regression value and the average; and assigning the unknown sample to an acceptance region or a rejection region according to T. The advantage of the KNN-based confidence regression algorithm is that the regression values it accepts are more accurate.

Description

Confidence regression algorithm and device based on KNN
Technical field
The invention belongs to the field of machine learning, and in particular relates to a confidence regression algorithm and device based on KNN.
Background
Besides classification, regression prediction is an important research area in machine learning. Accordingly, research on confidence learning machines should cover both confidence classification and confidence regression. At present, however, research on confidence learning machines concentrates mainly on classification problems, and little work addresses confidence regression, even though confidence regression is of significant practical value in high-risk applications such as medical diagnosis and prediction.
One prior-art approach performs confidence regression with support vector machines, e.g. "Support vector regression line modeling and application" (Wang Dingcheng et al., Control and Decision, 2003.1). Another document (Zidelmal Z, Amirou A, Belouchrani A. HEARTBEAT CLASSIFICATION USING SUPPORT VECTOR MACHINES (SVMs) WITH AN EMBEDDED REJECT OPTION [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26(1)) studies classifiers with a reject option: following a Bayesian learning approach, it proposes an optimal classifier and a rejection rule for two-class classification problems and sets the threshold accordingly.
In implementing the prior-art schemes, the following technical problem was found:
The prior-art schemes perform no classification preprocessing, so their confidence regression is inaccurate.
Summary of the invention
The object of the embodiments of the present invention is to provide a confidence regression algorithm based on KNN that solves the problem of inaccurate confidence regression in the prior art.
The embodiments of the present invention are implemented as follows. In one aspect, a confidence regression algorithm based on KNN is provided; the method comprises the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample $x_p$ from the unknown regression sample set;
103. Calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$;
104. Find the K samples in the known sample set with the smallest Euclidean distance to $x_p$, and calculate the mean $\bar{y}$ of their regression values;
105. Predict the regression value $y$ of the unknown sample $x_p$ with the regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$. If $-t \le T \le t$, assign $x_p$ to the regression acceptance region and regard its prediction as certain; if $T > t$ or $T < -t$, assign $x_p$ to the regression rejection region and regard its prediction as uncertain.
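Steps 103 to 105 can be sketched in Python as below. This is a minimal, non-authoritative sketch: the function names, the use of plain tuples for samples, and the sign convention $T = y - \bar{y}$ are assumptions (the acceptance test is symmetric, so the sign does not change the outcome), and the regression model that produces $y$ is left outside the sketch.

```python
import math
from typing import Sequence


def euclidean_distance(x_p: Sequence[float], z_q: Sequence[float]) -> float:
    """Step 103: D_E(X_p, Z_q) = sqrt(sum_i (x_i^p - z_i^q)^2)."""
    if len(x_p) != len(z_q):
        raise ValueError("samples must have the same number of elements n")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_p, z_q)))


def knn_mean(x_p, known_x, known_y, k: int) -> float:
    """Step 104: mean regression value of the k known samples nearest to x_p."""
    if not 1 <= k <= len(known_x):
        raise ValueError("k must lie between 1 and the size of the known set")
    dists = [euclidean_distance(x_p, z_q) for z_q in known_x]
    nearest = sorted(range(len(known_x)), key=lambda q: dists[q])[:k]
    return sum(known_y[q] for q in nearest) / k


def divide(y: float, y_bar: float, t: float) -> str:
    """Step 105: acceptance region if -t <= T <= t, with T = y - y_bar."""
    T = y - y_bar
    return "accept" if -t <= T <= t else "reject"


if __name__ == "__main__":
    known_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
    known_y = [1.0, 2.0, 3.0, 20.0]
    y_bar = knn_mean((1.2,), known_x, known_y, k=3)  # neighbors at 1.0, 2.0, 0.0
    print(y_bar)                                     # 2.0
    print(divide(2.05, y_bar, t=0.1))                # accept
    print(divide(2.50, y_bar, t=0.1))                # reject
```

Only the ranking of distances matters in step 104, so dropping the square root would select the same K neighbors; it is kept here to match the formula.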
In another aspect, a confidence regression device based on KNN is provided; the device comprises:
Determining unit, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
Sampling unit, configured to select an unknown sample $x_p$ from the unknown regression sample set;
Computing unit, configured to calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$. The computing unit further finds the K samples in the known sample set with the smallest Euclidean distance to $x_p$ and calculates the mean $\bar{y}$ of their regression values;
Division unit, configured to predict the regression value $y$ of the unknown sample $x_p$ with a regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$; if $-t \le T \le t$, $x_p$ is assigned to the regression acceptance region and its prediction is regarded as certain; if $T > t$ or $T < -t$, $x_p$ is assigned to the regression rejection region and its prediction is regarded as uncertain.
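One way to mirror the four units of the device in software is a single class whose fields and methods correspond to the determining, sampling, computing and division units. The class name, method names, the callable `model` and the defaults `k=3`, `t=0.01` are hypothetical choices for illustration, not the patent's implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class ConfidenceRegressionDevice:
    """Hypothetical software layout of the four units; not the patent's code."""
    # Determining unit: the known set (features + regression values) and the unknown set.
    known_x: List[Vector]
    known_y: List[float]
    unknown_x: List[Vector]
    model: Callable[[Vector], float]  # any trained regression model (assumed)
    k: int = 3
    t: float = 0.01

    def select(self, p: int) -> Vector:
        """Sampling unit: pick the unknown sample x_p."""
        return self.unknown_x[p]

    def knn_mean(self, x_p: Vector) -> float:
        """Computing unit: Euclidean distances, K nearest, mean regression value."""
        dists = [math.dist(x_p, z_q) for z_q in self.known_x]
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[: self.k]
        return sum(self.known_y[q] for q in nearest) / len(nearest)

    def divide(self, p: int) -> Tuple[str, float]:
        """Division unit: compare the model prediction with the KNN mean."""
        x_p = self.select(p)
        y = self.model(x_p)
        T = y - self.knn_mean(x_p)
        return ("accept" if -self.t <= T <= self.t else "reject"), y


if __name__ == "__main__":
    device = ConfidenceRegressionDevice(
        known_x=[(0.0,), (1.0,), (2.0,)], known_y=[0.0, 1.0, 2.0],
        unknown_x=[(1.0,), (5.0,)], model=lambda v: v[0], k=3, t=0.5)
    print(device.divide(0))  # ('accept', 1.0)
    print(device.divide(1))  # ('reject', 5.0)
```

Splitting the logic this way keeps each unit replaceable, which matches the note later in the text that the division into units is functional rather than fixed.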
In the embodiments of the present invention, the technical solution performs classification processing by setting a specific threshold. Using the KNN (k-nearest neighbor) algorithm as a tool, the invention judges the error of the regression learning result and divides samples into an acceptance region and a rejection region, thereby achieving confidence regression; setting a specific error value makes the confidence regression controllable. The method was verified on several experimental data sets, such as bodyfat, with good experimental results.
Brief description of the drawings
Fig. 1 is a flowchart of the KNN-based confidence regression algorithm provided by the invention;
Fig. 2 is a structural diagram of the KNN-based confidence regression device provided by the invention;
Fig. 3 is a schematic diagram of the operation flow of the KNN-based confidence regression algorithm provided by the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
A specific embodiment of the invention provides a confidence regression algorithm based on KNN. The method is executed by a confidence machine and, as shown in Fig. 1, comprises the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample $x_p$ from the unknown regression sample set;
103. Calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$;
104. Find the K samples in the known sample set with the smallest Euclidean distance to $x_p$, and calculate the mean $\bar{y}$ of their regression values;
105. Predict the regression value $y$ of the unknown sample $x_p$ with the regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$. If $-t \le T \le t$, assign $x_p$ to the regression acceptance region and regard its prediction as certain; if $T > t$ or $T < -t$, assign $x_p$ to the regression rejection region and regard its prediction as uncertain.
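As a worked illustration of steps 104 and 105 with hypothetical numbers (not taken from the experiments): suppose K = 3, the three nearest known samples have regression values 0.100, 0.104 and 0.102, and the regression model predicts $y = 0.108$ for $x_p$.

```latex
\bar{y} = \frac{0.100 + 0.104 + 0.102}{3} = 0.102, \qquad
T = y - \bar{y} = 0.108 - 0.102 = 0.006
```

With $t = 0.01$ we have $-0.01 \le 0.006 \le 0.01$, so $x_p$ falls into the acceptance region and $y$ is output; with a stricter $t = 0.005$, $T > t$ and $x_p$ is rejected. Smaller thresholds therefore reject more samples, which is the trend visible in Table 3.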
The invention performs classification processing by setting a specific threshold. Using the KNN algorithm as a tool, it judges the error of the regression learning result and divides samples into an acceptance region and a rejection region, thereby achieving confidence regression; setting a specific error value makes the confidence regression controllable. The method was verified on several experimental data sets, such as bodyfat, with good experimental results.
Fig. 3 shows the operation flow of the KNN-based confidence regression algorithm provided by the invention; the corresponding software description is as follows:
(1) The CR-KNN algorithm flow is as follows:
Input
x: feature values of the unknown sample
K: number of nearest neighbors
t: error bound (division threshold)
Output
y: predicted value
or
hand over to manual processing
Process
1. Use the regression model to predict the regression value y from the unknown sample's feature values x
2. Use the KNN algorithm to compute $\bar{y}$, the mean regression value of the K nearest known samples
3. Compute $T = y - \bar{y}$
4. If $-t \le T \le t$
     output y
   else
     hand over to manual processing
5. End
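The CR-KNN flow above translates almost line for line into Python. In this sketch the regression model is any callable, `None` stands for "hand over to manual processing", and the toy known set with y = 2x is made-up illustration data, not one of the experimental data sets; the function name `cr_knn` is hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cr_knn(x: Vector, k: int, t: float,
           model: Callable[[Vector], float],
           known_x: List[Vector], known_y: List[float]) -> Optional[float]:
    """Return the prediction y if accepted; None means 'hand over to manual processing'."""
    # 1. Regression forecast for the unknown sample.
    y = model(x)
    # 2. KNN value: mean regression value of the k nearest known samples.
    nearest = sorted(range(len(known_x)), key=lambda q: math.dist(x, known_x[q]))[:k]
    y_bar = sum(known_y[q] for q in nearest) / len(nearest)
    # 3. Difference between the two estimates.
    T = y - y_bar
    # 4. Output y inside the error bound, otherwise reject.
    return y if -t <= T <= t else None


# Toy known set on which the true relation is y = 2x (illustrative data only).
KNOWN_X = [(float(i),) for i in range(10)]
KNOWN_Y = [2.0 * i for i in range(10)]

if __name__ == "__main__":
    def good_model(v: Vector) -> float:
        return 2.0 * v[0]        # agrees with the neighbors

    def biased_model(v: Vector) -> float:
        return 2.0 * v[0] + 1.0  # off by 1 everywhere

    print(cr_knn((4.0,), 3, 0.5, good_model, KNOWN_X, KNOWN_Y))    # 8.0
    print(cr_knn((4.0,), 3, 0.5, biased_model, KNOWN_X, KNOWN_Y))  # None
```

Because the KNN mean is itself a local estimate, samples at the edge of the known data can be rejected even when the model is correct: at x = 0 the three neighbors have mean 2.0 while the model predicts 0.0. The threshold t therefore trades reject rate against acceptance-region error.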
(2) Experiments
Table 1: Information on the data sets used in the experiments
Table 2: Experimental results of LIBSVM on bodyfat (t = 0.01, k varied)
k t ReR(%) MSE MSE-A
1 0.01 25 5.322046e-006 2.224215e-006
2 0.01 14.81 5.322046e-006 1.399697e-006
3 0.01 10.77 5.322046e-006 5.495417e-007
4 0.01 10.96 5.322046e-006 5.356577e-007
5 0.01 13.27 5.322046e-006 5.057531e-007
6 0.01 15.38 5.322046e-006 4.768819e-007
7 0.01 15.19 5.322046e-006 4.633491e-007
8 0.01 16.15 5.322046e-006 4.748111e-007
9 0.01 16.15 5.322046e-006 4.517478e-007
10 0.01 16.73 5.322046e-006 4.589683e-007
Table 3: Results on the bodyfat data set for different thresholds t (k = 3; averages of ten runs)
k t ReR(%) MSE MSE-A
3 0.5 0 5.322046e-006 5.322046e-006
3 0.00001 100 5.322046e-006 NaN
3 0.005 41.54 5.322046e-006 5.711509e-007
3 0.01 10.77 5.322046e-006 5.495417e-007
3 0.02 1.15 5.322046e-006 2.295326e-006
Table 4: Experimental results on five data sets
Data set k t ReR(%) MSE MSE-A
bodyfat 3 0.01 10.77 5.322046e-006 5.495417e-007
housing 3 5 15 9.794674e+000 5.034802e+000
pyrim 3 0.1 20.83 8.303049e-003 2.991800e-003
triazines 3 0.2 13.06 1.945802e-002 7.178456e-003
cpusmall 3 8 16.67 1.878886e+002 8.909445e+001
Here ReR is the reject rate and MSE is the mean squared error, a convenient measure of average error. In the tables, MSE is the mean squared error without dividing samples into acceptance and rejection regions, and MSE-A is the mean squared error over the acceptance region only. In every setting that rejects some but not all samples, MSE-A is clearly smaller than MSE (at t = 0.5 nothing is rejected and the two coincide; at t = 0.00001 everything is rejected and MSE-A is undefined). This shows that the algorithm is effective: dividing samples into acceptance and rejection regions greatly reduces the error in the acceptance region and improves the accuracy of regression prediction.
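The three quantities reported in Tables 2 to 4 can be computed as follows. This sketch assumes ReR is the percentage of test samples assigned to the rejection region and that MSE-A averages squared errors over accepted samples only; the function name `evaluate` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def evaluate(y_true: Sequence[float], y_pred: Sequence[float],
             accepted: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (ReR in %, MSE over all samples, MSE-A over accepted samples)."""
    n = len(y_true)
    sq_err = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    mse = sum(sq_err) / n
    kept = [e for e, ok in zip(sq_err, accepted) if ok]
    rer = 100.0 * (n - len(kept)) / n
    # MSE-A is undefined when every sample is rejected (cf. ReR = 100, NaN in Table 3).
    mse_a = sum(kept) / len(kept) if kept else math.nan
    return rer, mse, mse_a


if __name__ == "__main__":
    print(evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0],
                   [True, True, True, False]))  # (25.0, 1.0, 0.0)
```

Rejecting the single badly predicted sample raises ReR to 25% but drops the acceptance-region error from 1.0 to 0.0, the same kind of trade-off the tables report.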
A specific embodiment of the invention also provides a confidence regression device based on KNN. As shown in Fig. 2, the device comprises:
Determining unit 21, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
Sampling unit 22, configured to select an unknown sample $x_p$ from the unknown regression sample set;
Computing unit 23, configured to calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$. The computing unit further finds the K samples in the known sample set with the smallest Euclidean distance to $x_p$ and calculates the mean $\bar{y}$ of their regression values;
Division unit 24, configured to predict the regression value $y$ of the unknown sample $x_p$ with a regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$; if $-t \le T \le t$, $x_p$ is assigned to the regression acceptance region and its prediction is regarded as certain; if $T > t$ or $T < -t$, $x_p$ is assigned to the regression rejection region and its prediction is regarded as uncertain.
It should be noted that the units in the above embodiment are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, a person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (2)

1. A confidence regression algorithm based on KNN, characterized in that the method comprises the following steps:
101. Determine a sample set comprising a known regression sample set and an unknown regression sample set;
102. Select an unknown sample $x_p$ from the unknown regression sample set;
103. Calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$;
104. Find the K samples in the known sample set with the smallest Euclidean distance to $x_p$, and calculate the mean $\bar{y}$ of their regression values;
105. Predict the regression value $y$ of the unknown sample $x_p$ with the regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$. If $-t \le T \le t$, assign $x_p$ to the regression acceptance region and regard its prediction as certain; if $T > t$ or $T < -t$, assign $x_p$ to the regression rejection region and regard its prediction as uncertain.
2. A confidence regression device based on KNN, characterized in that the device comprises:
Determining unit, configured to determine a sample set comprising a known regression sample set and an unknown regression sample set;
Sampling unit, configured to select an unknown sample $x_p$ from the unknown regression sample set;
Computing unit, configured to calculate the Euclidean distance $D_E(X, Z)$ between $x_p$ and each sample in the known regression sample set, where
$D_E(X_p, Z_q) = \sqrt{\sum_{i=1}^{n} \left(x_i^p - z_i^q\right)^2}$
denotes the Euclidean distance between the unknown sample $X_p = (x_1^p, x_2^p, \dots, x_n^p)$ and the known sample $Z_q = (z_1^q, z_2^q, \dots, z_n^q)$; $x_i^p$ is the $i$-th element of $X_p$ and $z_i^q$ is the $i$-th element of $Z_q$. The computing unit further finds the K samples in the known sample set with the smallest Euclidean distance to $x_p$ and calculates the mean $\bar{y}$ of their regression values;
Division unit, configured to predict the regression value $y$ of the unknown sample $x_p$ with a regression model, calculate the difference $T = y - \bar{y}$ between $y$ and the K-nearest-neighbor mean $\bar{y}$, and set a division threshold $t$; if $-t \le T \le t$, $x_p$ is assigned to the regression acceptance region and its prediction is regarded as certain; if $T > t$ or $T < -t$, $x_p$ is assigned to the regression rejection region and its prediction is regarded as uncertain.
CN201410767787.4A 2014-12-12 2014-12-12 Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor) Pending CN104537157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410767787.4A CN104537157A (en) 2014-12-12 2014-12-12 Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410767787.4A CN104537157A (en) 2014-12-12 2014-12-12 Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor)

Publications (1)

Publication Number Publication Date
CN104537157A true CN104537157A (en) 2015-04-22

Family

ID=52852684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410767787.4A Pending CN104537157A (en) 2014-12-12 2014-12-12 Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor)

Country Status (1)

Country Link
CN (1) CN104537157A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109396956A (en) * 2018-11-06 2019-03-01 Chongqing University Intelligent monitoring method for the hob state of a chain-type CNC gear hobbing machine

Similar Documents

Publication Publication Date Title
Duan et al. Multi-category classification by soft-max combination of binary classifiers
CN105426426B (en) A KNN text classification method based on improved K-Medoids
CN105930862A (en) Density peak clustering algorithm based on density adaptive distance
CN108875067A (en) Text data classification method, device, equipment and storage medium
CN105045812A (en) Text topic classification method and system
CN104794500A (en) Tri-training semi-supervised learning method and device
CN107145560A (en) A text classification method and device
Pang et al. Towards balanced learning for instance recognition
Wei et al. Semi-supervised multi-label image classification based on nearest neighbor editing
CN109684477A (en) A patent text feature extraction method and system
CN105809113A (en) Three-dimensional human face identification method and data processing apparatus using the same
CN103593674A (en) Cervical lymph node ultrasonoscopy feature selection method
CN105574213A (en) Microblog recommendation method and device based on data mining technology
Zhong et al. An improved k-NN classification with dynamic k
CN111340057B (en) Classification model training method and device
CN104537157A (en) Confidence regression algorithm and device based on KNN (K-Nearest-Neighbor)
Chen et al. Support function machine for set-based classification with application to water quality evaluation
CN111241269B (en) Short message text classification method and device, electronic equipment and storage medium
Le et al. Multiple distribution data description learning method for novelty detection
CN105760471A (en) Classification method for two types of texts based on multiconlitron
He et al. Classification based on dimension transposition for high dimension data
Le et al. Multiple distribution data description learning algorithm for novelty detection
Le et al. A theoretical framework for multi-sphere support vector data description
KR101133804B1 (en) Fast kernel quantile clustering method for large-scale data
WO2022083047A1 (en) Method and apparatus for obtaining cell classification model, and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150422