CN108710761A - Robust model fitting method for removing outliers based on spectral clustering - Google Patents


Publication number
CN108710761A
CN108710761A
Authority
CN
China
Prior art keywords
model
outlier
classification
spectral clustering
interior
Prior art date
Legal status
Pending
Application number
CN201810494460.2A
Other languages
Chinese (zh)
Inventor
李琦铭
李俊
Current Assignee
Quanzhou Institute of Equipment Manufacturing
Original Assignee
Quanzhou Institute of Equipment Manufacturing
Priority date
Filing date
Publication date
Application filed by Quanzhou Institute of Equipment Manufacturing filed Critical Quanzhou Institute of Equipment Manufacturing
Priority to CN201810494460.2A priority Critical patent/CN108710761A/en
Publication of CN108710761A publication Critical patent/CN108710761A/en
Pending legal-status Critical Current

Classifications

    • G - Physics
    • G06 - Computing; Calculating or Counting
    • G06F - Electric Digital Data Processing
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2323 - Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G06F18/25 - Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Discrete Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a robust model fitting method that removes outliers based on spectral clustering. Model hypotheses are generated by repeatedly sampling the input data, and a preference matrix and the similarity matrix between its row vectors are constructed; outliers are then removed and multi-structure model data classes are generated by spectral clustering; finally, a stopping function over the multi-structure model instance data decides whether the fitting method terminates, yielding the final fitting result, i.e., the model parameters of the multi-structure models. The invention studies in depth how outliers are distinguished from inliers during spectral clustering, removes outliers efficiently, estimates multi-structure model parameters accurately, and the inlier/outlier classification result can guide the subsequent sampling of a local sampling strategy.

Description

Robust model fitting method for removing outliers based on spectral clustering
Technical field
The present invention relates to the field of computer vision model fitting techniques, and in particular to a robust model fitting method that removes outliers based on spectral clustering.
Background technology
Model fitting, an important basic research task, plays an important role in the field of machine vision and is widely applied in visual SLAM, motion segmentation, three-dimensional reconstruction, panoramic photography, and so on. As shown in Figure 1, the task of a model fitting method is to generate model hypotheses by repeatedly sampling the data, and then estimate the model parameters of the structure instances through model selection. Since the original image and video information obtained from sensors is affected by intrinsic camera parameters, shooting angle and distance, illumination, and other environmental changes, key feature descriptors in the image (such as the local feature descriptor SIFT) are often selected as the input data for model fitting. Noise and outliers are inevitably present in these data, so accurately estimating the number of models and their corresponding parameters remains a highly challenging task.
Existing model fitting methods based on outlier removal usually require an independent step to discriminate and remove the data containing outliers. If outliers are removed before model estimation, some inliers of model instances may be removed at the same time; if they are removed after model estimation, the presence of the outliers affects the accuracy of the parameter estimates. Moreover, most methods do not fully use model-related information to guide the subsequent sampling of a local sampling strategy, so they cannot quickly sample clean data subsets from multi-structure model instance data containing noise and a high proportion of outliers.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a robust model fitting method that removes outliers based on spectral clustering, which can efficiently remove outliers and improve the accuracy of the model parameters.
To achieve the above object, the technical solution adopted by the present invention is:
A robust model fitting method for removing outliers based on spectral clustering, specifically comprising the following steps:
Step 1: construct the preference matrix and the similarity matrix between its row vectors;
From N observation data points X = {x_1, x_2, ..., x_N}, M subsets are randomly sampled and M model hypotheses Θ = {θ_1, θ_2, ..., θ_M} are estimated; each model hypothesis is then assigned a corresponding weight:
where n_m is the number of inliers of the m-th model hypothesis; r_i^m is the residual of the i-th data point relative to the m-th model hypothesis; S_m is the inlier noise scale estimated by the IKOSE method; ψ(·) and h_m are the kernel function and its corresponding bandwidth;
Model hypotheses with low weights are removed via an adaptive weight threshold obtained by the IKOSE method, retaining the G more robust model hypotheses Θ = {θ_1, θ_2, ..., θ_G}; the preference matrix of the N observation data points relative to these G model hypotheses can then be expressed as:
where P is an N×G two-dimensional matrix; r_i^g is the residual of the i-th data point relative to the g-th model hypothesis; S_g is the inlier noise scale;
The similarity matrix is constructed from the distance d(P(i,:), P(j,:)) between the row vectors P(i,:) and P(j,:) of the preference matrix:
where σ is the parameter of the exponential function;
Step 2: obtain the subspace classes of the similarity matrix by spectral clustering, and discriminate outliers according to the class scores of the concept subspaces;
A spectral clustering method that can automatically determine the number of subspaces is used to obtain the k subspaces {C_1, C_2, ..., C_k} of the similarity matrix; the concept space of each subspace class of the similarity matrix is then constructed, and the class score of each concept subspace is computed, where the label assigning each data point to a structure model instance is I = {i_1, i_2, ..., i_N};
The class score of each concept subspace is then discriminated by the following formula:
where n(C_k) denotes the inliers belonging to class C_k; when S(C_k) exceeds a certain value, the class is an inlier class, otherwise it is an outlier class; outlier classes are removed, and the inliers corresponding to the multi-structure model data classes are retained;
Step 3: judge whether to stop the model fitting according to a stopping function;
First, the inliers of each obtained model data class are fitted by least squares, giving the parameters of each model:
where θ_l is the model parameter of the l-th model structure obtained by least squares, computed over all inliers of the l-th model data class obtained by spectral clustering;
Then, the residual sum of squares of the inliers corresponding to each model parameter is computed:
Finally, the condition for stopping the model fitting is obtained from the difference of the residual sums of squares of two successive iterations:
f_t(θ_l) - f_{t-1}(θ_l) < δ (8);
If the difference is less than the threshold δ, the iteration terminates and the outlier classes and the model parameters of the multi-structure model instances are output; otherwise, point subsets continue to be sampled from the inlier data of the multi-structure model data classes, model hypotheses are estimated and the similarity matrix is rebuilt, and steps 2 and 3 are repeated for the next iteration until formula (8) is satisfied or the maximum number of iterations is reached.
The kernel function is:
where t is a preset threshold constant.
The maximum number of iterations is 5.
With the above scheme, the present invention generates model hypotheses by repeatedly sampling the input data and constructs the preference matrix and the similarity matrix between its row vectors; outliers are then removed and multi-structure model data classes are generated by spectral clustering; finally, a stopping function over the multi-structure model instance data decides whether the fitting method terminates, yielding the final fitting result, i.e., the model parameters of the multi-structure models. The invention studies in depth how outliers are distinguished from inliers during spectral clustering, removes outliers efficiently, estimates multi-structure model parameters accurately, and the inlier/outlier classification result can guide the subsequent sampling of a local sampling strategy.
Description of the drawings
Fig. 1 is a schematic diagram of an existing model fitting method;
Fig. 2 is a flowchart of the method of the present invention;
Fig. 3 is a schematic diagram of the distance distribution of the subspace classes obtained by the spectral clustering method.
Detailed description of the embodiments
As shown in Fig. 2, the present invention discloses a robust model fitting method for removing outliers based on spectral clustering, specifically comprising the following steps:
Step 1: construct the preference matrix and the similarity matrix between its row vectors;
Suppose that from N observation data points X = {x_1, x_2, ..., x_N}, M subsets are randomly sampled and M model hypotheses Θ = {θ_1, θ_2, ..., θ_M} are estimated. Then, to reduce the influence of redundant and poorly robust model hypotheses during model fitting, we assign each model hypothesis a corresponding weight:
where n_m is the number of inliers of the m-th model hypothesis; r_i^m is the residual of the i-th data point relative to the m-th model hypothesis; S_m is the inlier noise scale estimated by the IKOSE method; ψ(·) and h_m are the kernel function and its corresponding bandwidth.
The expression formula for the kernel function that the present invention uses is as follows:
Wherein, t is the threshold constant of setting, we set it to 2.5.
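The weighting step above can be sketched in Python. Since the patent's weight formula and kernel expression are images that did not survive extraction, the form below (a kernel-weighted average of scaled residuals over each hypothesis's inliers, with a truncated Gaussian kernel and the threshold t = 2.5) is an assumption consistent with the stated variable definitions n_m, r_i^m, S_m, ψ(·), and h_m; it is a sketch, not the patent's exact formula.

```python
import numpy as np

def hypothesis_weights(residuals, noise_scales, t=2.5):
    """residuals: (N, M) array, residuals[i, m] = r_i^m.
    noise_scales: (M,) array of IKOSE inlier noise scales S_m.
    Returns one weight per hypothesis (assumed form, see lead-in)."""
    M = residuals.shape[1]
    weights = np.zeros(M)
    for m in range(M):
        scaled = residuals[:, m] / noise_scales[m]
        inliers = scaled < t              # points within t noise scales count as inliers
        n_m = max(int(inliers.sum()), 1)  # guard against empty inlier sets
        h_m = noise_scales[m]             # bandwidth tied to noise scale (assumption)
        kernel = np.exp(-0.5 * scaled[inliers] ** 2) / h_m  # truncated Gaussian psi
        weights[m] = kernel.sum() / n_m
    return weights
```

A hypothesis supported by many small residuals thus receives a high weight, while one with no points inside the truncation radius receives zero weight and is discarded by the adaptive threshold.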
An adaptive weight threshold obtained by the IKOSE method is then used to remove model hypotheses with lower weights, retaining the G more robust model hypotheses Θ = {θ_1, θ_2, ..., θ_G}. The preference matrix of the N observation data points relative to these G model hypotheses can then be expressed as:
where P is an N×G two-dimensional matrix; r_i^g is the residual of the i-th data point relative to the g-th model hypothesis; S_g is the inlier noise scale.
The similarity matrix can be constructed from the distance d(P(i,:), P(j,:)) between the row vectors P(i,:) and P(j,:) of the preference matrix:
where σ is the parameter of the exponential function.
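A minimal Python sketch of the preference and similarity matrices follows. The exponential preference entry exp(-r_i^g / S_g) and the Gaussian form exp(-d² / (2σ²)) are assumptions, since the corresponding equations are images in the source; the sketch only follows the stated definitions (residuals r_i^g, noise scales S_g, row-vector distance d, exponential parameter σ).

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_preference_matrix(residuals, noise_scales):
    """residuals: (N, G) residuals to the G retained hypotheses.
    Assumed exponential preference: P[i, g] = exp(-r_i^g / S_g)."""
    return np.exp(-residuals / noise_scales[None, :])

def build_similarity_matrix(P, sigma=1.0):
    """Gaussian similarity over pairwise distances d(P(i,:), P(j,:))."""
    D = cdist(P, P)                        # Euclidean distance between row vectors
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))
```

Points that prefer the same hypotheses have nearby preference rows and therefore a similarity close to 1, which is what the subsequent spectral clustering exploits.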
Step 2: obtain the subspace classes of the similarity matrix by spectral clustering, and discriminate outliers according to the class scores of the concept subspaces;
Previous research has found that constructing a concept space from the row vectors P(i, m) of the preference matrix yields different distance distributions for inliers and outliers, so outliers can be removed from the data according to the magnitude of the distance distribution. Through experiments, this patent finds that the concept space of each subspace class of the similarity matrix has a similar property; Fig. 3 is a schematic diagram of the distance distribution, in the concept space, of the data points within the subspace classes obtained by the spectral clustering method on five-line data.
Therefore, the present invention uses a spectral clustering method (Self-Tuning) that can automatically determine the number of subspaces to obtain the k subspaces {C_1, C_2, ..., C_k} of the similarity matrix, then constructs the distance distribution of each subspace class of the similarity matrix in the concept space, computes the class score of each concept subspace, and judges its class attribute (inlier class or outlier class), where the label assigning each data point to a structure model instance (class) is I = {I_1, I_2, ..., I_N}. On this basis, not only can the classes containing a large proportion of outliers be obtained, but the inliers of the model instances comprising the multiple structured data can also be classified. The numerical score by which an outlier class is discriminated is obtained by formula (5):
where I_i indicates which subspace class the i-th data point belongs to, and each concept subspace class C_k obtains its score value S(C_k) from all the data points in the class; n(C_k) denotes the inliers belonging to class C_k; when S(C_k) exceeds a certain value, the class is an inlier class, otherwise it is an outlier class. Outlier classes are removed, and the inliers corresponding to each multi-structure model data class are retained.
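The clustering and score-based class discrimination can be sketched as follows. The patent uses a Self-Tuning spectral clustering that determines the number of subspaces automatically; scikit-learn's SpectralClustering with a fixed number of clusters is used here as a stand-in, and the class score (mean intra-cluster similarity below) is an assumed substitute for the concept-subspace score S(C_k) of formula (5), which is not reproduced in this text.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def classify_clusters(A, n_clusters, score_threshold):
    """A: (N, N) precomputed similarity matrix.
    Returns per-point labels and the list of clusters judged inlier classes.
    The score here (mean intra-cluster similarity) is an assumed stand-in
    for the patent's concept-subspace score S(C_k)."""
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=0)
    labels = sc.fit_predict(A)
    inlier_classes = []
    for k in range(n_clusters):
        idx = np.where(labels == k)[0]
        score = A[np.ix_(idx, idx)].mean()  # cohesion of cluster k
        if score > score_threshold:         # cohesive cluster -> inlier class
            inlier_classes.append(k)
    return labels, inlier_classes
```

Clusters whose score falls below the threshold are treated as outlier classes and dropped; note that, as the text observes, several distinct outlier classes may coexist, so outliers need not form a single cluster.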
The advantage of discriminating outliers by constructing a concept space over the subspaces of the similarity matrix is that the generated subspace classes may contain multiple outlier classes, so the outliers are fitted by multiple different distributions, which is closer to the real data.
Step 3: judge whether to stop the model fitting according to a stopping function;
First, the inliers of each obtained model data class are fitted by least squares, giving the parameters of each model:
where θ_l is the model parameter of the l-th model structure obtained by least squares, computed over all inliers of the l-th model data class obtained by spectral clustering.
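For the five-line experiment described later, the least-squares fit of equation (6) reduces to fitting a line to each inlier class. The sketch below assumes 2D points and a line model y = a·x + b, which the text does not spell out; equation (6) itself is an image in the source.

```python
import numpy as np

def fit_models(points, labels, inlier_classes):
    """points: (N, 2) array; labels: cluster index per point.
    Least-squares line fit y = a*x + b per inlier class (a stand-in
    for the patent's equation (6), which is an image in the source)."""
    params = {}
    for k in inlier_classes:
        xy = points[labels == k]
        a, b = np.polyfit(xy[:, 0], xy[:, 1], deg=1)
        params[k] = (float(a), float(b))
    return params
```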
Then, the residual sum of squares of the inliers corresponding to each model parameter is computed:
Finally, the condition for stopping the model fitting is obtained from the difference of the residual sums of squares of two successive iterations:
f_t(θ_l) - f_{t-1}(θ_l) < δ (8);
If the difference is less than the threshold δ, the iteration terminates and the outlier classes and the model parameters of the multi-structure model instances are output; otherwise, point subsets continue to be sampled from the inlier data of the multi-structure model data classes, model hypotheses are estimated and the similarity matrix is rebuilt, and steps 2 and 3 are repeated for the next iteration until formula (8) is satisfied or the maximum number of iterations (5) is reached.
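The stopping test of equations (7) and (8) can be sketched as below, again assuming 2D line models. Aggregating over models with all() is an interpretation: the patent states the criterion per model parameter θ_l but does not say how the per-model differences are combined.

```python
import numpy as np

def rss_per_model(points, labels, params):
    """Equation (7): residual sum of squares of each model's inliers
    (assuming 2D points and line models y = a*x + b)."""
    out = {}
    for k, (a, b) in params.items():
        xy = points[labels == k]
        r = xy[:, 1] - (a * xy[:, 0] + b)  # vertical residuals to the fitted line
        out[k] = float(r @ r)
    return out

def converged(prev_rss, curr_rss, delta):
    """Equation (8): stop once every model's RSS change is below delta."""
    if prev_rss is None:                   # no previous iteration yet
        return False
    return all(abs(prev_rss[k] - curr_rss[k]) < delta
               for k in curr_rss if k in prev_rss)
```

An outer loop would alternate sampling, clustering, and fitting, calling converged() after each pass and stopping after at most 5 iterations.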
For multi-structure model instances of relatively simple lines, one or two passes of the clustering process are enough to accurately output the outliers and the model parameters of the multi-structure data. When facing more complex cases, such as estimating the homography matrices and fundamental matrices of multiple structural models, the inlier data obtained from one clustering pass can guide the sampling of the next iteration, generating more robust model hypotheses containing more inliers. The advantage is that with each iteration we obtain cleaner and more robust model hypotheses and better distinguish the outlier information.
To verify the performance of the present invention, the above model fitting method was implemented in the Matlab programming language; the hardware platform for running the code was an 8-core 3.4 GHz processor. A five-line multi-structure model containing a high proportion of outliers was selected as the experimental test data set: each of the five lines has 50 inliers, there are 250 outliers, and the total number of data points is 500.
The experiment was repeated 50 times on the test set, with 2000 point subsets initially sampled at random each time. We use the average false detection rate and the minimum false detection rate over the 50 runs as evaluation criteria, and also give the running time of each algorithm for comparison, where the calculation formula of the false detection rate is as follows:
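The false detection rate ("false drop rate") formula itself is an image in the source; the sketch below assumes the usual definition, the percentage of points whose predicted label disagrees with the ground truth. In practice predicted cluster labels must first be aligned to the ground-truth labels (e.g. by the best permutation), which this sketch does not do.

```python
def false_detection_rate(true_labels, pred_labels):
    """Assumed definition of the false detection rate: percentage of
    points whose predicted label disagrees with the ground truth."""
    wrong = sum(t != p for t, p in zip(true_labels, pred_labels))
    return 100.0 * wrong / len(true_labels)
```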
Table 1 gives the comparison between the model fitting method of the present invention and classical model fitting methods based on outlier removal.
Method | Average false detection rate (%) | Minimum false detection rate (%) | Running time (s)
KF | 25.02 | 16.6 | 2.59
T-linkage | 26.07 | 19.6 | 24.87
Method of the present invention | 16.02 | 12.6 | 1.94
Table 1
In Table 1, KF is a method that removes outliers before model estimation, and T-linkage is a method that removes outliers after model estimation. The results show that the model fitting method of the present invention clearly outperforms the other methods, achieving the lowest average false detection rate (16.02%) and the lowest minimum false detection rate (12.6%); meanwhile, its running time (1.94 s) is also lower than that of the compared methods, reflecting the high efficiency of the method of this patent.
In summary, the robust model fitting method for removing outliers based on spectral clustering proposed by the present invention achieves efficient and accurate results, thereby providing a better theoretical basis for the practical application of multi-structure model fitting methods on data containing a high proportion of outliers.
The above is only an embodiment of the present invention and is not intended to limit the scope of the present invention; therefore, any subtle modifications, equivalent variations, and improvements made to the above embodiment according to the technical essence of the invention still fall within the scope of the technical solution of the present invention.

Claims (3)

1. A robust model fitting method for removing outliers based on spectral clustering, characterized in that the model fitting method specifically comprises the following steps:
Step 1: construct the preference matrix and the similarity matrix between its row vectors;
From N observation data points X = {x_1, x_2, ..., x_N}, M subsets are randomly sampled and M model hypotheses Θ = {θ_1, θ_2, ..., θ_M} are estimated; each model hypothesis is then assigned a corresponding weight:
where n_m is the number of inliers of the m-th model hypothesis; r_i^m is the residual of the i-th data point relative to the m-th model hypothesis; S_m is the inlier noise scale estimated by the IKOSE method; ψ(·) and h_m are the kernel function and its corresponding bandwidth;
Model hypotheses with low weights are removed via an adaptive weight threshold obtained by the IKOSE method, retaining the G more robust model hypotheses Θ = {θ_1, θ_2, ..., θ_G}; the preference matrix of the N observation data points relative to these G model hypotheses can then be expressed as:
where P is an N×G two-dimensional matrix; r_i^g is the residual of the i-th data point relative to the g-th model hypothesis; S_g is the inlier noise scale;
The similarity matrix is constructed from the distance d(P(i,:), P(j,:)) between the row vectors P(i,:) and P(j,:) of the preference matrix:
where σ is the parameter of the exponential function;
Step 2: obtain the subspace classes of the similarity matrix by spectral clustering, and discriminate outliers according to the class scores of the concept subspaces;
A spectral clustering method that can automatically determine the number of subspaces is used to obtain the k subspaces {C_1, C_2, ..., C_k} of the similarity matrix; the concept space of each subspace class of the similarity matrix is then constructed, and the class score of each concept subspace is computed, where the label assigning each data point to a structure model instance is I = {i_1, i_2, ..., i_N};
The class score of each concept subspace is then discriminated by the following formula:
where n(C_k) denotes the inliers belonging to class C_k; when S(C_k) exceeds a certain value, the class is an inlier class, otherwise it is an outlier class; outlier classes are removed, and the inliers corresponding to the multi-structure model data classes are retained;
Step 3: judge whether to stop the model fitting according to a stopping function;
First, the inliers of each obtained model data class are fitted by least squares, giving the parameters of each model:
where θ_l is the model parameter of the l-th model structure obtained by least squares, computed over all inliers of the l-th model data class obtained by spectral clustering;
Then, the residual sum of squares of the inliers corresponding to each model parameter is computed:
Finally, the condition for stopping the model fitting is obtained from the difference of the residual sums of squares of two successive iterations:
f_t(θ_l) - f_{t-1}(θ_l) < δ (8);
If the difference is less than the threshold δ, the iteration terminates and the outlier classes and the model parameters of the multi-structure model instances are output; otherwise, point subsets continue to be sampled from the inlier data of the multi-structure model data classes, model hypotheses are estimated and the similarity matrix is rebuilt, and steps 2 and 3 are repeated for the next iteration until formula (8) is satisfied or the maximum number of iterations is reached.
2. The robust model fitting method for removing outliers based on spectral clustering according to claim 1, characterized in that the kernel function is:
where t is a preset threshold constant.
3. The robust model fitting method for removing outliers based on spectral clustering according to claim 1, characterized in that the maximum number of iterations is 5.
CN201810494460.2A 2018-05-22 2018-05-22 A kind of robust Model approximating method removing outlier based on spectral clustering Pending CN108710761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810494460.2A CN108710761A (en) 2018-05-22 2018-05-22 A kind of robust Model approximating method removing outlier based on spectral clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810494460.2A CN108710761A (en) 2018-05-22 2018-05-22 A kind of robust Model approximating method removing outlier based on spectral clustering

Publications (1)

Publication Number Publication Date
CN108710761A true CN108710761A (en) 2018-10-26

Family

ID=63868523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810494460.2A Pending CN108710761A (en) 2018-05-22 2018-05-22 A kind of robust Model approximating method removing outlier based on spectral clustering

Country Status (1)

Country Link
CN (1) CN108710761A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961086A (en) * 2019-01-28 2019-07-02 平安科技(深圳)有限公司 Abnormal point ratio optimization method and device based on cluster and SSE
CN109961086B (en) * 2019-01-28 2024-05-31 平安科技(深圳)有限公司 Clustering and SSE-based outlier proportion optimization method and device
CN110163865A (en) * 2019-05-28 2019-08-23 闽江学院 A kind of method of sampling for unbalanced data in models fitting
CN110163865B (en) * 2019-05-28 2021-06-01 闽江学院 Sampling method for unbalanced data in model fitting
CN110163298A (en) * 2019-05-31 2019-08-23 闽江学院 A kind of pattern fitting method of the sampling of fusant collection and model selection
US11113580B2 (en) 2019-12-30 2021-09-07 Industrial Technology Research Institute Image classification system and method
CN112132204A (en) * 2020-09-18 2020-12-25 厦门大学 Robust model fitting method based on preference probability weighted sampling
CN112132204B (en) * 2020-09-18 2022-05-24 厦门大学 Robust model fitting method based on preference probability weighted sampling


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181026)