CN103902737A

CN103902737A - Projection pursuit classification modeling software and implementation based on swarm intelligence algorithms

Info

Publication number: CN103902737A
Application number: CN201410160986.9A
Authority: CN
Inventors: 熊聘; 楼文高; 乔龙; 楼际通; 陈冬露; 熊辉; 陈鹏辉; 代辉
Original assignee: SHANGHAI BUSINESS SCHOOL; University of Shanghai for Science and Technology
Current assignee: SHANGHAI BUSINESS SCHOOL; University of Shanghai for Science and Technology
Priority date: 2014-04-22
Filing date: 2014-04-22
Publication date: 2014-07-02

Abstract

The invention belongs to the technical field of computer applications. It provides projection pursuit classification (PPC) modeling software based on swarm intelligence algorithms, together with its implementation. The modeling software comprises a sample data module, a normalization method module, a PPC modeling module and a globally optimal solution module. The implementation comprises the following steps:

- the sample data are normalized and the objective function of the PPC model is constructed;
- the objective function is evaluated, and a value range for the window radius R is given;
- the objective function is solved with swarm intelligence optimization algorithms to obtain the optimal projection vector;
- criteria are given for judging whether the optimization has reached the true global optimum;
- the optimal solutions obtained by three different swarm intelligence algorithms are checked against these criteria to obtain the global optimum.

The invention gives a way to determine a reasonable range of R and criteria for judging whether a solution is globally optimal. It also makes the calculation results easy to compare and verify. Together these improve the reliability, reasonableness and correctness of PPC modeling.

Description

Projection pursuit classification modeling software based on swarm intelligence algorithms, and its implementation

Technical field

The invention belongs to the field of applied computer technology. In particular, it relates to projection pursuit classification modeling software based on swarm intelligence algorithms, and its implementation, for solving classification and ranking problems on high-dimensional data.

Background art

Traditional statistical methods perform poorly on high-dimensional, nonlinear or skewed data. In 1974, Friedman proposed the projection pursuit classification (Projection Pursuit Clustering, PPC) model, which can effectively address this problem. The model was later introduced in China, where it has been widely applied in many fields with good results.

The basic idea of PPC modeling is to find a projection vector that maps the original high-dimensional data onto a low-dimensional subspace. The data patterns in that subspace are then studied to reveal the structural characteristics of the original data. The one-dimensional PPC objective function proposed by Friedman, Q(b) = max(S_y · D_y), is the most widely used, and the present invention takes this objective function as its object of study.

In PPC modeling, the sample data must first be normalized to eliminate the adverse effects of the different units (dimensions) of the indexes. Commonly used methods are range normalization, maximum-value normalization and zero-mean normalization. The three methods affect the PPC modeling results differently, so practitioners should choose a normalization method suited to the distribution of their sample data.

As the PPC objective function shows, the local density window radius R is the only parameter that affects the projection vector, so choosing a reasonable R is one of the keys to PPC modeling. Friedman stated that a reasonable R must meet two requirements:

(1) the projected points should be as dispersed as possible overall and as dense as possible locally;

(2) the number of projected points inside the projection window must not be too small, or the moving average over the samples deviates too much; at the same time, it must not grow too much as the number of samples n increases.

Based on these requirements, Chinese researchers have proposed three schemes for determining R:

(1) the small-value scheme, R ≤ 0.1 S_z;

(2) the large-value scheme, r_max ≤ R ≤ 2p;

(3) the intermediate (moderate) value scheme, r_max/5 ≤ R ≤ r_max/3.

At present, no software is available for PPC modeling. The globally optimal solutions obtained by PPC models implemented with existing methods have low reliability, reasonableness and correctness.

The field of applied computer technology therefore urgently needs projection pursuit classification modeling software based on swarm intelligence algorithms that:

- compares the results of the three normalization methods;
- shows how to determine a reasonable range of R;
- provides criteria for judging whether a solution is the global optimum;
- makes the calculation results easy to compare and verify.

Such software would improve the reliability, reasonableness and correctness of the globally optimal solutions obtained in PPC modeling.

Summary of the invention

The invention provides projection pursuit classification modeling software based on swarm intelligence algorithms, and its implementation. The technical solution is as follows:

Projection pursuit classification modeling software based on swarm intelligence algorithms, comprising:

Sample data module, for gathering sample data;

Normalization method module, connected to the sample data module, for preprocessing the collected sample data;

PPC modeling module, connected to the normalization method module, for building the PPC model from the preprocessed sample data;

Globally optimal solution module, connected to the PPC modeling module. It sets a reasonable value of R and obtains the global optimum with swarm intelligence optimization algorithms. It then applies the criteria for a true global optimum to judge whether the solution found by the optimization is correct, and so obtains the optimal projection vector and the sample projection values.

In the projection pursuit classification modeling software described above, the normalization method module further comprises: a range normalization module, a maximum-value normalization module and a zero-mean normalization module.

In the projection pursuit classification modeling software described above, the globally optimal solution module further comprises: a particle swarm optimization module, a multi-agent genetic algorithm module and a chaotic bee colony algorithm module.

The implementation method of the projection pursuit classification modeling software based on swarm intelligence algorithms comprises the following steps:

Step 1: normalization preprocessing of the sample data;

First, the sample data module collects the sample data. The indexes of the sample data differ greatly in units, evaluation criteria, magnitudes and ranges of variation. The raw data must therefore be preprocessed, in a way that preserves as far as possible the relative variation and patterns among the index values. The normalization method module provides three normalization methods for this preprocessing, producing forward (larger-is-better) and reverse (smaller-is-better) index values, as follows:

1. Range normalization, giving forward and reverse index values:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j} - x_{\min}(j)}{x_{\max}(j) - x_{\min}(j)}

Reverse index:

x^{-}_{i,j} = \frac{x_{\max}(j) - x_{i,j}}{x_{\max}(j) - x_{\min}(j)}

2. Maximum-value normalization, giving forward and reverse index values:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j}}{x_{\max}(j)}

Reverse index:

3. Zero-mean normalization, giving forward and reverse index values:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j} - \bar{x}(j)}{s(j)}

Reverse index:

x^{-}_{i,j} = \frac{\bar{x}(j) - x_{i,j}}{s(j)}

In all three methods, x^{+}_{i,j}, x^{-}_{i,j} and x_{i,j} are respectively the forward value, the reverse value and the raw value of the j-th index of the i-th sample. x_{\max}(j) and x_{\min}(j) are the maximum and minimum of the j-th index, \bar{x}(j) is its mean, and s(j) is its standard deviation;
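The three normalizations can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the software's implementation:

- the function names are hypothetical;
- zero-mean normalization is assumed to use the sample standard deviation (divisor n - 1);
- a constant column is mapped to zeros, by assumption;
- only the forward form of maximum-value normalization is coded, because its reverse form is not given above.

```python
from statistics import mean, stdev


def range_normalize(column, forward=True):
    """Range normalization of one index column.

    Forward (larger-is-better): (x - min) / (max - min)
    Reverse (smaller-is-better): (max - x) / (max - min)
    """
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # All samples share one value; returning zeros is an assumption
        # (criterion 3 says such an index should get weight 0 anyway).
        return [0.0] * len(column)
    if forward:
        return [(x - lo) / span for x in column]
    return [(hi - x) / span for x in column]


def max_normalize(column):
    """Maximum-value normalization, forward form x / max (assumes max > 0)."""
    hi = max(column)
    return [x / hi for x in column]


def zero_mean_normalize(column, forward=True):
    """Zero-mean normalization using the sample standard deviation (n - 1)."""
    m, s = mean(column), stdev(column)
    if forward:
        return [(x - m) / s for x in column]
    return [(m - x) / s for x in column]


col = [2.0, 4.0, 6.0, 10.0]
fwd = range_normalize(col)                 # [0.0, 0.25, 0.5, 1.0]
rev = range_normalize(col, forward=False)  # [1.0, 0.75, 0.5, 0.0]
```

For range normalization, the forward and reverse values of each sample always sum to 1. This is the relation used in the proof of criterion 1.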

Step 2: the PPC modeling module builds the objective function Q(b) of the PPC model. Maximizing Q(b) yields the optimal projection vector coefficients, also called weights;

First, let

b = (b_{1}, b_{2}, \ldots, b_{p})

be the p-dimensional projection vector, where b_j is the projection vector coefficient, or weight, of the j-th index. The projection value of the i-th sample is

y(i) = \sum_{j=1}^{p} b_{j} x_{i,j}

The basic idea of PPC modeling is that all projected sample points should be as dispersed as possible overall and as dense as possible locally. Accordingly, the objective function Q(b) of the PPC model is the product of the standard deviation S_y of the projected points and their local density D_y. Maximizing Q(b) yields the optimal projection vector coefficients (weights) b_j, that is:

Q(b) = \max (S_{y} \cdot D_{y}),

Constraint:

-1 \le b_{j} \le 1,

where S_y is the standard deviation of the projected sample points y(i), n is the number of samples, and \bar{y} is the mean of y(i). The local density is

D_{y} = \sum_{i=1}^{n} \sum_{k=1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}),

where r_{i,k} = |y(i) - y(k)| is the distance between samples i and k, and R is the local density window radius. The sign function f(R - r_{i,k}) is the unit step function: f(R - r_{i,k}) = 1 when R - r_{i,k} \ge 0, and f(R - r_{i,k}) = 0 otherwise;
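A direct, unoptimized way to evaluate the objective for one candidate b is sketched below. It rests on these assumptions:

- the rows of `data` are already normalized;
- S_y divides by n, since the text does not fix the divisor;
- the helper names are hypothetical.

The O(n^2) double loop follows the definition of D_y literally.

```python
import math


def projections(data, b):
    """y(i) = sum_j b_j * x_{i,j} for every normalized sample row."""
    return [sum(bj * xij for bj, xij in zip(b, row)) for row in data]


def std_dev(y):
    """S_y; dividing by n is an assumption (some authors divide by n - 1)."""
    n = len(y)
    ybar = sum(y) / n
    return math.sqrt(sum((v - ybar) ** 2 for v in y) / n)


def local_density(y, R):
    """D_y = sum_i sum_k (R - r_ik) * f(R - r_ik), with f the unit step."""
    total = 0.0
    for yi in y:
        for yk in y:
            gap = R - abs(yi - yk)   # R - r_{i,k}
            if gap >= 0:             # f(R - r_{i,k}) = 1
                total += gap
    return total


def ppc_objective(data, b, R):
    """Q = S_y * D_y for one candidate projection vector b."""
    y = projections(data, b)
    return std_dev(y) * local_density(y, R)


q = ppc_objective([[0.0], [1.0]], [1.0], R=0.5)  # 0.5
```

Evaluating b and -b gives the same value. Negating every projection leaves all distances r_{i,k} and S_y unchanged.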

Step 3: building on the maximization of the PPC objective function Q(b) in Step 2, a more reasonable and accurate range for R is proposed, namely:

r_{\max}/5 \le R \le r_{\max}/3,

where R is the window radius and r_max is the maximum distance between all samples;
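For one-dimensional projections the bounds are cheap to compute. The helper below is a hypothetical sketch. It assumes r_max is measured between the projected values y(i), consistent with r_{i,k} = |y(i) - y(k)| in Step 2. If r_max is instead meant in the original index space, the input would change accordingly.

```python
def r_window_bounds(y):
    """Return (r_max / 5, r_max / 3) for one-dimensional projected values y.

    r_max is taken as the largest pairwise distance |y(i) - y(k)|, which for
    points on a line equals max(y) - min(y).
    """
    r_max = max(y) - min(y)
    return r_max / 5, r_max / 3


lower, upper = r_window_bounds([0.0, 1.5, 3.0])   # r_max = 3.0 -> (0.6, 1.0)
```

Since y(i) depends on b, r_max depends on the projection vector used to compute it, and so does the range for R.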

Step 4: the globally optimal solution module solves the objective function Q(b) of the PPC model with swarm intelligence optimization algorithms, using the range of R defined in Step 3, to obtain the globally optimal solution P_g;

Three different swarm intelligence optimization algorithms are each used to solve the PPC objective function: particle swarm optimization, the multi-agent genetic algorithm and the chaotic bee colony algorithm. Using all three helps ensure the correctness of the resulting PPC model. The algorithms are as follows:

1. Particle swarm optimization; the velocity and position update formulas are:

\begin{cases} V_{d}(t+1) = \omega V_{d}(t) + c_{1} \cdot \mathrm{rand}_{1} \cdot (P_{d} - X_{d}(t)) + c_{2} \cdot \mathrm{rand}_{2} \cdot (P_{g} - X_{d}(t)) \\ X_{d}(t+1) = X_{d}(t) + V_{d}(t+1) \end{cases}

Constraint: -V_{\max} < V_{d}(t+1) < V_{\max},

where c_1 is the self-learning (cognitive) coefficient, a constant set to c_1 = 2, and c_2 is the social learning coefficient, a constant set to c_2 = 2. rand_1 and rand_2 are random numbers in [0, 1]. ω is the inertia weight, which decreases linearly from its maximum ω_max to its minimum ω_min according to

\omega(t) = \omega_{\max} - (\omega_{\max} - \omega_{\min}) \cdot t / N

The symbols and settings in this algorithm are:

- ω_max = 0.9 and ω_min = 0.4;
- the iteration step is t = 1, 2, ..., N, where N is the maximum number of iterations;
- V_max is the maximum particle velocity, set to V_max = 0.5;
- V_d(t) and X_d(t) are the velocity and position of the d-th particle at iteration t, d = 1, 2, ..., M, where M is the population size;
- P_d is the best position found so far by the d-th particle, and P_g is the best position found by any particle in the swarm.

Solutions that violate the constraints are discarded when selecting the global optimum. The computation stops when t exceeds the maximum number of iterations N;
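The update rules above can be sketched as a small, generic maximizer. The following choices are assumptions, not taken from the text:

- positions are clamped to [-1, 1] to enforce -1 <= b_j <= 1;
- velocities are clamped to [-V_max, V_max];
- the defaults M = 30 and N = 200 are arbitrary;
- `pso_maximize` is a hypothetical name.

In the PPC setting, `f` would be Q(b) for a fixed R. The usage lines test the sketch on a simple quadratic instead, so it runs on its own.

```python
import random


def pso_maximize(f, p, M=30, N=200, c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4, v_max=0.5, seed=0):
    """Maximize f over [-1, 1]^p with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(M)]
    V = [[0.0] * p for _ in range(M)]
    P = [x[:] for x in X]                    # P_d: best position of particle d
    P_val = [f(x) for x in X]
    g = max(range(M), key=lambda d: P_val[d])
    Pg, Pg_val = P[g][:], P_val[g]           # P_g: best position in the swarm
    for t in range(1, N + 1):
        w = w_max - (w_max - w_min) * t / N   # omega(t)
        for d in range(M):
            for j in range(p):
                r1, r2 = rng.random(), rng.random()
                v = (w * V[d][j]
                     + c1 * r1 * (P[d][j] - X[d][j])
                     + c2 * r2 * (Pg[j] - X[d][j]))
                V[d][j] = max(-v_max, min(v_max, v))               # |V| <= V_max
                X[d][j] = max(-1.0, min(1.0, X[d][j] + V[d][j]))   # -1 <= b_j <= 1
            val = f(X[d])
            if val > P_val[d]:
                P[d], P_val[d] = X[d][:], val
                if val > Pg_val:
                    Pg, Pg_val = X[d][:], val
    return Pg, Pg_val


def shifted_sphere(x):
    # toy objective with its maximum 0 at x = (0.3, ..., 0.3)
    return -sum((xi - 0.3) ** 2 for xi in x)


best, value = pso_maximize(shifted_sphere, p=3)
```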

2. Multi-agent genetic algorithm; the steps are:

(1) Initialize an L_size × L_size agent lattice L^0. Each agent L_{d,e} = (X_1, X_2, ..., X_p) is a p-dimensional unit vector whose components lie in [-1, 1], i.e. -1 ≤ X_j ≤ 1, j = 1, ..., p. Here L_size is the initial lattice size and L^0 is the initial agent lattice;

(2) Apply the neighborhood competition operator to each agent in L^t to obtain L^{t+1/3}. Here L^t is the agent lattice of generation t, L^{t+1/3} is the intermediate lattice between L^t and L^{t+1}, and t is the iteration step;

(3) For each agent in L^{t+1/3}, if U(0,1) < P_c, apply the neighborhood orthogonal crossover operator to it, obtaining L^{t+2/3}. Here L^{t+2/3} is the intermediate lattice between L^t and L^{t+1}, and U(0,1) is a random number in [0, 1]. P_c is a predefined parameter that controls execution of the neighborhood orthogonal crossover operator; in this algorithm P_c = 0.1;

(4) For each agent in L^{t+2/3}, if U(0,1) < P_m, apply the mutation operator to it, obtaining L^{t+1}. P_m is a predefined parameter that controls execution of the mutation operator; in this algorithm P_m = 0.1;

(5) If the iteration step t exceeds the maximum number of iterations N, stop and output P_g. Otherwise set t + 1 → t and repeat sub-steps (2) to (4) to obtain L^{t+1/3}, L^{t+2/3} and L^{t+1} again.

Here P_g is the best agent among the lattices L^0, L^1, ..., L^t, i.e. the globally optimal solution of the PPC objective function;
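The following is a deliberately simplified multi-agent GA sketch, not a faithful MAGA implementation. It keeps:

- the L_size × L_size lattice;
- the neighborhood competition step, in which a losing agent is re-sampled around its best neighbor as m_j + U(-1, 1)(m_j - a_j);
- the P_c- and P_m-controlled crossover and mutation stages.

It substitutes a plain arithmetic crossover for the neighborhood orthogonal crossover, uses Gaussian mutation with standard deviation 1/t, and omits MAGA's self-learning operator. All names and default sizes are assumptions.

```python
import random


def maga_maximize(f, p, size=5, N=100, Pc=0.1, Pm=0.1, seed=0):
    """Simplified multi-agent GA on a size x size toroidal lattice in [-1, 1]^p."""
    rng = random.Random(seed)

    def clip(x):
        return [max(-1.0, min(1.0, v)) for v in x]

    def best_neighbor(grid, r, c):
        # four-neighborhood on a torus (wrap-around at the lattice edges)
        cells = [grid[(r + dr) % size][(c + dc) % size]
                 for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))]
        return max(cells, key=f)

    grid = [[[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(size)]
            for _ in range(size)]                                  # L^0
    best = max((a for row in grid for a in row), key=f)[:]
    for t in range(1, N + 1):
        # neighborhood competition: L^t -> L^{t+1/3}
        g13 = []
        for r in range(size):
            new_row = []
            for c in range(size):
                a, m = grid[r][c], best_neighbor(grid, r, c)
                if f(a) > f(m):
                    new_row.append(a[:])                           # winner survives
                else:                                              # loser re-sampled around m
                    new_row.append(clip([mj + rng.uniform(-1.0, 1.0) * (mj - aj)
                                         for mj, aj in zip(m, a)]))
            g13.append(new_row)
        # crossover with probability Pc: L^{t+1/3} -> L^{t+2/3}
        g23 = [[a[:] for a in row] for row in g13]
        for r in range(size):
            for c in range(size):
                if rng.random() < Pc:
                    m, lam = best_neighbor(g13, r, c), rng.random()
                    child = [lam * aj + (1 - lam) * mj for aj, mj in zip(g13[r][c], m)]
                    if f(child) > f(g13[r][c]):
                        g23[r][c] = child
        # mutation with probability Pm: L^{t+2/3} -> L^{t+1}
        for r in range(size):
            for c in range(size):
                if rng.random() < Pm:
                    g23[r][c] = clip([v + rng.gauss(0.0, 1.0 / t) if rng.random() < 1.0 / p else v
                                      for v in g23[r][c]])
        grid = g23
        cand = max((a for row in grid for a in row), key=f)
        if f(cand) > f(best):
            best = cand[:]                                         # P_g over L^0 .. L^t
    return best, f(best)


def shifted_sphere(x):
    # toy objective with its maximum 0 at x = (0.3, ..., 0.3)
    return -sum((xi - 0.3) ** 2 for xi in x)


best, value = maga_maximize(shifted_sphere, p=3)
```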

3. Chaotic bee colony algorithm. The position of each food source X_d (d = 1, 2, ..., M) represents a feasible solution of the PPC objective function, i.e. a projection vector. The food source with the maximum fitness value F_g is the globally optimal solution. The steps are as follows:

(1) Generate M initial food sources with a chaotic algorithm; each food source X_d (d = 1, 2, ..., M) is a p-dimensional vector;

(2) Onlooker bees select food sources. The probability q_d that an onlooker bee selects food source d is:

q_{d} = F_{d} / \sum_{e=1}^{M} F_{e},

where F_d and F_e are the fitness values of the d-th and e-th solutions;

(3) Employed bees and onlooker bees search the neighborhood of the solutions held in memory according to:

V_{d,j} = X_{d,j} + \mathrm{rand}_{d,j} \cdot (X_{d,j} - X_{e,j}),

where:

- V_{d,j} is the candidate position produced by the bee's search;
- e ∈ {1, 2, ..., M} and j ∈ {1, 2, ..., p} are randomly chosen indices, with e ≠ d;
- rand_{d,j} is a random number in [-1, 1]. It controls how new solutions are generated around X_{d,j} and models a bee comparing two food positions within its field of view;

(4) If food source X_d has not improved after the specified number of cycles (limit), the corresponding solution is abandoned. The employed bee at that position becomes a scout and finds a new food source according to:

X_{d} = X_{\min} + \mathrm{rand} \cdot (X_{\max} - X_{\min}),

where X_d is the new food source, X_max and X_min are the upper and lower bounds of the food sources, and rand is a random number in [0, 1];

(5) If the iteration step t exceeds the maximum number of iterations N, stop and output P_g. Otherwise repeat sub-steps (2) to (4) until P_g is output.

Here P_g is the food source with the maximum fitness F_g, i.e. the globally optimal solution of the PPC objective function;
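A compact bee colony sketch follows. It rests on these assumptions:

- the chaotic initialization uses the logistic map z <- 4z(1 - z), which the text does not name;
- candidate positions are clamped to [X_min, X_max];
- the `limit`, M and N defaults are arbitrary;
- the fitness must be non-negative, so that q_d = F_d / sum F_e is a valid probability (Q(b) = S_y * D_y is non-negative);
- function and parameter names are hypothetical.

```python
import random


def chaotic_abc_maximize(f, p, M=20, N=200, limit=20, lo=-1.0, hi=1.0, seed=0):
    """Bee colony sketch with logistic-map initialization; f must be >= 0."""
    rng = random.Random(seed)
    z = rng.uniform(0.1, 0.9)
    X = []
    for _ in range(M):                      # chaotic initialization of food sources
        source = []
        for _ in range(p):
            z = 4.0 * z * (1.0 - z)         # logistic map in its chaotic regime
            source.append(lo + z * (hi - lo))
        X.append(source)
    F = [f(x) for x in X]
    trials = [0] * M
    g = max(range(M), key=lambda d: F[d])
    best = {"x": X[g][:], "F": F[g]}

    def remember(d):
        if F[d] > best["F"]:
            best["x"], best["F"] = X[d][:], F[d]

    def search(d):
        # V_{d,j} = X_{d,j} + rand_{d,j} * (X_{d,j} - X_{e,j}), with e != d
        e = rng.choice([k for k in range(M) if k != d])
        j = rng.randrange(p)
        V = X[d][:]
        V[j] = min(hi, max(lo, X[d][j] + rng.uniform(-1.0, 1.0) * (X[d][j] - X[e][j])))
        fv = f(V)
        if fv > F[d]:                       # greedy replacement
            X[d], F[d], trials[d] = V, fv, 0
            remember(d)
        else:
            trials[d] += 1

    for _ in range(N):
        for d in range(M):                  # employed bees
            search(d)
        for _ in range(M):                  # onlookers pick d with q_d = F_d / sum(F)
            d = rng.choices(range(M), weights=F)[0] if sum(F) > 0 else rng.randrange(M)
            search(d)
        for d in range(M):                  # scouts replace exhausted sources
            if trials[d] > limit:
                X[d] = [lo + rng.random() * (hi - lo) for _ in range(p)]
                F[d], trials[d] = f(X[d]), 0
                remember(d)
    return best["x"], best["F"]


def peak(x):
    # non-negative toy fitness with its maximum 1 at x = (0.3, ..., 0.3)
    return 1.0 / (1.0 + sum((v - 0.3) ** 2 for v in x))


best_x, best_F = chaotic_abc_maximize(peak, p=3)
```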

Step 5: the criteria for judging whether the optimization has found the true global optimum are as follows:

Criterion 1: if the same index is normalized once as a forward index and once as a reverse index (Step 1), the two resulting projection vector coefficients (weights) are opposite numbers;

Criterion 2: it follows from criterion 1 that if (b_1, b_2, ..., b_p) is an optimal weight vector, then

(-b_1, -b_2, ..., -b_p)

must also be an optimal weight vector;

Criterion 3: the weight of an index whose values are equal for all samples must be 0;

Criterion 4: two indexes with identical normalized values must have identical weights;

Criterion 5: when modeling with the constraint

0 \le b_{j} \le 1,

the weight of a reverse index must equal 0;
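Criteria 3 and 4 lend themselves to the dummy-variable checks mentioned among the beneficial effects: append a constant column and a duplicated column to the data, then check the weights the optimizer returns. The hypothetical helper below flags violations of criteria 3 to 5 for a given weight vector. The tolerance `tol` is an assumed value. Criteria 1 and 2 are omitted because they compare two separate optimization runs.

```python
def violated_criteria(data, b, tol=1e-3, nonneg=False, reverse_idx=()):
    """Return the numbers of criteria 3-5 that weight vector b violates."""
    p = len(b)
    cols = [[row[j] for row in data] for j in range(p)]
    violated = []
    # Criterion 3: an index with equal values for all samples must get weight 0
    if any(max(c) - min(c) == 0 and abs(b[j]) > tol for j, c in enumerate(cols)):
        violated.append(3)
    # Criterion 4: indexes with identical normalized values must get equal weights
    if any(cols[u] == cols[s] and abs(b[u] - b[s]) > tol
           for u in range(p) for s in range(u + 1, p)):
        violated.append(4)
    # Criterion 5: under 0 <= b_j <= 1, reverse indexes must get weight 0
    if nonneg and any(abs(b[j]) > tol for j in reverse_idx):
        violated.append(5)
    return violated


# column 1 is a constant dummy index, column 2 duplicates column 0
data = [[0.1, 0.5, 0.1], [0.9, 0.5, 0.9], [0.4, 0.5, 0.4]]
print(violated_criteria(data, [0.7, 0.2, 0.5]))  # [3, 4]
```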

Step 6: the optimal solutions obtained by the three swarm intelligence algorithms in Step 4 are checked against the criteria of Step 5, and the globally optimal solution is obtained;

The computed results are judged against the global-optimum criteria of Step 5. If none of the three swarm intelligence algorithms finds the global optimum, the cause is that R is too small. R is then increased appropriately and Step 4 is repeated, computing the optimal solution with the three algorithms again, until the global optimum is found.
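The retry loop of Step 6 can be outlined as follows. This is a sketch only. The text says R should be increased "appropriately" but gives no rule, so the multiplicative growth factor 1.2 and the upper limit `R_upper` (for example r_max/3 from Step 3) are assumptions. The solver and criteria callables are hypothetical stand-ins for the three algorithms and the Step 5 checks.

```python
def solve_with_r_adjustment(solvers, passes_criteria, R, R_upper, growth=1.2):
    """Run every solver at radius R; if none passes the criteria, enlarge R and retry.

    solvers: dict name -> callable(R) returning (b, Q)
    passes_criteria: callable(b, R) -> bool
    """
    while R <= R_upper:
        results = {name: solve(R) for name, solve in solvers.items()}
        accepted = {name: res for name, res in results.items()
                    if passes_criteria(res[0], R)}
        if accepted:
            return R, accepted
        R *= growth
    return R, {}


# stub solvers: "pso" only satisfies the stub criterion once R exceeds 1.0
solvers = {"pso": lambda R: ([R], 2 * R), "maga": lambda R: ([0.0], 0.0)}
R_final, accepted = solve_with_r_adjustment(solvers, lambda b, R: b[0] > 1.0, 0.5, 5.0)
```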

Beneficial effects of the invention:

1. The five criteria proposed by the invention for judging the global optimum ensure that the optimization finds the true global optimum, and thus ensure the correctness of the optimal solution and of the modeling result.

2. The proposed scheme for choosing a reasonable range of R is more reasonable than existing schemes. It corrects the erroneous scheme, found in some literature, of choosing R greater than r_max. It also avoids the small-value scheme, which may prevent the optimization from converging to the optimal solution. Within this range, the optimal projection vectors obtained for different values of R reveal the structure of the sample data as observed from different directions, reflecting the essence of PPC modeling.

3. The invention supports human-computer interaction. Users can set the parameters of the swarm intelligence algorithms according to their own needs. They can also add dummy variables to the raw data according to criteria 1, 2, 3 and 4, which makes it easy to tune the algorithm parameters and to judge whether the optimization has found the global optimum.

4. The invention computes results with three normalization methods and three swarm intelligence optimization algorithms, so users can analyze the results and verify the correctness and reasonableness of the global optimum obtained.

5. The invention packages the whole complex optimization process into software with a graphical interface, making it easy, fast and reliable to use.

Brief description of the drawings

The invention is described in detail below with reference to the drawings and specific embodiments:

Fig. 1 is a flowchart of the projection pursuit classification modeling software based on swarm intelligence algorithms according to the invention.

Fig. 2 is a structural diagram of the projection pursuit classification modeling software based on swarm intelligence algorithms according to the invention.

Embodiment

To make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to the drawings.

As shown in Fig. 2, the projection pursuit classification modeling software based on swarm intelligence algorithms comprises:

- sample data module 1, for collecting sample data;
- normalization method module 2, connected to sample data module 1, for preprocessing the collected sample data;
- PPC modeling module 3, connected to normalization method module 2, for building the PPC model from the preprocessed sample data;
- globally optimal solution module 4, connected to PPC modeling module 3. It sets a reasonable value of R, obtains the global optimum with swarm intelligence optimization algorithms, and applies the criteria for a true global optimum to judge whether the solution found by the optimization is correct. It thereby obtains the optimal projection vector and the sample projection values.

Normalization method module 2 further comprises: range normalization module 201, maximum-value normalization module 202 and zero-mean normalization module 203.

Globally optimal solution module 4 further comprises: particle swarm optimization module 401, multi-agent genetic algorithm module 402 and chaotic bee colony algorithm module 403.

As shown in Fig. 1, the implementation method of the projection pursuit classification modeling software based on swarm intelligence algorithms comprises the following steps:

Step 1: normalization preprocessing of the sample data;

First, sample data module 1 collects the sample data. The indexes of the sample data differ greatly in units, evaluation criteria, magnitudes and ranges of variation. The raw data must therefore be preprocessed, in a way that preserves as far as possible the relative variation and patterns among the index values. Normalization method module 2 provides three normalization methods for this preprocessing, producing forward and reverse index values, as follows:

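1. Range normalization:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j} - x_{\min}(j)}{x_{\max}(j) - x_{\min}(j)}

Reverse index:

x^{-}_{i,j} = \frac{x_{\max}(j) - x_{i,j}}{x_{\max}(j) - x_{\min}(j)}

2. Maximum-value normalization:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j}}{x_{\max}(j)}

Reverse index:

3. Zero-mean normalization:

Forward index:

x^{+}_{i,j} = \frac{x_{i,j} - \bar{x}(j)}{s(j)}

Reverse index:

x^{-}_{i,j} = \frac{\bar{x}(j) - x_{i,j}}{s(j)}

In all three methods, x_{i,j} is the raw value of the j-th index of the i-th sample, and x^{+}_{i,j} and x^{-}_{i,j} are its forward and reverse normalized values. x_{\max}(j) and x_{\min}(j) are the maximum and minimum of the j-th index, \bar{x}(j) is its mean, and s(j) is its standard deviation;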

Step 2: PPC modeling module 3 builds the objective function Q(b) of the PPC model. Maximizing Q(b) yields the optimal projection vector coefficients, or weights, b_j;

First, the projection value of the i-th sample is y(i) = \sum_{j=1}^{p} b_{j} x_{i,j}, and the objective function is

Q(b) = \max (S_{y} \cdot D_{y}),

Constraint:

-1 \le b_{j} \le 1,

where S_y is the standard deviation of the projected sample points, n is the number of samples, and \bar{y} is the mean of y(i). The local density is

D_{y} = \sum_{i=1}^{n} \sum_{k=1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}),

Step 3: building on the maximization of Q(b) in Step 2 (that is, the optimization of the objective function), a more reasonable and accurate range for R is proposed, namely:

r_{\max}/5 \le R \le r_{\max}/3,

where R is the window radius and r_max is the maximum distance between all samples;

Three schemes for determining R are currently known:

1. the small-value scheme, R ≤ 0.1 S_z;

2. the large-value scheme, r_max ≤ R ≤ 2p;

3. the intermediate (moderate) value scheme, r_max/5 ≤ R ≤ r_max/3.

1. Small-value scheme: when R ≤ 0.1 S_z, note that r_{i,k} = r_{k,i} and that r_{i,k} = 0 when i = k. The local density in Step 2 can therefore be written as

D_{y} = \sum_{i=1}^{n} \sum_{k=1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}) = n \cdot R + 2 \sum_{i=1}^{n} \sum_{k=i+1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}).

When R is small, R < r_{i,k} (i ≠ k) holds for most projected points, so f(R - r_{i,k}) = 0. The double sum \sum_{i=1}^{n} \sum_{k=i+1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}) is then very small, and hence

D_{y} = n \cdot R + \xi \approx n \cdot R,

where \xi = 2 \sum_{i=1}^{n} \sum_{k=i+1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}) is a very small value. By Step 2, the objective function then becomes

Q(b) = \max (S_{y} \cdot D_{y}) \approx \max (n \cdot R \cdot S_{y}).

Since n and R are constants, Q(b) depends only on S_y, the dispersion between classes. Maximizing it merely spreads the projected points as far apart as possible overall. Friedman's basic idea of PPC modeling requires instead that projected points within a class be as dense as possible while different classes are as dispersed as possible. Because the small-value scheme contradicts this idea, it is unreasonable;
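The rewriting of D_y used in this argument can be checked numerically. The sketch below (hypothetical names) computes the full double sum and the split form n * R + 2 * (sum over i < k). It uses max(R - r, 0) for (R - r) * f(R - r). With a small R, every off-diagonal term vanishes and D_y collapses to n * R.

```python
def density_full(y, R):
    """D_y as the full double sum; (R - r) * f(R - r) equals max(R - r, 0)."""
    return sum(max(R - abs(a - b), 0.0) for a in y for b in y)


def density_split(y, R):
    """n * R (diagonal terms, r_ii = 0) plus twice the sum over pairs i < k."""
    n = len(y)
    pairs = sum(max(R - abs(y[i] - y[k]), 0.0)
                for i in range(n) for k in range(i + 1, n))
    return n * R + 2 * pairs


y = [0.0, 0.3, 0.7, 1.0]
# density_full(y, 0.5) is approximately 3.0; density_full(y, 0.01) is about 4 * 0.01
```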

2. Large-value scheme: when r_max ≤ R ≤ 2p, we have R ≥ r_max ≥ r_{i,k}, i.e. R - r_{i,k} ≥ 0. By Step 2, f(R - r_{i,k}) = 1, so

D_{y} = \sum_{i=1}^{n} \sum_{k=1}^{n} (R - r_{i,k}) \cdot f(R - r_{i,k}) = \sum_{i=1}^{n} \sum_{k=1}^{n} (R - r_{i,k}) = n^{2} R - \sum_{i=1}^{n} \sum_{k=1}^{n} r_{i,k},

and the objective function becomes

Q(b) = \max (S_{y} \cdot D_{y}) = \max \left( S_{y} \cdot \left( n^{2} R - \sum_{i=1}^{n} \sum_{k=1}^{n} r_{i,k} \right) \right).

Therefore, to maximize Q(b), S_y should be as large as possible, i.e. the sample points should be as dispersed as possible. At the same time, the sum of r_{i,k} should be as small as possible, i.e. the sample points should be as dense as possible. Moreover, when R ≥ r_max all sample points fall within the same window, which contradicts Friedman's requirements for choosing a reasonable R. The scheme of choosing R greater than r_max is therefore wrong in principle.
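Likewise, the large-R identity can be checked on a toy set of projected values. The sketch assumes the same max(R - r, 0) form of the summand, and the names are hypothetical. It shows that the closed form n^2 R - sum r_{i,k} matches the double sum only once R >= r_max.

```python
def density_full(y, R):
    """D_y as the full double sum, using max(R - r, 0) for (R - r) * f(R - r)."""
    return sum(max(R - abs(a - b), 0.0) for a in y for b in y)


def density_large_R(y, R):
    """Closed form n^2 R - sum_i sum_k r_ik, valid only when R >= r_max."""
    n = len(y)
    return n * n * R - sum(abs(a - b) for a in y for b in y)


y = [0.0, 0.3, 0.7, 1.0]   # r_max = 1.0
# with R = 1.5 both forms give 24 - 6.8 = 17.2 (up to rounding);
# with R = 0.5 < r_max the closed form no longer applies
```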

3. Intermediate (moderate) value scheme: consider r_max/5 ≤ R ≤ r_max/3, where r_max is the maximum distance between sample points. As the window slides, it always contains roughly 1/5 to 1/3 of the sample points. The number of sample points in the window therefore does not change markedly, and it does not grow too much as the number of samples increases. These properties agree with Friedman's basic idea of PPC modeling and with the requirements that R must satisfy. They are also consistent with the common practice of dividing things into 3 to 5 classes for study. The intermediate value scheme is therefore the most reasonable choice for R;

Step 4: globally optimal solution module 4 solves the objective function Q(b) of the PPC model with swarm intelligence optimization algorithms, using the range of R defined in Step 3, to obtain the globally optimal solution P_g;

1. Particle swarm optimization; the velocity and position update formulas are:

\begin{cases} V_{d}(t+1) = \omega V_{d}(t) + c_{1} \cdot \mathrm{rand}_{1} \cdot (P_{d} - X_{d}(t)) + c_{2} \cdot \mathrm{rand}_{2} \cdot (P_{g} - X_{d}(t)) \\ X_{d}(t+1) = X_{d}(t) + V_{d}(t+1) \end{cases}

Constraint: -V_{\max} < V_{d}(t+1) < V_{\max},

2. Multi-agent genetic algorithm; the steps are:

(1) Initialize an L_size × L_size agent lattice L^0 in which each agent is L_{d,e} = (X_1, X_2, ..., X_p). Here L_size is the initial lattice size and L^0 is the initial agent lattice;

L^t denotes the agent lattice of generation t, L^{t+1/3} is the intermediate lattice between L^t and L^{t+1}, and t is the iteration step;

3. Chaotic bee colony algorithm:

(1) Generate M initial food sources with a chaotic algorithm; each food source X_d (d = 1, 2, ..., M) is a p-dimensional vector;

(2) Probability that an onlooker bee selects food source d:

q_{d} = F_{d} / \sum_{e=1}^{M} F_{e};

(3) Neighborhood search by employed and onlooker bees:

V_{d,j} = X_{d,j} + \mathrm{rand}_{d,j} \cdot (X_{d,j} - X_{e,j}),

(4) New food source found by a scout bee:

X_{d} = X_{\min} + \mathrm{rand} \cdot (X_{\max} - X_{\min}),

(5) If the iteration step t exceeds the maximum number of iterations N, stop and output P_g; otherwise repeat sub-steps (2) to (4) until P_g is output;

To make verification and comparison easy for the user, any one of the algorithms can be selected for solving the model.

Step 5: an optimal solution is obtained with any one of the algorithms in Step 4. However, none of the three swarm intelligence algorithms can by itself guarantee that the optimization reaches the global optimum. Criteria for judging whether the PPC optimization has found the global optimum are therefore essential, since they are the key evidence that the whole solution procedure is valid. Five such criteria are proposed, as follows:

Suppose the m-th index is normalized as a larger-is-better (forward) index, with values x_{i,m} and optimal weight b_m. The projection value of each sample is then y(i) = b_1 x_{i,1} + b_2 x_{i,2} + ... + b_m x_{i,m} + ... + b_p x_{i,p}. Now the normalization of this index is changed to smaller-is-better (reverse), so its values become 1 - x_{i,m}. By criterion 1, we must prove that its optimal weight is then -b_m.

Proof by contradiction: after the normalization of the m-th index is changed, take the weight of the m-th index to be -b_m. The projection vector is then b^{*} = (b_1, b_2, ..., -b_m, ..., b_p), and the projection value of each sample is

y^{*}(i) = \sum_{j=1}^{p} b^{*}_{j} x^{*}_{i,j} = b_{1} x_{i,1} + b_{2} x_{i,2} + \cdots + (-b_{m})(1 - x_{i,m}) + \cdots + b_{p} x_{i,p} = y(i) - b_{m},

where x^{*}_{i,j} = x_{i,j} for j ≠ m and x^{*}_{i,m} = 1 - x_{i,m}.

That is, y^*(i) - y(i) = -b_m is a constant. Hence |y^*(i) - y^*(k)| = |[y(i) - b_m] - [y(k) - b_m]| = |y(i) - y(k)|. Changing the normalization of the m-th index thus leaves the distance between any two samples unchanged, so S_y, D_y and Q(b) are also unchanged. Consequently, if -b_m were not the optimal weight of the reverse index 1 - x_{i,m}, the corresponding projection vector would not be optimal for Q(b) either. This contradicts the hypothesis that b_m is the optimal weight, so criterion 1 is proved;
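The key step of this proof is that y*(i) = y(i) - b_m, so Q is unchanged. This can be checked numerically. The sketch below rests on these assumptions:

- it uses random stand-in data in [0, 1) rather than real samples;
- it uses the range-normalization reverse form 1 - x;
- S_y divides by n;
- all names are hypothetical.

It also confirms the sign symmetry Q(b) = Q(-b) behind criterion 2.

```python
import math
import random


def q_value(data, b, R):
    """Q = S_y * D_y (S_y with divisor n, an assumption)."""
    y = [sum(bj * x for bj, x in zip(b, row)) for row in data]
    n = len(y)
    ybar = sum(y) / n
    s_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    d_y = sum(max(R - abs(a - c), 0.0) for a in y for c in y)
    return s_y * d_y


def flip_index(data, b, m):
    """Re-normalize index m as 1 - x (range reverse form) and negate b_m."""
    new_data = [row[:m] + [1.0 - row[m]] + row[m + 1:] for row in data]
    new_b = b[:m] + [-b[m]] + b[m + 1:]
    return new_data, new_b


rng = random.Random(1)
data = [[rng.random() for _ in range(4)] for _ in range(10)]
b = [rng.uniform(-1.0, 1.0) for _ in range(4)]
flipped_data, flipped_b = flip_index(data, b, m=2)
```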

Criterion 2: it follows from criterion 1 that if (b_1, b_2, ..., b_p) is an optimal weight vector, then

(-b_1, -b_2, ..., -b_p)

must also be an optimal weight vector;

Criterion 4: two indexes with identical normalized values must have identical weights. Suppose that after normalization the u-th and s-th indexes (u ≠ s) have identical values, and that modeling gives them weights b_u and b_s. If the positions of the two indexes are exchanged, modeling gives weights b_s and b_u. The data do not actually change under the exchange, so the weights obtained must be equal, and criterion 4 is proved;

Criterion 5: when modeling with the constraint 0 ≤ b_j ≤ 1, the weight of a reverse index must equal 0.

Suppose the m-th index is a reverse index. Under the constraint -1 ≤ b_j ≤ 1 its weight is -b_m, with -b_m ≤ 0. If modeling uses the constraint 0 ≤ b_j ≤ 1, the weight must be non-negative, giving -b_m ≥ 0. Hence b_m = 0, and criterion 5 is proved;

Five criterions of judgement globally optimal solution that the present invention proposes can guarantee that optimization procedure tries to achieve real globally optimal solution, thereby ensure the correctness of optimum solution and modeling result.

The scheme of choosing R value zone of reasonableness that the present invention proposes, had both revised the r that is greater than of some document proposition _maxwrong scheme, avoided again R to get smaller value scheme and may cause optimization procedure to restrain and can not try to achieve the problem of optimum solution, more reasonable; Within the scope of this, the optimum projection vector obtaining according to different R values has disclosed the architectural feature of observing from different directions sample data, has reflected the essence of PPC modeling.

The present invention has realized man-machine interaction, and user can be according to the parameter that swarm intelligence algorithm need to be set of self; Can on the basis of raw data, dummy variable be set according to criterion 1,2,3 and 4, be convenient to adjust the most optimized parameter, and then be convenient to judge whether optimization procedure has tried to achieve globally optimal solution.

The present invention adopts three kinds of different method for normalizing, three kinds of different colony intelligence optimized algorithms to carry out result of calculation, and user can be analyzed result, and then verifies correctness and the rationality of the globally optimal solution of trying to achieve.

The present invention packages the entire complex optimization process into software with a graphical visual interface, which is easy to use and makes operation faster and more reliable.

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the present invention, and various changes and improvements may be made without departing from the spirit and scope of the present invention, all of which fall within the claimed scope of the invention. The claimed scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. Projection pursuit classification modeling software based on swarm intelligence algorithms, characterized in that it comprises:

a sample data module, for collecting sample data;

a normalization method module, connected to the sample data module, for preprocessing the collected sample data;

a PPC modeling module, connected to the normalization method module, for building a PPC model from the preprocessed sample data; and

a globally optimal solution module, connected to the PPC modeling module, which sets a reasonable R value, solves for the globally optimal solution with swarm intelligence optimization algorithms, and judges, according to the criteria for whether the true globally optimal solution has been found, whether the solution found by the optimization procedure is correct, thereby obtaining the optimal projection vector and the sample projection values.

2. The projection pursuit classification modeling software based on swarm intelligence algorithms according to claim 1, characterized in that the normalization method module further comprises: a range normalization module, a maximum-value normalization module, and a zero-mean normalization module.

3. The projection pursuit classification modeling software based on swarm intelligence algorithms according to claim 1, characterized in that the globally optimal solution module further comprises: a particle swarm optimization module, a multi-agent genetic algorithm module, and a chaos ant colony algorithm module.

4. An implementation method of the projection pursuit classification modeling software based on swarm intelligence algorithms, characterized in that it comprises the following steps:

Step 1: normalization preprocessing of the sample data;

First, sample data are collected by the sample data module. Because the indices of the sample data differ considerably in units, evaluation criteria and magnitude of values, the raw sample data must be preprocessed in order to retain as much as possible the relative variation information and patterns among the index values. The normalization method module provides three different normalization methods for preprocessing the raw sample data, each with a formula for forward indices and one for reverse indices, as follows:

Forward index:

Reverse index:

Forward index:

Reverse index:

Forward index:

Reverse index:

where, in the three normalization methods, $x_{i,j}$ is the value of the j-th index for the i-th sample, and the other symbols in the formulas denote, respectively, the maximum and minimum of the j-th index, the mean of the j-th index, and the standard deviation of the j-th index;
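The three normalization formulas themselves are not reproduced above. A minimal Python sketch is given below, assuming the common textbook forms: range normalization (x − min)/(max − min), maximum normalization x/max, and zero-mean normalization (x − mean)/std. The reverse-index forms used here, (max − x)/(max − min), (max − x)/max and (mean − x)/std, and the use of the sample standard deviation are assumptions, chosen so that each reverse form is an affine reversal of its forward form; `normalize_column` and `normalize_matrix` are hypothetical names.

```python
import statistics
from typing import AbstractSet, List, Sequence


def normalize_column(col: Sequence[float], method: str,
                     reverse: bool = False) -> List[float]:
    """Normalize one index (column) with an assumed range / max / z-score form."""
    lo, hi = min(col), max(col)
    if method == "range":
        return [((hi - x) if reverse else (x - lo)) / (hi - lo) for x in col]
    if method == "max":  # assumes a positive maximum
        return [((hi - x) if reverse else x) / hi for x in col]
    if method == "zscore":
        mean, sd = statistics.mean(col), statistics.stdev(col)
        return [((mean - x) if reverse else (x - mean)) / sd for x in col]
    raise ValueError(f"unknown method: {method}")


def normalize_matrix(X: Sequence[Sequence[float]], method: str,
                     reverse_cols: AbstractSet[int] = frozenset()) -> List[List[float]]:
    """Normalize every column of X; columns listed in reverse_cols are
    treated as reverse indices, the rest as forward indices."""
    cols = [normalize_column([row[j] for row in X], method, j in reverse_cols)
            for j in range(len(X[0]))]
    return [list(r) for r in zip(*cols)]
```

Running the same data through all three methods and comparing the resulting weights is one way to carry out the cross-checking described for the software.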

Step 2: the PPC modeling module builds the objective function Q(b) of the PPC model, and the optimal projection vector coefficients $b_j$, also called weights, are obtained by maximizing Q(b);

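First, let the projection value of the i-th sample be $y(i) = \sum_{j=1}^{p} b_j x_{i,j}$, $i = 1, 2, \ldots, n$;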

According to the basic idea of PPC modeling, namely that the projection points of all samples should be as dispersed as possible overall and as dense as possible locally, the objective function Q(b) of the PPC model is defined as the product of the standard deviation $S_y$ and the local density $D_y$ of the sample projection points. The optimal projection vector coefficients (weights) $b_j$ are obtained by maximizing Q(b), that is: $\max Q(b) = S_y \cdot D_y$,

Constraint condition:

$-1 \le b_j \le 1$,

where $S_y$ is the standard deviation of the sample projection values y(i) about their mean, n is the number of samples, and $D_y$ is the local density of the projection values;
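The objective function can be sketched in Python as follows. The formulas for $S_y$ and $D_y$ are not reproduced above, so this sketch assumes a form widely used in the PPC literature: $S_y$ is the standard deviation of y(i) with denominator n, and $D_y = \sum_i \sum_k (R - r_{ik})\,u(R - r_{ik})$ with $r_{ik} = |y(i) - y(k)|$ and u the unit step function. `objective_Q` is a hypothetical name.

```python
import math
from typing import List, Sequence


def projection_values(X: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """y(i) = sum_j b_j * x_ij for every sample i."""
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def objective_Q(X: Sequence[Sequence[float]], b: Sequence[float], R: float) -> float:
    """Assumed PPC objective Q(b) = S_y * D_y for window radius R."""
    y = projection_values(X, b)
    n = len(y)
    mean = sum(y) / n
    s_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)  # spread of all points
    d_y = 0.0
    for i in range(n):
        for k in range(n):
            r_ik = abs(y[i] - y[k])
            if r_ik < R:          # unit step u(R - r_ik): only pairs inside the window
                d_y += R - r_ik   # closer pairs contribute more local density
    return s_y * d_y
```

The product rewards projections whose points are spread out overall (large $S_y$) while forming tight clusters within the window R (large $D_y$).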

Step 3: for solving for the maximum of the objective function Q(b) of the PPC model in Step 2, i.e., optimizing the objective function, a more reasonable and accurate range of R values is proposed, specifically:

$r_{max}/5 \le R \le r_{max}/3$,

where R is the window radius and $r_{max}$ is the maximum distance between all samples;
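A small Python sketch of this R window follows. It assumes that $r_{max}$ is measured between the sample projection values y(i), the quantities whose distances enter the local density, so that $r_{max} = \max_i y(i) - \min_i y(i)$; `r_bounds` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def r_bounds(y: Sequence[float]) -> Tuple[float, float]:
    """Return (r_max / 5, r_max / 3) for the projection values y."""
    r_max = max(y) - min(y)  # largest pairwise distance on a line
    return r_max / 5, r_max / 3
```

For projection values spanning 3.0 units, this gives R between 0.6 and 1.0.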

Step 4: using the globally optimal solution module and the range of R defined in Step 3, the objective function Q(b) of the PPC model is solved with swarm intelligence optimization algorithms to obtain the globally optimal solution $P_g$;

1. Particle swarm optimization; the velocity and position update formulas are:

Constraint condition: s.t. $-V_{max} < V_d(t+1) < V_{max}$,

where $c_1$ is the self-learning coefficient, a constant, here $c_1 = 2$; $c_2$ is the social learning coefficient, a constant, here $c_2 = 2$; $rand_1$ and $rand_2$ are random numbers in [0, 1]; ω is the inertia coefficient, which decreases linearly from its maximum $\omega_{max}$ to its minimum $\omega_{min}$, with $\omega_{max} = 0.9$ and $\omega_{min} = 0.4$ in this algorithm; $t = 1, 2, \ldots, N$ is the iteration step and N is the maximum number of iterations; $V_{max}$ is the maximum velocity of a particle, with $V_{max} = 0.5$ in this algorithm; $V_d(t)$ and $X_d(t)$ are the velocity and position of the d-th particle at iteration step t, $d = 1, 2, \ldots, M$, where M is the population size; $P_d$ is the best position found so far by the d-th particle, and $P_g$ is the best position found so far by any particle in the swarm. Solutions that violate the constraint conditions are discarded when selecting the globally optimal solution. The computation stops when t exceeds the maximum number of iterations N;
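The velocity and position formulas are not reproduced above; the symbols defined match the standard inertia-weight PSO update, $V_d(t+1) = \omega V_d(t) + c_1\,rand_1\,(P_d - X_d(t)) + c_2\,rand_2\,(P_g - X_d(t))$ and $X_d(t+1) = X_d(t) + V_d(t+1)$, which the Python sketch below assumes. Further assumptions: $rand_1$ and $rand_2$ are drawn separately for each dimension, ω decreases linearly over the N steps, and positions leaving [−1, 1] are clipped back onto the bound instead of being deleted as the text describes. `pso_maximize` is a hypothetical name; in PPC use, `f` would be Q(b) evaluated for a fixed R.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(f: Callable[[Sequence[float]], float], dim: int,
                 M: int = 30, N: int = 200, c1: float = 2.0, c2: float = 2.0,
                 w_max: float = 0.9, w_min: float = 0.4, v_max: float = 0.5,
                 lo: float = -1.0, hi: float = 1.0,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Maximize f over [lo, hi]^dim with inertia-weight PSO; return (P_g, f(P_g))."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(M)]
    V = [[0.0] * dim for _ in range(M)]
    P = [x[:] for x in X]                      # personal bests P_d
    P_fit = [f(x) for x in X]
    g = max(range(M), key=lambda d: P_fit[d])
    P_g, P_g_fit = P[g][:], P_fit[g]           # swarm best P_g
    for t in range(1, N + 1):
        # omega falls linearly from w_max (t = 1) to w_min (t = N)
        w = w_max - (w_max - w_min) * (t - 1) / max(N - 1, 1)
        for d in range(M):
            for j in range(dim):
                v = (w * V[d][j]
                     + c1 * rng.random() * (P[d][j] - X[d][j])
                     + c2 * rng.random() * (P_g[j] - X[d][j]))
                V[d][j] = min(max(v, -v_max), v_max)           # |V| <= V_max
                X[d][j] = min(max(X[d][j] + V[d][j], lo), hi)  # clip to -1 <= b_j <= 1
            fit = f(X[d])
            if fit > P_fit[d]:
                P[d], P_fit[d] = X[d][:], fit
                if fit > P_g_fit:
                    P_g, P_g_fit = X[d][:], fit
    return P_g, P_g_fit
```

Because $P_g$ is updated as soon as any particle improves on it, the returned value is the best position evaluated during the whole run.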

2. Multi-agent genetic algorithm; the specific steps are:

1. Initialization: set the size of the agent lattice; each agent $L_{d,e} = (X_1, X_2, \ldots, X_p)$ is a p-dimensional unit-length vector, and each component of agent $L_{d,e}$ takes a value in [−1, 1], i.e., $-1 \le X_j \le 1$, $j = 1, \ldots, p$;

Here, $L^{t+2/3}$ is an intermediate agent lattice between $L^t$ and $L^{t+1}$; U(0,1) denotes the relative search radius; and $P_c$ is a predefined parameter that controls the execution of the orthogonal crossover operator in the neighborhood. In this algorithm, U(0,1) is a random number in [0, 1] and $P_c = 0.1$;

5. If the iteration step t exceeds the maximum number of iterations N, stop the computation and output $P_g$; otherwise set $t + 1 \to t$ and repeat steps 2 to 4 to obtain $L^{t+1/3}$, $L^{t+2/3}$ and $L^{t+1}$ again;

$V_{d,j} = X_{d,j} + rand_{d,j}\,(X_{d,j} - X_{e,j})$,

$X_d = X_{min} + rand \cdot (X_{max} - X_{min})$,

where $X_d$ is the newly computed food source, $X_{max}$ and $X_{min}$ are the upper and lower bounds of the new food source, respectively, and rand is a random number in [0, 1];

5. If the iteration step t exceeds the maximum number of iterations N, stop the computation and output $P_g$; otherwise repeat steps 2 to 4 until $P_g$ is output;
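The two food-source formulas given for this algorithm can be sketched in Python as follows. Assumptions not stated in the text: the neighbour update perturbs a single randomly chosen dimension j, the partner e is a different food source chosen at random, $rand_{d,j}$ is drawn from [0, 1] as written, and no bound handling is applied to the neighbour. `neighbour_source` and `new_random_source` are hypothetical names.

```python
import random
from typing import List, Sequence


def neighbour_source(X: Sequence[Sequence[float]], d: int,
                     rng: random.Random) -> List[float]:
    """V_{d,j} = X_{d,j} + rand_{d,j} * (X_{d,j} - X_{e,j}) on one random
    dimension j, with a random partner e != d; other components are copied."""
    j = rng.randrange(len(X[d]))
    e = rng.choice([k for k in range(len(X)) if k != d])
    V = list(X[d])
    V[j] = X[d][j] + rng.random() * (X[d][j] - X[e][j])
    return V


def new_random_source(x_min: Sequence[float], x_max: Sequence[float],
                      rng: random.Random) -> List[float]:
    """X_d = X_min + rand * (X_max - X_min), drawn independently per dimension."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(x_min, x_max)]
```

In PPC use, the bounds would be the weight limits −1 and 1 for every component.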

Here, $P_g = \max(F_g)$, where $F_g$ is the fitness of the g-th solution, i.e., the value of the PPC model objective function, so that $P_g$ is the globally optimal solution of that objective function;

Step 5: criteria are proposed for judging whether the optimization procedure has found the true globally optimal solution, specifically:

Criterion 1: according to Step 1, when the same index data are normalized once as a forward index and once as a reverse index, the resulting projection vector coefficients (weights) are opposite numbers of each other;

Criterion 2: it follows from Criterion 1 that if $(b_1, b_2, \ldots, b_p)$ is an optimal weight vector, then $(-b_1, -b_2, \ldots, -b_p)$ must also be an optimal weight vector;

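Criterion 5 (choice of constraint condition): if an index receives a negative weight under the constraint condition $-1 \le b_j \le 1$, its weight must be 0 under the constraint condition $0 \le b_j \le 1$;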
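Criteria 1 and 2 can be checked numerically on any data set without running an optimizer, because both follow from the fact that Q(b) depends only on the spread of the projection values. The Python sketch below restates an assumed form of the objective (standard deviation with denominator n, local density $\sum_i \sum_k (R - r_{ik})\,u(R - r_{ik})$) and verifies that negating every weight (Criterion 2), or reverse-normalizing one range-normalized index as 1 − x while negating its weight (Criterion 1), leaves Q unchanged. `objective_Q` and `check_criteria_1_and_2` are hypothetical names, and the data, weights and R are arbitrary illustrative values.

```python
import math
import random
from typing import Sequence, Tuple


def objective_Q(X: Sequence[Sequence[float]], b: Sequence[float], R: float) -> float:
    """Assumed PPC objective Q(b) = S_y * D_y."""
    y = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    n = len(y)
    mean = sum(y) / n
    s_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    d_y = sum(R - abs(a - c) for a in y for c in y if abs(a - c) < R)
    return s_y * d_y


def check_criteria_1_and_2(seed: int = 42) -> Tuple[float, float, float]:
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(8)]  # range-normalized data in [0, 1)
    b = [0.6, -0.2, 0.4]
    R = 0.3
    q = objective_Q(X, b, R)
    # Criterion 2: -b mirrors every y(i), so S_y and D_y are unchanged.
    q_neg = objective_Q(X, [-w for w in b], R)
    # Criterion 1: x -> 1 - x for index 0 with b_0 -> -b_0 shifts every y(i) by -b_0.
    X_rev = [[1.0 - row[0]] + row[1:] for row in X]
    q_rev = objective_Q(X_rev, [-b[0]] + b[1:], R)
    return q, q_neg, q_rev
```

Since the three values agree for any weights, an optimizer whose best solutions for the forward and reverse data do not differ only in sign has, by these criteria, not reached the global optimum.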

Step 6: according to the criteria of Step 5, the optimal solutions obtained by the three different swarm intelligence algorithms in Step 4 are judged, and the globally optimal solution is thereby obtained;

The computed results are judged according to the criteria of Step 5 for identifying the true globally optimal solution. If none of the three swarm intelligence algorithms finds the globally optimal solution, this is because R is too small; after R is increased appropriately, Step 4 is repeated and the optimal solutions are recomputed with the three algorithms until the globally optimal solution is found.
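The Step 6 loop can be sketched in Python as below. It is a hypothetical driver: `solvers` maps an algorithm name to a function returning an optimal weight vector for a given R, `passes_criteria` stands in for the checks of Step 5, and the growth factor used to enlarge R and the round limit are assumed values, since the text only says that R is increased appropriately.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Solution = Sequence[float]


def solve_with_R_adjustment(solvers: Dict[str, Callable[[float], Solution]],
                            passes_criteria: Callable[[Solution, float], bool],
                            R: float, growth: float = 1.2, max_rounds: int = 10
                            ) -> Optional[Tuple[str, float, Solution]]:
    """Run every solver at the current R; return the first solution that passes
    the criteria, otherwise enlarge R and try again (None if rounds run out)."""
    for _ in range(max_rounds):
        for name, solve in solvers.items():
            b = solve(R)
            if passes_criteria(b, R):
                return name, R, b
        R *= growth  # no algorithm passed: R is taken to be too small
    return None
```

Returning the algorithm name and the final R alongside the weights keeps the result traceable when the three algorithms are compared.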