CN108595499A - A clone-optimized particle swarm clustering method for high-dimensional data analysis - Google Patents

Publication number
CN108595499A
Authority
CN
China
Prior art keywords
particle
population
clone
cluster
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810221722.8A
Other languages
Chinese (zh)
Inventor
罗养霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XI'AN UNIVERSITY OF FINANCE AND ECONOMICS
Original Assignee
XI'AN UNIVERSITY OF FINANCE AND ECONOMICS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XI'AN UNIVERSITY OF FINANCE AND ECONOMICS
Priority to CN201810221722.8A
Publication of CN108595499A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of high-dimensional data analysis and discloses a clone-optimized particle swarm clustering method for high-dimensional data analysis. The method uses a particle swarm clustering technique with dynamic clonal selection, together with a constrained joint encoding mechanism and an evaluation measure based on the contribution rate of feature dimensions. Particle swarm theory is applied to high-dimensional data clustering: the optimization search mechanism of the particle swarm algorithm performs a guided random search for cluster-center vectors in the data set. Each particle is treated as an antibody and represents one way of partitioning the data set to be clustered, and particles undergo optimization and immune evolution at the same time. During dynamic evolution, particles are cloned in proportion to their affinity, cloning is suppressed according to antibody concentration (the higher the concentration, the fewer the clones), and local mutation is applied in inverse proportion to affinity. The invention effectively avoids being trapped in local optima, improves the stability and reliability of the clustering algorithm, and accelerates the search over high-dimensional data.

Description

A clone-optimized particle swarm clustering method for high-dimensional data analysis
Technical field
The invention belongs to the technical field of high-dimensional data analysis, and in particular relates to a clone-optimized particle swarm clustering method for high-dimensional data analysis.
Background
In recent years, data mining has attracted great attention from the information industry and from society as a whole. Large amounts of widely usable data exist in daily life, high-dimensional data are increasingly common in practice, and there is an urgent need to convert such data into useful information and knowledge. Clustering algorithms for low-dimensional data are now relatively mature, but high-dimensional data, such as financial, retail, telecommunications, and biological data, are ubiquitous in practical applications. Because of the "curse of dimensionality", many traditional clustering algorithms often fail when applied to high-dimensional data: they are sensitive to initial values, easily trapped in local optima, scale poorly, and cannot handle large data sets. Research on high-dimensional data clustering therefore has important theoretical significance and practical value.
Clustering high-dimensional data is an important task in cluster analysis, since many applications need to analyze objects with a large number of features or dimensions. In such data, many dimensions may be irrelevant to one another; as dimensionality grows, the data become increasingly sparse, so that pairwise distances between points lose their meaning and the average density of the data becomes very low. Traditional clustering methods face two problems on high-dimensional data: 1. the data set contains a large number of irrelevant attributes, so the probability that clusters exist across all dimensions is almost zero; 2. data in a high-dimensional space are more sparsely distributed than data in a low-dimensional space and the distances between data points are almost equal, so distance is hard to use as a measure.
To meet the needs of users in different application fields, researchers in data mining have proposed many clustering methods for high-dimensional data, mainly: (1) clustering based on dimensionality reduction; (2) subspace clustering; (3) hypergraph-based clustering; (4) joint clustering.
Dimensionality reduction maps data points into a lower-dimensional space to obtain a compact representation of the data, which benefits further processing. Different reduction methods seek the low-dimensional representation in different ways, so the reduced data approximate the original data to different degrees and the resulting clustering performance also differs. The biggest drawback is that no specific criterion is provided for evaluating the quality of the conversion from high to low dimension; moreover, for very high-dimensional data, the training process of clustering can converge very slowly.
Subspace clustering, also referred to as feature selection, divides the original data space into different subspaces and looks for clusters only in the relevant subspaces. In theory, such algorithms can find clusters of any type and shape in any number of dimensions; the result consists of a set of clusters in different subspaces, can be represented by a disjunctive expression, and does not require the number of dimensions to be fixed in advance. The drawback is that if the parameters are set improperly, important clusters are likely to be dropped in the pruning stage, and determining these parameters for a given data set is very difficult.
Hypergraph-based clustering maps the relationships among high-dimensional data onto a hypergraph, in which each hyperedge expresses a relationship among certain data and the edge weight indicates the closeness of that relationship. Its greatest advantage is that it does not need to compute similarities between high-dimensional data during clustering, so its time complexity is relatively low; its shortcoming is that the types of data it can cluster are restricted.
Joint clustering first divides the attributes of the data set into several groups, then proposes a new attribute to represent each group, and finally clusters the high-dimensional data on these derived attributes. Its weakness is that the quality of the data clustering depends on the clustering of the attributes, which in turn depends on the data set. Since every method has its advantages and defects, no single algorithm suits all situations; in practice a suitable algorithm is selected according to the characteristics of the specific problem.
The clone-optimized particle swarm clustering method proposed here is a search method designed to combine the advantages of dimensionality reduction and subspace clustering. Dimensionality reduction usually reduces the original high-dimensional data space to a lower-dimensional space by feature selection or feature transformation, after which a traditional clustering method can complete the clustering. Feature selection chooses a set of important attributes from all attributes according to the clustering objective or the characteristics of the data set. In general it comprises two parts: searching over feature subsets, and evaluating each feature subset by some criterion.
Subspace clustering assumes that different clusters exist in different subspaces and aims to extract those clusters effectively. Unlike dimensionality reduction over the full space, subspace clustering searches for the subspace corresponding to each cluster. By search direction, subspace clustering methods fall into two classes: bottom-up subspace search and top-down subspace search.
Bottom-up methods use the Apriori property from association rule mining and merge adjacent dense cells to form clusters. The CLIQUE algorithm first uses the Apriori search strategy to find and merge grid cells whose density exceeds a given threshold, forming candidate subspaces; it sorts the candidate subspaces by coverage (the number of points they contain) and then prunes the low-coverage subspaces using the minimum description length criterion.
Top-down methods search the subspaces from the top down. PROCLUS is the earliest projected clustering algorithm to use a top-down search strategy. It is a center-point-based algorithm that combines random sampling with a greedy method to select cluster center points, computes a weight for each dimension of each cluster with a given discriminant function, adjusts the dimension weights during iteration, and finally finds the clusters around these center points. The DOC algorithm uses both a bottom-up grid strategy and a top-down strategy of iteratively improving cluster quality, and proposes a definition of an optimal projective cluster, but its accuracy and running efficiency still need improvement.
The algorithms above represent the main ideas in high-dimensional data clustering that are relevant to this scheme. Feature selection and feature transformation look for all clusters within the same feature subspace, ignoring the fact that in a high-dimensional data space different clusters may have different feature subspaces. Subspace clustering allows different clusters to have different subspaces, but its computational complexity is higher.
In summary, the problem with the prior art is as follows: when traditional clustering methods are applied to high-dimensional data, the data set contains many irrelevant attributes and is more sparsely distributed than data in a low-dimensional space, so the probability that clusters exist across all dimensions is almost zero; during clustering it is difficult to converge quickly while also ensuring a globally optimal search.
The particle swarm algorithm is an optimization algorithm based on swarm intelligence. It focuses on searching the full-dimensional space for good class center points and on finding subspaces in which the data are dense and cluster well. The swarm intelligence produced by cooperation and competition among particles guides the optimization search, and convergence is fast. Immune evolution theory offers strong recognition, learning, memory, and adaptive capabilities; the clone operation expands the antibody population space and provides the basis for generating new antibodies. On the one hand, this work uses particle swarm optimization to guide the search direction and achieve fast, effective clustering convergence. On the other hand, the result of each particle swarm iteration is cloned, which expands the search result into a larger population space; mutating some genes to different degrees realizes a finer local search, and selection then compresses the result back to the original population size. This gives the clustering good global and local search performance.
The main work of this patent is to combine clonal selection with particle swarm optimization in cluster analysis. A high-dimensional data clustering model based on the particle swarm algorithm is established and, on the basis of an improved objective function and in combination with the clonal selection mechanism, a dynamic particle swarm clustering algorithm for data cluster analysis is constructed. Unlike existing research, improvements are made to particle mutation and evolution, to the particle swarm encoding, and to the evaluation measure for the feature dimensions of high-dimensional data. These overcome the sensitivity of traditional clustering algorithms to initial values, improve the stability of high-dimensional data clustering, and provide a technical reference for research on and application of high-dimensional data clustering.
Summary of the invention
In view of the problems in the prior art, the present invention provides a clone-optimized particle swarm clustering method for high-dimensional data analysis.
The invention is realized as follows. The method generates N particles, adjusts the positions of these N particles, and computes the corresponding fitness. The N particles are cloned in different numbers according to their antibody-antigen affinity and antibody-antibody similarity. After gene mutation, each cloned antibody is compared with the original antibody by antibody-antigen affinity, and clonal selection retains the particle with the highest affinity for the next iteration. This continues until the optimal antibody that captures the antigen is produced or the specified number of iterations is reached.
Further, the clone-optimized particle swarm clustering method for high-dimensional data analysis includes the following steps:
Step 1: initialize the particle swarm. When initializing a particle, randomly assign each sample to a class as the initial partition, compute the cluster center of each class as the particle's initial position encoding, and initialize the particle's velocity. Repeat N times to generate N initial particles;
Step 2: for each class in each particle, compute the contribution rate of every dimension to that class, take the indices of the s dimensions with the highest contribution rate as the feature dimensions of that class, and compute the fitness of the particle;
Step 3: for each particle, compare its fitness value with the fitness value of the best position it has experienced, Best_id; if it is better, update Best_id;
Step 4: for each particle, compare its fitness value with the fitness value of the best position the swarm has experienced, Best_Value; if it is better, update Best_Value;
Step 5: adjust the velocity and position of each particle;
Step 6: perform k-means clustering on the new individuals;
Step 7: if the termination condition of the algorithm is reached, stop; otherwise go to Step 2.
Further, particle initialization and encoding in Step 1 include the following. The encoding space is designed to consist of three parts (SUP, CEP, CPV), where SUP denotes the real-coded string of the feature subspace, CEP denotes the real-coded string of the class centers, and CPV denotes the change of the class centers (it records the updated position and is used to balance global and local consistency). The initial population is generated randomly: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form an individual; this is repeated N_size (the preset size of the initial population) times to complete the initial population.
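The joint (SUP, CEP, CPV) encoding can be illustrated with a minimal Python sketch. The class name `Particle`, the function names, the lower-case parameter names, and the zero initial CPV are assumptions made for illustration; the text does not state how CPV is initialized.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Particle:
    sup: List[int]          # SUP: indices of the selected feature dimensions
    cep: List[List[float]]  # CEP: class centres (the PSO position)
    cpv: List[List[float]]  # CPV: change of each centre (the PSO velocity)


def init_particle(data: Sequence[Sequence[float]], sup_maxnumber: int,
                  cep_maxnumber: int, rng: random.Random) -> Particle:
    n_dims = len(data[0])
    # Randomly select feature dimensions for the subspace part.
    sup = sorted(rng.sample(range(n_dims), sup_maxnumber))
    # Randomly select data objects as the initial class centres.
    cep = [list(row) for row in rng.sample(list(data), cep_maxnumber)]
    # Assumption: every centre starts with zero change (zero velocity).
    cpv = [[0.0] * n_dims for _ in cep]
    return Particle(sup, cep, cpv)


def init_population(data, sup_maxnumber, cep_maxnumber, n_size, seed=0):
    """Repeat the random construction n_size times (hypothetical helper)."""
    rng = random.Random(seed)
    return [init_particle(data, sup_maxnumber, cep_maxnumber, rng)
            for _ in range(n_size)]
```

For example, `init_population(data, 2, 3, 20)` would build 20 individuals, each with 2 feature dimensions and 3 class centres drawn from `data`.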
Further, the fitness function in Step 2 is computed from the contribution rate of the feature dimensions to subspace clustering.
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured; the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace. The distance term measures the distance between the j-th dimension of the data points in class A_i and the j-th dimension of the center point: the smaller its value, the more compact class A_i is along feature dimension j, the larger the contribution of dimension j to class A_i, and the larger the value of F_ij; conversely, the contribution of dimension j to class A_i is small. The sum of the contributions of all feature dimensions in class A_i to A_i is denoted μ_i.
The fitness of the whole particle is the sum of the fitness values of all k classes:
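The contribution-rate idea can be sketched in Python under a loudly stated assumption: the evaluation formula itself is not reproduced in this text, so the form F_ij = 1 / (1 + mean absolute distance along dimension j) used here is a hypothetical stand-in that only preserves the stated behaviour (smaller distance, larger F_ij). The function names and the use of the s highest-contributing dimensions for μ_i are likewise illustrative.

```python
from typing import List, Sequence, Tuple


def dimension_contributions(points: Sequence[Sequence[float]],
                            centre: Sequence[float]) -> List[float]:
    """F_ij for one class: larger when the class is compact along dimension j."""
    n_dims = len(centre)
    if not points:
        return [0.0] * n_dims
    contributions = []
    for j in range(n_dims):
        mean_dist = sum(abs(p[j] - centre[j]) for p in points) / len(points)
        # Assumed form, NOT the patent's formula: decreasing in the distance.
        contributions.append(1.0 / (1.0 + mean_dist))
    return contributions


def class_fitness(points, centre, s: int) -> Tuple[List[int], float]:
    """Pick the s highest-contributing dimensions and sum them (mu_i)."""
    f = dimension_contributions(points, centre)
    ranked = sorted(range(len(f)), key=lambda j: f[j], reverse=True)
    feature_dims = sorted(ranked[:s])
    mu = sum(f[j] for j in feature_dims)
    return feature_dims, mu


def particle_fitness(classes, centres, s: int) -> float:
    """Fitness of a whole particle: the sum of mu_i over its classes."""
    return sum(class_fitness(pts, c, s)[1] for pts, c in zip(classes, centres))
```

With this stand-in, a class that is tight along one dimension and spread along another selects the tight dimension as its feature dimension.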
Further, the clonal selection dynamic clustering in Step 1 specifically includes:
● Initialize the position Z_i = {Z_i1, Z_i2, …, Z_ik} and velocity V_i = {V_i1, V_i2, …, V_ik} of each particle in the population;
● While (current iteration number t < T) // loop condition
● For i = 1 to population size N // clustering starts
● Assign every vector X'_j in X' to the cluster represented by the nearest cluster center Z_ij, by the minimum-distance rule;
● Compute the fitness value of each particle;
● Compute the number of clones and clone the particle;
● Perform clonal selection on the clones;
● Update the current best solution of each particle;
● Update the current best solution of the swarm;
● Update the velocity and position of the particle;
● End for
● End while // loop ends
● Compute the clustering-quality index;
● Output the clustering result;
● End
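The loop above can be sketched as runnable Python. This is a simplified illustration, not the patented procedure: the affinity is assumed to be the negative within-cluster squared error, the number of clones is fixed (`n_clones`) rather than computed from affinity and concentration, mutation is plain Gaussian noise of amplitude `lam`, and all function and parameter names are hypothetical.

```python
import math
import random


def assign(data, centres):
    """Minimum-distance rule: index of the nearest centre for each vector."""
    labels = []
    for x in data:
        dists = [math.dist(x, c) for c in centres]
        labels.append(dists.index(min(dists)))
    return labels


def affinity(data, centres):
    """Assumed affinity: negative within-cluster squared error (higher is better)."""
    labels = assign(data, centres)
    return -sum(math.dist(x, centres[l]) ** 2 for x, l in zip(data, labels))


def clone_select(data, centres, n_clones, lam, rng):
    """Clone a particle, mutate the clones, keep the best of clones and original."""
    best, best_aff = centres, affinity(data, centres)
    for _ in range(n_clones):
        clone = [[v + rng.gauss(0.0, lam) for v in c] for c in centres]
        aff = affinity(data, clone)
        if aff > best_aff:
            best, best_aff = clone, aff
    return best, best_aff


def clonal_pso(data, k, n_particles=10, max_iter=30, w=0.6, n1=1.5, n2=1.5,
               n_clones=3, lam=0.1, seed=0):
    rng = random.Random(seed)
    dims = len(data[0])
    # Each particle position is a list of k centres drawn from the data.
    pos = [[list(x) for x in rng.sample(data, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dims for _ in range(k)] for _ in range(n_particles)]
    p_best = [[list(c) for c in p] for p in pos]
    p_aff = [affinity(data, p) for p in pos]
    g = max(range(n_particles), key=lambda i: p_aff[i])
    g_best, g_aff = [list(c) for c in p_best[g]], p_aff[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            pos[i], aff = clone_select(data, pos[i], n_clones, lam, rng)
            if aff > p_aff[i]:   # update the particle's own best solution
                p_best[i], p_aff[i] = [list(c) for c in pos[i]], aff
            if aff > g_aff:      # update the swarm's best solution
                g_best, g_aff = [list(c) for c in pos[i]], aff
            for c in range(k):   # velocity and position update
                for d in range(dims):
                    vel[i][c][d] = (w * vel[i][c][d]
                                    + n1 * rng.random() * (p_best[i][c][d] - pos[i][c][d])
                                    + n2 * rng.random() * (g_best[c][d] - pos[i][c][d]))
                    pos[i][c][d] += vel[i][c][d]
    return g_best, assign(data, g_best)
```

Calling `clonal_pso(data, k=2)` returns the best centres found and the cluster label of each vector; a fixed `seed` makes the run repeatable.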
Further, the clone-optimized particle swarm clustering method for high-dimensional data analysis includes: initializing the population size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the set of points to be clustered X, as follows:
Further, in Step 5 the particle is evaluated and measured according to the following formula.
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured; the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace. The distance term measures the distance between the j-th dimension of the data points in class A_i and the j-th dimension of the center point: the smaller its value, the more compact class A_i is along feature dimension j, the larger the contribution of dimension j to class A_i, and the larger the value of F_ij; conversely, the contribution of dimension j to class A_i is small. The sum of the contributions of all feature dimensions in class A_i to A_i is denoted μ_i.
The fitness of the whole particle is the sum of the fitness values of all k classes:
Further, in Step 6 the particles of the new generation are clustered with the following k-means procedure:
(1) according to the cluster-center encoding of the particle, determine the partition corresponding to the particle by the nearest-neighbor rule;
(2) according to the partition, compute the new cluster centers, update the fitness value of the particle, and replace the original encoded values.
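A minimal Python sketch of one such refinement pass follows. The function name `kmeans_step` is hypothetical, and keeping the old center when a class receives no points is an assumption made here so the sketch stays well defined.

```python
import math
from typing import List, Sequence, Tuple


def kmeans_step(data: Sequence[Sequence[float]],
                centres: Sequence[Sequence[float]]
                ) -> Tuple[List[int], List[List[float]]]:
    """One k-means pass: nearest-centre partition, then recomputed centres."""
    # (1) Nearest-neighbour rule: assign each vector to its closest centre.
    labels = []
    for x in data:
        dists = [math.dist(x, c) for c in centres]
        labels.append(dists.index(min(dists)))
    # (2) Recompute each centre as the mean of its members.
    new_centres = []
    for i, old in enumerate(centres):
        members = [x for x, l in zip(data, labels) if l == i]
        if members:
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        else:
            # Assumption: an empty class keeps its previous centre.
            new_centres.append(list(old))
    return labels, new_centres
```

The returned centres would then overwrite the particle's center encoding before its fitness is recomputed.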
Further, particles are cloned on the principle that the number of clones is positively correlated with affinity and inversely correlated with concentration. The formula is defined as follows:
Here a is the upper limit on the number of clones, F_i_Affinity denotes the affinity, F_similarity denotes the similarity, and β denotes the size of the antibody population. The ratio term denotes the proportion of a group of similar antibodies in the total antibody population: the higher the ratio, the greater the concentration and the smaller the number of clones.
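The clone-count rule can be sketched in Python, with the caveat that the defining formula is not reproduced in this text. The form used here, a × (affinity / max affinity) × (1 − concentration), is an assumed stand-in that only reproduces the stated monotonic behaviour; the distance-threshold test for "similar" antibodies and all names are hypothetical, and affinities are assumed positive.

```python
import math
from typing import Sequence


def concentration(index: int, population: Sequence[Sequence[float]],
                  eta: float) -> float:
    """Fraction of antibodies lying within distance eta of antibody `index`."""
    me = population[index]
    similar = sum(1 for other in population if math.dist(me, other) < eta)
    return similar / len(population)


def clone_count(affinity: float, max_affinity: float, conc: float, a: int) -> int:
    """Assumed rule: more clones for high affinity, fewer for high concentration."""
    if max_affinity <= 0:
        return 1
    scaled = a * (affinity / max_affinity) * (1.0 - conc)
    # Keep at least one clone and never exceed the upper limit a.
    return max(1, min(a, math.ceil(scaled)))
```

For instance, with upper limit a = 8, an antibody with the best affinity and concentration 0.25 receives 6 clones under this assumed rule, while the same antibody in a crowded region (concentration 0.5) receives 4.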
Further, the particle position and velocity in the clonal selection dynamic clustering algorithm are updated as:
$V'_{id} = w V_{id} + n_1\,\mathrm{rand}_1\,(P_{id} - X_{id}) + n_2\,\mathrm{rand}_2\,(P_{gd} - X_{id})$;
$X'_{id} = X_{id} + V_{id}$
The parameters to select are the three coefficients w, n_1, and n_2, together with the maximum velocity V_max, the maximum position X_max, and the population size N: w takes values from 0.4 to 0.8; n_1 and n_2 take values from 1.0 to 2.0; V_max = 0.2 * X_max; X_max = max(X_i), the maximum value in each dimension; N = 20, 30, 40, or 50.
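The update rule can be sketched for a single particle in Python. Two points are assumptions rather than statements of the source: the position is advanced with the freshly computed velocity, as in standard PSO, and both velocity and position are clamped to symmetric ranges [-V_max, V_max] and [-X_max, X_max]. The function name and argument layout are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pso_update(x: Sequence[float], v: Sequence[float],
               p_best: Sequence[float], g_best: Sequence[float],
               w: float, n1: float, n2: float,
               v_max: Sequence[float], x_max: Sequence[float],
               rng: random.Random) -> Tuple[List[float], List[float]]:
    """One velocity/position update with per-dimension clamping."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + n1 * rng.random() * (p_best[d] - x[d])
              + n2 * rng.random() * (g_best[d] - x[d]))
        # Assumption: clamp velocity and position to symmetric ranges.
        vd = max(-v_max[d], min(v_max[d], vd))
        xd = max(-x_max[d], min(x_max[d], x[d] + vd))
        new_v.append(vd)
        new_x.append(xd)
    return new_x, new_v
```

With the suggested ranges, one might call it with w = 0.6, n1 = n2 = 1.5, and `v_max[d] = 0.2 * x_max[d]` for every dimension d.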
The data set X is normalized to X' = {x'_1, x'_2, …, x'_n}. The value of k at which the clustering-quality index I(k) reaches its maximum is taken as the number of clusters, and the corresponding best clustering result for the data set X must also be determined.
The invention applies particle swarm optimization (PSO) theory to high-dimensional data clustering, using the optimization search mechanism of the particle swarm algorithm to perform a guided random search for cluster-center vectors in the data set. A group of random particles is initialized and the optimal solution is found by iteration. In each iteration a particle updates its position by tracking two "extreme values": the best solution found by the particle itself, the individual extremum (p_best), and the best solution reached by all particles of the swarm over all generations of the search, the global extremum (g_best). The approach is distributed and relatively simple, relies on direct or indirect interaction between individuals, and has strong adaptability and robustness.
The invention improves the particle swarm encoding and the subspace evaluation function. Conventional encodings focus on the class-center space, whereas this work uses a joint encoding in which the feature selection space, the class-center space (corresponding to the particle position in PSO), and the change of the centers (corresponding to the particle velocity in PSO) together form the encoding space. The subspace evaluation is also improved: a fitness function based on the contribution rate of feature dimensions to subspace clustering is proposed as the evaluation function for subspace clustering, so that when the clustering quality of different subspaces is compared, the clustering result is evaluated jointly with the feature dimensions the subspace contains.
The invention applies immune evolution theory to the clustering problem. On the basis of an improved objective function and in combination with the clonal selection mechanism, each particle is treated as an antibody and regarded as one way of partitioning the data set to be clustered, and particles undergo optimization and immune evolution at the same time. During evolution, particles are cloned in proportion to their affinity, cloning is suppressed according to antibody concentration (the higher the concentration, the fewer the clones), and local mutation is applied in inverse proportion to affinity.
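The rule that local mutation is inversely related to affinity can be sketched in Python. The linear scale λ × (1 − affinity / max affinity) and the Gaussian perturbation are assumptions chosen for illustration, affinities are assumed positive, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def mutate_clone(position: Sequence[float], affinity: float,
                 max_affinity: float, lam: float,
                 rng: random.Random) -> List[float]:
    """Perturb a clone; low-affinity clones receive larger perturbations."""
    if max_affinity > 0:
        # Assumed form: the mutation amplitude shrinks linearly with affinity.
        scale = lam * (1.0 - affinity / max_affinity)
    else:
        scale = lam
    return [v + rng.gauss(0.0, 1.0) * scale for v in position]
```

Under this assumed form the best antibody (affinity equal to the maximum) is left unchanged, while weaker antibodies explore a wider neighbourhood.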
Data mining and data analysis currently have broad application prospects. Through clone cells, the invention performs global or local search in multiple directions around the same particle, which promotes rapid evolution of the particles in the swarm. When solving clustering problems on high-dimensional data, it overcomes the sensitivity of traditional clustering algorithms to initial values, effectively avoids being trapped in local optima, improves the stability and reliability of the clustering algorithm, and accelerates the search over high-dimensional data. Most data in daily life and in practice are high-dimensional, for example biological, image, network, economic, and medical data, and the invention provides a technical reference for using and analyzing such data. It also has theoretical significance and a positive role, particularly for accelerating convergence and reaching the global optimum, in research on Web data, text clustering, and clustering problems in which the within-class pattern has a non-spherical distribution.
Description of the drawings
Fig. 1 is a flowchart of the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of the clustering results of each algorithm on the wine data set, provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the clustering results of each algorithm on the Ionosphere data set, provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the clustering results of each algorithm on the spambase data set, provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
The invention studies and improves particle mutation and evolution, the particle swarm encoding, and the evaluation measure for the feature dimensions of high-dimensional data. It overcomes the sensitivity of traditional clustering algorithms to initial values, improves the stability of high-dimensional data clustering, and provides a practicable theoretical and technical reference for research on high-dimensional data clustering.
The application principle of the invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by an embodiment of the present invention includes the following steps:
S101: Initialize the particle swarm. When initializing a particle, randomly assign each sample to a class as the initial partition, compute the cluster center of each class as the particle's initial position encoding, and initialize the particle's velocity. Repeat N times to generate N initial particles;
S102: For each class in each particle, compute the contribution rate of every dimension to that class, take the indices of the s dimensions with the highest contribution rate as the feature dimensions of that class, and compute the fitness of the particle;
S103: For each particle, compare its fitness value with the fitness value of the best position it has experienced, Best_id; if it is better, update Best_id;
S104: For each particle, compare its fitness value with the fitness value of the best position the swarm has experienced, Best_Value; if it is better, update Best_Value;
S105: Adjust the velocity and position of each particle;
S106: Perform k-means clustering on the new individuals;
S107: If the termination condition of the algorithm is reached, stop; otherwise go to step S102. While the particle swarm algorithm runs, empty classes may appear when the individuals of the new generation are re-partitioned in step S106. If an empty cluster appears, the pattern vector farthest from its cluster center is taken from some other non-empty cluster and placed at random into an empty cluster; this process is repeated until the partition contains no empty cluster.
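The empty-cluster repair described in S107 can be sketched in Python. The sketch makes simplifying choices that are assumptions rather than the source's rule: the donor vector is the one farthest from its own center among all clusters holding more than one vector, and it goes to the first empty cluster rather than a randomly chosen one. The function name is hypothetical, and the centers are not recomputed here.

```python
import math
from typing import List, Sequence


def repair_empty_clusters(data: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          centres: Sequence[Sequence[float]]) -> List[int]:
    """Reassign far-away vectors until every cluster has at least one member."""
    labels = list(labels)
    k = len(centres)
    while True:
        counts = [labels.count(i) for i in range(k)]
        empty = [i for i in range(k) if counts[i] == 0]
        if not empty:
            return labels
        # Donors: vectors in clusters that keep a member after giving one away.
        candidates = [n for n, l in enumerate(labels) if counts[l] > 1]
        if not candidates:
            raise ValueError("fewer data vectors than clusters")
        farthest = max(candidates,
                       key=lambda n: math.dist(data[n], centres[labels[n]]))
        # Assumption: fill the first empty cluster instead of a random one.
        labels[farthest] = empty[0]
```

Each pass fills one empty cluster without emptying another, so the loop ends after at most k passes.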
In step S105, the velocity and position of the particle are adjusted according to the formula.
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured; the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace. The distance term measures the distance between the j-th dimension of the data points in class A_i and the j-th dimension of the center point: the smaller its value, the more compact class A_i is along feature dimension j, the larger the contribution of dimension j to class A_i, and the larger the value of F_ij; conversely, the contribution of dimension j to class A_i is small. The sum of the contributions of all feature dimensions in class A_i to A_i is denoted μ_i.
The fitness of the whole particle is the sum of the fitness values of all k classes:
The fitness of the entire particle (feature subspace) is thus the sum of the fitness values of the k classes.
In step S106, the particles of the new generation are clustered with the following k-means procedure: a. according to the cluster-center encoding of the particle, determine the partition corresponding to the particle by the nearest-neighbor rule; b. according to the partition, compute the new cluster centers, update the fitness value of the particle, and replace the original encoded values. Because k-means has strong local search ability, introducing the k-means idea greatly improves the convergence speed of the particle swarm.
In the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by the embodiment of the present invention, the number of clusters for the data to be clustered is first set to k = 2 and then incremented up to its upper limit. The optimal cluster centers are found for each k, and the clustering-effect index I(k) under each value of k finally determines the number of clusters and the corresponding cluster centers.
First, initialize the swarm size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the data set X to be clustered, as follows:
X is normalized to X' = {x'1, x'2, …, x'n}. The k at which the clustering-effect index I(k) attains its maximum is taken as the number of clusters; the corresponding best clustering result for data set X must also be determined.
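The selection of k by the clustering-effect index can be sketched as a simple search. Everything named here is an assumption for illustration: min-max scaling stands in for the unspecified normalization, and cluster_fn and index_fn are hypothetical callables standing in for the clustering routine and for I(k).

```python
def min_max_normalize(data):
    """Scale every dimension to [0, 1] (assumed normalization)."""
    dims = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dims)]
    hi = [max(row[d] for row in data) for d in range(dims)]
    return [
        [(row[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
         for d in range(dims)]
        for row in data
    ]

def choose_k(data, k_max, cluster_fn, index_fn):
    """Try k = 2..k_max and keep the k whose index value is largest."""
    best = None
    for k in range(2, k_max + 1):
        labels, centers = cluster_fn(data, k)    # hypothetical clustering call
        score = index_fn(data, labels, centers)  # hypothetical I(k)
        if best is None or score > best[3]:
            best = (k, labels, centers, score)
    return best
```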
The application principle of the present invention is further described below with reference to a specific embodiment.
The clone-optimized particle swarm clustering method for high-dimensional data analysis provided by the embodiment of the present invention includes the following steps:
1. Particle initialization and encoding
The encoding space is planned to consist of three parts (SUP, CEP, CPV): SUP is the real-coded string of the feature subspace, CEP is the real-coded string of the class centers, and CPV is the degree of change of the class centers (it records the updated positions and is used to balance global and local consistency). The initial population is generated randomly: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form one individual; repeating this N_size (the preset size of the initial population) times completes the generation of the initial population.
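A random initialization along these lines might look like the sketch below. The dictionary layout, the function names, and the use of integer dimension indices for SUP are assumptions made for illustration; the text only states that the three parts are real-coded strings.

```python
import random

def init_particle(data, sup_maxnumber, cep_maxnumber, rng):
    """Build one individual from randomly chosen dimensions and data objects."""
    n_dims = len(data[0])
    sup = sorted(rng.sample(range(n_dims), sup_maxnumber))     # feature subspace
    cep = [list(p) for p in rng.sample(data, cep_maxnumber)]   # class centers
    cpv = [[0.0] * n_dims for _ in range(cep_maxnumber)]       # change of centers
    return {"SUP": sup, "CEP": cep, "CPV": cpv}

def init_population(data, n_size, sup_maxnumber, cep_maxnumber, seed=0):
    """Repeat the random construction n_size times."""
    rng = random.Random(seed)
    return [init_particle(data, sup_maxnumber, cep_maxnumber, rng)
            for _ in range(n_size)]
```

Sampling class centers from actual data objects keeps every initial center inside the data range, which matches the description of choosing data objects at random.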
2. Fitness function calculation: fitness is expressed as the contribution rate of the feature dimensions to the subspace clustering;
For the k subspace classes {A1, A2, …, Ak} centered on {C1, C2, …, Ck}, each subclass Ai (i = 1, 2, …, k) is measured; the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace, and the distance term denotes the distance between the j-th dimension of the data points in class Ai and the j-th dimension of the center point. The smaller this value, the more compact class Ai is on feature dimension j; dimension j is then said to contribute more to class Ai, and the value of Fij is larger. Conversely, dimension j is said to contribute little to class Ai.
The sum of the contributions of all feature dimensions in class Ai to Ai is denoted μi:
The fitness of a particle is expressed as the sum of the fitness values of all k classes:
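The structure of this fitness (a per-dimension contribution Fij, summed to μi per class, then summed over the k classes) can be sketched as follows. The contribution function used here, 1 / (1 + mean squared distance on dimension j), is a hypothetical stand-in chosen only because it grows as the class becomes more compact on that dimension; it is not the patent's formula.

```python
def contribution(members, center, j):
    """Assumed F_ij: larger when the class is more compact on dimension j."""
    if not members:
        return 0.0
    msd = sum((p[j] - center[j]) ** 2 for p in members) / len(members)
    return 1.0 / (1.0 + msd)

def particle_fitness(points, labels, centers, dims):
    """Sum mu_i over all classes, where mu_i sums F_ij over the chosen dims."""
    total = 0.0
    for i, center in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == i]
        mu_i = sum(contribution(members, center, j) for j in dims)
        total += mu_i
    return total
```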
3. Clonal selection. When the particle swarm algorithm updates a particle's position, it updates every vector direction of that position at once, so it easily skips over better or optimal positions. Therefore, in clonal selection dynamic clustering, after each iteration of the particle swarm algorithm a gene mutation operation is applied, according to the formula, to some of the gene segments of each new particle (i.e., antibody), where each gene segment corresponds to one cluster center. This increases the dynamic local search capability around the particle's current position.
Cloning is defined as follows:
The number of clones in the swarm is determined by affinity and concentration: the higher the affinity, the larger the clone number; the higher the antibody concentration, the smaller the clone number. That is, the clone number is positively correlated with affinity and inversely correlated with concentration. The formula is defined as follows:
Here a is the upper limit on the number of clones, Fi_Affinity is the affinity, Fsimilarity is the similarity, and β is the size of the antibody population. The ratio term is the proportion of the total antibody population made up by a given group of similar antibodies; the higher this ratio, the greater the concentration and the smaller the clone number.
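The qualitative rule (more clones for higher affinity, fewer for higher concentration) can be sketched as below. The specific expressions are assumptions for illustration only: similarity is taken as 1 / (1 + Euclidean distance), concentration as the share of antibodies whose similarity to the given one is at least η, and the clone count as a scaled by normalized affinity and by (1 - concentration).

```python
import math

def similarity(u, v):
    """Assumed similarity: decreases as the distance between u and v grows."""
    return 1.0 / (1.0 + math.dist(u, v))

def clone_counts(antibodies, affinities, a, eta):
    """Clone number per antibody, capped by the upper limit a."""
    beta = len(antibodies)          # size of the antibody population
    max_aff = max(affinities)
    counts = []
    for i, ab in enumerate(antibodies):
        similar = sum(1 for other in antibodies if similarity(ab, other) >= eta)
        concentration = similar / beta
        aff = affinities[i] / max_aff if max_aff > 0 else 0.0
        counts.append(max(1, math.ceil(a * aff * (1.0 - concentration))))
    return counts
```

With equal affinities, an isolated antibody receives more clones than antibodies crowded together, which is the diversity-preserving effect the text describes.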
4. Particle position and velocity
V'id = w*Vid + n1*rand1*(Pid - Xid) + n2*rand2*(Pgd - Xid);
X'id = Xid + Vid
Parameter selection covers the three parameters w, n1, and n2, the maximum velocity Vmax, the maximum position Xmax, and the population size. In the planned experiments, w is taken from 0.4 to 0.8 and n1, n2 from 1.0 to 2.0; the maximum velocity is Vmax = 0.2*Xmax; the maximum position is Xmax = max(Xi), the maximum value in each dimension; and the population size is N = 20, 30, 40, 50.
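A per-particle update following these formulas might be sketched as below. Two points are assumptions rather than statements from the text: the position is advanced with the newly computed velocity, as in the standard particle swarm update, and both velocity and position are clamped symmetrically to [-Vmax, Vmax] and [-Xmax, Xmax]. The function name update_particle is hypothetical.

```python
import random

def update_particle(x, v, p_best, g_best, w, n1, n2, v_max, x_max, rng):
    """Return the new position and velocity of one particle."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vel = (w * v[d]
               + n1 * rng.random() * (p_best[d] - x[d])   # pull toward own best
               + n2 * rng.random() * (g_best[d] - x[d]))  # pull toward swarm best
        vel = max(-v_max[d], min(v_max[d], vel))          # assumed velocity clamp
        pos = max(-x_max[d], min(x_max[d], x[d] + vel))   # assumed position clamp
        new_v.append(vel)
        new_x.append(pos)
    return new_x, new_v
```

A typical call would pass w between 0.4 and 0.8, n1 and n2 between 1.0 and 2.0, and v_max set to 0.2 times x_max in each dimension, as in the parameter ranges above.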
5. Determining the iteration stopping criterion
An algorithm's stopping condition is usually chosen from the following three criteria:
(1) The fitness of the best individual reaches a given threshold.
(2) The number of iterations reaches a preset maximum.
(3) The fitness of the solutions found during the search no longer changes appreciably over several consecutive generations.
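The three criteria can be combined into a single check, sketched below. The function name should_stop and the parameters patience and tol (how many generations to look back and how small a change counts as "no appreciable change") are hypothetical, and a larger fitness is assumed to be better.

```python
def should_stop(history, iteration, max_iterations, fitness_threshold, patience, tol):
    """history holds the best fitness of each generation so far."""
    # (1) best fitness reached the given threshold
    if history and history[-1] >= fitness_threshold:
        return True
    # (2) iteration budget used up
    if iteration >= max_iterations:
        return True
    # (3) fitness flat over the last `patience` generations
    if len(history) > patience:
        recent = history[-(patience + 1):]
        if max(recent) - min(recent) <= tol:
            return True
    return False
```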
6. The clonal selection dynamic clustering algorithm is summarized as follows:
The clonal selection dynamic clustering algorithm has parameters such as velocity and position, and every update of every particle is carried out through its velocity and position.
The application effect of the present invention is explained in detail below with reference to experiments.
To verify the feasibility and effectiveness of the present invention, experiments were analyzed and compared. The classical k-means algorithm and the classical subspace clustering algorithm PROCLUS were applied and compared with the project's particle swarm high-dimensional clustering algorithm with cloning (referred to in the experiments as Clone_POS_Cluster).
Data set selection: To verify whether the algorithm is effective for clustering high-dimensional data and to ensure its practicality, two groups of data were chosen: real application data and classical machine learning data. The real application data are the official Shibor data on interbank short-term lending rates (from the official website http://www.shibor.org); 1500 trading days were chosen, 9 groups of data in total. The classical machine learning data come from the UCI data sets (http://archive.ics.uci.edu/ml/); the three data sets selected are the wine data set, the Ionosphere data set, and the spambase data set.
Clustering results are compared using the following three indices:
1) Purity: another measure of the extent to which a cluster produced by the algorithm contains objects of a single original class:
The larger the purity, the closer the clustering result produced by the algorithm is to the known "ground truth", and the better the clustering.
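Purity can be computed as in the sketch below, which assumes the standard definition: for every cluster, count the members of its most frequent original class, sum these counts, and divide by the total number of objects.

```python
from collections import Counter

def purity(true_labels, cluster_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(cnt.values()) for cnt in per_cluster.values())
    return majority / len(true_labels)
```

A clustering that reproduces the original classes exactly gives a purity of 1.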
2) RI: the Rand statistic measures cluster validity as the agreement between the ideal cluster similarity matrix and the ideal class similarity matrix. In the ideal cluster similarity matrix, the ij-th entry is 1 if objects i and j are in the same cluster and 0 otherwise; in the ideal class similarity matrix, the ij-th entry is 1 if objects i and j are in the same class and 0 otherwise. The Rand statistic can be computed as follows:
Rand statistic = (f00 + f11) / (f00 + f01 + f10 + f11)
where f00 = the number of object pairs with different classes and different clusters;
f01 = the number of object pairs with different classes and the same cluster;
f10 = the number of object pairs with the same class and different clusters;
f11 = the number of object pairs with the same class and the same cluster;
As the formula shows, the larger the Rand statistic, the closer the clustering result produced by the algorithm is to the known "ground truth", and the better the clustering.
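The pair counts f00, f01, f10, and f11 can be tallied directly, as in the sketch below. It uses the standard form of the Rand statistic, (f00 + f11) / (f00 + f01 + f10 + f11), and enumerates all object pairs, which is quadratic in the number of objects and meant only for illustration.

```python
from itertools import combinations

def rand_statistic(true_labels, cluster_labels):
    """Share of object pairs on which classes and clusters agree."""
    f00 = f01 = f10 = f11 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            f11 += 1
        elif same_class:
            f10 += 1   # same class, different clusters
        elif same_cluster:
            f01 += 1   # different classes, same cluster
        else:
            f00 += 1
    return (f00 + f11) / (f00 + f01 + f10 + f11)
```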
3) Error_degree (error rate): Let T be the number of data points in the original data and Ti the number of data points in the i-th class. After clustering, cluster i1 corresponds to the i-th class of the original data, and the number of data points in cluster i1 that belong to the original i-th class is recorded. The error rate of the i-th class is then:
Let T1' be the number of data points misassigned across all classes after clustering (points that belong to cluster i1 but not to the i-th class). The total error rate is then:
The plan is to apply each of the different algorithms (k-means, PROCLUS_clustering, and the project's clonal particle swarm high-dimensional clustering algorithm, Clone_POS_Cluster), analyze and compare them on the two groups of data sets, compute the three cluster-validity indices above, and give the specific experimental parameters and analysis results.
The k-means, PROCLUS_clustering, and Clone_POS_Cluster algorithms were each run, and the three cluster-validity indices above were computed. The specific experimental results and analysis are as follows:
Running the PROCLUS_clustering algorithm requires setting the crossover probability Pc, the mutation probability Pm, the number of selected feature dimensions max_fnum, the number of clusters max_cnum, the population size popsize, and the maximum number of iterations max_gen. These settings depend on the specific data set; the experimental parameters are given in Table 1:
Table 1: Experimental parameter settings of the PROCLUS_clustering algorithm
Running the particle swarm high-dimensional clustering algorithm also requires setting the relevant parameters: w is generally taken from 0.4 to 0.8 and n1, n2 from 1.0 to 2.0; the maximum velocity is Vmax = 0.2*Xmax; the maximum position is Xmax = max(Xi), the maximum value in each dimension; and the population size is N = 20, 30, 40. These settings also depend on the specific data; the parameters are given in Table 2:
Table 2: Experimental parameter settings of PROCLUS_clustering
(1) Table 3 gives the feature subspaces on which the three algorithms, k-means, PROCLUS_clustering, and the clonal particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster), cluster the wine data set; Fig. 2 shows the clustering results of the three algorithms on the wine data set.
Table 3: Feature subspaces on which each algorithm clusters the wine data set
Fig. 2 shows that, on the wine data set, the particle swarm high-dimensional clustering algorithm has the smallest error_degree value and ranks second in purity and RI; the PROCLUS_clustering algorithm ranks second in error_degree and has the largest purity and RI values; and k-means performs worst. Table 5 shows that the optimal solutions found by the PROCLUS_clustering algorithm and the particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) both have 8 dimensions, except that in the particle swarm algorithm the dimensions differ from class to class. The dimensions selected by both algorithms are 3, 4, 5, and 11, so these dimensions can be considered important among all dimensions. The optimal solution found by k-means uses the full dimensionality, i.e., 13 dimensions. The table and figure above show that the clustering result of the clonal particle swarm high-dimensional clustering algorithm has the lowest error rate, a good clustering effect, and the lowest feature-subspace dimensionality. Among the three algorithms, the solutions obtained by PROCLUS_clustering and Clone_POS_Cluster are the best, which also indicates that the particle swarm high-dimensional algorithm can reduce the influence of the "curse of dimensionality" to some extent and is effective for clustering high-dimensional data.
(2) Table 4 gives the feature subspaces on which k-means, PROCLUS_clustering, and the clonal particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) cluster the Ionosphere data set; Fig. 3 shows the clustering results of the three algorithms on the Ionosphere data set.
Table 4: Feature subspaces on which each algorithm clusters the Ionosphere data set
Table 4 and Fig. 3 show that, on the Ionosphere data set, the solution of the clonal particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) has the largest purity value, the largest RI value, and the smallest error_degree value; the k-means solution comes next, and the PROCLUS_clustering solution performs worst. Both PROCLUS_clustering and the clonal particle swarm high-dimensional clustering algorithm reduce the dimensionality to a very low level, but Clone_POS_Cluster has a lower error rate and higher purity and RI, so it performs better. This particle swarm high-dimensional clustering algorithm is therefore effective at dimensionality reduction and can be used for clustering high-dimensional data.
(3) Table 5 gives the feature subspaces on which k-means, PROCLUS_clustering, and the clonal particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) cluster the spambase data set; Fig. 4 shows the clustering results of the three algorithms on the spambase data set.
Table 5: Feature subspaces on which each algorithm clusters the spambase data set
Table 5 and Fig. 4 show that, on the spambase data set, the solution of the particle swarm high-dimensional clustering algorithm has the largest purity value, the largest RI value, and the smallest error_degree value, so it performs best; the PROCLUS_clustering solution comes next, and the k-means solution performs worst. Both PROCLUS_clustering and the particle swarm high-dimensional clustering algorithm show their advantage on high-dimensional data. Because the subspace feature dimensions of each class can differ in the particle swarm high-dimensional clustering algorithm, it clusters more accurately, and its clustering effect is better than that of PROCLUS_clustering.
The feature subspace used by k-means is the full 57 dimensions; the PROCLUS_clustering solution uses the same 13 dimensions for every class; and the clonal particle swarm high-dimensional clustering algorithm uses 13 dimensions that differ from class to class. By comparison, both Clone_POS_Cluster and PROCLUS_clustering greatly reduce the dimensionality of the data set, yet Clone_POS_Cluster retains the better clustering effect. From this angle, the solution of the particle swarm high-dimensional clustering algorithm is better than those of k-means and PROCLUS_clustering. The "curse of dimensionality" is caused by high data dimensionality, so, provided the clustering effect is preserved, the lower the dimensionality of the data the better. This experiment further demonstrates the feasibility and effectiveness of Clone_POS_Cluster for clustering high-dimensional data.
The experimental results of the three algorithms above are summarized in the following table:
Table 6: Summary and comparison of the experimental results of the different algorithms
The two figures above show that the studied clonal particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) is not as good as the other two algorithms on the 13-dimensional wine data set, but achieves good results on the 34-dimensional Ionosphere data set and the 57-dimensional spambase data set. Clone_POS_Cluster greatly reduces the dimensionality of the data set while keeping the same clustering effect. From this angle, the solution of the particle swarm high-dimensional clustering algorithm is better than that of k-means and slightly better than that of PROCLUS_clustering.
In conclusion, whether on artificial data or on real data, evaluation of the experimental results by the three indices Purity, RI, and error_degree shows that the proposed algorithm is effective for clustering high-dimensional data and can reduce the influence of the "curse of dimensionality" to some extent.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A clone-optimized particle swarm clustering method for high-dimensional data analysis, characterized in that the method generates N particles, adjusts the positions of the particles, and selects dynamically; for the N particles, the "antibody-antibody" similarity and the "antibody-antigen" affinity are measured, and clonal selection among the different particles is carried out according to the measurement results; a specific mutation operation is applied to the particles, the antibody-antigen affinity is measured and compared with that of the original antibody, the particle with the highest affinity is retained through clonal selection, and the velocity and position of each particle in the swarm are dynamically updated before entering the next iteration; the method ends when the population of optimal antibodies is finally output or when the given number of iterations is reached;
the "antibody-antibody" similarity function Fsimilarity represents the distance between i and j in n-dimensional space; the smaller the distance, the greater the similarity:
the "antibody-antigen" affinity function Fi_Affinity of the i-th cluster partition is expressed as follows:
where, for a given data set, M is a constant representing the affinity coefficient, D(xi, yi) is the distance from data point xi of the data set to be clustered to the center point of its data set, wi is the weighting factor of the i-th feature attribute, and the weights of all feature attributes sum to 1.
2. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 1, characterized in that the method includes the following steps:
Step 1: initialize the swarm samples: randomly assign the samples to classes, initialize the cluster subgroups, initialize the particle velocities, and measure each swarm as the position encoding of an initial particle; repeat N times to generate a swarm of N initial particles;
Step 2: evaluate the swarm based on contribution rate; specifically, compute the contribution rate of each dimension in each class to that class; the indices of the m features with the highest contribution rates are taken as the selected feature dimensions, and this also gives the fitness of the particle;
Step 3: for each particle, compare its fitness value with the fitness value of the best position Best_id it has experienced; if it is better, update Best_id;
Step 4: for each particle, compare its fitness value with the fitness value Best_Value of the best position the swarm has experienced; if it is better, update Best_Value;
Step 5: adjust the velocity and position of the particles;
Step 6: perform k-means clustering on the new individuals;
Step 7: if the algorithm's termination condition is reached, stop; otherwise go to step 2.
3. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that particle encoding in step 1 uses a "constraint-based joint encoding mechanism"; general encoding methods mostly focus on the space of class center points and use a real-valued encoding scheme based on cluster centers; the encoding space consists of three parts, SUP, CEP, and CPV, where SUP is the real-coded string of the feature subspace, CEP is the real-coded string of the class centers, and CPV is the degree of change of the class centers; specifically, the quantized values are encoded into strings according to their respective value ranges, and the constraints effectively shorten the code length, preventing the sharp drop in running performance that a rapidly growing particle length would cause; the initial population is generated randomly: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form one individual, and this is repeated N_size (the preset size of the initial population) times to complete the generation of the initial population.
4. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that in the fitness function calculation of step 2, fitness is expressed as the contribution rate of the feature dimensions to the subspace clustering;
for the k subspace classes {A1, A2, …, Ak} centered on {C1, C2, …, Ck}, each subclass Ai (i = 1, 2, …, k) is measured; the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace, and the distance term denotes the distance between the j-th dimension of the data points in class Ai and the j-th dimension of the center point; the smaller this value, the more compact class Ai is on feature dimension j; dimension j is then said to contribute more to class Ai, and the value of Fij is larger; conversely, dimension j is said to contribute little to class Ai; the sum of the contributions of all feature dimensions in class Ai to Ai is denoted μi
the fitness of a particle is expressed as the sum of the fitness values of all k classes:
5. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that the clonal selection dynamic clustering process in step 1 specifically includes:
Initialize the position Zi = {Zi1, Zi2, …, Zik} and velocity Vi = {Vi1, Vi2, …, Vik} of each particle in the swarm;
While (current iteration number t < T) // loop condition
For i = 1 to swarm size N // clustering starts
Assign every vector x'j in X' to the class represented by a cluster center Zij according to the minimum-distance principle;
Calculate the fitness value of each particle;
Calculate the clone number and clone the particles;
Perform clonal selection on the data;
Update the current best solution of each particle;
Update the current best solution of the swarm;
Update the velocity and position of the particles;
End for
End while // loop ends
Calculate the clustering-effect index;
Output the clustering result;
End.
6. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 5, characterized in that the particle attributes updated by the clonal selection dynamic clustering algorithm are position and velocity:
V'id = w*Vid + n1*rand1*(Pid - Xid) + n2*rand2*(Pgd - Xid);
X'id = Xid + Vid
the dynamic update involves three parameters: w, n1, and n2; for particle velocity and position, Vmax denotes the maximum velocity and Xmax the maximum position;
reference value range of w: 0.5 to 0.9;
n1 and n2 may take reference values from 1.0 to 2.0;
reference for the dynamic mapping principle: maximum velocity Vmax = 0.3*Xmax; maximum position Xmax = max(Xi), the maximum value in each dimension;
reference values for the population size: N = 10, 20, 30, 40, 50.
7. The clonal-mutation-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that cloning is defined as follows:
the number of clones in the swarm is determined by affinity and concentration: the higher the affinity, the larger the clone number; the higher the antibody concentration, the smaller the clone number; that is, the clone number is positively correlated with affinity and inversely correlated with concentration; the formula is defined as follows:
where a is the upper limit on the number of clones, Fi_Affinity is the affinity, Fsimilarity is the similarity, and β is the size of the antibody population; the ratio term is the proportion of the total antibody population made up by a given group of similar antibodies; the higher this ratio, the greater the concentration and the smaller the clone number.
8. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that in step 6 each new-generation particle is clustered with the following k-means procedure:
(1) using the particle's cluster-center encoding, determine the cluster partition of that particle by the nearest-neighbor rule;
(2) from the cluster partition, compute the new cluster centers, update the particle's fitness value, and replace the original encoded values.
9. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 1, characterized in that the method initializes the swarm size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the data set X to be clustered, as follows:
X is normalized to X' = {x'1, x'2, …, x'n}; the k at which the clustering-effect index I(k) attains its maximum is taken as the number of clusters; the corresponding best clustering result for data set X must also be determined.
CN201810221722.8A 2018-03-18 2018-03-18 A kind of population cluster High dimensional data analysis method of clone's optimization Pending CN108595499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810221722.8A CN108595499A (en) 2018-03-18 2018-03-18 A kind of population cluster High dimensional data analysis method of clone's optimization

Publications (1)

Publication Number Publication Date
CN108595499A true CN108595499A (en) 2018-09-28

Family

ID=63626707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810221722.8A Pending CN108595499A (en) 2018-03-18 2018-03-18 A kind of population cluster High dimensional data analysis method of clone's optimization

Country Status (1)

Country Link
CN (1) CN108595499A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532429A (en) * 2019-09-04 2019-12-03 重庆邮电大学 Online user group classification method and device based on clustering and association rules
CN110782949A (en) * 2019-10-22 2020-02-11 王文婷 Multilayer gene weighted grouping method based on max-min sequence search
CN112309577A (en) * 2020-10-10 2021-02-02 广东工业大学 Multi-modal feature selection method for optimizing Parkinson's speech data
CN112309577B (en) * 2020-10-10 2023-10-13 广东工业大学 Multi-modal feature selection method for optimizing Parkinson's speech data
CN112784910A (en) * 2021-01-28 2021-05-11 武汉市博畅软件开发有限公司 Deep filtering method and system for junk data
CN112907484A (en) * 2021-03-18 2021-06-04 国家海洋信息中心 Remote sensing image color cloning method based on artificial immune algorithm
CN112907484B (en) * 2021-03-18 2022-08-12 国家海洋信息中心 Remote sensing image color cloning method based on artificial immune algorithm
CN113571134A (en) * 2021-07-28 2021-10-29 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Method and device for selecting gene data characteristics based on backbone particle swarm optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination