CN108595499A - A clone-optimized particle swarm clustering method for high-dimensional data analysis - Google Patents
- Publication number
- CN108595499A (application CN201810221722.8A)
- Authority
- CN
- China
- Prior art keywords
- particle
- population
- clone
- cluster
- clustering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of high-dimensional data analysis and discloses a clone-optimized particle swarm clustering method for high-dimensional data analysis. It uses a particle swarm clustering technique based on dynamic clonal selection, together with a constrained joint encoding mechanism and an evaluation measure based on the contribution rate of feature dimensions. Particle swarm theory is applied to high-dimensional data clustering: the optimizing search mechanism of the particle swarm algorithm performs a guided random search for cluster-center vectors in the data set. Each particle is treated as an antibody and represents one way of partitioning the data set into clusters, and the particles undergo optimization and immune evolution simultaneously. During dynamic evolution, particles are cloned in direct proportion to their affinity, clone suppression keeps the number of clones inversely proportional to antibody concentration, and local mutation is applied in inverse proportion to affinity. The invention effectively avoids becoming trapped in local optima, improves the stability and reliability of the clustering algorithm, and accelerates the search over high-dimensional data.
Description
Technical field
The invention belongs to the technical field of high-dimensional data analysis, and in particular relates to a clone-optimized particle swarm clustering method for high-dimensional data analysis.
Background art
In recent years, data mining has attracted great attention from the information industry and from society as a whole. Daily life produces large amounts of widely usable data, high-dimensional data is increasingly common in practice, and there is an urgent need to convert such data into useful information and knowledge. Clustering algorithms for low-dimensional data are now relatively mature, but high-dimensional data, such as financial, retail, telecommunications and biological data, is ubiquitous in practical applications. Because of the "curse of dimensionality", many traditional clustering algorithms often fail on high-dimensional data: they are sensitive to initial values, are easily trapped in local optima, scale poorly, and cannot handle large data sets. Research on high-dimensional data clustering therefore has significant theoretical and practical value.
Clustering high-dimensional data is an important task in cluster analysis; many applications need to analyze objects with a large number of features or dimensions. In such data the dimensions may be mutually irrelevant, and as the dimensionality grows the data becomes increasingly sparse, so that pairwise distances between points lose their meaning and the average density of the data becomes very low. Traditional clustering methods face two problems on high-dimensional data: (1) a high-dimensional data set contains many irrelevant attributes, so the probability that clusters exist across all dimensions is almost zero; (2) data in a high-dimensional space is distributed far more sparsely than in a low-dimensional space, and the distances between points are almost equal, so distance is a poor measure.
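The claim that distances become almost equal in high dimensions can be illustrated numerically. The following is a minimal sketch, not part of the patent: the function name `relative_contrast`, the uniform random data and the sample sizes are all assumptions chosen for illustration. It measures how much the farthest point differs from the nearest point, relative to the nearest distance.

```python
import numpy as np

def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min for one random query against uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))   # hypothetical data in the unit hypercube
    query = rng.random(dim)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

# The contrast shrinks as the dimension grows, so "nearest" and "farthest"
# neighbours become hard to tell apart by distance alone.
for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
```

A small contrast means that a distance-based rule has little information with which to separate clusters, which is the motivation for restricting each cluster to a few feature dimensions.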
Within data mining, to meet the needs of users in different application fields, researchers have proposed many clustering methods for high-dimensional data, chiefly (1) clustering based on dimensionality reduction; (2) subspace clustering; (3) hypergraph-based clustering; and (4) joint clustering. Dimensionality reduction maps data points to a lower-dimensional space to obtain a compact representation of the data, which benefits further processing. Different reduction methods obtain the low-dimensional representation in different ways, approximate the original data to different degrees, and therefore lead to different clustering performance. Their greatest drawback is that they provide no specific criterion for evaluating the quality of the conversion from high to low dimension; in addition, for very high-dimensional data the training process converges very slowly. Subspace clustering, sometimes described as a form of feature selection, partitions the original data space into different subspaces and examines the existence of clusters only in the relevant subspaces. In theory, such algorithms can find clusters of any type and shape in any number of dimensions; the result is a set of clusters in different subspaces that can be expressed as a disjunctive expression, and the number of dimensions need not be fixed in advance. The drawback is that with poorly chosen parameters some important clusters may be discarded in the pruning stage, and for a given data set these parameters are very hard to determine. Hypergraph-based clustering maps the relationships among high-dimensional data onto a hypergraph: each hyperedge expresses a relationship among some data items, and the edge weight indicates the strength of that relationship. Its greatest advantage is that it does not compute the similarity between high-dimensional data items during clustering, so its time complexity is relatively low. Its shortcoming is that the types of data it can cluster are restricted. Joint clustering first divides the attributes of the data set into several groups, then proposes one new attribute to represent each group, and finally clusters the high-dimensional data on these derived attributes. Its weakness is that clustering quality depends on the clustering of the attributes, which in turn depends on the data set. Since each method has its advantages and defects, no single algorithm suits all situations; in practice, a suitable algorithm is chosen according to the characteristics of the specific problem.
The clone-optimized particle swarm method for clustering high-dimensional data proposed here is a search method designed to combine the advantages of dimensionality reduction and subspace clustering. Dimensionality reduction typically uses feature selection or feature transformation to reduce the original high-dimensional data space to a lower-dimensional one, where traditional clustering methods can complete the clustering. Feature selection picks a set of important attributes from all attributes, according to the clustering objective or the characteristics of the data set. In general it has two parts: searching over feature subsets, and evaluating each feature subset against some criterion. Subspace clustering rests on the observation that different classes exist in different subspaces, and aims to extract the clusters present in those subspaces effectively. Unlike full-space dimensionality reduction, subspace clustering searches for the subspace corresponding to each cluster. By search direction, subspace clustering methods fall into two groups: bottom-up subspace search and top-down subspace search. Bottom-up search uses the Apriori property of association rules and merges neighboring dense cells to form clusters. The CLIQUE algorithm first uses the Apriori strategy of association rules to search for and merge grid cells whose density exceeds a given threshold, forming candidate subspaces; it sorts these candidate subspaces by the number of points they contain (their coverage) and then prunes the smaller subspaces using the minimum description length criterion. Top-down search explores subspaces from the top downward. PROCLUS is the earliest projected clustering algorithm to use a top-down search strategy. It is a center-point-based algorithm that combines random sampling with a greedy method to select cluster center points, computes a weight for each dimension of each cluster with a fixed discriminant function, adjusts the dimension weights during iteration, and finally finds the classes around these center points. The DOC algorithm combines a bottom-up grid strategy with a top-down strategy of iteratively improving cluster quality and proposes a definition of an optimal projective cluster, but its accuracy and running efficiency still need improvement.
The algorithms above embody the main ideas of high-dimensional data clustering relevant to this scheme. Feature selection and feature transformation look for all clusters within a single feature subspace, ignoring the fact that different clusters in a high-dimensional data space may have different feature subspaces. Subspace clustering allows different clusters to exist in different subspaces, but its computational complexity is high.
In summary, the problem with the prior art is as follows: when traditional clustering methods are applied to high-dimensional data, the data set contains many irrelevant attributes and is distributed more sparsely than in a low-dimensional space, so the probability that clusters exist across all dimensions is almost zero; during clustering it is difficult to converge quickly while still ensuring a globally optimal search.
The particle swarm algorithm is an optimization algorithm based on swarm intelligence theory. It concentrates on searching the full-dimensional space for good class centers, and on searching for dense, high-quality subspaces of the data set in which to cluster. The swarm intelligence produced by cooperation and competition among the particles guides the optimizing search, and convergence is fast. Immune evolution theory offers strong recognition, learning, memory and adaptive capabilities; the clone operation expands the antibody population space and provides the basis for generating new antibodies. On the one hand, this work uses the particle swarm algorithm to guide the search direction and achieve fast, effective clustering convergence. On the other hand, it clones the result of each iteration of the particle swarm algorithm, expanding the search result into a larger population space; it applies mutations of differing degree to some of the genes to achieve a finer local search, and then uses selection to compress the result back to the original population size, so that the clustering has good global and local search performance.
The main work of this patent is to combine clonal selection with particle swarm optimization in cluster analysis. On the basis of a high-dimensional data clustering model built on the particle swarm algorithm and an improved objective function, it incorporates the clonal selection mechanism to construct a particle swarm dynamic clustering algorithm for data cluster analysis. Unlike existing research, it makes improvements in particle mutation and evolution, in the particle swarm encoding, and in the evaluation measure for the feature dimensions of high-dimensional data. This overcomes the sensitivity of traditional clustering algorithms to initial values, improves the stability of high-dimensional data clustering, and provides a technical reference for research on and application of high-dimensional data clustering.
Summary of the invention
In view of the problems in the prior art, the present invention provides a clone-optimized particle swarm clustering method for high-dimensional data analysis.
The invention is realized as follows. The clone-optimized particle swarm clustering method for high-dimensional data analysis generates N particles, adjusts the positions of these N particles, and calculates the corresponding fitness values. Each of the N particles is cloned a different number of times according to its antibody-antigen affinity and its antibody-antibody similarity. After clonal selection on the genes, the antibody-antigen affinity of each cloned antibody is compared with that of the original antibody, and the particle with the highest affinity is retained for the next iteration. This continues until an optimal antibody that captures the antigen is produced or the specified number of iterations is reached.
Further, the clone-optimized particle swarm clustering method for high-dimensional data analysis comprises the following steps:
Step 1: initialize the swarm. When a particle is initialized, each sample is randomly assigned to a class as the initial clustering, and the cluster center of each class is calculated and used as the position encoding of the initial particle; the velocity of the particle is also initialized. This is repeated N times to generate an initial swarm of N particles;
Step 2: for each class in each particle, calculate the contribution rate of each dimension to that class, take the indices of the s dimensions with the highest contribution rates as the feature dimensions of the class, and calculate the fitness of the particle;
Step 3: for each particle, compare its fitness value with the fitness value of the best position Best_id it has experienced, and update Best_id if the new value is better;
Step 4: for each particle, compare its fitness value with the fitness value of the best position Best_Value experienced by the swarm, and update Best_Value if the new value is better;
Step 5: adjust the velocity and position of each particle;
Step 6: apply k-means clustering to the new individuals;
Step 7: if the termination condition is reached, stop; otherwise go to step 2.
Further, particle initialization and encoding in step 1 comprise: the encoding space is designed to consist of three parts (SUP, CEP, CPV), where SUP is the real-valued code string of the feature subspace, CEP is the real-valued code string of the class centers, and CPV is the degree of change of the class centers (it records the position update and is used to balance global and local consistency). The initial population is generated at random: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form an individual; this is repeated N_size (the preset size of the initial population) times to complete the generation of the initial population.
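The three-part encoding can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Particle` and `init_swarm` are hypothetical, the arguments `s`, `k` and `n_size` stand in for SUP_maxnumber, CEP_maxnnumber and N_size, and CPV is assumed to start at zero.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Particle:
    sup: np.ndarray  # SUP: indices of the selected feature dimensions
    cep: np.ndarray  # CEP: class centers, shape (k, n_features)
    cpv: np.ndarray  # CPV: change of the class centers (velocity), same shape as cep

def init_swarm(X, k, s, n_size, rng):
    """Build n_size random individuals from s feature dimensions and k data objects."""
    n, d = X.shape
    swarm = []
    for _ in range(n_size):
        sup = np.sort(rng.choice(d, size=s, replace=False))
        cep = X[rng.choice(n, size=k, replace=False)].copy()
        cpv = np.zeros_like(cep)  # assumption: zero initial change
        swarm.append(Particle(sup, cep, cpv))
    return swarm
```

Drawing the class centers from actual data objects keeps every initial center inside the data range, which matches the text's description of selecting data objects rather than arbitrary points.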
Further, the fitness function in step 2 is computed from the contribution rate of the feature dimensions to the subspace clustering;
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured with the following contribution-rate evaluation function:
J denotes the number of feature dimensions contained in the subspace, and the distance term is the distance in dimension j between the data points of class A_i and the class center. The smaller this value, the more compact class A_i is in feature dimension j; dimension j is then said to contribute strongly to class A_i, and F_ij is larger. Conversely, dimension j is said to contribute little to class A_i. The sum of the contributions of all feature dimensions of class A_i to A_i is denoted μ_i:
The fitness of the whole particle is the sum of the fitness values of all k classes:
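The evaluation formulas themselves are not reproduced in this text, so the following is only a sketch of the stated behavior under an assumed form: the contribution F_ij is taken to be 1 / (1 + mean squared distance in dimension j), which grows as the class becomes more compact in that dimension. The function names and this particular form are assumptions, not the patent's formulas.

```python
import numpy as np

def contribution(points, center):
    """Assumed F_ij for one class: larger when the class is compact in dimension j."""
    if len(points) == 0:
        return np.zeros(center.shape[0])
    spread = np.mean((points - center) ** 2, axis=0)  # per-dimension spread
    return 1.0 / (1.0 + spread)

def particle_fitness(X, labels, centers, s):
    """Sum over classes of mu_i, using the s highest-contribution dimensions per class."""
    total = 0.0
    feature_dims = []
    for i, c in enumerate(centers):
        F = contribution(X[labels == i], c)
        top = np.argsort(F)[::-1][:s]        # indices of the s best dimensions
        feature_dims.append(np.sort(top))
        total += F[top].sum()                # mu_i for class i
    return total, feature_dims
```

With two classes that are tight in dimension 0 and spread out in dimension 1, this sketch selects dimension 0 as the feature dimension of both classes, which is the qualitative behavior the text describes.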
Further, the clonal selection dynamic clustering in step 1 specifically comprises:
● initialize the position Z_i = {Z_i1, Z_i2, …, Z_ik} and velocity V_i = {V_i1, V_i2, …, V_ik} of each particle in the swarm;
● while (current iteration t < T)  // loop condition
● for i = 1 to N (number of particles)  // clustering starts
● assign every vector x'_j in X' to the cluster represented by its nearest cluster center Z_ij (minimum-distance rule);
● calculate the fitness of each particle;
● calculate the number of clones and clone the particle;
● apply clonal selection to the clones;
● update the current best solution of each particle;
● update the current best solution of the swarm;
● update the velocity and position of the particle;
● end for
● end while  // loop ends
● calculate the clustering-quality index;
● output the clustering result;
● end
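The loop above can be sketched end to end in Python. This is a simplified illustration under several assumptions: affinity is taken to be the inverse of the within-cluster squared distance rather than the contribution-rate fitness, concentration-based clone suppression is omitted for brevity, and the names `clonal_pso`, `max_clones` and `lam` are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Label each sample with the index of its nearest cluster center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def affinity(X, centers):
    """Assumed affinity: higher when samples lie close to their nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return 1.0 / (1.0 + np.sum(d.min(axis=1) ** 2))

def clonal_pso(X, k, n_particles=20, T=50, w=0.6, n1=1.5, n2=1.5,
               max_clones=5, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    # Each particle position Z[i] holds k cluster centers drawn from the data.
    Z = np.stack([X[rng.choice(n, size=k, replace=False)]
                  for _ in range(n_particles)])
    V = np.zeros_like(Z)
    v_max = 0.2 * np.abs(X).max(axis=0)
    scale = X.std(axis=0)
    fit = np.array([affinity(X, z) for z in Z])
    P, p_fit = Z.copy(), fit.copy()                    # personal bests
    g, g_fit = P[p_fit.argmax()].copy(), p_fit.max()   # swarm best
    for _ in range(T):
        for i in range(n_particles):
            rel = fit[i] / fit.max()
            # More clones for high affinity, larger mutations for low affinity.
            n_clones = max(1, int(round(max_clones * rel)))
            sigma = lam * (1.0 - rel + 1e-3) * scale
            clones = Z[i] + rng.normal(size=(n_clones, k, dim)) * sigma
            c_fit = np.array([affinity(X, c) for c in clones])
            # Clonal selection: keep the best of the original and its clones.
            if c_fit.max() > fit[i]:
                Z[i], fit[i] = clones[c_fit.argmax()], c_fit.max()
            if fit[i] > p_fit[i]:
                P[i], p_fit[i] = Z[i].copy(), fit[i]
            if fit[i] > g_fit:
                g, g_fit = Z[i].copy(), fit[i]
        r1, r2 = rng.random(Z.shape), rng.random(Z.shape)
        V = np.clip(w * V + n1 * r1 * (P - Z) + n2 * r2 * (g - Z), -v_max, v_max)
        Z = Z + V
        fit = np.array([affinity(X, z) for z in Z])
    return g, assign(X, g), g_fit
```

Calling `clonal_pso(X, k=2)` on a float array `X` of shape (n_samples, n_features) returns the best set of centers found, the labels they induce, and the best affinity value. Because the swarm best is only replaced by a strictly better solution, the returned affinity never falls below that of the best initial particle.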
Further, the clone-optimized particle swarm clustering method for high-dimensional data analysis comprises: initializing the swarm size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the data set X to be clustered, as follows:
Further, in step 5 the particles are evaluated and measured according to the following formulas;
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured with the following contribution-rate evaluation function:
J denotes the number of feature dimensions contained in the subspace, and the distance term is the distance in dimension j between the data points of class A_i and the class center. The smaller this value, the more compact class A_i is in feature dimension j; dimension j is then said to contribute strongly to class A_i, and F_ij is larger. Conversely, dimension j is said to contribute little to class A_i. The sum of the contributions of all feature dimensions of class A_i to A_i is denoted μ_i:
The fitness of the whole particle is the sum of the fitness values of all k classes:
Further, in step 6 the new generation of particles is clustered with the following k-means procedure:
(1) from the cluster-center encoding of the particle, determine the cluster partition corresponding to the particle by the nearest-neighbor rule;
(2) from the cluster partition, calculate the new cluster centers, update the fitness value of the particle, and replace the original encoding.
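The two sub-steps above amount to one k-means iteration applied to the centers encoded in a particle. A minimal sketch follows; the function name `kmeans_step` is hypothetical, and leaving the center of an empty class unchanged is an assumption of this sketch.

```python
import numpy as np

def kmeans_step(X, centers):
    """One k-means refinement: nearest-center assignment, then center update."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)                 # (1) nearest-neighbor partition
    new_centers = centers.copy()
    for i in range(len(centers)):
        members = X[labels == i]
        if len(members) > 0:                  # leave an empty class's center as is
            new_centers[i] = members.mean(axis=0)   # (2) recompute the center
    return labels, new_centers
```

The returned centers would then overwrite the particle's encoding, after which its fitness is recomputed.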
Further, particles are cloned in a number positively correlated with affinity and inversely correlated with concentration. The formula is defined as follows:
where a is the upper limit on the number of clones, F_i_Affinity is the affinity, F_similarity is the similarity, and β is the size of the antibody population; the ratio term is the proportion of the total antibody population made up of antibodies similar to the given one. The higher this ratio, the greater the concentration and the smaller the number of clones.
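The clone-count formula is not reproduced in this text, so the sketch below only mirrors the stated relationships: more clones for higher affinity, fewer for higher concentration. The specific form, the similarity measure 1 / (1 + distance), the threshold `eta` and the function name `clone_counts` are all assumptions.

```python
import numpy as np

def clone_counts(affinity, antibodies, a=10, eta=0.5):
    """Assumed rule: clones ~ a * relative affinity * (1 - concentration)."""
    affinity = np.asarray(affinity, dtype=float)
    beta = len(antibodies)                                  # antibody population size
    flat = np.asarray(antibodies, dtype=float).reshape(beta, -1)
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    similarity = 1.0 / (1.0 + dist)                         # assumed similarity measure
    # Concentration: share of the population similar to each antibody (itself included).
    concentration = (similarity > eta).sum(axis=1) / beta
    raw = a * (affinity / affinity.max()) * (1.0 - concentration)
    return np.maximum(1, np.ceil(raw)).astype(int)
```

Under this rule, an antibody sitting in a crowded region of the search space receives fewer clones than an isolated antibody of equal affinity, which spreads the search effort across distinct candidate partitions.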
Further, the particle velocity and position in the clonal selection dynamic clustering algorithm are updated as:
$$V'_{id} = w V_{id} + n_1\,\mathrm{rand}_1\,(P_{id} - X_{id}) + n_2\,\mathrm{rand}_2\,(P_{gd} - X_{id})$$
$$X'_{id} = X_{id} + V'_{id}$$
The parameters to choose are the three coefficients w, n_1 and n_2, the maximum velocity V_max, the maximum position X_max, and the swarm size: w takes values from 0.4 to 0.8; n_1 and n_2 take values from 1.0 to 2.0; the maximum velocity is V_max = 0.2 * X_max; the maximum position is X_max = max(X_i), the maximum value in each dimension; the swarm size is N = 20, 30, 40 or 50.
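The update rule and its parameter ranges can be sketched as a single function. This is a minimal illustration: the function name `pso_update` and its argument names are hypothetical, the position uses the freshly updated velocity as in standard PSO, and limiting positions only from above by X_max is an assumption about how the maximum position is enforced.

```python
import numpy as np

def pso_update(X_pos, V, P_best, G_best, x_max, rng, w=0.6, n1=1.5, n2=1.5):
    """One PSO step with velocity limited to 0.2 * x_max per dimension."""
    v_max = 0.2 * x_max
    r1 = rng.random(X_pos.shape)   # rand_1, drawn per component
    r2 = rng.random(X_pos.shape)   # rand_2, drawn per component
    V_new = w * V + n1 * r1 * (P_best - X_pos) + n2 * r2 * (G_best - X_pos)
    V_new = np.clip(V_new, -v_max, v_max)
    X_new = np.minimum(X_pos + V_new, x_max)   # assumed upper bound on position
    return X_new, V_new
```

The defaults w = 0.6 and n1 = n2 = 1.5 are simply mid-range picks from the intervals given above. A particle that already sits at both its personal best and the swarm best, with zero velocity, does not move.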
X is normalized to X' = {x'_1, x'_2, …, x'_n}. The value of k at which the clustering-quality index I(k) reaches its maximum is taken as the number of clusters, and the corresponding best clustering result for the data set X must also be determined.
The present invention applies particle swarm optimization (PSO) theory to high-dimensional data clustering, using the optimizing search mechanism of the particle swarm algorithm to perform a guided random search for cluster-center vectors in the data set. A swarm of random particles is initialized and the optimal solution is found by iteration. In each iteration, a particle updates its position by tracking two "extreme values": one is the best solution found by the particle itself, the individual extremum (p_best); the other is the best solution reached by all particles in the swarm over successive generations of the search, the global extremum (g_best). The approach is distributed and relatively simple, relies on direct or indirect interaction between individuals, and is highly adaptable and robust.
The present invention improves the particle swarm encoding and the subspace evaluation function. Conventional encodings focus on the space of class centers; this work instead uses a joint encoding in which the encoding space consists of three parts: the feature-selection space, the class-center space (corresponding to the particle position in PSO), and the change of the class centers (corresponding to the particle velocity in PSO). It also improves the subspace evaluation by proposing a fitness function based on the contribution rate of feature dimensions to the subspace clustering. This serves as the evaluation function for subspace clustering: when different subspace clusterings are compared, the clustering result is evaluated jointly with the feature dimensions that the subspace contains.
The present invention applies immune evolution theory to the clustering problem. On the basis of the improved objective function and the clonal selection mechanism, each particle is treated as an antibody and represents one way of partitioning the data set into clusters, and the particles undergo optimization and immune evolution simultaneously. During evolution, particles are cloned in direct proportion to their affinity, clone suppression keeps the number of clones inversely proportional to antibody concentration, and local mutation is applied in inverse proportion to affinity.
Data mining and data analysis currently have broad application prospects in research. By cloning, the present invention searches globally or locally in multiple directions around the same particle, promoting rapid evolution of the particles in the swarm. When solving clustering problems on high-dimensional data, it not only overcomes the sensitivity of traditional clustering algorithms to initial values but also effectively avoids becoming trapped in local optima, improving the stability and reliability of the clustering algorithm and accelerating the search over high-dimensional data. Much of the data encountered in daily life and in practice is also high-dimensional, such as biological, image, network, economic and medical data. The invention provides a technical reference for using and analyzing such data, and for research on Web data, text clustering, and clustering problems in which the within-class pattern has a non-spherical distribution. It has significant theoretical value and a positive effect, particularly for accelerating convergence and reaching the global optimum.
Brief description of the drawings
Fig. 1 is a flow chart of the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of the clustering results of each algorithm on the wine data set, provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the clustering results of each algorithm on the Ionosphere data set, provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the clustering results of each algorithm on the spambase data set, provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here only illustrate the invention and are not intended to limit it.
The present invention studies and improves particle mutation and evolution, the particle swarm encoding, and the evaluation measure for the feature dimensions of high-dimensional data. It overcomes the sensitivity of traditional clustering algorithms to initial values, improves the stability of high-dimensional data clustering, and provides a practical theoretical and technical reference for research on high-dimensional data clustering.
The application principle of the present invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by an embodiment of the present invention comprises the following steps:
S101: initialize the swarm. When a particle is initialized, each sample is randomly assigned to a class as the initial clustering, and the cluster center of each class is calculated and used as the position encoding of the initial particle; the velocity of the particle is also initialized. This is repeated N times to generate an initial swarm of N particles;
S102: for each class in each particle, calculate the contribution rate of each dimension to that class, take the indices of the s dimensions with the highest contribution rates as the feature dimensions of the class, and calculate the fitness of the particle;
S103: for each particle, compare its fitness value with the fitness value of the best position Best_id it has experienced, and update Best_id if the new value is better;
S104: for each particle, compare its fitness value with the fitness value of the best position Best_Value experienced by the swarm, and update Best_Value if the new value is better;
S105: adjust the velocity and position of each particle;
S106: apply k-means clustering to the new individuals;
S107: if the termination condition is reached, stop; otherwise go to step S102. When the particle swarm algorithm re-partitions the new generation of individuals in step S106, empty classes may appear. If a cluster is empty, the pattern vector farthest from its cluster center is taken from some other non-empty cluster, chosen at random, and placed in the empty cluster; this is repeated until the partition contains no empty cluster.
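The empty-cluster repair described in S107 can be sketched as follows. This is one reading of the text under stated assumptions: the donor cluster is picked at random among clusters with more than one member, so that the donor never becomes empty itself, and the centers of both affected clusters are refreshed after each move. The function name `repair_empty_clusters` is hypothetical, and the sketch assumes there are at least as many samples as clusters.

```python
import numpy as np

def repair_empty_clusters(X, labels, centers, rng):
    """Move far-out points into empty clusters until every cluster has a member."""
    labels = labels.copy()
    centers = centers.copy()
    k = len(centers)
    while True:
        counts = np.bincount(labels, minlength=k)
        empty = np.flatnonzero(counts == 0)
        if len(empty) == 0:
            return labels, centers
        donors = np.flatnonzero(counts > 1)        # clusters that can spare a point
        donor = rng.choice(donors)
        idx = np.flatnonzero(labels == donor)
        # Pattern vector farthest from the donor cluster's center.
        far = idx[np.linalg.norm(X[idx] - centers[donor], axis=1).argmax()]
        labels[far] = empty[0]
        centers[empty[0]] = X[far]
        centers[donor] = X[labels == donor].mean(axis=0)
```

Each pass fills exactly one empty cluster and never empties another, so the loop ends after at most k - 1 moves.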
In step S105, the velocity and position of the particles are adjusted according to the formulas;
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured with the following contribution-rate evaluation function:
J denotes the number of feature dimensions contained in the subspace, and the distance term is the distance in dimension j between the data points of class A_i and the class center. The smaller this value, the more compact class A_i is in feature dimension j; dimension j is then said to contribute strongly to class A_i, and F_ij is larger. Conversely, dimension j is said to contribute little to class A_i. The sum of the contributions of all feature dimensions of class A_i to A_i is denoted μ_i:
The fitness of the whole particle is the sum of the fitness values of all k classes:
The fitness of the entire particle (feature subspace) is thus obtained by summing the fitness values of the k classes.
In step S106, the new generation of particles is clustered with the following k-means procedure: a. from the cluster-center encoding of the particle, determine the cluster partition corresponding to the particle by the nearest-neighbor rule; b. from the cluster partition, calculate the new cluster centers, update the fitness value of the particle, and replace the original encoding. Because k-means has strong local search ability, the convergence speed of the particle swarm algorithm that incorporates the k-means idea can be greatly improved.
For the data to be clustered, the clone-optimized particle swarm clustering method for high-dimensional data analysis provided by an embodiment of the present invention sets the number of clusters to k = 2 and increases it step by step up to an upper limit, finds the best cluster centers for each value, and finally determines the number of clusters and the corresponding cluster centers from the clustering-quality index I(k) under each value of k.
First, initialize the swarm size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the data set X to be clustered, as follows:
X is normalized to X' = {x'_1, x'_2, …, x'_n}. The value of k at which the clustering-quality index I(k) reaches its maximum is taken as the number of clusters, and the corresponding best clustering result for the data set X must also be determined.
The working principle of the present invention is further described below with reference to a specific embodiment.
The clone-optimized particle swarm clustering method for high-dimensional data analysis provided by the embodiment of the present invention comprises the following steps:
1. Particle initialization and encoding
The encoding space is composed of three parts (SUP, CEP, CPV), where SUP denotes the real-coded string of the feature subspace, CEP denotes the real-coded string of the class centers, and CPV denotes the degree of change of the class centers (it records the update positions and is used to balance global and local consistency). The initial population is generated randomly: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form an individual. This is repeated N_size (the preset size of the initial population) times, which completes the generation of the initial population.
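The random initialization can be sketched as follows. The dictionary layout, the function names, and the choice to start CPV at zero are illustrative assumptions; the patent specifies only that an individual is built from randomly chosen feature dimensions and data objects.

```python
import random

def init_particle(data, sup_maxnumber, cep_maxnnumber, rng):
    """Build one individual from random feature dimensions and data objects."""
    n_features = len(data[0])
    sup = sorted(rng.sample(range(n_features), sup_maxnumber))  # feature subspace
    cep = [tuple(obj) for obj in rng.sample(data, cep_maxnnumber)]  # class centers
    cpv = [0.0] * cep_maxnnumber  # assumption: no center has changed yet
    return {"SUP": sup, "CEP": cep, "CPV": cpv}

def init_population(data, n_size, sup_maxnumber, cep_maxnnumber, seed=0):
    """Repeat the individual construction n_size times."""
    rng = random.Random(seed)
    return [init_particle(data, sup_maxnumber, cep_maxnnumber, rng)
            for _ in range(n_size)]
```

A fixed seed is used only to make the sketch reproducible.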
2. Fitness function calculation: the fitness is expressed by the contribution rate of the feature dimensions to the subspace clustering.
For the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured. The contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace, and the distance term in the formula is the distance, in the j-th dimension, between the data points of class A_i and the center point. The smaller this distance, the more compact class A_i is on feature dimension j, which means that dimension j contributes more to class A_i and F_ij is larger; conversely, dimension j contributes less to class A_i.
The sum of the contributions of all feature dimensions in class A_i to A_i is denoted μ_i:
The fitness of a particle is expressed as the sum of the fitness values of all k classes:
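The structure of this fitness (a per-dimension contribution F_ij, summed to μ_i per class, then summed over the k classes) can be sketched as below. The concrete form of F_ij is not reproduced in this text, so the sketch assumes F_ij = 1 / (1 + mean absolute distance in dimension j), chosen only because it grows as the class becomes more compact; it should not be read as the patent's formula.

```python
def contribution(members, center, j):
    """Assumed F_ij: larger when the class is more compact in dimension j."""
    if not members:
        return 0.0
    mean_dist = sum(abs(p[j] - center[j]) for p in members) / len(members)
    return 1.0 / (1.0 + mean_dist)

def particle_fitness(clusters, centers, dims):
    """Sum over classes of mu_i, where mu_i sums F_ij over the selected dims."""
    return sum(sum(contribution(members, center, j) for j in dims)
               for members, center in zip(clusters, centers))
```

Here clusters is a list of member lists, one per class, and dims is the list of feature dimensions selected by the particle.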
3. Clonal selection. When the particle swarm algorithm updates a particle's position, it updates every component of the position vector at once, so it easily skips over a better or optimal position. Therefore, in clonal-selection dynamic clustering, after each iteration of the particle swarm algorithm a mutation operation is applied, according to the formula, to some of the gene segments of the new particle (i.e., the antibody), where each gene segment corresponds to one cluster center. This increases the dynamic local search capability around the particle's current position.
Cloning is defined as follows:
The number of clones of a particle is determined by affinity and concentration: the higher the affinity, the larger the number of clones, and the higher the antibody concentration, the smaller the number of clones. In other words, the number of clones is positively correlated with affinity and inversely correlated with concentration. The formula is defined as follows:
where a is the upper limit on the number of clones, F_i_Affinity denotes the affinity, F_similarity denotes the similarity, β denotes the size of the antibody population, and the ratio term denotes the proportion of the total antibody population made up of similar antibodies. The higher this ratio, the larger the concentration and the smaller the number of clones.
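The qualitative rule (more clones for higher affinity, fewer for higher concentration, capped at a) can be sketched as follows. The exact formula is not reproduced in this text, so the product form used here, and the use of η as a similarity threshold for counting similar antibodies, are assumptions made purely for illustration.

```python
import math

def concentration(i, similarity, eta):
    """Share of the beta antibodies whose similarity to antibody i is >= eta."""
    beta = len(similarity)
    return sum(1 for j in range(beta) if similarity[i][j] >= eta) / beta

def clone_count(affinity_i, max_affinity, conc_i, a):
    """Assumed rule: scale the cap a by relative affinity and by (1 - concentration)."""
    raw = a * (affinity_i / max_affinity) * (1.0 - conc_i)
    return max(1, min(a, math.ceil(raw)))  # always keep at least one clone
```

Because an antibody is always similar to itself, the concentration never drops below 1/β in this sketch.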
4. Particle position and velocity
V'_id = w*V_id + n1*rand1*(P_id - X_id) + n2*rand2*(P_gd - X_id);
X'_id = X_id + V_id;
Parameter selection covers the three parameters w, n1, and n2, the maximum velocity V_max, the maximum position X_max, and the population size. In the planned experiments w is taken from 0.4 to 0.8 and n1, n2 from 1.0 to 2.0; the maximum velocity is V_max = 0.2*X_max; the maximum position is X_max = max(X_i), the maximum value in each dimension; and the planned population sizes are N = 20, 30, 40, 50.
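The update of one particle can be sketched as below. Two points are assumptions rather than statements of the patent: the position is advanced with the newly computed velocity, as in the standard particle swarm formulation (the formula above writes V_id), and the position is clamped symmetrically to [-X_max, X_max]. The function name and argument layout are illustrative.

```python
import random

def update_particle(x, v, p_best, g_best, w, n1, n2, x_max, rng):
    """Return (new_position, new_velocity) for one particle."""
    new_x, new_v = [], []
    for d in range(len(x)):
        v_max = 0.2 * x_max[d]  # V_max = 0.2 * X_max, per dimension
        vel = (w * v[d]
               + n1 * rng.random() * (p_best[d] - x[d])   # pull toward own best
               + n2 * rng.random() * (g_best[d] - x[d]))  # pull toward swarm best
        vel = max(-v_max, min(v_max, vel))                # clamp velocity
        pos = max(-x_max[d], min(x_max[d], x[d] + vel))   # clamp position (assumed)
        new_v.append(vel)
        new_x.append(pos)
    return new_x, new_v
```

A call such as update_particle(x, v, p_best, g_best, 0.6, 1.5, 1.5, x_max, random.Random(0)) uses values inside the ranges quoted above.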
5. Determination of the iteration stopping criterion
The stopping condition of such an algorithm is usually chosen from the following three criteria:
(1) The fitness of the best individual reaches a given threshold.
(2) The number of iterations reaches a preset maximum.
(3) The fitness of the solutions found during the search no longer changes significantly over several consecutive generations.
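The three criteria can be combined into a single check, sketched below. It assumes that larger fitness is better and that "no significant change" means the best fitness varied by at most tol over the last patience generations; the parameter names are hypothetical.

```python
def should_stop(best_history, iteration, max_iter, threshold, patience, tol):
    """best_history holds the best fitness of each generation so far."""
    # (1) the best individual has reached the given fitness threshold
    if best_history and best_history[-1] >= threshold:
        return True
    # (2) the preset maximum number of iterations has been reached
    if iteration >= max_iter:
        return True
    # (3) the best fitness has stagnated over the last `patience` generations
    if len(best_history) > patience:
        recent = best_history[-(patience + 1):]
        if max(recent) - min(recent) <= tol:
            return True
    return False
```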
6. The clonal-selection dynamic clustering algorithm is summarized as follows:
The clonal-selection dynamic clustering algorithm has parameters such as velocity and position, and every update of every particle is carried out through its velocity and position.
The application effect of the present invention is explained in detail below with reference to experiments.
To verify the feasibility and effectiveness of the present invention, comparative experiments were carried out. The classical k-means algorithm and the classical subspace clustering algorithm PROCLUS are compared with the clone-based particle swarm high-dimensional clustering algorithm of this project (referred to in the experiments as Clone_POS_Cluster).
Data set selection: to verify whether the algorithm is effective for clustering high-dimensional data, and to ensure its practicality, two groups of data were chosen: real application data and classical machine learning data. The real application data are the official Shibor (Shanghai Interbank Offered Rate) data from the official website http://www.shibor.org; 1500 trading days were selected, giving 9 groups of data in total. The classical machine learning data come from the UCI data sets (http://archive.ics.uci.edu/ml/), from which three data sets were selected: the Wine data set, the Ionosphere data set, and the spambase data set.
The clustering results are compared using the following three indices:
1) Purity: another measure of the extent to which a cluster produced by the algorithm contains objects of a single original class:
The larger the purity, the closer the clustering result produced by the algorithm is to the known ground truth, and the better the clustering effect.
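Purity can be computed as sketched below, using the common definition in which each cluster is credited with its most frequent original class and the credited counts are divided by the total number of objects. The function name is illustrative, and this standard definition is assumed to match the formula omitted from the text.

```python
from collections import Counter

def purity(true_classes, cluster_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    by_cluster = {}
    for cls, clu in zip(true_classes, cluster_labels):
        by_cluster.setdefault(clu, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_classes)
```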
2) RI: the Rand statistic is a measure of cluster validity based on the correlation between the ideal cluster similarity matrix and the ideal class similarity matrix. In the ideal cluster similarity matrix, entry (i, j) is 1 if the two objects i and j are in the same cluster and 0 otherwise; in the ideal class similarity matrix, entry (i, j) is 1 if the two objects i and j are in the same class and 0 otherwise. The Rand statistic is calculated as follows:
Rand statistic = (f00 + f11) / (f00 + f01 + f10 + f11)
where f00 = the number of object pairs with different classes and different clusters;
f01 = the number of object pairs with different classes and the same cluster;
f10 = the number of object pairs with the same class and different clusters;
f11 = the number of object pairs with the same class and the same cluster.
The formula shows that the larger the Rand statistic, the closer the clustering result produced by the algorithm is to the known ground truth, and the better the clustering effect.
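The four pair counts and the resulting statistic can be computed directly by enumerating all object pairs, as in this short sketch. It takes the Rand statistic as the share of pairs on which classes and clusters agree, (f00 + f11) / (f00 + f01 + f10 + f11), and needs at least two objects; the function name is illustrative.

```python
from itertools import combinations

def rand_statistic(true_classes, cluster_labels):
    """Share of object pairs on which the class and cluster partitions agree."""
    f00 = f01 = f10 = f11 = 0
    for i, j in combinations(range(len(true_classes)), 2):
        same_class = true_classes[i] == true_classes[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            f11 += 1      # same class, same cluster
        elif same_class:
            f10 += 1      # same class, different clusters
        elif same_cluster:
            f01 += 1      # different classes, same cluster
        else:
            f00 += 1      # different classes, different clusters
    return (f00 + f11) / (f00 + f01 + f10 + f11)
```

The pairwise loop is quadratic in the number of objects, which is acceptable for data sets of the size used in these experiments.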
3) Error_degree (error rate): let the number of data items in the original data be T and the number of data items in the i-th class be T_i. After clustering, suppose the obtained class i_1 corresponds to the i-th class of the original data, and consider the number of data items in class i_1 that belong to the original i-th class. Then the error rate of the i-th class is:
Let T_1' be the number of data points that are misassigned in each class after clustering (points that belong to class i_1 but not to class i). Then the total error rate is:
In the planned experiments, the k-means algorithm, the PROCLUS_clustering algorithm, and the clone particle swarm high-dimensional clustering algorithm of this project (Clone_POS_Cluster) are applied to the two groups of data sets, the three cluster validity indices above are computed, and the specific experimental parameters and analysis results are given.
The k-means algorithm, the PROCLUS_clustering algorithm, and the clone particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) were run separately and the three cluster validity indices were computed. The specific experimental results and analysis are as follows:
When the PROCLUS_clustering algorithm is run, the following parameters must be set: the crossover probability Pc, the mutation probability Pm, the number of selected feature dimensions max_fnum, the number of clusters max_cnum, the population size popsize, and the maximum number of iterations max_gen. These settings depend on the specific data set; the experimental parameters are listed in Table 1:
Table 1. Experimental parameter settings of the PROCLUS_clustering algorithm
When the particle swarm high-dimensional clustering algorithm is run, the relevant parameters must also be set: w is generally taken from 0.4 to 0.8 and n1, n2 from 1.0 to 2.0; the maximum velocity is V_max = 0.2*X_max; the maximum position is X_max = max(X_i), the maximum value in each dimension; and the population size is N = 20, 30, 40. These settings also depend on the specific data; the parameters are listed in Table 2:
Table 2. Experimental settings of the PROCLUS_clustering parameters
(1) Table 3 lists the feature subspaces on which the k-means algorithm, the PROCLUS_clustering algorithm, and the clone particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) cluster the Wine data set, and Fig. 2 shows the clustering results of the three algorithms on the Wine data set.
Table 3. Feature subspaces on which each algorithm clusters the Wine data set
Fig. 2 shows that, on the Wine data set, the particle swarm high-dimensional clustering algorithm has the smallest error_degree value and ranks second in purity and RI; the PROCLUS_clustering algorithm ranks second in error_degree and has the largest purity and RI values; the k-means algorithm performs worst. Table 3 shows that the optimal solutions found by the PROCLUS_clustering algorithm and the particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) both have 8 dimensions, except that in the particle swarm algorithm the selected dimensions differ from class to class. The dimensions selected by both algorithms are 3, 4, 5, and 11, which suggests that these are the important dimensions, whereas the optimal solution found by the k-means algorithm uses the full 13 dimensions. The table and figure above show that the clone particle swarm high-dimensional clustering algorithm gives the lowest error rate, a good clustering effect, and the lowest feature-subspace dimension. Among the three algorithms, the solutions obtained by the PROCLUS_clustering algorithm and the particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) are the best, which also shows that the particle swarm high-dimensional algorithm can reduce the influence of the "curse of dimensionality" to some extent and is effective for clustering high-dimensional data.
(2) Table 4 lists the feature subspaces on which the k-means algorithm, the PROCLUS_clustering algorithm, and the clone particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) cluster the Ionosphere data set, and Fig. 3 shows the clustering results of the three algorithms on the Ionosphere data set.
Table 4. Feature subspaces on which each algorithm clusters the Ionosphere data set
Table 4 and Fig. 3 show that, on the Ionosphere data set, the solution obtained by the clone particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) has the largest purity value, the largest RI value, and the smallest error_degree value. The solution obtained by the k-means algorithm comes next, and the solution obtained by the PROCLUS_clustering algorithm is the worst. Both the PROCLUS_clustering algorithm and the clone particle swarm high-dimensional clustering algorithm reduce the dimension considerably, but the Clone_POS_Cluster algorithm has a lower error rate and higher purity and RI, so its effect is better. This shows that the particle swarm high-dimensional clustering algorithm is effective for dimensionality reduction and can be used for clustering high-dimensional data.
(3) Table 5 lists the feature subspaces on which the k-means algorithm, the PROCLUS_clustering algorithm, and the clone particle swarm high-dimensional clustering algorithm (Clone_POS_Cluster) cluster the spambase data set, and Fig. 4 shows the clustering results of the three algorithms on the spambase data set.
Table 5. Feature subspaces on which each algorithm clusters the spambase data set
Table 5 and Fig. 4 show that, on the spambase data set, the solution obtained by the particle swarm high-dimensional clustering algorithm has the largest purity value, the largest RI value, and the smallest error_degree value, so its effect is the best. The solution obtained by the PROCLUS_clustering algorithm comes next, and the solution obtained by the k-means algorithm is the worst. Both the PROCLUS_clustering algorithm and the particle swarm high-dimensional clustering algorithm show their advantage on high-dimensional data, and because the subspace feature dimensions of each class can differ in the particle swarm high-dimensional clustering algorithm, it clusters more accurately and its clustering effect is better than that of the PROCLUS_clustering algorithm.
The clustering feature subspace of the k-means algorithm is the full 57 dimensions. The solution obtained by the PROCLUS_clustering algorithm has the same 13 dimensions for every class, while the solution of the clone particle swarm high-dimensional clustering algorithm has 13 dimensions that differ from class to class. By comparison, both the Clone_POS_Cluster algorithm and the PROCLUS_clustering algorithm greatly reduce the dimension of the data set, but the Clone_POS_Cluster algorithm retains the better clustering effect. From this point of view, the solution obtained by the particle swarm high-dimensional clustering algorithm is better than the solutions obtained by the k-means and PROCLUS_clustering algorithms. The "curse of dimensionality" is caused by high data dimension, so, provided the clustering effect is preserved, the lower the dimension of the data the better. This experiment further demonstrates the feasibility and effectiveness of the Clone_POS_Cluster algorithm for clustering high-dimensional data.
The experimental results of the three algorithms above are therefore summarized in the following table:
Table 6. Summary and comparison of the experimental results of the different algorithms
The above shows that the clone particle swarm high-dimensional clustering algorithm studied here (Clone_POS_Cluster) is not as good as the other two algorithms on the 13-dimensional Wine data set, but achieves good results on the 34-dimensional Ionosphere data set and the 57-dimensional spambase data set. The Clone_POS_Cluster algorithm greatly reduces the dimension of the data set while keeping the same clustering effect. From this point of view, the solution obtained by the particle swarm high-dimensional clustering algorithm is better than the solution obtained by the k-means algorithm and slightly better than that of the PROCLUS_clustering algorithm.
In conclusion either in artificial data, or on truthful data, Purity, RI, error_drgee tri-
Evaluation of a measurement index to its experimental result, all illustrate the algorithm researched and proposed for high dimensional data cluster be it is effective,
The influence of " dimension calamity " can be reduced to a certain extent.
The foregoing is merely illustrative of the preferred embodiments of the present invention, is not intended to limit the invention, all essences in the present invention
All any modification, equivalent and improvement etc., should all be included in the protection scope of the present invention made by within refreshing and principle.
Claims (9)
1. A clone-optimized particle swarm clustering method for high-dimensional data analysis, characterized in that the method generates N particles, adjusts the positions of the particles, and performs dynamic selection; for the N particles, measures the antibody-antibody similarity and the antibody-antigen affinity, and performs clonal selection among the different particles according to the measurement results; applies a specific mutation operation to the particles, measures the antibody-antigen affinity and compares it with that of the original antibody, retains the particle with the highest affinity through clonal selection, dynamically updates the velocity and position of each particle in the swarm, and then enters the next iteration; the method ends when the swarm containing the optimal antibody is output or when the specified number of iterations is reached;
the "antibody-antibody" similarity function F_similarity represents the distance between i and j in n-dimensional space; the smaller the distance, the greater the similarity:
the "antibody-antigen" affinity function F_i_Affinity of the i-th cluster partition is expressed as follows:
where, for a given data set, M is a constant representing the affinity coefficient, D(x_i, y_i) denotes the distance from the data x_i to be clustered to the center point of its data set, w_i denotes the weighting factor of the i-th feature attribute, and the weights of all feature attributes sum to 1.
2. The clone dynamic-optimized particle swarm clustering method for high-dimensional data analysis according to claim 1, characterized in that the method comprises the following steps:
Step one: initialize the particle swarm sample: randomly assign classes, initialize the cluster subgroups, initialize the velocities of the particles, and take the measured values as the position encoding of an initial particle; repeat N times to generate an initial swarm of N particles;
Step two: evaluate the particle swarm based on contribution rate; specifically, calculate the contribution rate of each dimension of each class to that class; the indices of the m features with the highest contribution rates are taken as the selected feature dimensions, which also gives the fitness of the particle;
Step three: for each particle, compare its fitness value with the fitness value of the best position Best_id it has experienced; if it is better, update Best_id;
Step four: for each particle, compare its fitness value with the fitness value Best_Value of the best position Best_id experienced by the swarm; if it is better, update Best_Value;
Step five: adjust the velocity and position of the particles;
Step six: apply k-means clustering to the new individuals;
Step seven: if the termination condition of the algorithm is reached, end; otherwise go to step two.
3. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that the particle encoding in step one uses a "restriction-based joint encoding mechanism"; general encoding methods mostly focus on the space of class center points and use a real-coded scheme based on cluster centers; the encoding space consists of three parts, SUP, CEP, and CPV, where SUP denotes the real-coded string of the feature subspace, CEP denotes the real-coded string of the class centers, and CPV denotes the degree of change of the class centers; specifically, the quantized values are encoded into strings according to their respective value ranges, and the restrictions effectively shorten the code length, preventing run-time performance from degrading sharply as the particle length grows; the initial population is generated randomly: SUP_maxnumber (the maximum number of feature dimensions) feature dimensions and CEP_maxnnumber (the maximum number of classes) data objects are selected at random and encoded to form an individual, and this is repeated N_size (the preset size of the initial population) times to complete the generation of the initial population.
4. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that in the fitness function calculation of step two the fitness is expressed by the contribution rate of the feature dimensions to the subspace clustering;
for the k subspace classes {A_1, A_2, …, A_k} centered on {C_1, C_2, …, C_k}, each class A_i (i = 1, 2, …, k) is measured, and the contribution-rate evaluation function is as follows:
J denotes the number of feature dimensions contained in the subspace, and the distance term in the formula is the distance, in the j-th dimension, between the data points of class A_i and the center point; the smaller this distance, the more compact class A_i is on feature dimension j, which means that dimension j contributes more to class A_i and F_ij is larger; conversely, dimension j contributes less to class A_i; the sum of the contributions of all feature dimensions in class A_i to A_i is denoted μ_i:
the fitness of a particle is expressed as the sum of the fitness values of all k classes:
5. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that the clonal-selection dynamic clustering process in step one specifically comprises:
Initialize the position Z_i = {Z_i1, Z_i2, …, Z_ik} and the velocity V_i = {V_i1, V_i2, …, V_ik} of each particle in the swarm;
While (current iteration number t < T) // loop condition
For i = 1 to the number of particles N // clustering starts
Assign each vector x'_j in X' to the class represented by a cluster center Z_ij according to the minimum-distance principle;
Calculate the fitness value of each particle;
Calculate the number of clones and clone the particle;
Perform clonal selection on the data;
Update the current best solution of each particle;
Update the current best solution of the swarm;
Update the velocity and position of the particle;
End for
End while // loop ends
Calculate the clustering-effect index;
Output the clustering result;
End.
6. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 5, characterized in that the particle attributes updated by the clonal-selection dynamic clustering algorithm are position and velocity:
V'_id = w*V_id + n1*rand1*(P_id - X_id) + n2*rand2*(P_gd - X_id);
X'_id = X_id + V_id;
the dynamic update involves the three parameters w, n1, and n2 and relates to the particle velocity and position: V_max denotes the maximum velocity and X_max denotes the maximum position;
the reference value range of w is 0.5 to 0.9;
the reference value range of n1 and n2 is 1.0 to 2.0;
the dynamic transformation principle is: maximum velocity V_max = 0.3*X_max, and maximum position X_max = max(X_i), the maximum value in each dimension;
reference population sizes: N = 10, 20, 30, 40, 50.
7. The clonal-mutation-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that cloning is defined as follows:
the number of clones of a particle is determined by affinity and concentration: the higher the affinity, the larger the number of clones, and the higher the antibody concentration, the smaller the number of clones, so that the number of clones is positively correlated with affinity and inversely correlated with concentration; the formula is defined as follows:
where a is the upper limit on the number of clones, F_i_Affinity denotes the affinity, F_similarity denotes the similarity, β denotes the size of the antibody population, and the ratio term denotes the proportion of the total antibody population made up of similar antibodies; the higher this ratio, the larger the concentration and the smaller the number of clones.
8. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 2, characterized in that in step six the new generation of particles is clustered by the following k-means procedure:
(1) according to the cluster-center encoding of the particle, determine the cluster partition corresponding to that particle by the nearest-neighbor rule;
(2) according to the cluster partition, calculate new cluster centers, update the fitness value of the particle, and replace the original encoded values.
9. The clone-optimized particle swarm clustering method for high-dimensional data analysis according to claim 1, characterized in that the method initializes the swarm size N, the maximum number of iterations T, the mutation amplitude coefficient λ, the antibody similarity coefficient η, and the data set X to be clustered, as follows:
X is normalized to X' = {x'_1, x'_2, …, x'_n}; the value of k at which the clustering-effect index I(k) reaches its maximum is taken as the number of clusters, and the best clustering result corresponding to the data set X must also be determined.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810221722.8A CN108595499A (en) | 2018-03-18 | 2018-03-18 | A kind of population cluster High dimensional data analysis method of clone's optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810221722.8A CN108595499A (en) | 2018-03-18 | 2018-03-18 | A kind of population cluster High dimensional data analysis method of clone's optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108595499A true CN108595499A (en) | 2018-09-28 |
Family
ID=63626707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810221722.8A Pending CN108595499A (en) | 2018-03-18 | 2018-03-18 | A kind of population cluster High dimensional data analysis method of clone's optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595499A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110532429A (en) * | 2019-09-04 | 2019-12-03 | 重庆邮电大学 | It is a kind of based on cluster and correlation rule line on user group's classification method and device |
CN110782949A (en) * | 2019-10-22 | 2020-02-11 | 王文婷 | Multilayer gene weighting grouping method based on maximum minimum sequence search |
CN112309577A (en) * | 2020-10-10 | 2021-02-02 | 广东工业大学 | Multi-mode feature selection method for optimizing Parkinson voice data |
CN112784910A (en) * | 2021-01-28 | 2021-05-11 | 武汉市博畅软件开发有限公司 | Deep filtering method and system for junk data |
CN112907484A (en) * | 2021-03-18 | 2021-06-04 | 国家海洋信息中心 | Remote sensing image color cloning method based on artificial immune algorithm |
CN113571134A (en) * | 2021-07-28 | 2021-10-29 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Method and device for selecting gene data characteristics based on backbone particle swarm optimization |
-
2018
- 2018-03-18 CN CN201810221722.8A patent/CN108595499A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110532429A (en) * | 2019-09-04 | 2019-12-03 | 重庆邮电大学 | It is a kind of based on cluster and correlation rule line on user group's classification method and device |
CN110782949A (en) * | 2019-10-22 | 2020-02-11 | 王文婷 | Multilayer gene weighting grouping method based on maximum minimum sequence search |
CN112309577A (en) * | 2020-10-10 | 2021-02-02 | 广东工业大学 | Multi-mode feature selection method for optimizing Parkinson voice data |
CN112309577B (en) * | 2020-10-10 | 2023-10-13 | 广东工业大学 | Multi-mode feature selection method for optimizing parkinsonism voice data |
CN112784910A (en) * | 2021-01-28 | 2021-05-11 | 武汉市博畅软件开发有限公司 | Deep filtering method and system for junk data |
CN112907484A (en) * | 2021-03-18 | 2021-06-04 | 国家海洋信息中心 | Remote sensing image color cloning method based on artificial immune algorithm |
CN112907484B (en) * | 2021-03-18 | 2022-08-12 | 国家海洋信息中心 | Remote sensing image color cloning method based on artificial immune algorithm |
CN113571134A (en) * | 2021-07-28 | 2021-10-29 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Method and device for selecting gene data characteristics based on backbone particle swarm optimization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108595499A (en) | A kind of population cluster High dimensional data analysis method of clone's optimization | |
CN102663100B (en) | Two-stage hybrid particle swarm optimization clustering method | |
Michalski et al. | Automated construction of classifications: Conceptual clustering versus numerical taxonomy | |
CN103116766B (en) | A kind of image classification method of encoding based on Increment Artificial Neural Network and subgraph | |
CN108875816A (en) | Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion | |
CN112613552B (en) | Convolutional neural network emotion image classification method combined with emotion type attention loss | |
Tang et al. | Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop | |
CN108897791B (en) | Image retrieval method based on depth convolution characteristics and semantic similarity measurement | |
CN106056059B (en) | The face identification method of multi-direction SLGS feature description and performance cloud Weighted Fusion | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN110070121A (en) | A kind of quick approximate k nearest neighbor method based on tree strategy with balance K mean cluster | |
CN111914900B (en) | User electricity utilization mode classification method | |
CN100416599C (en) | Not supervised classification process of artificial immunity in remote sensing images | |
CN108596186B (en) | Three-dimensional model retrieval method | |
CN110364264A (en) | Medical data collection feature dimension reduction method based on sub-space learning | |
CN110263855A (en) | A method of it is projected using cobasis capsule and carries out image classification | |
Tan et al. | Deep adaptive fuzzy clustering for evolutionary unsupervised representation learning | |
Saha et al. | Semi-supervised clustering using multiobjective optimization | |
Guo | Research on sports video retrieval algorithm based on semantic feature extraction | |
CN110490234A (en) | The construction method and classification method of classifier based on Cluster Classification associative mechanism | |
Zhu et al. | Beyond Similar and Dissimilar Relations: A Kernel Regression Formulation for Metric Learning. | |
CN110610420A (en) | Stock price trend prediction method and system | |
CN105654498A (en) | Image segmentation method based on dynamic local search and immune clone automatic clustering | |
CN110059752A (en) | A kind of statistical learning querying method based on comentropy Sampling Estimation | |
CN113011589B (en) | Co-evolution-based hyperspectral image band selection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||