CN110069498A - High-quality pattern mining method based on a multi-objective evolutionary algorithm - Google Patents
- Publication number
- CN110069498A CN201910303505.8A
- Authority
- CN
- China
- Prior art keywords
- individual
- algorithm
- population
- mode
- tree structure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a high-quality pattern mining method based on a multi-objective evolutionary algorithm and belongs to the technical field of data mining. By using a multi-objective evolutionary algorithm, the method removes the difficulty users face in setting a suitable parameter threshold. The initial population is constructed with a population initialization strategy based on an OR/NOR-tree structure, combined with the raw database represented in bitmap form, and improved crossover and mutation operators set the bits at the NOR and OR positions of the OR/NOR-tree structure. This addresses the problem that data are usually large and sparse, which makes traditional random initialization and the traditional crossover and mutation operators inefficient. In addition, a search-direction adjustment strategy for the worst individual adjusts the search direction, which improves the optimization process and its results and thereby improves both the convergence speed and the quality of the final solutions.
Description
Technical field
The present invention relates to a high-quality pattern mining method based on a multi-objective evolutionary algorithm and belongs to the technical field of data mining.
Background
Data mining is the process of extracting potentially interesting information or patterns from large amounts of data for further use.
Most traditional pattern mining methods require prior parameters to be set. For a user with no experience, choosing a suitable parameter threshold is difficult, and the solution efficiency is relatively low. A multi-objective evolutionary algorithm, by contrast, can discover patterns that meet the specified conditions without any threshold being set.
Existing multi-objective pattern mining algorithms include the following. "Pattern Recommendation in Task-oriented Applications: A Multi-Objective Perspective", published in 2017, converts the task-oriented pattern mining problem into a multi-objective optimization problem and proposes the MOPM algorithm to find patterns that satisfy the conditions. "A multi-objective evolutionary approach for mining frequent and high utility itemsets", published in 2018, discloses the MOEA-FHUI algorithm, which treats support and utility together as a two-objective optimization model in order to discover patterns that occur frequently and have high utility. These two algorithms focus either only on patterns that are frequent and complete, or only on patterns that are frequent and of high utility. In practical applications, however, users care most about patterns (combinations of goods) that are frequent and complete in the dataset and that also bring high profit. Moreover, as the number of objective functions grows, and because data are usually large and sparse, traditional random initialization and the traditional crossover and mutation operators become inefficient, and existing pattern mining algorithms based on evolutionary computation struggle. It is therefore necessary to establish a new pattern mining model for users' diverse practical needs and to propose an efficient pattern mining algorithm.
Summary of the invention
To address the low solution efficiency of existing data mining models and algorithms, the present invention provides a high-quality pattern mining method based on a multi-objective evolutionary algorithm. The method is based on the NSGA-II algorithm and improves it with the following steps:
representing the raw database in bitmap form;
constructing the initial population using a population initialization strategy based on an OR/NOR-tree structure, combined with the raw database represented in bitmap form;
setting the bits at the NOR and OR positions of the OR/NOR-tree structure using improved crossover and mutation operators;
adjusting the search direction using a worst-individual search-direction adjustment strategy based on the OR/NOR-tree structure;
solving the high-quality pattern mining problem with the NSGA-II algorithm improved as above.
Optionally, before the initial population is constructed using the population initialization strategy based on the OR/NOR-tree structure, combined with the raw database represented in bitmap form, the method comprises:
scanning the raw database to find all maximal patterns and all distinct items, and then constructing the OR/NOR-tree structure from the maximal patterns.
Optionally, constructing the initial population using the population initialization strategy based on the OR/NOR-tree structure, combined with the raw database represented in bitmap form, comprises:
assigning a different tree branch to each initial individual, and then placing each individual in one of the following three states:
State one: exactly one of the individual's OR positions is initialized to 1, and all other positions are initialized to 0;
State two: all of the individual's OR positions are initialized to 1, and all NOR positions are initialized to 0;
State three: each of the individual's OR positions is initialized to 0 or 1 at random, and all NOR positions are initialized to 0.
Optionally, setting the bits at the NOR and OR positions of the OR/NOR-tree structure using the improved crossover and mutation operators comprises:
generating a new individual by uniform crossover and setting the NOR positions of the new individual to 0;
for the mutation operator, using basic bit mutation and applying it only at the individual's OR positions.
Optionally, adjusting the search direction using the worst-individual search-direction adjustment strategy comprises:
when the total number of OR/NOR-tree branches is greater than the population size, replacing the search direction of the worst individual in the current population in each iteration.
Optionally, replacing the search direction of the worst individual in the population comprises: selecting the worst individual of the current generation according to non-dominated sorting and crowding distance, and then reassigning an OR/NOR-tree branch to that individual.
Optionally, the improved NSGA-II algorithm uses a binary encoding, and the selection operation uses binary tournament selection.
The application also claims the use of the above method in the technical field of data mining.
The application also claims the use of the above method to solve the following three-objective model:
$$\text{Maximize } F(X) = \left(f_1(X), f_2(X), f_3(X)\right)^{T}$$
where f_1(X) denotes the (relative) support of pattern X, f_2(X) denotes the occupancy of pattern X, and f_3(X) denotes the (relative) utility of pattern X.
Optionally, the (relative) support of pattern X measures how frequently the pattern occurs and is defined as:
$$f_1(X) = \mathrm{sup}(X) = \frac{|\{T_i \in D : X \subseteq T_i\}|}{|D|}$$
where D = {T_1, T_2, ..., T_i, ..., T_n} is the raw dataset, T_i is a single record in D, and |D| is the number of records in D;
the occupancy of pattern X measures the completeness of the pattern and is defined as:
$$f_2(X) = \mathrm{occu}(X) = \frac{1}{|T_X|} \sum_{T_i \in T_X} \frac{|X|}{|T_i|}$$
where T_X denotes the records that contain every item of pattern X;
the (relative) utility of pattern X is the ratio of the total utility of all items of X, over the records that contain X, to the total utility of the dataset, and is defined as:
$$f_3(X) = \mathrm{uti}(X) = \frac{\sum_{T_q \in T_X} \sum_{i_j \in X} q(i_j, T_q)\, p(i_j)}{\sum_{T_q \in D} TU(T_q)}$$
where T_q ∈ D (1 ≤ q ≤ n) is the q-th record, i_j denotes the j-th item, q(i_j, T_q) (1 ≤ j ≤ m, 1 ≤ q ≤ n) is the quantity of the j-th item in the q-th record, p(i_j) is the weight of the j-th item, and TU(T_q) is the utility of the q-th record.
The beneficial effects of the invention are as follows:
By using a multi-objective evolutionary algorithm, the application solves the problem that most traditional pattern mining methods require prior parameters whose thresholds are difficult for an inexperienced user to set. By constructing the initial population with the population initialization strategy based on the OR/NOR-tree structure, combined with the raw database represented in bitmap form, and by setting the bits at the NOR and OR positions of the OR/NOR-tree structure with improved crossover and mutation operators, it solves the problem that, in many practical pattern mining applications, data are usually large and sparse, which makes traditional random initialization and the traditional crossover and mutation operators inefficient; this improves the overall solution efficiency of the algorithm. In addition, the application adjusts the search direction with the worst-individual search-direction adjustment strategy, which improves the optimization process and its results, increasing the convergence speed while ensuring the quality of the final solutions.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an example OR/NOR-tree structure.
Fig. 2 illustrates the improved crossover and mutation operators.
Fig. 3 shows the Pareto optimal solution sets of the different algorithms on Accident_10%.
Fig. 4 shows the Pareto optimal solution sets of the different algorithms on Chess.
Fig. 5 shows the Pareto optimal solution sets of the different algorithms on Connect_50%.
Fig. 6 shows the Pareto optimal solution sets of the different algorithms on Mushroom.
Fig. 7 shows how the hypervolume of the different algorithms changes with the number of evaluations on the four datasets.
Fig. 8 shows how the coverage of the different algorithms changes with the number of evaluations on the four datasets.
Fig. 9 shows the final non-dominated solution sets obtained by the MOEA-PM algorithm on the four datasets.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings.
Embodiment one:
This embodiment provides a high-quality pattern mining method. Most traditional pattern mining methods require prior parameters to be set, and choosing a suitable parameter threshold is difficult for a user with no experience. The application therefore optimizes the problem model with a multi-objective evolutionary algorithm, which can discover patterns that meet the specified conditions without any threshold being set. In addition, in many practical pattern mining applications the data are usually large and sparse, which makes traditional random initialization and the traditional crossover and mutation operators inefficient. The application therefore proposes a new population initialization method that gives the initial population a higher evolutionary starting point while preserving both the validity and the diversity of its individuals. The application also develops crossover and mutation operators suited to the problem, together with a search-direction replacement strategy for the poorer individuals in the population, to improve the optimization process and its results. The application uses a binary encoding in which a value of "1" indicates that the corresponding item is present and "0" indicates that it is absent.
Specifically, the application is based on the NSGA-II algorithm and uses a new population initialization method. Traditional multi-objective optimization research generally uses random population initialization. When the data are sparsely distributed, random initialization places most initial individuals on patterns outside the solution space, so the population contains many infeasible solutions before evolution even begins, which greatly reduces the efficiency of the algorithm. The application therefore initializes the population with a new strategy based on the OR/NOR-tree structure, to ensure that the initial population is effectively distributed within the solution space.
For example, consider the raw database in Table 1 below:
Table 1. Database
Table 2. Profit table
To make the objective functions cheaper to compute, the raw database is first represented in bitmap form.
Assuming that D={ T1,T2,…Tq…,TnIt is a quantized data library (quantitative database), I={ i1,
i2,…,ivIt is the set that all different items (item) form in the database, then the bitmap (bitmap) of D is a n × v
Boolean matrix (Boolean matrix), is denoted as B (D).
The value of jth row (1≤j≤n) and the kth column (1≤k≤v) of B (D), i.e. Bj,k, it calculates in the following way:
The bitmap that table 3 gives illustrative data base in table 1 indicates.
The bitmap of 3. illustrative data base of table indicates
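As an illustration of the bitmap construction, the following Python sketch builds B(D) as a list of 0/1 rows. The contents of Table 1 are not reproduced here, so the three transactions in `TOY_DB` are hypothetical placeholders, and `build_bitmap` is an illustrative name rather than part of the described method. Quantities are ignored because the bitmap records only whether an item is present.

```python
def build_bitmap(transactions, items):
    """Return B(D) as n rows of v bits: row j, column k is 1 iff item k is in transaction j."""
    index = {item: k for k, item in enumerate(items)}
    bitmap = [[0] * len(items) for _ in transactions]
    for j, transaction in enumerate(transactions):
        for item in transaction:
            bitmap[j][index[item]] = 1
    return bitmap


# Hypothetical data (NOT the contents of Table 1): three transactions over items a..g.
ITEMS = list("abcdefg")
TOY_DB = [{"a", "b", "c"}, {"c", "g"}, {"a", "c", "f"}]
BITMAP = build_bitmap(TOY_DB, ITEMS)
```

With this layout, checking whether a record contains a pattern reduces to comparing bits in one row, which is what makes the objective functions cheaper to evaluate.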
Before initialization, the database is first scanned to find all maximal patterns and all distinct items, and the OR/NOR-tree structure is then constructed from the maximal patterns.
For example, the maximal patterns in the database of Table 1 are {a, b, c, f}, {a, c, e, f, g}, and {c, d, g}; the corresponding OR/NOR-tree structure is shown in Fig. 1. OR means that the corresponding item may or may not be present in the chromosome (that is, the corresponding bit is 0 or 1); NOR means that the corresponding item is definitely absent from the chromosome (that is, the corresponding bit is 0).
For example, the itemset {a, b, c, d} cannot be generated because the combination satisfies no branch of the OR/NOR-tree. The itemset {c, e, f} can be generated because the combination satisfies the middle branch.
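The branch test in the two examples above can be sketched in Python as follows. The sketch assumes that each branch of the tree can be flattened into one 0/1 mask over the items a to g (1 for an OR position, 0 for a NOR position). Fig. 1 is not reproduced here, so this flat representation and the names `branch_mask` and `satisfying_branches` are assumptions made for illustration.

```python
ITEMS = list("abcdefg")
# Maximal patterns of the example database, as listed in the text.
MAXIMAL_PATTERNS = [set("abcf"), set("acefg"), set("cdg")]


def branch_mask(maximal_pattern, items=ITEMS):
    """1 marks an OR position (item may appear); 0 marks a NOR position (item must be absent)."""
    return [1 if item in maximal_pattern else 0 for item in items]


BRANCHES = [branch_mask(pattern) for pattern in MAXIMAL_PATTERNS]


def satisfying_branches(itemset):
    """Indices of the branches whose OR positions cover every item of the itemset."""
    return [b for b, pattern in enumerate(MAXIMAL_PATTERNS) if set(itemset) <= pattern]
```

An itemset is producible exactly when `satisfying_branches` returns a non-empty list; for {c, e, f} the only match is index 1, the middle branch.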
To reflect the distribution of solutions as fully as possible with a limited number of individuals, a different tree branch is first assigned to each initial individual, and each individual is then placed in one of the following three states:
State one: exactly one of the individual's OR positions is initialized to 1, and all other positions are initialized to 0;
State two: all of the individual's OR positions are initialized to 1, and all NOR positions are initialized to 0;
State three: each of the individual's OR positions is initialized to 0 or 1 at random, and all NOR positions are initialized to 0.
States one and two ensure that the initial population covers the boundary regions of the solution space, and state three ensures uniform coverage of the non-boundary region of the solution space.
This initialization strategy improves the convergence speed and search efficiency of the algorithm to some extent; its effectiveness is verified in the experimental section.
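A minimal Python sketch of the three initialization states is given below. The text does not say how branches and states are distributed over the individuals, so the round-robin branch assignment and the cycling through states are assumptions, as are the function names; the branch masks are the flattened form of the three example maximal patterns.

```python
import random

# OR/NOR masks of the three example branches over items a..g (1 = OR, 0 = NOR).
BRANCHES = [
    [1, 1, 1, 0, 0, 1, 0],  # {a, b, c, f}
    [1, 0, 1, 0, 1, 1, 1],  # {a, c, e, f, g}
    [0, 0, 1, 1, 0, 0, 1],  # {c, d, g}
]


def init_individual(or_mask, state, rng):
    """Create one chromosome for a branch; NOR positions are always left at 0."""
    or_positions = [k for k, bit in enumerate(or_mask) if bit == 1]
    genes = [0] * len(or_mask)
    if state == 1:                       # state one: a single OR position set to 1
        genes[rng.choice(or_positions)] = 1
    elif state == 2:                     # state two: every OR position set to 1
        for k in or_positions:
            genes[k] = 1
    else:                                # state three: OR positions set at random
        for k in or_positions:
            genes[k] = rng.randint(0, 1)
    return genes


def init_population(branches, size, seed=0):
    """Return a list of (branch index, state, genes) tuples."""
    rng = random.Random(seed)
    population = []
    for n in range(size):
        branch = n % len(branches)               # assumption: round-robin branch assignment
        state = (n // len(branches)) % 3 + 1     # assumption: cycle through the three states
        population.append((branch, state, init_individual(branches[branch], state, rng)))
    return population
```

Note that state three can produce an all-zero chromosome (the empty pattern); how such individuals are handled is not specified in the text and is left open here.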
After the population has been initialized, the bits at the NOR and OR positions of the OR/NOR-tree structure are set by the improved crossover and mutation operators.
A new individual is first generated by uniform crossover; then, according to the OR/NOR-tree branch assigned to that individual, its NOR positions are set to 0.
As shown in Fig. 2, suppose that uniform crossover of chromosome A and chromosome B yields the new chromosome A' = (1101101), and that the tree branch assigned to chromosome A is the middle branch of the OR/NOR-tree on the left of the figure. The NOR positions of that branch are then set to 0 in the chromosome, giving A' = (1000101).
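The crossover-then-repair step can be sketched in Python as follows. The repair is written as a bitwise AND with the branch mask, which is one straightforward way to force NOR positions to 0; the function names and the 0.5 mixing probability of the uniform crossover are assumptions. The last lines replay the A' example from the text.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Take each gene from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def repair(genes, or_mask):
    """Force every NOR position (mask bit 0) of the chromosome to 0."""
    return [g & m for g, m in zip(genes, or_mask)]


# Middle branch {a, c, e, f, g} over items a..g: positions 2 and 4 (b and d) are NOR.
MIDDLE_BRANCH = [1, 0, 1, 0, 1, 1, 1]
child = [1, 1, 0, 1, 1, 0, 1]            # A' = (1101101) from the example
repaired = repair(child, MIDDLE_BRANCH)  # A' = (1000101) after the NOR positions are cleared
```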
Similarly, the mutation operator uses basic bit mutation, applied only at the OR positions of each chromosome.
For example, for a chromosome assigned to the middle branch, the encoding (1101100) cannot be produced, because the 2nd and 4th bits must be 0; the encoding (1010101) can be produced, because its 2nd and 4th bits are 0.
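A sketch of bit mutation restricted to OR positions is shown below. The default per-bit rate of one over the chromosome length follows the mutation probability Pm = 1/|I| stated in the experimental setup; the function name and signature are assumptions.

```python
import random


def mutate(genes, or_mask, rng, rate=None):
    """Flip each OR position with probability `rate`; NOR positions are never touched."""
    if rate is None:
        rate = 1.0 / len(genes)  # Pm = 1/|I|, with one gene per distinct item
    return [
        g ^ 1 if m == 1 and rng.random() < rate else g
        for g, m in zip(genes, or_mask)
    ]


MIDDLE_BRANCH = [1, 0, 1, 0, 1, 1, 1]  # branch {a, c, e, f, g}; bits 2 and 4 are NOR
```

Because NOR positions are skipped, a chromosome on the middle branch can never acquire a 1 in its 2nd or 4th bit through mutation, which is why (1101100) is unreachable.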
These operations ensure that offspring fully inherit the strengths of their parents and that the itemset represented by each new individual is a valid combination of items in the dataset. Meaningless itemset combinations are thereby avoided, which improves the algorithm's ability to explore the valid solution space and accelerates its convergence.
After the crossover and mutation operators have been improved, the search direction is adjusted with the worst-individual search-direction adjustment strategy.
When the total number of OR/NOR-tree branches is greater than the population size, the search direction of the worst individual in the current population is replaced in each iteration to ensure an effective search of the solution space. In this case the improvements described above may, on their own, fail to extend the search to the region containing the global optimum, which makes the global optimum very hard to reach. The worst-individual search-direction adjustment strategy is therefore added on top of them. The procedure can be summarized as follows:
Among the individuals that are to enter the next generation, the worst individual of the current generation is selected according to non-dominated sorting and crowding distance, and an OR/NOR-tree branch is then reassigned to it. This amounts to changing the search direction of the worst individual, and it improves the algorithm's global search ability over the solution space to some extent.
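The selection of the worst individual and the reassignment of its branch can be sketched as below. The text names only the criteria (non-dominated sorting and crowding distance), so several details are assumptions: the worst individual is taken to be the least crowded-distance member of the last front, the new branch is drawn uniformly at random, and the chromosome is repaired against the new branch. All function names are illustrative.

```python
import random


def dominates(u, v):
    """True if u dominates v when every objective is maximized."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def last_front(objs):
    """Indices of the last (worst) front found by repeatedly peeling non-dominated points."""
    remaining = set(range(len(objs)))
    front = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        remaining -= set(front)
    return front


def crowding(objs, front):
    """NSGA-II style crowding distance of each index in one front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        span = objs[ordered[-1]][m] - objs[ordered[0]][m]
        if span == 0:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / span
    return dist


def worst_index(objs):
    """Assumed reading of 'worst': last front, smallest crowding distance."""
    front = last_front(objs)
    dist = crowding(objs, front)
    return min(front, key=lambda i: dist[i])


def redirect_worst(population, objs, branches, rng):
    """Give the worst individual a new branch (uniform choice is an assumption) and repair it."""
    w = worst_index(objs)
    new_branch = rng.randrange(len(branches))
    genes = [g & m for g, m in zip(population[w][1], branches[new_branch])]
    population[w] = [new_branch, genes]
    return w
```

Here `population` is a list of `[branch index, genes]` pairs and `objs` holds the matching objective vectors (support, occupancy, utility).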
High-quality pattern mining is then solved with the NSGA-II algorithm improved as above. The specific algorithm is as follows:
To examine the performance of the method, the application uses it to solve the following three-objective model:
$$\text{Maximize } F(X) = \left(f_1(X), f_2(X), f_3(X)\right)^{T}$$
where:
Definition 1. The (relative) support of an itemset X is defined as:
$$\mathrm{sup}(X) = \frac{|\{T_i \in D : X \subseteq T_i\}|}{|D|}$$
In Table 1, the support of the itemset {b, c} is sup({b, c}) = 3/10, because {b, c} appears in T_1, T_7, and T_10 of the example database. Similarly, because the itemset {c, g} appears in T_2 and T_4, sup({c, g}) = 2/10.
Suppose the minimum support threshold is minSup = 0.25. Because sup({b, c}) ≥ minSup, the itemset {b, c} is a frequent pattern; because sup({c, g}) < minSup, the itemset {c, g} is not a frequent pattern.
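The support computation and the threshold test can be sketched in Python as follows. Table 1 is not reproduced here, so the ten transactions in `DB` are hypothetical; they were chosen only so that {b, c} occurs in T1, T7, and T10 and {c, g} occurs in T2 and T4, as stated in the text.

```python
def support(pattern, database):
    """Relative support: fraction of transactions that contain every item of the pattern."""
    pattern = set(pattern)
    return sum(1 for transaction in database if pattern <= transaction) / len(database)


# Hypothetical stand-in for Table 1 (T1..T10); only the occurrences quoted in the text are preserved.
DB = [
    set("bcf"), set("cdg"), set("ae"), set("acefg"), set("af"),
    set("ef"), set("bc"), set("a"), set("d"), set("abcf"),
]
MIN_SUP = 0.25

is_frequent_bc = support("bc", DB) >= MIN_SUP  # 3/10 meets the threshold
is_frequent_cg = support("cg", DB) >= MIN_SUP  # 2/10 does not
```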
Definition 2. For any two patterns m and n, if no itemset n exists such that $m \subset n$, then m is regarded as a maximal pattern. In Table 1, the itemset {b, c, f} is clearly not a maximal pattern, because $\{b, c, f\} \subset \{a, b, c, f\}$. The itemset {a, b} is not a maximal pattern either, because $\{a, b\} \subset \{a, b, c, f\}$. The itemset {c, d, g} is a maximal pattern, because no itemset in the example database properly contains {c, d, g}.
Definition 3. Occupancy measures the completeness of a pattern and is defined as:
$$\mathrm{occu}(X) = \frac{1}{|T_X|} \sum_{T_i \in T_X} \frac{|X|}{|T_i|}$$
where T_X denotes the records that contain every item of X. For example, three records contain the pattern {b, c}, namely T_1, T_7, and T_10, so the occupancy of the pattern is occu({b, c}) = (2/3 + 2/2 + 2/4)/3 ≈ 0.72. If the minimum occupancy threshold is minOccu = 0.6, the pattern is called a dominant pattern.
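The occupancy example can be replayed with the short Python sketch below. The three records are hypothetical stand-ins whose lengths (3, 2, and 4 items) match the fractions 2/3, 2/2, and 2/4 in the text; returning 0.0 for a pattern that no record contains is an assumption, since the text does not cover that case.

```python
def occupancy(pattern, database):
    """Average share of each supporting transaction that is taken up by the pattern."""
    pattern = set(pattern)
    supporting = [t for t in database if pattern <= t]
    if not supporting:
        return 0.0  # assumption: occupancy of an unsupported pattern is taken as 0
    return sum(len(pattern) / len(t) for t in supporting) / len(supporting)


# Hypothetical records: three contain {b, c} (lengths 3, 2, 4), one does not.
RECORDS = [set("bcf"), set("bc"), set("abcf"), set("cdg")]
occu_bc = occupancy("bc", RECORDS)  # (2/3 + 2/2 + 2/4) / 3
```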
Definition 4. The (relative) utility of an itemset X is defined as:
$$\mathrm{uti}(X) = \frac{\sum_{T_q \in T_X} \sum_{i_j \in X} q(i_j, T_q)\, p(i_j)}{\sum_{T_q \in D} TU(T_q)}$$
For example, the utility of the pattern {c, f} is
uti({c, f}) = ((1 × 2 + 4 × 8) + (5 × 2 + 2 × 8) + (1 × 2 + 2 × 8)) / (37 + 31 + 27 + 58 + 24 + 39 + 12 + 22 + 28 + 39) ≈ 0.25. If the minimum utility threshold minUti is below this value, the itemset {c, f} is a high-utility itemset, also called a high-utility pattern.
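The arithmetic of the utility example can be checked with the following Python sketch. Tables 1 and 2 are not reproduced here, so the inputs are read off the worked expression itself: the repeated factors 2 and 8 are taken to be the unit profits of c and f, and the other factors the quantities in the three records that contain {c, f}. That reading, and the function name and signature, are assumptions.

```python
def relative_utility(pattern, supporting_records, profit, transaction_utilities):
    """Utility of the pattern in its supporting records divided by the utility of the whole database."""
    pattern_utility = sum(
        record[item] * profit[item]
        for record in supporting_records
        for item in pattern
    )
    return pattern_utility / sum(transaction_utilities)


PROFIT = {"c": 2, "f": 8}                       # inferred unit profits
SUPPORTING = [                                  # inferred quantities in the records containing {c, f}
    {"c": 1, "f": 4},
    {"c": 5, "f": 2},
    {"c": 1, "f": 2},
]
TU = [37, 31, 27, 58, 24, 39, 12, 22, 28, 39]   # transaction utilities quoted in the text

uti_cf = relative_utility(["c", "f"], SUPPORTING, PROFIT, TU)  # 78 / 317
```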
The experiments were run on a desktop computer with 64-bit Windows 10, an Intel Core i3-4170 CPU @ 3.70 GHz, and 8 GB of memory, and the algorithm was implemented in Matlab. The method was evaluated on four public real-world datasets: chess, mushroom, accident, and connect, all of which can be downloaded from the SPMF data mining library. Because some of the datasets are large, only the first 10% of Accident and the first 50% of Connect are used to keep the problem simple. Fig. 9 shows the final non-dominated solution sets obtained by the MOEA-PM algorithm on the four public real-world datasets.
Tables 4 and 5 describe the parameters and characteristics of the datasets used.
Table 4. Parameters of the datasets
Table 5. Characteristics of the datasets
The performance of the MOEA-PM algorithm proposed in the application is assessed against several state-of-the-art algorithms and their variants.
1) MOEA-PM-: to show the effectiveness of the proposed improved genetic operators, the application compares against a variant of MOEA-PM that keeps only the population initialization strategy of MOEA-PM and replaces the improved genetic operators with ordinary operators. This variant is labeled MOEA-PM-.
2) MOPM: the MOPM algorithm defines two types of pattern, the transaction pattern (transaction-pattern) and the meta pattern (meta-pattern), which are used to generate the initial population. Transaction patterns usually have higher occupancy and lower support, whereas meta patterns usually have higher support and lower occupancy. Mining with this algorithm is therefore expected to yield more diverse solutions.
3) MOEA-FHUI (NSGA-II): the effectiveness and mining efficiency of MOEA-PM are also compared with the recent MOEA-FHUI algorithm, which initializes the population with meta itemsets (meta-itemset) and transaction itemsets (transaction-itemset). Unlike MOPM, it initializes the population at random, using the support of each meta itemset and the utility of each transaction itemset as selection probabilities. To ensure fairness, all algorithms are built on NSGA-II, so this algorithm is labeled MOEA-FHUI (NSGA-II) here.
4) MOEA-PM (Random): to show the effectiveness of the population initialization strategy proposed in MOEA-PM, the application compares against the variant MOEA-PM (Random), which uses random population initialization; its other components are identical to those of MOEA-PM.
5) MOEA-PM (Meta.) and MOEA-PM (Tran.): the variants MOEA-PM (Meta.) and MOEA-PM (Tran.) are used in the following comparative experiments to show the effectiveness of the proposed population initialization strategy. In MOEA-PM (Meta.) the initial population consists entirely of randomly chosen meta-patterns, and in MOEA-PM (Tran.) it consists entirely of randomly chosen transaction-patterns; the other components are again identical to those of MOEA-PM.
Note that, to ensure fairness, all of the above algorithms use a binary encoding and binary tournament selection. Apart from MOEA-PM and its variants, the algorithms all use uniform crossover and basic bit mutation. For the mutation operator, if the set of distinct items in the dataset is I, the mutation probability is Pm = 1/|I|.
To evaluate the quality of the final patterns mined by the MOEA-PM algorithm, the application uses hypervolume and coverage as performance indicators.
Hypervolume (HV) is an evaluation index from the field of evolutionary multi-objective optimization (EMO) that reflects, to some extent, both the convergence and the diversity of a solution set. It is computed as:
$$HV = \lambda\left(\bigcup_{i=1}^{|A|} vol_i\right)$$
where λ is the Lebesgue measure, A is the non-dominated solution set, and vol_i is the hypervolume spanned by the reference point and the non-dominated individual p_i. The larger the HV value, the better the solution set obtained by the algorithm.
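For small solution sets the hypervolume can be computed exactly with the Python sketch below, which splits the objective space into grid cells along the points' coordinates and sums the cells covered by at least one box. The text does not state the reference point used in the experiments, so the origin used in the usage note is an assumption, and this cell-based method is a simple illustration rather than the procedure used in the patent. It assumes all objectives are maximized.

```python
from itertools import product


def hypervolume(points, reference):
    """Exact volume of the union of boxes [reference, p] over all points p (maximization)."""
    dims = len(reference)
    # Per-axis cut positions: the reference coordinate plus every larger point coordinate.
    axes = [
        sorted({reference[d]} | {p[d] for p in points if p[d] > reference[d]})
        for d in range(dims)
    ]
    total = 0.0
    for cell in product(*(range(len(axis) - 1) for axis in axes)):
        upper = [axes[d][cell[d] + 1] for d in range(dims)]
        # A grid cell is covered iff some point is at least as large as its upper corner.
        if any(all(p[d] >= upper[d] for d in range(dims)) for p in points):
            volume = 1.0
            for d in range(dims):
                volume *= axes[d][cell[d] + 1] - axes[d][cell[d]]
            total += volume
    return total
```

For instance, `hypervolume([(1.0, 0.5), (0.5, 1.0)], (0.0, 0.0))` adds three covered quarter-unit cells. The cost grows with the number of points raised to the number of objectives, so this is only practical for small fronts.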
Coverage (COV) is a common evaluation index in recommender systems: the proportion of the full item set that is covered by the items the algorithm recommends to the user. It is computed as:
$$COV = \frac{N_d}{N}$$
where N_d is the number of distinct items in the recommendation list and N is the total number of items. If the coverage of the solution set obtained by an algorithm is low, the solutions found are limited in range, which lowers user satisfaction because low coverage means fewer goods for the user to choose from. As with HV, a larger COV value means that the patterns to be recommended are better.
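A minimal Python sketch of the coverage index follows. It treats the recommendation list as the set of mined patterns and counts the distinct items they contain; the function name and the example patterns are illustrative assumptions.

```python
def coverage(patterns, all_items):
    """COV = N_d / N: distinct items appearing in the patterns over the total number of items."""
    distinct = set()
    for pattern in patterns:
        distinct |= set(pattern)
    return len(distinct & set(all_items)) / len(set(all_items))


# Hypothetical solution set over the items a..g: it covers a, c, and g.
cov = coverage([{"a", "c"}, {"c", "g"}], "abcdefg")
```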
The population size of all the above algorithms is set to 100 and the number of evaluations to 5000, in order to observe the quality of the non-dominated solution sets that each algorithm obtains within a small number of generations.
The Pareto optimal solution sets obtained by each algorithm on the four real-world datasets are shown in Figs. 3 to 6.
Figs. 3 to 6 show that, on all four real-world datasets, the MOEA-PM algorithm outperforms the other algorithms in the number of solutions as well as in convergence and diversity.
MOEA-PM (Random) performs worst. Because its initial population is completely random, most of its individuals are invalid, which weakens the evolutionary ability of the algorithm, so it does not converge within the limited number of generations. MOPM outperforms MOEA-PM (Random) because a population initialized with meta-patterns and transaction-patterns contains only valid individuals and also combines the respective strengths of the two kinds of pattern. MOEA-FHUI (NSGA-II) performs similarly to MOPM and even outperforms it on some datasets, which suggests that the population initialization of MOEA-FHUI (NSGA-II) is, to some extent, better than that of MOPM. The initial population of MOEA-PM (Meta.) usually has high support values but is poorly distributed, so the solutions it explores within the limited number of generations are concentrated in the high-support region of the solution space. Similarly, the solutions that MOEA-PM (Tran.) explores within the limited number of generations lie mainly in the high-occupancy region of the solution space, since transaction patterns usually have higher occupancy and lower support.
The MOEA-PM algorithm proposed in the application addresses these problems with its dedicated population initialization method and improved crossover and mutation operators. On the one hand, the population is already in a good state before evolution starts; on the other hand, random combination of items during evolution is avoided, which improves efficiency. MOEA-PM therefore performs best in Figs. 3 to 6.
To evaluate the quality of the final patterns mined by each algorithm, the population size of all the above algorithms is set to 150 and the maximum number of function evaluations is set to 45000. The HV and COV values of the algorithms at different numbers of function evaluations on the 4 data sets are shown in Figs. 7-8. From Fig. 7(a)-(d) it can be seen that, compared with the other algorithms, the HV curve of the MOEA-PM algorithm converges fastest, which shows that the algorithm reaches a balance between convergence and diversity more quickly. The HV curve of MOEA-HUIM (NSGA-II) converges slightly faster than that of MOPM and fluctuates less, because the initial population of MOEA-HUIM (NSGA-II) is randomly selected according to the support and utility values of the two kinds of patterns it proposes, so its convergence is relatively fast. MOEA-PM- behaves similarly to MOEA-HUIM (NSGA-II) but clearly outperforms it, which shows that the proposed population initialization strategy is effective. However, the HV curve of MOEA-PM- does not converge as fast as that of MOEA-PM, which shows that the proposed improved genetic operators have a considerable effect on the convergence and distribution of the algorithm; this indirectly confirms the effectiveness of the improved genetic operators.
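For readers unfamiliar with the HV (hypervolume) indicator used in this comparison, the following is a minimal sketch of an exact hypervolume computation for a three-objective maximization problem. The text does not state how HV was computed or which reference point was used; the origin is assumed here as the reference point, and the function name `hypervolume_3d` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def hypervolume_3d(points: Sequence[Point],
                   ref: Point = (0.0, 0.0, 0.0)) -> float:
    """Volume of the region dominated by `points` and bounded below by `ref`.

    All objectives are maximized. The space is cut into grid cells along the
    coordinates of the points; a cell is counted when some point is at least
    as large as the cell's upper corner in every objective.
    """
    # Points that do not exceed the reference point contribute no volume.
    pts: List[Point] = [p for p in points if all(v > r for v, r in zip(p, ref))]
    if not pts:
        return 0.0
    # Sorted distinct coordinates per axis, starting at the reference point.
    axes = [sorted({ref[d]} | {p[d] for p in pts}) for d in range(3)]
    volume = 0.0
    for i, j, k in product(*(range(1, len(a)) for a in axes)):
        upper = (axes[0][i], axes[1][j], axes[2][k])
        if any(all(p[d] >= upper[d] for d in range(3)) for p in pts):
            volume += ((axes[0][i] - axes[0][i - 1])
                       * (axes[1][j] - axes[1][j - 1])
                       * (axes[2][k] - axes[2][k - 1]))
    return volume
```

This grid sweep is simple rather than fast; it is adequate for a population of the size used in the experiments, but dedicated hypervolume algorithms scale better for large fronts.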
From Fig. 8(a)-(d) it can be seen that the MOEA-PM algorithm makes the coverage (COV) curve converge faster, and that the curve fluctuates more gently. Combining Fig. 7 and Fig. 8, we find that even if the number of evolutionary generations is increased, MOEA-PM (Random) still fails to converge. This shows that the trivial solutions produced by random population initialization seriously impair the environmental selection ability of the algorithm and greatly weaken its capacity to evolve, so the algorithm struggles to converge within a limited number of evolutionary generations. Although MOEA-PM (Meta.) and MOEA-PM (Tran.) can guarantee the validity of the initial population, the initial population is unevenly distributed in the search space and has poor diversity, which limits the exploration ability of the algorithm in the early stage of evolution, so convergence is slower. The experiments show that the MOEA-PM algorithm proposed herein outperforms the above algorithms in both convergence speed and the quality of the final solutions.
Some of the steps in the embodiments of the present invention can be implemented in software, and the corresponding software program can be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A high-quality pattern mining method, characterized in that the method is based on the NSGA-II algorithm, which is improved by the following steps:
representing the raw database in bitmap form;
constructing an initial population using a population initialization strategy based on an OR/NOR-tree structure, in combination with the raw database represented in bitmap form;
setting the NOR positions and the OR positions in the OR/NOR-tree structure using an improved crossover operator and an improved mutation operator;
adjusting the search direction using a worst-individual search direction adjustment strategy based on the OR/NOR-tree structure;
solving the high-quality pattern mining problem with the NSGA-II algorithm improved as above.
2. The method according to claim 1, characterized in that, before the initial population is constructed using the population initialization strategy based on the OR/NOR-tree structure in combination with the raw database represented in bitmap form, the method comprises:
scanning the raw database to find all maximal patterns and all distinct items, and then constructing the OR/NOR-tree structure from the maximal patterns.
3. The method according to claim 1, characterized in that constructing the initial population using the population initialization strategy based on the OR/NOR-tree structure in combination with the raw database represented in bitmap form comprises:
assigning a different tree branch to each initial individual, and then distributing the individuals among the following three states:
state one: one of the OR positions corresponding to the individual is initialized to 1, and all other positions are initialized to 0;
state two: all OR positions corresponding to the individual are initialized to 1, and all NOR positions are initialized to 0;
state three: the OR positions corresponding to the individual are randomly initialized to 0 or 1, and all NOR positions are initialized to 0.
4. The method according to claim 1, characterized in that setting the NOR positions and the OR positions in the OR/NOR-tree structure using the improved crossover operator and mutation operator comprises:
generating a new individual using uniform crossover, and setting the NOR positions corresponding to the new individual to 0;
for the mutation operator, using basic bit mutation and performing the mutation operation only at the OR positions corresponding to the individual.
5. The method according to claim 1, characterized in that adjusting the search direction using the worst-individual search direction adjustment strategy comprises:
when the total number of OR/NOR-tree branches is greater than the population size, replacing the search direction of the worst individual in the current population in each iteration.
6. The method according to claim 5, characterized in that replacing the search direction of the worst individual in the population comprises: selecting the worst individual in the current generation according to non-dominated sorting and crowding distance, and then reassigning an OR/NOR-tree branch to that individual.
7. The method according to claim 1, characterized in that the improved NSGA-II algorithm uses a binary encoding scheme, and the selection operation uses binary tournament selection.
8. Use of the method according to any one of claims 1 to 7 in the technical field of data mining.
9. A method of applying the method according to any one of claims 1 to 7 to solve the following three-objective model, the three-objective model being:
$$\text{Maximize } F(X) = \{(f_1(X), f_2(X), f_3(X))^{T}\}$$
where $f_1(X)$ denotes the relative support of pattern X, $f_2(X)$ denotes the occupancy of pattern X, and $f_3(X)$ denotes the relative utility value of pattern X.
10. The method according to claim 9, characterized in that the relative support of the pattern X measures how frequently the pattern occurs and is defined as follows:
where $D = \{T_1, T_2, \ldots, T_q, \ldots, T_n\}$ is the raw data set, $T_q$ is a single record in the raw data set $D$, and $|D|$ is the number of records $T_q$ in $D$;
the occupancy of the pattern X measures the completeness of the pattern and is defined as follows:
where $T_X$ denotes the records that contain all items of the pattern X;
the relative utility value of the pattern X is the ratio of the total utility value of all items contained in the pattern X to the total utility value of the raw data set, and is defined as follows:
where $T_q \in D$ ($1 \le q \le n$) is the q-th record, $i_j$ denotes the j-th item, $q(i_j, T_q)$ ($1 \le j \le m$, $1 \le q \le n$) is the quantity of the j-th item in the q-th record, $p(i_j)$ is the weight of the j-th item, and $TU(T_q)$ is the utility value of the q-th record.
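The three objectives named in claims 9 and 10 can be illustrated with a short sketch. The defining formulas are not reproduced in the text above, so the code below relies on the definitions commonly used in the pattern mining literature, which is an assumption: relative support as the fraction of records containing X, occupancy as the average of |X| / |T_q| over the records containing X, and relative utility as the utility of X in those records divided by the total utility of the data set. Records are plain dictionaries here rather than the bitmap of claim 1, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, int]  # item -> quantity q(i_j, T_q) in that record


def record_utility(record: Record, weights: Dict[str, float]) -> float:
    """TU(T_q): sum of quantity * weight over every item in the record."""
    return sum(qty * weights[item] for item, qty in record.items())


def objectives(pattern: FrozenSet[str], db: List[Record],
               weights: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (relative support, occupancy, relative utility) of `pattern`."""
    # T_X: the records that contain every item of the pattern.
    t_x = [t for t in db if all(item in t for item in pattern)]
    total_utility = sum(record_utility(t, weights) for t in db)
    if not pattern or not t_x or total_utility == 0:
        return (0.0, 0.0, 0.0)
    support = len(t_x) / len(db)
    # Assumed occupancy: mean share of each supporting record taken up by X.
    occupancy = sum(len(pattern) / len(t) for t in t_x) / len(t_x)
    pattern_utility = sum(t[item] * weights[item] for t in t_x for item in pattern)
    return (support, occupancy, pattern_utility / total_utility)
```

For example, with records `{"a": 1, "b": 2}`, `{"a": 2, "c": 1}`, `{"b": 1}` and weights a=1, b=2, c=3, the pattern {a} is contained in two of three records, fills half of each, and accounts for 3 of the 12 units of total utility.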
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910303505.8A CN110069498A (en) | 2019-04-16 | 2019-04-16 | High quality mode method for digging based on multi-objective evolutionary algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110069498A (en) | 2019-07-30 |
Family
ID=67367842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910303505.8A Pending CN110069498A (en) | 2019-04-16 | 2019-04-16 | High quality mode method for digging based on multi-objective evolutionary algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110069498A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110955702A (en) * | 2019-11-28 | 2020-04-03 | 江南大学 | Pattern data mining method based on improved genetic algorithm |
WO2021102775A1 (en) * | 2019-11-28 | 2021-06-03 | 江南大学 | Pattern data mining method based on improved genetic algorithm |
CN117010991A (en) * | 2023-07-31 | 2023-11-07 | 江南大学 | High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120150860A1 (en) * | 2010-12-10 | 2012-06-14 | Yahoo!, Inc. | Clustering with Similarity-Adjusted Entropy |
CN103198359A (en) * | 2013-04-03 | 2013-07-10 | 南京理工大学 | Optimized and improved fuzzy regression model construction method based on nondominated sorting genetic algorithm II (NSGA- II) |
CN105740467A (en) * | 2016-03-07 | 2016-07-06 | 东北大学 | Mining method for C-Mn steel industry big data |
CN106997553A (en) * | 2017-04-12 | 2017-08-01 | 安徽大学 | A kind of method for digging of the grouping of commodities pattern based on multiple-objection optimization |
Non-Patent Citations (2)
Title |
---|
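张强 (Zhang Qiang): "基于演化计算的模式挖掘算法研究" (Research on pattern mining algorithms based on evolutionary computation), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |
杨璐 (Yang Lu): "基于频繁且高效用项集的数据挖掘研究" (Research on data mining based on frequent and high-utility itemsets), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |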
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110955702A (en) * | 2019-11-28 | 2020-04-03 | 江南大学 | Pattern data mining method based on improved genetic algorithm |
WO2021102775A1 (en) * | 2019-11-28 | 2021-06-03 | 江南大学 | Pattern data mining method based on improved genetic algorithm |
CN110955702B (en) * | 2019-11-28 | 2024-03-29 | 江南大学 | Improved genetic algorithm-based mode data mining method |
CN117010991A (en) * | 2023-07-31 | 2023-11-07 | 江南大学 | High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm |
CN117010991B (en) * | 2023-07-31 | 2024-05-03 | 江南大学 | High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190730 |