CN110069498A - High-quality pattern mining method based on a multi-objective evolutionary algorithm - Google Patents


Info

Publication number
CN110069498A
Authority
CN
China
Prior art keywords
individual
algorithm
population
mode
tree structure
Legal status
Pending
Application number
CN201910303505.8A
Other languages
Chinese (zh)
Inventor
方伟
张强
孙俊
吴小俊
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Application filed by Jiangnan University
Priority to CN201910303505.8A
Publication of CN110069498A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a high-quality pattern mining method based on a multi-objective evolutionary algorithm and belongs to the field of data mining. By using a multi-objective evolutionary algorithm, the method removes the difficulty users face in setting a suitable parameter threshold. The initial population is built from the original database, represented as a bitmap, using a population initialization strategy based on an OR/NOR-tree structure. Improved crossover and mutation operators are configured at the NOR and OR positions of that structure. Together these address the fact that data are usually large and sparse, which makes traditional random initialization and conventional crossover and mutation operators inefficient. The method also adjusts search directions with a worst-individual search-direction adjustment strategy, which improves the optimization process and its results and so raises both the convergence speed and the quality of the final solutions.

Description

High-quality pattern mining method based on a multi-objective evolutionary algorithm
Technical field
The present invention relates to a high-quality pattern mining method based on a multi-objective evolutionary algorithm and belongs to the field of data mining technology.
Background art
Data mining is the process of extracting potentially interesting information or patterns from large amounts of data for further use.
Most traditional pattern mining methods require prior parameters to be set. Users without any experience find it difficult to choose a suitable parameter threshold, and solution efficiency is relatively low. A multi-objective evolutionary algorithm, by contrast, can explore patterns that satisfy given requirements without any threshold being set.
Existing multi-objective pattern mining algorithms include the following. "Pattern Recommendation in Task-oriented Applications: A Multi-Objective Perspective" (2017) converts task-oriented pattern mining into a multi-objective optimization problem and proposes the MOPM algorithm for finding patterns that satisfy given conditions. "A multi-objective evolutionary approach for mining frequent and high utility itemsets" (2018) discloses the MOEA-FHUI algorithm, which models support and utility together as a two-objective optimization problem in order to find patterns that occur frequently and have high utility. However, the first algorithm considers only patterns that are frequent and complete, and the second only patterns that are frequent and have high utility. In practice, users care most about patterns (combinations of goods) that are both frequent and complete in the dataset and that also bring fairly high profit. Moreover, as the number of objective functions grows, and because data are usually large and sparse, traditional random initialization and conventional crossover and mutation operators become inefficient, and existing evolutionary-computation-based pattern mining algorithms struggle to perform well. It is therefore necessary to build a new pattern mining model that serves users' diverse practical needs and to propose an efficient pattern mining algorithm.
Summary of the invention
To address the low solution efficiency of current data mining models and algorithms, the present invention provides a high-quality pattern mining method based on a multi-objective evolutionary algorithm. The method is based on the NSGA-II algorithm and improves it with the following steps:
representing the original database as a bitmap;
constructing the initial population from the bitmap-represented original database using a population initialization strategy based on an OR/NOR-tree structure;
configuring improved crossover and mutation operators at the NOR and OR positions of the OR/NOR-tree structure;
adjusting search directions using a worst-individual search-direction adjustment strategy based on the OR/NOR-tree structure; and
performing high-quality pattern mining with the improved NSGA-II algorithm described above.
Optionally, before the initial population is constructed from the bitmap-represented original database using the OR/NOR-tree-based population initialization strategy, the method includes:
scanning the original database to find all maximal patterns and all distinct items, and then constructing the OR/NOR-tree structure from the maximal patterns.
Optionally, constructing the initial population from the bitmap-represented original database using the OR/NOR-tree-based population initialization strategy includes:
assigning a different tree branch to each initial individual, and then placing each individual in one of the following three states:
State 1: one of the individual's OR positions is initialized to 1, and all other positions are initialized to 0;
State 2: all of the individual's OR positions are initialized to 1, and all NOR positions are initialized to 0;
State 3: each of the individual's OR positions is randomly initialized to 0 or 1, and all NOR positions are initialized to 0.
Optionally, configuring improved crossover and mutation operators at the NOR and OR positions of the OR/NOR-tree structure includes:
generating new individuals by uniform crossover and setting the NOR positions of each new individual to 0;
for mutation, using basic bit-flip mutation applied only at the individual's OR positions.
Optionally, adjusting search directions using the worst-individual search-direction adjustment strategy includes:
when the population size is larger than the total number of OR/NOR-tree branches, replacing the search direction of the worst individual in the current population in every iteration.
Optionally, replacing the search direction of the worst individual in the population includes: selecting the worst individual of the current generation according to non-dominated sorting and crowding distance, and then reassigning an OR/NOR-tree branch to that individual.
Optionally, the improved NSGA-II algorithm uses binary encoding, and selection uses binary tournament selection.
The application also claims the use of the above method in the field of data mining technology.
The application also claims the use of the above method to solve the following three-objective model:
$$\max F(X) = \left(f_1(X), f_2(X), f_3(X)\right)^{T}$$
where $f_1(X)$ is the (relative) support of pattern X, $f_2(X)$ is the occupancy of pattern X, and $f_3(X)$ is the (relative) utility of pattern X.
Optionally, the (relative) support of pattern X measures how often the pattern occurs and is defined as:
$$\operatorname{sup}(X) = \frac{\left|\{T_i \in D : X \subseteq T_i\}\right|}{|D|}$$
where $D = \{T_1, T_2, \dots, T_i, \dots, T_n\}$ is the original dataset, $T_i$ is a single record in $D$, and $|D|$ is the number of records in $D$;
the occupancy of pattern X measures how complete the pattern is and is defined as:
$$\operatorname{occu}(X) = \frac{1}{|T_X|} \sum_{T_i \in T_X} \frac{|X|}{|T_i|}$$
where $T_X$ is the set of records that contain all items of X;
the (relative) utility of pattern X is the ratio of the total utility of all items contained in X to the total utility of the dataset, defined as:
$$\operatorname{uti}(X) = \frac{\sum_{T_q \in D,\; X \subseteq T_q} \sum_{i_j \in X} q(i_j, T_q)\, p(i_j)}{\sum_{T_q \in D} TU(T_q)}, \qquad TU(T_q) = \sum_{i_j \in T_q} q(i_j, T_q)\, p(i_j)$$
where $T_q \in D$ $(1 \le q \le n)$ is the q-th record, $i_j$ is the j-th item, $q(i_j, T_q)$ $(1 \le j \le m,\ 1 \le q \le n)$ is the quantity of the j-th item in the q-th record, $p(i_j)$ is the weight (unit profit) of the j-th item, and $TU(T_q)$ is the utility of the q-th record.
The beneficial effects of the present invention are as follows:
Most traditional pattern mining methods require prior parameters to be set, and users without any experience find it difficult to choose a suitable threshold; the application removes this need by using a multi-objective evolutionary algorithm. In many practical pattern mining applications the data are large and sparse, which makes traditional random initialization and conventional crossover and mutation operators inefficient. The application addresses this by constructing the initial population from the bitmap-represented original database with a population initialization strategy based on the OR/NOR-tree structure, and by configuring improved crossover and mutation operators at the NOR and OR positions of that structure, which improves the overall solution efficiency of the algorithm. In addition, adjusting search directions with the worst-individual search-direction adjustment strategy improves the optimization process and its results, increasing convergence speed while ensuring the quality of the final solutions.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows an example OR/NOR-tree structure.
Fig. 2 illustrates the improved crossover and mutation operators.
Fig. 3 shows the Pareto optimal solution sets obtained by different algorithms on Accident_10%.
Fig. 4 shows the Pareto optimal solution sets obtained by different algorithms on Chess.
Fig. 5 shows the Pareto optimal solution sets obtained by different algorithms on Connect_50%.
Fig. 6 shows the Pareto optimal solution sets obtained by different algorithms on Mushroom.
Fig. 7 shows how the hypervolume of the different algorithms varies with the number of evaluations on the four datasets.
Fig. 8 shows how the coverage of the different algorithms varies with the number of evaluations on the four datasets.
Fig. 9 shows the final non-dominated solution sets obtained by the MOEA-PM algorithm on the four datasets.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1:
This embodiment provides a high-quality pattern mining method. Most traditional pattern mining methods require prior parameters to be set, and users without any experience find it difficult to choose a suitable parameter threshold. The application therefore optimizes the problem model with a multi-objective evolutionary algorithm, which can find patterns that satisfy given requirements without any threshold. In many practical pattern mining applications the data are also large and sparse, which makes traditional random initialization and conventional crossover and mutation operators inefficient. The application therefore proposes a new population initialization method that gives the initial population a better evolutionary starting point while balancing the validity and diversity of its individuals. It also develops crossover and mutation operators suited to the problem, together with a strategy for replacing the search direction of poor individuals in the population, to improve the optimization process and its results. The application uses binary encoding: a value of "1" indicates that an item is present and "0" that the corresponding item is absent.
Specifically, the application is based on the NSGA-II algorithm and uses a new population initialization method. Traditional multi-objective optimization research generally uses random population initialization. When the data are sparse, random initialization places most initial individuals on patterns outside the solution space, so the population contains many infeasible solutions before evolution even starts, which greatly reduces the algorithm's efficiency. The application therefore initializes the population with a new strategy based on the OR/NOR-tree structure, which ensures that the initial population is distributed effectively within the solution space.
For example, consider the original database in Table 1 below:
Table 1. Example database
Table 2. Profit table
To make the objective functions cheaper to compute, the original database is first represented as a bitmap.
Suppose $D = \{T_1, T_2, \dots, T_q, \dots, T_n\}$ is a quantitative database and $I = \{i_1, i_2, \dots, i_v\}$ is the set of all distinct items in the database. The bitmap of D is then an n × v Boolean matrix, denoted B(D).
The entry in row j (1 ≤ j ≤ n) and column k (1 ≤ k ≤ v) of B(D), $B_{j,k}$, is computed as:
$$B_{j,k} = \begin{cases} 1, & i_k \in T_j \\ 0, & \text{otherwise} \end{cases}$$
Table 3 gives the bitmap representation of the example database in Table 1.
Table 3. Bitmap representation of the example database
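A minimal Python sketch of the bitmap representation follows. It assumes transactions are given as sets of item labels. The function names `to_bitmap`, `column_bitsets` and `cover` are hypothetical, and the toy database is illustrative rather than Table 1. Packing each column of B(D) into an integer is one common way to benefit from a bitmap: ANDing the columns of an itemset marks every transaction that contains the whole itemset.

```python
from functools import reduce
from typing import Dict, List, Sequence, Set


def to_bitmap(db: Sequence[Set[str]], items: Sequence[str]) -> List[List[int]]:
    """Build the n x v Boolean matrix B(D): B[j][k] = 1 iff items[k] is in db[j]."""
    return [[1 if item in transaction else 0 for item in items] for transaction in db]


def column_bitsets(bitmap: List[List[int]], items: Sequence[str]) -> Dict[str, int]:
    """Pack each column into an int whose bit j is set iff transaction j has the item."""
    columns: Dict[str, int] = {}
    for k, item in enumerate(items):
        mask = 0
        for j, row in enumerate(bitmap):
            if row[k]:
                mask |= 1 << j
        columns[item] = mask
    return columns


def cover(columns: Dict[str, int], itemset: Set[str], n: int) -> int:
    """AND the itemset's columns; set bits mark transactions containing the whole itemset."""
    all_rows = (1 << n) - 1
    return reduce(lambda acc, item: acc & columns[item], itemset, all_rows)


if __name__ == "__main__":
    toy_db = [{"a", "b", "c"}, {"b", "c"}, {"a", "d"}]  # illustrative, not Table 1
    items = ["a", "b", "c", "d"]
    bitmap = to_bitmap(toy_db, items)
    print(bitmap)  # [[1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
    rows = cover(column_bitsets(bitmap, items), {"b", "c"}, len(toy_db))
    print(bin(rows).count("1"))  # 2 transactions contain {b, c}
```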
Before initialization, the database is scanned to find all maximal patterns and all distinct items, and the OR/NOR-tree structure is then built from the maximal patterns.
For example, the maximal patterns of the database in Table 1 are {a, b, c, f}, {a, c, e, f, g} and {c, d, g}, and the corresponding OR/NOR-tree structure is shown in Fig. 1. OR means that the corresponding item may or may not be present in the chromosome (the corresponding bit is 0 or 1); NOR means that the corresponding item must be absent from the chromosome (the corresponding bit is 0).
For example, the itemset {a, b, c, d} cannot be generated, because it does not fit any branch of the OR/NOR-tree, whereas the itemset {c, e, f} can be generated, because it fits the middle branch ({a, c, e, f, g}).
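The sketch below reads Definition 2 as "a transaction that no other transaction strictly contains". It also flattens the OR/NOR-tree of Fig. 1 into one OR/NOR mask per branch, which is an assumption about how the tree is used. The function names are hypothetical. The usage lines reuse the three maximal patterns listed above for Table 1 and reproduce the {a, b, c, d} and {c, e, f} examples.

```python
from typing import FrozenSet, List, Sequence, Set


def maximal_patterns(db: Sequence[Set[str]]) -> List[FrozenSet[str]]:
    """Keep each distinct transaction that no other transaction strictly contains."""
    unique = {frozenset(t) for t in db}
    maximal = [t for t in unique if not any(t < other for other in unique)]
    return sorted(maximal, key=sorted)  # deterministic order


def branch_mask(branch: FrozenSet[str], items: Sequence[str]) -> List[int]:
    """1 = OR position (item may appear), 0 = NOR position (item must be absent)."""
    return [1 if item in branch else 0 for item in items]


def is_valid(itemset: Set[str], branches: Sequence[FrozenSet[str]]) -> bool:
    """An itemset can be generated only if it fits inside at least one branch."""
    return any(set(itemset) <= branch for branch in branches)


if __name__ == "__main__":
    items = "abcdefg"
    branches = [frozenset("abcf"), frozenset("acefg"), frozenset("cdg")]
    print(is_valid({"a", "b", "c", "d"}, branches))  # False: no branch holds all four
    print(is_valid({"c", "e", "f"}, branches))       # True: fits {a, c, e, f, g}
    print(branch_mask(branches[1], items))           # [1, 0, 1, 0, 1, 1, 1]
```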
To let a limited number of individuals reflect the distribution of solutions as fully as possible, each initial individual is first assigned a different tree branch and then placed in one of the following three states:
State 1: one of the individual's OR positions is initialized to 1, and all other positions are initialized to 0;
State 2: all of the individual's OR positions are initialized to 1, and all NOR positions are initialized to 0;
State 3: each of the individual's OR positions is randomly initialized to 0 or 1, and all NOR positions are initialized to 0.
States 1 and 2 ensure that the initial population covers the boundary regions of the solution space, and state 3 ensures uniform coverage of its non-boundary regions.
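The three initialization states can be sketched as follows, with each branch given as a 0/1 mask (1 = OR, 0 = NOR). The text does not say how states are spread across individuals. This sketch therefore assumes that branches are assigned round-robin, so they wrap around when the population is larger than the number of branches, and that each individual's state is drawn uniformly from {1, 2, 3}. Both choices and the function names are hypothetical. Branches are assumed non-empty, which holds for maximal patterns taken from non-empty transactions.

```python
import random
from typing import List, Sequence, Tuple


def init_individual(mask: Sequence[int], state: int, rng: random.Random) -> List[int]:
    """Create one chromosome for a branch; NOR positions (mask 0) always stay 0."""
    or_positions = [k for k, bit in enumerate(mask) if bit]
    genes = [0] * len(mask)
    if state == 1:    # exactly one OR bit set to 1
        genes[rng.choice(or_positions)] = 1
    elif state == 2:  # every OR bit set to 1, i.e. the whole maximal pattern
        for k in or_positions:
            genes[k] = 1
    elif state == 3:  # each OR bit drawn uniformly from {0, 1}
        for k in or_positions:
            genes[k] = rng.randint(0, 1)
    else:
        raise ValueError("state must be 1, 2 or 3")
    return genes


def init_population(masks: Sequence[Sequence[int]], pop_size: int,
                    rng: random.Random) -> List[Tuple[int, List[int]]]:
    """Return (branch index, chromosome) pairs.

    Assumptions: branches are handed out round-robin so every branch is used,
    and each individual's state is drawn uniformly from {1, 2, 3}.
    """
    population = []
    for i in range(pop_size):
        branch = i % len(masks)
        state = rng.choice((1, 2, 3))
        population.append((branch, init_individual(masks[branch], state, rng)))
    return population
```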
This initialization strategy improves the convergence speed and search efficiency of the algorithm to some extent; its effectiveness is verified in the experimental section.
After the population has been initialized, improved crossover and mutation operators are configured at the NOR and OR positions of the OR/NOR-tree structure.
First, a new individual is generated by uniform crossover; then, according to the OR/NOR-tree branch assigned to the individual, its NOR positions are set to 0.
As shown in Fig. 2, suppose uniform crossover of chromosomes A and B produces the new chromosome A′ = (1101101), and the tree branch assigned to chromosome A is the middle branch of the OR/NOR-tree on the left of the figure. Setting the positions of A′ that correspond to NOR bits of that branch to 0 gives A′ = (1000101).
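A short sketch of crossover followed by NOR repair is shown below. It assumes a per-gene swap probability of 0.5 for uniform crossover, which the text does not state. Following the Fig. 2 example, it also assumes the child keeps the branch of parent A. `uniform_crossover` and `repair` are hypothetical names. The usage line reproduces the A′ = (1101101) to (1000101) example for the middle branch {a, c, e, f, g}.

```python
import random
from typing import List, Sequence, Tuple


def uniform_crossover(p1: Sequence[int], p2: Sequence[int],
                      rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap each gene between the two parents independently with probability 0.5."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def repair(child: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Zero every NOR position of the branch assigned to the child (bitwise AND)."""
    return [g & m for g, m in zip(child, mask)]


if __name__ == "__main__":
    middle_branch = [1, 0, 1, 0, 1, 1, 1]  # {a, c, e, f, g}: positions 2 and 4 are NOR
    print(repair([1, 1, 0, 1, 1, 0, 1], middle_branch))  # [1, 0, 0, 0, 1, 0, 1]
```

Because the repair is a single AND with the branch mask, any child can be mapped back into the valid region at linear cost.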
Similarly, mutation uses basic bit-flip mutation, applied only to the OR positions of each chromosome.
For example, for a chromosome assigned to the middle branch, (1101100) cannot be generated, because the 2nd and 4th positions of the encoding must be 0, whereas (1010101) can be generated, because its 2nd and 4th positions are 0.
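Mutation restricted to OR positions can be sketched as below. `or_bit_mutation` and `fits_branch` are hypothetical names, and the mutation probability `pm` would be 1/|I| in the experiments described later. The usage lines check the (1101100) and (1010101) examples for the middle branch and confirm that a mutant never leaves its branch.

```python
import random
from typing import List, Sequence


def or_bit_mutation(genes: Sequence[int], mask: Sequence[int], pm: float,
                    rng: random.Random) -> List[int]:
    """Flip each OR-position bit with probability pm; NOR positions are never touched."""
    return [g ^ 1 if m and rng.random() < pm else g for g, m in zip(genes, mask)]


def fits_branch(genes: Sequence[int], mask: Sequence[int]) -> bool:
    """True when every NOR position (mask 0) of the chromosome is 0."""
    return all(g <= m for g, m in zip(genes, mask))


if __name__ == "__main__":
    middle_branch = [1, 0, 1, 0, 1, 1, 1]
    print(fits_branch([1, 1, 0, 1, 1, 0, 0], middle_branch))  # False
    print(fits_branch([1, 0, 1, 0, 1, 0, 1], middle_branch))  # True
    mutant = or_bit_mutation([1, 0, 1, 0, 1, 0, 1], middle_branch, 1 / 7, random.Random(0))
    print(fits_branch(mutant, middle_branch))  # always True
```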
These operations let offspring fully inherit the strengths of their parents while guaranteeing that the itemset represented by each new individual is a valid combination of items in the dataset. This prevents meaningless itemset combinations, improves the algorithm's ability to explore the valid solution space and accelerates convergence.
After crossover and mutation have been improved, search directions are adjusted with the worst-individual search-direction adjustment strategy.
When the population size is larger than the total number of OR/NOR-tree branches, the search direction of the worst individual in the current population is replaced in every iteration so that the solution space is searched effectively. In this case the improvements described above may, on their own, fail to extend the search into the region that contains the global optimum, so the global optimum becomes hard to reach. The worst-individual search-direction adjustment strategy is therefore added on top of them. The procedure can be summarized as follows:
Among the individuals that will enter the next generation, the worst individual of the current generation is selected according to non-dominated sorting and crowding distance, and that individual is then reassigned an OR/NOR-tree branch. This amounts to changing the worst individual's search direction and, to some extent, improves the algorithm's global search ability over the solution space.
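The strategy can be sketched with a plain NSGA-II ranking. The sketch treats all three objectives as maximized, takes the worst individual to be the one with the highest non-dominated rank (ties broken by the smallest crowding distance), and skips the step unless the population is larger than the number of branches. How the new branch is chosen is not specified, so the sketch draws it uniformly from the other branches. The sketch also only relabels the branch and leaves the genes untouched; repair against the new branch would occur at the next crossover. These choices and all function names are assumptions. The non-dominated sort is the simple O(N³) version, chosen for clarity rather than speed.

```python
import random
from typing import List, Optional, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Maximisation: a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_ranks(objs: Sequence[Objectives]) -> List[int]:
    """Rank 0 is the first non-dominated front, rank 1 the next, and so on."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def crowding_distances(objs: Sequence[Objectives], ranks: Sequence[int]) -> List[float]:
    """NSGA-II crowding distance, computed separately inside each front."""
    dist = [0.0] * len(objs)
    for r in set(ranks):
        front = [i for i in range(len(objs)) if ranks[i] == r]
        for k in range(len(objs[0])):
            front.sort(key=lambda i: objs[i][k])
            lo, hi = objs[front[0]][k], objs[front[-1]][k]
            dist[front[0]] = dist[front[-1]] = float("inf")
            if hi == lo:
                continue
            for pos in range(1, len(front) - 1):
                gap = objs[front[pos + 1]][k] - objs[front[pos - 1]][k]
                dist[front[pos]] += gap / (hi - lo)
    return dist


def redirect_worst(objs: Sequence[Objectives], branch_of: List[int], n_branches: int,
                   rng: random.Random) -> Optional[int]:
    """Give the worst individual a different branch; return its index (None if skipped)."""
    if len(objs) <= n_branches:  # the strategy only applies when N > number of branches
        return None
    ranks = nondominated_ranks(objs)
    dist = crowding_distances(objs, ranks)
    # Worst = highest rank; ties broken by the smallest crowding distance.
    worst = max(range(len(objs)), key=lambda i: (ranks[i], -dist[i]))
    # Assumption: the new branch is drawn uniformly from the other branches.
    others = [b for b in range(n_branches) if b != branch_of[worst]]
    if others:
        branch_of[worst] = rng.choice(others)
    return worst
```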
High-quality pattern mining is then performed with the improved NSGA-II algorithm described above. The specific algorithm is as follows:
To test the performance of the above method, the application uses it to solve the following three-objective model:
$$\max F(X) = \left(f_1(X), f_2(X), f_3(X)\right)^{T}$$
where:
Definition 1. The (relative) support of itemset X is defined as:
$$\operatorname{sup}(X) = \frac{\left|\{T_q \in D : X \subseteq T_q\}\right|}{|D|}$$
In Table 1, the support of itemset {b, c} is sup({b, c}) = 3/10, because {b, c} appears in T1, T7 and T10 of the example database. Similarly, itemset {c, g} appears in T2 and T4, so sup({c, g}) = 2/10.
Suppose the minimum support threshold is minSup = 0.25. Because sup({b, c}) ≥ minSup, itemset {b, c} is a frequent pattern; because sup({c, g}) < minSup, itemset {c, g} is not a frequent pattern.
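Definition 1 and the minSup check translate directly into code. The sketch assumes transactions are sets of item labels and uses a small illustrative database rather than Table 1, whose contents are not reproduced here. The function names are hypothetical. MOEA-PM itself sets no minSup; the threshold appears only to mirror the example.

```python
from typing import Sequence, Set


def support(itemset: Set[str], db: Sequence[Set[str]]) -> float:
    """Relative support: fraction of transactions containing every item of the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def is_frequent(itemset: Set[str], db: Sequence[Set[str]], min_sup: float) -> bool:
    """Threshold test used only in the definition's example."""
    return support(itemset, db) >= min_sup


if __name__ == "__main__":
    toy_db = [{"b", "c"}, {"a", "c"}, {"b", "c", "d"}, {"a"}]  # illustrative, not Table 1
    print(support({"b", "c"}, toy_db))  # 0.5
    print(is_frequent({"a", "d"}, toy_db, 0.25))  # False
```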
Definition 2. For any two patterns m and n, if there is no itemset n such that m ⊂ n, then m is regarded as a maximal pattern. In Table 1, itemset {b, c, f} is clearly not a maximal pattern, because {b, c, f} ⊂ {a, b, c, f}. Itemset {a, b} is not a maximal pattern either, because {a, b} ⊂ {a, b, c, f}. Itemset {c, d, g} is a maximal pattern, because no other itemset in the example database contains {c, d, g}.
Definition 3. Occupancy measures the completeness of a pattern and is defined as:
$$\operatorname{occu}(X) = \frac{1}{|T_X|} \sum_{T_q \in T_X} \frac{|X|}{|T_q|}$$
where $T_X$ is the set of records that contain X.
For example, pattern {b, c} is contained in three records, T1, T7 and T10, so its occupancy is occu({b, c}) = (2/3 + 2/2 + 2/4)/3 ≈ 0.72. If the minimum occupancy threshold is minOccu = 0.6, the pattern is called a dominant pattern.
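Definition 3 can be sketched as an average of |X| / |T| over the records that contain X. The toy records below are hypothetical. Only their lengths (3, 2 and 4 around {b, c}) are chosen to mirror the worked arithmetic, and they are not Table 1. Returning 0 for an itemset that never occurs is an assumption, because the formula is undefined in that case.

```python
from typing import Sequence, Set


def occupancy(itemset: Set[str], db: Sequence[Set[str]]) -> float:
    """Average of |X| / |T| over the transactions T that contain the itemset X."""
    covering = [t for t in db if itemset <= t]
    if not covering:  # assumption: an itemset that never occurs gets occupancy 0
        return 0.0
    return sum(len(itemset) / len(t) for t in covering) / len(covering)


if __name__ == "__main__":
    # Toy records with lengths 3, 2 and 4 around {b, c}, mirroring the arithmetic above.
    toy_db = [{"a", "b", "c"}, {"b", "c"}, {"b", "c", "d", "e"}, {"a"}]
    print(round(occupancy({"b", "c"}, toy_db), 2))  # 0.72
```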
Definition 4. The (relative) utility of itemset X is defined as:
$$\operatorname{uti}(X) = \frac{\sum_{T_q \in D,\; X \subseteq T_q} \sum_{i_j \in X} q(i_j, T_q)\, p(i_j)}{\sum_{T_q \in D} TU(T_q)}$$
For example, the utility of pattern {c, f} is
uti({c, f}) = ((1 × 2 + 4 × 8) + (5 × 2 + 2 × 8) + (1 × 2 + 2 × 8)) / (37 + 31 + 27 + 58 + 24 + 39 + 12 + 22 + 28 + 39) = 78/317 ≈ 0.25. If the minimum utility threshold minUti is less than this value, itemset {c, f} is a high-utility itemset, also called a high-utility pattern.
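Definition 4 can be sketched with each record stored as a mapping from item to purchased quantity q(i_j, T_q) and a separate unit-profit table p(i_j). The toy database and profits are hypothetical and are not Tables 1 and 2. The profits of c and f (2 and 8) simply match the factors in the worked example. The function names are hypothetical too.

```python
from typing import Dict, Sequence, Set

QTransaction = Dict[str, int]  # item -> quantity q(i_j, T_q)


def transaction_utility(t: QTransaction, profit: Dict[str, float]) -> float:
    """TU(T_q): sum of quantity x unit profit over every item in the record."""
    return sum(q * profit[i] for i, q in t.items())


def utility(itemset: Set[str], db: Sequence[QTransaction], profit: Dict[str, float]) -> float:
    """Relative utility: utility of X in the records containing X, over the total TU of D."""
    total = sum(transaction_utility(t, profit) for t in db)
    gained = sum(sum(t[i] * profit[i] for i in itemset)
                 for t in db if all(i in t for i in itemset))
    return gained / total


if __name__ == "__main__":
    profit = {"a": 1, "c": 2, "f": 8}  # toy unit profits, not Table 2
    toy_db = [{"c": 1, "f": 4}, {"a": 3}, {"c": 5, "f": 2}]
    print(round(utility({"c", "f"}, toy_db, profit), 3))  # (34 + 26) / 63 -> 0.952
```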
The experiments were run on a desktop computer with 64-bit Windows 10, an Intel Core i3-4170 CPU @ 3.70 GHz and 8 GB of memory, and the algorithms were implemented in Matlab. Performance was assessed on four public real-world datasets, Chess, Mushroom, Accident and Connect, all of which can be downloaded from the SPMF data mining library. Because some of the datasets are large, and to keep the description simple, the first 10% of Accident and the first 50% of Connect were used. Fig. 9 shows the final non-dominated solution sets obtained by the MOEA-PM algorithm on the four public real-world datasets.
Tables 4 and 5 describe the parameters and characteristics of the datasets used.
Table 4. Parameters of the datasets
Table 5. Characteristics of the datasets
The performance of the proposed MOEA-PM is assessed against several state-of-the-art algorithms and variants of MOEA-PM:
1) MOEA-PM⁻: to show the effectiveness of the proposed improved genetic operators, the application compares a variant of MOEA-PM that keeps only MOEA-PM's population initialization strategy and replaces the improved genetic operators with ordinary ones. This variant is labelled MOEA-PM⁻.
2) MOPM: the MOPM algorithm defines two kinds of pattern, transaction patterns (transaction-pattern) and meta-patterns (meta-pattern), and uses them to generate the initial population. Transaction patterns are usually considered to have higher occupancy and lower support, whereas meta-patterns usually have higher support and lower occupancy. Pattern mining with this algorithm is therefore expected to yield more diverse solutions.
3) MOEA-FHUI (NSGA-II): the effectiveness and mining efficiency of MOEA-PM are also compared with those of the recent MOEA-HUIM algorithm, which initializes its population with meta-itemsets (meta-itemset) and transaction itemsets (transaction-itemset). Unlike MOPM, it initializes the population randomly, using the support values of the meta-itemsets and the utility values of the transaction itemsets as selection probabilities. For fairness, we base all algorithms on NSGA-II, so this algorithm is labelled MOEA-FHUI (NSGA-II) here.
4) MOEA-PM (Random): to show the effectiveness of the population initialization strategy proposed in MOEA-PM, we compare the variant MOEA-PM (Random), which uses random population initialization and is otherwise identical to MOEA-PM.
5) MOEA-PM (Meta.) and MOEA-PM (Tran.): these variants of MOEA-PM are used in the subsequent comparative experiments to illustrate the effectiveness of the proposed population initialization strategy. In MOEA-PM (Meta.) the initial population consists entirely of randomly chosen meta-patterns, and in MOEA-PM (Tran.) it consists entirely of randomly chosen transaction-patterns. All other components are the same as in MOEA-PM.
For fairness, all of the above algorithms use binary encoding and binary tournament selection. Apart from MOEA-PM and its variants, all algorithms use uniform crossover and basic bit-flip mutation. For mutation, if the number of distinct items in the dataset is |I|, the mutation probability is P_m = 1/|I|.
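Binary tournament selection and the mutation rate can be sketched as follows. The text only names binary tournament selection. This sketch assumes the standard NSGA-II crowded-comparison rule as the tournament criterion: lower rank wins, and on equal rank the larger crowding distance wins. The function names are hypothetical.

```python
import random
from typing import Sequence


def crowded_better(i: int, j: int, rank: Sequence[int], crowd: Sequence[float]) -> int:
    """NSGA-II crowded comparison: lower rank wins; on a tie, larger crowding distance wins."""
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if crowd[i] >= crowd[j] else j


def binary_tournament(rank: Sequence[int], crowd: Sequence[float], rng: random.Random) -> int:
    """Draw two distinct individuals at random and return the index of the better one."""
    i, j = rng.sample(range(len(rank)), 2)
    return crowded_better(i, j, rank, crowd)


def mutation_probability(num_distinct_items: int) -> float:
    """P_m = 1 / |I|, where |I| is the number of distinct items in the dataset."""
    return 1.0 / num_distinct_items
```

With 7 distinct items, as in the example database, `mutation_probability(7)` gives a per-bit flip rate of 1/7.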
To evaluate the quality of the final patterns mined by MOEA-PM, the application uses hypervolume and coverage as performance indicators.
Hypervolume (HV) is one of the standard evaluation indicators in the EMO field. To some extent it reflects both the convergence and the diversity of a solution set, and it is computed as:
$$HV(A) = \lambda\left(\bigcup_{p_i \in A} vol_i\right)$$
where λ denotes the Lebesgue measure, A is the non-dominated solution set, and $vol_i$ is the hypervolume enclosed by the reference point and the non-dominated individual $p_i$. A larger HV value indicates a better solution set.
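For three maximized objectives and small solution sets, the HV formula can be evaluated exactly by compressing coordinates onto a grid. A grid cell belongs to the union of boxes [reference point, p_i] when some point weakly dominates the cell's upper corner. The text does not state which reference point is used. Because all three objectives are ratios in [0, 1], the origin is one natural choice, but that is an assumption. This grid method is a simple exact method chosen for clarity, not necessarily the one used in the experiments, and its cost grows like O(n^d).

```python
from itertools import product
from typing import Sequence


def hypervolume(points: Sequence[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact hypervolume for maximisation: Lebesgue measure of the union of boxes [ref, p].

    Coordinates are compressed onto a grid; a cell lies inside the union when its
    upper corner is weakly dominated by some point. Suitable only for small sets.
    """
    pts = [p for p in points if all(x > r for x, r in zip(p, ref))]
    if not pts:
        return 0.0
    axes = [sorted({ref[d]} | {p[d] for p in pts}) for d in range(len(ref))]
    total = 0.0
    for cell in product(*(range(len(axis) - 1) for axis in axes)):
        upper = [axes[d][c + 1] for d, c in enumerate(cell)]
        if any(all(u <= p[d] for d, u in enumerate(upper)) for p in pts):
            volume = 1.0
            for d, c in enumerate(cell):
                volume *= axes[d][c + 1] - axes[d][c]
            total += volume
    return total


if __name__ == "__main__":
    print(hypervolume([(1, 2), (2, 1)], (0, 0)))  # 3.0: two 2x1 boxes overlapping in 1x1
```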
Coverage (COV) is an evaluation indicator commonly used in recommender systems. It is the proportion of the whole item set that is covered by the items the algorithm recommends to users, and it is computed as:
$$COV = \frac{N_d}{N}$$
where $N_d$ is the number of distinct items in the recommendation list and N is the total number of items. If the solution set an algorithm obtains has relatively low coverage, its solutions span a limited range. This lowers user satisfaction, because low coverage means users have fewer goods to choose from. As with HV, a larger COV value means that the patterns the algorithm recommends are better.
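The COV formula is a single set union. This sketch assumes that the "recommendation list" is the set of patterns in the final non-dominated solution set and that N is the number of distinct items |I| in the dataset. The function name is hypothetical.

```python
from typing import Iterable, Sequence, Set


def coverage(patterns: Iterable[Set[str]], all_items: Sequence[str]) -> float:
    """COV = N_d / N: distinct items appearing in the recommended patterns over all items."""
    distinct = set().union(*patterns) & set(all_items)
    return len(distinct) / len(all_items)


if __name__ == "__main__":
    print(coverage([{"a", "b"}, {"b", "c"}], "abcdefg"))  # 3 of 7 items are covered
```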
The population size of all algorithms is set to 100 and the number of evaluations to 5000. This shows the quality of the non-dominated solution sets each algorithm obtains within a small number of generations.
The Pareto optimal solution sets obtained by each algorithm on the four real datasets are shown in Figs. 3-6.
As Figs. 3-6 show, MOEA-PM outperforms the other algorithms on all four real datasets, both in the number of solutions and in convergence and diversity.
MOEA-PM (Random) performs worst. Because its initial population is completely random, most of its individuals are invalid, which weakens the algorithm's ability to evolve, so it does not converge within the limited number of generations. MOPM performs better than MOEA-PM (Random), because its population, initialized with meta-patterns and transaction-patterns, consists entirely of valid individuals and combines the advantages of both kinds of pattern. MOEA-FHUI (NSGA-II) performs similarly to MOPM and even outperforms it on some datasets, which suggests that its population initialization method is somewhat better than that of MOPM. The initial population of MOEA-PM (Meta.) usually has high support values but is poorly distributed, so the solutions it explores within the limited number of generations are concentrated in the high-support part of the solution space. Similarly, the solutions explored by MOEA-PM (Tran.) within the limited number of generations are concentrated in the high-occupancy part of the solution space.
And the MOEA-PM algorithm that the application proposes uses special initialization of population method and improved cross and variation operator Solves the above problem, on the one hand, ensure that algorithm is at preferably state before evolution starts, on the other hand, avoid The random combine of item collection in evolutionary process improves efficiency, therefore puts up the best performance in such as Fig. 3-Fig. 6.
In order to evaluate the quality for the final mode that each algorithm excavates, the Population Size of above-mentioned all algorithms is adjusted to 150, maximum evaluation number is adjusted to 45000, and algorithm evaluates HV the and COV value under numbers in the different functions of 4 data sets, As shown in Figs. 7-8.A~d can be seen that compared to other algorithms, the HV curve convergence speed of MOEA-PM algorithm from Fig. 7 Most fast, this shows that algorithm can be with faster velocity interpolation convergence and multifarious balance.MOEA-HUIM's (NSGA-II) HV curve convergence speed is slightly better than MOPM, and fluctuating range wants small compared to MOPM, that is because of MOEA-HUIM (NSGA- II initial population) is to be randomly selected according to the support and value of utility of the both of which proposed, therefore restrain speed It spends relatively fast.MOEA-PM-Performance and MOEA-HUIM (NSGA-II) behave like, be substantially better than the latter, explanation It is proposed that initialization of population strategy be effective.But MOEA-PM-The convergence rate of HV curve to be but not so good as MOEA-PM fast, Illustrate it is proposed that improved genetic operator be affected to Algorithm Convergence and distributivity, this also indirect proof we The validity of improved genetic operator.
Fig. 8(a)-(d) shows that MOEA-PM drives the coverage curve to converge faster than the other algorithms, and its curve fluctuates more gently. Figs. 7 and 8 together show that MOEA-PM (Random) still fails to converge even when the number of evolutionary generations is increased. This indicates that the trivial solutions produced by random population initialization seriously impair the algorithm's environmental selection and greatly weaken its evolvability, so the algorithm struggles to converge within a limited number of generations. MOEA-PM (Meta.) and MOEA-PM (Trans.) do guarantee a valid initial population. However, that population is unevenly distributed in the search space and lacks diversity, which limits the algorithm's exploration in the early stage of evolution and slows convergence. The experiments show that the MOEA-PM algorithm proposed here outperforms the above algorithms in both convergence speed and the quality of the final solutions.
Some steps in the embodiments of the present invention can be implemented in software, and the corresponding software program can be stored in a readable storage medium such as an optical disc or a hard disk.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A high-quality pattern mining method, characterized in that the method is based on the NSGA-II algorithm and improves it through the following steps:
representing the original database in bitmap form;
constructing an initial population by using a population initialization strategy based on an OR/NOR-tree structure together with the original database represented in bitmap form;
configuring the NOR positions and OR positions of the OR/NOR-tree structure by using an improved crossover operator and an improved mutation operator;
adjusting the search direction by using a worst-individual search-direction adjustment strategy based on the OR/NOR-tree structure;
performing high-quality pattern mining with the NSGA-II algorithm improved as above to obtain the solutions.
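Claim 1 represents the original database in bitmap form. This section does not give the bitmap layout, so the sketch below assumes a common one, a vertical bitmap: one bit string per item, with bit q set when transaction T_q contains that item. Under this layout, the support count of a pattern is the number of 1 bits in the bitwise AND of its items' bitmaps. Python integers act as arbitrary-length bit strings. The function names are hypothetical.

```python
from typing import Dict, List, Set


def build_bitmaps(db: List[Set[str]]) -> Dict[str, int]:
    """Vertical bitmap: bit q of bitmaps[item] is 1 iff transaction T_q contains item."""
    bitmaps: Dict[str, int] = {}
    for q, transaction in enumerate(db):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << q)
    return bitmaps


def cover(pattern: Set[str], bitmaps: Dict[str, int], n: int) -> int:
    """Bitmap of the transactions that contain every item of the pattern."""
    mask = (1 << n) - 1  # start with all n transactions
    for item in pattern:
        mask &= bitmaps.get(item, 0)  # an unknown item yields an empty cover
    return mask


def support_count(pattern: Set[str], bitmaps: Dict[str, int], n: int) -> int:
    """Number of transactions containing the pattern (popcount of its cover)."""
    return bin(cover(pattern, bitmaps, n)).count("1")
```

With `db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]`, item `b` gets bitmap `0b101` and `c` gets `0b110`, so the cover of `{b, c}` is `0b100`: only the third transaction contains both items.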
2. The method according to claim 1, wherein, before the initial population is constructed by using the population initialization strategy based on the OR/NOR-tree structure together with the original database represented in bitmap form, the method comprises:
scanning the original database to find all maximal patterns and all distinct items, and then constructing the OR/NOR-tree structure from the maximal patterns.
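The sketch below shows one possible reading of claim 2. It is an interpretation, not the patent's stated definition. A maximal pattern is taken to be a distinct transaction that is not strictly contained in another transaction. Each tree branch marks the items of one maximal pattern as OR positions and all remaining items as NOR positions. Under this reading, any individual whose 1 bits fall only on the OR positions of its branch encodes a subset of a real transaction, so its support is at least 1/|D| and it is never an empty-support pattern. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def maximal_patterns(db: List[Set[str]]) -> List[frozenset]:
    """Distinct non-empty transactions that are not a proper subset of another one."""
    distinct = {frozenset(t) for t in db if t}
    return [m for m in distinct if not any(m < other for other in distinct)]


def build_or_nor_branches(db: List[Set[str]]) -> Tuple[List[str], List[Tuple[List[int], List[int]]]]:
    """Return the item order (bit positions) and one (OR, NOR) position pair per branch."""
    items = sorted(set().union(*db))  # all distinct items, in a fixed bit order
    branches = []
    for m in sorted(maximal_patterns(db), key=lambda s: sorted(s)):
        or_pos = [j for j, item in enumerate(items) if item in m]
        nor_pos = [j for j, item in enumerate(items) if item not in m]
        branches.append((or_pos, nor_pos))
    return items, branches
```

For `[{"a", "b"}, {"a"}, {"b", "c"}]`, the transaction `{a}` is absorbed by `{a, b}`, which leaves two branches: OR positions `[0, 1]` with NOR `[2]`, and OR positions `[1, 2]` with NOR `[0]`.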
3. The method according to claim 1, wherein the constructing of the initial population by using the population initialization strategy based on the OR/NOR-tree structure together with the original database represented in bitmap form comprises:
assigning a different tree branch to each initial individual, and then distributing the individuals among the following three states:
state one: one of the OR positions corresponding to the individual is initialized to 1, and all other positions are initialized to 0;
state two: all OR positions corresponding to the individual are initialized to 1, and all NOR positions are initialized to 0;
state three: each OR position corresponding to the individual is randomly initialized to 0 or 1, and all NOR positions are initialized to 0.
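The three initialization states of claim 3 can be sketched as follows. A branch is represented as a pair (OR positions, NOR positions). The claim does not say how branches and states are distributed over the population. Round-robin branch assignment and a uniform random choice of state are hypothetical choices made only for this illustration.

```python
import random
from typing import List, Tuple

Branch = Tuple[List[int], List[int]]  # (OR positions, NOR positions) of one tree branch


def init_individual(branch: Branch, n_items: int, state: int, rng: random.Random) -> List[int]:
    """Binary chromosome initialized in one of the three states of claim 3."""
    or_pos, _nor_pos = branch
    x = [0] * n_items  # all NOR positions (and every other bit) start at 0
    if state == 1:  # state one: exactly one OR position set to 1
        x[rng.choice(or_pos)] = 1
    elif state == 2:  # state two: every OR position set to 1
        for j in or_pos:
            x[j] = 1
    else:  # state three: each OR position set to 0 or 1 at random
        for j in or_pos:
            x[j] = rng.randint(0, 1)
    return x


def init_population(branches: List[Branch], n_items: int, pop_size: int, seed: int = 0):
    """Return (branch index, chromosome) pairs for the initial population."""
    rng = random.Random(seed)
    population = []
    for k in range(pop_size):
        b = k % len(branches)  # hypothetical: branches handed out round-robin
        state = rng.choice((1, 2, 3))  # hypothetical: state drawn uniformly
        population.append((b, init_individual(branches[b], n_items, state, rng)))
    return population
```

In every state the 1 bits stay inside the branch's OR positions. State two encodes the full maximal pattern, state one encodes a single item, and state three falls between the two. As written, state three can occasionally produce an all-zero (empty) pattern.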
4. The method according to claim 1, wherein the configuring of the NOR positions and OR positions of the OR/NOR-tree structure by using the improved crossover operator and the improved mutation operator comprises:
generating a new individual by uniform crossover, and setting the NOR positions corresponding to the new individual to 0;
for the mutation operator, applying basic bit-flip mutation only at the OR positions corresponding to the individual.
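The two operators of claim 4 can be sketched as follows. The claim does not state which parent's NOR positions are used when the two parents belong to different branches. The sketch assumes that the child inherits the branch of the first parent. The mutation probability `pm` is a hypothetical parameter.

```python
import random
from typing import List


def uniform_crossover(p1: List[int], p2: List[int], nor_pos: List[int], rng: random.Random) -> List[int]:
    """Uniform crossover; the child's NOR positions are then forced to 0 (claim 4)."""
    child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
    for j in nor_pos:  # assumption: the child keeps the branch (and NOR set) of p1
        child[j] = 0
    return child


def bit_mutation(x: List[int], or_pos: List[int], pm: float, rng: random.Random) -> List[int]:
    """Basic bit-flip mutation restricted to the individual's OR positions."""
    y = list(x)
    for j in or_pos:
        if rng.random() < pm:  # pm: hypothetical per-bit mutation probability
            y[j] = 1 - y[j]
    return y
```

Zeroing the NOR positions after crossover, and never mutating them, keeps every offspring inside its branch's maximal pattern. This is how the operators avoid the random itemset combinations described in the experimental discussion.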
5. The method according to claim 1, wherein the adjusting of the search direction by using the worst-individual search-direction adjustment strategy comprises:
when the total number of OR/NOR-tree branches is greater than the population size, replacing the search direction of the worst individual in the current population in each iteration.
6. The method according to claim 5, wherein the replacing of the search direction of the worst individual in the population comprises: selecting the worst individual of the current generation according to non-dominated sorting and crowding distance, and then assigning a new OR/NOR-tree branch to that individual.
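Claims 5-6 can be sketched with the standard NSGA-II non-dominated sorting and crowding distance, adapted to maximization. The sketch treats the "worst" individual as the one with the highest front rank, breaking ties by the smallest crowding distance. It also assumes, hypothetically, that the new branch is drawn at random from branches that no individual currently uses. Such a branch exists whenever there are more branches than individuals, which is the condition in claim 5.

```python
import random
from typing import List, Sequence

Objs = Sequence[Sequence[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for maximization (all objectives of F(X) are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_ranks(objs: Objs) -> List[int]:
    """Fast non-dominated sorting; rank 0 is the first (best) front."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # solutions that p dominates
    count = [0] * n  # number of solutions that dominate p
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
    ranks = [0] * n
    front = [p for p in range(n) if count[p] == 0]
    r = 0
    while front:
        nxt = []
        for p in front:
            ranks[p] = r
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        front, r = nxt, r + 1
    return ranks


def crowding_distances(objs: Objs, ranks: List[int]) -> List[float]:
    """NSGA-II crowding distance, computed separately within each front."""
    dist = [0.0] * len(objs)
    for r in set(ranks):
        members = [i for i in range(len(objs)) if ranks[i] == r]
        for k in range(len(objs[0])):
            members.sort(key=lambda i: objs[i][k])
            lo, hi = objs[members[0]][k], objs[members[-1]][k]
            dist[members[0]] = dist[members[-1]] = float("inf")  # keep boundary points
            if hi == lo:
                continue
            for a in range(1, len(members) - 1):
                dist[members[a]] += (objs[members[a + 1]][k] - objs[members[a - 1]][k]) / (hi - lo)
    return dist


def redirect_worst(objs: Objs, assigned: List[int], n_branches: int, rng: random.Random):
    """Pick the worst individual (highest rank, then smallest crowding distance)
    and give it a branch that no individual currently uses (hypothetical rule)."""
    ranks = nondominated_ranks(objs)
    dist = crowding_distances(objs, ranks)
    worst = max(range(len(objs)), key=lambda i: (ranks[i], -dist[i]))
    unused = sorted(set(range(n_branches)) - set(assigned))  # non-empty if n_branches > pop size
    return worst, rng.choice(unused)
```

For objective vectors `(0.9, 0.5, 0.4)`, `(0.1, 0.1, 0.1)` and `(0.5, 0.9, 0.3)`, the second vector is dominated by both of the others. It forms the last front, so it is the individual whose search direction gets replaced.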
7. The method according to claim 1, wherein the improved NSGA-II algorithm uses binary encoding, and the selection operation uses binary tournament selection.
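Claim 7 specifies binary encoding and binary tournament selection. In the sketch below, bit j of a chromosome indicates whether item j belongs to the pattern. The tournament uses the usual NSGA-II crowded comparison: the lower front rank wins, and ties go to the larger crowding distance. The claim only says "binary tournament", so this comparison rule is an assumption. The names `decode`, `binary_tournament` and `mating_pool` are hypothetical.

```python
import random
from typing import List


def decode(x: List[int], items: List[str]) -> frozenset:
    """Binary encoding: bit j = 1 means item j belongs to the pattern."""
    return frozenset(item for bit, item in zip(x, items) if bit)


def binary_tournament(ranks: List[int], dist: List[float], rng: random.Random) -> int:
    """Draw two individuals at random; lower rank wins, ties go to larger crowding distance."""
    a, b = rng.randrange(len(ranks)), rng.randrange(len(ranks))
    if ranks[a] != ranks[b]:
        return a if ranks[a] < ranks[b] else b
    return a if dist[a] >= dist[b] else b


def mating_pool(ranks: List[int], dist: List[float], size: int, seed: int = 0) -> List[int]:
    """Indices of the parents selected by repeated binary tournaments."""
    rng = random.Random(seed)
    return [binary_tournament(ranks, dist, rng) for _ in range(size)]
```

With two individuals of ranks 0 and 1, the rank-1 individual is selected only when both draws pick it (probability 1/4), so better-ranked individuals dominate the mating pool.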
8. Use of the method according to any one of claims 1 to 7 in the field of data mining.
9. Use of the method according to any one of claims 1 to 7 for solving the following three-objective model:
$$\max F(X) = \left(f_1(X),\, f_2(X),\, f_3(X)\right)^{T}$$
where $f_1(X)$ denotes the relative support of pattern $X$, $f_2(X)$ denotes the occupancy of pattern $X$, and $f_3(X)$ denotes the relative utility of pattern $X$.
10. The method according to claim 9, wherein the relative support of pattern X measures how frequently the pattern occurs and is defined as follows:
where $D = \{T_1, T_2, \ldots, T_q, \ldots, T_n\}$ is the original data set, $T_q$ is a single record in $D$, and $|D|$ is the number of records $T_q$ in $D$;
the occupancy of pattern X measures the completeness of the pattern and is defined as follows:
where $T_X$ denotes the records that contain all items of pattern $X$;
the relative utility of pattern X is the ratio of the total utility of all items contained in X to the total utility of the original data set, and is defined as follows:
where $T_q \in D$ ($1 \le q \le n$) is the $q$-th record, $i_j$ denotes the $j$-th item, $q(i_j, T_q)$ ($1 \le j \le m$, $1 \le q \le n$) is the quantity of the $j$-th item in the $q$-th record, $p(i_j)$ is the weight of the $j$-th item, and $TU(T_q)$ denotes the utility of the $q$-th record.
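The formulas of claim 10 are missing from this text. The sketch below computes the three objectives under the standard definitions that match the surrounding descriptions. These definitions are assumptions, not a reproduction of the patent's formulas:
- relative support is $|T_X| / |D|$;
- occupancy is the average of $|X| / |T|$ over the records $T \in T_X$;
- relative utility is the utility of $X$ summed over $T_X$, divided by $\sum_q TU(T_q)$, where an item's utility in a record is $q(i_j, T_q) \cdot p(i_j)$.

Each record is modeled as a mapping from item to quantity. The function name `objectives` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Transaction = Dict[str, int]  # item -> quantity q(i_j, T_q)


def objectives(X: Set[str], db: List[Transaction], profits: Dict[str, float]) -> Tuple[float, float, float]:
    """(relative support, occupancy, relative utility) of pattern X, all to be maximized.

    Assumed standard definitions; profits[item] plays the role of the weight p(i_j).
    """
    T_X = [t for t in db if X.issubset(t)]  # records containing every item of X
    sup = len(T_X) / len(db)
    occ = sum(len(X) / len(t) for t in T_X) / len(T_X) if T_X else 0.0
    tu_total = sum(sum(q * profits[i] for i, q in t.items()) for t in db)  # sum of TU(T_q)
    u_X = sum(sum(t[i] * profits[i] for i in X) for t in T_X)  # utility of X over T_X
    return sup, occ, u_X / tu_total
```

For `db = [{"a": 1, "b": 2}, {"a": 3}, {"b": 1, "c": 1}]` with weights `a = 2`, `b = 1`, `c = 4`, the pattern `{a}` occurs in 2 of 3 records (support 2/3). Its occupancy is (1/2 + 1/1) / 2 = 0.75. Its utility is 2 + 6 = 8 out of a database total of 4 + 6 + 5 = 15.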
CN201910303505.8A 2019-04-16 2019-04-16 High quality mode method for digging based on multi-objective evolutionary algorithm Pending CN110069498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910303505.8A CN110069498A (en) 2019-04-16 2019-04-16 High quality mode method for digging based on multi-objective evolutionary algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910303505.8A CN110069498A (en) 2019-04-16 2019-04-16 High quality mode method for digging based on multi-objective evolutionary algorithm

Publications (1)

Publication Number Publication Date
CN110069498A true CN110069498A (en) 2019-07-30

Family

ID=67367842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910303505.8A Pending CN110069498A (en) 2019-04-16 2019-04-16 High quality mode method for digging based on multi-objective evolutionary algorithm

Country Status (1)

Country Link
CN (1) CN110069498A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955702A (en) * 2019-11-28 2020-04-03 Jiangnan University Pattern data mining method based on improved genetic algorithm
WO2021102775A1 (en) * 2019-11-28 2021-06-03 Jiangnan University Pattern data mining method based on improved genetic algorithm
CN117010991A (en) * 2023-07-31 2023-11-07 Jiangnan University High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120150860A1 (en) * 2010-12-10 2012-06-14 Yahoo!, Inc. Clustering with Similarity-Adjusted Entropy
CN103198359A (en) * 2013-04-03 2013-07-10 Nanjing University of Science and Technology Optimized and improved fuzzy regression model construction method based on nondominated sorting genetic algorithm II (NSGA- II)
CN105740467A (en) * 2016-03-07 2016-07-06 Northeastern University Mining method for C-Mn steel industry big data
CN106997553A (en) * 2017-04-12 2017-08-01 Anhui University A kind of method for digging of the grouping of commodities pattern based on multiple-objection optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
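Zhang Qiang: "Research on Pattern Mining Algorithms Based on Evolutionary Computation", China Master's Theses Full-text Database, Information Science and Technology *
Yang Lu: "Research on Data Mining Based on Frequent and High-Utility Itemsets", China Master's Theses Full-text Database, Information Science and Technology *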

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955702A (en) * 2019-11-28 2020-04-03 Jiangnan University Pattern data mining method based on improved genetic algorithm
WO2021102775A1 (en) * 2019-11-28 2021-06-03 Jiangnan University Pattern data mining method based on improved genetic algorithm
CN110955702B (en) * 2019-11-28 2024-03-29 Jiangnan University Improved genetic algorithm-based mode data mining method
CN117010991A (en) * 2023-07-31 2023-11-07 Jiangnan University High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm
CN117010991B (en) * 2023-07-31 2024-05-03 Jiangnan University High-profit commodity combination mining method based on GPU (graphic processing Unit) parallel improved genetic algorithm

Similar Documents

Publication Publication Date Title
Beukelaer et al. Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search
CN110069498A (en) High quality mode method for digging based on multi-objective evolutionary algorithm
Kauffman et al. Technological evolution and adaptive organizations
Mampaey et al. Summarizing data succinctly with the most informative itemsets
Divina et al. A multi-objective approach to discover biclusters in microarray data
Dehkordi et al. A Novel Method for Privacy Preserving in Association Rule Mining Based on Genetic Algorithms.
Lin et al. A GA-based approach to hide sensitive high utility itemsets
WO2020210974A1 (en) High-quality pattern mining model and method based on improved multi-objective evolutionary algorithm
Golshanara et al. A multi-colony ant algorithm for optimizing join queries in distributed database systems
Gnägi et al. A matheuristic for large-scale capacitated clustering
Liu et al. Versatile black-box optimization
Zhang et al. Improved genetic algorithm for high-utility itemset mining
Song et al. Mining high average-utility itemsets based on particle swarm optimization
García-Martínez et al. A simulated annealing method based on a specialised evolutionary algorithm
Lunardi et al. Comparative study of genetic and discrete firefly algorithm for combinatorial optimization
Coelho et al. Multi-objective design of hierarchical consensus functions for clustering ensembles via genetic programming
WO2000038112A2 (en) Code compaction by evolutionary algorithm
Chen et al. Improved local search for the minimum weight dominating set problem in massive graphs by using a deep optimization mechanism
Michelakos et al. A hybrid classification algorithm evaluated on medical data
Banka et al. Evolutionary biclustering of gene expressions
CN109977165A (en) A kind of three target pattern mining models
Wu Data association rules mining method based on improved apriori algorithm
Talebian et al. Using genetic algorithm to select materialized views subject to dual constraints
CN109241134A (en) A kind of grouping of commodities mode multiple target method for digging based on agent model
CN108830370A (en) Based on the feature selection approach for enhancing learning-oriented flora foraging algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190730