CN111950687A

CN111950687A - Method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm

Info

Publication number: CN111950687A
Application number: CN202010878636.1A
Authority: CN
Inventors: 危前进; 王承先; 常亮; 黄桂敏
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-17

Abstract

The invention discloses a method for solving minimum attribute reduction by combining local opponent learning and a social spider algorithm. A similarity constraint is applied at the initial stage of iteration to keep the individuals in the population in a good state; opponent learning is introduced into the iteration process, and a local opponent learning strategy is designed to expand the search range and accelerate convergence; furthermore, a redundancy detection mechanism is applied to the global optimal solution so that the reduction is minimal as far as possible. The method can find an effective minimum reduction in most cases, with shorter running time and faster convergence, and may also perform better as the data set grows.

Description

Method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm

Technical Field

The invention relates to the field of data mining and knowledge discovery, in particular to a method for solving minimum attribute reduction by combining local opponent learning and a social spider algorithm.

Background

As society develops, the volume of data grows exponentially, and the data contain a large number of noisy, irrelevant, or misleading features. Rough set theory (RST) is a mathematical tool for dealing with uncertain, imprecise, and ambiguous data. Rough sets are widely used in fields such as machine learning, feature selection, data mining, image processing, and pattern recognition. RST can remove redundant attributes by discovering dependencies in the data without any prior information. Given a data set with discretized values, RST can find a subset of the original attribute set, called a reduction. Many scholars have conducted extensive research on attribute reduction, and minimum attribute reduction is a focus of that research.

The goal of minimum attribute reduction is to find a reduction of minimum cardinality. The most basic solution is to generate all possible reductions and select those with the smallest cardinality. In practice, however, generating all reductions is impractical, and the minimum attribute reduction problem is NP-hard. Therefore, heuristic approaches need to be considered.

The greedy method is a typical heuristic method, and the traditional approaches to finding a minimum attribute reduction through rough set theory are based on it. In rough set theory, attribute importance, information entropy, mutual information and the like are generally adopted as heuristic information, and the search proceeds by forward addition or backward deletion. Forward addition starts from the empty set or the core attributes and repeatedly adds the most important attribute to the current reduction set until a stopping condition is met. Backward deletion starts from the complete attribute set and deletes attributes one by one until the condition is met. Pawlak proposed a positive-region-based attribute reduction method (Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11(5), 341-356.). Miao proposed an entropy-based attribute reduction method (MIAO Duo-qian, & WANG Jue (1999). Information representation of concepts and operations in rough set theory. Journal of Software, 10(2), 113-.). However, attribute reduction methods based on the greedy method do not find the minimum attribute reduction well: using attribute importance, information entropy, mutual information, etc. as heuristic information may lead the search along non-minimal paths.

Some scholars have sought the minimum attribute reduction with swarm intelligence algorithms, which are inspired by nature and optimize iteratively by simulating the behavior of living beings. They generally include ant colony optimization (ACO), particle swarm optimization (PSO), glowworm swarm optimization (GSO), the fish swarm algorithm (FSA), artificial bee colony optimization (ABC), the bat algorithm (BAT), and so on. Owing to their strong performance, swarm intelligence algorithms are widely applied in fields such as path planning and fault diagnosis.

Cuevas et al. proposed a novel intelligent optimization algorithm, the social spider algorithm (SSO), in 2013 (Cuevas, E., Cienfuegos, M., Zaldivar, D., & Perez-Cisneros, M. (2013). A swarm optimization algorithm inspired in the behavior of the social-spider. Expert Systems with Applications, 40(16), 6374-.). Depending on its sex, each individual in the population is updated by a different operator that simulates the corresponding behavior, and through continuous movement the spiders eventually tend toward the global optimum. In recent years many scholars have improved the algorithm and applied it to different scenarios. For example, to address the SSO algorithm's tendency to fall into local optima and its poor convergence speed, Zhou, Y. et al. proposed an improved social spider optimization algorithm for cluster analysis that introduces a simplex method to enhance global and local search, thereby avoiding local optima and improving convergence speed (Zhou, Y., Luo, Q., & Abdel-Basset, M. (2017). A simplex method-based social spider optimization algorithm for clustering analysis. Engineering Applications of Artificial Intelligence, 64(Sep.), 67-82.); comparative experiments with other schemes showed that the algorithm performs better. Mohamed et al. solved the minimum attribute reduction problem by combining rough set theory with the social spider algorithm (Mohamed, A. E. A., & Hassanien, A. E. (2017). An improved social spider optimization algorithm based on rough sets for solving minimum number attribute reduction problem.).
Although the social spider algorithm performs well in many areas, it shares the drawbacks of other swarm intelligence algorithms: an individual, although it finds the best value in its search area, explores in only one direction and over a small area. In addition, convergence is slow, and in most cases the unconstrained algorithm finds a suboptimal solution rather than the global optimum. Different strategies are therefore needed to address these problems.

Oppositional learning has been applied to various intelligent algorithms to improve performance (Wangming, & Tangminzhu. (2017). hybrid Greensis optimization algorithm fusing oppositional learning, computer science and exploration, 011(004), 673-.). In the iterative process, not only the solution in the original search space but also the corresponding opposite solution is considered, which often benefits optimization. However, introducing an opponent learning mechanism into an intelligent algorithm is generally accompanied by a great deal of redundant computation: as the number of individuals and the number of iterations increase, the computation generated by opponent learning seriously affects the running time.

Disclosure of Invention

The invention aims to provide a method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm, which has high convergence speed and short running time.

In order to solve the technical problems, the invention adopts the following technical scheme:

a method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm comprises the following steps:

1) initializing the social spider algorithm parameters and calculating the number of female spiders N_f, the number of male spiders N_m, the female spider positions x_f, the male spider positions x_m, and the fitness value F of each spider; comparing the fitness values of all spiders, taking the spider with the largest fitness value as the global optimal spider, and recording the fitness value G_fit of the global optimal spider and its position G_location; wherein,

initializing social spider algorithm parameters includes:

the total number of spiders N, which is the sum of the number of female spiders N_f and the number of male spiders N_m;

the threshold PF for a spider to approach or move away from a vibration source, with a value range of (0, 1);

the lowest ratio P_Low and the highest ratio P_High of female spiders to the total number of spiders, where P_Low = 0.65 and P_High = 0.9;

a similarity constraint threshold Si with a value range of [0, 1];

the maximum number of iterations Max_Gen and the number of consecutive iterations cycle yielding the same reduction result, both of which are positive integers;

a weight factor λ expressing the bias between the terms of the fitness function, with a value range of [0, 1];

2) executing a similarity constraint strategy on the population, and executing the step 3) after the constraint condition is met;

the similarity constraint includes: calculating the similarity values between the global optimal spider G_location and the remaining spiders x₁, x₂, ..., x_N-1, and the average similarity value between G_location and the rest of the population; comparing the average similarity value with the similarity constraint threshold; when the average similarity value is greater than the similarity constraint threshold, executing step 3); otherwise, for each individual whose similarity value is smaller than the average similarity value, reassigning every bit a random number in [-4, 4], converting the value of each bit of the individual into a binary number, and recalculating the average similarity value with the global optimal spider G_location, until the calculated average similarity value is greater than the similarity constraint threshold;

3) calculating the fitness values of the individuals in the current population and finding the spider with the highest fitness value in the current population; if that spider has a higher fitness value than the previously recorded global optimal spider, replacing the previously recorded fitness value G_fit and position G_location of the global optimal spider with the fitness value and position of that spider; otherwise, not replacing;

4) female spiders move;

5) the male spider moves;

6) the spiders mate to produce offspring; specifically, each dominant male spider mates with the female spiders within a radius r centered on itself, and the next generation is generated by roulette-wheel selection;

7) performing worst individual replacement, namely comparing each individual in the offspring population in turn with the individual having the minimum fitness value in the parent population; if the fitness value of the offspring individual is higher than the minimum fitness value in the parent population, replacing the parent individual having the minimum fitness value with the offspring individual, the replacement covering all attributes including position, number, gender and fitness value; otherwise, discarding the offspring individual and continuing with the next offspring individual;

8) executing a local opponent learning strategy, which comprises: calculating the average fitness value of the current population, screening out the individuals whose fitness is lower than the average fitness value, creating their opponent individuals and calculating the fitness values of those opponents, selecting the individuals with the higher fitness values, and combining them with the individuals of the original population whose fitness is higher than the average fitness value to form the new current population;

9) calculating the fitness values of the individuals in the current population and finding the spider with the highest fitness value in the current population; if that spider has a higher fitness value than the previously recorded global optimal spider, replacing the previously recorded fitness value G_fit and position G_location of the global optimal spider with the fitness value and position of that spider; otherwise, not replacing;

10) assigning G_fit and G_location recorded in step 9) to G_fit' and G_location', and judging whether G_location' is an effective reduction result; if yes, executing step 11), otherwise executing step 12);

11) performing redundancy detection on G_location' to remove redundant attributes;

12) finishing one round of iteration and judging whether the maximum number of iterations has been reached or the same reduction result has been obtained for cycle consecutive iterations; if so, ending the iteration and outputting G_location'; otherwise, adding 1 to the iteration count and returning to step 3).

In step 1) of the above technical scheme, the number of female spiders N_f, the number of male spiders N_m, the female spider positions x_f, the male spider positions x_m, and the fitness value F of each spider are calculated as follows:

calculating the number of female spiders N_f according to formula (1):

N_f = floor[(P_High - rand₁ * (P_High - P_Low)) * N] (1)

Where floor denotes rounding down, e.g. any value between 5.1 and 5.9 is rounded down to 5; rand₁ is a random number in [0, 1], generated by a function when used. The number of male spiders N_m is calculated as N - N_f. The invention uses a binary string of length n to represent each spider. Each bit of the string corresponds to the selection of one condition attribute in the decision table: when a bit takes the value "1", the corresponding attribute is selected, and when a bit takes the value "0", the corresponding attribute is not selected.
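The population split of formula (1) and the bit-string encoding can be sketched as follows. This is a minimal illustration only; the function names `split_population` and `selected_attributes` are hypothetical and do not come from the patent.

```python
import math
import random

P_LOW, P_HIGH = 0.65, 0.9  # female-ratio bounds stated in the text


def split_population(n_total, rng=random):
    """Return (N_f, N_m) following formula (1)."""
    rand1 = rng.random()  # random number in [0, 1)
    n_f = math.floor((P_HIGH - rand1 * (P_HIGH - P_LOW)) * n_total)
    return n_f, n_total - n_f  # N_m = N - N_f


def selected_attributes(bits, attribute_names):
    """List the condition attributes whose bit is 1 in a spider's binary string."""
    return [name for bit, name in zip(bits, attribute_names) if bit == 1]
```

With N = 50, for instance, N_f always falls between 32 and 45, so females make up roughly 65% to 90% of the population.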

The position of the spider is converted into a binary string according to equations (2) and (3):

wherein i is the index of the spider; g is the iteration number; l is the index of the dimension; the l-th dimension component of spider i at iteration g is set to a random number in [-4, 4] at the initial stage; ε ∈ [0, 1] is a random number, generated by a function when used; after the female spider positions x_f and the male spider positions x_m are obtained, the value of each bit of each spider is calculated according to formulas (2) and (3);
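As a hedged illustration only: one common way to turn a real-valued component into a bit in binary swarm optimizers is to pass it through a sigmoid and compare the result with the random number ε. The sigmoid transfer function below is an assumption of this sketch and is not necessarily the form of formulas (2) and (3); the names `random_position` and `to_binary` are hypothetical.

```python
import math
import random


def random_position(n, rng=random):
    """Initial real-valued position: n components drawn from [-4, 4]."""
    return [rng.uniform(-4.0, 4.0) for _ in range(n)]


def to_binary(position, rng=random):
    """Map each real component to a bit (assumed sigmoid transfer function)."""
    bits = []
    for x in position:
        s = 1.0 / (1.0 + math.exp(-x))  # squashes x into (0, 1)
        eps = rng.random()              # epsilon, random number in [0, 1)
        bits.append(1 if eps < s else 0)
    return bits
```

Under this assumption a component near 4 almost always yields a selected attribute and a component near -4 almost always yields an unselected one, which is consistent with the [-4, 4] initialization range.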

calculating a fitness value F for each spider according to equation (4):

wherein λ represents the weight factor; card(·) represents the cardinality of a set; R is the reduced attribute set; C represents the condition attribute set; D represents the decision attribute; γ_R(D) represents the attribute dependency, i.e. the proportion of objects in the universe U that can be correctly classified into the classes of D using the knowledge in R; the larger the value of γ_R(D), the better the reduced condition attributes preserve the classification ability of the decision table; the smaller the value of card(R), the fewer condition attributes remain after reduction; γ_R(D) is calculated according to equation (5):

\gamma_R(D) = \frac{\mathrm{card}(POS_R(D))}{\mathrm{card}(U)} (5)

wherein POS_R(D) denotes the positive region, calculated according to equation (6):

POS_R(D) = \bigcup_{X \in U/D} \underline{R}(X) (6)

where \underline{R}(X) denotes the lower approximation of the set X with respect to R, i.e. the set of all objects in U that certainly belong to X according to the knowledge R.
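The dependency γ_R(D) can be computed by grouping objects into equivalence classes under R and counting the classes whose objects all share one decision value. The sketch below assumes a decision table stored as a list of rows plus a parallel list of decisions. The fitness form `lam * gamma + (1 - lam) * (card(C) - card(R)) / card(C)` is a commonly used choice that matches the symbols described above, but it is an assumption here rather than a confirmed transcription of equation (4).

```python
def dependency(rows, decisions, attrs):
    """gamma_R(D): share of objects lying in the positive region POS_R(D)."""
    seen = {}  # equivalence class under R -> decision values of its objects
    for row, d in zip(rows, decisions):
        seen.setdefault(tuple(row[a] for a in attrs), []).append(d)
    # A class belongs to the positive region when all its decisions agree.
    pos = sum(len(ds) for ds in seen.values() if len(set(ds)) == 1)
    return pos / len(rows)


def fitness(bits, rows, decisions, lam=0.9):
    """Assumed fitness: rewards high dependency and few selected attributes."""
    attrs = [i for i, b in enumerate(bits) if b == 1]
    n_c = len(bits)  # card(C)
    gamma = dependency(rows, decisions, attrs)
    return lam * gamma + (1 - lam) * (n_c - len(attrs)) / n_c


# Toy decision table: attribute 0 alone determines the decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
```

On the toy table, selecting only attribute 0 gives full dependency with one attribute, so it scores higher than selecting all three attributes.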

In step 2) of the above technical scheme, the Hamming distance is used to reflect the difference between spiders and thus provides the basis for similarity. The specific implementation is as follows:

the difference between two spiders is calculated by modulo-two addition of their two binary strings of length n, as shown in formula (7):

after the difference between the two spiders is determined by formula (7), the similarity between the two spiders is further calculated by formula (8):

in formula (7) and formula (8), the bit values involved are the l-th bits of the length-n binary strings of spider i and spider j, each taking the value 1 or 0;

calculating the average similarity between the global optimal spider G_location and the remaining spiders x₁, x₂, ..., x_N-1 by formula (9):

In the technical scheme of the invention, information is transmitted between spiders through vibration, whose amplitude depends on two factors, weight and distance; the vibration between two spiders is expressed by formula (10):

wherein w_j denotes the weight of spider j, and d_i,j denotes the distance between spider i and spider j, calculated as ||x_i - x_j||. The weight is calculated by formula (11):

wherein,

represents the maximum value of fitness among all individuals,

represents the minimum fitness among all individuals.
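For orientation, the sketch below uses the forms from the standard social spider optimization literature: a weight normalized between the worst and best fitness, and a vibration that decays exponentially with squared distance. Both forms are assumptions here and may differ in detail from formulas (10) and (11). The handling of the degenerate case where all fitness values are equal is also an assumption.

```python
import math


def weights(fitness_values):
    """Normalize fitness to [0, 1]: the best spider gets 1, the worst gets 0."""
    best, worst = max(fitness_values), min(fitness_values)
    if best == worst:
        # Assumption: give every spider full weight when all fitness values tie.
        return [1.0] * len(fitness_values)
    return [(f - worst) / (best - worst) for f in fitness_values]


def vibration(w_j, x_i, x_j):
    """Vibration that spider i perceives from spider j (assumed standard form)."""
    d = math.dist(x_i, x_j)  # Euclidean distance ||x_i - x_j||
    return w_j * math.exp(-d * d)
```

Heavier and closer spiders therefore produce stronger vibrations, which matches the two factors named in the text.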

In step 2) of the above technical solution, each bit of every individual whose similarity value is smaller than the average similarity value is reassigned a random number in [-4, 4], and the value of each bit of the individual is then converted into a binary number by formulas (2) and (3) above.

In step 4) of the above technical solution, when the female spiders move, each female spider updates its position according to formula (12);

wherein rand₂, rand₃, β₁, β₂ and β₃ are random numbers in [0, 1], generated by a function when used; g represents the iteration number; PF represents the threshold for a female spider to approach or move away from a vibration source; s_c and Vibc_i respectively denote the position of the spider that is closest to female spider i in the population and has a greater weight than female spider i, and the vibration that spider exerts on female spider i; s_b and Vibb_i respectively denote the position of the spider with the greatest weight in the population and the vibration that spider exerts on female spider i;

the position of the female spider i at the number of iterations g is indicated.

In the step 5) of the technical scheme, when the male spiders move, each male spider updates the position according to a formula (13);

wherein rand₄ and η are random numbers in [0, 1], generated by a function when used; Ind_med is the median weight of the male spiders; s_f and Vibf_i respectively denote the position of the female spider closest to male spider i in the population and the vibration that female spider exerts on male spider i;

represents a weighted average of the male population,

the position of the male spider i at the number of iterations g is indicated.

In step 6) of the above technical solution, the radius r is calculated by formula (14):

wherein,

and

the maximum and minimum values in each dimension.

In step 8) of the above technical solution, executing the local opponent learning strategy specifically includes: calculating the fitness values of the individuals in the population and from them the average fitness value of the population; screening out the individuals whose fitness is smaller than the average fitness value to form a population P, whose number of spiders is recorded as N'; creating the opposite individuals of the spiders in P to form an opposite population OP and calculating the fitness values of the individuals in OP; merging P and OP, sorting the 2N' individuals in descending order of fitness value, and selecting the top N' individuals as a new population P' that replaces P; and combining the new population P' with the individuals of the original population whose fitness is greater than the average fitness value to form the new population.
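The strategy of step 8) can be sketched as below. Two points are assumptions of this sketch: the opposite of a component x in [low, high] is taken as low + high - x (the usual opposition-based learning definition, which reduces to -x on [-4, 4]), and individuals whose fitness equals the average are kept with the better half. The name `local_opposition` is hypothetical.

```python
def local_opposition(population, fitness, low=-4.0, high=4.0):
    """Apply opposition only to below-average individuals; keep population size."""
    scored = [(fitness(p), p) for p in population]
    avg = sum(f for f, _ in scored) / len(scored)
    keep = [p for f, p in scored if f >= avg]      # good individuals, untouched
    weak = [(f, p) for f, p in scored if f < avg]  # population P, size N'
    # Opposite population OP (assumed definition: low + high - x per component).
    opposite = [[low + high - x for x in p] for _, p in weak]
    pool = weak + [(fitness(o), o) for o in opposite]  # 2N' candidates
    pool.sort(key=lambda fp: fp[0], reverse=True)
    return keep + [p for _, p in pool[:len(weak)]]  # top N' form P'
```

Because only the below-average individuals are mirrored, the number of extra fitness evaluations per iteration is N' rather than N, which is the saving the local strategy aims for.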

In step 10) of the above technical solution, when γ_C(D) = γ_R(D) = 1, G_location' is an effective reduction result, wherein:

C represents the condition attribute set; D represents the decision attribute; R is the reduced attribute set; γ_C(D) represents the dependency of the full condition attribute set and γ_R(D) represents the dependency of the reduced set, i.e. the proportion of objects in the universe U that can be correctly classified into the classes of D using the knowledge in R; the larger the value of γ_R(D), the better the reduced condition attributes preserve the classification ability of the decision table; the smaller the value of card(R), the fewer condition attributes remain after reduction; γ_R(D) is calculated according to equation (5):

\gamma_R(D) = \frac{\mathrm{card}(POS_R(D))}{\mathrm{card}(U)} (5)

wherein card(·) represents the cardinality of a set; POS_R(D) denotes the positive region, calculated according to equation (6):

POS_R(D) = \bigcup_{X \in U/D} \underline{R}(X) (6)

wherein \underline{R}(X) denotes the lower approximation of the set X with respect to R, i.e. the set of all objects in U that certainly belong to X according to the knowledge R.

In step 11) of the above technical solution, each attribute is tested for deletion according to its attribute importance: when the attribute importance is 0, the attribute is deleted; otherwise it is not deleted and is placed back into the original reduction set. The attribute importance is calculated according to equation (16):

sig(α, R; D) = γ_R(D) - γ_{R-{α}}(D) (16)

wherein α represents the attribute whose importance is to be calculated; D represents the decision attribute; R is the reduced attribute set; γ_C(D) represents the dependency of the full condition attribute set, γ_R(D) represents the dependency of the reduced set, and γ_{R-{α}}(D) represents the dependency after α is removed from R; γ_R(D) is the proportion of objects in the universe U that can be correctly classified into the classes of D using the knowledge in R, calculated by formula (5) and formula (6); the larger the value of γ_R(D), the better the reduced condition attributes preserve the classification ability of the decision table.
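Redundancy detection with equation (16) amounts to trying each attribute of the reduction in turn and dropping it when the dependency does not change. The following is a minimal sketch under the assumption that the decision table is a list of rows with a parallel list of decisions; the helper names are hypothetical, and the order in which attributes are tried is an assumption.

```python
def dependency(rows, decisions, attrs):
    """gamma_R(D): share of objects lying in the positive region POS_R(D)."""
    seen = {}
    for row, d in zip(rows, decisions):
        seen.setdefault(tuple(row[a] for a in attrs), []).append(d)
    pos = sum(len(ds) for ds in seen.values() if len(set(ds)) == 1)
    return pos / len(rows)


def remove_redundant(rows, decisions, reduct):
    """Drop every attribute whose significance sig(a, R; D) is 0."""
    reduct = list(reduct)
    for a in list(reduct):
        rest = [b for b in reduct if b != a]
        sig = dependency(rows, decisions, reduct) - dependency(rows, decisions, rest)
        if sig == 0:
            reduct = rest  # attribute a is redundant
        # otherwise a stays in the reduction set
    return reduct


rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
```

Starting from all three attributes of the toy table, the sketch removes attributes 0 and 1 and keeps attribute 2, which alone still gives a dependency of 1.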

Compared with the prior art, namely the ant colony optimization (ACO)-based attribute reduction algorithm (IGRAACO), the social spider (SSO)-based attribute reduction algorithm (SSORS) and the fish swarm algorithm (FSA)-based attribute reduction algorithm (FSRSAR), the method has better optimization capability: it finds the minimum attribute reduction in fewer iterations, with shorter reduction time, a higher fitness function value and faster convergence. It may also perform better as the data set grows.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a flow chart of the execution of similarity constraints in the method of the present invention;

FIG. 3 is a flow chart of the method of the present invention for performing local opponent learning;

FIG. 4 is a flow chart of redundancy detection in the method of the present invention;

FIG. 5 shows the fitness values of the 4 algorithms on different data sets as a function of the number of iterations in an embodiment of the method of the present invention, where (a) is the Zoo data set, (b) is the volume data set, (c) is the Mushroom data set, and (d) is the Wine data set.

Detailed Description

The present invention will be better understood from the following detailed description taken in conjunction with the accompanying drawings, but the present invention is not limited to the following embodiments.

Referring to fig. 1 to 4, the method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm according to the present invention includes the following steps:

1) initializing the social spider algorithm parameters and calculating the number of female spiders N_f, the number of male spiders N_m, the female spider positions x_f, the male spider positions x_m, and the fitness value F of each spider, wherein,

initializing social spider algorithm parameters includes:

the total number of spiders N, which is the sum of the number of female spiders N_f and the number of male spiders N_m;

a similarity constraint threshold Si with a value range of [0, 1];

calculating the number of female spiders N_f according to formula (1):

N_f = floor[(P_High - rand₁ * (P_High - P_Low)) * N] (1)

Where floor denotes rounding down, e.g. any value between 5.1 and 5.9 is rounded down to 5; rand₁ is a random number in [0, 1], generated by a function when used. The number of male spiders N_m is calculated as N - N_f. The invention uses a binary string of length n to represent each spider. The l-th component of spider i at iteration g is set to a random number in [-4, 4] at the initial stage; ε ∈ [0, 1] is a random number, generated by a function when used; after the female spider positions x_f and the male spider positions x_m are obtained, the value of each bit of each spider is calculated according to formulas (2) and (3);

calculating a fitness value F for each spider according to equation (4):

wherein \underline{R}(X) denotes the lower approximation of the set X with respect to R, i.e. the union of the equivalence classes induced by R that are contained in X;

after the fitness values of the population are calculated, the fitness values of all spiders are compared, the spider with the maximum fitness value is taken as the global optimal spider, and the fitness value G_fit of the global optimal spider and its position G_location are recorded.

2) executing the similarity constraint, which includes: calculating the similarity values between the global optimal spider G_location and the remaining spiders x₁, x₂, ..., x_N-1, and the average similarity value between G_location and the rest of the population; comparing the average similarity value with the similarity constraint threshold; executing step 3) when the average similarity value is greater than the similarity constraint threshold; otherwise, for each individual whose similarity value is smaller than the average similarity value, reassigning every bit a random number in [-4, 4], converting the value of each bit of the individual into a binary number, and recalculating the average similarity value with the global optimal spider G_location, until the calculated average similarity value is greater than the similarity constraint threshold; the concrete implementation steps are as follows:

2.1) the difference between two spiders is calculated by modulo-two addition of their two binary strings of length n, as shown in formula (7):

H(x_i, x_j) = \sum_{l=1}^{n} \left( x_i^l \oplus x_j^l \right) (7)

wherein x_i^l and x_j^l respectively denote the l-th bit of the length-n binary strings of spider i and spider j, each taking the value 1 or 0.

2.2) after the difference between the two spiders is determined by formula (7), the similarity value between the two spiders is further calculated by formula (8):

sim(x_i, x_j) = 1 - \frac{H(x_i, x_j)}{n} (8)

For example: let the binary string of the first spider be x₁ = (0001001001) and that of the second spider be x₂ = (1101011100); the two strings differ in 5 of their 10 bits, so formula (8) gives sim(x₁, x₂) = 1 - 0.5 = 0.5, i.e. the information of the two spiders is consistent in 5 positions.
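The worked example can be checked with a few lines of code. Modulo-two addition of two bits is the XOR operation, so the difference is the number of positions where the strings disagree; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits, computed by modulo-two addition (XOR)."""
    return sum(x ^ y for x, y in zip(a, b))


def similarity(a, b):
    """Similarity of two equal-length bit strings: 1 minus the share of differing bits."""
    return 1.0 - hamming(a, b) / len(a)


x1 = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
x2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
print(hamming(x1, x2), similarity(x1, x2))  # prints: 5 0.5
```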

2.3) calculating the average similarity value between the current global optimal spider G_location and the remaining spiders x₁, x₂, ..., x_N-1 by formula (9):

sim_{avg} = \frac{1}{N-1} \sum_{k=1}^{N-1} sim(G_{location}, x_k) (9)

comparing the average similarity value with the similarity constraint threshold Si; when the average similarity value is greater than Si, continuing with step 3); otherwise, randomly re-initializing the individuals whose similarity value is smaller than the average similarity value (i.e. reassigning random numbers in [-4, 4] to those individuals and converting the value of each bit into a binary number again through formulas (2) and (3)) and re-executing the similarity constraint strategy (i.e. step 2)) until the calculated average similarity value is greater than the similarity constraint threshold;
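The similarity constraint loop can be sketched as follows. To stay self-contained the sketch takes the re-initialization routine as a parameter (`reinit`, hypothetical), keeps the global optimal spider fixed during the loop, and adds a `max_rounds` safeguard; the last two points are simplifying assumptions, not part of the described method.

```python
def similarity(a, b):
    """1 minus the share of differing bits between two bit strings."""
    return 1.0 - sum(x ^ y for x, y in zip(a, b)) / len(a)


def similarity_constraint(best_bits, others, si, reinit, max_rounds=1000):
    """Re-initialize below-average individuals until the mean similarity exceeds si."""
    for _ in range(max_rounds):  # safeguard against endless looping (assumption)
        sims = [similarity(best_bits, o) for o in others]
        avg = sum(sims) / len(sims)
        if avg > si:
            break
        # Only individuals less similar than average are re-initialized.
        others = [reinit(len(o)) if s < avg else o for o, s in zip(others, sims)]
    return others
```

In practice `reinit` would draw a fresh position in [-4, 4] per bit and binarize it through formulas (2) and (3).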

3) calculating the fitness value F of the individuals in the current population (according to formulas (4) - (6)) and finding the spider with the highest fitness value in the current population; if that spider has a higher fitness value than the previously recorded global optimal spider, replacing the previously recorded fitness value G_fit and position G_location of the global optimal spider with the fitness value and position of that spider; otherwise, not replacing;

information is transmitted between spiders by vibration, whose amplitude depends on two factors, weight and distance; the vibration between two spiders is expressed by equation (10):

wherein

Represents the maximum value of fitness among all individuals,

represents the minimum fitness among all individuals.

4) The female spiders move, and each female spider updates the position according to a formula (12);

rand₂, rand₃, β₁, β₂ and β₃ are random numbers in [0, 1], generated by a function when used; g represents the iteration number; PF represents the threshold for a female spider to approach or move away from a vibration source; s_c and Vibc_i respectively denote the position of the spider that is closest to female spider i in the population and has a greater weight than female spider i, and the vibration that this spider (i.e. spider s_c) exerts on female spider i; s_b and Vibb_i respectively denote the position of the spider with the greatest weight in the population and the vibration that this spider (i.e. spider s_b) exerts on female spider i;

The position of the female spider i at the number of iterations g is indicated.

5) The male spiders move, and each male spider updates the position according to a formula (13);

Unlike the movement strategy of female spiders, male spiders are divided into dominant male spiders and non-dominant male spiders. Dominant male spiders have a greater weight than non-dominant male spiders and are attracted by female spiders. In contrast, non-dominant male spiders tend to move toward the center of the male population, which is determined by the weight and position of each male spider. The specific movement pattern of the male spiders is shown in formula (13):

wherein rand₄ and η are random numbers in [0, 1], generated by a function when used; Ind_med is the median weight of the male spiders; s_f and Vibf_i respectively denote the position of the female spider closest to male spider i in the population and the vibration that this spider (i.e. spider s_f) exerts on male spider i;

Represents a weighted average of the male population,

the position of the male spider i at the number of iterations g is indicated.

The advantages of updating the male spider positions with formula (13) are as follows: on the one hand, male spiders are attracted by female spiders in preparation for mating, which increases the diversity of the population; on the other hand, introducing the weighted average prevents spiders that perform extremely well or extremely poorly from unduly influencing the search process.

6) Mating to produce offspring: each dominant male spider mates with the female spiders located within radius r of itself, and the next generation is generated by roulette-wheel selection;

Each dominant male spider finds the female spiders to mate with inside the region of radius r centered on itself and forms a group with them, where the radius r is calculated by formula (14):
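The two quantities this step relies on, the split into dominant and non-dominant males and the weighted center of the male population, can be sketched in Python as follows. The function names are hypothetical, and treating "dominant" as a weight strictly greater than the median is an assumption, since formula (13) is not reproduced in this text.

```python
import statistics


def split_males(weights):
    """Return (dominant, non_dominant) index lists using the median weight."""
    ind_med = statistics.median(weights)
    # Assumption: a male is dominant when its weight is strictly above the median.
    dominant = [i for i, w in enumerate(weights) if w > ind_med]
    non_dominant = [i for i, w in enumerate(weights) if w <= ind_med]
    return dominant, non_dominant


def weighted_center(positions, weights):
    """Weighted average position of the male population, one value per dimension."""
    total = sum(weights)
    dim = len(positions[0])
    return [
        sum(w * p[j] for p, w in zip(positions, weights)) / total
        for j in range(dim)
    ]
```

For example, with weights 0.2, 0.5 and 0.9 only the third male is dominant, and the other two would move towards the weighted center.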

where

and

the maximum and minimum values in each dimension.

After each dominant male spider forms a parent group with the female spiders within its radius r, the offspring population is generated by roulette-wheel selection (the roulette wheel is built on the weights of the spiders).
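A minimal Python sketch of weight-based roulette-wheel selection is shown below. The text states only that the roulette wheel is built on the spider weights; composing each offspring position component from a roulette-chosen parent is an assumption borrowed from the usual social spider optimization mating operator, and the names `roulette_pick` and `make_offspring` are hypothetical.

```python
import random


def roulette_pick(weights, rng=random):
    """Pick one index with probability proportional to its weight."""
    total = sum(weights)
    threshold = rng.random() * total
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if threshold < running:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def make_offspring(parents, weights, rng=random):
    """Build one child; each component is copied from a roulette-chosen parent."""
    dim = len(parents[0])
    return [parents[roulette_pick(weights, rng)][j] for j in range(dim)]
```

Heavier parents are chosen more often, so their position components are more likely to appear in the child.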

7) Worst individual replacement: sequentially comparing the individuals in the offspring population with the individuals with the minimum fitness value in the parent population, and if the fitness value of the individuals in the offspring population is higher than the minimum fitness value in the parent population, replacing the individuals with the minimum fitness value in the parent population by the offspring individuals, wherein the replacement comprises all attributes such as position, number, gender, fitness value and the like; otherwise, abandoning the filial generation individual and continuing to judge the next filial generation individual. Through the operation, the individuals with the worst performance in the parent population are continuously replaced by the offspring.
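The replacement rule of step 7) can be sketched in Python as follows. The representation of an individual as a dictionary with a `fitness` key, and the function name `replace_worst`, are assumptions made for illustration.

```python
def replace_worst(parents, offspring):
    """Replace the worst parent by each offspring that beats it.

    Each individual is a dict with at least a 'fitness' key. The whole dict
    (position, gender, fitness, ...) is swapped in, so all attributes of the
    replaced individual change together.
    """
    population = list(parents)
    for child in offspring:
        # Locate the current worst individual; it may change after a replacement.
        worst = min(range(len(population)), key=lambda i: population[i]["fitness"])
        if child["fitness"] > population[worst]["fitness"]:
            population[worst] = child
        # Otherwise the child is discarded and the next child is examined.
    return population
```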

8) Executing a local opponent learning strategy;

During iteration, some individuals may move in the direction opposite to convergence and therefore perform poorly. Opponent learning can substantially improve such poorly performing individuals. Applying it directly to the whole population, however, has little effect on individuals that already perform well, whose opposites are often poor, and it generates a large amount of redundant computation. Therefore, in this application, opponent learning is applied only to the individuals whose fitness values are lower than the average fitness value of the population.

Calculating the fitness value of each individual of the population according to a formula (4), and calculating the average fitness value of the population through a formula (15):

Screen the female and male spiders whose fitness values are smaller than the average fitness value of the population to create a population P, and record the number of these spiders as N'. Create the opposite individual of each spider in population P to form the opposite population OP, and calculate the fitness values of the opposite population OP. Merge populations P and OP, sort the 2N' spider individuals in descending order of fitness value, and select the top N' spiders as a new population P' to replace population P (the replacement includes position, gender, and so on). Finally, combine the new population P' with the individuals of the original population whose fitness values are larger than the average fitness value to form the new current population.
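The local opponent learning step can be sketched in Python as follows. The definition of the opposite individual is not given in this passage; the sketch assumes the usual opposition-based learning rule, lo + hi - x per component, on the range [-4, 4] mentioned in the claims. The function name, the `fitness` callable, and the handling of individuals exactly at the average (kept unchanged) are likewise assumptions.

```python
def local_opposition_step(population, fitness, lo=-4.0, hi=4.0):
    """Apply opposition-based learning only to below-average individuals.

    population : list of positions (lists of floats in [lo, hi])
    fitness    : callable mapping a position to its fitness value
    """
    scores = [fitness(x) for x in population]
    average = sum(scores) / len(scores)
    # P: individuals below the average fitness; the others are kept untouched.
    weak = [x for x, s in zip(population, scores) if s < average]
    strong = [x for x, s in zip(population, scores) if s >= average]
    # OP: opposite individuals, assumed here to be lo + hi - x per component.
    opposite = [[lo + hi - v for v in x] for x in weak]
    # Merge P and OP, sort by descending fitness, keep the best N' as P'.
    merged = sorted(weak + opposite, key=fitness, reverse=True)
    return strong + merged[:len(weak)]
```

Because only the N' below-average individuals are mirrored, at most N' extra fitness evaluations are needed per iteration instead of one per spider.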

9) Calculate the fitness values of the individuals in the current population and find the spider with the highest fitness value; if this spider has a higher fitness value than the previously recorded global optimal spider, replace the fitness value G_fit and position G_location of the previously recorded global optimal spider with the fitness value and position of this spider; otherwise, do not replace them;

10) Assign the G_fit and G_location recorded in step 9) to G_fit' and G_location', and judge whether G_location' is an effective reduction result; if yes, execute step 11); otherwise, execute step 12).

When γ_C(D) = γ_R(D) = 1, G_location' is judged to be an effective reduction result.

11) Perform a redundancy check on the current reduction result G_location' (i.e., check whether it contains redundant attributes), and remove any redundant attributes found. Redundant attributes are removed as follows:

Calculate in turn the attribute importance of each attribute in G_location'; when the attribute importance of an attribute is 0, delete the attribute; otherwise, do not delete it, put it back into the original reduction set, and continue with the next attribute.

The attribute importance is calculated according to equation (16):

sig(α, R; D) = γ_R(D) - γ_{R-{α}}(D)    (16)

The redundancy check is illustrated below using Table 1:

Table 1: Decision table

U/IND(D)＝{{x₁,x₂,x₃},{x₄}}，U/IND(C)＝{{x₁},{x₂},{x₃},{x₄}}；

By formula (6), POS_C(D) = {x₁, x₂, x₃, x₄}; by formula (5), the attribute dependency γ_C(D) = |POS_C(D)| / |U| = 1. The attributes are then deleted in turn. First delete α₁: U/IND({α₂, α₃}) = {{x₁}, {x₂}, {x₃, x₄}}, so by formula (6) POS_{C-{α₁}}(D) = {x₁, x₂}, and by formula (5) the attribute dependency γ_{C-{α₁}}(D) = 1/2. The attribute importance sig(α₁, C; D) = 1/2 ≠ 0, so α₁ cannot be deleted and is put back into the reduction set. Next delete α₂: from formulas (5)-(7), sig(α₂, C; D) = 0, so α₂ is deleted. The reduction set after removing the redundant attribute is {α₁, α₃}. Continuing with α₃ on the current reduction set, sig(α₃, C; D) = 1/2 ≠ 0. All attributes have now been checked once, which completes the redundancy detection for the current generation.
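The redundancy check can be sketched in Python as follows. Table 1 is not reproduced in this text, so the decision table `TABLE` below is a hypothetical stand-in, chosen only so that it yields the partitions and importance values quoted in the worked example; the function names are also assumptions. Dependency is computed as the size of the positive region divided by the size of the universe, and importance follows formula (16).

```python
from fractions import Fraction

# Hypothetical stand-in for Table 1 (a1..a3 are condition attributes, d the decision).
TABLE = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "d": 0},
    "x2": {"a1": 1, "a2": 0, "a3": 1, "d": 0},
    "x3": {"a1": 0, "a2": 1, "a3": 0, "d": 0},
    "x4": {"a1": 1, "a2": 1, "a3": 0, "d": 1},
}


def partition(rows, attrs):
    """Equivalence classes U/IND(attrs) as a list of sets of object names."""
    blocks = {}
    for name, row in rows.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(name)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Attribute dependency: |POS_attrs(D)| / |U|."""
    pos = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if all its objects share one decision.
        if len({rows[x][decision] for x in block}) == 1:
            pos += len(block)
    return Fraction(pos, len(rows))


def remove_redundant(rows, reduct, decision):
    """Try each attribute once; drop it when its importance (formula (16)) is 0."""
    current = list(reduct)
    for a in list(reduct):
        rest = [b for b in current if b != a]
        sig = dependency(rows, current, decision) - dependency(rows, rest, decision)
        if sig == 0:
            current = rest  # redundant: delete it
        # otherwise the attribute stays in the reduction set
    return current
```

On this stand-in table, `remove_redundant(TABLE, ["a1", "a2", "a3"], "d")` keeps `a1` and `a3` and drops `a2`, matching the outcome of the worked example.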

12) Finish the current round of iteration and judge whether the stopping condition is met (i.e., whether the maximum number of iterations has been reached, or whether the same reduction result has been obtained for the specified number of consecutive iterations); if so, end the iteration and output G_location', which is the global optimal reduction result; otherwise, add 1 to the iteration number (i.e., g + 1) and return to step 3).
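The stopping condition can be sketched in Python as follows, using the values stated later in the description (at most 200 iterations, or the same reduction result for 10 consecutive iterations). Keeping the per-iteration results in a list named `history`, and the function name `should_stop`, are assumptions of this sketch.

```python
def should_stop(history, max_gen=200, cycle=10):
    """Decide whether to end the iteration.

    history : best reduction found in each completed iteration, oldest first
    max_gen : maximum number of iterations
    cycle   : number of consecutive identical results that ends the run
    """
    if len(history) >= max_gen:
        return True
    if len(history) >= cycle:
        recent = history[-cycle:]
        return all(r == recent[0] for r in recent)
    return False
```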

The main performance indicators currently used to evaluate attribute reduction algorithms are:

1) Reduction efficiency: the number of attributes remaining after the algorithm reduces the current data set.

2) Running time: the time consumed by running the algorithm, which depends on the stopping condition. In this method, the running time is measured until the number of iterations reaches 200 or the same reduction result is obtained for 10 consecutive generations.

3) Convergence rate and optimizing ability: the convergence speed, as shown by the fitness value, reflects whether the algorithm can quickly find the minimum attribute reduction.

The reduction results and running times obtained in simulation tests with the method of the present invention (i.e., LOSSOAR), the IGRARACO algorithm, the FSRSAR algorithm and the SSORS algorithm are described below with specific examples.

In the simulation tests of the specific embodiment, the computer environment is a Windows 7 operating system, an Intel(R) Core(TM) i5-6500 3.2 GHz CPU, and 8 GB of memory.

The algorithm parameters of the invention are set as follows: the total number of spiders N = 25, the weighting factor λ = 0.9, the threshold for spiders approaching or moving away from a vibration source PF = 0.7, the similarity constraint threshold Si = 0.5, the lowest proportion of female spiders in the total number of spiders P_Low = 0.65, the highest proportion of female spiders in the total number of spiders P_High, the maximum number of iterations Max_Gen = 200, and the number of consecutive iterations with the same reduction result cycle = 10. The stopping condition is that the maximum number of iterations, 200, is reached or that the same reduction result is obtained in 10 consecutive iterations. To ensure the accuracy of the experiment, each data set was tested 20 times.

rand₁, rand₂, rand₃, rand₄, β and η are random numbers in the range [0,1], generated by a function when used.

The data sets shown in Table 2 are used for illustration.

Table 2: Description of data sets

In Table 2, the first column gives the index number, the second column gives the name of the data set, the third and fourth columns give the number of instances and the number of attributes, respectively, the fifth column gives the number of classes in the data set, and the last column indicates whether the data set is a complete decision system, i.e., whether the decision table contains missing values.

The experimental results are as follows:

Tables 3-5 compare the reduction results and running times obtained in simulation with the method of the present invention (LOSSOAR) against those of the SSORS algorithm, the FSRSAR algorithm and the IGRARACO algorithm, respectively.

Table 3: Results of the method of the present invention and the SSORS algorithm

Table 4: Results of the method of the present invention and the FSRSAR algorithm

Table 5: Results of the method of the present invention and the IGRARACO algorithm

As can be seen from Tables 3-5, the method of the present invention does not differ significantly in performance from the other three algorithms when the data set is small, but its running time is significantly shorter as the amount of data increases.

Tables 6-8 show the reduction process of the method of the present invention on the Zoo, Mushroom and Vote data sets, respectively.

Table 6: Reduction process on the Zoo data set

Table 7: Reduction process on the Mushroom data set

Table 8: Reduction process on the Vote data set

Fig. 5 shows how the fitness values of the 4 algorithms, namely the method of the present invention (LOSSOAR), the SSORS algorithm, the FSRSAR algorithm and the IGRARACO algorithm, change with the number of iterations on different data sets, where (a) is the Zoo data set, (b) is the Vote data set, (c) is the Mushroom data set, and (d) is the Wine data set.

The performance of the 4 algorithms is compared below on the basis of the experimental results:

(1) Reduction results: as can be seen from Tables 3-5, the method of the present invention finds the minimum attribute reduction in most cases. Because effective improvement strategies are introduced on the basis of the conventional SSORS algorithm, the spiders stay in a better state in the initial stage, the influence of poorly performing individuals is avoided during the search, and redundant attributes in the final reduction set are removed, so that the reduction result is close to the global optimum. Compared with the SSORS and FSRSAR algorithms, the IGRARACO algorithm has better optimizing capability and finds the minimum reduction in most cases; however, because it adds two rounds of redundancy detection, its individuals cannot search other regions, which reduces diversity and leads to convergence to local optima. The FSRSAR algorithm finds one reduction result per fish in each iteration and publishes the reduction result at the end of the iteration, and in most cases it finds a good reduction. As can be seen from Tables 6-8, the minimum attribute reduction is found within the first 3 generations on the Zoo data set, within the first 5 generations on the Vote data set, and within the first 10 generations on the Mushroom data set. These experimental results show that the algorithm of the present invention finds the minimum attribute reduction in a small number of iterations.

(2) Running time: as can be seen from Tables 3-5, the present invention needs to compute more equivalence classes at run time than the conventional SSORS algorithm and is therefore more time-consuming when the data set is small. As the data set grows, the algorithm performs better; on the Mushroom data set, for example, its running time is shorter than that of the SSORS algorithm, because it finds the optimal solution in fewer iterations and therefore terminates quickly, whereas the SSORS algorithm needs many iterations to find a better solution and so takes longer overall. LOSSOAR also has a shorter running time than IGRARACO, because calculating the improved information gain ratio in the IGRARACO algorithm is time-consuming and the two added rounds of redundancy detection further increase the amount of computation. The FSRSAR algorithm performs worst of the 4 algorithms, because every fish has to perform multiple calculations in each iteration to obtain a valid result, and the reduction results are published on the bulletin board only when all fish have valid results. In summary, the proposed algorithm has a shorter reduction time than the other 3 algorithms and may also exhibit better performance as the data set grows.

(3) Convergence rate and fitness value analysis: as can be seen from Fig. 5, the present invention has a higher fitness function value and a faster convergence speed than the other algorithms. In most cases the IGRARACO algorithm differs little from the present algorithm, but on the Vote data set the IGRARACO algorithm performs worse than the present invention. Compared with the other two algorithms, the present algorithm also converges faster and reaches higher fitness values.

In summary, the present invention converges quickly, has a short running time, and finds the minimum attribute reduction in most cases. The invention also continues to perform well as the data set grows, which provides a new direction for mining large data sets.

Claims

1. A method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm comprises the following steps:

1) initializing the social spider algorithm parameters and calculating the number of female spiders N_f, the number of male spiders N_m, the female spider positions x_f, the male spider positions x_m, and the fitness value F of each spider; comparing the fitness values of all spiders, taking the spider with the largest fitness value as the global optimal spider, and recording the fitness value G_fit and position G_location of the global optimal spider; wherein

initializing the social spider algorithm parameters includes:

a similarity constraint threshold Si with a value range of [0,1];

the similarity constraint includes: computing the similarity values between the global optimal spider G_location and the remaining spiders x₁, x₂, ..., x_{N-1}, and the average similarity value between the global optimal spider G_location and the rest of the population; comparing the average similarity value with the similarity constraint threshold; executing step 3) when the average similarity value is greater than the similarity constraint threshold; otherwise, for the individuals whose similarity is smaller than the average similarity value, reassigning each bit a random number in [-4,4], converting the value of each bit of the individual into a binary number, and recalculating the average similarity value with respect to the global optimal spider G_location, until the calculated average similarity value is greater than the similarity constraint threshold;

3) calculating the fitness values of the individuals in the current population and finding the spider with the highest fitness value in the current population; if this spider has a higher fitness value than the previously recorded global optimal spider, replacing the fitness value G_fit and position G_location of the previously recorded global optimal spider with the fitness value and position of this spider; otherwise, not replacing them;

4) female spiders move;

5) the male spider moves;

2. The method for solving the minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 2), the Hamming distance reflects the difference between spiders, and after the Hamming distance is calculated, the similarity between spiders is calculated, specifically as follows:

the difference between spiders is expressed through two binary strings of length n

in the formula (7) and the formula (8),

and respectively representing the l-th bit component of the binary string with the length of n of the spider i and the spider j, and taking the value of 1 or 0.

3. The method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 4), each female spider updates position according to formula (12) while moving;

wherein rand₂, rand₃, β₁, β₂ and β₃ are random numbers in [0,1], generated by a function when used; g represents the number of iterations; PF represents the threshold for female spiders to approach or move away from a vibration source; s_c and Vibc_i respectively represent the position of the spider that is closest to female spider i in the population and has a weight larger than that of female spider i, and the vibration of that spider perceived by female spider i; s_b and Vibb_i respectively represent the position of the spider with the maximum weight in the population and the vibration of that spider perceived by female spider i;

the position of the female spider i at the number of iterations g is indicated.

4. The method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 5), each male spider updates position according to formula (13) while moving;

wherein rand₄ and η are random numbers in [0,1], generated by a function when used; ind_med is the median weight of the male spiders; s_f and Vibf_i respectively represent the position of the spider closest to male spider i in the population and the vibration of that spider perceived by male spider i;

represents a weighted average of the male population,

the position of the male spider i at the number of iterations g is indicated.

5. The method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 6), the radius r is calculated by formula (14):

wherein

and

the maximum and minimum values in each dimension.

6. The method for solving minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 8), the local opponent learning specifically comprises: calculating the average fitness value of the current population; screening the individuals whose fitness values are lower than the average fitness value to create a population P, and recording the number of these spiders as N'; creating the opposite individuals of these individuals to form an opposite population OP; calculating the fitness values of the individuals in the opposite population OP; merging populations P and OP, sorting the 2N' spider individuals in descending order of fitness value, and selecting the top N' spider individuals as a new population P' to replace population P; and combining the new population P' with the individuals of the original population whose fitness values are greater than the average fitness value to serve as the new current population.

7. The method for solving minimum attribute reduction by combining local opponent learning with social spider algorithm as claimed in claim 1, wherein in step 10), when γ_C(D) = γ_R(D) = 1, G_location' is judged to be an effective reduction result, wherein:

C represents the condition attribute set; D represents the decision attribute; γ_C(D) represents the dependency of D on the condition attributes; γ_R(D) represents the dependency of D on the reduction set R, i.e., the proportion of the whole universe of discourse U that can be correctly classified into the knowledge in D by using the knowledge in R; the larger the value of γ_R(D), the more the reduced condition attributes improve the classification capability of the decision table; the smaller the value of card(R), the fewer the condition attributes after reduction; γ_R(D) is calculated according to formula (5):

8. The method for solving the minimum attribute reduction by combining local opponent learning and social spider algorithm as claimed in claim 1, wherein in step 11), the attribute is deleted according to the attribute importance of the attribute, and when the attribute importance of the attribute is 0, the attribute is deleted, otherwise, the attribute is not deleted and put back to the original reduction set.