CN106600100B

CN106600100B - Weighted multi-population particle swarm optimization-based hazard source reason analysis method

Info

Publication number: CN106600100B
Application number: CN201610940992.5A
Authority: CN
Inventors: 周良; 李诗瑶; 谢强; 郑洪源
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2016-11-01
Filing date: 2016-11-01
Publication date: 2020-10-27
Anticipated expiration: 2036-11-01
Also published as: CN106600100A

Abstract

The invention provides a hazard source cause analysis method based on weighted multi-population particle swarm optimization. In the data preprocessing stage, the concept of item weight is introduced and the concept of item set range is redefined. For association rule mining, an algorithm based on weighted multi-population particle swarm optimization is proposed; it adds an inter-population communication mechanism to multi-population cooperative particle swarm optimization, which increases population diversity and avoids the algorithm's tendency to fall into a local optimal solution. The concept of particle weight is also introduced so that the algorithm can select rules that are more meaningful to the user. As a result, the accuracy and efficiency of hazard source cause analysis are improved, its scope is expanded, and its complexity is reduced.

Description

Weighted multi-population particle swarm optimization-based hazard source reason analysis method

Technical Field

The invention relates to the technical field of information systems, in particular to a weighted multi-population particle swarm optimization-based hazard source reason analysis method.

Background

In the civil aviation air traffic safety management system, hazard source identification and risk assessment are important components. Analyzing hazard sources in detail, and establishing their causes and mechanisms of action, is a prerequisite for the responsible departments to evaluate risk effectively and accurately. Traditional hazard source cause analysis relies on event tree analysis, bow-tie analysis, and hazard and operability analysis (HAZOP). Experts and scholars in China and abroad have proposed many other analysis methods, but most of them are built on this traditional framework. These methods can analyze the causes of a hazard source from different angles, but the analysis is not comprehensive, so a method capable of analyzing hazard sources more comprehensively is needed.

Disclosure of Invention

Purpose of the invention: to overcome the defects of the prior art by improving the accuracy and efficiency of hazard source cause analysis, expanding its scope, and reducing its complexity, the invention provides a hazard source cause analysis method based on weighted multi-population particle swarm optimization.

The technical scheme is as follows:

A hazard source cause analysis method based on weighted multi-population particle swarm optimization comprises the following steps:

(1) assign a weight to each hazard source item, either manually or by an existing algorithm; define the hazard source item set I and the hazard source transaction database D; represent each hazard source transaction in the database in binary;

(2) define a weighted item set range:

wherein m ≠ n and m < n; m and n each denote the length of an item set, i.e. the number of items it contains; WI(m) and WI(n) denote the weights of the item sets; Tran(m) and Tran(n) denote the number of transactions containing the corresponding item sets; WT(m, n) and Tran(m, n) denote, respectively, the weight and the number of transactions that contain m and n and satisfy m → n; ΣWT(T) denotes the sum of the weights of all transactions;

wherein I(j) denotes the j-th item in transaction T, and |T| denotes the number of items in the transaction;

(3) take the m and n corresponding to the largest WRI calculated in step (2) as the front partition point and the rear partition point of the association rule, respectively; encode the association rules and generate the candidate hazard source association rule set R = {R₁, …, R_m}; treat each association rule as a particle of the particle swarm, and determine the fitness function:

wherein WSPI(A) is the weighted support of item set A; N₁ and N₂ are weight parameters used to balance support and confidence; Support(A ∪ B) is the number of transactions containing both A and B; |N| is the total number of transactions in the transaction database; WI(A) is the weight of item set A; and Trans(A) is the number of transactions containing A;

(4) carrying out danger source association rule mining by using a weighted multi-population particle swarm algorithm:

(41) randomly initialize the velocity and position of the particles, and cluster the particles by position to generate different particle clusters;

(42) calculate the cluster range CR of each particle cluster using the fitness function, and sort the particle clusters in increasing order of cluster range;

(43) update the optimal position pbest of each particle, the global optimal position gbest, and the global local optimal position gpest according to the fitness function; and update the velocity v_ij(t) and position x_ij(t) of the particles;

(44) compare the particle fitness value fitvalue_ij with the minimum fitness value minfit_i and the maximum fitness value maxfit_i of the cluster in which the particle is located: if minfit_i < fitvalue_ij < maxfit_i, the position of the particle is unchanged; if fitvalue_ij < minfit_i and w_ij > minw_{i-1}, where minw_{i-1} is the minimum weight of the cluster C_{i-1} preceding the cluster in which the particle is currently located, merge the particle into cluster C_{i-1}, delete the num particles {d_{i-1,1}, …, d_{i-1,num}} with the smallest fitness values in that cluster, and at the same time generate num new particles {new_{i-1,1}, …, new_{i-1,num}}; if fitvalue_ij > maxfit_i and w_ij > minw_{i+1}, where minw_{i+1} is the minimum weight of the cluster C_{i+1} following the cluster in which the particle is currently located, merge the particle into cluster C_{i+1}, delete the num′ particles {d_{i+1,1}, …, d_{i+1,num′}} with the smallest fitness values in that cluster, and at the same time generate num′ new particles {new_{i+1,1}, …, new_{i+1,num′}};

(45) repeat steps (42) to (44) until the optimal particle is found and the association rule is generated, or until the maximum number of iterations is reached; then trace back from the consequent of each obtained association rule to its antecedent, which gives the cause of the hazard source.

In step (1), each hazard source transaction in the hazard source transaction database is represented in binary as follows: each transaction is represented as a string of binary 0s and 1s; the length of the transaction equals the number of items in the item set; each bit corresponds to one of the N items of the hazard source item set; if the transaction contains the j-th item, the j-th bit is set to 1; otherwise, it is set to 0.

The fitness function in step (3) measures the importance of an association rule within the population; the support and confidence of the association rule are combined by a weighting method to obtain:

To better reflect the relationship between item weight and support and confidence, the weighted support is introduced into the formula; the weighted support WSPI(A) of an item set is defined as follows:

WSPI(A)＝WI(A)Trans(A)

thus, a fitness function is obtained:

The encoding of the association rule in step (3) is as follows: each item is represented by a 2-bit binary code, where 00 indicates that the item belongs to the antecedent of the association rule, 11 indicates that it belongs to the consequent, and 10 and 01 both indicate that the item does not belong to the association rule; one association rule therefore has 2n bits in total.

In step (44), the velocity v_ij(t) and position x_ij(t) of the particles are updated according to:

When updating particles, step (44) uses the mutation operation of the differential evolution algorithm to generate new particles: 3 individuals are randomly selected from the population as the source of variation, and the new individual is generated by:

wherein New denotes a new individual in the population, r₁, r₂ and r₃ are random numbers between 0 and |N|, and w_DE is the differential weight.

Beneficial effects: compared with the prior art, the hazard source cause analysis method based on weighted multi-population particle swarm optimization has the following advantages:

(1) using a swarm intelligence algorithm reduces human intervention in the analysis process and improves the efficiency of hazard source cause analysis;

(2) introducing an inter-population communication mechanism into the swarm intelligence algorithm increases population diversity, expands the scope of hazard source cause analysis, and addresses the tendency of the particle swarm algorithm to fall into a local optimum;

(3) using the concept of weighting in the algorithm improves the accuracy of hazard source cause analysis.

Drawings

FIG. 1 is a transaction representation of an association rule; wherein (a) is a transaction and (b) is a binary representation of the transaction in (a);

FIG. 2 is a flow of association rule generation;

FIG. 3 is a flow chart of the method of the present invention.

Detailed Description

The present invention will be further described with reference to the accompanying drawings.

The hazard source cause analysis method based on weighted multi-population particle swarm optimization (HCA-WMPSO) treats the cause analysis of hazard sources as an association rule mining process. So that mining produces more meaningful rules and takes into account the influence of item weight, the concept of item weight is introduced in the data preprocessing stage and the concept of item set range is redefined. For association rule mining, an algorithm based on weighted multi-population particle swarm optimization is proposed; it adds an inter-population communication mechanism to multi-population cooperative particle swarm optimization, which increases population diversity and avoids the algorithm's tendency to fall into a local optimal solution. The concept of particle weight is also introduced so that the algorithm can select rules that are more meaningful to the user. HCA-WMPSO is implemented in two steps. Before describing them, we assume a hazard source item set I = {I₁, …, I_n} and a hazard source transaction database D.

First, data preprocessing

First, a weight is assigned to each hazard source item, either manually or by an existing algorithm. Next, each hazard source transaction T_i in the hazard source transaction database is represented in binary: each bit corresponds to one of the N items of the hazard source item set, and if the transaction contains the j-th item, the j-th bit is set to 1; otherwise it is set to 0. Finally, the transaction database is scanned to calculate the transaction weight WT(T) according to formula (2) and the weighted item set range WRI.

1. Binary conversion

To improve the efficiency of scanning the database and to make the rule support and confidence easier to calculate, each transaction is represented as a string of binary 0s and 1s. The length of a transaction is the number of items in the item set. Suppose the item set contains 4 items, I₁, I₂, I₃ and I₄, and the transaction database contains 5 transactions, T₁, T₂, T₃, T₄ and T₅. The transactions are shown in FIG. 1(a) and their binary representation in FIG. 1(b). If a transaction contains an item, the corresponding bit is set to 1; otherwise, it is set to 0.
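The conversion can be sketched in a few lines of Python. This is only an illustration: the item names follow the example above, but the transaction contents are hypothetical because FIG. 1 is not reproduced in this text, and `encode_transaction` is an illustrative name, not one used by the invention.

```python
def encode_transaction(transaction, items):
    """Return one 0/1 flag per item of the item set, in item-set order."""
    present = set(transaction)
    return [1 if item in present else 0 for item in items]


# Item set from the example above; transaction contents are hypothetical.
ITEMS = ["I1", "I2", "I3", "I4"]
TRANSACTIONS = {
    "T1": ["I1", "I2", "I3"],
    "T2": ["I1", "I2"],
    "T3": ["I1", "I3", "I4"],
    "T4": ["I2", "I3"],
    "T5": ["I1", "I2", "I3", "I4"],
}

# Binary form of the whole database: every row has len(ITEMS) bits.
BINARY_DB = {name: encode_transaction(t, ITEMS) for name, t in TRANSACTIONS.items()}
```

Every encoded transaction has the same length as the item set, so later scans only compare bits at fixed positions.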

2. Item set range computation

To generate more meaningful association rules, the concept of item set range (RI) is introduced. Because different items in a rule differ in importance in actual association rule mining, we define a weighted item set range (WRI) as follows:

Definition 1: weighted item set range

In the definition, m ≠ n and m < n; m and n each denote the length of an item set, i.e. the number of items it contains; WI(m) and WI(n) denote the weights of the item sets; Tran(m) and Tran(n) denote the number of transactions containing the respective item sets; WT(m, n) and Tran(m, n) denote, respectively, the weight and the number of transactions that contain m and n and satisfy m → n; ΣWT(T) denotes the sum of the weights of all transactions; and in formula (1), TWT() is of the form:

where I(j) denotes the j-th item in transaction T and |T| denotes the number of items in the transaction.

After WRI is calculated, the m and n corresponding to the largest WRI are taken as the front partition point and the rear partition point of the association rule, respectively. The front partition point is the smallest number of items that can serve as the antecedent of the association rule, and the rear partition point is the largest number of items contained in the association rule; only items appearing between the partition points m and n can serve as the consequent of the association rule. As shown in FIG. 2, WRI(2 → 4) takes the maximum value, so in a transaction record the first two items are used as the antecedent X of the rule, and the 3rd and 4th items are used as the consequent Y of the rule.

Second, analysis of causes of dangerous sources

1. Rule encoding and fitness value calculation

The m and n corresponding to the largest WRI are taken as the partition points. The association rule is encoded with a chromosome encoding method so that the algorithm can calculate the fitness value efficiently. Each item is represented by a 2-bit binary code, where 00 indicates that the item belongs to the antecedent of the association rule, 11 indicates that it belongs to the consequent, and 10 and 01 both indicate that the item does not belong to the association rule, so one association rule has 2n bits in total.
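A minimal sketch of this 2-bit encoding follows. The function names and the zero-based item indices are assumptions made for illustration. Because the text allows either 01 or 10 for an item outside the rule, the encoder arbitrarily emits 01 and the decoder accepts both.

```python
ANTECEDENT_CODE = "00"   # item belongs to the antecedent
CONSEQUENT_CODE = "11"   # item belongs to the consequent
ABSENT_CODE = "01"       # item not in the rule ("10" means the same)


def encode_rule(n_items, antecedent, consequent):
    """Encode a rule over n_items items as a string of 2 * n_items bits."""
    codes = []
    for j in range(n_items):
        if j in antecedent:
            codes.append(ANTECEDENT_CODE)
        elif j in consequent:
            codes.append(CONSEQUENT_CODE)
        else:
            codes.append(ABSENT_CODE)
    return "".join(codes)


def decode_rule(code):
    """Return (antecedent indices, consequent indices) from a 2n-bit string."""
    antecedent, consequent = [], []
    for pos in range(0, len(code), 2):
        pair = code[pos:pos + 2]
        if pair == ANTECEDENT_CODE:
            antecedent.append(pos // 2)
        elif pair == CONSEQUENT_CODE:
            consequent.append(pos // 2)
        # "01" and "10" both mean the item is not part of the rule.
    return antecedent, consequent
```

For four items, a rule with the first two items as antecedent and the third as consequent encodes to an 8-bit string, matching the 2n-bit length stated above.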

When mining association rules with a particle swarm, we treat each association rule as a particle of the swarm. The fitness function measures the importance of an association rule within the population. Association rule mining looks for rules whose support is greater than the minimum support (Minsupport) and whose confidence is greater than the minimum confidence. Compresse combines the support and confidence of an association rule by a weighting method and is defined as follows:

In the formula, N₁ and N₂ are weight parameters used to balance support and confidence, Support(A ∪ B) is the number of transactions that contain both A and B, and |N| is the total number of transactions in the transaction database.
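The two quantities being balanced can be computed directly from binary transactions. The sketch below uses the standard definitions of support count and confidence. It does not reproduce the combined formula, which is not recoverable from this text. The database rows and function names are hypothetical.

```python
def contains(transaction_bits, itemset):
    """True if every item index in itemset has its bit set to 1."""
    return all(transaction_bits[j] == 1 for j in itemset)


def support_count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for bits in db if contains(bits, itemset))


def confidence(db, antecedent, consequent):
    """Support(A u B) / Support(A); 0.0 when A never occurs."""
    both = support_count(db, set(antecedent) | set(consequent))
    ante = support_count(db, antecedent)
    return both / ante if ante else 0.0


# Hypothetical binary database: 5 transactions over 4 items.
DB = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]

joint = support_count(DB, {0, 1, 2})      # transactions with items 0, 1 and 2
relative_support = joint / len(DB)        # Support(A u B) / |N|
rule_confidence = confidence(DB, {0, 1}, {2})
```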

To better reflect the relationship between item weight and support and confidence, we introduce the concept of weighted support (WSP) into the above formula. The weighted support WSPI(A) of an item set is defined as follows:

WSPI(A)＝WI(A)Trans(A)

where WI(A) is the weight of item set A and Trans(A) is the number of transactions containing A. We define Weighted comparison as follows:

This is used as the fitness function of the algorithm.
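The weighted support term can be sketched as follows, reading WSPI(A) = WI(A)Trans(A) as a product. The text does not say how the weight of an item set is derived from the weights of its items, so taking the mean item weight is an assumption, as are the weights, the database rows, and the function names.

```python
def itemset_weight(item_weights, itemset):
    """WI(A). ASSUMPTION: mean of the member item weights; the source does
    not state how an item-set weight is derived from item weights."""
    return sum(item_weights[j] for j in itemset) / len(itemset)


def weighted_support(db, item_weights, itemset):
    """WSPI(A) = WI(A) * Trans(A), with Trans(A) counted on binary rows."""
    trans = sum(1 for bits in db if all(bits[j] == 1 for j in itemset))
    return itemset_weight(item_weights, itemset) * trans


# Hypothetical item weights and binary database (5 transactions, 4 items).
ITEM_WEIGHTS = [0.9, 0.5, 0.7, 0.2]
WEIGHTED_DB = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]

wspi = weighted_support(WEIGHTED_DB, ITEM_WEIGHTS, {0, 1})
```

With these values, the item set {0, 1} has a mean weight of 0.7 and appears in 3 transactions.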

2. Hazard source association rule mining

To obtain rules that are more meaningful to the user, we introduce the concept of particle weight. In the algorithm, each particle is an association rule, and the weight of the particle is the weight of the association rule; that is, the weight of the association rule is regarded as the weight of the transaction. To increase information sharing among populations, we introduce the concept of the global local best, gpest, which is the maximum of the local best solutions among all populations. When the particle velocity is updated, gpest contributes one term of the new particle velocity. The velocity update after introducing gpest is as follows:

In the formula, w_inertia is the inertia weight; c₁, c₂ and c₃ are three constants; r₁, r₂ and r₃ are random numbers uniformly distributed in [0, 1]; w_i is the particle weight; pbest is the local optimal solution of the particle within its sub-population; and gbest is the global optimal solution of the particle swarm. The factor 1-w_i adjusts the random coefficient r so that particles with higher weights have a greater probability of approaching the optimal solution.
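The velocity formula itself is not reproduced in this text, so the sketch below is one plausible reading rather than the patented formula. It is a three-term particle swarm update in which 1 - w_i scales each random coefficient. That placement of 1 - w_i and the function names are assumptions. The position update x(t+1) = x(t) + v(t+1) is the one stated in the claims.

```python
import random


def update_velocity(v, x, pbest, gbest, gpest, w_inertia, w_i,
                    c1, c2, c3, rng=random):
    """One velocity update per dimension with pbest, gbest and gpest terms.

    ASSUMPTION: (1 - w_i) multiplies every random coefficient; the source
    only says that 1 - w_i "adjusts the random coefficient r".
    """
    scale = 1.0 - w_i
    new_v = []
    for vj, xj, pj, gj, qj in zip(v, x, pbest, gbest, gpest):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        new_v.append(w_inertia * vj
                     + c1 * scale * r1 * (pj - xj)
                     + c2 * scale * r2 * (gj - xj)
                     + c3 * scale * r3 * (qj - xj))
    return new_v


def update_position(x, v_next):
    """x(t+1) = x(t) + v(t+1), as stated in the claims."""
    return [xj + vj for xj, vj in zip(x, v_next)]
```

When a particle already sits on pbest, gbest and gpest, all three attraction terms vanish and only the inertia term remains, which is a quick sanity check for any variant of the formula.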

The weighted multi-population particle swarm optimization introduces an inter-population communication mechanism on top of the multi-population approach, which increases population diversity and avoids the algorithm's tendency to fall into a local optimal solution. The concept of particle weight is also introduced so that the algorithm can select rules that are more meaningful to the user.

In the algorithm, to simulate the behavior of birds during predation, the K-means clustering algorithm is first used to cluster the association rules R into different particle clusters R = {C_i | i = 1, …, n}, where each particle cluster C_i represents a "population" of birds. For the rules in each cluster C_i, the fitness values are calculated, and their maximum maxfit and minimum minfit are taken as the boundary values BV of the cluster; the range [minfit, maxfit] is the cluster range CR. At the same time, the minimum weight of C_i is found by comparing the weights of all particles in the cluster. To facilitate later operations, the clusters are sorted in increasing order of CR. Each particle R_ij in cluster C_i searches within its own CR_i; during the search, the particle must check whether it has left its search range and act accordingly. This process is the inter-population information interaction mechanism, and its specific steps are as follows:

Step 1: Calculate the fitness value fitvalue_ij of particle R_ij;

Step 2: Compare fitvalue_ij with the boundary value BV_i of the cluster in which the particle is located: if minfit_i < fitvalue_ij < maxfit_i,

then the position of particle R_ij is unchanged;

Step 3: If fitvalue_ij < minfit_i or fitvalue_ij > maxfit_i, then generate new particles in cluster C_i;

at the same time, compare the weight of the particle with the weight of the current cluster;

Step 4: If the particle weight w_ij > minw_{i-1} (or minw_{i+1}), then merge particle R_ij into cluster C_{i-1} (or C_{i+1}), delete the num particles with the smallest fitness values in that cluster, and generate new particles

until the maximum number of iterations is reached and the weight of the new particle is less than the minimum weight of the current cluster; otherwise, only generate new particles

until the maximum number of iterations is reached and the weight of the new particle is less than the minimum weight of the current cluster;
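The decision made in Steps 2 to 4 can be sketched as a small function. It returns only what should happen to the particle; deleting and regenerating particles is left to the caller. The dictionary layout, the function name, and the handling of the first and last cluster (which have no neighbor on one side) are assumptions.

```python
def migration_decision(i, fitvalue, weight, clusters):
    """Decide what happens to a particle of cluster i.

    clusters: list of dicts with keys 'minfit', 'maxfit', 'minw', already
    sorted in increasing order of cluster range.
    Returns (action, target_index) where action is one of
    'stay', 'move_prev', 'move_next', 'regenerate'.
    """
    current = clusters[i]
    if current["minfit"] < fitvalue < current["maxfit"]:
        return "stay", i                      # inside its own range
    if fitvalue < current["minfit"]:
        # ASSUMPTION: the first cluster has no predecessor to move into.
        if i > 0 and weight > clusters[i - 1]["minw"]:
            return "move_prev", i - 1
    elif fitvalue > current["maxfit"]:
        # ASSUMPTION: the last cluster has no successor to move into.
        if i + 1 < len(clusters) and weight > clusters[i + 1]["minw"]:
            return "move_next", i + 1
    # Out of range but not heavy enough to migrate: only new particles
    # are generated in the current cluster.
    return "regenerate", i


# Hypothetical clusters, sorted by increasing range.
CLUSTERS = [
    {"minfit": 0.1, "maxfit": 0.3, "minw": 0.2},
    {"minfit": 0.3, "maxfit": 0.6, "minw": 0.4},
    {"minfit": 0.6, "maxfit": 0.9, "minw": 0.5},
]
```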

The above process completes the particle update. During this process, as in differential evolution (DE), we randomly select 3 individuals in the population as the source of variation for a new individual and generate the new individual by the following rule:

In order to ensure the effectiveness of new particles generated by the algorithm, each time a new particle is generated, the WRI of the particle is calculated, and if the value is within the WRI range of the population, the particle cluster receives the new particle; otherwise, the process is cycled through until a particle is generated that satisfies the condition or a maximum number of iterations is reached.
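The generation rule is not reproduced in this text. The sketch below uses the standard DE/rand/1 mutation, New = x_r1 + w_DE · (x_r2 - x_r3), as an assumed stand-in, followed by the accept-or-retry loop described above. The validity check is passed in as a hypothetical predicate `is_valid` because the WRI formula is not available here.

```python
import random


def de_mutation(population, w_de, rng=random):
    """Standard DE/rand/1 mutation on three distinct random individuals.

    ASSUMPTION: the source's formula is taken to be the standard form
    New = x_r1 + w_DE * (x_r2 - x_r3).
    """
    r1, r2, r3 = rng.sample(range(len(population)), 3)
    a, b, c = population[r1], population[r2], population[r3]
    return [aj + w_de * (bj - cj) for aj, bj, cj in zip(a, b, c)]


def generate_valid_particle(population, w_de, is_valid, max_tries, rng=random):
    """Retry the mutation until is_valid accepts a candidate.

    is_valid is a hypothetical predicate standing in for the check that
    the new particle's WRI lies within the WRI range of the population.
    Returns None if max_tries candidates are all rejected.
    """
    for _ in range(max_tries):
        candidate = de_mutation(population, w_de, rng)
        if is_valid(candidate):
            return candidate
    return None
```

Returning None when the retry budget is exhausted makes the "maximum number of iterations" exit explicit, so the caller can decide how to proceed.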

The inertia weight w_inertia of the weighted multi-population particle swarm optimization is set to decrease from 0.9 to 0.4; the constants c₁, c₂ and c₃ are set to c₁ = c₂ = 2 and c₃ = 1; the initial population size N is 100; the maximum number of iterations num_iteration is 300; the initial velocity V₀ of the population particles is 0; and the particle positions X₀ are assigned randomly. The general flow of hazard source association rule mining is shown in FIG. 3.
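The text gives only the start and end values of the inertia weight. A linear decrease over the iterations is a common choice and is assumed in the sketch below. The function name and the zero-based iteration index are also illustrative.

```python
def inertia_weight(t, max_iter=300, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t (0-based).

    ASSUMPTION: linear decay from w_start to w_end; the source only states
    that the weight decreases from 0.9 to 0.4.
    """
    if max_iter <= 1:
        return w_end
    # Clamp t so out-of-range iteration numbers stay within [w_end, w_start].
    step = min(max(t, 0), max_iter - 1)
    return w_start - (w_start - w_end) * step / (max_iter - 1)


schedule = [inertia_weight(t) for t in range(300)]
```

Early iterations keep a large inertia weight, which favors exploration, while later iterations favor refinement around the best positions found.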

At the start, the velocity and position of the particles are initialized randomly, and the particles are clustered by position with the k-means algorithm to generate different particle clusters. The cluster range CR of each particle cluster is calculated with the fitness function, and the particle clusters are sorted in increasing order of cluster range. During the search, particle positions are updated according to the improved velocity update formula described above. The specific steps are as follows:

Step 1: Initialize the population size N_p and the maximum number of iterations N_t; initialize the initial velocity V₀ and initial position X₀ of the population particles; based on V₀ and X₀, assign the particles to different particle clusters C = {C₁, …, C_l} using a clustering algorithm;

Step 2: Calculate the fitness value fitvalue_ij of each particle, the cluster range CR_i = [minfit_i, maxfit_i] of each of C₁ to C_l, and the minimum weight minw_i within each cluster; sort the particle clusters in increasing order of cluster range to obtain C′ = {C′₁, …, C′_l};

Step 3: Update the optimal position pbest of each particle, the global optimal position gbest, and the global local optimal position gpest according to the fitness function;

Step 4: Update the velocity v_ij(t) and position x_ij(t) of the particles;

Step 5: Compare the particle fitness value fitvalue_ij with the minimum fitness value minfit_i and the maximum fitness value maxfit_i of the cluster in which the particle is located:

Step 5.1: If minfit_i < fitvalue_ij < maxfit_i, the position of the particle is unchanged;

Step 5.2: If fitvalue_ij < minfit_i and w_ij > minw_{i-1} (minw_{i-1} is the minimum weight of the cluster C_{i-1} preceding the cluster in which the particle is currently located), merge the particle into cluster C_{i-1}, delete the num particles {d_{i-1,1}, …, d_{i-1,num}} with the smallest fitness values in that cluster, and at the same time generate num new particles {new_{i-1,1}, …, new_{i-1,num}};

Step 5.3: If fitvalue_ij > maxfit_i and w_ij > minw_{i+1} (minw_{i+1} is the minimum weight of the cluster C_{i+1} following the cluster in which the particle is currently located), merge the particle into cluster C_{i+1}, delete the num′ particles {d_{i+1,1}, …, d_{i+1,num′}} with the smallest fitness values in that cluster, and at the same time generate num′ new particles {new_{i+1,1}, …, new_{i+1,num′}};

Step 6: Repeat Step 2 to Step 5 until the optimal particle is found and the association rule is generated, or until the maximum number of iterations is reached;

Step 7: Trace back from the consequent of each obtained association rule to its antecedent; the antecedent is the cause of the hazard source.
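The bookkeeping in Step 2 of the procedure above can be sketched as follows. Each cluster is given as a list of (fitness, weight) pairs. The text does not say how two ranges are compared when sorting, so ordering by lower bound and then by upper bound is an assumption, as are the function name and the data.

```python
def summarize_clusters(clusters):
    """Compute CR_i = [minfit_i, maxfit_i] and minw_i for every cluster,
    then sort the clusters in increasing order of cluster range.

    clusters: list of non-empty lists of (fitvalue, weight) tuples.
    ASSUMPTION: ranges are ordered by lower bound, then by upper bound.
    """
    summaries = []
    for members in clusters:
        fits = [fit for fit, _ in members]
        weights = [w for _, w in members]
        summaries.append({
            "minfit": min(fits),
            "maxfit": max(fits),
            "minw": min(weights),
            "members": members,
        })
    summaries.sort(key=lambda s: (s["minfit"], s["maxfit"]))
    return summaries


# Hypothetical clusters of (fitness, weight) pairs, deliberately unsorted.
RAW_CLUSTERS = [
    [(0.7, 0.5), (0.9, 0.6)],
    [(0.1, 0.2), (0.3, 0.4)],
    [(0.4, 0.8), (0.6, 0.3)],
]
SORTED_CLUSTERS = summarize_clusters(RAW_CLUSTERS)
```

After sorting, neighboring entries are the clusters C_{i-1} and C_{i+1} that Steps 5.2 and 5.3 refer to.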

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A hazard source cause analysis method based on weighted multi-population particle swarm optimization, characterized by comprising the following steps:

(2) define a weighted item set range:

wherein m ≠ n and m < n; m and n each denote the length of an item set, i.e. the number of items it contains; WI(m) and WI(n) denote the weights of the item sets; Tran(m) and Tran(n) denote the number of transactions containing the corresponding item sets; WT(m, n) and Tran(m, n) denote, respectively, the weight and the number of transactions that contain m and n and satisfy m → n; and ΣWT(T) denotes the sum of the weights of all transactions;

(3) take the m and n corresponding to the largest WRI calculated in step (2) as the front partition point and the rear partition point of the association rule, respectively; encode the association rules and generate the candidate hazard source association rule set R = {R₁, …, R_m};

Compresse combines the support and confidence of an association rule by a weighting method and is defined as follows:

treat each association rule as a particle of the particle swarm, and determine the fitness function:

wherein WSPI(A) is the weighted support of item set A, WSPI(A) = WI(A)Trans(A); N₁ and N₂ are weight parameters used to balance support and confidence; Support(A ∪ B) is the number of transactions containing both A and B; |N| is the total number of transactions in the transaction database; WI(A) is the weight of item set A; and Trans(A) is the number of transactions containing A;

(44) compare the particle fitness value fitvalue_ij with the minimum fitness value minfit_i and the maximum fitness value maxfit_i of the cluster in which the particle is located: if minfit_i < fitvalue_ij < maxfit_i, the position of the particle is unchanged; if fitvalue_ij < minfit_i and w_ij > minw_{i-1}, where minw_{i-1} is the minimum weight of the cluster C_{i-1} preceding the cluster in which the particle is currently located, merge the particle into cluster C_{i-1}, delete the num particles {d_{i-1,1}, …, d_{i-1,num}} with the smallest fitness values in that cluster, and at the same time generate num new particles {new_{i-1,1}, …, new_{i-1,num}}; if fitvalue_ij > maxfit_i and w_ij > minw_{i+1}, where minw_{i+1} is the minimum weight of the cluster C_{i+1} following the cluster in which the particle is currently located, merge the particle into cluster C_{i+1}, delete the num′ particles {d_{i+1,1}, …, d_{i+1,num′}} with the smallest fitness values in that cluster, and at the same time generate num′ new particles {new_{i+1,1}, …, new_{i+1,num′}};

2. The hazard source cause analysis method according to claim 1, wherein: in step (1), each hazard source transaction in the hazard source transaction database is represented in binary as follows: each transaction is represented as a string of binary 0s and 1s; the length of the transaction equals the number of items in the item set; each bit corresponds to one of the N items of the hazard source item set; if the transaction contains the j-th item, the j-th bit is set to 1; otherwise, it is set to 0.

3. The hazard source cause analysis method according to claim 1, wherein: the fitness function in step (3) measures the importance of an association rule within the population; the support and confidence of the association rule are combined by a weighting method to obtain:

WSPI(A)＝WI(A)Trans(A)

thus, a fitness function is obtained:

4. The hazard source cause analysis method according to claim 1, wherein: the encoding of the association rule in step (3) is as follows: each item is represented by a 2-bit binary code, where 00 indicates that the item belongs to the antecedent of the association rule, 11 indicates that it belongs to the consequent, and 10 and 01 both indicate that the item does not belong to the association rule; one association rule has 2n bits in total.

5. The hazard source cause analysis method according to claim 1, wherein: in step (44), the velocity v_ij(t) and position x_ij(t) of the particles are updated according to:

x(t+1)＝x(t)+v(t+1)

In the formula, w_inertia is the inertia weight; c₁, c₂ and c₃ are three constants; r₁, r₂ and r₃ are random numbers uniformly distributed in [0, 1]; w_i is the particle weight; pbest is the local optimal solution of the particle within its sub-population; gbest is the global optimal solution of the particle swarm; and the factor 1-w_i adjusts the random coefficient r so that particles with high weights approach the optimal solution.

6. The hazard source cause analysis method according to claim 1, wherein: when updating particles, step (44) uses the mutation operation of the differential evolution algorithm to generate new particles: 3 individuals are randomly selected from the population as the source of variation, and the new individual is generated by:

wherein New denotes a new individual in the population, r₁, r₂ and r₃ are random numbers between 0 and |N|, and w_DE is the differential weight.