CN107579844A

CN107579844A - A dynamically associated fault mining method based on service paths and frequency matrices

Info

Publication number: CN107579844A
Application number: CN201710710411.3A
Authority: CN
Inventors: 郑小禄; 黄宁; 胡波; 仵伟强
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-08-18
Filing date: 2017-08-18
Publication date: 2018-01-12

Abstract

The invention is a method for mining dynamically associated faults based on service paths and frequency matrices, comprising four steps: 1. construct the data model of dynamically associated faults; 2. design a block mining strategy based on maximal service paths; 3. design a scanning strategy based on frequency matrices; 4. design the overall BFM algorithm. By constructing the data model of dynamically associated faults and analyzing their judgment features, the invention defines the mining object and identifies where mining efficiency can be optimized. It analyzes the service-path-based association features of onboard network faults and gives a blocking strategy that avoids noise combinations, so that the algorithm filters out most noise information before the search iteration begins. Each block is decomposed into a feature vector and a frequency matrix; combining matrix operations with the properties of frequent itemsets yields a loop strategy that greatly reduces external-memory scans, loop iterations, and comparison operations. Step 4 combines these strategies into the BFM algorithm. The method effectively increases the speed of frequent itemset mining.

Description

A dynamically associated fault mining method based on service paths and frequency matrices

Technical field

The invention belongs to the technical field of onboard network fault diagnosis, and in particular relates to a method for mining dynamically associated faults based on service paths and frequency matrices.

Background

The avionics equipment on an aircraft is connected to switches through end systems, forming an onboard network system composed of several switch-centered spoke subnets. Over nearly half a century, onboard networks have passed through four stages: distributed, federated, integrated, and highly integrated. This integration trend has made the relationships between devices complex and their mutual influence ever closer, so faults exhibit complex associations: one fault often triggers several others, and these faults depend on each other dynamically and keep triggering further associated faults. In engineering practice, when multiple faults occur simultaneously and their associations are unknown, engineers can only investigate them one by one, so troubleshooting is inefficient.

Onboard network operation stores a large amount of fault data that contains association information between faults, but because the fault database is huge, analyzing it manually record by record is impractical. The main purpose of association rule mining is precisely to discover, from large data sets and historical information, associations between individuals that are implicit or hard to find. Association rule mining can therefore effectively extract the associations between onboard network faults from the fault database.

The Apriori algorithm is the classical Boolean association rule mining method proposed by R. Agrawal et al. It is based on level-wise iteration: given a minimum support, it searches iteratively level by level through join and prune steps until no itemset satisfying the minimum support remains, thereby mining the frequent itemsets, and then generates strong association rules from them. The algorithm is simple, practical, reliable, and widely applicable. However, it must repeatedly scan the transaction database (external memory) and loop through join and prune steps; each loop generates candidates by indiscriminate combination and then eliminates noise by comparison. Consequently, the size of the transaction database and its proportion of noise directly affect mining efficiency, and the amount of combination and elimination computation is likewise determined by the transaction database.

Onboard network systems involve many test services, the proportion of noise in the transaction database is large, and faults do not break out in concentrated bursts like the "alarm storms" of communication networks. Applying the Apriori algorithm directly to onboard network fault association analysis therefore leads to incomplete denoising and slow scanning. At present, most traditional association rule mining methods used in practice are improvements on Apriori, such as hashing, transaction compression, sampling, and dynamic itemset counting. Although they alleviate the low efficiency to some extent, when applied to databases like the onboard network fault database they cannot cope with the large number of associated-fault transactions and their weak correlation; they bring no significant improvement in mining efficiency and cannot meet engineering needs. Current mining methods are therefore not well suited to mining association rules of dynamically associated onboard network faults.

Summary of the invention

The purpose of the invention is to mine the association rules between dynamically associated onboard network faults quickly, effectively, and accurately. To this end it provides a method for mining dynamically associated faults based on service paths and frequency matrices, specifically a block mining strategy based on service paths and a fast scanning strategy based on frequency matrices. The method supports the development and design of corresponding engineering software, helping engineers locate faults quickly in practice and complete fault diagnosis.

To achieve the above object, as shown in Fig. 1, the technical solution provided by the invention is a method for mining dynamically associated faults based on service paths and frequency matrices, characterized in that the method comprises:

Step 1: construct the data model of dynamically associated faults

This step defines dynamically associated faults, analyzes the features used to judge them, and preprocesses the dynamically associated fault data from engineering practice to construct the data model of dynamically associated onboard network faults, thereby making the mining object of the method explicit. It also analyzes and demonstrates that the judgment features can feasibly reduce noise combinations, which prepares the description of the method flow.

Step 101: define dynamically associated faults

In the invention, dynamically associated onboard network faults are defined as follows: during onboard network operation, dynamic logical connections exist between the applications that make up an avionics service flow. As a result, under a given test stress, when a fault occurs it triggers new associated faults along the current service path; the association changes dynamically with the service path, and the faults show dynamic dependence and continuous propagation. Faults with such dynamic association are called dynamically associated faults.

The service path, dynamics, and association are defined as follows:

Service path: the logical connection relationship between the applications in an avionics service flow under a given test stress. For example, for the enhanced vision control service under the test stress of powering up a particular device, the service path is: enhanced vision → RDC → programmable integrated processor → display processing unit. Fig. 2 is a schematic diagram of the service paths of an onboard network at a moment when part of the avionics services are loaded.

Dynamics: under the same test stress, fault occurrence is closely tied to the service path, and the dynamics of the service path lead to the dynamics of the faults. For example, Fig. 3 shows an actual fault case from a research institute in which a test service can be carried out over two paths, path 1: ADC1 → DPU1, and path 2: ADC1 → RDC1 → DPU1. Experiments found that, with ADC1 powered up and sending data in the same way, DPU1 exhibits a video signal jitter and distortion fault when the test service runs over path 1, but not when it runs over path 2 via RDC1. Clearly, under the same test stress, some faults appear only on specific service paths.

Association: multiple faults in the same test service may be caused by the same fault; that is, faults with the same test stress and the same service path are associated. Fig. 4 shows a typical case. Fault 1 appears as the bus data monitoring device detecting packet loss in the data forwarded by the AFDX switch; fault 2 appears as the display showing no alarm signal from the remote starter device. The "service path" and "test stress" information of the two faults is identical. The two faulty devices are the AFDX switch and the display, and the failure process is: the AFDX switch drops packets, which interrupts data transmission on the service path, so the display device receives no valid data and cannot display the information. This example shows that faults in the fault database that share the same test stress and have identical or intersecting service paths are very likely to have been triggered by association.

Step 102: determine the judgment features of dynamically associated faults

According to the analysis in step 101, onboard network faults have service-path-based dynamic association features, and their association rules are closely related to the fault attributes "service path" and "test stress". The invention therefore gives the following judgment features:

Feature 1: two faults whose service paths do not intersect and whose test stresses differ cannot have been triggered by direct association.

Feature 2: for two faults triggered by indirect association, the association can be found through a finite number of fault pairs triggered by direct association.

Step 103: fault data preprocessing and model construction

A survey of the actual fault records of a research institute shows that a fault record mainly includes attributes such as fault phenomenon, faulty device, service path, and stress condition. According to the needs of the mining method, the invention constructs the fault information model F as follows:

F = {S, X, I, R}

where S = {S₁, S₂, ..., S_i, ...} is the fault attribute "test ID", X = {X₁, X₂, ..., X_i, ...} is the fault attribute "test stress", I = {A, B, ...} is the fault phenomenon, R = {R₁, R₂, ..., R_i, ...} is the fault attribute "service path", and R_i = (n_j, ..., n_k) is the service path of fault i, where n_j, n_k ∈ n and n = {1, 2, ...} is the set of faulty node devices. The value sequence of each parameter is formed by mapping the distinct values of the corresponding field in the raw database in dictionary order.

Example 1. Consider the network topology shown in Fig. 5, where 1 to 7 are device nodes. A study of the fault database finds that six kinds of faults, A to F, frequently occur on devices 1, 2, 3, 5, 6, and 7 during network operation.

According to the fault information model, the 12 quantified fault records are shown in Table 1.

Table 1: Preprocessed fault database

Step 104: analyze the feasibility of using the judgment features to reduce noise combinations in the mining method

With the judgment features given in step 102, it can be determined in advance whether a dynamic association may exist between two faults, so that noise combinations are avoided in the join step of frequent itemset mining.

Example 2. Apply the Apriori algorithm to the fault database of Table 1, treating one test as one transaction; the resulting transaction set is shown in Table 2. Walking through the generation of L₂ from L₁ demonstrates that using feature 1 of step 102 can feasibly and significantly improve mining efficiency. Let the minimum support be Sup_min = 2.

Table 2: Transactional database (transaction set)

(1) Scan the transaction set of Table 2 and generate L₁.

Table 3: Frequent 1-itemsets

(2) Perform the first join step to generate the candidates C₂.

Table 4: Candidate 2-itemsets

Observe C₂. By judgment feature 1 and comparison with Table 2, the fault combinations AD, AE, AF, BD, BE, BF, CD, CE, and CF in Table 4 have service paths that do not intersect at all and entirely different test stresses. They cannot be associated faults and therefore cannot be frequent itemsets; these combinations are purely noise generated by the join step.

During the search iteration, every step of the algorithm generates a large amount of such noise by combination and then eliminates most of it one by one through pruning, which greatly reduces mining efficiency and accuracy. According to judgment feature 1 and Example 2, this noise is avoidable.

Therefore, in mining dynamically associated faults, it is feasible to reduce noise combinations by using the service-path judgment features.
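The filtering effect of feature 1 on the join step can be sketched in Python. The sketch below is illustrative only: the `observations` mapping is hypothetical data (Table 1 is not reproduced here), chosen so that faults A to C occur on nodes 1 to 3 and faults D to F on nodes 5 to 7, and "intersecting" is assumed to mean that two service paths share at least one node.

```python
from itertools import combinations

# Hypothetical observations: fault item -> list of (test stress, service path).
observations = {
    "A": [("X1", (1, 2, 3))],
    "B": [("X1", (1, 2, 3)), ("X2", (2, 3))],
    "C": [("X2", (2, 3))],
    "D": [("X3", (5, 6, 7))],
    "E": [("X3", (5, 6, 7)), ("X4", (5, 6))],
    "F": [("X4", (5, 6))],
}

def can_associate(item1, item2):
    """Feature 1: a pair is ruled out only if, for every pair of observations,
    the service paths are disjoint AND the test stresses differ."""
    for s1, p1 in observations[item1]:
        for s2, p2 in observations[item2]:
            if s1 == s2 or set(p1) & set(p2):
                return True
    return False

all_pairs = list(combinations(sorted(observations), 2))   # unfiltered C2
kept = [p for p in all_pairs if can_associate(*p)]
noise = [p for p in all_pairs if not can_associate(*p)]

print(len(all_pairs), len(kept), len(noise))  # 15 6 9
print(["".join(p) for p in noise])
# ['AD', 'AE', 'AF', 'BD', 'BE', 'BF', 'CD', 'CE', 'CF']
```

Under these assumed records, 9 of the 15 candidate pairs are discarded before any support counting takes place.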

Step 2: design the block mining strategy based on maximal service paths

Based on the judgment features of dynamically associated faults, this step proposes the concept of the maximal service path and, on that basis, designs the block mining strategy for frequent itemsets.

Step 201: propose the concept of the maximal service path

Maximal service path: after the test stress and service path attributes of all faults in the fault database have been traversed, if the service paths R₁ to R_n are related by inclusion and share the test stress X_i, and R_j includes the other (n-1) service paths, then (X_i, R_j) is a maximal service path.

Define the operation "service path vector R₂ includes R₁", written R₁@R₂, as: if X₁ = X₂, Len(R₁) ≤ Len(R₂), and R₁ ∈ Part(R₂, Len(R₁)), then R₁@R₂. Here Len(R_i) is the length of the vector R_i, Part(R_i, k) is the set of contiguous fragments of length k of the vector R_i (k ≤ Len(R_i)), and Part(R_i, k)_j is the j-th element of that set (j = 1, 2, ..., Len(R_i)-k+1).

For example, if R₁ = (1,2,3,4), then Len(R₁) = 4, Part(R₁, 3) = {(1,2,3), (2,3,4)}, and Part(R₁, 3)₂ = (2,3,4). If R₂ = (2,3), R₃ = (3,4,5), and X₁ = X₂ = X₃, then R₂@R₁ holds and R₃@R₁ does not.
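The inclusion operation can be illustrated with a short Python sketch. It assumes that inclusion means equal test stress plus the shorter path appearing as a contiguous fragment of the longer one, which is consistent with the worked example; the function names `part` and `included_in` are introduced here for illustration only.

```python
def part(path, k):
    """Part(R, k): all contiguous fragments of length k of the path R, in order."""
    return [tuple(path[i:i + k]) for i in range(len(path) - k + 1)]

def included_in(r1, x1, r2, x2):
    """R1 @ R2: same stress, Len(R1) <= Len(R2), and R1 is a fragment of R2."""
    return x1 == x2 and len(r1) <= len(r2) and tuple(r1) in part(r2, len(r1))

R1, R2, R3 = (1, 2, 3, 4), (2, 3), (3, 4, 5)
print(part(R1, 3))                    # [(1, 2, 3), (2, 3, 4)]
print(part(R1, 3)[1])                 # (2, 3, 4), i.e. Part(R1, 3) with j = 2
print(included_in(R2, "X", R1, "X"))  # True
print(included_in(R3, "X", R1, "X"))  # False
```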

Example 3. Traversing the fault database of Table 1 yields the following 4 maximal service paths:

{(X₁,(1,2,3)),(X₂,(2,3)),(X₃,(5,6,7)),(X₄,(5,6))}
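A minimal sketch of how such a set could be computed is given below. The `records` list is hypothetical (the contents of Table 1 are not reproduced here) and was chosen so that the result matches the four paths above; the helper names are illustrative. The sketch keeps every (stress, path) pair that is not a contiguous fragment of another path under the same stress, which also covers the case where one stress has several unrelated maximal paths.

```python
def is_fragment(short, long):
    """True if `short` appears as a contiguous fragment of `long`."""
    k = len(short)
    return any(tuple(long[i:i + k]) == tuple(short)
               for i in range(len(long) - k + 1))

def maximal_service_paths(records):
    """records: iterable of (stress, path) pairs taken from the fault database."""
    unique = sorted(set((x, tuple(p)) for x, p in records))
    result = []
    for x, p in unique:
        dominated = any(x == x2 and p != p2 and is_fragment(p, p2)
                        for x2, p2 in unique)
        if not dominated:
            result.append((x, p))
    return result

# Hypothetical (stress, path) attributes of the fault records.
records = [
    ("X1", (1, 2)), ("X1", (1, 2, 3)), ("X1", (2, 3)),
    ("X2", (2, 3)), ("X2", (3,)),
    ("X3", (5, 6, 7)), ("X3", (6, 7)),
    ("X4", (5, 6)), ("X4", (5,)),
]
print(maximal_service_paths(records))
# [('X1', (1, 2, 3)), ('X2', (2, 3)), ('X3', (5, 6, 7)), ('X4', (5, 6))]
```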

Step 202: state the property of the maximal service path

Property of the maximal service path: two faults triggered by direct association occur, with high probability, under the same maximal service path, and as testing proceeds and the fault database grows, this probability tends to 100%.

Analysis of the property: under the same stress condition, faults whose two service paths include one another are most likely directly associated. Under the same stress condition, faults whose two service paths partially coincide, that is, whose service paths partly intersect, may be directly associated; but as tests are integrated and the fault database is traversed, the maximal service paths become more complete and this possibility becomes smaller and smaller.

Justification of the property: as shown in Fig. 6, the two faults A and B on test path 1 are directly associated. As system testing is integrated, the service path is extended to path 2, and fault C on path 2 is then likely to be related to A and B. Because the two paths are connected simultaneously under the same stress condition, and there is precedent for faults on the two paths being associated, two service paths related by inclusion must not be placed in separate blocks.

Two intersecting paths may also have associated faults: faults A and B on path 1 have some association with faults B and D on path 3, but the two test services have not yet been integrated; that is, the fault database contains no precedent of a fault on the test service path "remote starter to display processor 2". The maximal service path containing A, B, and D has therefore not yet been found at the current stage. As test integration increases, however, the maximal service paths are gradually completed, and such "partially coinciding" cases become fewer and fewer.

If the data are sufficiently large and path 1 and path 3 have still not been integrated into one service path, the probability that faults A, B, and D are associated is very small: once the system has completed almost all of its routine tasks and the two paths have still not been merged into a maximal service path, the two paths being connected simultaneously in routine operation can be regarded as a low-probability event.

The above analysis justifies the property of the maximal service path.

Step 203: design the block mining strategy

Based on the property of the maximal service path given in step 202, the following block mining strategy is proposed:

(1) After generating the transaction set, traverse the database to obtain the set of maximal service paths:

{(X₁,R₁),(X₂,R₂),...,(X_k,R_k)}

(2) Use the set of maximal service paths to divide the database into k blocks, as shown in Table 5:

Table 5: Database divided into k blocks

(3) Mine each block in isolation. Transaction frequencies are accumulated only within the same transaction-set block. For example, if A appears in three transaction-set blocks, its frequency is counted separately in each, and items from different blocks are not cross-combined in the join step. Example 4 shows how transaction-set blocks are generated from maximal service paths.
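The three points above can be sketched as a partitioning routine. This is an illustrative sketch, not the patented implementation: the record layout `(test_id, stress, item, path)` and the sample data are assumptions, and a record is assigned to a block when its stress matches the block's maximal service path and its path is a contiguous fragment of that path.

```python
from collections import defaultdict

def is_fragment(short, long):
    """True if `short` appears as a contiguous fragment of `long`."""
    k = len(short)
    return any(tuple(long[i:i + k]) == tuple(short)
               for i in range(len(long) - k + 1))

def partition(fault_records, max_paths):
    """Group fault items into transactions (one per test ID) inside each block."""
    blocks = defaultdict(lambda: defaultdict(set))
    for test_id, stress, item, path in fault_records:
        for idx, (x, r) in enumerate(max_paths):
            if stress == x and is_fragment(path, r):
                blocks[idx][test_id].add(item)
    return {idx: {t: sorted(items) for t, items in tx.items()}
            for idx, tx in blocks.items()}

# Hypothetical maximal service paths and fault records.
max_paths = [("X1", (1, 2, 3)), ("X3", (5, 6, 7))]
fault_records = [
    ("S1", "X1", "A", (1, 2)), ("S1", "X1", "B", (2, 3)),
    ("S2", "X3", "A", (5, 6)), ("S2", "X3", "D", (6, 7)),
    ("S3", "X1", "A", (1, 2, 3)),
]
blocks = partition(fault_records, max_paths)
print(blocks)
# {0: {'S1': ['A', 'B'], 'S3': ['A']}, 1: {'S2': ['A', 'D']}}
```

In this example item A appears in both blocks, and its frequency is 2 in block 0 and 1 in block 1; the two counts are never added together.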

Example 4. Consider the network topology shown in Fig. 8, where 1 to 7 are device nodes. A study of the fault database finds that seven kinds of faults, A to G, frequently occur on devices 1, 2, 3, 4, 5, and 7 during network operation, giving 17 fault records in total.

The transaction set generated by the algorithm is shown in Table 6:

Table 6: Fault transaction set of Example 4

There are 8 transactions in total, and in the join step the algorithm must cross-combine items over the full transaction set. In fact, only the three groups shown in the table need to be combined, because association rules can arise only within these three groups. Blocking with the maximal service path method therefore produces the transaction-set blocks shown in Table 7:

Table 7: Transaction set database after blocking

When the transaction sets in Table 7 are searched iteratively, faults from different blocks need not be combined first and deleted afterwards; the blocks are mined in isolation from one another and simultaneously. For example, although A appears in all three blocks, its frequency is counted separately in each during the join step, and items from different blocks (such as B, C versus D, E versus F, G) are not combined in the join step. Comparing the number of candidate 2-itemsets C₂ needed to generate L₂ in the join step before and after blocking shows that mining efficiency improves significantly.
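The size of this reduction can be illustrated with a small calculation. The figures below are an assumption-based illustration rather than the counts from the original tables: they suppose that all seven items are frequent at level 1 and that the three blocks contain the items {A, B, C}, {A, D, E}, and {A, F, G}, which is consistent with the item groups named above but is not taken from Table 7.

```python
from math import comb

all_items = ["A", "B", "C", "D", "E", "F", "G"]
blocks = [["A", "B", "C"], ["A", "D", "E"], ["A", "F", "G"]]  # assumed block contents

before = comb(len(all_items), 2)               # pairs over the whole transaction set
after = sum(comb(len(b), 2) for b in blocks)   # pairs formed inside each block only
print(before, after)  # 21 9
```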

Step 3: design the scanning strategy based on frequency matrices

After step 2 has divided the database into blocks using maximal service paths, noise is isolated and mining efficiency improves. To further improve search iteration efficiency when looping over the blocks, this step introduces, on top of blocking, the frequency matrix and feature vector method: each block is decomposed into a feature vector and a frequency matrix, and matrix operations together with the properties of frequent itemsets are used to reduce the number of loops and the amount of comparison.

Step 301: construct the frequency matrix and feature vector

Feature vector: suppose a transaction-set block T_i contains m mutually different transactions t₁, t₂, ..., t_m composed of n mutually different items I₁, I₂, ..., I_n, where {t_i} and {I_i} are sorted in dictionary order. Then the n-dimensional vector (I₁, I₂, ..., I_n) is called the feature vector of T_i. In the examples of the invention the fault items are denoted A, B, C, and so on, so I_i = A, B, C, ....

Frequency matrix: suppose a transaction-set block T_i contains m mutually different transactions t₁, t₂, ..., t_m and has the feature vector (I₁, I₂, ..., I_n). Then there exists a unique matrix Q_i = (q_jk)_{n×m}, where q_jk ∈ N is the frequency of the j-th item I_j in the k-th transaction t_k of T_i, and Q_i is called the frequency matrix of T_i.

A transaction-set block can therefore be decomposed into one feature vector and one frequency matrix. After blocking, each block needs only one scan of its feature vector and one matrix operation per search iteration.
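The decomposition can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `decompose` and the sample block are hypothetical, the matrix is held as a list of rows, and an entry is taken to be the number of times the column's transaction occurs in the block when that transaction contains the row's item, and 0 otherwise.

```python
from collections import Counter

def decompose(block):
    """Split a transaction-set block into (feature vector, distinct transactions, Q).

    Q[j][k] is the frequency of transactions[k] if it contains features[j], else 0.
    """
    counts = Counter(tuple(sorted(set(t))) for t in block)
    transactions = sorted(counts)                          # dictionary order
    features = sorted({item for t in transactions for item in t})
    Q = [[counts[t] if item in t else 0 for t in transactions]
         for item in features]
    return features, transactions, Q

# Hypothetical block: ABC once, AB once, BC twice.
block = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["B", "C"]]
features, transactions, Q = decompose(block)
print(features)       # ['A', 'B', 'C']
print(transactions)   # [('A', 'B'), ('A', 'B', 'C'), ('B', 'C')]
print(Q)              # [[1, 1, 0], [1, 1, 2], [0, 1, 2]]
```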

Example 5. Based on the transaction-set blocks of Table 7, each block can be decomposed using the frequency matrix; the resulting transaction-set blocks with their feature vectors and frequency matrices are shown in Table 8:

Table 8: Frequency matrix and feature vector of each block

Wherein：

Inspecting the matrix shows that the sum of each row is Count(C₁), the second and third columns give Count(C₂), and the first column gives Count(C₃). The frequency matrix thus retains the frequency information and also reflects the association between items. A scan no longer needs to traverse the database and compare entries one by one; matrix operations alone complete the scan.

Step 302: analyze the properties of the frequency matrix

The frequency matrix has the following properties:

(1) The sum of the i-th row of the frequency matrix equals the frequency of the i-th item of the feature vector: Count(I_i) = q_i1 + q_i2 + ... + q_im.

(2) The frequency matrix can be normalized: Q = E · diag(λ₁, λ₂, ..., λ_m), where E = (e_ij) is the 0-1 matrix with e_ij = 1 if q_ij ≠ 0 and e_ij = 0 otherwise,

and the coefficient λ_j of the j-th column of the frequency matrix equals the frequency of the transaction corresponding to the j-th column: λ_j = Count(t_j).

For example, for transaction-set block T₁ in Example 5:

Count(ABC) = 1, Count(AB) = 1, Count(BC) = 2

As can be seen, using the frequency matrix and its operations can significantly simplify the comparison computations in the scan loop.
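Both properties can be checked numerically. The matrix below is an assumed example built from the three transaction counts quoted above (columns ordered AB, ABC, BC, rows ordered A, B, C); it is not copied from Table 8, and the variable names are illustrative.

```python
features = ["A", "B", "C"]
Q = [[1, 1, 0],   # A occurs in AB (x1) and ABC (x1)
     [1, 1, 2],   # B occurs in AB (x1), ABC (x1) and BC (x2)
     [0, 1, 2]]   # C occurs in ABC (x1) and BC (x2)

# Property (1): each row sum is the frequency of the corresponding item.
row_sums = [sum(row) for row in Q]
print(dict(zip(features, row_sums)))   # {'A': 2, 'B': 4, 'C': 3}

# Property (2): Q = E * diag(lam), with E a 0-1 matrix and lam the
# frequency of each column's transaction.
m = len(Q[0])
lam = [max(row[k] for row in Q) for k in range(m)]
E = [[1 if q else 0 for q in row] for row in Q]
rebuilt = [[E[j][k] * lam[k] for k in range(m)] for j in range(len(Q))]
print(lam)            # [1, 1, 2]
print(E)              # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(rebuilt == Q)   # True
```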

Step 303: analyze the properties of frequent itemsets

Frequent itemset mining has the following properties:

(1) Every non-empty subset of a frequent itemset is also frequent, and every superset of an infrequent itemset is also infrequent.

Corollary 1A: if the frequencies of all items in a transaction-set block are below the minimum support, the block contains no frequent itemset and can be deleted directly.

Corollary 1B: if the frequency of some item in a transaction-set block is below the minimum support, every itemset containing that item is infrequent, and the item can be deleted directly from the block.

(2) If the length of a transaction is less than k, the transaction does not support L_{k+1}.

Corollary 2A: if the feature vector length of a transaction-set block is less than k+1, the largest frequent itemsets of that block are at most L_k, and the block can be deleted directly in subsequent loops.

Corollary 2B: if a transaction-set block contains a transaction whose length is less than k+1, that transaction can be deleted directly in the loops after L_k has been generated for the block.

Step 304: design the scanning strategies

Based on the frequency matrix and frequent itemset properties given in steps 301, 302, and 303, this step designs five scanning strategies to greatly increase the scanning speed during mining.

(1) Design scanning strategies 1 to 4: use the minimum support to compress the frequency matrix and feature vector, and thereby the transaction-set block.

Scanning strategy 1: by corollary 1A, if every row sum of the frequency matrix of a transaction-set block is less than Sup_min, delete the block directly.

For example, the row sums of the frequency matrix of block T₂ in Example 5 are all less than 2, so the second block can be deleted directly, which compresses the database and improves scan efficiency.

Scanning strategy 2: by corollary 1B, if the frequency of some item in a transaction-set block is below the minimum support, that is, some row sum of the frequency matrix is less than Sup_min, delete that item from the block's feature vector and the corresponding row from the frequency matrix, thereby compressing the block.

For example, the third row sum of the frequency matrix of block T₃ in Example 5 is less than 2, so the block is compressed: G is deleted from the feature vector and the third row is deleted from the frequency matrix, which compresses T₃ to T₃' and improves scan efficiency.
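Scanning strategies 1 and 2 can be sketched as a single compression routine. The function name and the two sample blocks are hypothetical: the matrices are not copied from Table 8 and were only chosen so that one block has all row sums below 2 and the other has a third row (item G) below 2.

```python
def compress_block(features, Q, sup_min):
    """Apply scanning strategies 1 and 2 to one block.

    Returns None if the whole block is deleted (strategy 1); otherwise returns
    (features, Q) with every row whose sum is below sup_min removed (strategy 2).
    """
    row_sums = [sum(row) for row in Q]
    if all(s < sup_min for s in row_sums):                      # strategy 1
        return None
    keep = [j for j, s in enumerate(row_sums) if s >= sup_min]  # strategy 2
    return [features[j] for j in keep], [Q[j] for j in keep]

# Assumed block with a single transaction ADE occurring once.
print(compress_block(["A", "D", "E"], [[1], [1], [1]], 2))
# None

# Assumed block with transactions AF (x2) and AFG (x1).
print(compress_block(["A", "F", "G"], [[2, 1], [2, 1], [0, 1]], 2))
# (['A', 'F'], [[2, 1], [2, 1]])
```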

Scanning strategy 3: by corollary 2A, if the feature vector length of a transaction-set block is less than k+1, the block is deleted directly after L_k has been generated, so it is skipped when C_{k+1} is generated.

Scanning strategy 4: by corollary 2B, if some column sum of the normalized frequency matrix of a transaction-set block is less than k+1, that column is deleted directly after L_k has been generated, so it is skipped when C_{k+1} is generated.
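Strategies 3 and 4 can be sketched on the normalized form of a block. The sketch assumes the block is held as a 0-1 matrix `E` (rows are items, columns are distinct transactions) with column coefficients `lam`, so that a column sum of `E` is the length of that transaction; the function name and sample data are illustrative.

```python
def prune_for_next_level(features, E, lam, k):
    """Prepare a block for generating C_{k+1} after L_k has been found.

    Strategy 3: drop the block (return None) if it has fewer than k+1 items.
    Strategy 4: drop every column whose transaction has fewer than k+1 items.
    """
    if len(features) < k + 1:
        return None
    col_sums = [sum(row[c] for row in E) for c in range(len(lam))]
    keep = [c for c, s in enumerate(col_sums) if s >= k + 1]
    return features, [[row[c] for c in keep] for row in E], [lam[c] for c in keep]

features = ["A", "B", "C"]
E = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]          # columns: AB, ABC, BC
lam = [1, 1, 2]

print(prune_for_next_level(features, E, lam, 2))
# (['A', 'B', 'C'], [[1], [1], [1]], [1])
print(prune_for_next_level(features, E, lam, 3))
# None
```

With k = 2 only the three-item transaction ABC can still support a 3-itemset, so the other two columns are removed; with k = 3 the block itself is skipped.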

(2) Design scanning strategy 5: traditional algorithms must loop over the transaction set to compute Count(C_k). With the frequency matrix, the matrix properties allow the comparison scan to be skipped and the result to be obtained directly from a product.

Define the function jo(x):

Scanning strategy 5: by the frequency matrix properties given in step 302, to compute Count(s_i), s_i ∈ C_k, for a transaction-set block, decompose s_i over the feature vector into a frequency vector; the product of the frequency vector and the frequency matrix then yields Count(s_i) without a comparison loop:

For example, for block T₁ of Example 5, compute Count(AC), AC ∈ C₂:

With scanning strategy 5, the scan loop performs a product operation directly instead of scanning the frequency of each candidate one by one, which effectively improves mining efficiency.
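One way to realize such a product-based count is sketched below. It is an assumption-laden illustration, not the formula of the invention: the definition of jo(x) is not reproduced in this text, so the sketch substitutes a simple indicator (a column counts when its dot product with the candidate's 0-1 vector equals the candidate's size), and the block data are the assumed values ABC once, AB once, BC twice.

```python
def count_itemset(itemset, features, E, lam):
    """Support of `itemset` in one block, using the normalized matrix E and lam."""
    g = [1 if f in itemset else 0 for f in features]   # 0-1 vector of the candidate
    size = sum(g)
    total = 0
    for c in range(len(lam)):
        overlap = sum(g[j] * E[j][c] for j in range(len(features)))  # (g . E)[c]
        if overlap == size:        # transaction c contains every item of the candidate
            total += lam[c]        # weight by how often that transaction occurs
    return total

features = ["A", "B", "C"]
E = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]          # columns: AB, ABC, BC
lam = [1, 1, 2]

print(count_itemset({"A", "C"}, features, E, lam))  # 1
print(count_itemset({"A", "B"}, features, E, lam))  # 2
print(count_itemset({"B", "C"}, features, E, lam))  # 3
```

With a minimum support of 2, AC would be rejected while AB and BC would be kept, and no pass over the raw transaction list is needed.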

Step 4: overall design of the BFM algorithm

Step 1 determines the mining object, step 2 gives the block mining strategy based on maximal service paths, and step 3 gives the scanning strategy based on frequency matrices. On this basis, this step designs the overall BFM (Blocking Frequency Matrix) mining method.

Step 401: description of the algorithm flow

The algorithm first traverses the database, generates the maximal service paths, and uses them to divide the database into blocks. Following the definition of the frequency matrix, the fault itemsets of each block are decomposed into a feature vector and a frequency matrix. Scanning strategies 1 and 2 then simplify the data before the search, based on the row sums of the block matrices: strategy 1 deletes blocks that do not meet the condition, and strategy 2 reduces the order of the block matrices.

The frequent 1-itemsets of each block are generated first, and the search iteration loop begins. Scanning strategy 3 skips the blocks that do not meet the condition, and scanning strategy 4 further reduces the order of the frequency matrix. On the optimized block, the join and prune steps generate the candidates, scanning strategy 5 computes the candidate frequencies, and the frequent itemsets are generated. The loop within a block terminates when the current frequent itemset is empty, the loop over blocks terminates when all blocks are done, and finally all frequent itemsets are returned. Fig. 9 shows the flow of the BFM algorithm.

Step 402: algorithm pseudocode

Input: transaction database D, minimum support Sup_min

Output: all frequent itemsets L

FOR EACH maximal service path index i {
    XR_i = find_i_maxpath(D);                  // generate the i-th maximal service path
    T_i = find_i_block(XR_i, D);               // generate the i-th transaction-set block
    It_i = find_i_featurevector(T_i);          // generate the feature vector of the i-th block
    Q_i = find_i_frequentmatrix(It_i, T_i);    // generate the frequency matrix of the i-th block
    [n_i, m_i] = size(Q_i);                    // compute the size of the i-th frequency matrix
}
N = NUM(XR_i);                                 // compute the number of maximal service paths
FOR (k = 1; k ≤ N; k++) {
    IF (every row sum of Q_k < Sup_min) {delete Q_k, It_k;}    // scanning strategy 1: simplify {T_k} using the row sums of Q_k
    ELSE {
        FOR EACH row i of Q_k {
            IF (row sum i of Q_k < Sup_min) {delete row i of Q_k and item i of It_k;}
        }                                      // scanning strategy 2: simplify T_k using the row sums of Q_k
    }
}
FOR EACH Q_k(i, j) {
    IF (Q_k(i, j) = 0) {E_k(i, j) = 0;}
    ELSE {E_k(i, j) = 1; λ_k(j) = Q_k(i, j);}  // normalize Q_k into the matrix E_k and the coefficients λ_k
}
FOR (k = 1; k ≤ N; k++) {                      // begin the search iteration of block k
    generate L_k,1 from the row sums of Q_k;   // frequent 1-itemsets
    compute the column sums M_k of E_k;
    FOR (l = 2; L_k,(l-1) ≠ ∅; l++) {          // search iteration for the frequent l-itemsets
        IF (n_k < l) {break;}                  // scanning strategy 3: skip the scan of block k
        ELSE {
            FOR EACH M_k(j) {
                IF (M_k(j) < l) {delete column j of E_k and λ_k(j);}    // scanning strategy 4: simplify T_k using the column sums of E_k
            }
        }
        C_k,l = apriori_gen(L_k,(l-1), Sup_min);   // join and prune steps generate the candidate l-itemsets
        FOR EACH s_i ∈ C_k,l {
            g_i = find_vector(s_i, It_k);      // convert the itemset s_i into a vector
            compute Count(s_i) from g_i, E_k and λ_k;    // scanning strategy 5: obtain the frequency of s_i by matrix operation
        }
        L_k,l = {s_i ∈ C_k,l | Count(s_i) ≥ Sup_min};    // return the frequent l-itemsets
    }
    L_k = ∪ L_k,l;                             // return all frequent itemsets of block k
}
L = ∪ L_k;                                     // return all frequent itemsets of the database
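The overall flow can also be sketched as a compact, runnable Python routine. It is an interpretation of the flow under several assumptions: the blocks are supplied already partitioned, columns are kept as sets with their coefficients rather than as an explicit matrix, strategy 5 is expressed as a weighted subset test instead of a matrix product, and the names `mine_block` and `bfm` as well as the sample blocks are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_block(block, sup_min):
    """Frequent itemsets (with supports) of one transaction-set block."""
    counts = Counter(frozenset(t) for t in block)
    cols = list(counts)                          # distinct transactions (columns)
    lam = [counts[t] for t in cols]              # column coefficients
    items = sorted({i for t in cols for i in t})
    # Strategies 1 and 2: keep only items whose row sum reaches sup_min.
    row_sum = {i: sum(l for t, l in zip(cols, lam) if i in t) for i in items}
    items = [i for i in items if row_sum[i] >= sup_min]
    frequent = {frozenset([i]): row_sum[i] for i in items}
    level, k = list(frequent), 1
    while level and len(items) >= k + 1:         # strategy 3
        # Strategy 4: transactions with fewer than k+1 kept items are dropped.
        live = [(t & set(items), l) for t, l in zip(cols, lam)]
        live = [(t, l) for t, l in live if len(t) >= k + 1]
        # Join step, then prune candidates that have an infrequent k-subset.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))]
        level = []
        for c in candidates:
            support = sum(l for t, l in live if c <= t)   # strategy 5, set form
            if support >= sup_min:
                frequent[c] = support
                level.append(c)
        k += 1
    return frequent

def bfm(blocks, sup_min):
    """Mine every block in isolation and return the results per block."""
    return {name: mine_block(b, sup_min) for name, b in blocks.items()}

blocks = {
    "T1": [["A", "B", "C"], ["A", "B"], ["B", "C"], ["B", "C"]],
    "T2": [["A", "D"], ["A", "E"]],
}
result = bfm(blocks, sup_min=2)
print(sorted(("".join(sorted(s)), n) for s, n in result["T1"].items()))
# [('A', 2), ('AB', 2), ('B', 4), ('BC', 3), ('C', 3)]
print(sorted(("".join(sorted(s)), n) for s, n in result["T2"].items()))
# [('A', 2)]
```

Item A is frequent in both blocks, but its support is counted separately in each, and no candidate mixes items from T1 with items from T2.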

The invention provides an association rule mining method based on service paths and frequency matrices for efficiently and accurately mining and presenting the associations between dynamically associated onboard network faults.

Having found that the efficiency of the Apriori algorithm is limited mainly by external-memory scanning and noise combinations, the invention proceeds as follows:

Step 1 constructs the data model of dynamically associated faults and analyzes the judgment features, which defines the mining object and identifies where mining efficiency can be optimized;

Step 2 analyzes the service-path-based association features of onboard network faults and gives a blocking strategy that avoids noise combinations, so that the algorithm filters out most noise information before the search iteration;

Step 3 decomposes each block into a feature vector and a frequency matrix and, combining matrix operations with the properties of frequent itemsets, gives a loop strategy that greatly reduces external-memory scans, loop iterations, and comparison operations;

Step 4 combines the above strategies into the BFM algorithm.

The association rule mining method built from these four steps effectively increases the speed of frequent itemset mining.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the dynamically associated fault mining method based on service paths and frequency matrices.

Fig. 2 is a schematic diagram of onboard network service paths.

Fig. 3 is a schematic diagram of fault dynamics based on service paths.

Fig. 4 is a schematic diagram of fault association based on service paths.

Fig. 5 is the network topology of Example 1.

Fig. 6 is a schematic diagram of service paths related by inclusion.

Fig. 7 is a schematic diagram of intersecting service paths.

Fig. 8 is the network topology of Example 4.

Fig. 9 is the flow chart of the BFM algorithm.

Fig. 10 is a schematic comparison of execution time under different supports.

Fig. 11 is a schematic comparison of the number of frequent itemsets under different supports.

Embodiment

The invention is further described below with reference to the accompanying drawings and preferred embodiments. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope or application of the invention.

Embodiment 1

To verify the efficiency of the BFM algorithm, the 2009-2011 aircraft fault database of a research institute was chosen as the experimental data. The Apriori algorithm and the BFM algorithm were implemented in Matlab on a computer with 8 GB of memory, an Intel Core i5-3337U 1.87GHz CPU and a 64-bit Windows 10 operating system.

With the minimum confidence set to 0.3, Table 9 gives an example of part of the simulation results:

Table 9: association rule results

Here, A31 denotes "flight management signal abnormal" and A67 denotes "AFDX load signal failure"; the corresponding service path is "flight management system → AFDX1 → Remote 1 → inertial navigation". Comparison with the troubleshooting records of the fault process shows that the mined rule "A31 → A67" is valid and credible.
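For reference, the minimum confidence used here presumably refers to the usual rule confidence of association rule mining. Under that assumption, and writing Count(·) for the number of transactions that contain all items of an itemset, a rule such as A31 → A67 is retained when:

$$\mathrm{conf}(A \rightarrow B) = \frac{\mathrm{Count}(A \cup B)}{\mathrm{Count}(A)} \ge 0.3$$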

As shown in Fig. 10, the BFM algorithm is more efficient than the Apriori algorithm, and as the support increases its advantage first becomes more pronounced and then fades. This is because, as the minimum support increases, it gradually exceeds the frequency of most items in the database; the first scan after partitioning lets the BFM algorithm filter out most blocks and itemsets, so its efficiency advantage over the Apriori algorithm becomes more evident. Once the minimum support exceeds the frequency of every item, both algorithms determine all frequent itemsets in a single iteration; because BFM performs the partitioning operation before iterating, whereas Apriori starts iterating directly, the advantage fades. In engineering practice, the mining support required for associated faults in airborne networks is generally within 20%, and in that range the BFM algorithm is much more efficient than the Apriori algorithm.

As shown in Fig. 11, the BFM algorithm produces fewer frequent itemsets than the Apriori algorithm, and the gap narrows as the support increases. This is because, as the minimum support increases, the search iteration runs fewer loops and generates fewer noise combinations, so the "denoising" advantage of the BFM algorithm becomes less and less evident.

In summary, the BFM algorithm can support the mining of associated faults in airborne networks and is more efficient than the traditional algorithm.

Claims

1. A dynamically associated fault mining method based on service paths and a frequency matrix, characterized in that the method comprises the following steps:

Step 1: construct the data model of dynamically associated faults;

Step 101: clarify the definition of dynamically associated faults

A dynamically associated fault of an airborne network is defined as follows: during airborne network operation, dynamic logical connections exist between the applications that use avionics service flows, so that under a given test stress, when a fault occurs, it triggers new faults by association along the current service path; as the service path changes dynamically, the faults exhibit dynamic dependence and continuous propagation;

Service path, dynamics and association are defined respectively as follows:

Service path: the logical connection relation among the applications in an avionics service flow under a given test stress;

Dynamics: under the same test stress, fault occurrence is closely related to the service path, and the dynamics of the service path give rise to the dynamics of the faults;

Association: within the same test service, multiple faults may be caused by the same fault, i.e. faults under the same test stress and the same service path are associated;

Step 102: determine the judgment features of dynamically associated faults

According to the analysis of step 101, airborne network faults exhibit dynamic association based on service paths, and their association rules are closely related to the fault attributes "service path" and "test stress"; the following judgment features are therefore given:

Feature 1: two faults whose service paths do not intersect and whose test stresses differ cannot be triggered by direct association;

Feature 2: for two faults triggered by indirect association, the association can be found through a finite number of fault pairs triggered by direct association;

Step 103: fault data preprocessing and model construction

A fault mainly comprises attributes such as fault phenomenon, faulty device, service path and stress condition; according to the needs of the mining method, the fault information model F is constructed as follows:

F={ S, X, N, Δ }

where S = {S₁, S₂, ..., S_i, ...} is the fault attribute "test ID"; X = {X₁, X₂, ..., X_i, ...} is the fault attribute "test stress"; I = {A, B, ...} is the "fault phenomenon"; N = {N₁, N₂, ..., N_i, ...} is the fault attribute "service path", with N_i = (n_j, n_k, ...) the service path of fault i, where n_j, n_k ∈ n and n = {1, 2, ...} are the faulty node devices; the sequence of each parameter is formed by mapping the distinct values of the corresponding field in the original database in dictionary order;

According to the judgment features given in step 102, it can be decided in advance whether a dynamic association may exist between two faults, thereby avoiding noise combinations in the join step of frequent itemset mining;

Taking each test as one transaction, a transactional database, i.e. a transaction set, is formed; by demonstrating the process of generating L₂ from L₁, it is shown that applying feature 1 of step 102 can greatly improve mining efficiency; let the minimum support Sup_min = 2;

(1) scan the transaction set of the transactional database and generate the list of frequent 1-itemsets L₁

(2) perform the first join step and generate the candidate 2-itemsets C₂

According to judgment feature 1, some of the resulting fault combinations have service paths that do not intersect at all and completely different test stresses; they cannot be associated faults and therefore cannot be frequent itemsets; these combinations are purely noise generated by the join step;

In the search iteration, every step of the algorithm generates a large amount of similar noise by combination and then eliminates most of it one by one through pruning, which greatly reduces mining efficiency and accuracy, although this noise is avoidable;

Therefore, in dynamically associated fault mining, it is feasible to use the judgment features of service paths to reduce noise combinations;
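The effect of feature 1 on the join step can be sketched as follows. This is a minimal illustration, not the patented implementation: the item labels, test stresses and service paths in `attrs` are hypothetical, and `is_noise_pair` and `candidate_pairs` are names introduced only for this sketch.

```python
from itertools import combinations

# Hypothetical attributes of four frequent 1-items:
# item label -> (test stress, service path as a tuple of node ids)
attrs = {
    "A": ("X1", (1, 2, 3)),
    "B": ("X1", (2, 3)),
    "C": ("X3", (5, 6, 7)),
    "D": ("X3", (5, 6)),
}

def is_noise_pair(a, b):
    """Feature 1: different test stress AND disjoint service paths
    means the two faults cannot be directly associated."""
    stress_a, path_a = attrs[a]
    stress_b, path_b = attrs[b]
    return stress_a != stress_b and not (set(path_a) & set(path_b))

def candidate_pairs(frequent_items):
    """Join step for C2 that skips pairs ruled out by feature 1."""
    return [pair for pair in combinations(sorted(frequent_items), 2)
            if not is_noise_pair(*pair)]

all_pairs = list(combinations(sorted(attrs), 2))
kept = candidate_pairs(attrs)
print(len(all_pairs), kept)  # 6 [('A', 'B'), ('C', 'D')]
```

With these sample attributes a plain join produces six candidate pairs, of which four are discarded before any support counting takes place.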

Step 2: design the block mining strategy based on maximal service paths;

Step 201: introduce the concept of the maximal service path

Maximal service path: after traversing the test stress and service path attributes of all faults in the fault database, if service paths R₁ to R_n have an inclusion relation and their test stress is X_i, and R_j contains the remaining (n-1) service paths, then (X_i, R_j) is a maximal service path;

Define the operation "service path vector R₂ contains R₁", written R₁@R₂: if (X₁ = X₂) & Len(R₁) ≤ Len(R₂) & R₁ ∈ Part(R₂, Len(R₁)), then R₁@R₂; here Part(R_i, k) denotes the set of contiguous fragments of length k of vector R_i (k ≤ Len(R_i)), and Part(R_i, k)_j denotes the j-th element of that set (j = 1, 2, ..., Len(R_i)-k+1);

For example, if R₁ = (1,2,3,4), then Len(R₁) = 4, Part(R₁,3) = {(1,2,3), (2,3,4)} and Part(R₁,3)₂ = (2,3,4); if R₂ = (2,3), R₃ = (3,4,5) and X₁ = X₂ = X₃, then R₂@R₁ holds and R₃@R₁ does not;
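The Part and @ operations can be sketched in a few lines. The function names `part` and `contained_in` are introduced here for illustration only; the fragment-membership test is how this sketch reads the containment condition, chosen because it reproduces the example above (R₂@R₁ holds, R₃@R₁ does not).

```python
def part(r, k):
    """Part(R, k): all contiguous fragments of length k of path r, in order."""
    return [tuple(r[j:j + k]) for j in range(len(r) - k + 1)]

def contained_in(x1, r1, x2, r2):
    """R1 @ R2: same test stress, R1 no longer than R2,
    and R1 is one of the contiguous fragments of R2."""
    return (x1 == x2
            and len(r1) <= len(r2)
            and tuple(r1) in part(r2, len(r1)))

R1, R2, R3 = (1, 2, 3, 4), (2, 3), (3, 4, 5)
print(part(R1, 3))                      # [(1, 2, 3), (2, 3, 4)]
print(part(R1, 3)[1])                   # (2, 3, 4), i.e. Part(R1, 3)_2
print(contained_in("X", R2, "X", R1))   # True
print(contained_in("X", R3, "X", R1))   # False
```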

After traversal, the fault database presented above generates the following 4 maximal service paths:

{(X₁,(1,2,3)),(X₂,(2,3)),(X₃,(5,6,7)),(X₄,(5,6))}
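One way to obtain such a set is to keep every (test stress, service path) pair that is not strictly contained in another pair with the same stress. The sketch below assumes that reading; the `records` list is hypothetical sample data chosen so that the result coincides with the four pairs listed above, and the function names are illustrative.

```python
def is_fragment(short, long):
    """True if `short` occurs as a contiguous fragment of `long`."""
    k = len(short)
    return any(tuple(long[j:j + k]) == tuple(short)
               for j in range(len(long) - k + 1))

def maximal_service_paths(records):
    """records: iterable of (stress, path). Returns the pairs that are not
    strictly contained in another pair with the same test stress."""
    unique = sorted({(x, tuple(r)) for x, r in records})
    maximal = []
    for x, r in unique:
        dominated = any(x == x2 and r != r2 and is_fragment(r, r2)
                        for x2, r2 in unique)
        if not dominated:
            maximal.append((x, r))
    return maximal

# Hypothetical (stress, path) attributes collected from a fault database
records = [
    ("X1", (1, 2, 3)), ("X1", (2, 3)), ("X1", (1, 2)),
    ("X2", (2, 3)),
    ("X3", (5, 6, 7)), ("X3", (6, 7)),
    ("X4", (5, 6)),
]
result = maximal_service_paths(records)
print(result)
# [('X1', (1, 2, 3)), ('X2', (2, 3)), ('X3', (5, 6, 7)), ('X4', (5, 6))]
```

Note that (X₂, (2,3)) survives even though (2,3) is a fragment of (1,2,3), because the two paths belong to different test stresses.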

Step 202: clarify the property of maximal service paths

Property of maximal service paths: two faults triggered by direct association occur, with high probability, under the same maximal service path, and this probability tends to 100% as testing proceeds and the fault database grows;

Step 203: design the block mining strategy

The following block mining strategy is proposed:

(1) after the transaction set is generated, traverse the database:

$$\text{if: } \forall i = 1, 2, \ldots, n,\ R_i @ R_n,\ \text{then: } (X_n, R_n)$$

to obtain the set of maximal service paths:

{(X₁,R₁),(X₂,R₂),...,(X_k,R_k)}

(2) using the set of maximal service paths, divide the database into k blocks,

(3) isolate the mining operations of the blocks from one another; transaction frequencies are accumulated only within the same transaction set block;
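The partition strategy above can be sketched as follows, assuming each fault row carries a test ID, a test stress, a service path and a fault item. The row layout, the sample data and the names `partition` and `is_fragment` are assumptions made for this illustration; a row that matches no maximal service path is simply skipped in this sketch.

```python
from collections import defaultdict

def is_fragment(short, long):
    """True if `short` occurs as a contiguous fragment of `long`."""
    k = len(short)
    return any(tuple(long[j:j + k]) == tuple(short)
               for j in range(len(long) - k + 1))

def partition(fault_rows, maximal_paths):
    """Assign each fault row to the block of the maximal service path that
    contains it, then group the rows of a block by test ID into transactions.
    Returns {block index: {test id: set of fault items}}."""
    blocks = defaultdict(dict)
    for test_id, stress, path, item in fault_rows:
        for k, (max_stress, max_path) in enumerate(maximal_paths):
            if stress == max_stress and is_fragment(path, max_path):
                blocks[k].setdefault(test_id, set()).add(item)
                break  # assign the row to the first matching block only
    return dict(blocks)

# Hypothetical fault rows: (test id, test stress, service path, fault item)
rows = [
    ("S1", "X1", (1, 2, 3), "A"), ("S1", "X1", (2, 3), "B"),
    ("S2", "X1", (2, 3), "B"),
    ("S3", "X3", (5, 6, 7), "C"), ("S3", "X3", (6, 7), "D"),
    ("S4", "X3", (5, 6), "C"),
]
maximal = [("X1", (1, 2, 3)), ("X3", (5, 6, 7))]
blocks = partition(rows, maximal)
print(blocks)
```

Support counting then runs separately on `blocks[0]` and `blocks[1]`, so a pair such as (A, C) is never formed.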

Step 3: design the scanning strategy based on the frequency matrix;

Step 301: construct the frequency matrix and the feature vector

Feature vector: if a transaction set block T_i contains m mutually distinct transactions t₁, t₂, ..., t_m composed of n mutually distinct items I₁, I₂, ..., I_n, where {t_i} and {I_i} are sorted in dictionary order, then the n-dimensional vector (I₁, I₂, ..., I_n) is called the feature vector of T_i, denoted It_i; fault items are denoted by A, B, C, etc., so I_i = A, B, C, ...;

Frequency matrix: if a transaction set block T_i contains m mutually distinct transactions t₁, t₂, ..., t_m and its feature vector is It_i = (I₁, I₂, ..., I_n), then there exists a unique matrix Q_i = (q_jk)_n×m such that

$$It_i \times Q_i = (I_1, I_2, \ldots, I_n) \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1m} \\ q_{21} & q_{22} & \cdots & q_{2m} \\ \vdots & \vdots & & \vdots \\ q_{n1} & q_{n2} & \cdots & q_{nm} \end{pmatrix} \Leftrightarrow T_i$$

Then Q_i is called the frequency matrix of T_i, where q_jk ∈ N denotes the frequency of the j-th item I_j in the k-th transaction t_k of T_i;

Therefore, a transaction set block can be decomposed into one feature vector and one frequency matrix; after partitioning, in the search iteration each block only needs its feature vector to be scanned once and one matrix operation to be performed;
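The decomposition of a block into a feature vector and a frequency matrix can be sketched as below. The sample block and the name `decompose` are hypothetical, and the columns are ordered by first appearance of each distinct transaction, which is a simplification of this sketch rather than the ordering rule of the definition.

```python
def decompose(block):
    """block: list of transactions, each an iterable of item labels.
    Returns (feature_vector, distinct_transactions, Q), where Q[i][j] is the
    occurrence count of distinct transaction j if it contains item i, else 0."""
    counts = {}
    for t in block:
        key = tuple(sorted(set(t)))
        counts[key] = counts.get(key, 0) + 1   # dict keeps first-seen order
    items = sorted({i for t in counts for i in t})
    transactions = list(counts)
    q = [[counts[t] if i in t else 0 for t in transactions] for i in items]
    return items, transactions, q

# Hypothetical block: ABC once, AB once, BC twice
items, transactions, q = decompose(["ABC", "AB", "BC", "BC"])
print(items)         # ['A', 'B', 'C']
print(transactions)  # [('A', 'B', 'C'), ('A', 'B'), ('B', 'C')]
print(q)             # [[1, 1, 0], [1, 1, 2], [1, 0, 2]]
```

Each column of `q` describes one distinct transaction and each row one item, so the whole block is stored once regardless of how often a transaction repeats.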

For the transaction set blocks shown below, using the frequency matrix, each block can be decomposed to obtain its transaction set, feature vector and frequency matrix; that is, the feature vector of transaction set T₁ is (A, B, C), and its frequency matrix is

The feature vector of transaction set T₂ is (A, D, E), and its frequency matrix is

The feature vector of transaction set T₃ is (A, F, G), and its frequency matrix is

Where:

Observing the matrix, one finds that the sum of each row is Count(C₁), the second and third columns give Count(C₂), and the first column gives Count(C₃); the frequency matrix therefore both retains the frequency information and reflects the association information between items, so that a scan no longer needs to traverse the database and compare entries one by one, but can be completed by matrix operations alone;

Step 302: analyse the properties of the frequency matrix

The frequency matrix has the following properties:

(1) the sum of each row equals the frequency of the corresponding item:

$$\sum_{j=1}^{m} q_{ij} = \mathrm{Count}(I_i)$$

(2) normalization of the frequency matrix:

$$Q_i = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1m} \\ q_{21} & q_{22} & \cdots & q_{2m} \\ \vdots & \vdots & & \vdots \\ q_{n1} & q_{n2} & \cdots & q_{nm} \end{pmatrix} = \left( \lambda_1 \begin{pmatrix} e_{11} \\ e_{21} \\ \vdots \\ e_{n1} \end{pmatrix}, \lambda_2 \begin{pmatrix} e_{12} \\ e_{22} \\ \vdots \\ e_{n2} \end{pmatrix}, \ldots, \lambda_m \begin{pmatrix} e_{1m} \\ e_{2m} \\ \vdots \\ e_{nm} \end{pmatrix} \right)$$

$$q_{ij} = \lambda_j e_{ij}, \quad e_{ij} \in \{0, 1\}, \quad \lambda_j \in N_+$$

$$\lambda_j = \mathrm{Count}(t_j), \quad t_j = (I_1, I_2, \ldots, I_n) \times \begin{pmatrix} e_{1j} \\ e_{2j} \\ \vdots \\ e_{nj} \end{pmatrix}$$

For transaction set block T₁:

Count(ABC) = 1, Count(AB) = 1, Count(BC) = 2

It can be seen that using the frequency matrix and its operational properties significantly simplifies the comparison calculations in the loop scan;
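The normalization q_ij = λ_j e_ij can be checked numerically. The matrix `q` below is an assumption of this sketch: it is built to agree with the counts Count(ABC) = 1, Count(AB) = 1, Count(BC) = 2 stated for T₁, with columns ordered ABC, AB, BC; `normalize` is an illustrative name.

```python
def normalize(q):
    """Split Q into a 0/1 matrix E and per-column counts lam with q_ij = lam_j * e_ij."""
    n, m = len(q), len(q[0])
    lam = [max(q[i][j] for i in range(n)) for j in range(m)]
    e = [[1 if q[i][j] > 0 else 0 for j in range(m)] for i in range(n)]
    return e, lam

# Rows: items A, B, C. Columns: transactions ABC, AB, BC (assumed order).
q = [[1, 1, 0],
     [1, 1, 2],
     [1, 0, 2]]
e, lam = normalize(q)
row_sums = [sum(row) for row in q]

print(e)         # [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
print(lam)       # [1, 1, 2]
print(row_sums)  # [2, 4, 3] -> Count(A), Count(B), Count(C) by property (1)
```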

Step 303: analyse the properties of frequent itemsets

Frequent itemset mining has the following properties:

(1) every non-empty subset of a frequent itemset is also frequent, and every superset of an infrequent itemset is also infrequent;

Inference 1A: if the frequency of every item in a transaction set block is below the minimum support, the block contains no frequent itemset and can be deleted directly;

Inference 1B: if the frequency of some item in a transaction set block is below the minimum support, every transaction containing that item is infrequent, and the item can be deleted directly from the block;

(2) if the length of a transaction is less than k, the transaction does not support L_k+1;

Inference 2A: if the feature vector length of a transaction set block is less than k+1, the maximal frequent itemsets of the block are L_k, and the block can be deleted directly in subsequent loops;

Inference 2B: if a transaction set block contains a transaction whose length is less than k+1, that transaction can be deleted directly in the loops that follow the generation of L_k for the block;
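These inferences translate into simple pruning operations on a block's matrices. The sketch below is one possible reading, not the patented implementation: `prune_by_support` covers inferences 1A and 1B on the frequency matrix, `prune_by_length` covers inferences 2A and 2B on the normalized 0/1 matrix, and both function names and the sample values are hypothetical. Merging columns that become identical after an item is removed is left out for brevity.

```python
def prune_by_support(items, q, sup_min):
    """Inferences 1A/1B: drop items (rows) whose row sum is below sup_min.
    Returns None when no item survives, i.e. the whole block can be dropped."""
    kept = [(item, row) for item, row in zip(items, q) if sum(row) >= sup_min]
    if not kept:
        return None
    return [item for item, _ in kept], [row for _, row in kept]

def prune_by_length(items, e, lam, k):
    """Inferences 2A/2B, applied after L_k has been generated: drop the block if
    it has fewer than k+1 items, else drop transactions (columns) with fewer
    than k+1 items."""
    if len(items) < k + 1:
        return None
    cols = [j for j in range(len(lam))
            if sum(row[j] for row in e) >= k + 1]
    return [[row[j] for j in cols] for row in e], [lam[j] for j in cols]

items = ["A", "B", "C"]
q = [[1, 1, 0], [1, 1, 2], [1, 0, 2]]
e = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
lam = [1, 1, 2]

print(prune_by_support(items, q, 3))    # (['B', 'C'], [[1, 1, 2], [1, 0, 2]])
print(prune_by_support(items, q, 5))    # None
print(prune_by_length(items, e, lam, 2))  # ([[1], [1], [1]], [1])
print(prune_by_length(items, e, lam, 3))  # None
```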

Step 304: design the scanning strategies

Based on the frequency matrix and frequent itemset properties given in steps 301, 302 and 303, this step designs five scanning strategies to greatly increase the scanning speed in mining;

(1) design scanning strategies 1 to 4: use the minimum support to compress the frequency matrix and the feature vector, thereby compressing the transaction set block,

Scanning strategy 1: according to inference 1A, if every row sum of the frequency matrix of a transaction set block is less than Sup_min, delete the block directly:

$$\text{if: } \max_i \left( \sum_{j=1}^{m} q_{ij} \right) < Sup_{min},\ q_{ij} \in Q_k,\ \text{then: delete } T_k$$

Scanning strategy 2: according to inference 1B, if some item of a transaction set block has a frequency less than the minimum support, i.e. the corresponding row sum of the frequency matrix is less than Sup_min, then delete that item from the block's feature vector and delete that row from the frequency matrix, thereby compressing the block:

$$\text{if: } \exists i \le n,\ \sum_{j=1}^{m} q_{ij} < Sup_{min},\ \text{then: delete } I_i,\ q_{ij}|_{j \le m}$$

Scanning strategy 3: according to inference 2A, if the feature vector length of a transaction set block is less than k+1, then after L_k is generated, delete the block directly:

Generate C_k+1：

Scanning strategy 4: according to inference 2B, if some column sum of the decomposed frequency matrix of a transaction set block is less than k+1, then after L_k is generated, delete that column directly:

Generate C_k+1：

(2) design scanning strategy 5: when computing Count(C_k), the traditional algorithm has to scan the transaction set in a loop; after the frequency matrix is introduced, the matrix properties can be used to skip the comparison scan and obtain the result directly by a product;

Define the function jo(x):

Scanning strategy 5: according to the frequency matrix properties given in step 302, when computing Count(s_i), s_i ∈ C_k, for a transaction set block, s_i is decomposed by the feature vector into a frequency vector, and the product of the frequency vector and the frequency matrix yields Count(s_i) without loop comparison:

$$\forall s_i \in C_k,\quad s_i = (I_1, I_2, \ldots, I_n) \begin{pmatrix} g_{i1} \\ g_{i2} \\ \vdots \\ g_{in} \end{pmatrix}$$

$$\begin{aligned} \mathrm{Count}(s_i) &= \frac{1}{2}\, jo\!\left( \begin{pmatrix} g_{i1} \\ g_{i2} \\ \vdots \\ g_{in} \end{pmatrix}^{T} \begin{pmatrix} e_{11} & e_{12} & \cdots & e_{1m} \\ e_{21} & e_{22} & \cdots & e_{2m} \\ \vdots & \vdots & & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{nm} \end{pmatrix} \right) \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_m \end{pmatrix} \\ &= \frac{1}{2} \lambda_1\, jo\!\left( \begin{pmatrix} g_{i1} \\ g_{i2} \\ \vdots \\ g_{in} \end{pmatrix}^{T} \begin{pmatrix} e_{11} \\ e_{21} \\ \vdots \\ e_{n1} \end{pmatrix} \right) + \cdots + \frac{1}{2} \lambda_m\, jo\!\left( \begin{pmatrix} g_{i1} \\ g_{i2} \\ \vdots \\ g_{in} \end{pmatrix}^{T} \begin{pmatrix} e_{1m} \\ e_{2m} \\ \vdots \\ e_{nm} \end{pmatrix} \right) \end{aligned}$$

For transaction set block T₁, compute Count(AC), AC ∈ C₂:

$$AC \Leftrightarrow (A, B, C) \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$

$$\begin{aligned} \mathrm{Count}(AC) &= \frac{1}{2}\, jo\!\left( \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}^{T} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \\ &= \frac{1}{2} \left( 1 \times jo(2) + 1 \times jo(1) + 2 \times jo(1) \right) \\ &= \frac{1}{2} \left( 1 \times 2 + 1 \times 0 + 2 \times 0 \right) = 1 \end{aligned}$$

With scanning strategy 5, the product is computed directly in the loop scan, without scanning for the frequency of each frequent itemset one by one, which effectively improves mining efficiency;
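The matrix-based counting can be sketched as follows. Because the definition of jo(x) is not reproduced in this text, the sketch replaces it with an explicit test, which is an assumption: a transaction column supports a candidate when the product of the candidate's 0/1 vector with that column equals the candidate's size. This reproduces Count(AC) = 1 from the worked example; `count_support` is an illustrative name.

```python
def count_support(g, e, lam):
    """Support count of the candidate encoded by the 0/1 vector g.
    hits[j] is the j-th entry of g^T E, i.e. how many candidate items
    transaction j contains; the transaction counts only on a full match."""
    size = sum(g)
    n, m = len(e), len(lam)
    hits = [sum(g[i] * e[i][j] for i in range(n)) for j in range(m)]
    return sum(l for h, l in zip(hits, lam) if h == size)

# Normalized matrix and counts of block T1 from the worked example
e = [[1, 1, 0],
     [1, 1, 1],
     [1, 0, 1]]
lam = [1, 1, 2]

print(count_support([1, 0, 1], e, lam))  # AC -> 1
print(count_support([1, 1, 0], e, lam))  # AB -> 2
print(count_support([0, 1, 1], e, lam))  # BC -> 3
```

For AC the intermediate product g^T E is (2, 1, 1), the same values that appear as the arguments of jo in the example.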

Step 4: overall design of the BFM algorithm

Step 401: description of the algorithm flow

The algorithm first traverses the database, generates the maximal service paths, and uses them to partition the database into blocks; according to the definition of the frequency matrix, the fault itemsets of each block are decomposed into a feature vector and a frequency matrix; scanning strategies 1 and 2 are then used for simplification before the search: according to the row sums of each block matrix, scanning strategy 1 deletes the blocks that do not satisfy the condition, and scanning strategy 2 reduces the order of the block matrices;

The frequent 1-itemsets of each block are generated first, and the search iteration loop then begins; scanning strategy 3 directly skips the blocks that do not satisfy the loop condition, and scanning strategy 4 further reduces the order of the frequency matrix; the optimized blocks undergo the join step and the prune step to generate candidates, whose frequencies are computed by scanning strategy 5 to generate the frequent itemsets; the loop within a block ends when the current frequent itemset is empty, the loop over blocks ends when all blocks have been processed, and finally all frequent itemsets are returned; Fig. 9 shows the flow of the BFM algorithm;
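The per-block loop described above can be sketched in simplified form. This is an illustrative reduction under stated assumptions, not the BFM algorithm itself: it mines a single block that has already been decomposed into `items`, `e` and `lam`, applies only the stopping rule of scanning strategy 3, and omits strategies 1, 2 and 4; `support` and `mine_block` are hypothetical names.

```python
from itertools import combinations

def support(itemset, items, e, lam):
    """Sum the counts of the transaction columns that contain every item."""
    rows = [items.index(i) for i in itemset]
    return sum(l for j, l in enumerate(lam) if all(e[r][j] for r in rows))

def mine_block(items, e, lam, sup_min):
    """Level-wise search inside one block; returns {itemset: count}."""
    frequent = {}
    level = [(i,) for i in items if support((i,), items, e, lam) >= sup_min]
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, items, e, lam)
        if len(items) < k + 1:          # scanning strategy 3: block exhausted
            break
        prev = set(level)
        pool = sorted({i for s in level for i in s})
        # join step plus prune step: keep candidates whose k-subsets are all frequent
        candidates = [c for c in combinations(pool, k + 1)
                      if all(sub in prev for sub in combinations(c, k))]
        level = [c for c in candidates
                 if support(c, items, e, lam) >= sup_min]
        k += 1
    return frequent

items = ["A", "B", "C"]
e = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
lam = [1, 1, 2]
result = mine_block(items, e, lam, sup_min=2)
print(result)
```

Running `mine_block` on every block and merging the returned dictionaries would give the overall result; with the sample block and a minimum support of 2, the candidate AC is rejected at level 2 and ABC is therefore never counted.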

Step 402: algorithm pseudocode

Input: transaction database D, minimum support Sup_min

Output: all frequent itemsets L.