CN106874491A - Device fault information mining method based on dynamic association rules - Google Patents


Info

Publication number
CN106874491A
Authority
CN
China
Prior art keywords
association rules
dynamic association
support
data set
item
Legal status
Pending
Application number
CN201710096618.6A
Other languages
Chinese (zh)
Inventor
王玲
彭开香
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Application filed by University of Science and Technology Beijing USTB
Priority to CN201710096618.6A
Publication of CN106874491A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The present invention provides a device fault information mining method based on dynamic association rules that can reflect how association rules change over time. The method comprises the following steps.

First, an equipment fault repair information data set D is obtained. D is then divided by appointment date into n sub-data sets $D_1, D_2, \dots, D_n$, where $D = \{D_1, D_2, \dots, D_n\}$.

Next, a dynamic association rule algorithm is defined. A dynamic association rule is expressed as $A \Rightarrow B\,(SV, CV, s, c)$, where A and B are itemsets, SV is the support vector, CV is the confidence vector, s is the support of the itemset, and c is the confidence of the association rule.

Finally, dynamic association rules are mined from $D_1, D_2, \dots, D_n$ according to the defined algorithm, yielding the associations between equipment fault causes and repair measures.

The invention applies to the field of data mining technology.

Description

Device fault information mining method based on dynamic association rules
Technical field
The present invention relates to the field of data mining technology, and in particular to a device fault information mining method based on dynamic association rules.
Background
As data mining technology has flourished, association rule techniques have developed rapidly. Association rule mining aims to discover associations and dependencies between items (variables) in large volumes of data. Rules mined by traditional association rule methods reflect the interdependence between variables, but not how the rules themselves change.

Take a fault information database as an example. Traditional association rule mining can discover the connections among devices, fault causes, and repair information in the database. This helps identify how devices, fault causes, and repair information are related, and helps determine the most common fault causes for a given device. However, traditional methods do not consider that association rules change over time: they treat the mined rules as permanently valid for the database.
Summary of the invention
The technical problem addressed by the present invention is to provide a device fault information mining method based on dynamic association rules. The method solves the prior-art problem that changes in association rules over time are not taken into account.
To solve the above technical problem, an embodiment of the present invention provides a device fault information mining method based on dynamic association rules, comprising:
obtaining an equipment fault repair information data set D;
dividing the obtained data set D into n sub-data sets $D_1, D_2, \dots, D_n$ by appointment date, where $D = \{D_1, D_2, \dots, D_n\}$;
defining a dynamic association rule algorithm, in which a dynamic association rule is expressed as $A \Rightarrow B\,(SV, CV, s, c)$, where A and B are itemsets, SV is the support vector, CV is the confidence vector, s is the support of the itemset, c is the confidence of the association rule, and $\Rightarrow$ is the implication symbol of the dynamic association rule;
mining dynamic association rules from the n sub-data sets $D_1, D_2, \dots, D_n$ according to the defined algorithm, to obtain the associations between equipment fault causes and repair measures.
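The partitioning step can be illustrated with a minimal sketch. It assumes that each repair record has already been reduced to an (appointment date, set of coded items) pair and that one calendar month is used as the time period. The function name `partition_by_period` and the monthly granularity are illustrative choices, not taken from the patent.

```python
from datetime import date

def partition_by_period(records, period_of=lambda d: (d.year, d.month)):
    """Split D into sub-data sets D_1..D_n by the period of each appointment date.

    records: iterable of (appointment_date, items) pairs.
    Returns a list of (period, transactions) pairs in chronological order,
    where each transaction is a frozenset of items.
    """
    buckets = {}
    for appointment_date, items in records:
        buckets.setdefault(period_of(appointment_date), []).append(frozenset(items))
    return [(period, buckets[period]) for period in sorted(buckets)]

# Usage with three hypothetical coded records (F* = fault cause, M* = repair measure)
D = [
    (date(2016, 1, 5), {"F1", "M1"}),
    (date(2016, 2, 3), {"F2", "M2"}),
    (date(2016, 1, 20), {"F1", "M3"}),
]
subsets = partition_by_period(D)
print([(period, len(transactions)) for period, transactions in subsets])
# [((2016, 1), 2), ((2016, 2), 1)]
```

Passing a different `period_of` function (for example, one returning the ISO week) changes the granularity without touching the rest of the pipeline.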
Further, obtaining the equipment fault repair information data set D comprises:
obtaining a raw data set of equipment fault repair information;
extracting target attribute data from the obtained raw data set;
preprocessing the extracted target attribute data, the preprocessing comprising: handling missing attribute values in the target attributes, handling values with inconsistent attribute formats, and/or removing redundant values.
Further, after the target attribute data are preprocessed, the method further comprises:
discretizing the preprocessed target attribute data into character codes to obtain the equipment fault repair information data set D.
Further, the support vector SV of the dynamic association rule $A \Rightarrow B$ is expressed as:
$$SV = [s_{(A\cup B)1}, s_{(A\cup B)2}, \dots, s_{(A\cup B)n}], \quad s_{(A\cup B)i} = \frac{f_{(A\cup B)i}}{|D_i|}, \quad \text{s.t. } i \in \{1, 2, \dots, n\}$$
where $s_{(A\cup B)i}$ is the support measure of itemset $A \cup B$ in sub-data set $D_i$, s.t. denotes the constraint, $f_{(A\cup B)i}$ is the number of times $A \cup B$ occurs in $D_i$, and $|D_i|$ is the number of records in $D_i$.
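Further, the confidence vector CV of the dynamic association rule $A \Rightarrow B$ is expressed as:
$$CV = [c_1, c_2, \dots, c_n], \quad c_i = \frac{s_{(A\cup B)i}}{s_{Ai}}, \quad \text{s.t. } i \in \{1, 2, \dots, n\}$$
where $c_i$ reflects the confidence measure of $A \cup B$ in sub-data set $D_i$, s.t. denotes the constraint, $s_{(A\cup B)i}$ is the i-th element of the SV of $A \cup B$, and $s_{Ai}$ is the i-th element of the SV of itemset A.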
Further, the support s of the dynamic association rule $A \Rightarrow B$ is expressed as:
$$s = \frac{1}{M}\sum_{i=1}^{n} f_{(A\cup B)i}$$
where M is the number of records in data set D and $f_{(A\cup B)i}$ is the number of times $A \cup B$ occurs in sub-data set $D_i$.
Further, the confidence c of the dynamic association rule $A \Rightarrow B$ is expressed as:
$$c = \frac{s_{(A\cup B)}}{s_A}$$
where $s_{(A\cup B)}$ is the support of itemset $A \cup B$ and $s_A$ is the support of itemset A.
Further, mining dynamic association rules from the n sub-data sets $D_1, D_2, \dots, D_n$ according to the defined algorithm, to obtain the associations between equipment fault causes and repair measures, comprises:
applying a frequent itemset generation algorithm to the sub-data set of each time period to generate the left-hand and right-hand itemsets of the dynamic association rules;
determining the support of the left-hand itemset of a dynamic association rule, expressed as:
$$s_L = \sum_{i=1}^{n} \frac{M_i}{M}\, s_{Li}$$
where $s_L$ is the support of the left-hand itemset, $s_{Li}$ is the element of the left-hand itemset's support vector for time period $t_i$, $M_i$ is the number of records in the sub-data set $D_i$ for time period $t_i$, and M is the total number of records in D;
if the support of the left-hand itemset is greater than a preset support threshold, determining the support of the full itemset (left-hand and right-hand items together) of the dynamic association rule, expressed as:
$$s_R = \sum_{i=1}^{n} \frac{M_i}{M}\, s_{Ri}$$
where $s_R$ is the support of the full itemset and $s_{Ri}$ is the element of the full itemset's support vector for time period $t_i$;
determining the confidence of the dynamic association rule by $c = s_R / s_L$, where c is the confidence of the dynamic association rule;
judging whether the confidence of the dynamic association rule is greater than a preset confidence threshold and, if so, analyzing the associations between equipment fault causes and repair measures according to the mined dynamic association rules.
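The aggregation and threshold steps above can be sketched in a few lines. The sketch assumes that the per-period support vectors of the left-hand itemset and of the full itemset are already available. The function names are hypothetical, and the strict `>` comparisons follow the wording of the steps.

```python
def weighted_support(per_period, period_sizes):
    """s = sum_i (M_i / M) * s_i with M = sum_i M_i (the s_L and s_R formulas)."""
    total = sum(period_sizes)
    return sum(m * s for m, s in zip(period_sizes, per_period)) / total

def evaluate_rule(sv_left, sv_full, period_sizes, min_sup, min_conf):
    """Return (s_R, c) if the rule passes both threshold checks, else None."""
    s_left = weighted_support(sv_left, period_sizes)
    if not s_left > min_sup:            # left-item support must exceed the threshold
        return None
    s_full = weighted_support(sv_full, period_sizes)
    confidence = s_full / s_left        # c = s_R / s_L
    return (s_full, confidence) if confidence > min_conf else None

# Two periods with 990 and 10 records (hypothetical counts)
sizes = [990, 10]
result = evaluate_rule([10 / 990, 10 / 10], [10 / 990, 9 / 10], sizes, 0.01, 0.5)
print(result)   # approximately (0.019, 0.95)
```

Because $M_i s_{Li}$ is just the count of the itemset in period $t_i$, the weighted sum equals the total count divided by M.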
Further, the method also comprises:
building a time-series regression model to predict the trend of the dynamic association rules.
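The patent does not specify the form of the time-series regression model. As one possibility, the sketch below fits an ordinary least-squares linear trend to a rule's per-period values (for example, its support vector) and extrapolates one period ahead. The linear model and the function names are assumptions; a real deployment might use a different model.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[i] ~ intercept + slope * i for periods i = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def predict_next(values, steps_ahead=1):
    """Extrapolate the fitted trend to period n - 1 + steps_ahead."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)

# A hypothetical support vector that grows by 0.1 per period
sv = [0.1, 0.2, 0.3, 0.4]
print(predict_next(sv))   # approximately 0.5
```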
Further, the method also comprises one or more of the following operations on an interactive visualization interface:
linking, according to the appointment date clicked by the user, to the result page of the corresponding itemsets and association rules; and/or
performing a matching query according to query conditions entered by the user and displaying the corresponding results as a list, the query conditions including an itemset or an association rule; and/or
displaying, as a bar chart, the confidence of an association rule selected by the user.
The above technical solution of the present invention has the following beneficial effects:
In the above solution, an equipment fault repair information data set D is obtained and divided by appointment date into n sub-data sets $D_1, D_2, \dots, D_n$, where $D = \{D_1, D_2, \dots, D_n\}$. A dynamic association rule algorithm is defined in which a dynamic association rule is expressed as $A \Rightarrow B\,(SV, CV, s, c)$, where A and B are itemsets, SV is the support vector, CV is the confidence vector, s is the itemset support, and c is the confidence of the dynamic association rule. Dynamic association rules are then mined from $D_1, D_2, \dots, D_n$ according to the defined algorithm, yielding the associations between equipment fault causes and repair measures. Because the mined dynamic association rules are tied to the appointment dates in D, the method can reflect the relationship between the mined rules and the appointment date.
Brief description of the drawings
Fig. 1 is a flowchart of the device fault information mining method based on dynamic association rules provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the distribution of device brands provided by an embodiment of the present invention;
Fig. 3 is a distribution chart of device categories provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the distribution of appointment dates after division by appointment date, provided by an embodiment of the present invention;
Fig. 5 is a distribution chart of fault causes provided by an embodiment of the present invention;
Fig. 6 is a bar chart of the confidence vectors of the association rules provided by an embodiment of the present invention;
Fig. 7 is a bar chart of the support vectors of the full-item frequent itemsets constituting the association rules, provided by an embodiment of the present invention;
Fig. 8 is a bar chart of the support of the full-item frequent itemsets of the association rules, provided by an embodiment of the present invention;
Fig. 9 is a support regression prediction curve provided by an embodiment of the present invention;
Fig. 10 is a diagram of the working principle of the Django framework provided by an embodiment of the present invention;
Fig. 11 is the detailed results page provided by an embodiment of the present invention;
Fig. 12 is the query page provided by an embodiment of the present invention;
Fig. 13 is a bar chart of the confidence trend of a rule provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the problem that existing methods do not account for changes in association rules over time, the present invention provides a device fault information mining method based on dynamic association rules.
Referring to Fig. 1, the device fault information mining method based on dynamic association rules provided by an embodiment of the present invention comprises:
S101: obtaining an equipment fault repair information data set D;
S102: dividing the obtained data set D into n sub-data sets $D_1, D_2, \dots, D_n$ by appointment date, where $D = \{D_1, D_2, \dots, D_n\}$;
S103: defining a dynamic association rule algorithm, in which a dynamic association rule is expressed as $A \Rightarrow B\,(SV, CV, s, c)$, where A and B are itemsets, SV is the support vector, CV is the confidence vector, s is the support of the itemset, c is the confidence of the association rule, and $\Rightarrow$ is the implication symbol of the dynamic association rule;
S104: mining dynamic association rules from the n sub-data sets $D_1, D_2, \dots, D_n$ according to the defined algorithm, to obtain the associations between equipment fault causes and repair measures.
The method described in this embodiment thus proceeds in four stages. It obtains D and divides it by appointment date into $D_1, D_2, \dots, D_n$. It defines dynamic association rules of the form $A \Rightarrow B\,(SV, CV, s, c)$, where s is the itemset support and c is the confidence of the dynamic association rule. It then mines these rules from the sub-data sets to obtain the associations between equipment fault causes and repair measures. Because the mined dynamic association rules are tied to the appointment dates in D, the method can reflect the relationship between the mined rules and the appointment date.
To mine device fault information based on dynamic association rules, a dynamic association rule algorithm must first be defined. To make the algorithm newly defined in this embodiment easier to understand, the traditional dynamic association rule is explained first.
A traditional dynamic association rule is an association rule that can describe its own changes over time. It is defined as follows.
Let $I = \{I_1, I_2, \dots, I_n\}$ be a set of items. Let the task-relevant data set D be collected over a time period t, which can be divided into a sequence of n disjoint intervals, $t = \{t_1, t_2, \dots, t_n\}$. According to t, D can be divided into the corresponding n sub-data sets $D = \{D_1, D_2, \dots, D_n\}$, where the data in $D_i$ ($i \in \{1, 2, \dots, n\}$) are collected during $t_i$.

Each transaction T in D is a set of items such that $T \subseteq I$, and each transaction has an identifier called its TID. Let A be an itemset; transaction T contains A if and only if $A \subseteq T$. An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$. The rule can also be written A ==> B. Itemset A, on the left-hand side, is called the left-hand item of the rule; itemset B, on the right-hand side, is called the right-hand item.

The rule $A \Rightarrow B$ holds in D with support s, where s is the percentage of transactions in D that contain $A \cup B$ (that is, both A and B); s is the probability $P_D(A \cup B)$. Let $P_D[(A \cup B)_i]$ be the ratio of the number of records in $D_i$ containing $A \cup B$ to the total number of records in D. Then s can also be written as $s = \sum_{i=1}^{n} P_D[(A \cup B)_i]$.

The rule $A \Rightarrow B$ has confidence c in D, where c is the conditional probability $P_D(B \mid A)$. Let $P_D(B_i \mid A)$ be the ratio of the number of records in $D_i$ containing $A \cup B$ to the number of records in D containing A. Then c can also be written as $c = \sum_{i=1}^{n} P_D(B_i \mid A)$.
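These classical definitions can be illustrated with a short sketch. It computes support as the fraction of transactions containing an itemset and confidence as $s(A \cup B)/s(A)$. Representing transactions as Python sets, and returning 0.0 when A never occurs, are illustrative choices rather than details from the patent.

```python
def support(transactions, itemset):
    """Fraction of transactions T with itemset ⊆ T."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """P(B | A) = s(A ∪ B) / s(A); returns 0.0 if A never occurs."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / s_lhs

T = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"B"}]
print(support(T, {"A", "B"}))        # 0.5
print(confidence(T, {"A"}, {"B"}))   # two of the three transactions with A also contain B
```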
Definition 3-1 Support vector (Support Vector)
Four quantities are used together to evaluate a rule: the support vector (SV), the confidence vector (CV), the support s, and the confidence c. The support vector of itemset A is defined as $SV = [s_1, s_2, \dots, s_n]$. Each element $s_i$ ($i \in \{1, 2, \dots, n\}$) is the ratio of the frequency $f_i$ with which A occurs in data subset $D_i$ to the number of records M in D, that is:
$$s_i = f_i / M, \quad i \in \{1, 2, \dots, n\} \tag{3-1}$$
If the support of itemset A is s, then
$$s = \sum_{i=1}^{n} s_i \tag{3-2}$$
Let the minimum support be min_sup. If $s > \text{min\_sup}$ holds, itemset A is called a frequent itemset. Sometimes it is more convenient to express support by the frequency with which the itemset occurs. In that case, the support vector of the itemset is:
$$SV = [f_1, f_2, \dots, f_n] \tag{3-3}$$
and the corresponding support can be expressed as:
$$s = \frac{1}{M}\sum_{i=1}^{n} f_i \tag{3-4}$$
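Definition 3-1 (the original support vector, before the improvement introduced later) can be expressed as a small sketch. It assumes the sub-data sets are given as lists of item sets. It returns the frequency form (3-3), the relative form (3-1), and the total support from (3-2). The function name is hypothetical.

```python
def original_support_vector(subsets, itemset):
    """Definition 3-1: s_i = f_i / M, where M is the size of the whole data set D."""
    itemset = frozenset(itemset)
    M = sum(len(d) for d in subsets)
    freqs = [sum(1 for t in d if itemset <= t) for d in subsets]   # (3-3)
    sv = [f / M for f in freqs]                                      # (3-1)
    return freqs, sv, sum(sv)                                        # (3-2)

D1 = [{"A"}, {"A", "B"}]
D2 = [{"B"}, {"A"}]
print(original_support_vector([D1, D2], {"A"}))   # ([2, 1], [0.5, 0.25], 0.75)
```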
Definition 3-2 Confidence vector (Confidence Vector)
Dynamic association rules and ordinary association rules generate rules from frequent itemsets in the same way. They differ only in how the confidence vector is computed, so this patent focuses solely on the generation of the confidence vector.

The confidence vector of a dynamic association rule $A \Rightarrow B$ is $CV = [c_1, c_2, \dots, c_n]$, where each $c_i$ ($i \in \{1, 2, \dots, n\}$) is a percentage between 0% and 100%. Let $SV_{A\cup B} = [s_{(A\cup B)1}, s_{(A\cup B)2}, \dots, s_{(A\cup B)n}]$ be the support vector of $A \cup B$. Let $SV_A = [s_{A1}, s_{A2}, \dots, s_{An}]$ be the support vector of A, $SV_B = [s_{B1}, s_{B2}, \dots, s_{Bn}]$ the support vector of B, and $s_A$ the support of A. Then
$$c_i = \frac{s_{(A\cup B)i}}{s_A}, \quad i \in \{1, 2, \dots, n\} \tag{3-5}$$
Let the support of $A \cup B$ be $s_{A\cup B}$ and the support of B be $s_B$. If the confidence of $A \Rightarrow B$ is c, then
$$c = \frac{s_{A\cup B}}{s_A} \tag{3-6}$$
Let the minimum confidence be min_conf. If $c \ge \text{min\_conf}$ holds, the rule $A \Rightarrow B$ is a strong dynamic association rule.
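A sketch of Definition 3-2 (the original confidence vector) follows, using the same list-of-sets representation. Under this definition every $c_i$ is divided by the same fixed $s_A$, so the $c_i$ add up to c. The sketch makes this visible. The function name, and raising an error when A never occurs, are illustrative assumptions.

```python
def original_confidence_vector(subsets, lhs, rhs):
    """Definition 3-2: c_i = s_(A∪B)i / s_A, with s_A the support of A over all of D."""
    a = frozenset(lhs)
    ab = a | frozenset(rhs)
    M = sum(len(d) for d in subsets)
    f_ab = [sum(1 for t in d if ab <= t) for d in subsets]
    f_a = sum(1 for d in subsets for t in d if a <= t)
    if f_a == 0:
        raise ValueError("left-hand itemset never occurs")
    s_a = f_a / M
    cv = [(f / M) / s_a for f in f_ab]     # (3-5)
    c = (sum(f_ab) / M) / s_a              # (3-6): c = s_(A∪B) / s_A
    return cv, c

D1 = [{"A", "B"}, {"A"}]
D2 = [{"A", "B"}, {"B"}]
cv, c = original_confidence_vector([D1, D2], {"A"}, {"B"})
print(cv, c)   # each c_i is about 1/3 and c is about 2/3, so the c_i sum to c
```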
Definition 3-3 Complete representation of a dynamic association rule (Whole dynamic association rule)
A complete dynamic association rule can be described as follows:
$$A \Rightarrow B\;(SV = [s_1, s_2, \dots, s_n],\; CV = [c_1, c_2, \dots, c_n],\; s,\; c) \tag{3-7}$$
where SV, CV, s, and c together describe the dynamic characteristics of the rule and are determined by formulas (3-3), (3-5), (3-4), and (3-6), respectively.
Because the dynamic support vector and confidence vector are meant to describe the dynamic properties of a rule within the corresponding data subsets, they need to be defined more reasonably. In the original definition, $s_{(A\cup B)i} = f_{(A\cup B)i}/M$. Since M is a fixed value, $s_{(A\cup B)i}$ does not actually reflect the support of $A \cup B$ in data subset $D_i$; it only measures the frequency $f_{(A\cup B)i}$.
In the original definition, $c_i = s_{(A\cup B)i}/s_A$, and for a given rule $s_A$ is fixed. $c_i$ is therefore equivalent to $s_{(A\cup B)i}$ up to a constant factor and likewise cannot reflect the confidence in $D_i$. From an information-theoretic standpoint, the latter measure provides no new information and is therefore redundant.

For example, divide data set D by time $t = \{t_1, t_2\}$ into $D = \{D_1, D_2\}$. Suppose $D_1$ contains 990 transaction records, in which the numbers of transactions supporting $A \cup B$ and A are both 10. Suppose $D_2$ contains 10 transaction records, in which the numbers of transactions supporting $A \cup B$ and A are 9 and 10, respectively. For the rule $A \Rightarrow B$, the original definitions (3-3) and (3-5) give:
$$SV = [10, 9], \quad s_A = \frac{10 + 10}{1000} = 2\%, \quad CV = \left[\frac{10/1000}{2\%}, \frac{9/1000}{2\%}\right] = [50\%, 45\%]$$
These values of SV and CV reveal the following shortcomings of the original definition:
1. Considering $D_1$ alone, the support of $A \cup B$ in $D_1$ is $10/990 \approx 1.01\%$. Considering $D_2$ alone, its support in $D_2$ is $9/10 = 90\%$. The support in $D_1$ is therefore far lower than in $D_2$, yet $s_{(A\cup B)1} = 10 > s_{(A\cup B)2} = 9$. The SV definition above contradicts the classical definition of support.
2. $c_1 / c_2 = s_{(A\cup B)1} / s_{(A\cup B)2}$: SV and CV have the same ratio, so CV provides no new measurement information.
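The counterexample can be checked numerically. The sketch below takes per-period counts as input and assumes, as in the example, that A occurs 10 times in $D_1$ and 10 times in $D_2$. It computes the original vectors (relative form of (3-1) together with (3-5)) alongside the per-subset ratios used by the improved definitions that follow. The function name is hypothetical.

```python
def compare_definitions(f_ab, f_a, sizes):
    """Return (old_sv, old_cv, new_sv, new_cv) for a rule A => B.

    f_ab[i], f_a[i]: occurrences of A ∪ B and of A in D_i; sizes[i] = |D_i|.
    """
    M = sum(sizes)
    s_a = sum(f_a) / M                                   # overall support of A, fixed for the rule
    old_sv = [f / M for f in f_ab]                       # divided by |D|
    old_cv = [s / s_a for s in old_sv]                   # proportional to old_sv
    new_sv = [f / n for f, n in zip(f_ab, sizes)]        # divided by |D_i|
    new_cv = [fab / fa for fab, fa in zip(f_ab, f_a)]    # s_(A∪B)i / s_Ai; |D_i| cancels
    return old_sv, old_cv, new_sv, new_cv

old_sv, old_cv, new_sv, new_cv = compare_definitions([10, 9], [10, 10], [990, 10])
print(old_sv, old_cv)   # about [0.01, 0.009] and [0.5, 0.45]: D_1 looks stronger
print(new_sv, new_cv)   # about [0.0101, 0.9] and [1.0, 0.9]: D_2 has the higher support
```

The old vectors rank $D_1$ above $D_2$ and move in lockstep. The per-subset ratios reverse that ranking and give a confidence vector that carries independent information.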
Given these shortcomings of the original definition, the definition of an improved dynamic association rule algorithm is given below.
Recall that in the original definition $c_i = s_{(A\cup B)i}/s_A$, where $s_A$ is fixed for a given rule. With minimum confidence min_conf, if $c \ge \text{min\_conf}$ holds, the rule $A \Rightarrow B$ is a strong association rule.
More appropriate definitions of SV and CV are given as follows:
Definition 3-4 The support vector of the dynamic association rule $A \Rightarrow B$ (or of itemset $A \cup B$) has the form:
$$SV = [s_{(A\cup B)1}, s_{(A\cup B)2}, \dots, s_{(A\cup B)n}], \quad s_{(A\cup B)i} = \frac{f_{(A\cup B)i}}{|D_i|}$$
where $f_{(A\cup B)i}$ is the frequency with which $A \cup B$ occurs in sub-data set $D_i$ ($i \in \{1, 2, \dots, n\}$) and $|D_i|$ is the number of records in $D_i$. In this definition, $s_{(A\cup B)i}$ represents the support measure of $A \cup B$ in $D_i$. Formula (3-2) no longer holds, and the support s of $A \Rightarrow B$ is calculated as:
$$s = \frac{1}{M}\sum_{i=1}^{n} f_{(A\cup B)i} = \sum_{i=1}^{n} \frac{|D_i|}{M}\, s_{(A\cup B)i}$$
where M is the number of records in data set D.
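A minimal sketch of Definition 3-4 follows, assuming the sub-data sets are non-empty lists of item sets. It shows that s is no longer the plain sum of the SV elements. Instead, s is their average weighted by $|D_i|/M$, which is why (3-2) no longer holds. The function name is hypothetical.

```python
def dynamic_support(subsets, itemset):
    """Definition 3-4: s_i = f_i / |D_i| per sub-data set, and s = sum_i f_i / M.

    Assumes every sub-data set is non-empty.
    """
    itemset = frozenset(itemset)
    freqs = [sum(1 for t in d if itemset <= t) for d in subsets]
    sv = [f / len(d) for f, d in zip(freqs, subsets)]
    s = sum(freqs) / sum(len(d) for d in subsets)
    return sv, s

D1 = [{"A"}, {"A", "B"}, {"B"}, {"C"}]
D2 = [{"A"}, {"A"}]
sv, s = dynamic_support([D1, D2], {"A"})
print(sv, s)   # [0.5, 1.0] and about 0.667 (= 4/6)
```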
Definition 3-5 The confidence vector of the dynamic association rule $A \Rightarrow B$ has the form:
$$CV = [c_1, c_2, \dots, c_n], \quad c_i = \frac{s_{(A\cup B)i}}{s_{Ai}}$$
where $s_{(A\cup B)i}$ is the i-th element of the SV of itemset $A \cup B$ and $s_{Ai}$ is the i-th element of the SV of itemset A. In this definition, $c_i$ reflects the confidence measure of $A \cup B$ in sub-data set $D_i$. The confidence c of $A \Rightarrow B$ is calculated as:
$$c = \frac{s_{A\cup B}}{s_A}$$
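Definition 3-5 can be expressed as a sketch. Within one $D_i$ the factor $1/|D_i|$ cancels, so $c_i$ is simply a ratio of counts. The overall c is the ratio of total counts. The text does not say what $c_i$ should be in a period where A never occurs; the sketch assumes 0.0 there. The function name is hypothetical.

```python
def dynamic_confidence(subsets, lhs, rhs):
    """Definition 3-5: c_i = s_(A∪B)i / s_Ai and c = s_(A∪B) / s_A.

    Periods where A never occurs get c_i = 0.0 (an assumption).
    """
    a = frozenset(lhs)
    ab = a | frozenset(rhs)
    f_a = [sum(1 for t in d if a <= t) for d in subsets]
    f_ab = [sum(1 for t in d if ab <= t) for d in subsets]
    cv = [fab / fa if fa else 0.0 for fab, fa in zip(f_ab, f_a)]
    c = sum(f_ab) / sum(f_a) if sum(f_a) else 0.0
    return cv, c

D1 = [{"A", "B"}, {"A"}]
D2 = [{"A", "B"}, {"B"}]
cv, c = dynamic_confidence([D1, D2], {"A"}, {"B"})
print(cv, c)   # [0.5, 1.0] and about 0.667
```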
The new dynamic association rule is thus defined as follows:
Definition 3-6 A complete dynamic association rule has four parameters, namely the support vector SV, the confidence vector CV, the support s, and the confidence c, and is expressed in the form:
$$A \Rightarrow B\;(SV, CV, s, c) \tag{3-13}$$
where SV, CV, s, and c are obtained from formulas (3-9), (3-11), (3-10), and (3-12), respectively, and together describe the dynamic properties of the association rule.
The new support vector SV and confidence vector CV are consistent with the classical definitions of support and confidence, and better reflect how a rule changes over time.
Definition 3-7 Let a dynamic association rule $A \Rightarrow B$ have minimum support threshold min_sup and minimum confidence threshold min_conf. If $s \ge \text{min\_sup}$ and $c \ge \text{min\_conf}$, then $A \Rightarrow B$ is called a strong dynamic association rule.
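The complete rule of Definition 3-6 and the strength test of Definition 3-7 map naturally onto a small record type. The class name, its fields, and the codes "F1" and "M3" are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass
class DynamicRule:
    """A => B (SV, CV, s, c), the complete form of Definition 3-6."""
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    sv: List[float]
    cv: List[float]
    s: float
    c: float

    def is_strong(self, min_sup: float, min_conf: float) -> bool:
        """Definition 3-7: strong iff s >= min_sup and c >= min_conf."""
        return self.s >= min_sup and self.c >= min_conf

rule = DynamicRule(frozenset({"F1"}), frozenset({"M3"}), [0.10, 0.20], [0.50, 0.60], 0.15, 0.55)
print(rule.is_strong(0.10, 0.50))   # True
print(rule.is_strong(0.20, 0.50))   # False
```

Keeping SV and CV on the rule object lets later steps (trend regression, charts) read the per-period values directly.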
As rule (3-13) shows, a dynamic association rule contains the information of traditional support and confidence. It also provides time-varying information that ordinary association rules lack.
Suppose D is partitioned by timestamp, and formulas (3-10) and (3-12) above are used to compute the support and confidence on each time period, that is, the support vector and confidence vector. A rule can then be evaluated as described in Definition 3-6. For example, the rule states whether a fault is strongly associated with a repair measure. The change in the rule's confidence over a period of time reflects how the strength of that association changes.
Improved dynamic association rule mining algorithm:
Definition 3-8 The frequency vector of $A \Rightarrow B$ has the form:
$$[f_{(A\cup B)1}, f_{(A\cup B)2}, \dots, f_{(A\cup B)n}]$$
where $f_{(A\cup B)i}$ is the frequency with which $A \cup B$ occurs in $D_i$ ($i \in \{1, 2, \dots, n\}$).
Based on the new dynamic association rule definition and the known record counts of the sub-data sets, the new dynamic association rule algorithm proceeds as follows:
1) Apply a frequent itemset generation algorithm to the sub-data set of each time period to produce the frequent itemsets that meet the support threshold, together with their supports. These are the support vectors of the definition.
2) The record count of each time period and the total record count are easy to obtain. The support of a rule relative to the total number of records, at or before the current time period, can therefore be computed from its support vector. Each element is converted using the ratio between the record count of its time period and the known total record count.
3) From the frequent itemsets that meet the support threshold, generate, for the sub-data set of each time period, the association rules that meet the confidence threshold. This embodiment focuses on the dynamic association rule algorithm, and generating association rules from frequent itemsets follows the basic definitions, so this step is not repeated here.
4) Generate the confidence of each rule. Using the relation between the total record count and the record count of each time period, compute the ratio of the support (or frequency) of the full rule itemset (left-hand and right-hand items together) to the support (or frequency) of the left-hand item.
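The four steps can be sketched end to end as follows. Per-period itemset counts are produced by brute-force enumeration, a simple stand-in for a real frequent itemset algorithm such as Apriori or FP-growth. These counts give the frequency vectors of Definition 3-8. Two further simplifications apply: pruning uses the overall support s rather than per-period thresholds, and the `>=` tests follow Definition 3-7. All names and the small data set are hypothetical.

```python
from itertools import combinations

def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items in one sub-data set (brute force)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts

def mine_dynamic_rules(subsets, min_sup, min_conf, max_size=3):
    """Return (lhs, rhs, SV, CV, s, c) tuples for strong dynamic association rules."""
    sizes = [len(d) for d in subsets]
    M = sum(sizes)
    per_period = [count_itemsets(d, max_size) for d in subsets]    # step 1: f_i per period
    rules = []
    for full in set().union(*per_period):
        if len(full) < 2:
            continue
        f_full = [counts.get(full, 0) for counts in per_period]
        s = sum(f_full) / M                                         # step 2: support vs. total records
        if s < min_sup:
            continue
        for r in range(1, len(full)):                               # step 3: split into A => B
            for lhs in map(frozenset, combinations(sorted(full), r)):
                f_lhs = [counts.get(lhs, 0) for counts in per_period]
                c = sum(f_full) / sum(f_lhs)                        # step 4: c = s_(A∪B) / s_A
                if c < min_conf:
                    continue
                sv = [f / n for f, n in zip(f_full, sizes)]
                cv = [ff / fl if fl else 0.0 for ff, fl in zip(f_full, f_lhs)]
                rules.append((lhs, full - lhs, sv, cv, s, c))
    return rules

D1 = [{"F1", "M1"}, {"F1", "M1"}, {"F2", "M2"}]
D2 = [{"F1", "M1"}, {"F1", "M2"}]
for lhs, rhs, sv, cv, s, c in mine_dynamic_rules([D1, D2], min_sup=0.5, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), sv, cv, round(s, 3), round(c, 3))
```

Brute-force enumeration is exponential in transaction width. It is only reasonable here because each coded repair record carries just a handful of attribute values.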
To aid understanding, the steps of the device fault information mining method based on dynamic association rules provided by this embodiment are described in detail below. The specific steps may include:
Step 1: obtain the raw data set of equipment fault repair information. For example, the raw data set may consist of 680,000 equipment fault repair records, as shown in Table 1:
Table 1 Equipment fault repair information data set (partial data)
As shown in Table 1, each record may contain 29 repair attributes. Examples include purchase date, store of purchase, purchase price, store grade, installation date, appointment date, device category, device brand, device model, handling time, fault cause description, repair measure, and requested service mode. The raw data set is thus a very large, high-dimensional, discrete data set. It contains a great deal of missing and redundant information, so its data must be preprocessed and discretized.
Step 2: before preprocessing and discretizing the raw data, determine the target attributes to be mined. Because this embodiment mines device fault information with dynamic association rules, the target attributes must describe four aspects: the time dimension, the fault information, the device (the subject of the repair information), and the rules. The selected target attributes are: appointment date, device category, device brand, device model, fault cause, repair measure, and fault cause description. The data for these 7 target attributes are extracted as the mining data.
Step 3: preprocess the obtained mining data (target attribute data). The preprocessing includes handling missing attribute values in the target attributes, handling values with inconsistent attribute formats, and/or removing redundant values:
A) Handle missing attribute values in the target attributes. A missing attribute value is a missing value of a required attribute, such as a missing repair measure or fault cause description. Such records can only be deleted, because they lack the information needed for mining.
B) Handle values with inconsistent attribute formats. For example, date fields may contain irregular characters. During database import, the inconsistent data type (attribute format) then causes errors and the data cannot be imported, so these records are also removed.
C) Remove redundant values. For example, fault cause descriptions that are too long for the database character field must also be removed. This preprocessing is a preliminary screening of the target attribute data that makes it meet the requirements of database import and discretization in subsequent steps.
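Checks A) to C) can be sketched as a record filter. The patent gives neither the field names nor the column width, so `REQUIRED_FIELDS`, `MAX_DESCRIPTION_LEN` (255), and the YYYY-MM-DD date format are all hypothetical. The sketch also assumes that in case C) the whole record is dropped, rather than only the description being truncated.

```python
from datetime import datetime

# Hypothetical field names and column width; the patent does not give the schema.
REQUIRED_FIELDS = ("appointment_date", "fault_cause", "repair_measure")
MAX_DESCRIPTION_LEN = 255

def preprocess(records):
    """Keep only records that pass checks A) to C) of Step 3."""
    kept = []
    for record in records:
        # A) a required attribute value is missing: drop the record
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            continue
        # B) inconsistent format: the date must parse as YYYY-MM-DD
        try:
            datetime.strptime(record["appointment_date"], "%Y-%m-%d")
        except ValueError:
            continue
        # C) over-long description: assumption, the whole record is removed
        if len(record.get("fault_description", "")) > MAX_DESCRIPTION_LEN:
            continue
        kept.append(record)
    return kept

raw = [
    {"appointment_date": "2016-03-01", "fault_cause": "F1", "repair_measure": "M1"},
    {"appointment_date": "2016/03/01", "fault_cause": "F1", "repair_measure": "M1"},
    {"appointment_date": "2016-03-02", "fault_cause": "", "repair_measure": "M2"},
    {"appointment_date": "2016-03-03", "fault_cause": "F2", "repair_measure": "M2",
     "fault_description": "x" * 300},
]
print(len(preprocess(raw)))   # 1
```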
Step 4: discretize the preprocessed target attribute data into character codes to obtain the equipment fault repair information data set D. Specifically, different device categories, device brands, and repair measures are each represented by a corresponding code. This forms a rule data table that serves as the input to the rule mining program.
In this embodiment, Microsoft SQL Server 2008 can be used as the database to process the 680,000 rows. First, records with missing attribute values, inconsistent attribute formats, and redundant values are deleted. The Chinese descriptions in the target attribute data are then discretized into character codes for the rule mining program. Taking the discretization of repair measures as an example, and assuming the repair measure table is in the maintanance database, part of the discretization process is as follows:
A11. Select all distinct items in the repair measure table:
USE maintanance;
SELECT DISTINCT weixiucuoshi INTO weixiubiao FROM weixiujilu;
A12. Represent different repair measures with different codes.
The distinct repair measures (weixiucuoshi) that occur in the maintenance record table weixiujilu are inserted into the new table weixiubiao. weixiubiao is then updated so that each selected repair measure receives an identifier string (a distinct code) that tells it apart:
use maintanance
-- assumes weixiubiao already has a character column weixiucuoshiID to hold the code
update weixiubiao
set weixiucuoshiID = 'M' + cast(t1.rowId as varchar(10))
from
(
    select weixiucuoshi, ROW_NUMBER() over (order by weixiucuoshi) as rowId
    from weixiubiao
)
as t1
where t1.weixiucuoshi = weixiubiao.weixiucuoshi
A13. Join with the maintenance-measure frequency table.
weixiubiao is left-joined with the table weixiucuoshi_num_desc, which records how many times each maintenance measure occurs, to obtain the discretized code string and the occurrence frequency of each maintenance measure:
use maintanance
select
    weixiubiao.weixiucuoshiID, weixiubiao.weixiucuoshi, weixiucuoshi_num_desc.weixiucuoshi_num
from weixiubiao
left join weixiucuoshi_num_desc on weixiubiao.weixiucuoshi =
    weixiucuoshi_num_desc.weixiucuoshi
order by weixiucuoshi_num desc
Discretization tables for the other Chinese-text attributes, such as the fault-cause description discretization table and the equipment-brand discretization table, are obtained in the same way. These discretization tables are then joined to produce the data table on which the rule-mining program runs.
A14. Join the discretized attribute values:
use maintanance
select
    weixiujilu.yuyue_date, weixiujilu.category, weixiujilu.brand, weixiujilu.xinghao,
    weixiujilu.guzhangyuanyindaima,
    guzhangyuanyinmiaoshubiao.guzhangyuanyinmiaoshuID,
    weixiucuoshibiao.weixiucuoshiID
from weixiujilu
inner join weixiucuoshibiao on weixiujilu.weixiucuoshi =
    weixiucuoshibiao.weixiucuoshi
inner join guzhangyuanyinmiaoshubiao on weixiujilu.guzhangyuanyinmiaoshu =
    guzhangyuanyinmiaoshubiao.guzhangyuanyinmiaoshu
order by yuyue_date asc
This yields the seven target attributes on which dynamic association rule mining is finally performed: appointment date, equipment category, equipment brand, equipment model, fault-cause code, fault-cause description code, and maintenance-measure code.
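Steps A11 to A14 amount to building a code book for each textual attribute and replacing every value by its code. The Python sketch below mirrors that logic in memory under stated assumptions. Codes are assigned in sorted order, like `ROW_NUMBER() OVER (ORDER BY ...)`, but Python sorts by code point, so the numbering may differ from the collation SQL Server applies to Chinese text. The helper names are hypothetical.

```python
from collections import Counter


def build_codebook(values, prefix):
    """A11-A12: distinct values numbered in sorted order, e.g. 'M1', 'M2', ..."""
    return {v: f"{prefix}{i}" for i, v in enumerate(sorted(set(values)), start=1)}


def frequency_table(values, codebook):
    """A13: (code, value, frequency) rows, most frequent first."""
    counts = Counter(values)
    rows = [(codebook[v], v, n) for v, n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def encode(records, columns):
    """A14: replace each textual attribute by its code.

    `columns` maps a field name to its code prefix, e.g. {"measure": "M"}.
    """
    books = {col: build_codebook([r[col] for r in records], prefix)
             for col, prefix in columns.items()}
    encoded = [{**r, **{col: books[col][r[col]] for col in columns}} for r in records]
    return encoded, books
```

Keeping the code books returned by `encode` plays the role of the discretization tables: they are needed later to translate codes such as M219 back into readable maintenance measures.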
Taking maintenance measures as an example, the discretized codes and the frequencies of the corresponding maintenance measures are shown in Table 2:
Table 2 Maintenance-measure discretization codes and frequencies
Taking fault-cause descriptions as an example, the discretized codes and the frequencies of the corresponding fault-cause descriptions are shown in Table 3. For example, the code for "display panel module damaged" in Table 3 is F1120, with a frequency of 12247.
Table 3 Fault-cause frequencies and discretization codes
Step 5: data partitioning. The discretized data set D is divided into blocks. Traditionally, data are divided into a training data set and a test data set: the training data set is used to mine rules or patterns, and the test data set is used to verify that the patterns or rules are valid. Because this embodiment studies fault-information mining based on dynamic association rules, partitioning here means dividing the discretized data set D by timestamp into n sub-data sets $D_1, D_2, \ldots, D_n$, where $D = \{D_1, D_2, \ldots, D_n\}$, so that dynamic association rule mining can be carried out.
Since partitioning divides D by timestamp, every record in D contains a time-indicator attribute (time_id) that serves as the basis for partitioning. The time-related items in D include the purchase date, installation date, appointment date, and acceptance date. This embodiment chooses the appointment date (the date on which the repair was booked), because partitioning by appointment date yields sub-data sets whose distribution is more reasonable and uniform, and shows more statistical regularity, than partitioning by the other dates. After sorting in SQL Server, the distribution of records by appointment date is shown in Table 4:
Table 4 Time distribution of fault-maintenance records
Appointment date    Number of equipment repairs
9/30/2013           5564
9/29/2013           5113
9/28/2013           5095
...                 ...
9/19/2013           3008
9/21/2013           2889
In this embodiment, if the purchase date were used as the splitting criterion, the time span would be 1980-2013, and only a few dozen records fall before 1990, which clearly does not follow a statistical pattern. Using any of the other dates for partitioning has the same problem, so the partitioning date chosen in this embodiment is the repair appointment date. The distribution statistics of data set D after partitioning are shown in Fig. 2, Fig. 3, Fig. 4, and Fig. 5.
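Partitioning D by appointment date can be sketched as grouping records by calendar day, which yields the sub-data sets $D_1, \ldots, D_n$ in time order together with their record counts $M_i$. A minimal Python sketch, assuming one block per day and a hypothetical `appointment_date` field in m/d/yyyy form:

```python
from collections import defaultdict
from datetime import datetime


def partition_by_day(records, field="appointment_date", fmt="%m/%d/%Y"):
    """Split D into D_1..D_n, one block per appointment day, in time order.

    Returns (days, blocks); len(blocks[i]) is the record count M_i.
    """
    groups = defaultdict(list)
    for rec in records:
        day = datetime.strptime(rec[field], fmt).date()
        groups[day].append(rec)
    days = sorted(groups)  # chronological order defines time_id = 1..n
    return days, [groups[d] for d in days]
```

Parsing the date rather than sorting the raw strings matters here: as text, "9/10/2013" would sort before "9/2/2013".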
Step 6: the newly defined dynamic association rule algorithm above is used to mine the n partitioned sub-data sets $D_1, D_2, \ldots, D_n$ dynamically. With the confidence threshold fixed at 50%, the numbers of frequent itemsets and association rules mined both decrease steadily as the support threshold is raised from 1% to 15%. With a support threshold of 15% and a confidence threshold of 50%, the frequent itemset mining results are shown in Table 5 and the association rule mining results in Table 6:
Table 5 Frequent itemset mining results with a support threshold of 15% and a confidence threshold of 50%
Frequent itemset Support
item:('M219',) 0.162
item:('C4','M219') 0.162
item:('B5','M219') 0.162
item:('B5','M219','C4') 0.162
item:('B5','C1') 0.162
item:('FTDES1120',) 0.172
item:('HTVYY90000',) 0.172
item:('B5','HTVYY90000','C4') 0.172
item:('FTDES1120','B5','C4','HTVYY90000') 0.172
item:('C2',) 0.185
item:('C1',) 0.192
item:('C4',) 0.572
item:('B5','C4') 0.572
item:('B5',) 0.841
Table 6 Association rule mining results with a support threshold of 15% and a confidence threshold of 50%
Association rule Confidence
Rule:('B5',)==>('C1',) 0.504
Rule:('C1',)==>('B5',) 0.838
Rule:('C4',)==>('B5',) 1
Rule:('C1',)==>('B5',) 0.81
Rule:('C4',)==>('B5',) 1
Rule:('C2',)==>('B2',) 0.671
Rule:('C1',)==>('B5',) 0.814
Rule:('B2',)==>('C2',) 0.997
Here, B* in Tables 5 and 6 denotes an equipment-brand code and C* an equipment-category code. The brand corresponding to each brand code is shown in Table 7, and the category corresponding to each category code in Table 8:
Table 7 Equipment-brand discretization codes and frequencies
Table 8 Equipment-category discretization codes and frequencies
From the rule Rule:('B5',)==>('C1',), 0.504: the confidence of (brand is Siemens)==>(equipment is a household air conditioner) is 50.4%, i.e., about half of the failed Siemens equipment needing repair in the maintenance data are household air conditioners. Among all failed Siemens equipment, household air conditioners therefore have a relatively high probability of failure.
From the rule Rule:('C1',)==>('B5',), 0.838: the confidence of (equipment is an air conditioner)==>(brand is Siemens) is 83.8%, i.e., a failed air conditioner has an 83.8% probability of being a Siemens product, which shows that Siemens accounts for most of the air conditioners.
From the rule Rule:('B2',)==>('C2',), 0.997: the confidence of (brand is Aucma)==>(equipment is a refrigerator) is 99.7%, i.e., a failed Aucma device has a 99.7% probability of being a refrigerator, which shows that refrigerators make up the overwhelming majority of Aucma equipment.
It can be seen that when the support threshold is set to 15%, the association rules obtained are all rules among densely valued attributes. Among the selected attributes there are only 5 equipment brands (see above), whereas there are more than 1100 discretized fault causes, coded from F1 to F1181, and more than 600 of these fault-cause descriptions occur fewer than 20 times; a rough estimate is that the support of any frequent itemset containing one of these fault causes does not exceed 20/110000 ≈ 0.018%. There are 585 discretized maintenance measures, coded from M1 to M585, of which more than 300 occur fewer than 20 times, so likewise the support of any frequent itemset containing one of these rare maintenance measures does not exceed about 0.018%. In other words, if the support threshold is set to 15%, no frequent itemsets involving the great majority of equipment fault causes and maintenance measures are retained, and the association rules containing these fault causes and maintenance measures are not mined either, because they are already removed by the support threshold at the candidate-generation step.
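The pruning described above is a property of level-wise (Apriori-style) candidate generation: an itemset whose support falls below the threshold is dropped at its level, and no superset of it is ever generated. The Python sketch below is a generic, static Apriori miner on a single data set, given for illustration only; it is not the embodiment's dynamic algorithm, and the function name is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count: keep only candidates that meet the support threshold.
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent
```

With toy transactions of equipment codes, raising `min_support` shrinks the result exactly as the text describes: rare codes fail at level 1, so no itemset containing them is ever counted.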
To show more clearly how the support threshold affects the mining results, the results with a support threshold of 8% and a confidence threshold of 50% are used for comparison. The frequent itemset mining results for these thresholds are shown in Table 9:
Table 9 Frequent itemset mining results with a support threshold of 8% and a confidence threshold of 50%
Table 9 lists the 48 frequent itemsets obtained with a support threshold of 8% and a confidence threshold of 50%. Because the support threshold has been lowered, some fault causes and maintenance measures with relatively high frequencies are now retained as frequent itemsets; the codes involved are explained in Table 10:
Table 10 Explanation of some of the codes in the results with a support threshold of 8% and a confidence threshold of 50%
The association rule mining results with a support threshold of 8% and a confidence threshold of 50% are shown in Table 11:
Table 11 Association rule mining results with a support threshold of 8% and a confidence threshold of 50%
These results show that lowering the support threshold brings more meaningful information about maintenance measures and fault causes into the frequent itemsets, beyond the associations between equipment category and brand. The association rules formed among B5, C4, M219, and FTDES1120 all have particularly high confidence. For example, Rule:('B5','C4','M219')==>('FTDES1120',) 0.902, i.e., (brand is Siemens, equipment is a TV, maintenance measure is debugging)==>(fault cause is a user operation problem), has a confidence of 90.2%. This indicates that in this maintenance data set, when the repair of a Siemens TV consists of debugging, it is in the overwhelming majority of cases caused by a user operation problem rather than by a fault in the product itself, which suggests that Siemens should provide more user-friendly product introductions or user manuals for its TVs in the Chinese market. The 90.2% confidence of this rule also indirectly reflects that FTDES1120 and M219 rank first in the fault-description and maintenance-measure tables above, with very high frequencies of 12247 and 11002, respectively.
However, this is not the most significant result. The focus of this embodiment is on frequent itemsets and association rules that explain and identify associations between fault causes and maintenance measures, so frequent itemsets and rules that merely point to user problems need to be pruned. Pruning such rules inside the mining source program would require string comparisons in every loop iteration, which is inefficient. Instead, database technology can be used to remove these rules in the final result-presentation step: this embodiment stores the results in an sqlite3 database and uses its query functions and efficient storage to trim rules that carry little meaning but have high confidence. From another perspective, this shows that the overwhelming majority of the data commonly used as a source for data mining is meaningless, and only a small fraction is meaningful.
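Trimming uninformative rules in the result-presentation step can be sketched with Python's built-in `sqlite3` module. The table name `rules`, its columns (comma-separated `antecedent` and `consequent` strings plus `confidence`) and the choice of FTDES1120 (user operation problem) and M219 (debugging) as the uninformative codes are assumptions for illustration; the embodiment does not give its schema.

```python
import sqlite3

# Codes regarded as uninformative in the discussion above (assumption):
# FTDES1120 = user operation problem, M219 = debugging.
UNINFORMATIVE = ("FTDES1120", "M219")


def informative_rules(conn, codes=UNINFORMATIVE):
    """Return (antecedent, consequent, confidence) rows that mention none of `codes`.

    Assumes a hypothetical table rules(antecedent TEXT, consequent TEXT, confidence REAL)
    whose item lists are comma-separated, e.g. 'FTDES908,M427'.
    """
    # Wrap the item list in commas so that 'M219' cannot match inside a longer code.
    items = "',' || antecedent || ',' || consequent || ','"
    where = " AND ".join(f"instr({items}, ?) = 0" for _ in codes)
    sql = ("SELECT antecedent, consequent, confidence FROM rules "
           f"WHERE {where} ORDER BY confidence DESC")
    return conn.execute(sql, [f",{c}," for c in codes]).fetchall()
```

Filtering with a query keeps the mining loop free of string comparisons, which is the efficiency argument made in the text.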
The 8% support threshold also yields other strong association rules, for example Rule:('M483',)==>('B5','C4') 0.774, i.e., (maintenance measure is identification)==>(Siemens TV). This shows that when the maintenance measure in the fault information is fault identification, the equipment is very likely a Siemens TV; as noted above, however, this is not the kind of meaningful conclusion this embodiment seeks.
As discussed earlier, if the support threshold is set too high, only associations of little significance are mined, whereas mining most of the repair and fault-cause information in the data set would require a support threshold of no more than 20/110000 = 0.00018. Weighing the computing and storage performance of the computer against the precision required of the mining results, 1% is chosen as the support threshold for frequent itemsets and 50% as the confidence threshold for rules. The frequent itemsets and association rules mined in this way are recorded in Tables 12 and 13:
Table 12 Partial frequent itemset mining results with a support threshold of 1% and a confidence threshold of 50%
Table 12 shows part of the frequent itemsets mined with a support threshold of 1%; the left column gives the frequent itemset and the right column its support. Because the support threshold has been lowered to 1%, many maintenance measures and fault descriptions that occur relatively often in the data set are retained in the results. Every fault cause and maintenance measure that appears in the results occurs at least 1% × 116000 = 1160 times in the data set, a considerable number that is sufficient to indicate associations between maintenance measures and fault information.
As mentioned above, in order to mine more meaningful relations between equipment faults and maintenance measures, the frequent itemset results above were deliberately restricted to results involving specific faults and specific maintenance measures; results related to customer problems or debugging have been omitted and are not discussed further.
Take the frequent itemset item:('FTDES419','C4','M167') 0.012 as an example: the frequent itemset (fault cause No. 419: power board component damaged; equipment is a TV; maintenance measure No. 167: board-level repair with a position number, overhaul) has a support of 1.2%. A support of 1.2% does not mean that the association between TV power board damage and board-level repair is weak; the confidence must also be examined. As analyzed above, the most frequent maintenance measures and fault causes are customer operation problems or simple debugging and identification, which carry little meaning, while each specific, meaningful maintenance measure or fault-cause description accounts for only a small share of the records. A support of 1.2% is therefore not small for the data mined in this embodiment. In fact, fault cause No. 419 (power board component damaged) occurs 1040 times in the data, and maintenance measure No. 167, M167 (board-level repair with a position number, overhaul), occurs 3407 times. Among the other frequent itemsets, item:('M474','HTVYY10200','C4') 0.013 indicates that the support of TV equipment whose maintenance measure is replacing a part with a position number is 0.013, and item:('B5','M427','FTDES908','C1') 0.012 indicates that the combination of Siemens air conditioners, the fault of no voltage on the outdoor control board, and the maintenance measure of replacing the outdoor unit control board has a support of 0.012 in the data set. The other frequent itemset results can be read in the same way.
Part of the association rules generated from the frequent itemsets obtained with a support threshold of 1%, using a confidence threshold of 50%, are shown in Table 13:
Table 13 Partial association rule mining results with a support threshold of 1% and a confidence threshold of 50%
When the support threshold is reduced to 1%, more meaningful conclusions appear among the association rules, rather than only uninformative information such as debugging or user operation problems; this is because the lower support threshold yields more frequent itemsets that carry meaningful information. The rule conclusions listed above deliberately exclude those related to uninformative information, so that the discussion focuses on specific fault-repair rules.
Take Rule:('FTDES523','HTVYY10000')==>('M474','C4') 0.857 as an example: the rule (fault cause No. 523: integrated circuit damaged)==>(maintenance measure is replacing a part with a position number, equipment category is TV) has a confidence of 85.7%. This indicates that for integrated-circuit damage faults, the equipment has an 85.7% probability of being a TV, and the corresponding maintenance measure is the replacement of a part with a position number.
Similarly, Rule:('M348',)==>('C1','FTDES487') 0.838 indicates that the rule (maintenance measure is replacing the power module)==>(equipment category is household air conditioner, fault cause is power module without output) has a confidence of 83.8%, showing that the overwhelming majority of repairs requiring replacement of the power module are household air conditioners with a power-module no-output fault.
Rule:('M469','C4')==>('HTVYY81000','FTDES357') 0.992: the rule (maintenance measure No. 469: replacing a unit module without a position number; equipment category is TV)==>(fault cause is unit module component failure) has a confidence of 99.2%, indicating that for most TV repairs in which a unit module without a position number is replaced, the cause is a failure of a unit module component. This indirectly shows that TV components are highly modular and that a large share of unit replacements is caused by component failures.
Rule:('FTDES908',)==>('B5','M427','HKTYY05208') 0.776: the rule (fault cause No. 908: outdoor unit control board without operating voltage)==>(brand is Siemens, maintenance measure is replacing the outdoor control board) has a confidence of 77.6%. Equipment with an outdoor control board no-voltage fault is therefore likely to be Siemens equipment, and the corresponding maintenance measure is replacing the outdoor control board. This shows that Siemens equipment accounts for most outdoor control board failures in this data set, and that most repairs of outdoor control board no-voltage faults replace the control board directly rather than repairing its internals.
Next, the generation of dynamic association rules is illustrated.
Take the rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1'), 0.829 as an example of how a dynamic association rule is generated. The rule (fault cause is outdoor unit control board without operating voltage, maintenance measure is replacing the outdoor unit control board)==>(brand is Siemens, equipment is a household air conditioner) has a confidence of 82.9%.
First, the support vector of the rule's left-hand side over the n (n = 30) sub-data sets is obtained. For the left-hand frequent itemset item:('FTDES908','M427'), the support vector formed by its supports on the 30 sub-data sets obtained by partitioning on appointment date (September 1-30, 2013) is:
Sv=[0.027,0.038,0.035,0.034,0.046,0.029,0.025,0.026,0.033,0.029,
0.030,0.033,0.023,0.031,0.025,0.029,0.035,0.020,0.017,0.017,0.018,
0.025,0.023,0.018,0.018,0.017,0.017,0.015,0.016,0.015]
Suppose that, after discretization, the equipment fault-maintenance data set D contains $M = 116391$ records. The values of $M_i$, the number of records in sub-data set $D_i$ ($i \in \{1, 2, \ldots, 30\}$), are shown in Table 14.
Table 14 Equipment maintenance counts after partitioning by appointment date
Date Repairs Date Repairs
9/30/2013 5564 9/23/2013 3622
9/29/2013 5113 9/2/2013 3581
9/28/2013 5095 9/1/2013 3578
9/24/2013 4991 9/25/2013 3514
9/13/2013 4818 9/12/2013 3422
9/27/2013 4714 9/5/2013 3363
9/18/2013 4390 9/9/2013 3329
9/8/2013 4330 9/22/2013 3327
9/6/2013 4194 9/17/2013 3302
9/14/2013 4008 9/3/2013 3253
9/10/2013 3984 9/20/2013 3194
9/26/2013 3952 9/11/2013 3181
9/4/2013 3950 9/16/2013 3118
9/7/2013 3942 9/19/2013 3088
9/15/2013 3631 9/21/2013 2829
The values $M_i$ ($i \in \{1, 2, \ldots, 30\}$) are taken from Table 14; for example, $M_7 = 3942$ is the total number of maintenance records in the data block for 2013-09-07. The support of the rule's left-hand side item:('FTDES908','M427') is then obtained from the definition of dynamic support:
$$s(X) = \sum_{i=1}^{n} \frac{M_i}{M}\, s_i(X), \qquad X = \{\mathrm{FTDES908}, \mathrm{M427}\},\ n = 30$$
The support of the frequent itemset item:('FTDES908','M427') in the data set is therefore 2.5%, which is above the 1% support threshold we set, so the frequent itemset is retained.
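The definition above can be checked with a small Python sketch. Weighting each block's support by $M_i/M$ makes the dynamic support equal to the pooled support over all blocks, whereas a plain average of the support vector would over-weight small blocks. The block sizes and supports below are toy values, not the data of Table 14, and the function name is hypothetical.

```python
def dynamic_support(block_sizes, block_supports):
    """s(X) = sum_i (M_i / M) * s_i(X), with M = sum_i M_i."""
    if len(block_sizes) != len(block_supports):
        raise ValueError("need one support value per block")
    total = sum(block_sizes)
    return sum(m / total * s for m, s in zip(block_sizes, block_supports))


# Toy example: a small block with high support and a large block with low support.
sizes = [100, 300]        # M_1, M_2
supports = [0.10, 0.02]   # s_1(X), s_2(X), i.e. 10 and 6 occurrences
print(dynamic_support(sizes, supports))   # pooled support, (10 + 6) / 400
print(sum(supports) / len(supports))      # unweighted mean, over-weights the small block
```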
The confidence vector of the association rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1') is:
Cv=[0.854,0.875,0.796,0.859,0.825,0.843,0.810,0.820,
0.800,0.807,0.872,0.850,0.838,0.887,0.813,0.865,0.807,
0.831,0.769,0.769,0.865,0.783,0.805,0.705,0.800,0.788,
0.825,0.934,0.866,0.815]
The confidence vector describes how the confidence of the dynamic association rule is distributed across the sub-data sets.
According to the new dynamic association rule algorithm, the support vector of the full itemset forming the rule, the support vector of its left-hand side, and the rule's confidence are as follows:
sRi=[0.023,0.033,0.028,0.029,0.038,0.024,0.021,0.021,0.026,
0.023,0.026,0.028,0.019,0.027,0.020,0.025,0.028,0.017,
0.013,0.013,0.016,0.020,0.018,0.012,0.015,0.013,0.014,
0.014,0.014,0.012]
sRLi=[0.027,0.038,0.035,0.034,0.046,0.029,0.025,0.026,0.033,0.029,
0.030,0.033,0.023,0.031,0.025,0.029,0.035,0.020,0.017,0.017,0.018,
0.025,0.023,0.018,0.018,0.017,0.017,0.015,0.016,0.015]
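Here $s_{RL}$ and $s_{RL_i}$ are the support and support vector of the rule's left-hand side, i.e., of the frequent itemset item:('FTDES908','M427'), and $s_R$ and $s_{R_i}$ are the support and support vector of the frequent itemset formed by all items (left and right) of the rule. The confidence of the rule is the ratio of the two dynamic supports:
$$c(R) = \frac{s_R}{s_{RL}} = \frac{\sum_{i=1}^{n} (M_i/M)\, s_{R_i}}{\sum_{i=1}^{n} (M_i/M)\, s_{RL_i}}$$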
This gives the association rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1') a confidence of 82.9%, which exceeds the 50% confidence threshold, so it is regarded as a strong association rule. The rule indicates that if the cause of an equipment failure is that the outdoor unit control board has no operating voltage and the maintenance measure is replacing the outdoor unit control board, the probability that the equipment is a Siemens household air conditioner is as high as 82.9%.
The mining process above shows the advantage of evaluating frequent itemsets and association rules with the dynamic association rule algorithm: as time changes, a strong association rule that originally satisfied the support and confidence thresholds may cease to satisfy them and thus no longer be strong. The support of the current frequent itemset and the confidence of the current association rule are therefore closely tied to the current time period time_id. The support of the current frequent itemset is the dot product of the itemset's support vector over the preceding time periods with the vector of each period's share of the total number of records, so it depends not only on the support vector but also on each period's share of the records. Similarly, the confidence of the current association rule is the ratio of the support, over the preceding time periods, of the full frequent itemset forming the rule to the support, over the same periods, of the rule's left-hand frequent itemset.
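The time dependence described above can be sketched in Python: the confidence vector is the element-wise ratio of the two support vectors, and the dynamic confidence is the ratio of their $M_i/M$-weighted sums. `strong_over_time` re-evaluates a rule using only the blocks up to each time_id, showing how a rule can stop being strong as new blocks arrive. This is a sketch under those definitions with made-up numbers; all function names are hypothetical.

```python
def weighted_support(block_sizes, supports):
    """Dynamic support: supports weighted by each block's share M_i / M."""
    total = sum(block_sizes)
    return sum(m / total * s for m, s in zip(block_sizes, supports))


def confidence_vector(full_supports, left_supports):
    """Per-block confidence c_i = s_{R_i} / s_{RL_i}."""
    return [sr / sl for sr, sl in zip(full_supports, left_supports)]


def dynamic_confidence(block_sizes, full_supports, left_supports):
    """c(R) = s_R / s_RL, both supports computed with weights M_i / M."""
    return (weighted_support(block_sizes, full_supports)
            / weighted_support(block_sizes, left_supports))


def strong_over_time(block_sizes, full_supports, left_supports, min_sup, min_conf):
    """For each time_id t, is the rule strong when blocks 1..t are used?"""
    flags = []
    for t in range(1, len(block_sizes) + 1):
        m, sr, sl = block_sizes[:t], full_supports[:t], left_supports[:t]
        flags.append(weighted_support(m, sr) >= min_sup
                     and dynamic_confidence(m, sr, sl) >= min_conf)
    return flags


# Toy rule whose full itemset becomes rare in later blocks (made-up numbers).
sizes = [100, 100, 100]
full = [0.06, 0.01, 0.00]
left = [0.08, 0.05, 0.06]
print(strong_over_time(sizes, full, left, min_sup=0.02, min_conf=0.5))  # [True, True, False]
```

In the toy data the rule stays above both thresholds for two blocks, then its confidence drops to about 0.37 once the third block is included, which is the situation the paragraph above describes.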
The dynamic association rule results above are only part of the data mining results of this embodiment; further results can be displayed through a web-based visualization interface.
In this embodiment, it is also necessary to examine whether the mined frequent itemsets or association rules are meaningful, that is, interesting, which requires evaluating the mined dynamic association rules. Besides the support and confidence used in conventional rule evaluation, this embodiment also uses the support vector and the confidence vector, which reflect changes over time, as the basis for a comprehensive evaluation of dynamic association rules. Two methods, bar-chart analysis and time series analysis, can be applied to these two vectors to obtain more detailed information about a rule. In addition, when the generated dynamic association rules are used for prediction, the error of their dynamic regression coefficients must be computed on a validation data set to evaluate how accurately the rules' dynamic characteristics are captured.
1) Bar-chart method. Bar charts of the support vector and the confidence vector clearly show the distribution of a rule's support and confidence and qualitatively reflect how they change over time. By definition, the support and the confidence follow the same trend, so it is sufficient to draw the bar chart of only one of the vectors. Again taking the association rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1') as an example, its confidence vector and the support vector of its full frequent itemset over the period 09/01-09/10 are:
Cv'=[0.854,0.875,0.796,0.859,0.825,0.843,0.810,0.820,0.800,0.807]
Sv'=[0.023,0.033,0.028,0.029,0.038,0.024,0.021,0.021,0.026,0.023]
The bar chart of the rule's confidence vector can then be drawn, as shown in Fig. 6, together with the bar chart of the support vector of the rule's full frequent itemset, as shown in Fig. 7.
The full frequent itemset of the rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1') is item:('FTDES908','HKTYY05208','B5','M427','C1'); its frequency distribution chart, obtained from its support, is shown in Fig. 8.
Fig. 8 shows that the support rises and then falls within a cycle of 10 days, and then rises and falls again in the next cycle. This indicates a rule whose frequency declines cyclically. The confidence of the rule (the fault cause is outdoor unit control board without operating voltage, the maintenance measure is replacing the outdoor unit control board)==>(brand is Siemens, equipment is a household air conditioner) is also declining, indicating that the share of Siemens air conditioners among equipment with an outdoor control board no-voltage fault is gradually falling, possibly because the share of Siemens air conditioners in the air-conditioner market is gradually shrinking. This kind of support analysis works well in practice: ignoring differences in the number of records per time block, downward or periodic trends can be identified from the bar chart of the confidence or support vector. A downward trend indicates that the rule is becoming less effective and will perform poorly in application, whereas a periodic trend indicates that the rule is stable and is effective when applied in step with its period.
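A numeric counterpart to reading trends off the bar charts is a least-squares slope and per-period means of the support vector. The Python sketch below applies them to the 30-element support vector of the full itemset item:('FTDES908','HKTYY05208','B5','M427','C1'), assuming, as in the text, that its entries are in day order from 09/01 to 09/30. `text_bars` is a hypothetical stand-in for the bar charts of Figs. 6 to 8, and all helper names are assumptions.

```python
# Support vector of the rule's full itemset, 09/01-09/30 (values as given in the text).
SUPPORT = [0.023, 0.033, 0.028, 0.029, 0.038, 0.024, 0.021, 0.021, 0.026,
           0.023, 0.026, 0.028, 0.019, 0.027, 0.020, 0.025, 0.028, 0.017,
           0.013, 0.013, 0.016, 0.020, 0.018, 0.012, 0.015, 0.013, 0.014,
           0.014, 0.014, 0.012]


def linear_trend(values):
    """Ordinary least-squares slope of the values against their index 0..n-1."""
    n = len(values)
    mean_x, mean_y = (n - 1) / 2, sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def period_means(values, period):
    """Mean of each consecutive chunk of `period` values (e.g. 10-day cycles)."""
    chunks = [values[i:i + period] for i in range(0, len(values), period)]
    return [sum(c) / len(c) for c in chunks]


def text_bars(values, width=40):
    """Crude text bar chart, one bar per block, scaled to the largest value."""
    top = max(values)
    return "\n".join(f"{i:>3} {'#' * round(width * v / top)} {v:.3f}"
                     for i, v in enumerate(values, start=1))


print(text_bars(SUPPORT))
print("slope per day:", linear_trend(SUPPORT))
print("10-day means:", period_means(SUPPORT, 10))
```

A negative slope together with falling 10-day means is the quantitative form of the cyclical downward trend read from Fig. 8.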
2) time series analysis
Time series analysis is a widely used method for describing changes in data and predicting data trends. If a support vector, which represents how often a rule occurs, contains enough elements, it may be suitable for time series analysis. Again taking the rule Rule:('FTDES908','M427')==>('B5','HKTYY05208','C1') as an example, the support vector of its full frequent itemset item:('FTDES908','HKTYY05208','B5','M427','C1') over the time blocks 09/01-09/30 is:
Sv''=[0.023,0.033,0.028,0.029,0.038,0.024,0.021,0.021,0.026,
0.023,0.026,0.028,0.019,0.027,0.020,0.025,0.028,0.017,
0.013,0.013,0.016,0.020,0.018,0.012,0.015,0.013,0.014,
0.014,0.014,0.012]
The number of records in each time block is also known and is expressed as the following frequency vector:
Num_block=[5564,5113,5095,4991,4818,4714,4399,4330,4194,
4008,3984,3952,3950,3942,3631,3622,3581,3578,3514,
3422,3363,3329,3327,3302,3253,3194,3181,3118,3088,2829]    (4-10)
The support vector of the frequent itemset can then be calculated as:
So, a time series regression model can be set up describe the frequency of the full item frequent item set of the rule and change Journey, is designated as f (i), then can be found that trend of the regular grid DEM within the time period from f (i), and can utilize this time The support for returning formula predictions following.Using time series analysis, the quantitative model of regular support or confidence level can be found, It can provide information more accurate than block diagram, most importantly can be with the development trend of prediction rule.Using support It is feasible that degree is predicted, and the present embodiment considers the dynamic characteristic of correlation rule, by way of splitting mining data collection, Excavate and not only include support and confidence level, and comprising this rules of correlation rule of support vector sum confidence level vector The information for itself changing over time can be provided, be capable of the development trend of prediction rule, not had with common association rule Function.
The regression expression of time series regression model is:
Xt=a1Xt-1+a2Xt-2+a3Xt-3+a4Xt-4+....anXt-n
where $X_t$ is the value of the series at the current time, expressed as a linear combination of the preceding n values; in general a noise term is also added on the right-hand side to test the model's robustness to time-shift disturbances. Given the support vector
$$SV = [f_1, f_2, \dots, f_n]$$
the correlation coefficients can then be calculated with the following formula:
Since the support vector has already been given above, a numerical computing tool can be used for this calculation. The following program is run in Matlab to perform autoregression and one-step-ahead prediction:
x = load('support_num.txt');  % support series of the frequent itemset, one value per day
x = x(:)';                    % force a row vector so that [x, rand] below is valid
y = aryule(x, 4)              % 4th-order AR model (Yule-Walker)
% AR model: x[n] = -y(2)*x[n-1] - y(3)*x[n-2] - ... + e[n]
a = lpc(x, 3)                 % 3rd-order linear predictor: next value from the previous 3 values
est_x = filter([0 -a(2:end)], 1, [x, rand]);  % 1-D filtering; rand only pads one extra sample (its weight is 0)
est_x(end)                    % the last value is the one-step-ahead prediction
plot(x)                       % plot the original values
hold on
plot(est_x, 'r')              % fitted and predicted values in red
plot(length(est_x), est_x(end), '*g')   % highlight the predicted value
ext_2 = filter(-a(2:end), 1, x);        % another way to obtain the same prediction
isequal(ext_2(end), est_x(end))         % compare the two methods
Performing autoregression on the support vector of the previous 29 days predicts the value for day 30. Running the above program gives the predicted value:
>> est_x(end)
ans =
   31.807
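For readers without Matlab, the same idea can be sketched in Python with NumPy. This is an assumption-laden sketch, not the program used in this embodiment: it estimates the coefficients a_1..a_p of X_t = a_1 X_{t-1} + ... + a_p X_{t-p} from the Yule-Walker equations built on the biased sample autocorrelation without mean removal (assumed to approximate the autocorrelation method behind lpc), then forms the one-step forecast. It has not been checked against Matlab's output, and the contents of support_num.txt are not given in the text. The function names are hypothetical.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag], without mean removal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])


def yule_walker(x, order):
    """Solve R a = r for the coefficients of X_t = a_1 X_{t-1} + ... + a_p X_{t-p}."""
    r = autocorr(x, order)
    # Symmetric Toeplitz matrix with entries R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])


def predict_next(x, coeffs):
    """One-step forecast a_1 x[-1] + a_2 x[-2] + ... + a_p x[-p]."""
    x = np.asarray(x, dtype=float)
    recent = x[::-1][: len(coeffs)]  # most recent value first
    return float(np.dot(coeffs, recent))
```

Usage, mirroring lpc(x, 3) on the first 29 daily values: `coeffs = yule_walker(x[:29], 3)` followed by `predict_next(x[:29], coeffs)`. Note the sign convention: Matlab's lpc returns the prediction-error filter [1, -a_1, ..., -a_p], whereas these coefficients follow the regression expression above directly.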
In addition, the regression coefficients of each order of the time-series regression model can be obtained with two other methods, least squares and Burg maximum entropy, as shown in Tables 15 and 16.
Table 15 Coefficients from the Burg maximum-entropy method
Table 16 Regression coefficients from the least-squares (LS) method
Regression and prediction can be performed with the coefficient values in the tables above; the fitted and predicted curves are shown in Figure 9. According to the results in Figure 9, the error between the predicted 30th value and the actual value is:
A prediction error of 3.5% is acceptable, so later values of the series can be obtained from its earlier values.
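Tables 15 and 16 are not reproduced here. As a hedged sketch of the two estimators they name, the block below fits the same model form by ordinary least squares (regressing X_t on its p previous values) and by Burg's maximum-entropy recursion, which chooses each reflection coefficient to minimise the combined forward and backward prediction-error power. Both return coefficients in the sign convention X_t = a_1 X_{t-1} + ... + a_p X_{t-p}. The function names are hypothetical and the code does not reproduce the table values.

```python
import numpy as np


def ar_least_squares(x, order):
    """Least-squares AR fit of X_t = a_1 X_{t-1} + ... + a_p X_{t-p} over t = p..N-1."""
    x = np.asarray(x, dtype=float)
    # Row for time t holds [X_{t-1}, X_{t-2}, ..., X_{t-p}]
    design = np.array([x[t - order : t][::-1] for t in range(order, len(x))])
    coeffs, *_ = np.linalg.lstsq(design, x[order:], rcond=None)
    return coeffs


def ar_burg(x, order):
    """Burg (maximum-entropy) AR fit, returned in the same sign convention as above."""
    x = np.asarray(x, dtype=float)
    ef = x.copy()  # forward prediction errors
    eb = x.copy()  # backward prediction errors
    a = np.array([1.0])  # prediction-error filter [1, -a_1, ..., -a_m]
    for _ in range(order):
        efp, ebp = ef[1:], eb[:-1]
        den = np.dot(efp, efp) + np.dot(ebp, ebp)
        if den == 0.0:
            break  # errors are already zero: the series is fitted exactly
        # Reflection coefficient minimising the summed forward and backward error power
        k = -2.0 * np.dot(ebp, efp) / den
        ef, eb = efp + k * ebp, ebp + k * efp
        # Levinson-style update of the error filter from order m to m + 1
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    a = np.concatenate([a, np.zeros(order + 1 - len(a))])  # pad if stopped early
    return -a[1:]
```

Least squares minimises only the forward error over the samples where all p lags exist, while Burg also uses backward errors and always yields reflection coefficients of magnitude at most 1, which tends to keep the fitted model stable on short series such as the 29 daily values here. The relative prediction error can then be computed as |predicted - actual| / actual.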
Similarly, for the mined association rules, the rise and fall of confidence can be analyzed with a confidence histogram following the same approach, regression analysis can be performed, and the existing confidence vector can be used to make predictions with the time-series regression model.
This embodiment also provides a data-mining visualization system built on the Django framework, which displays the mining results visually. The working principle of the Django framework is shown in Figure 10. Django is an open-source web application framework. It was originally developed to manage several news-content websites of the Lawrence publishing group and was released under the BSD license in July 2005. The core components of the Django framework are:
(1) an object-relational mapper for creating models (Model);
(2) a full-featured administration interface designed for administrators;
(3) elegant, first-class URL design;
(4) a designer-friendly template language;
(5) a caching system.
Django is a framework based on the MVC architecture. In Django, however, the part of the controller that receives user input is handled by the framework itself, so Django concentrates on models (Model), templates (Template), and views (View), which is referred to as the MTV pattern. Their respective responsibilities are shown in Table 17:
Table 17 Functions of each layer of the Django framework
A Django view does not process user input; it only decides which data to present to the user, while a Django template only decides how to present the data that the view specifies. In other words, Django splits the MVC view further into the Django view and the Django template, which determine "which data to present" and "how to present it" respectively, so that templates can be replaced at any time as needed instead of being limited to the built-in ones.
The controller part of MVC is implemented by Django's URLconf. The URLconf mechanism matches URLs against regular expressions and then calls the appropriate Python function. URLconf places no restrictions on URL rules, so URLs can be designed in any style: traditional, RESTful, or unconventional. The framework encapsulates the control layer, and the data-interaction layer amounts to reading, writing, deleting, and updating database tables; when writing a program, one only needs to call the corresponding methods.
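The URLconf idea can be sketched without Django itself. The routes, view names, and URL shapes below are hypothetical, and this is not Django's API: it only shows the mechanism described above, an ordered list of regular expressions whose named groups are passed as keyword arguments to the first matching Python function, here applied to the date pages and search box of this system.

```python
import re


# Hypothetical view functions; a real Django view would also receive the request object.
def day_results(day):
    return f"mining results for 09/{day}"


def search(q):
    return f"search results for {q!r}"


# (regex, view) pairs tried in order, loosely mirroring a regex-based URLconf.
ROUTES = [
    (re.compile(r"^date/(?P<day>0[1-9]|[12][0-9]|30)/$"), day_results),
    (re.compile(r"^search/(?P<q>\w+)/$"), search),
]


def dispatch(path):
    """Call the first view whose pattern matches `path`, passing named groups as keyword arguments."""
    for pattern, view in ROUTES:
        match = pattern.match(path)
        if match:
            return view(**match.groupdict())
    return "404 not found"
```

For example, `dispatch("date/08/")` returns the page for 09/08, while `dispatch("date/31/")` falls through to the not-found branch because September has only 30 days.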
The front-end visualization part of this data-mining system uses a Django-based web framework consisting mainly of the following parts:
1) Design of the template page Django_base.html, which serves as the base template; the other pages are formed by adding block content to it.
2) Writing the view file (view.py): Django's concept of a "view" encapsulates the logic that handles a user request and returns a response.
3) Writing the model file (models.py): Django provides an abstraction layer (Models) for building and manipulating the data in a web application.
4) Design of the Admin interface, including verification and login of the administrator's identity.
The interface of the data-mining visualization system includes:
1) Administrator login interface: after logging in, the administrator has account management and back-end data management functions and can add, delete, and modify records of the data models defined in models.py. For example, this embodiment establishes a frequent-itemset data model and an association-rule data model for presenting the equipment fault repair information. The upper right of the page also provides a welcome message, a history of data operations, and a logout menu.
2) Date link page: as described for the dynamic association rule algorithm above, the acquired equipment fault repair message data set D is split by scheduled maintenance date, so that all data fall into the 30 days of September 2013. Clicking a date navigation link on the main interface opens the detailed mining results (frequent itemsets and association rules) for that day, i.e. the results on each time block of the dynamic association rule information. The home page therefore has 30 buttons linking to the detail pages of these 30 days, followed by a query box in which the user can enter a frequent itemset or an association rule and click OK to obtain a list of results containing it.
3) Detailed results interface: as shown in Figure 11, clicking a date opens the detailed results page for that date, which lists the mining results for that date, either association rules or frequent itemsets. Figure 11 shows the detailed mining results page for the selected date 09/08.
4) Query page: as shown in Figure 12, a search box below the dates accepts a frequent itemset or an association rule for matching queries, and the results are presented as a list. A matched frequent itemset is shown together with its block date, its support vector, and the frequent itemset on that block; a matched association rule is shown together with its block date, its confidence vector, and the association rule on that block. Figure 12 shows the rule-mining results returned for a search on M507.
5) histogram analysis interface
If a selected rule has data records on every day of 09/01-09/30, its confidence on each date can be displayed as a histogram to support further mining such as finding periods and predicting trends. Figure 13 shows the confidence histogram of one such rule over 09/01-09/30.
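The query page and the histogram page both reduce to simple filters over stored rule records. The sketch below assumes a hypothetical record layout (block date, antecedent, consequent, confidence) that is not taken from the embodiment's models.py; query_rules returns every rule containing all queried items, and confidence_series collects one rule's daily confidence values, i.e. the data a histogram such as Figure 13 would plot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleRecord:
    """One mined rule on one time block (hypothetical field names)."""
    block_date: str  # "MM/DD"; sorts correctly within a single month
    antecedent: frozenset
    consequent: frozenset
    confidence: float


def query_rules(records, query_items):
    """Return records whose rule contains every queried item, ordered by block date."""
    wanted = frozenset(query_items)
    hits = [r for r in records if wanted <= (r.antecedent | r.consequent)]
    return sorted(hits, key=lambda r: r.block_date)


def confidence_series(records, antecedent, consequent):
    """Daily confidence values of one rule, keyed by block date (histogram input)."""
    lhs, rhs = frozenset(antecedent), frozenset(consequent)
    return {r.block_date: r.confidence for r in records
            if r.antecedent == lhs and r.consequent == rhs}
```

A search on M507, for instance, would call `query_rules(records, ["M507"])`; matching is by item containment, so a query for a whole itemset returns only rules that contain all of its items.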
In this embodiment, the mined frequent itemsets and association rules are presented to the user through the visualization interface, which makes them easy to view and to investigate further in order to discover the interrelated patterns and dependency rules between equipment faults and maintenance measures.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (10)

1. A device fault information mining method based on dynamic association rules, characterized by comprising:
obtaining an equipment fault repair message data set D;
dividing the obtained equipment fault repair message data set D into n sub-datasets $D_1, D_2, \dots, D_n$ according to the scheduled maintenance date, wherein $D = \{D_1, D_2, \dots, D_n\}$;
defining a dynamic association rule algorithm, wherein a dynamic association rule relates itemsets A and B, where A and B each denote an itemset, SV denotes the support vector, CV denotes the confidence vector, s denotes the support of the itemset, c denotes the confidence of the association rule, and an implication symbol denotes the inference of the dynamic association rule;
mining dynamic association rules from the n sub-datasets $D_1, D_2, \dots, D_n$ according to the defined dynamic association rule algorithm to obtain the association between equipment fault causes and maintenance measures.
2. The device fault information mining method based on dynamic association rules according to claim 1, wherein obtaining the equipment fault repair message data set D comprises:
obtaining a raw data set of equipment fault repair messages;
obtaining target attribute data from the obtained raw data set;
preprocessing the obtained target attribute data, the preprocessing comprising: handling missing attribute values in the target attributes, handling inconsistent attribute formats, and/or removing redundant values.
3. The device fault information mining method based on dynamic association rules according to claim 2, wherein after the obtained target attribute data are preprocessed, the method further comprises:
discretizing the preprocessed target attribute data to obtain the equipment fault repair message data set D.
4. The device fault information mining method based on dynamic association rules according to claim 1, wherein the support vector SV of the dynamic association rule is expressed as:
$$SV = [s(A\cup B)_1, s(A\cup B)_2, \dots, s(A\cup B)_n]$$
$$\text{s.t.}\quad s(A\cup B)_i = \frac{f(A\cup B)_i}{|D_i|}, \quad i \in \{1, 2, \dots, n\}$$
where $s(A\cup B)_i$ denotes the support measure of itemset $A\cup B$ in sub-dataset $D_i$, s.t. denotes the constraint, $f(A\cup B)_i$ denotes the number of times $A\cup B$ occurs in $D_i$, and $|D_i|$ is the number of records in $D_i$.
5. The device fault information mining method based on dynamic association rules according to claim 1, wherein the confidence vector CV of the dynamic association rule is expressed as:
$$CV = [c(A\cup B)_1, c(A\cup B)_2, \dots, c(A\cup B)_n]$$
$$\text{s.t.}\quad c(A\cup B)_i = \frac{s(A\cup B)_i}{s_{X_i}}, \quad i \in \{1, 2, \dots, n\}$$
where $c(A\cup B)_i$ reflects the confidence measure of $A\cup B$ in sub-dataset $D_i$, s.t. denotes the constraint, $s(A\cup B)_i$ is the i-th element of the SV of itemset $A\cup B$, and $s_{X_i}$ is the i-th element of the SV of itemset A.
6. The device fault information mining method based on dynamic association rules according to claim 1, wherein the support s of the dynamic association rule is expressed as:
$$s = \frac{\sum_{i=1}^{n} f(A\cup B)_i}{M}$$
where M is the number of records in data set D and $f(A\cup B)_i$ denotes the number of times itemset $A\cup B$ occurs in sub-dataset $D_i$.
7. The device fault information mining method based on dynamic association rules according to claim 1, wherein the confidence c of the dynamic association rule is expressed as:
$$c = \frac{s_{(A\cup B)}}{s_X}$$
where $s_{(A\cup B)}$ is the support of itemset $A\cup B$ and $s_X$ is the support of itemset A.
8. The device fault information mining method based on dynamic association rules according to claim 1, wherein mining dynamic association rules from the n sub-datasets $D_1, D_2, \dots, D_n$ according to the defined dynamic association rule algorithm to obtain the association between equipment fault causes and maintenance measures comprises:
generating the left-hand and right-hand items of dynamic association rules from the sub-dataset of each time period using a frequent-itemset generation algorithm;
determining the support of the left-hand item of the dynamic association rule, expressed as:
$$s_{R_L} = \sum_{i=1}^{n} \frac{s_{R_{L_i}} \cdot M_i}{M}$$
where $s_{R_L}$ denotes the support of the left-hand item of the dynamic association rule, $s_{R_{L_i}}$ denotes the support-vector element of the left-hand item in time period $t_i$, $M_i$ denotes the number of records in sub-dataset $D_i$ of time period $t_i$, and M denotes the total number of records in data set D;
if the support of the left-hand item is greater than a preset support threshold, determining the support of the full (left plus right) itemset of the dynamic association rule, expressed as:
$$s_R = \sum_{i=1}^{n} \frac{s_{R_i} \cdot M_i}{M}$$
where $s_R$ denotes the support of the full itemset of the dynamic association rule and $s_{R_i}$ denotes its support-vector element in time period $t_i$;
determining the confidence of the dynamic association rule by the formula $c = s_R / s_{R_L}$, where c denotes the confidence of the dynamic association rule;
judging whether the confidence of the dynamic association rule is greater than a preset confidence threshold and, if so, analyzing the association between equipment fault causes and maintenance measures according to the mined dynamic association rules.
9. The device fault information mining method based on dynamic association rules according to claim 1, wherein the method further comprises:
building a time-series regression model to predict the development trend of the dynamic association rules.
10. The device fault information mining method based on dynamic association rules according to claim 1, wherein the method further comprises:
on an interactive visualization interface, linking, according to the scheduled maintenance date clicked by the user, to the mining-result interface of the corresponding itemsets and association rules; and/or,
on the interactive visualization interface, performing a matching query according to a query condition entered by the user and displaying the corresponding query results as a list, the query condition comprising an itemset or an association rule; and/or,
on the interactive visualization interface, displaying, as a histogram, the confidence of an association rule selected by the user.
CN201710096618.6A 2017-02-22 2017-02-22 A kind of device fault information method for digging based on dynamic association rules Pending CN106874491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710096618.6A CN106874491A (en) 2017-02-22 2017-02-22 A kind of device fault information method for digging based on dynamic association rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710096618.6A CN106874491A (en) 2017-02-22 2017-02-22 A kind of device fault information method for digging based on dynamic association rules

Publications (1)

Publication Number Publication Date
CN106874491A true CN106874491A (en) 2017-06-20

Family

ID=59169238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710096618.6A Pending CN106874491A (en) 2017-02-22 2017-02-22 A kind of device fault information method for digging based on dynamic association rules

Country Status (1)

Country Link
CN (1) CN106874491A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319651A (en) * 2017-12-28 2018-07-24 南京烽火软件科技有限公司 A kind of internet information method of excavation
CN108335751A (en) * 2018-01-23 2018-07-27 上海孩子通信息科技有限公司 A kind of children's character evaluation method based on data mining
CN108363364A (en) * 2017-12-29 2018-08-03 武汉武钢众鹏信息系统有限公司 A kind of alarm method based on the driving of industrial big data
CN108763039A (en) * 2018-04-02 2018-11-06 阿里巴巴集团控股有限公司 A kind of traffic failure analogy method, device and equipment
CN110334796A (en) * 2019-06-28 2019-10-15 北京科技大学 A kind of association rule mining method and device of social security events
CN110442640A (en) * 2019-08-05 2019-11-12 西南交通大学 Subway fault correlation recommended method based on priori weight and multilayer TFP algorithm
CN111445099A (en) * 2019-01-17 2020-07-24 国网电子商务有限公司 Industrial production data analysis method and system based on association rule
CN111723941A (en) * 2020-06-02 2020-09-29 中国人民解放军军事科学院战争研究院 Rule generation method and device, electronic equipment and storage medium
CN112434104A (en) * 2020-12-04 2021-03-02 东北大学 Redundant rule screening method and device for association rule mining
CN112463847A (en) * 2020-10-30 2021-03-09 深圳市安云信息科技有限公司 Fault correlation analysis method and device based on time sequence data
CN112529232A (en) * 2019-08-30 2021-03-19 比亚迪股份有限公司 Station equipment fault prediction method and system and rail transit management system
CN114484735A (en) * 2022-03-11 2022-05-13 青岛海信日立空调系统有限公司 Multi-split system fault diagnosis and energy-saving potential identification method and multi-split system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111296A (en) * 2011-01-10 2011-06-29 浪潮通信信息系统有限公司 Mining method for communication alarm association rule based on maximal frequent item set
CN103414581A (en) * 2013-07-24 2013-11-27 佳都新太科技股份有限公司 Equipment fault alarm, prediction and processing mechanism based on data mining
CN103701926A (en) * 2013-12-31 2014-04-02 小米科技有限责任公司 Method, device and system for obtaining fault reason information
CN105224616A (en) * 2015-09-18 2016-01-06 浪潮软件股份有限公司 A kind of based on seasonal effect in time series APRIORI algorithm improvement method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
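Chengdu University of Science and Technology et al.: "Introduction to long-term forecasting schemes", Higher Education Textbook: Engineering Hydrology and Hydraulic Computation *
Shen Bin: "A new dynamic association rule and its mining algorithm", Control and Decision *
Rong Gang et al.: "Mining of dynamic association rules in databases", Control Theory & Applications *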

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319651B (en) * 2017-12-28 2022-02-15 南京烽火星空通信发展有限公司 Internet information mining method
CN108319651A (en) * 2017-12-28 2018-07-24 南京烽火软件科技有限公司 A kind of internet information method of excavation
CN108363364A (en) * 2017-12-29 2018-08-03 武汉武钢众鹏信息系统有限公司 A kind of alarm method based on the driving of industrial big data
CN108335751A (en) * 2018-01-23 2018-07-27 上海孩子通信息科技有限公司 A kind of children's character evaluation method based on data mining
CN108763039A (en) * 2018-04-02 2018-11-06 阿里巴巴集团控股有限公司 A kind of traffic failure analogy method, device and equipment
CN111445099A (en) * 2019-01-17 2020-07-24 国网电子商务有限公司 Industrial production data analysis method and system based on association rule
CN110334796A (en) * 2019-06-28 2019-10-15 北京科技大学 A kind of association rule mining method and device of social security events
CN110442640A (en) * 2019-08-05 2019-11-12 西南交通大学 Subway fault correlation recommended method based on priori weight and multilayer TFP algorithm
CN110442640B (en) * 2019-08-05 2021-08-31 西南交通大学 Subway fault association recommendation method based on prior weight and multilayer TFP algorithm
CN112529232A (en) * 2019-08-30 2021-03-19 比亚迪股份有限公司 Station equipment fault prediction method and system and rail transit management system
CN111723941B (en) * 2020-06-02 2021-09-24 中国人民解放军军事科学院战争研究院 Rule generation method and device, electronic equipment and storage medium
CN111723941A (en) * 2020-06-02 2020-09-29 中国人民解放军军事科学院战争研究院 Rule generation method and device, electronic equipment and storage medium
CN112463847A (en) * 2020-10-30 2021-03-09 深圳市安云信息科技有限公司 Fault correlation analysis method and device based on time sequence data
CN112434104A (en) * 2020-12-04 2021-03-02 东北大学 Redundant rule screening method and device for association rule mining
CN112434104B (en) * 2020-12-04 2023-10-20 东北大学 Redundant rule screening method and device for association rule mining
CN114484735A (en) * 2022-03-11 2022-05-13 青岛海信日立空调系统有限公司 Multi-split system fault diagnosis and energy-saving potential identification method and multi-split system
CN114484735B (en) * 2022-03-11 2023-08-15 青岛海信日立空调系统有限公司 Multi-split system fault diagnosis and energy-saving potential identification method and multi-split system

Similar Documents

Publication Publication Date Title
CN106874491A (en) A kind of device fault information method for digging based on dynamic association rules
Jiang et al. A hybrid approach of rough set and case-based reasoning to remanufacturing process planning
Hui et al. Data mining for customer service support
Sapia On Modeling and Predicting Query Behavior in OLAP Systems.
US9710815B2 (en) System, method, and computer program product for processing and visualization of information
WO2017183065A1 (en) Device and method for tuning relational database
CN112925901B (en) Evaluation resource recommendation method for assisting online questionnaire evaluation and application thereof
CN105260300A (en) Service test method based on CAS (General Classification Standards of China Accounting Standards) application platform
Hao et al. Business process impact visualization and anomaly detection
Suo et al. Computer assistance analysis of power grid relay protection based on data mining
Cecelja Manufacturing Information and Data Systems: Analysis, Design and Practice
Yang et al. Discovery of online shopping patterns across websites
US20200327125A1 (en) Systems and methods for hierarchical process mining
Narman et al. Enterprise architecture analysis for data accuracy assessments
Tsunoda et al. Software development productivity of Japanese enterprise applications
Ke et al. PBWA: A Provenance‐Based What‐If Analysis Approach for Data Mining Processes
Karami et al. Maintaining accurate web usage models using updates from activity diagrams
Niranjan et al. An efficient system based on closed sequential patterns for web recommendations
WO2020200750A1 (en) Method and system for operating an industrial automation system
Li et al. Reduction of the criteria system for identifying effective reservoirs in the joint operation of a flood control system
Riebisch et al. Introducing impact analysis for architectural decisions
Shahzad et al. Towards a goal-driven approach for business process improvement using process-oriented data warehouse
Rinaldi et al. A framework for a data quality module in decision support systems: an application with smart grid time
Hui et al. Application of data mining techniques for improving customer services
CN117539948B (en) Service data retrieval method and device based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170620
