Summary of the Invention
In view of the above problems, the present invention provides a method for constructing an information system operation rule base based on association rule mining. The method intelligently generates the rule base using fault tree technology and association rule mining, and optimizes the rules using machine learning. In addition, a three-field rule structure is designed so that rules can be sorted and adjusted automatically.
To achieve the above technical purpose and technical effect, the present invention adopts the following technical solution:
A method for constructing an information system operation rule base based on association rule mining, characterized by comprising the following steps:
S01: Obtain the network topology of the information system and the dynamic and static monitoring indicators of all devices;
S02: Generate a network fault tree from the network topology and the dynamic and static monitoring indicators of the devices, and generate a basic rule base from the network fault tree;
S03: Run an association rule mining algorithm on the historical data of the information system to obtain an association rule base;
S04: Perform inference over the basic rule base and the association rule base to generate an extended rule base;
wherein the retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base.
Preferably, each rule in the basic rule base has a three-field structure comprising:
Rule ranking field: the number of times the rule has executed successfully in actual operation, the number of times it has failed, the final count of the rule, and the rank of the rule;
Rule identification field: identifies the object to which the rule belongs;
Rule body field: the detailed description of the rule.
Preferably, the system executes a rule sorting algorithm and a rule flow algorithm in real time to determine rule priorities and refresh the rules.
Within each rule base, the retrieval priority of a rule is determined by the final count in its rule ranking field. The final count is calculated as:
F = R - 0.5W
where F is the final count, R is the number of times the rule has executed successfully in actual operation, and W is the number of times it has failed. If machine learning is applied to a failure scenario and the related rule is optimized so that the underlying problem is resolved, the corresponding failure count W is decremented by one.
Preferably, the rule flow algorithm of the association rule base is: during operation, once a rule is proved correct, it is moved directly to the basic rule base; if a rule is proved wrong twice, it is deleted.
Preferably, the rule flow algorithm of the extended rule base is: all rules are verified against historical data, and
a rule with a success rate of 80% to 100% is moved directly to the basic rule base after machine learning on historical data;
a rule with a success rate of 60% to 80% undergoes machine learning on historical data; if its success rate then exceeds 80%, it is moved to the basic rule base; otherwise it stays in the extended rule base and undergoes machine learning on operational data until its success rate exceeds 80%;
a rule with a success rate of 50% to 60% undergoes machine learning on historical data and operational data; once its success rate exceeds 80% it is moved to the basic rule base, and until then it stays in the extended rule base;
a rule with a success rate below 50% is deleted directly.
The present invention enables dynamic construction and optimization of an information system operation rule base and can be applied to an enterprise integrated IT operation and maintenance supervision platform. It makes monitoring alarm rules easier to establish and maintain and improves rule matching efficiency, so that the system adapts quickly to changes in information system objects, operating environments and sources of operating state data, while meeting the real-time requirements of matching against large-scale information system rule sets. This improves the practicality of the algorithm and the quality of information system monitoring alarms, security management, behavior auditing and compliance management.
The beneficial effects of the invention are as follows:
1. Zoned construction of the rule base: the rule base designed by the method has three zones, which store the basic rules, the association rules and the extended rules respectively. Basic rules have the highest priority, association rules the next highest, and extended rules the lowest. Zoning the rule base lets the order of rule retrieval follow from rule priority management, and rules in a lower zone can be upgraded through continuous real-time machine learning, so that rules flow from lower zones to higher ones.
2. Three-field rule structure: the three-field structure comprises a rule ranking field, a rule identification field and a rule body field. The rule ranking field ranks rules by priority using quantitative measures. The rule identification field identifies the object to which the rule belongs, so that the rule base can adjust itself when the network topology changes. The rule body field stores the main part of the rule, that is, its detailed description.
3. Real-time adaptive threshold adjustment: the system uses historical data and operational data to analyze and calculate alarm thresholds that suit the alarm requirements of business operation. This improves the alarm self-learning capability for the information system. Alarm thresholds are adjusted dynamically by the threshold planning algorithm, which reduces the number of events at the source and improves the quality of monitoring alarms.
4. Automated reasonableness analysis of newly stored rules: new rules can be generated automatically by the system or added manually. Each new rule is analyzed for reasonableness against historical data and real-time operational data to determine whether it is usable.
5. Automatic adjustment and optimization of rules: the rule sorting algorithm and the rule flow algorithm are executed in real time to determine rule priorities and to refresh or upgrade them. This keeps the rule base in its best state and improves the retrieval efficiency and accuracy of the rules, which in turn improves system performance.
Detailed Description of the Embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and practice the invention; the embodiments described do not limit the invention.
A method for constructing an information system operation rule base based on association rule mining, as shown in Fig. 1, comprises the following steps:
S01: Obtain the network topology of the information system and the dynamic and static monitoring indicators of all devices.
The network topology is first obtained by topology discovery. Then, for each network device in the topology, the corresponding dynamic and static monitoring indicators are collected. These fall into six categories: network indicators, security indicators, host indicators, database indicators, middleware indicators and business system indicators.
Network indicators include link delay, network device healthy running time, network device status, network device CPU usage, network device memory usage, receive packet loss rate, transmit packet loss rate, receive packet error rate, transmit packet error rate, interface traffic, interface transmit traffic, interface total traffic and interface bandwidth utilization. Security indicators include security events, security device status (CPU, memory, etc.) and compliance. Host indicators include host status, healthy running time, CPU usage, memory usage, disk space usage, number of critical processes and host configuration information.
Database indicators comprise SqlServer indicators, Oracle indicators and DB2 indicators. SqlServer indicators include SGA hit rate, available cache size, dictionary buffer hit rate, shared cache area hit rate, redo log buffer hit rate, number of sessions, number of available sessions, transaction response time, tablespace availability, tablespace growth rate and MTS performance. Oracle indicators include number of sessions, number of available sessions, transaction response time, tablespace availability, tablespace growth rate, shared memory usage, shared memory hit rate and rollback segment usage. DB2 indicators include process availability, buffer pool (Bufferpool) availability, buffer pool hit rate, tablespace availability, tablespace growth rate, sorts per transaction (SortsPerTransaction), number of sessions and number of available sessions.
Middleware indicators comprise Weblogic indicators and Websphere indicators.
Weblogic indicators include JVM heap free amount, JVM heap total amount, JVM heap usage, total execution time of all Servlet invocations, longest execution time of a single Servlet invocation, average Servlet execution time, number of Servlet executions, JDBC Pool maximum capacity, JDBC Pool high-water mark of active connections, JDBC Pool high-water mark of waiting connections, cumulative number of connections since the JDBC Pool was instantiated, JDBC Pool average number of active connections, JDBC Pool average connection delay, number of leaked JDBC Pool connections, JDBC Pool current capacity, number of failed JDBC Pool reconnections, JDBC Pool maximum number of available connections, JDBC Pool maximum number of unavailable connections, number of JDBC Pool LEAKED connections, number of available connections in the JDBC Pool, number of unavailable connections in the JDBC Pool, JDBC Pool utilization, current number of sessions, maximum number of sessions and session occupancy.
Websphere indicators include JVM free memory, JVM total memory, JVM memory usage, average session lifetime, total number of sessions currently being accessed, total number of sessions currently alive, JDBC Pool maximum capacity, JDBC Pool average number of active connections, JDBC Pool average connection delay, number of leaked JDBC Pool connections, JDBC Pool current capacity, number of failed JDBC Pool reconnections, JDBC Pool maximum number of available connections, JDBC Pool maximum number of unavailable connections, number of JDBC Pool LEAKED connections, number of available connections in the JDBC Pool, number of unavailable connections in the JDBC Pool and JDBC Pool utilization.
Business system indicators include the number of online users, the number of users logged in per day, business system running status, business system interface status and business system healthy running time.
S02: Generate a network fault tree from the network topology and the dynamic and static monitoring indicators of the devices, and generate the basic rule base from the network fault tree. The fault tree structure gives a concise representation of the relationships between the monitoring indicators and the network devices. The relevant thresholds in the basic rule base are determined by machine learning on historical data and by executing the threshold planning algorithm.
For the basic rules, a three-field rule structure is designed, as shown in Fig. 2, comprising a rule ranking field, a rule identification field and a rule body field.
The rule ranking field stores the number of times the rule has executed successfully in actual operation, the number of times it has failed, the final count of the rule and the rank of the rule. Its purpose is to make it easy to sort rules by priority and thereby improve rule retrieval efficiency.
The rule identification field identifies the object to which the rule belongs: for example, a rule may be exclusive to a particular network device, or may belong to a subnet or to the whole network. Its purpose is to label every rule so that, when the network topology changes, the rules that need to be deleted or modified can be located through their identification fields. The basic rules for the changed part of the topology are then regenerated, which adds, deletes and modifies rules as needed and intelligently constructs a rule base adapted to the new network architecture.
The rule body field stores the main part of the rule, that is, its detailed description. A rule here is a production rule, which expresses a fixed logical relationship used in human reasoning and judgment. Productions are generally written in a natural-language form; structures in wide everyday use such as "cause-result", "condition-conclusion", "premise-operation", "fact-progress" and "situation-behavior" can all be reduced to the production form of knowledge representation. The basic form of a rule is A → B, or IF A THEN B. A is the premise (antecedent) of the production and states the condition under which the production is applicable. B is a set of conclusions or operations (consequent) and states the conclusion to draw or the operation to perform when the condition in A is satisfied. Production rules support three inference modes: forward reasoning, backward reasoning and bidirectional reasoning. Each has advantages in different situations, which should be weighed when selecting the inference mode.
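The three-field structure and forward reasoning over production rules can be illustrated with a minimal Python sketch. The class name Rule, its attribute names, the helper rules_for_objects and the example facts are all assumptions made for illustration; the specification does not prescribe a data layout or an inference engine.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Rule body field: IF all premises hold THEN conclusion
    premises: frozenset
    conclusion: str
    # Rule identification field: device, subnet or whole network the rule belongs to
    owner: str = "network"
    # Rule ranking field
    successes: int = 0        # times the rule executed successfully
    failures: int = 0         # times the rule failed
    final_count: float = 0.0  # final count used for ordering
    rank: int = 0             # position within its rule base


def rules_for_objects(rules, changed_objects):
    """Pick out the rules whose owner lies in the changed part of the topology."""
    return [r for r in rules if r.owner in changed_objects]


def forward_chain(rules, facts):
    """Forward reasoning: fire applicable rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical alarm rules, for illustration only.
example_rules = [
    Rule(frozenset({"link_delay_high"}), "network_degraded", owner="subnet-A"),
    Rule(frozenset({"network_degraded", "packet_loss_high"}), "raise_alarm"),
]
derived = forward_chain(example_rules, {"link_delay_high", "packet_loss_high"})
```

Keeping the owner in its own attribute means that a topology change only requires filtering by owner to find the rules to regenerate, which is the role the description gives to the rule identification field.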
S03: Run an association rule mining algorithm on the historical data of the information system to obtain the association rule base. Association rules are generated by association rule mining and have been checked against historical data.
Preferably, the association rules are mined from historical data using an improved Apriori algorithm based on a branch screening optimization strategy and a single-scan database technique. Apriori is a frequent-itemset algorithm for mining association rules and works in two stages: finding the frequent itemsets, and deriving association rules from them. It finds the itemsets in the data set that satisfy the minimum support and then generates association rules from those frequent itemsets. Apriori is a classic association rule mining algorithm, but it has two drawbacks: it generates many candidate sets while searching for frequent itemsets, which wastes computation and time, and it scans the database repeatedly, which severely reduces efficiency. To address the first drawback, a hash table and a bit container are used to filter the candidate sets, which reduces the cost of candidate generation. Because the main cost of the classic algorithm lies in generating C1, L1, C2 and L2, pruning more branches while generating C2 improves efficiency substantially. As for the second drawback, the classic algorithm scans the entire database each time it computes a support value, and support is computed very frequently, so the database is scanned over and over. The improved algorithm therefore maintains a Boolean matrix that records all transaction information in the database. The matrix can be built with a single scan of the database and contains all the data needed to compute support, so the database never needs to be scanned again.
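The single-scan idea can be sketched as follows. This is plain Apriori with the Boolean matrix stored column by column as integer bit masks; it does not reproduce the branch screening strategy or the hash-table filtering of the improved algorithm, and the function names are assumptions made for illustration.

```python
from itertools import combinations


def build_columns(transactions):
    """Scan the transactions once; bit t of columns[item] is set if transaction t contains item."""
    columns = {}
    count = 0
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << row)
        count = row + 1
    return columns, count


def support(itemset, columns, count):
    """Support = share of transactions containing every item, computed by AND-ing columns."""
    mask = (1 << count) - 1
    for item in itemset:
        mask &= columns.get(item, 0)
    return bin(mask).count("1") / count


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    columns, count = build_columns(transactions)
    current = [frozenset([i]) for i in columns
               if support([i], columns, count) >= min_support]
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, columns, count)
        k += 1
        previous = set(current)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in previous for s in combinations(c, k - 1))
                   and support(c, columns, count) >= min_support]
    return result


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
itemsets = frequent_itemsets(transactions, 0.5)
```

After build_columns returns, every support value is obtained from the bit masks alone, so the transaction list is read exactly once.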
With the improved Apriori algorithm, association rules can be mined from the historical data; combined with the threshold planning algorithm, the results are used to generate the association rule base intelligently. Because association rules are mined from and checked against historical data, their reliability is relatively high. Some uncertainty remains, however, so an association rule must pass a check against operational data before it can be upgraded to a basic rule.
The relevant thresholds in the association rule base are determined by machine learning on historical data and by executing the threshold planning algorithm.
When thresholds are determined for the basic rule base and the association rule base, historical data is used to analyze and calculate alarm thresholds that suit the alarm requirements of business operation. This improves the alarm self-learning capability for the information system, optimizes the alarm logic and adjusts alarm thresholds dynamically, so that the number of events is reduced at the source and the quality of monitoring alarms is improved.
Preferably, the threshold planning algorithm for a given indicator is as follows:
Historical data of the indicator under normal network operation is analyzed statistically to determine its maximum, minimum and median, and the threshold is then determined according to the following formula:
In the formula, T_i is the threshold, D_i is the maximum of the indicator under normal network operation, X_i is the minimum of the indicator under normal network operation, M_i is the design maximum of the indicator, and Z_i is the median of the indicator under normal network operation.
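The statistical step that precedes the threshold formula can be sketched in Python. The sketch only gathers the four inputs named in the description; it does not compute the threshold itself. The function name threshold_inputs and the dictionary keys are assumptions made for illustration.

```python
from statistics import median


def threshold_inputs(normal_samples, design_max):
    """Collect the inputs of the threshold formula for one indicator.

    normal_samples: values of the indicator recorded under normal network operation.
    design_max: design maximum of the indicator (M).
    """
    if not normal_samples:
        raise ValueError("at least one sample under normal operation is required")
    return {
        "D": max(normal_samples),     # maximum under normal operation
        "X": min(normal_samples),     # minimum under normal operation
        "Z": median(normal_samples),  # median under normal operation
        "M": design_max,              # design maximum of the indicator
    }


# Hypothetical CPU usage samples in percent, for illustration only.
inputs = threshold_inputs([10, 30, 20], design_max=100)
```

Calling the function again whenever new valid samples arrive gives the real-time recalculation that the description relies on once the rule base is in operation.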
After the rule base is put into operation, all valid values of the indicator observed under normal network operation take part in the calculation in real time, so the threshold of the indicator is determined in real time. This adaptive, dynamic adjustment improves how well the thresholds fit the system and helps improve system performance.
S04: Perform inference over the basic rule base and the association rule base to generate the extended rule base.
As described for the rule body field, a rule is a production rule of the form A → B, or IF A THEN B, where A is the premise (antecedent) that states when the production is applicable and B is the set of conclusions or operations (consequent) that applies when the condition in A is satisfied. Extended rules can be generated directly by rule inference over the basic rules and the association rules. For example, if the basic rules contain "A → B" and "B → C" and the association rules contain a rule between A and D under which D implies A, rule inference yields three extended rules: "A → C", "D → B" and "D → C".
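One simple way to carry out this kind of inference is to chain single-premise rules transitively. The Python sketch below assumes that each rule is a (premise, conclusion) pair and that the association rule in the example lets D imply A; both are assumptions made for illustration, and the specification does not fix the inference procedure.

```python
def infer_extended_rules(basic, association):
    """Chain X -> Y and Y -> Z into X -> Z until nothing new appears.

    basic, association: sets of (premise, conclusion) pairs.
    Returns only the newly derived rules, i.e. the candidates for the extended rule base.
    """
    known = set(basic) | set(association)
    closure = set(known)
    changed = True
    while changed:
        changed = False
        for first_premise, first_conclusion in list(closure):
            for second_premise, second_conclusion in list(closure):
                derived = (first_premise, second_conclusion)
                if (first_conclusion == second_premise
                        and first_premise != second_conclusion
                        and derived not in closure):
                    closure.add(derived)
                    changed = True
    return closure - known


basic_rules = {("A", "B"), ("B", "C")}
association_rules = {("D", "A")}
extended_rules = infer_extended_rules(basic_rules, association_rules)
```

With these inputs the derived set contains A → C, D → B and D → C. Rules already present in either input base are excluded, so the extended rule base holds only what inference adds.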
Extended rules are inferred from basic rules and association rules, and inference itself carries uncertainty. Extended rules therefore have the lowest priority and must pass strict verification (against both historical data and operational data) before they can be upgraded to basic rules.
Building on research into techniques for constructing information system operation monitoring alarm rule bases, the method starts from the type, data, source, alarm time, alarm mode and performance data of monitoring. It analyzes monitoring historical data and the fault types recorded in routine operation and maintenance work orders across different time periods, such as the peak and idle periods of the information system, and combines this with the business hours and business volume of the information system to understand the rise and fall of business load. Using the historical data, it analyzes and calculates alarm thresholds that suit the alarm requirements of business operation, improves the alarm self-learning capability for the information system and adjusts alarm thresholds dynamically, so that the number of events is reduced at the source and the quality of monitoring alarms is improved.
The rule base can be divided into three zones that store different types of rules: zone 1 stores the basic rule base, zone 2 the association rule base and zone 3 the extended rule base. The retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base. During rule retrieval, the basic rules in zone 1 are searched first; only if no matching rule is found are the association rules in zone 2 and the extended rules in zone 3 searched. The association rules in zone 2 and the extended rules in zone 3 are adjusted automatically through machine learning on historical data. In addition, a rule is retained only if it passes a reasonableness check against historical data; otherwise it is removed directly.
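The zone-by-zone retrieval order can be illustrated with a short Python sketch. Representing a rule as a (premise, action) pair and matching on exact equality of the premise are assumptions made for illustration; the rule and condition names are hypothetical.

```python
def retrieve(condition, zones):
    """Search zone 1, then zone 2, then zone 3; stop at the first zone with a match.

    zones: three lists of (premise, action) pairs, ordered basic, association, extended.
    Returns (zone_number, action), or None when no zone holds a matching rule.
    """
    for zone_number, rules in enumerate(zones, start=1):
        for premise, action in rules:
            if premise == condition:
                return zone_number, action
    return None


# Hypothetical rules, for illustration only.
zones = [
    [("cpu_high", "alarm_cpu")],                                   # zone 1: basic rules
    [("disk_full", "alarm_disk")],                                 # zone 2: association rules
    [("cpu_high", "check_process"), ("link_down", "alarm_link")],  # zone 3: extended rules
]
```

Here retrieve("cpu_high", zones) stops in zone 1 and never reaches the lower-priority extended rule with the same premise.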
In addition, within each zone of the rule base, rule priority is determined by the rule sorting algorithm. Specifically, the retrieval priority of a rule is determined by the final count in its rule ranking field: rules with higher priority are retrieved first and rules with lower priority are retrieved later, which improves rule retrieval efficiency. The final count is calculated as:
F = R - 0.5W
where F is the final count, R is the number of times the rule has executed successfully in actual operation, and W is the number of times it has failed. If machine learning is applied to a failure scenario and the related rule is optimized so that the underlying problem is resolved, the corresponding failure count W is decremented by one.
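The final count and the resulting ordering can be sketched in Python as follows. The dictionary layout with keys "R" and "W" and the function names are assumptions made for illustration.

```python
def final_count(successes, failures):
    """F = R - 0.5W."""
    return successes - 0.5 * failures


def mark_failure_resolved(rule):
    """Decrement W by one after machine learning has resolved one failure scenario."""
    if rule["W"] > 0:
        rule["W"] -= 1


def sort_by_priority(rules):
    """Order rules so that the rule with the highest final count is retrieved first."""
    return sorted(rules, key=lambda r: final_count(r["R"], r["W"]), reverse=True)


# Hypothetical counters, for illustration only.
rules = [
    {"name": "r1", "R": 10, "W": 4},  # F = 10 - 0.5 * 4 = 8.0
    {"name": "r2", "R": 9, "W": 1},   # F = 9 - 0.5 * 1 = 8.5
    {"name": "r3", "R": 3, "W": 0},   # F = 3.0
]
ordered_names = [r["name"] for r in sort_by_priority(rules)]
```

Because a failure weighs half as much as a success, r2 outranks r1 even though r1 has succeeded more often.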
By checking against historical and operational data, it is possible to determine which rules in the rule base are reasonable and which are not. The reasonableness of a rule can be determined quantitatively, for example through the final count in the rule ranking field of the three-field structure. After this analysis, rules can be processed further in an intelligent manner: some rules pass verification and meet system requirements; some are only moderately reasonable and can be used only after machine learning; and some are so unreasonable that they may be deleted directly.
Likewise, machine learning on historical and operational data continually improves rule performance, makes the rules match the system more closely and provides corresponding suggestions for performance tuning. Thresholds, for example, are not fixed: rules can learn adaptively in real time from system operational data, which improves their reasonableness.
The rule base design also allows rules to flow from a lower-level zone to a higher-level zone. Such a flow requires, first, verification of the rule's reasonableness and, second, machine learning that continually improves that reasonableness. In actual operation, rules are adjusted and optimized automatically and dynamically using real-time operational data: the rule sorting algorithm determines the priority and order of the rules in zones 1, 2 and 3, and the rule flow algorithm upgrades or refreshes the rules in zones 2 and 3.
The rule flow algorithm of the association rule base is: during operation, once a rule is proved correct, it is moved directly to the basic rule base; if a rule is proved wrong twice, it is deleted.
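This flow can be sketched in Python as a single update step applied after each runtime verification. The list-based rule bases, the "wrong" counter and the function name are assumptions made for illustration.

```python
def apply_verification(rule, correct, association_base, basic_base):
    """Update one association rule after a runtime verification result.

    rule: dict that may carry a "wrong" counter.
    correct: True if the rule was proved correct in operation.
    Returns "promoted", "deleted" or "kept".
    """
    if correct:
        # Proved correct once: move the rule straight to the basic rule base.
        association_base.remove(rule)
        basic_base.append(rule)
        return "promoted"
    rule["wrong"] = rule.get("wrong", 0) + 1
    if rule["wrong"] >= 2:
        # Proved wrong twice: delete the rule.
        association_base.remove(rule)
        return "deleted"
    return "kept"


# Hypothetical rules, for illustration only.
basic_base = []
rule_a, rule_b = {"id": "a"}, {"id": "b"}
association_base = [rule_a, rule_b]
outcomes = [
    apply_verification(rule_a, True, association_base, basic_base),
    apply_verification(rule_b, False, association_base, basic_base),
    apply_verification(rule_b, False, association_base, basic_base),
]
```

The asymmetry is deliberate in the description: a single confirmation is enough to promote a rule, while deletion needs two refutations.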
The rule flow algorithm of the extended rule base is shown in Fig. 3. All rules are verified against historical data, and:
a rule with a success rate of 80% to 100% is moved directly to the basic rule base after machine learning on historical data;
a rule with a success rate of 60% to 80% undergoes machine learning on historical data; if its success rate then exceeds 80%, it is moved to the basic rule base; otherwise it stays in the extended rule base and undergoes machine learning on operational data until its success rate exceeds 80%;
a rule with a success rate of 50% to 60% undergoes machine learning on historical data and operational data; once its success rate exceeds 80% it is moved to the basic rule base, and until then it stays in the extended rule base;
a rule with a success rate below 50% is deleted directly.
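The tiers above can be sketched as one decision step in Python. The two learning callables are hypothetical stand-ins for the machine learning stages, each taking the current success rate and returning the rate measured afterwards. Treating each lower bound as inclusive is also an assumption, since the description leaves the boundary values ambiguous.

```python
def flow_extended_rule(success_rate, learn_historical, learn_operational):
    """Decide the fate of one extended rule after verification on historical data.

    Returns "promote" (move to the basic rule base), "stay" or "delete".
    """
    if success_rate < 0.5:
        return "delete"
    if success_rate >= 0.8:
        # 80% to 100%: learn on historical data, then promote directly.
        learn_historical(success_rate)
        return "promote"
    rate = learn_historical(success_rate)
    if success_rate < 0.6:
        # 50% to 60%: learn on operational data as well before re-checking.
        rate = learn_operational(rate)
    return "promote" if rate > 0.8 else "stay"


# Hypothetical learning stages, for illustration only.
def boost(rate):
    return rate + 0.15


def unchanged(rate):
    return rate
```

A rule that returns "stay" remains in the extended rule base and is evaluated again in a later round, which models the "until its success rate exceeds 80%" wording as repeated calls rather than a loop inside the function.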
Adjusting rule priorities in real time keeps the rule base in its best state and improves both the retrieval efficiency and the accuracy of the rules, which in turn improves system performance. Priority adjustment matters because frequently used and more reasonable rules should be retrieved first, while rarely used and less reasonable rules can be retrieved later.
In addition, some operations can be performed manually: for example, system operation and maintenance personnel can directly add and delete rules and modify the attributes of any rule.
The above are only preferred embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.