CN104394149B - Method for complex event processing based on a parallel distributed architecture - Google Patents


Info

Publication number: CN104394149B
Authority: CN (China)
Prior art keywords: stateful, parallel, event, stateless, sequence
Legal status: Active (as listed by Google; an assumption, not a legal conclusion)
Application number: CN201410697724.6A
Other languages: Chinese (zh)
Other versions: CN104394149A
Inventors: 廉海明, 郭旭东, 谢小明, 胡佳, 胡大正, 郭江, 沈艳林, 石波, 沈德峰, 吴朝雄, 王红艳
Current assignee: 706th Institute Of No2 Research Institute Casic (as listed by Google; may be inaccurate)
Original assignee: 706th Institute Of No2 Research Institute Casic
Priority date: 2014-11-26
Filing date: 2014-11-26
Publication date: 2015-03-04 (CN104394149A); 2017-12-12 (CN104394149B)

Events:
Application filed by 706th Institute Of No2 Research Institute Casic
Priority to CN201410697724.6A
Publication of CN104394149A
Application granted
Publication of CN104394149B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for complex event processing based on a parallel distributed architecture, belonging to the field of network security. The invention decomposes a correlation-analysis query into an operation sequence made up of stateful and stateless operations, splits that sequence so that each subquery contains at most one stateful operation, divides the cluster into as many sub-clusters as there are subqueries, and then deploys each subquery to its corresponding sub-cluster. Child operation sequences deployed to different sub-clusters can then run in parallel, and the same operation deployed to different machines within one sub-cluster can also run in parallel. That is, different queries run in parallel with each other (inter-query parallelism), and a single query is itself parallelized (intra-query parallelism); intra-query parallelism covers both parallelism between different operations and parallelism within a single operation. Because every operation runs in parallel, the method removes the node performance bottleneck encountered in query computation.

Description

Method for complex event processing based on a parallel distributed architecture
Technical field
The invention belongs to the field of network security technology, and in particular relates to a method for implementing a real-time correlation analysis engine for network security events on a distributed parallel architecture.
Background art
Security information and event management (SIEM) systems provide a comprehensive view of security-related information. As the final collection point for security information and events, a SIEM system analyzes them in real time and sits at the core of the entire security defense system. Current SIEM systems widely provide event-monitoring capabilities, including active monitoring and management of security incidents and passive, automatic collection and management of logs.
The correlation analysis engine of current SIEM systems is a centralized solution. As more log sources are connected to a SIEM system, the volume of data it collects grows far faster than the computing capability of its correlation analysis engine.
Faced with this growth in collected data, SIEM systems currently take one of three approaches. The first is to improve the processing capability of the existing correlation analysis engine, for example with complex event processing techniques; this can temporarily relieve the engine's shortage of computing capacity. The second is to cache the security information and events that cannot be processed in time until they can be analyzed; the benefit is that no security information or event is missed. The third is to apply a QoS (quality of service control) policy to security information and events and discard events defined as invalid; the benefit is that the system can respond to security information and events quickly. From a security standpoint, all three approaches are flawed:
Complex event processing strengthens the correlation analysis engine and adds some computing capacity, but this gain cannot keep pace with the growth of the data to be processed, so it cannot fundamentally raise the engine's computing capability;
Caching ensures that no security information or event is discarded and that the analysis is complete, but it cannot deliver the timeliness that security detection requires, and delayed analysis results are of little use for incident response;
A QoS policy ensures immediate correlation analysis and a quick response, but the rules for discarding events are written by people on the basis of existing knowledge, so critical security events may be dropped and the analysis results may be incomplete.
Summary of the invention
To resolve the dilemma posed by the three approaches above, the present invention proposes a complex event processing method based on a parallel distributed architecture. Combining complex event processing, distributed computing and parallel computing, it replaces the current centralized processing engine with a distributed, parallel complex event processing engine, in order to overcome the technical limitations of current solutions at their root.
The invention exploits a characteristic of complex event processing: the processing that complex event processing applies to events can be decomposed into a number of atomic operations, which fall into two classes, stateless operations and stateful operations. In a stateless operation, processing a later event is independent of the processing of earlier events. In a stateful operation, processing a later event must take the processing of earlier events into account; this is implemented with a sliding window. The stateless operations are map (transformation), filter and union operations; the stateful operations are aggregate and join operations.
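As a minimal sketch, the two operation classes above can be encoded as a lookup. The enum `OpKind` and the helper `is_stateful` are hypothetical names that only mirror the classification stated in the text.

```python
from enum import Enum


class OpKind(Enum):
    """Atomic CEP operation kinds listed in the text (hypothetical encoding)."""
    MAP = "Map"
    FILTER = "Filter"
    UNION = "Union"
    AGGREGATE = "Aggregate"
    JOIN = "Join"


# Stateful kinds depend on earlier events through a sliding window.
STATEFUL_KINDS = frozenset({OpKind.AGGREGATE, OpKind.JOIN})


def is_stateful(kind: OpKind) -> bool:
    """Return True if processing a later event depends on earlier events."""
    return kind in STATEFUL_KINDS


if __name__ == "__main__":
    for kind in OpKind:
        print(f"{kind.value}: {'stateful' if is_stateful(kind) else 'stateless'}")
```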
The flow of the method of the invention is shown in Fig. 1.
The method comprises the following steps:
Step 1: Obtain the query statement for correlation analysis;
Step 2: Decompose the query statement into a sequence of operations according to the stateful and stateless operations defined above;
Step 3: Split the operation sequence (query statement) at its stateful operations into child operation sequences (subquery statements), so that each stateful operation begins a child operation sequence of its own. The first child operation sequence contains at least one stateless operation, may consist of stateless operations only, and contains all stateless operations that precede the first stateful operation. Every child operation sequence that contains a stateful operation must begin with that stateful operation, and it also contains all stateless operations up to the next stateful operation.
Step 4: Divide the cluster into as many sub-clusters as the number of subqueries obtained from the query statement in Step 3, and deploy the child operation sequences obtained in Step 3 onto the sub-clusters, one sequence per sub-cluster; every machine in a sub-cluster deploys the corresponding child operation sequence.
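Step 3 can be sketched in Python as a single pass that cuts the sequence in front of every stateful operation. Representing an operation as a `(name, is_stateful)` pair and the function name `split_into_subqueries` are assumptions for illustration, not part of the patent.

```python
from typing import List, Tuple

Op = Tuple[str, bool]  # (operation name, is_stateful); hypothetical encoding


def split_into_subqueries(ops: List[Op]) -> List[List[Op]]:
    """Cut the operation sequence in front of each stateful operation.

    The leading child sequence holds the stateless operations before the first
    stateful one; every later child sequence starts with one stateful
    operation followed by the stateless operations up to the next one.
    """
    subqueries: List[List[Op]] = []
    current: List[Op] = []
    for op in ops:
        _, stateful = op
        if stateful and current:
            subqueries.append(current)  # close the previous child sequence
            current = []
        current.append(op)
    if current:
        subqueries.append(current)
    return subqueries


# The worked example's sequence M -> F1 -> A1 -> F2 -> A2:
example = [("M", False), ("F1", False), ("A1", True), ("F2", False), ("A2", True)]
print([[name for name, _ in sq] for sq in split_into_subqueries(example)])
# → [['M', 'F1'], ['A1', 'F2'], ['A2']]
```

Each returned child sequence contains at most one stateful operation, which is the property the sub-cluster deployment in Step 4 relies on.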
Building on complex event processing, the invention decomposes a correlation-analysis query into an operation sequence made up of stateful and stateless operations, splits the sequence on the principle that each subquery contains at most one stateful operation, divides the cluster into as many sub-clusters as there are subqueries, and deploys each subquery to its corresponding sub-cluster. Child operation sequences deployed to different sub-clusters can thus run in parallel, and the same operation deployed to different machines within one sub-cluster can also run in parallel. That is, different queries run in parallel (inter-query parallelism), and a single query also runs in parallel (intra-query parallelism); intra-query parallelism includes parallelism between different operations (inter-operation parallelism) as well as parallelism within a single operation (intra-operation parallelism). Because every operation runs in parallel, this resolves the node performance bottleneck encountered in query computation.
Brief description of the drawings
Fig. 1 is a flow chart of the invention.
Fig. 2 shows the decomposition of the query statement.
Fig. 3 shows the division of the operation sequence.
Fig. 4 shows the query deployment.
Embodiments
The preferred embodiment is described in detail below with reference to the flow chart. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
First, building on the definition of an event stream, the atomic operations on complex event streams are defined.
Map operation (Map), defined as follows:
Given an input event stream S, Map applies an ordered set of user-defined transformation expressions to the attributes of each input event and outputs the transformed result event.
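A minimal Python sketch of the stateless Map operation follows. Since the formula for the transformation list is not reproduced in this text, representing it as an ordered list of `(output attribute, expression)` pairs is an assumption; the name `map_op` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, Any]
# Ordered (output attribute, expression) pairs; hypothetical encoding.
Transforms = List[Tuple[str, Callable[[Event], Any]]]


def map_op(transforms: Transforms, stream: Iterable[Event]) -> Iterator[Event]:
    """Stateless Map (hypothetical name): each output event depends only on the current input event."""
    for event in stream:
        yield {attr: expr(event) for attr, expr in transforms}


# Example: keep SRC_IP and convert DST_Port to an int; Info is dropped.
events = [{"SRC_IP": "10.0.0.1", "DST_Port": "80", "Info": "x"}]
print(list(map_op([("SRC_IP", lambda e: e["SRC_IP"]),
                   ("DST_Port", lambda e: int(e["DST_Port"]))], events)))
# → [{'SRC_IP': '10.0.0.1', 'DST_Port': 80}]
```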
Filter operation (Filter), defined as follows:
Filter{(P1,O1),...,(Pm,Om),Om+1}(S)
Given an input event stream S and a user-defined ordered set of predicates {(P1,O1),...,(Pm,Om),Om+1}, each input event is forwarded to the output of the first predicate it satisfies: if it satisfies Pi, it is sent to Oi; if it satisfies none of the predicates, it is sent to Om+1.
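The routing rule above (the first matching predicate wins, otherwise Om+1) can be sketched in Python as follows. Collecting each output into a list instead of emitting a stream, and the name `filter_op`, are simplifying assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


def filter_op(predicates: List[Predicate], stream: Iterable[Event]) -> List[List[Event]]:
    """Stateless Filter (hypothetical name) with m predicates and m + 1 outputs.

    outputs[i] receives the events whose first satisfied predicate is
    predicates[i]; outputs[m] (Om+1 in the text) receives events matching none.
    """
    outputs: List[List[Event]] = [[] for _ in range(len(predicates) + 1)]
    for event in stream:
        for i, pred in enumerate(predicates):
            if pred(event):
                outputs[i].append(event)
                break  # only the first matching predicate receives the event
        else:
            outputs[-1].append(event)
    return outputs
```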
Union operation (Union), defined as follows:
Union(S1,...,Sm)
This operation merges the input streams S1,...,Sm, all of which have the same schema, and outputs a single merged stream. The output stream has the same schema as the input streams.
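A Python sketch of Union follows, assuming each input stream is already ordered by a `timestamp` attribute so that the merged output stays time-ordered (the text does not specify an output order). The schema check compares attribute names; `union_op` is a hypothetical name.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def union_op(*streams: Iterable[Event], time_key: str = "timestamp") -> Iterator[Event]:
    """Merge same-schema streams into one, ordered by time_key (hypothetical name).

    Assumes each input stream is already sorted by time_key; every event's
    attribute names are checked against those of the first event seen.
    """
    schema = None
    for event in heapq.merge(*streams, key=lambda e: e[time_key]):
        keys = frozenset(event)
        if schema is None:
            schema = keys
        elif keys != schema:
            raise ValueError("Union requires all input streams to share one schema")
        yield event
```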
Aggregate operation (Aggregate), defined as follows:
Given an input event stream S, Aggregate runs a sliding window over the stream and applies an ordered set of aggregate functions to the events in the window. The attributes by which the events held in the window are grouped are determined by the "Group-by" parameter. The window is defined by two parameters: its size (size) and its sliding step (advance). The window type WType is either time-based (Time) or tuple-based (Tuple).
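The following Python sketch implements a time-based (WType = Time) sliding-window Aggregate with `size`, `advance` and Group-by. Anchoring the first window at the earliest timestamp, the extra `window_start` attribute and the name `aggregate_time_window` are assumptions; tuple-based windows are omitted.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Event = Dict[str, Any]


def aggregate_time_window(
    stream: Sequence[Event],
    size: float,
    advance: float,
    out_attr: str,
    func: Callable[[List[Event]], Any],
    group_by: Sequence[str],
    time_key: str = "Time",
) -> List[Event]:
    """Stateful Aggregate over a time-based sliding window (hypothetical name).

    Window k covers [t0 + k*advance, t0 + k*advance + size), where t0 is the
    earliest timestamp (an assumption). Each window emits one event per
    Group-by key, carrying the group attributes, window_start and out_attr.
    """
    if size <= 0 or advance <= 0:
        raise ValueError("size and advance must be positive")
    if not stream:
        return []
    events = sorted(stream, key=lambda e: e[time_key])
    t_last = events[-1][time_key]
    results: List[Event] = []
    start = events[0][time_key]
    while start <= t_last:
        groups: Dict[Tuple[Any, ...], List[Event]] = defaultdict(list)
        for e in events:
            if start <= e[time_key] < start + size:
                groups[tuple(e[k] for k in group_by)].append(e)
        for key, members in groups.items():
            out = dict(zip(group_by, key))
            out["window_start"] = start
            out[out_attr] = func(members)  # e.g. len for count()
            results.append(out)
        start += advance
    return results
```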
Join operation (Join), defined as follows:
Join{P,size,advance}(Sl,Sr)
The predicate P is evaluated on events el and er taken from the left and right streams Sl and Sr; size is the window size, and here the sliding step advance equals the window size size.
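Because advance equals size here, consecutive windows do not overlap (tumbling windows). A Python sketch under that reading follows, assuming windows aligned at time 0 and a `Time` attribute on every event; `join_op` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def join_op(
    predicate: Callable[[Event, Event], bool],
    size: float,
    left: Iterable[Event],
    right: Iterable[Event],
    time_key: str = "Time",
) -> List[Tuple[Event, Event]]:
    """Stateful Join (hypothetical name) with advance == size.

    Events are bucketed into tumbling windows aligned at time 0; a pair
    (el, er) is emitted when both fall in the same window and P(el, er) holds.
    """
    buckets_l: Dict[int, List[Event]] = defaultdict(list)
    for el in left:
        buckets_l[int(el[time_key] // size)].append(el)
    pairs: List[Tuple[Event, Event]] = []
    for er in right:
        for el in buckets_l.get(int(er[time_key] // size), []):
            if predicate(el, er):
                pairs.append((el, er))
    return pairs
```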
The example follows.
Step 1: Obtain the query statement for correlation analysis.
For example, suppose we want to know how many source IP users, within 180 seconds, accessed destination ports 25, 80 and 135 a number of times within the range [15, 30], because such behavior may indicate a worm outbreak.
The fields of the input event schema are: ID (event identifier), SRC_IP (source IP address), SRC_Port (source port), DST_IP (destination IP address), DST_Port (destination port), timestamp (event timestamp) and Info (other information).
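One possible typed representation of this input schema in Python follows; the field types (string IPs, integer ports, timestamps in seconds) and the class name `SecurityEvent` are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SecurityEvent:
    """Input event schema from the example; class name and field types are assumptions."""
    ID: str           # event identifier
    SRC_IP: str       # source IP address
    SRC_Port: int     # source port
    DST_IP: str       # destination IP address
    DST_Port: int     # destination port
    timestamp: float  # event timestamp, assumed to be in seconds
    Info: str = ""    # other information


event = SecurityEvent("e1", "10.0.0.1", 51514, "10.0.0.9", 80, 12.0)
print(asdict(event)["DST_Port"])  # → 80
```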
Step 2: Decompose the query statement into a sequence of operations according to the stateful and stateless operations defined above.
The query above can be composed of 2 stateful operations (aggregate operations A1 and A2) and 3 stateless operations (map operation M and filter operations F1 and F2). Each operation is defined as follows:
Map operation M is defined as follows:
M{SRC_IP←SRC_IP,DST_Port←DST_Port,Time←Time}(I,OM)
Map operation M trims the attribute fields of the tuples of input event stream I and discards fields that later operations do not need; in this example it removes fields such as ID, SRC_Port and Info. The operation after M is filter operation F1: M's output stream OM is F1's input stream.
Filter operation F1 is defined as follows:
F1{DST_Port==25 || DST_Port==80 || DST_Port==135}(OM, OF1)
F1 forwards only event tuples whose destination port is 25, 80 or 135. The operation after F1 is aggregate operation A1: F1's output stream OF1 is A1's input stream.
Aggregate operation A1 is defined as follows:
A1{Time, 180, 1, Access←count(), Group-by=(SRC_IP)}(OF1, OA1)
Here the window is time-based (Time) with size 180 and advance 1. Aggregate operation A1 counts, for each source IP address, the number of accesses (Access) to the ports above within 180 seconds. The operation after A1 is filter operation F2: A1's output stream OA1 is F2's input stream.
Filter operation F2 is defined as follows:
F2{Access≥15∧Access≤30}(OA1,OF2)
F2 filters out event tuples whose access count is below 15 or above 30 and forwards only tuples whose access count lies in [15, 30]. The operation after F2 is aggregate operation A2: F2's output stream OF2 is A2's input stream.
Aggregate operation A2 is defined as follows:
A2{Time, 180, 1, SRC_Users←count(), Group-by=(DST_IP)}(OF2, O)
Aggregate operation A2 counts the number of IP addresses (SRC_Users) that meet the conditions above. A2's output stream O is the final result.
Together, these operations implement the complete query assumed in Step 1.
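To make the data flow concrete, here is a simplified Python sketch of the chain M → F1 → A1 → F2 → A2. It assumes that the Time attribute is the event's timestamp, evaluates one 180-second window ending at `now` instead of re-evaluating at every 1-second slide, and treats A2 as a single count of the qualifying source IPs (A1's output, as defined above, carries only SRC_IP and Access). The function name `run_example` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

Event = Dict[str, Any]
WINDOW = 180           # seconds (A1/A2 window size)
PORTS = {25, 80, 135}  # destination ports checked by F1


def run_example(events: List[Event], now: float) -> int:
    """Hypothetical helper: evaluate M -> F1 -> A1 -> F2 -> A2 for the window (now - 180, now]."""
    # M: keep only the fields later operations need (timestamp exposed as Time).
    om = [{"SRC_IP": e["SRC_IP"], "DST_Port": e["DST_Port"], "Time": e["timestamp"]}
          for e in events]
    # F1: forward only accesses to ports 25, 80 or 135.
    of1 = [e for e in om if e["DST_Port"] in PORTS]
    # A1: count accesses per source IP inside the window (Group-by SRC_IP).
    in_window = [e for e in of1 if now - WINDOW < e["Time"] <= now]
    oa1 = [{"SRC_IP": ip, "Access": n}
           for ip, n in Counter(e["SRC_IP"] for e in in_window).items()]
    # F2: keep sources whose Access lies in [15, 30].
    of2 = [t for t in oa1 if 15 <= t["Access"] <= 30]
    # A2: number of qualifying source IPs (SRC_Users).
    return len(of2)


sample = ([{"SRC_IP": "a", "DST_Port": 80, "timestamp": t} for t in range(20)]
          + [{"SRC_IP": "b", "DST_Port": 25, "timestamp": t} for t in range(5)]
          + [{"SRC_IP": "c", "DST_Port": 22, "timestamp": t} for t in range(20)])
print(run_example(sample, now=100.0))  # → 1
```

Only source "a" qualifies: "b" has too few accesses, and all of "c"'s accesses go to port 22, which F1 drops.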
Step 3: Split the operation sequence (query statement) at its stateful operations into child operation sequences (subquery statements), so that each stateful operation begins a child operation sequence of its own. The first child operation sequence may also consist of stateless operations only, and it contains all stateless operations that precede the first stateful operation. Every child operation sequence that contains a stateful operation must begin with that stateful operation, and it also contains all stateless operations up to the next stateful operation.
Through this decomposition, the query above can be abstracted as shown in Fig. 2.
Applying the subquery partitioning strategy, the query is divided as shown in Fig. 3.
That is, the first subquery contains operations M and F1, the second contains operations A1 and F2, and the third contains operation A2.
Step 4: Divide the cluster into as many sub-clusters as the number of subqueries obtained from the query statement in Step 3, and deploy the child operation sequences obtained in Step 3 onto the sub-clusters, one sequence per sub-cluster; every machine in a sub-cluster deploys the corresponding child operation sequence.
Assume the deployment cluster has 90 nodes. With the three subqueries obtained above, the cluster is divided into 3 sub-clusters of 30 nodes each.
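The deployment plan for this example can be sketched in Python as a mapping from each node to the child operation sequence of its sub-cluster. Splitting nodes into contiguous, near-equal groups, the node names `node00`…`node89` and the function name `plan_deployment` are assumptions that merely reproduce the 90 → 3 × 30 split.

```python
from typing import Dict, List


def plan_deployment(subqueries: List[List[str]], nodes: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map every node to the child operation sequence of its sub-cluster.

    Nodes are split into len(subqueries) contiguous, near-equal sub-clusters;
    equal sizing is an assumption matching the 90 -> 3 x 30 example.
    """
    k = len(subqueries)
    if k == 0:
        raise ValueError("need at least one subquery")
    size, extra = divmod(len(nodes), k)
    plan: Dict[str, List[str]] = {}
    start = 0
    for i, ops in enumerate(subqueries):
        end = start + size + (1 if i < extra else 0)
        for node in nodes[start:end]:
            plan[node] = ops  # every machine in the sub-cluster runs the same ops
        start = end
    return plan


subqueries = [["M", "F1"], ["A1", "F2"], ["A2"]]
nodes = [f"node{i:02d}" for i in range(90)]  # hypothetical node names
plan = plan_deployment(subqueries, nodes)
```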
The final deployment of the query is shown in Fig. 4.
This deployment approach has comparatively low cost and effectively resolves the node performance bottleneck encountered in query computation.
The rationale for the child operation sequence division principle and the query deployment approach is given below.
To quantify the cost of parallel computation, two cost parameters are defined: fan-out overhead and hop-count overhead. Fan-out overhead (denoted f) is the system-wide cost of compute nodes establishing connections with other compute nodes and maintaining the communication channels between them: the more channels a compute node has to other nodes, the higher the cost. Hop-count overhead (denoted h) is the cost of moving (transferring) tuples between operations running on different nodes. When tuples are transferred between nodes, they are ordered before sending but arrive out of order at the receiver. Note that, for a given query, placing its operations on different nodes affects the serial execution of all of them. For any parallelization strategy X, the cost function is defined as:
C(X) = α·f(X) + β·h(X)
where α, β ∈ [0, 1] are two arbitrary weights.
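The cost function can be written as a one-line helper; a minimal Python sketch follows. The usage line plugs in the full-query (FQ) figures discussed below (fan-out N², hop count proportional to s), which requires assuming a proportionality constant of 1 for the hop count; the text does not state one. `deployment_cost` is a hypothetical name.

```python
def deployment_cost(fan_out: float, hops: float, alpha: float, beta: float) -> float:
    """Hypothetical helper: C(X) = alpha * f(X) + beta * h(X), with alpha, beta in [0, 1]."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return alpha * fan_out + beta * hops


# Illustrative FQ-style inputs: N = 90 nodes, s = 2 stateful operations,
# taking f = N**2 and h = s (unit proportionality constant assumed).
N, s = 90, 2
print(deployment_cost(N ** 2, s, alpha=0.5, beta=0.5))  # → 4051.0
```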
At first sight, there are two obvious ways to deploy query computation in a distributed manner:
1. Full-query distributed deployment (FQ): the whole query is deployed on every compute node, so every node runs the same query. If the cluster has N compute nodes, each of the N nodes must maintain communication with the other N−1 nodes, so the fan-out overhead is on the order of N² (the square of N). The hop-count overhead is proportional to the number of stateful operations, because tuples must be redistributed before each stateful operation. For a query with s stateful operations on N compute nodes, the cost of this deployment strategy is:
2. Single-operation distributed deployment (SO): each operation in the query is deployed on its own sub-cluster. For a query made up of l operations on a cluster of N compute nodes, the cost of this deployment strategy is:
The two strategies above in fact trade one cost against the other. Full-query deployment minimizes hop-count overhead (hops occur only at stateful operations) but maximizes fan-out overhead (between all nodes). Single-operation deployment maximizes hop-count overhead (a hop between every pair of consecutive operations) but minimizes fan-out overhead. The query deployment strategy of the present invention minimizes both costs at once: the number of hops for each tuple equals the number of stateful operations (if there is no stateless operation before the first stateful operation, the invention assumes that one exists). The cost of the deployment strategy of the present invention (the mixed strategy, denoted Mix) is:
Clearly, C(FQ) > C(Mix) and C(SO) > C(Mix), so the query deployment method of the present invention has comparatively low cost.

Claims (1)

  1. A method for complex event processing based on a parallel distributed architecture, characterized in that:
    the processing that complex event processing applies to events is decomposed into a number of atomic operations, which fall into two classes: stateless operations and stateful operations; in a stateless operation, processing a later event is independent of the processing of earlier events; in a stateful operation, processing a later event must take the processing of earlier events into account, which is implemented with a sliding window; the stateless operations include map (transformation), filter or union operations; the stateful operations include aggregate or join operations;
    the specific steps are as follows:
    Step 1: obtain the query statement for correlation analysis;
    Step 2: decompose the query statement into a sequence of operations according to the stateful and stateless operations defined above;
    Step 3: split the operation sequence, i.e. the query statement, at its stateful operations into child operation sequences, i.e. subquery statements, so that each stateful operation begins a child operation sequence of its own; the first child operation sequence contains at least one stateless operation, or consists of stateless operations only, and contains all stateless operations that precede the first stateful operation; every child operation sequence that contains a stateful operation must begin with that stateful operation, and it also contains all stateless operations up to the next stateful operation;
    Step 4: divide the cluster into as many sub-clusters as the number of subquery statements obtained from the query statement in Step 3, and deploy the child operation sequences obtained in Step 3 onto the sub-clusters, one sequence per sub-cluster; every machine in a sub-cluster deploys the corresponding child operation sequence.
Priority Applications (1)
CN201410697724.6A, priority date 2014-11-26, filing date 2014-11-26: Active, granted as CN104394149B

Publications (2)
CN104394149A, published 2015-03-04
CN104394149B, published 2017-12-12

Family ID: 52611984

Country Status (1)
CN (1): CN104394149B

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3128423A1 (en) * 2015-08-06 2017-02-08 Hewlett-Packard Enterprise Development LP Distributed event processing
CN105930407B (en) * 2016-04-18 2019-05-17 北京思特奇信息技术股份有限公司 A kind of inter-library relation query method of distributed data base and system
CN106168983B (en) * 2016-08-08 2019-12-20 中国农业银行股份有限公司 Mixed resource processing method and device
US11120001B2 (en) 2017-11-30 2021-09-14 International Business Machines Corporation Table discovery in distributed and dynamic computing systems
US10616041B2 (en) 2017-11-30 2020-04-07 International Business Machines Corporation Collaborative triggers in distributed and dynamic computing systems
US11475000B2 (en) * 2017-11-30 2022-10-18 International Business Machines Corporation Aggregation in dynamic and distributed computing systems
CN110109957B (en) * 2019-03-29 2021-10-01 奇安信科技集团股份有限公司 Streaming event correlation matching method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5598235B2 (en) * 2010-10-05 2014-10-01 富士通株式会社 Composite event processing apparatus and composite event processing method
CN102508640B (en) * 2011-10-27 2015-04-29 西北工业大学 Distributed radio frequency identification device (RFID) complex event detection method based on task decomposition
CN103593721B (en) * 2012-08-13 2016-10-05 中国商用飞机有限责任公司 Business process monitoring method based on Complex event processing
US9390135B2 (en) * 2013-02-19 2016-07-12 Oracle International Corporation Executing continuous event processing (CEP) queries in parallel



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant