CN103716288A - System and method for data processing - Google Patents

Publication number
CN103716288A
Authority
CN
China
Prior art keywords
pattern
data
specified network
application layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210376910.0A
Other languages
Chinese (zh)
Other versions
CN103716288B (en)
Inventor
唐文
沃尔夫冈·施密德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG
Priority to CN201210376910.0A
Publication of CN103716288A
Application granted
Publication of CN103716288B
Active
Anticipated expiration

Abstract

The invention relates to a method and system for data processing. The system comprises: an interface for receiving data of specified network traffic; and a self-learning engine for using grammatical inference to generate a pattern of the specified network traffic from the received data. The data of the specified network traffic contains the application-layer payload of data packets of that traffic. The method and system eliminate the tedious work of manually configuring deep packet inspection (DPI) patterns and thereby avoid the errors caused by manual operation.

Description

System and method for data processing
Technical field
The present invention relates to the field of network security and, more specifically, to deep packet inspection technology in that field.
Background
Deep packet inspection (DPI) is a form of network packet processing that inspects the payload of data packets (from layer 3 to layer 7) in order to check network traffic for protocol non-compliance, viruses, spam, intrusions and the like, or to check network traffic against other predefined patterns or rules, and thereby decide what action to take on the packets. DPI combines the functions of an intrusion detection system (IDS) and an intrusion prevention system (IPS) with those of a traditional stateful firewall. This combination extends the filtering and inspection of a traditional firewall upward from the TCP/IP layers to the application layer and can examine the packet stream of the entire network traffic. DPI can identify and classify network traffic against a signature database using information extracted from the packet payload, and so achieves finer-grained control than classification based on packet headers alone. DPI can therefore be used to block a variety of attacks. Classified packets can be redirected, marked/tagged, blocked, rate-limited, and logged for later analysis.
Because DPI enables advanced network management, user services, security functions, Internet data mining and so on, many enterprises, companies, ISPs and governments currently use DPI in a wide range of applications.
Like a traditional IDS/IPS, the capability of DPI depends on pattern-matching technology. In practice, DPI is based on a set of patterns (or signatures) that describe network traffic characteristics: the characteristics of legitimate traffic (positive patterns) or of attack traffic (negative patterns). DPI can only detect network protocol traffic that has a known pattern (a positive pattern) or identify network attacks defined by a negative pattern. Patterns therefore play the central role in DPI.
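The role of positive and negative patterns can be illustrated with a minimal Python sketch. The two signatures and the `classify` function are hypothetical and chosen only for illustration; they are not taken from the patent or from any DPI product, and the order in which the two pattern sets are consulted is likewise an assumption.

```python
import re

# Hypothetical signatures, for illustration only.
POSITIVE_PATTERNS = {
    # start of a well-formed HTTP/1.x GET request line
    "http_get": re.compile(rb"GET /[\x21-\x7e]* HTTP/1\.[01]\r\n"),
}
NEGATIVE_PATTERNS = {
    # a crude path-traversal signature
    "path_traversal": re.compile(rb"\.\./\.\./"),
}


def classify(payload: bytes) -> str:
    """Return 'attack', 'legitimate' or 'unknown' for one payload."""
    # Negative patterns are checked first, so an attack embedded in
    # otherwise well-formed traffic is still flagged.
    for pattern in NEGATIVE_PATTERNS.values():
        if pattern.search(payload):
            return "attack"
    # Positive patterns must match from the start of the payload.
    for pattern in POSITIVE_PATTERNS.values():
        if pattern.match(payload):
            return "legitimate"
    return "unknown"
```

Traffic that matches no stored pattern is reported as unknown, which reflects the limitation stated above: a DPI system recognizes only what its patterns describe.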
At present, however, defining DPI patterns relies mainly on manual work: an engineer or user must describe the characteristics of network protocol traffic patterns or attack patterns using regular expressions or another formal language. Fig. 1 shows a prior-art deep packet inspection (DPI) system 100, which comprises a pattern container 103 and a DPI matching engine 105. In DPI system 100, an engineer or user must manually define legitimate traffic patterns (positive patterns) or attack patterns (negative patterns) using regular expressions or another formal language, and these patterns are entered into and stored in pattern container 103. DPI matching engine 105 can then use the patterns stored in pattern container 103 to identify and classify traffic. In DPI system 100, adding, deleting or updating a pattern always requires manual involvement by an engineer or user. This approach is clearly complex, costly and error-prone.
Summary of the invention
In view of the above problems of the prior art, embodiments of the present invention provide a scheme for data processing that eliminates the tedious work of manually configuring DPI patterns and avoids the errors that manual operation causes.
A system for data processing according to an embodiment of the present invention comprises: an interface for receiving data of specified network traffic, wherein the data contains the application-layer payload of data packets of the specified network traffic; and a self-learning engine for using grammatical inference to generate a pattern of the specified network traffic from the received data.
Wherein the system further comprises: a matching engine for checking, based on the pattern, the type of network traffic to which a data packet under monitoring belongs, that is, whether the specified network traffic is legitimate or illegitimate network traffic.
Wherein the data of the specified network traffic is the application-layer payload, and the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the data of the specified network traffic is the data packets, and the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the self-learning engine further comprises: a converter for converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
Wherein the system further comprises: a pattern container for storing the generated pattern.
Wherein the specified network traffic comprises at least one of legitimate network traffic and illegitimate network traffic.
A method for data processing according to an embodiment of the present invention comprises: receiving data of specified network traffic, wherein the data contains the application-layer payload of data packets of the specified network traffic; and generating, from the received data, a pattern of the specified network traffic using grammatical inference.
Wherein the method further comprises: checking, based on the pattern, the type of network traffic to which a data packet under monitoring belongs.
Wherein the data of the specified network traffic is the application-layer payload, and the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the data of the specified network traffic is the data packets, and the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the generating step further comprises: converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
Wherein the method further comprises: storing the generated network traffic pattern.
Wherein the specified network traffic comprises at least one of legitimate network traffic and illegitimate network traffic.
Brief description of the drawings
The present invention is described in detail with reference to the accompanying drawings. It should be understood that the drawings and the corresponding description are illustrative and not restrictive, in which:
Fig. 1 shows a prior-art deep packet inspection system;
Fig. 2 shows a schematic diagram of a system for data processing according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of a self-learning engine according to an embodiment of the present invention;
Fig. 5 shows a flow chart of a method for data processing according to an embodiment of the present invention; and
Fig. 6 shows a schematic diagram of a device for data processing according to another embodiment of the present invention.
Detailed description of embodiments
The above characteristics, technical features, advantages and implementations of the present invention are further described below in a clear and easily understood manner through the description of preferred embodiments with reference to the accompanying drawings. It should be understood that these preferred embodiments merely illustrate the present invention and are not restrictive.
For deep packet inspection products, patterns are the core of their ability to inspect network traffic. For a user or engineer, however, manually defining patterns with regular expressions or another formal language, according to the legitimate characteristics of network protocol traffic or according to attack signatures, is complex, time-consuming and error-prone. It also increases the maintenance cost of deep packet inspection products.
Grammatical inference, also called grammar induction or syntactic pattern recognition, is a machine learning technique that learns a formal grammar from a set of samples and thereby constructs a model that captures the characteristics of those samples. Grammatical inference is a special case of inductive learning, a process that discovers the common structure in a set of samples. Positive samples and/or negative samples are normally used in this inductive learning process. Positive samples are a set of strings defined over a particular alphabet. Negative samples are a set of strings that do not belong to the target language but can help the inference process.
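As a minimal sketch of learning from positive samples, the Python code below builds a prefix tree acceptor (PTA), a tree-shaped automaton that accepts exactly the positive samples. State-merging inference algorithms typically start from such a tree and then generalize it by merging states while no negative sample becomes accepted; the merging step is omitted here. The function names `build_pta` and `accepts` are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_pta(positives: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix tree acceptor that accepts exactly the positive samples.

    State 0 is the start state. Returns (transitions, accepting states).
    """
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1
    for sample in positives:
        state = 0
        for symbol in sample:
            key = (state, symbol)
            if key not in transitions:
                # first time this prefix is seen: allocate a new state
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], text: str) -> bool:
    """Run the automaton over text and report whether it ends in an accepting state."""
    state = 0
    for symbol in text:
        nxt = transitions.get((state, symbol))
        if nxt is None:
            return False
        state = nxt
    return state in accepting
```

With positive samples `["ab", "abc", "b"]`, the automaton accepts those three strings and rejects a negative sample such as `"ba"`. Negative samples become useful once states are merged, because they show which merges would generalize too far.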
In embodiments of the present invention, grammatical inference is introduced into the field of information technology security so that a deep packet inspection product can automatically learn the patterns of legitimate or illegitimate network traffic from sample data belonging to that traffic; these patterns can be defined by formal expressions (for example, regular expressions). Here, legitimate network traffic is, for example, network traffic that complies with the syntax, semantics, sequential logic and other requirements of a protocol specification. Illegitimate network traffic is, for example, network traffic that does not comply with the syntax, semantics, sequential logic or other requirements of a protocol specification, such as network traffic made up of attack packets with malformed syntax, illegal semantics or illegal timing characteristics. The pattern of legitimate network traffic is learned automatically from sample data belonging to legitimate network traffic, and the pattern of illegitimate network traffic is learned automatically from sample data belonging to illegitimate network traffic.
The present invention provides a method and system for data processing that use grammatical inference to obtain the patterns of legitimate and/or illegitimate network traffic automatically from data of that traffic; these patterns can then be used for deep packet inspection. The method and system avoid the complexity and cost of defining these patterns manually.
Fig. 2 shows a schematic diagram of a system for data processing according to an embodiment of the present invention. As shown in Fig. 2, system 200 can comprise a configuration interface 201, a self-learning engine 202, a pattern container 203 and a network interface 204.
As shown in the figure, configuration interface 201 can receive a network traffic file 206 and input it to self-learning engine 202. Network interface 204 can receive network traffic 207 and input it to self-learning engine 202. Network traffic file 206 and network traffic 207 contain data packets belonging to legitimate and/or illegitimate network traffic. Self-learning engine 202 can obtain these data packets from configuration interface 201 and/or network interface 204 and use the application-layer payload they contain as training data for pattern learning. Self-learning engine 202 can then perform grammatical inference on this application-layer payload to generate patterns of legitimate and/or illegitimate network traffic; the grammatical inference algorithm can be, but is not limited to, PRNI, PRNI2, EDSM or SAGE. Pattern container 203 can receive the generated patterns from self-learning engine 202 and store them.
In addition, system 200 can also comprise a matching engine 205, which uses the patterns stored in pattern container 203 to detect which kind of network traffic a data packet in the communication traffic belongs to.
Those skilled in the art will appreciate that system 200 can be implemented in software, in hardware (for example, an integrated circuit or a field-programmable gate array (FPGA)) or in a combination of software and hardware.
The components of system 200 can be placed in the same package, for example integrated in one integrated circuit, but they can also be placed in different packages as long as they can be coupled to and communicate with one another. The components shown in Fig. 2 can be coupled through various kinds of connection (for example, wired, wireless or both). Furthermore, one of these components can be integrated into another. For example, the function of pattern container 203 can be integrated into self-learning engine 202 or matching engine 205; that is, the generated patterns can be stored in self-learning engine 202 or matching engine 205, so that a separate pattern container 203 no longer needs to be provided (as shown in Fig. 3, described below).
Although Fig. 2 shows several components of system 200, it should be understood that not every one of these components is required. Depending on the actual situation, one or more components, or one or more of the couplings between them, can be omitted. For example, because self-learning engine 202 can obtain data from either or both of configuration interface 201 and network interface 204, configuration interface 201 can be omitted when network traffic file 206 is not used, and network interface 204 need not be coupled to self-learning engine 202 when network traffic 207 is not used. Components not shown can also be added to system 200, and some of its components can be modified or replaced, according to actual needs. For example, other interfaces can be used to provide data to self-learning engine 202.
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention. Compared with system 200 shown in Fig. 2, system 300 shown in Fig. 3 does not comprise a separate pattern container; instead, the generated patterns are stored in self-learning engine 202 or matching engine 205.
Fig. 4 shows a schematic diagram of a self-learning engine according to an embodiment of the present invention. As shown in Fig. 4, self-learning engine 202 can comprise a preprocessor 421, a self-learning kernel 422 and a converter 423. Preprocessor 421 can extract the application-layer payload, as training samples, from the data packets belonging to legitimate and/or illegitimate network traffic that are received from configuration interface 201 and/or network interface 204. Self-learning kernel 422 can perform grammatical inference on the extracted training samples to generate a pattern in a compressed-format structure, which can be, for example but without limitation, a deterministic finite automaton (DFA). Converter 423 converts the pattern in the compressed-format structure into a pattern in a formal grammar, thereby obtaining the pattern of legitimate and/or illegitimate network traffic. The formal grammar is, for example but without limitation, a regular expression.
Several embodiments of the deep packet inspection system have been described above; the related method is described below.
Fig. 5 shows a flow chart of a method for data processing according to an embodiment of the present invention. The method can be carried out by system 200 or system 300.
As shown in Fig. 5, in step 501, data packets belonging to legitimate and/or illegitimate network traffic are received. Step 501 can be performed, for example, by network interface 204 and/or configuration interface 201.
In step 502, the application-layer payload is extracted from the received data packets as training samples. Step 502 can be performed, for example, by preprocessor 421 in self-learning engine 202.
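The extraction in step 502 can be sketched in Python for one common case. The patent does not fix the transport or network protocol, so the sketch assumes a raw IPv4 packet (without a link-layer header) that carries a TCP segment; the function name `extract_tcp_payload` is hypothetical.

```python
import struct


def extract_tcp_payload(ip_packet: bytes) -> bytes:
    """Return the TCP payload of a raw IPv4 packet (no link-layer header)."""
    if len(ip_packet) < 20:
        raise ValueError("truncated IPv4 header")
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # IHL (low nibble) gives the IP header length in 32-bit words.
    ip_header_len = (version_ihl & 0x0F) * 4
    # Total length covers the IP header plus everything it carries.
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    if ip_packet[9] != 6:  # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    tcp = ip_packet[ip_header_len:total_len]
    if len(tcp) < 20:
        raise ValueError("truncated TCP header")
    # Data offset (high nibble of byte 12) gives the TCP header length in words.
    tcp_header_len = (tcp[12] >> 4) * 4
    return tcp[tcp_header_len:]
```

The returned bytes are what the text calls the application-layer payload. A practical preprocessor would also need to handle IP fragmentation, TCP stream reassembly and other protocols, all of which this sketch leaves out.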
In step 503, grammatical inference is performed on the extracted application-layer payload to obtain a pattern in a compressed-format structure. If payloads extracted from legitimate network traffic and payloads extracted from illegitimate network traffic are both present, grammatical inference is performed on the two kinds of payload separately, so that each yields its own pattern in a compressed-format structure. Step 503 can be performed, for example, by self-learning kernel 422 in self-learning engine 202.
In step 504, the pattern in the compressed-format structure is converted into a pattern in a formal grammar, thereby obtaining the pattern of legitimate and/or illegitimate network traffic. The pattern in the compressed-format structure that was inferred from payloads of legitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of legitimate network traffic. The pattern in the compressed-format structure that was inferred from payloads of illegitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of illegitimate network traffic. Step 504 can be performed, for example, by converter 423 in self-learning engine 202. The patterns obtained in step 504 can be stored in pattern container 203 or in self-learning engine 202.
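The conversion in step 504 can be illustrated for the simplest compressed structure, a tree-shaped automaton (a trie) built directly from the samples. The Python sketch below renders such a tree as a regular expression that matches exactly the stored samples. Converting a general DFA with loops would need a different procedure, such as state elimination, which is not shown. The names `build_trie` and `trie_to_regex` are hypothetical and not taken from the patent.

```python
import re
from typing import Dict, Iterable

END = ""  # marker key: a sample ends at this node


def build_trie(samples: Iterable[str]) -> Dict:
    """Store the samples in a nested-dict trie, one level per character."""
    root: Dict = {}
    for sample in samples:
        node = root
        for ch in sample:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def trie_to_regex(node: Dict) -> str:
    """Render a trie as a regular expression matching exactly its samples."""
    branches = []
    for ch, child in sorted(node.items()):
        if ch == END:
            continue
        branches.append(re.escape(ch) + trie_to_regex(child))
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
    else:
        body = "(?:" + "|".join(branches) + ")"
    if END in node:
        # a sample may stop at this node, so the continuation is optional
        return "(?:" + body + ")?"
    return body
```

For the samples `["GET", "GO"]` the result is `G(?:ET|O)`, which a regular-expression matching engine can use directly.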
In addition, the method can comprise a checking step: when it must be determined which kind of network traffic a given data packet belongs to, the network traffic to which that packet belongs is checked on the basis of the patterns obtained in step 504. The checking step can be performed, for example, by matching engine 205. For example, if the check finds that the data packet contains the pattern of illegitimate network traffic, the data packet belongs to illegitimate network traffic.
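A minimal Python sketch of such a check follows, assuming that each learned pattern is held as a DFA over bytes and that a payload belongs to a traffic class when the corresponding DFA accepts the whole payload. The `Dfa` class, the `check` function and the whole-payload criterion are assumptions made for illustration; the patent does not prescribe them.

```python
from typing import Dict, Set, Tuple


class Dfa:
    """A deterministic finite automaton over bytes; state 0 is the start state."""

    def __init__(self, transitions: Dict[Tuple[int, int], int], accepting: Set[int]):
        self.transitions = transitions
        self.accepting = accepting

    def accepts(self, payload: bytes) -> bool:
        state = 0
        for byte in payload:
            nxt = self.transitions.get((state, byte))
            if nxt is None:
                # no transition defined: the payload is outside the pattern
                return False
            state = nxt
        return state in self.accepting


def check(payload: bytes, legitimate: Dfa, illegitimate: Dfa) -> str:
    """Classify a payload against the two learned patterns."""
    if illegitimate.accepts(payload):
        return "illegitimate"
    if legitimate.accepts(payload):
        return "legitimate"
    return "unknown"
```

Consulting the illegitimate pattern first is a design choice of this sketch: a payload that happens to match both patterns is then treated as an attack.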
Other variations
Those skilled in the art will appreciate that, although in the above embodiments system 200 or 300 receives data packets of legitimate and/or illegitimate network traffic from outside, the present invention is not limited to this. In some other embodiments of the present invention, system 200 or 300 can also receive directly from outside the application-layer payload of data packets of legitimate and/or illegitimate network traffic. In this case, self-learning engine 202 need not comprise preprocessor 421.
Those skilled in the art will appreciate that, although in the above embodiments a pattern in a formal grammar is used as the pattern of legitimate and/or illegitimate network traffic, the present invention is not limited to this. In some other embodiments of the present invention, the pattern in the compressed-format structure can also be used as the pattern of legitimate and/or illegitimate network traffic.
Fig. 6 shows a schematic diagram of a device for data processing according to an embodiment of the present invention. As shown in Fig. 6, device 600 can comprise a memory 610 for storing executable instructions and a processor 620.
Processor 620 can perform the following operations according to the executable instructions stored in memory 610: receiving data of specified network traffic, wherein the data of the specified network traffic contains the application-layer payload included in data packets of the specified network traffic; and generating, from the received data, a pattern of the specified network traffic using grammatical inference.
In addition, processor 620 can also perform the following operation according to the executable instructions stored in memory 610: checking, based on the generated pattern, the network traffic to which a data packet under monitoring belongs.
In addition, the data of the specified network traffic can be the application-layer payload included in data packets belonging to the specified network traffic, and, for the generating operation, processor 620 performs the following operation according to the executable instructions stored in memory 610: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
In addition, the data of the specified network traffic can be the data packets of the specified network traffic, and, for the generating operation, processor 620 performs the following operations according to the executable instructions stored in memory 610: extracting the application-layer payload from the data packets belonging to the specified network traffic; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
In addition, processor 620 can also perform the following operation according to the executable instructions stored in memory 610: converting the generated pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
In addition, processor 620 can also perform the following operation according to the executable instructions stored in memory 610: storing the generated pattern.
In addition, the specified network traffic can comprise at least one of legitimate network traffic and illegitimate network traffic.
Embodiments of the present invention also provide a machine-readable medium storing executable instructions that, when executed, cause a machine to perform the operations performed by processor 620.
The present invention has been presented and explained in detail above through the drawings and preferred embodiments, but the invention is not limited to the disclosed embodiments. Guided by the technical concept of the present invention, those skilled in the art can make various improvements or modifications without departing from its design. The scope of protection of the present invention should be determined by the appended claims.
In this application, the terms "comprise", "include" and the like do not exclude the presence of other components or steps. In addition, although individual features may be included in different claims, these features can also be advantageously combined, and their inclusion in different claims does not imply that a combination of the features is infeasible and/or disadvantageous.

Claims (16)

1. A system for data processing, comprising:
an interface for receiving data of specified network traffic, wherein the data contains the application-layer payload of data packets of the specified network traffic; and
a self-learning engine for using grammatical inference to generate a pattern of the specified network traffic from the received data.
2. The system of claim 1, further comprising:
a matching engine for checking, based on the pattern, the type of network traffic to which a data packet under monitoring belongs.
3. The system of claim 1, wherein
the received data of the specified network traffic is the application-layer payload itself, and
the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
4. The system of claim 1, wherein
the data of the specified network traffic is the data packets, and
the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
5. The system of claim 3 or 4, wherein
the self-learning engine further comprises: a converter for converting the pattern in the compressed-format structure into a pattern in a formal grammar and using the pattern in the formal grammar as the pattern of the specified network traffic.
6. The system of claim 1, further comprising:
a pattern container for storing the generated network traffic pattern.
7. The system of claim 1, wherein
the specified network traffic comprises at least one of legitimate network traffic and illegitimate network traffic.
8. A method for data processing, comprising:
receiving data of specified network traffic, wherein the data contains the application-layer payload of data packets of the specified network traffic; and
generating, from the received data, a pattern of the specified network traffic using grammatical inference.
9. The method of claim 8, further comprising:
checking, based on the pattern, the type of network traffic to which a data packet under monitoring belongs.
10. The method of claim 8, wherein
the data of the specified network traffic is the application-layer payload, and
the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
11. The method of claim 8, wherein
the data of the specified network traffic is the data packets, and
the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
12. The method of claim 10 or 11, wherein
the generating step further comprises: converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
13. The method of claim 8, further comprising:
storing the generated network traffic pattern.
14. The method of claim 8, wherein
the specified network traffic comprises at least one of legitimate network traffic and illegitimate network traffic.
15. A device for data processing, comprising:
a memory for storing executable instructions; and
a processor for performing, according to the stored executable instructions, the operations included in any one of claims 8-14.
16. A machine-readable medium storing executable instructions that, when executed, cause a machine to perform the operations included in any one of claims 8-14.
CN201210376910.0A 2012-09-29 2012-09-29 system and method for data processing Active CN103716288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210376910.0A CN103716288B (en) 2012-09-29 2012-09-29 system and method for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210376910.0A CN103716288B (en) 2012-09-29 2012-09-29 system and method for data processing

Publications (2)

Publication Number Publication Date
CN103716288A true CN103716288A (en) 2014-04-09
CN103716288B CN103716288B (en) 2018-08-07

Family

ID=50408875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210376910.0A Active CN103716288B (en) 2012-09-29 2012-09-29 system and method for data processing

Country Status (1)

Country Link
CN (1) CN103716288B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10484410B2 (en) 2017-07-19 2019-11-19 Cisco Technology, Inc. Anomaly detection for micro-service communications
WO2020252635A1 (en) * 2019-06-17 2020-12-24 西门子股份公司 Method and apparatus for constructing network behavior model, and computer readable medium

Also Published As

Publication number Publication date
CN103716288B (en) 2018-08-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant