CN103716288B - System and method for data processing

Publication number
CN103716288B
Authority
CN
China
Prior art keywords
network service
pattern
data
specified network
application layer
Prior art date
Legal status
Active
Application number
CN201210376910.0A
Other languages
Chinese (zh)
Other versions
CN103716288A (en
Inventor
唐文
沃尔夫冈·施密德
Current Assignee
Siemens AG
Original Assignee
Siemens AG
Priority date
Filing date
Publication date
Application filed by Siemens AG
Priority to CN201210376910.0A
Publication of CN103716288A
Application granted
Publication of CN103716288B

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention relates to a method and a system for data processing. The system comprises: an interface for receiving data of specified network traffic, wherein the data contain the application-layer payload of data packets of the specified network traffic; and a self-learning engine for generating, from the received data, a pattern of the specified network traffic by means of grammatical inference. The method and system eliminate the tedious work of manually configuring DPI patterns and avoid the errors that manual operation can introduce.

Description

System and method for data processing
Technical field
The present invention relates to the field of network security, and more particularly to deep packet inspection technology in the field of network security.
Background
Deep packet inspection (DPI) is a form of network packet processing that examines the payload of data packets (from layer 3 to layer 7) in order to check network traffic for protocol non-compliance, viruses, spam, intrusions and the like, or to check network traffic against other predefined patterns or rules, and thereby to decide what action to take on the data packets in the network traffic. DPI combines the functions of an intrusion detection system (IDS) and an intrusion prevention system (IPS) with a traditional stateful firewall. This combination extends the filtering and inspection of a traditional firewall from the TCP/IP layers up to the application layer, and makes it possible to inspect the entire packet stream of the network traffic. DPI can identify and classify network traffic on the basis of a signature database matched against information extracted from the packet payload, which allows finer-grained control than classification based on packet headers alone. DPI can therefore be used to prevent a variety of attacks. Classified data packets can be redirected, marked or tagged, blocked, rate-limited, and logged for later analysis.
Because DPI enables advanced network management, user services, security functions, Internet data mining and the like, many enterprises, companies, Internet service providers (ISPs) and governments now use DPI widely.
Like a traditional IDS/IPS, DPI depends on pattern-matching technology for its capability. In practice, DPI is based on a set of patterns (or signatures) that describe characteristics of network traffic: the characteristics of legitimate traffic (positive patterns) or of attack traffic (negative patterns). DPI can only detect protocol traffic that has a known pattern (positive pattern), or identify network attacks defined by negative patterns. Patterns therefore play the central role in DPI.
At present, however, defining DPI patterns depends on manual work: an engineer or user must describe the characteristics of protocol traffic patterns or attack patterns using regular expressions or another formal language. Fig. 1 shows a prior-art deep packet inspection (DPI) system 100, which comprises a pattern container 103 and a DPI matching engine 105. In DPI system 100, the engineer or user must manually define legitimate traffic patterns (positive patterns) or attack patterns (negative patterns) using regular expressions or another formal language, and enter these patterns into pattern container 103 for storage. DPI matching engine 105 then uses the patterns stored in pattern container 103 to identify and classify traffic. In DPI system 100, adding, deleting or updating a pattern always requires the manual involvement of an engineer or user; this approach is clearly complex, costly and error-prone.
Summary of the invention
In view of the above problems of the prior art, embodiments of the present invention provide a scheme for data processing that eliminates the tedious work of manually configuring DPI patterns and avoids the errors that manual operation can introduce.
A system for data processing according to an embodiment of the present invention comprises: an interface for receiving data of specified network traffic, wherein the data include the application-layer payload of data packets of the specified network traffic; and a self-learning engine for generating a pattern of the specified network traffic from the received data by means of grammatical inference.
The system further comprises: a matching engine that checks, on the basis of the pattern, the type of network traffic to which a data packet to be monitored belongs, that is, whether the traffic is legitimate network traffic or illegitimate network traffic.
Where the data of the specified network traffic are the application-layer payload, the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
Where the data of the specified network traffic are the data packets, the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
The self-learning engine further comprises: a converter for converting the pattern in the compact-form structure into a pattern in a formal grammar, as the pattern of the specified network traffic.
The system further comprises: a pattern container for storing the generated pattern.
The specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
A method for data processing according to an embodiment of the present invention comprises: receiving data of specified network traffic, wherein the data include the application-layer payload of data packets of the specified network traffic; and generating a pattern of the specified network traffic from the received data by means of grammatical inference.
The method further comprises: checking, on the basis of the pattern, the type of network traffic to which a data packet to be monitored belongs.
Where the data of the specified network traffic are the application-layer payload, the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
Where the data of the specified network traffic are the data packets, the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
The generating step further comprises: converting the pattern in the compact-form structure into a pattern in a formal grammar, as the pattern of the specified network traffic.
The method further comprises: storing the generated network traffic pattern.
The specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
Brief description of the drawings
The present invention is described in detail below with reference to the drawings. The drawings and the corresponding description are to be understood as illustrative and not restrictive. In the drawings:
Fig. 1 shows a prior-art deep packet inspection system;
Fig. 2 shows a schematic diagram of a system for data processing according to one embodiment of the present invention;
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of a self-learning engine according to one embodiment of the present invention;
Fig. 5 shows a flowchart of a method for data processing according to one embodiment of the present invention; and
Fig. 6 shows a schematic diagram of a device for data processing according to another embodiment of the present invention.
Detailed description
The above characteristics, technical features and advantages of the present invention, and the manner in which they are achieved, are further described below in a clear and readily understandable way through the description of preferred embodiments in conjunction with the drawings. It should be understood that these preferred embodiments serve only to illustrate the present invention and not to limit it.
For deep packet inspection products, patterns are at the core of their ability to inspect network traffic. For a user or engineer, however, manually defining patterns from the legitimate features or attack features of protocol traffic, using regular expressions or another formal language, is complex, time-consuming and error-prone. It also increases the maintenance cost of deep packet inspection products.
Grammatical inference, also known as grammar induction or syntactic pattern recognition, is a machine learning technique that learns a formal grammar from a set of samples in order to construct a model that captures the characteristics of those samples. Grammatical inference is a particular case of inductive learning, which is the process of discovering the common structure of a set of samples. Positive samples and/or negative samples are normally used in this inductive learning process. Positive samples are a set of strings, defined over a particular alphabet, that belong to the target language. Negative samples are a set of strings that do not belong to the target language but can still help the inference process.
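As a rough illustration of learning from positive samples, the Python sketch below builds a prefix tree acceptor (PTA), a tree-shaped automaton that accepts exactly the positive samples and is a common starting point for state-merging inference algorithms such as EDSM. The function names `build_pta` and `accepts` and the sample strings are assumptions made for this example and do not come from the patent. The generalizing state-merge step, where negative samples are typically used to reject unsafe merges, is omitted.

```python
def build_pta(positive_samples):
    """Build a prefix tree acceptor from positive samples.

    Returns (transitions, accepting), where transitions maps
    state -> {symbol: next_state} and state 0 is the start state.
    """
    transitions = {0: {}}
    accepting = set()
    for sample in positive_samples:
        state = 0
        for symbol in sample:
            if symbol not in transitions[state]:
                new_state = len(transitions)      # next unused state number
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)                      # end of a positive sample
    return transitions, accepting


def accepts(dfa, sample):
    """Return True if the automaton accepts the whole sample."""
    transitions, accepting = dfa
    state = 0
    for symbol in sample:
        state = transitions[state].get(symbol)
        if state is None:                         # no transition: reject
            return False
    return state in accepting


def consistent(dfa, negative_samples):
    """A hypothesis is usable only if it rejects every negative sample."""
    return not any(accepts(dfa, s) for s in negative_samples)
```

A PTA does not generalize: it accepts the training strings and nothing else. An inference algorithm would go on to merge states, keeping a merge only while a check such as `consistent` still holds.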
In embodiments of the present invention, grammatical inference is introduced into the field of information technology security, so that a deep packet inspection product can automatically learn patterns of legitimate or illegitimate network traffic from sample data belonging to legitimate or illegitimate network traffic. These patterns can be defined by formal expressions (for example, regular expressions). Here, legitimate network traffic is, for example, network traffic that conforms to the syntax, semantics, sequential logic and the like prescribed by a protocol specification. Illegitimate network traffic is, for example, network traffic that does not conform to the syntax, semantics, sequential logic or the like prescribed by the protocol specification, such as traffic formed by attack packets with faulty syntax, illegal semantics or illegal timing characteristics. The pattern of legitimate network traffic is learned automatically from sample data belonging to legitimate network traffic, and the pattern of illegitimate network traffic is learned automatically from sample data belonging to illegitimate network traffic.
The present invention provides a method and a system for data processing that use grammatical inference to obtain patterns of legitimate and/or illegitimate network traffic automatically from data of legitimate and/or illegitimate network traffic; these patterns can then be used for deep packet inspection. The method and system avoid the complexity and cost of defining these patterns manually.
Fig. 2 shows a schematic diagram of a system for data processing according to one embodiment of the present invention. As shown in Fig. 2, system 200 may include a configuration interface 201, a self-learning engine 202, a pattern container 203 and a network interface 204.
As shown, configuration interface 201 can be used to receive a network traffic file 206 and pass it to self-learning engine 202. Network interface 204 can be used to receive network traffic 207 and pass it to self-learning engine 202. Network traffic file 206 and network traffic 207 include data packets belonging to legitimate and/or illegitimate network traffic. Self-learning engine 202 can obtain data packets belonging to legitimate and/or illegitimate network traffic from configuration interface 201 and/or network interface 204, and use the application-layer payload contained in the obtained data packets as training data for pattern learning. Self-learning engine 202 can then perform grammatical inference on the application-layer payload to generate patterns of legitimate and/or illegitimate network traffic. The grammatical inference algorithm may be, but is not limited to, PRNI, PRNI2, EDSM or SAGE. Pattern container 203 can receive the patterns generated by self-learning engine 202 and store them.
In addition, system 200 may include a matching engine 205, which determines, according to the patterns stored in pattern container 203, which kind of network traffic a data packet detected in the communication traffic belongs to.
Those skilled in the art will appreciate that system 200 can be implemented in software, in hardware (for example, an integrated circuit or a field-programmable gate array (FPGA)), or in a combination of software and hardware.
Multiple components of system 200 can be located in the same package, for example integrated into one integrated circuit, but they can also be located in different packages as long as they can be coupled to communicate with one another. The components shown in Fig. 2 can be coupled through various types of connection (for example, wired connections, wireless connections, or both). Moreover, one of the components can be integrated into another. For example, the function of pattern container 203 can be integrated into self-learning engine 202 or matching engine 205; that is, the generated patterns can be stored in self-learning engine 202 or matching engine 205, so that a separate pattern container 203 is no longer needed (as shown in Fig. 3 below).
Although Fig. 2 shows multiple components of system 200, it should be understood that not every one of these components is necessary. Depending on the actual situation, one or more of the components, or one or more of the couplings between them, can be omitted. For example, because self-learning engine 202 can obtain data from either or both of configuration interface 201 and network interface 204, configuration interface 201 can be omitted when network traffic file 206 is not used, and network interface 204 need not be coupled to self-learning engine 202 when network traffic 207 is not used. Components not shown can also be added to system 200, or some of its components can be modified or replaced, according to actual needs. For example, other interfaces can be used to provide data to self-learning engine 202.
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention. Compared with system 200 shown in Fig. 2, system 300 shown in Fig. 3 does not include a separate pattern container; instead, the generated patterns are stored in self-learning engine 202 or matching engine 205.
Fig. 4 shows a schematic diagram of a self-learning engine according to one embodiment of the present invention. As shown in Fig. 4, self-learning engine 202 may include a preprocessor 421, a self-learning kernel 422 and a converter 423. Preprocessor 421 can be used to extract the application-layer payload, as training samples, from the data packets belonging to legitimate and/or illegitimate network traffic that are received from configuration interface 201 and/or network interface 204. Self-learning kernel 422 can be used to perform grammatical inference on the extracted training samples to generate a pattern in a compact-form structure; the compact-form structure may be, for example but not limited to, a deterministic finite automaton (DFA). Converter 423 is used to convert the pattern in the compact-form structure into a pattern in a formal grammar, thereby obtaining the pattern of legitimate and/or illegitimate network traffic. The formal grammar is, for example but not limited to, a regular expression.
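The kind of conversion attributed to converter 423 can be sketched for the simplest case, a tree-shaped (acyclic) DFA in which every leaf state is accepting, as in a prefix tree acceptor. The dictionary layout and the name `dfa_to_regex` are assumptions made for this example; the patent does not prescribe a data structure or a conversion algorithm, and a DFA with loops would need a general method such as state elimination, which is not shown here.

```python
import re


def dfa_to_regex(transitions, accepting, state=0):
    """Convert a tree-shaped DFA into a regular expression string.

    transitions maps state -> {symbol: next_state}; state 0 is the start.
    Assumes the DFA has no cycles and that every leaf state is accepting.
    Recursion depth grows with the longest accepted string.
    """
    branches = []
    for symbol, next_state in sorted(transitions.get(state, {}).items()):
        branches.append(re.escape(symbol)
                        + dfa_to_regex(transitions, accepting, next_state))
    if not branches:
        return ""                                  # leaf: nothing more to match
    if len(branches) == 1:
        body = branches[0]
    else:
        body = "(?:" + "|".join(branches) + ")"    # one alternative per edge
    if state in accepting:
        body = "(?:" + body + ")?"                 # input may also stop here
    return body


# Tree DFA accepting "ab" and "ac".
example = {0: {"a": 1}, 1: {"b": 2, "c": 3}, 2: {}, 3: {}}
pattern = dfa_to_regex(example, accepting={2, 3})
```

The resulting string can be handed to any regular-expression matcher, for instance with `re.fullmatch(pattern, payload)`.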
Several embodiments of the deep packet inspection system have been described above; the related method is described below.
Fig. 5 shows a flowchart of a method for data processing according to one embodiment of the present invention. The method can be performed by system 200 or system 300.
As shown in Fig. 5, in step 501, data packets belonging to legitimate and/or illegitimate network traffic are received. Step 501 can be performed, for example, by network interface 204 and/or configuration interface 201.
In step 502, the application-layer payload is extracted from the received data packets as training samples. Step 502 can be performed, for example, by preprocessor 421 in self-learning engine 202.
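A minimal sketch of the extraction in step 502 follows, assuming the input is a raw IPv4 packet that carries a TCP segment and has no link-layer header. The patent does not specify the packet format or how extraction is done; the function name `extract_tcp_payload` is hypothetical, and fragmentation, stream reassembly, IPv6 and UDP are deliberately ignored.

```python
import struct


def extract_tcp_payload(packet: bytes) -> bytes:
    """Return the application-layer payload of a raw IPv4/TCP packet."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (version_ihl & 0x0F) * 4           # IHL is in 32-bit words
    total_len = struct.unpack("!H", packet[2:4])[0]    # IP total length field
    if packet[9] != 6:                                 # protocol 6 is TCP
        raise ValueError("not a TCP segment")
    # The TCP data offset is the high nibble of byte 12 of the TCP header.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    # Slice up to total_len so that link-layer padding is not included.
    return packet[ip_header_len + tcp_header_len:total_len]
```

Each returned byte string would then serve as one training sample for the inference step.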
In step 503, grammatical inference is performed on the extracted application-layer payload to obtain a pattern in a compact-form structure. If application-layer payloads extracted from packets of legitimate network traffic and application-layer payloads extracted from packets of illegitimate network traffic are both present, grammatical inference is performed separately on the two kinds of payload, so that each yields its own pattern in a compact-form structure. Step 503 can be performed, for example, by self-learning kernel 422 in self-learning engine 202.
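The separation described in step 503 can be sketched as grouping the payloads by class and running one independent inference per group. The labels, the function name `learn_patterns` and the `infer` parameter are assumptions made for this example; `infer` stands for whichever grammatical inference algorithm is chosen and is not implemented here.

```python
from collections import defaultdict


def learn_patterns(labelled_payloads, infer):
    """Run one independent inference per traffic class.

    labelled_payloads: iterable of (label, payload) pairs.
    infer: callable that turns a list of payloads into a pattern.
    """
    groups = defaultdict(list)
    for label, payload in labelled_payloads:
        groups[label].append(payload)
    # Samples of different classes are never mixed in one inference run.
    return {label: infer(samples) for label, samples in groups.items()}


samples = [
    ("legitimate", "QUIT"),
    ("illegitimate", "RETR ../x"),
    ("legitimate", "USER a"),
]
# Placeholder "inference" that only deduplicates and sorts the samples.
patterns = learn_patterns(samples, infer=lambda s: sorted(set(s)))
```

Keeping the two runs apart means that each resulting pattern describes exactly one class of traffic.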
In step 504, the pattern in the compact-form structure is converted into a pattern in a formal grammar, thereby obtaining the pattern of legitimate and/or illegitimate network traffic. The compact-form pattern obtained by grammatical inference on the application-layer payload extracted from packets of legitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of legitimate network traffic. Likewise, the compact-form pattern obtained by grammatical inference on the application-layer payload extracted from packets of illegitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of illegitimate network traffic. Step 504 can be performed, for example, by converter 423 in self-learning engine 202. The patterns obtained in step 504 can be stored in pattern container 203 or in self-learning engine 202.
In addition, the method may include a checking step: when it is to be determined which kind of network traffic a data packet belongs to, the network traffic to which the packet belongs is checked on the basis of the patterns obtained in step 504. The checking step can be performed, for example, by matching engine 205. For example, if the check finds that a data packet contains the pattern of illegitimate network traffic, the packet belongs to illegitimate network traffic.
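The storing and checking steps can be sketched as follows, assuming the generated patterns are regular expressions. The class `PatternContainer`, the function `check`, the labels and the example patterns are hypothetical stand-ins for pattern container 203 and matching engine 205; they are not taken from the patent.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatternContainer:
    """Hypothetical stand-in for pattern container 203."""
    patterns: dict = field(default_factory=dict)   # label -> compiled regexes

    def store(self, label: str, regex: str) -> None:
        self.patterns.setdefault(label, []).append(re.compile(regex))


def check(container: PatternContainer, payload: str) -> str:
    """Return the traffic class of a payload, or 'unknown'."""
    # Illegitimate patterns are consulted first: a payload that contains
    # one is reported as illegitimate even if it also looks well-formed.
    for label in ("illegitimate", "legitimate"):
        for regex in container.patterns.get(label, []):
            if regex.search(payload):
                return label
    return "unknown"


container = PatternContainer()
container.store("legitimate", r"^(?:USER \w+|QUIT)\r\n$")
container.store("illegitimate", r"\.\./")
```

Returning `"unknown"` for payloads that match no stored pattern is a design choice of this sketch; the text leaves open how such packets are handled.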
Other variations
Those skilled in the art will appreciate that, although in the above embodiments system 200 or 300 receives data packets of legitimate and/or illegitimate network traffic from outside, the present invention is not limited to this. In some other embodiments of the present invention, system 200 or 300 can also receive from outside the application-layer payload of data packets of legitimate and/or illegitimate network traffic directly. In that case, self-learning engine 202 need not include preprocessor 421.
Those skilled in the art will appreciate that, although in the above embodiments patterns in a formal grammar are used as the patterns of legitimate and/or illegitimate network traffic, the present invention is not limited to this. In some other embodiments of the present invention, patterns in a compact-form structure can also be used as the patterns of legitimate and/or illegitimate network traffic.
Fig. 6 shows a schematic diagram of a device for data processing according to one embodiment of the present invention. As shown in Fig. 6, device 600 may include a memory 610 for storing executable instructions and a processor 620.
Processor 620 can be used to perform the following operations according to the executable instructions stored in memory 610: receiving data of specified network traffic, wherein the data of the specified network traffic include the application-layer payload contained in data packets of the specified network traffic; and generating a pattern of the specified network traffic from the received data by means of grammatical inference.
In addition, processor 620 can also be used to perform the following operation according to the executable instructions stored in memory 610: checking, on the basis of the generated pattern, the network traffic to which a data packet to be monitored belongs.
In addition, the data of the specified network traffic can be the application-layer payload contained in data packets belonging to the specified network traffic, and, for the generating operation, processor 620 performs the following operation according to the executable instructions stored in memory 610: performing grammatical inference on the application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
In addition, the data of the specified network traffic can be the data packets of the specified network traffic, and, for the generating operation, processor 620 performs the following operations according to the executable instructions stored in memory 610: extracting the application-layer payload from the data packets belonging to the specified network traffic; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
In addition, processor 620 is also used to perform the following operation according to the executable instructions stored in memory 610: converting the generated pattern in the compact-form structure into a pattern in a formal grammar, as the pattern of the specified network traffic.
In addition, processor 620 is also used to perform the following operation according to the executable instructions stored in memory 610: storing the generated pattern.
In addition, the specified network traffic may include at least one of legitimate network traffic and illegitimate network traffic.
An embodiment of the present invention also provides a machine-readable medium storing executable instructions which, when executed, cause a machine to perform the operations performed by processor 620.
The present invention has been shown and explained in detail above by way of the drawings and preferred embodiments, but the present invention is not limited to the disclosed embodiments. Inspired by the technical concept of the present invention, those skilled in the art can make various improvements or modifications without departing from the inventive idea. The scope of protection of the present invention should be determined by the content of the appended claims.
In this application, the terms "include", "comprise" and the like do not exclude the presence of other components or steps. Furthermore, although individual features may be included in different claims, these features may be advantageously combined, and their inclusion in different claims does not imply that a combination of features is infeasible and/or disadvantageous.

Claims (16)

1. A system for processing data, comprising:
an interface for separately receiving data belonging to legitimate specified network traffic and data belonging to illegitimate specified network traffic, wherein the data contain the application-layer payload of data packets of the specified network traffic; and
a self-learning engine for using the application-layer payload contained in the received data as training data for pattern learning, and for performing grammatical inference on the application-layer payload to generate a pattern of the specified network traffic, wherein grammatical inference is performed separately on the application-layer payload of the data belonging to legitimate specified network traffic and on the application-layer payload of the data belonging to illegitimate specified network traffic.
2. The system as claimed in claim 1, further comprising:
a matching engine for checking, on the basis of the pattern, the type of network traffic to which a data packet to be monitored belongs.
3. The system as claimed in claim 1, wherein
the received data of the specified network traffic are the application-layer payload itself, and
the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
4. The system as claimed in claim 1, wherein
the data of the specified network traffic are the data packets,
the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
5. The system as claimed in claim 3 or 4, wherein
the self-learning engine further comprises: a converter for converting the pattern in the compact-form structure into a pattern in a formal grammar and using the pattern in the formal grammar as the pattern of the specified network traffic.
6. The system as claimed in claim 1, further comprising:
a pattern container for storing the generated network traffic pattern.
7. The system as claimed in claim 1, wherein
the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
8. A method for data processing, comprising:
separately receiving data belonging to legitimate specified network traffic and data belonging to illegitimate specified network traffic, wherein the data include the application-layer payload of data packets of the specified network traffic; and
using the application-layer payload contained in the received data as training data for pattern learning, and performing grammatical inference on the application-layer payload to generate a pattern of the specified network traffic, wherein grammatical inference is performed separately on the application-layer payload of the data belonging to legitimate specified network traffic and on the application-layer payload of the data belonging to illegitimate specified network traffic.
9. The method as claimed in claim 8, further comprising:
checking, on the basis of the pattern, the type of network traffic to which a data packet to be monitored belongs.
10. The method as claimed in claim 8, wherein
the data of the specified network traffic are the application-layer payload,
the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
11. The method as claimed in claim 8, wherein
the data of the specified network traffic are the data packets,
the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compact-form structure as the pattern of the specified network traffic.
12. The method as claimed in claim 10 or 11, wherein
the generating step further comprises: converting the pattern in the compact-form structure into a pattern in a formal grammar, as the pattern of the specified network traffic.
13. The method as claimed in claim 8, further comprising:
storing the generated network traffic pattern.
14. The method as claimed in claim 8, wherein
the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
15. A device for data processing, comprising:
a memory for storing executable instructions; and
a processor for performing, according to the stored executable instructions, the operations included in any one of claims 8 to 14.
16. A machine-readable medium storing executable instructions which, when executed, cause a machine to perform the operations included in any one of claims 8 to 14.
CN201210376910.0A 2012-09-29 2012-09-29 system and method for data processing Active CN103716288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210376910.0A CN103716288B (en) 2012-09-29 2012-09-29 system and method for data processing

Publications (2)

Publication Number Publication Date
CN103716288A CN103716288A (en) 2014-04-09
CN103716288B true CN103716288B (en) 2018-08-07

Family

ID=50408875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210376910.0A Active CN103716288B (en) 2012-09-29 2012-09-29 system and method for data processing

Country Status (1)

Country Link
CN (1) CN103716288B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10484410B2 (en) 2017-07-19 2019-11-19 Cisco Technology, Inc. Anomaly detection for micro-service communications
CN113812116A (en) * 2019-06-17 2021-12-17 西门子股份公司 Network behavior model construction method and device and computer readable medium

Also Published As

Publication number Publication date
CN103716288A (en) 2014-04-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant