CN103716288B - system and method for data processing - Google Patents
- Publication number
- CN103716288B (application number CN201210376910.0A)
- Authority
- CN
- China
- Prior art keywords
- network service
- pattern
- data
- specified network
- application layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
Abstract
The present invention relates to a method and system for data processing. The system comprises: an interface for receiving data of specified network traffic, wherein the data contains the application-layer payload of data packets of the specified network traffic; and a self-learning engine for generating a pattern of the specified network traffic from the received data by means of grammatical inference. The method and system eliminate the tedious work of manually configuring DPI patterns and avoid the errors that manual operation can introduce.
Description
Technical field
The present invention relates to the field of network security, and more particularly to deep packet inspection technology in that field.
Background
Deep packet inspection (DPI) is a network packet processing unit that inspects the payload of data packets (from layer 3 to layer 7). It is used to check network traffic for protocol non-conformance, viruses, spam, intrusions and the like, or to check network traffic against other predefined patterns or rules, in order to decide what action to take on the data packets in the traffic. DPI combines the functions of an intrusion detection system (IDS) and an intrusion prevention system (IPS) with a traditional stateful firewall. This combination extends the filtering and inspection of a traditional firewall upward from the TCP/IP layers to the application layer and makes it possible to inspect the packet stream of an entire network flow. DPI can identify and classify network traffic against a signature database that corresponds to information extracted from the packet payload, which gives finer-grained control than classification based on packet headers alone. DPI can therefore be used to prevent a variety of attacks. Classified data packets can be redirected, marked or tagged, blocked, rate-limited, and logged for later analysis.
Because DPI enables advanced network management, user services, security functions, Internet data mining and so on, many enterprises, companies, ISPs and governments now make extensive use of DPI.
As with a traditional IDS/IPS, the capability of DPI depends on pattern-matching technology. In practice, DPI is based on a set of patterns (or signatures) that describe the characteristics of network traffic, either of legitimate traffic (positive patterns) or of attack traffic (negative patterns). DPI can only detect network protocol traffic that has a known pattern (a positive pattern) or identify network attacks defined by negative patterns. Patterns therefore play the central role in DPI.
At present, however, defining DPI patterns depends on manual work: an engineer or user must describe the characteristics of a network protocol traffic pattern or an attack pattern in regular expressions or another formal language. Fig. 1 shows a prior-art deep packet inspection (DPI) system 100, which comprises a pattern container 103 and a DPI matching engine 105. In DPI system 100, an engineer or user must manually define legitimate traffic patterns (positive patterns) or attack patterns (negative patterns) in regular expressions or another formal language and enter them into pattern container 103 for storage. DPI matching engine 105 then uses the patterns stored in pattern container 103 to identify and classify traffic. In DPI system 100, adding, deleting or updating a pattern always requires the manual involvement of an engineer or user, which is clearly complicated, costly and error-prone.
Summary of the invention
In view of the above problems of the prior art, embodiments of the present invention provide a scheme for data processing that eliminates the tedious work of manually configuring DPI patterns and avoids the errors that manual operation can introduce.
A system for data processing according to an embodiment of the present invention comprises: an interface for receiving data of specified network traffic, wherein the data includes the application-layer payload of data packets of the specified network traffic; and a self-learning engine for generating a pattern of the specified network traffic from the received data by means of grammatical inference.
Wherein the system further comprises: a matching engine that checks, based on the pattern, the type of network traffic to which a data packet to be monitored belongs, that is, whether it belongs to legitimate network traffic or illegitimate network traffic.
Wherein the data of the specified network traffic is the application-layer payload, and the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the data of the specified network traffic is the data packets, and the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the self-learning engine further comprises: a converter for converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
Wherein the system further comprises: a pattern container for storing the generated pattern.
Wherein the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
A method for data processing according to an embodiment of the present invention comprises: receiving data of specified network traffic, wherein the data includes the application-layer payload of data packets of the specified network traffic; and generating a pattern of the specified network traffic from the received data by means of grammatical inference.
Wherein the method further comprises: checking, based on the pattern, the type of network traffic to which a data packet to be monitored belongs.
Wherein the data of the specified network traffic is the application-layer payload, and the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the data of the specified network traffic is the data packets, and the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
Wherein the generating step further comprises: converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
Wherein the method further comprises: storing the generated network traffic pattern.
Wherein the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
Brief description of the drawings
The present invention is described in detail below with reference to the accompanying drawings. The drawings and the corresponding description are to be understood as illustrative and not restrictive. In the drawings:
Fig. 1 shows a deep packet inspection system in the prior art;
Fig. 2 shows a schematic diagram of a system for data processing according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of a self-learning engine according to an embodiment of the present invention;
Fig. 5 shows a flowchart of a method for data processing according to an embodiment of the present invention; and
Fig. 6 shows a schematic diagram of a device for data processing according to another embodiment of the present invention.
Detailed description
The above characteristics, technical features, advantages and implementations of the present invention are further described below in a clear and readily understandable manner through the description of preferred embodiments in conjunction with the drawings. It should be understood that these preferred embodiments merely illustrate the present invention and do not limit it.
For deep packet inspection products, patterns are at the core of their ability to inspect network traffic. For a user or engineer, however, manually defining patterns in regular expressions or another formal language, according to the legitimate characteristics of network protocol traffic or according to attack signatures, is complicated, time-consuming and error-prone. It also increases the maintenance cost of deep packet inspection products.
Grammatical inference, also known as grammar induction or syntactic pattern recognition, is a machine learning technique that learns a formal grammar from a set of samples in order to construct a model that accounts for the characteristics of those samples. Grammatical inference is a particular instance of inductive learning, which is the process of discovering the common structure of a set of samples. Positive samples and/or negative samples are usually used in this inductive learning process. Positive samples are a set of strings defined over a particular alphabet. Negative samples are a set of strings that do not belong to the target language but can nevertheless help the inference process.
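The role of positive and negative samples can be illustrated with a minimal sketch. It builds a prefix tree acceptor, the tree-shaped automaton from which state-merging inference algorithms typically start, and checks that it accepts every positive sample and rejects the negative ones. The function names `build_pta` and `accepts` are hypothetical, and the sketch performs no generalization; it is not the inference algorithm of this patent.

```python
def build_pta(positive_samples):
    """Build a prefix tree acceptor (a tree-shaped DFA) from positive samples."""
    transitions = {0: {}}          # state -> {symbol: next_state}; state 0 is the root
    accepting = set()
    for sample in positive_samples:
        state = 0
        for symbol in sample:
            if symbol not in transitions[state]:
                new_state = len(transitions)       # next unused state id
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)                       # a positive sample ends here
    return transitions, accepting


def accepts(dfa, sample):
    """Return True if the automaton accepts the sample."""
    transitions, accepting = dfa
    state = 0
    for symbol in sample:
        state = transitions[state].get(symbol)
        if state is None:                          # no transition: reject
            return False
    return state in accepting


positive = ["ab", "abab", "ac"]
negative = ["a", "aba", "b"]
pta = build_pta(positive)
consistent = (all(accepts(pta, s) for s in positive)
              and not any(accepts(pta, s) for s in negative))
```

An inference algorithm would go on to merge states of this tree so that the automaton generalizes beyond the samples, using the negative samples to reject merges that would accept a string known to be outside the target language.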
In embodiments of the present invention, grammatical inference is introduced into the field of information technology security, so that a deep packet inspection product can automatically learn the patterns of legitimate or illegitimate network traffic from sample data belonging to that traffic; these patterns can be defined by formal expressions (for example, regular expressions). Here, legitimate network traffic is, for example, traffic that conforms to the syntax, semantics, timing logic and so on of a protocol specification. Illegitimate network traffic is, for example, traffic that does not conform to the syntax, semantics, timing logic and so on of the protocol specification, such as traffic formed by attack packets with incorrect syntax, illegal semantics or illegal timing characteristics. The pattern of legitimate network traffic is learned automatically from sample data belonging to legitimate network traffic, and the pattern of illegitimate network traffic is learned automatically from sample data belonging to illegitimate network traffic.
The present invention provides a method and system for data processing that use grammatical inference to obtain the patterns of legitimate and/or illegitimate network traffic automatically from the data of that traffic; these patterns can then be used for deep packet inspection. The method and system avoid the complexity and cost of defining these patterns manually.
Fig. 2 shows a schematic diagram of a system for data processing according to an embodiment of the present invention. As shown in Fig. 2, system 200 may include a configuration interface 201, a self-learning engine 202, a pattern container 203 and a network interface 204.
As shown, configuration interface 201 can be used to receive a network traffic file 206 and input it to self-learning engine 202. Network interface 204 can be used to receive network traffic 207 and input it to self-learning engine 202. Network traffic file 206 and network traffic 207 contain data packets belonging to legitimate and/or illegitimate network traffic. Self-learning engine 202 can obtain these data packets from configuration interface 201 and/or network interface 204 and use the application-layer payload contained in them as training data for pattern learning. Self-learning engine 202 can then perform grammatical inference on the application-layer payload to generate the patterns of legitimate and/or illegitimate network traffic; the grammatical inference algorithm can be, but is not limited to, PRNI, PRNI2, EDSM or SAGE. Pattern container 203 can receive the patterns generated by self-learning engine 202 and store them.
In addition, system 200 may include a matching engine 205 for detecting, according to the patterns stored in pattern container 203, which kind of network traffic the data packets in communication traffic belong to.
Those skilled in the art will appreciate that system 200 can be implemented in software, in hardware (such as an integrated circuit or a field-programmable gate array (FPGA)), or in a combination of software and hardware.
Multiple components of system 200 can be located in the same package, for example integrated into one integrated circuit, but they can also be located in different packages as long as they can be coupled to one another for communication. The coupling between the components shown in Fig. 2 can use various connection types (for example, wired connections, wireless connections, or both). Moreover, a component can be integrated into another component. For example, the function of pattern container 203 can be integrated into self-learning engine 202 or matching engine 205; that is, the generated patterns can be stored in self-learning engine 202 or matching engine 205, so that a separate pattern container 203 is no longer needed (as shown in Fig. 3 below).
Although Fig. 2 shows multiple components of system 200, it should be understood that not every one of these components is necessary. Depending on the actual situation, one or more of the components, or one or more of the couplings between them, can be omitted. For example, because self-learning engine 202 can obtain data from either or both of configuration interface 201 and network interface 204, configuration interface 201 can be omitted when network traffic file 206 is not used, and network interface 204 need not be coupled to self-learning engine 202 when network traffic 207 is not used. Components not shown can also be added to system 200, and some of its components can be modified or replaced, according to actual needs. For example, other interfaces can be used to provide data to self-learning engine 202.
Fig. 3 shows a schematic diagram of a system for data processing according to another embodiment of the present invention. Compared with system 200 shown in Fig. 2, system 300 shown in Fig. 3 does not include a separate pattern container; instead, the generated patterns are stored in self-learning engine 202 or matching engine 205.
Fig. 4 shows a schematic diagram of the self-learning engine according to an embodiment of the present invention. As shown in Fig. 4, self-learning engine 202 may include a preprocessor 421, a self-learning kernel 422 and a converter 423. Preprocessor 421 can be used to extract the application-layer payload, as training samples, from the data packets belonging to legitimate and/or illegitimate network traffic that are received from configuration interface 201 and/or network interface 204. Self-learning kernel 422 can be used to perform grammatical inference on the extracted training samples to generate a pattern in a compressed-format structure, which can be, for example but not limited to, a deterministic finite automaton (DFA). Converter 423 is used to convert the pattern in the compressed-format structure into a pattern in a formal grammar, yielding the pattern of legitimate and/or illegitimate network traffic. The formal grammar is, for example but not limited to, a regular expression.
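As an illustration of what the preprocessor stage might do, the following sketch extracts the TCP payload from a raw packet. It assumes that each sample is a raw IPv4 packet carrying TCP, without a link-layer header and without IP fragmentation; the description does not state the input format, and the names `extract_tcp_payload` and `build_sample_packet` are hypothetical.

```python
import struct

TCP_PROTOCOL = 6  # IANA protocol number for TCP


def extract_tcp_payload(ip_packet: bytes) -> bytes:
    """Return the bytes that follow the IPv4 and TCP headers of one packet."""
    ip_header_len = (ip_packet[0] & 0x0F) * 4            # IHL field, in 32-bit words
    if ip_packet[9] != TCP_PROTOCOL:                     # protocol field
        raise ValueError("not a TCP packet")
    total_length = struct.unpack("!H", ip_packet[2:4])[0]
    # The TCP data offset is the high nibble of byte 12 of the TCP header.
    tcp_header_len = (ip_packet[ip_header_len + 12] >> 4) * 4
    return ip_packet[ip_header_len + tcp_header_len:total_length]


def build_sample_packet(payload: bytes) -> bytes:
    """Build a minimal IPv4/TCP packet for demonstration (checksums left at zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40 + len(payload), 0, 0, 64, TCP_PROTOCOL, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    tcp_header = struct.pack(
        "!HHIIHHHH", 40000, 80, 0, 0, (5 << 12) | 0x18, 8192, 0, 0
    )
    return ip_header + tcp_header + payload


sample = extract_tcp_payload(build_sample_packet(b"GET / HTTP/1.1\r\n"))
```

A production preprocessor would also need to reassemble TCP streams and handle other transports; the sketch only shows the header arithmetic for a single packet.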
Several embodiments of the deep packet inspection system have been described above; the related method is described below.
Fig. 5 shows a flowchart of a method for data processing according to an embodiment of the present invention. The method can be executed by system 200 or system 300.
As shown in Fig. 5, in step 501, data packets belonging to legitimate and/or illegitimate network traffic are received. Step 501 can be executed, for example, by network interface 204 and/or configuration interface 201.
In step 502, the application-layer payload is extracted from the received data packets as training samples. Step 502 can be executed, for example, by preprocessor 421 in self-learning engine 202.
In step 503, grammatical inference is performed on the extracted application-layer payload to obtain a pattern in a compressed-format structure. If application-layer payload extracted from the data packets of legitimate network traffic and application-layer payload extracted from the data packets of illegitimate network traffic are both present, grammatical inference is performed on the two separately, yielding a pattern in a compressed-format structure for each. Step 503 can be executed, for example, by self-learning kernel 422 in self-learning engine 202.
In step 504, the pattern in the compressed-format structure is converted into a pattern in a formal grammar, yielding the pattern of legitimate and/or illegitimate network traffic. The compressed-format pattern obtained by grammatical inference on the application-layer payload extracted from the data packets of legitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of legitimate network traffic. The compressed-format pattern obtained by grammatical inference on the application-layer payload extracted from the data packets of illegitimate network traffic is converted into a formal-grammar pattern that serves as the pattern of illegitimate network traffic. Step 504 can be executed, for example, by converter 423 in self-learning engine 202. The patterns obtained in step 504 can be stored in pattern container 203 or in self-learning engine 202.
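Steps 503 and 504 can be sketched end to end under strong simplifying assumptions. In the sketch below a plain trie stands in for the inference kernel (it memorizes the samples and does not generalize, unlike a real grammatical inference algorithm), and the converter turns that tree-shaped structure into an equivalent regular expression. The two traffic classes are processed separately, as the description requires. All function names are hypothetical.

```python
import re


def infer_trie(samples):
    """Stand-in for the inference kernel: a trie with no state merging."""
    root = {}
    for sample in samples:
        node = root
        for ch in sample:
            node = node.setdefault(ch, {})
        node[""] = {}                  # empty key marks "a sample ends here"
    return root


def trie_to_regex(node):
    """Convert the tree-shaped structure into an equivalent regular expression."""
    alternatives = []
    for ch in sorted(node):
        if ch == "":
            alternatives.append("")    # the sample may end at this node
        else:
            alternatives.append(re.escape(ch) + trie_to_regex(node[ch]))
    if len(alternatives) == 1:
        return alternatives[0]
    return "(?:" + "|".join(alternatives) + ")"


def learn_patterns(legitimate_samples, illegitimate_samples):
    """Run inference and conversion separately for each traffic class."""
    patterns = {}
    for label, samples in (("legitimate", legitimate_samples),
                           ("illegitimate", illegitimate_samples)):
        if samples:                    # either class may be absent
            patterns[label] = trie_to_regex(infer_trie(samples))
    return patterns


patterns = learn_patterns(["GET /a", "GET /b"], ["GET /../x"])
```

Converting a general DFA with cycles to a regular expression needs a technique such as state elimination; the recursive conversion above is only valid for the acyclic, tree-shaped structure used in this sketch.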
In addition, the method can include a checking step: when it is to be determined which kind of network traffic a data packet belongs to, the network traffic to which the packet belongs is checked based on the patterns obtained in step 504. The checking step can be executed, for example, by matching engine 205. For example, if the check finds that a data packet contains the pattern of illegitimate network traffic, the packet belongs to illegitimate network traffic.
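The checking step might look like the following sketch, which assumes that the learned patterns are available as regular expressions over bytes, grouped by traffic class. The class name `MatchingEngine`, the dictionary layout and the two example patterns are hypothetical and are not taken from the patent. Illegitimate patterns are tried first so that a payload containing one is reported as illegitimate.

```python
import re


class MatchingEngine:
    """Classify payloads against learned patterns (illustrative sketch)."""

    def __init__(self, patterns):
        # patterns: {"legitimate": [regex, ...], "illegitimate": [regex, ...]}
        self._compiled = {
            label: [re.compile(p) for p in regexes]
            for label, regexes in patterns.items()
        }

    def check(self, payload: bytes) -> str:
        """Return the traffic class of the payload, or 'unknown'."""
        for label in ("illegitimate", "legitimate"):
            if any(p.search(payload) for p in self._compiled.get(label, [])):
                return label
        return "unknown"


engine = MatchingEngine({
    "legitimate": [rb"^USER \w+\r\n$"],     # hypothetical learned pattern
    "illegitimate": [rb"\x90{8,}"],         # hypothetical learned pattern
})
result = engine.check(b"USER alice\r\n")
```

Returning "unknown" for a payload that matches no pattern is a design choice of this sketch; the description only says that the matching engine determines which kind of network traffic a packet belongs to.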
Other variations
Those skilled in the art will appreciate that, although in the above embodiments system 200 or 300 receives data packets of legitimate and/or illegitimate network traffic from outside, the present invention is not limited to this. In some other embodiments of the present invention, system 200 or 300 can instead receive the application-layer payload of such data packets directly from outside. In that case, self-learning engine 202 need not include preprocessor 421.
Those skilled in the art will appreciate that, although in the above embodiments a pattern in a formal grammar is used as the pattern of legitimate and/or illegitimate network traffic, the present invention is not limited to this. In some other embodiments of the present invention, the pattern in the compressed-format structure can itself be used as the pattern of legitimate and/or illegitimate network traffic.
Fig. 6 shows a schematic diagram of a device for data processing according to an embodiment of the present invention. As shown in Fig. 6, device 600 may include a memory 610 for storing executable instructions and a processor 620.
Processor 620 can be used to perform the following operations according to the executable instructions stored in memory 610: receiving data of specified network traffic, wherein the data includes the application-layer payload contained in the data packets of the specified network traffic; and generating a pattern of the specified network traffic from the received data by means of grammatical inference.
In addition, processor 620 can also be used to perform the following operation according to the executable instructions stored in memory 610: checking, based on the generated pattern, the network traffic to which a data packet to be monitored belongs.
In addition, the data of the specified network traffic can be the application-layer payload contained in the data packets belonging to the specified network traffic, and, for the generating operation, processor 620 performs the following operation according to the executable instructions stored in memory 610: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
In addition, the data of the specified network traffic can be the data packets of the specified network traffic, and, for the generating operation, processor 620 performs the following operations according to the executable instructions stored in memory 610: extracting the application-layer payload from the data packets belonging to the specified network traffic; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
In addition, processor 620 is also used to perform the following operation according to the executable instructions stored in memory 610: converting the generated pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
In addition, processor 620 is also used to perform the following operation according to the executable instructions stored in memory 610: storing the generated pattern.
In addition, the specified network traffic may include at least one of legitimate network traffic and illegitimate network traffic.
Embodiments of the present invention also provide a machine-readable medium on which executable instructions are stored; when executed, the executable instructions cause a machine to perform the operations performed by processor 620.
The present invention has been shown and explained in detail above by means of the drawings and preferred embodiments, but it is not limited to the embodiments disclosed. Guided by the technical concept of the present invention, those skilled in the art can make various improvements or modifications without departing from its underlying idea. The scope of protection of the present invention is determined by the content of the appended claims.
In this application, the terms "include", "comprise" and the like do not exclude the presence of other components or steps. In addition, although individual features may be contained in different claims, these features can be advantageously combined, and their inclusion in different claims does not imply that a combination of the features is infeasible and/or disadvantageous.
Claims (16)
1. A system for processing data, comprising:
an interface for separately receiving data of specified network traffic that is legitimate and data of specified network traffic that is illegitimate, wherein the data contains the application-layer payload of data packets of the specified network traffic; and
a self-learning engine for using the application-layer payload contained in the received data as training data for pattern learning and performing grammatical inference on the application-layer payload to generate a pattern of the specified network traffic, wherein grammatical inference is performed separately on the application-layer payload of the data belonging to legitimate specified network traffic and on the application-layer payload of the data belonging to illegitimate specified network traffic.
2. The system of claim 1, further comprising:
a matching engine for checking, based on the pattern, the type of network traffic to which a data packet to be monitored belongs.
3. The system of claim 1, wherein
the received data of the specified network traffic is the application-layer payload itself, and
the self-learning engine comprises a self-learning kernel for performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
4. The system of claim 1, wherein
the data of the specified network traffic is the data packets, and
the self-learning engine comprises: a preprocessor for extracting the application-layer payload from the data packets; and a self-learning kernel for performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
5. The system of claim 3 or 4, wherein
the self-learning engine further comprises: a converter for converting the pattern in the compressed-format structure into a pattern in a formal grammar and using the pattern in the formal grammar as the pattern of the specified network traffic.
6. The system of claim 1, further comprising:
a pattern container for storing the generated network traffic pattern.
7. The system of claim 1, wherein
the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
8. A method for data processing, comprising:
separately receiving data of specified network traffic that is legitimate and data of specified network traffic that is illegitimate, wherein the data includes the application-layer payload of data packets of the specified network traffic; and
using the application-layer payload contained in the received data as training data for pattern learning and performing grammatical inference on the application-layer payload to generate a pattern of the specified network traffic, wherein grammatical inference is performed separately on the application-layer payload of the data belonging to legitimate specified network traffic and on the application-layer payload of the data belonging to illegitimate specified network traffic.
9. The method of claim 8, further comprising:
checking, based on the pattern, the type of network traffic to which a data packet to be monitored belongs.
10. The method of claim 8, wherein
the data of the specified network traffic is the application-layer payload, and
the generating step comprises: performing grammatical inference on the application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
11. The method of claim 8, wherein
the data of the specified network traffic is the data packets, and
the generating step comprises: extracting the application-layer payload from the data packets; and performing grammatical inference on the extracted application-layer payload to generate a pattern in a compressed-format structure as the pattern of the specified network traffic.
12. The method of claim 10 or 11, wherein
the generating step further comprises: converting the pattern in the compressed-format structure into a pattern in a formal grammar as the pattern of the specified network traffic.
13. The method of claim 8, further comprising:
storing the generated network traffic pattern.
14. The method of claim 8, wherein
the specified network traffic includes at least one of legitimate network traffic and illegitimate network traffic.
15. A device for data processing, comprising:
a memory for storing executable instructions; and
a processor for performing, according to the stored executable instructions, the operations of any one of claims 8-14.
16. A machine-readable medium on which executable instructions are stored, wherein the executable instructions, when executed, cause a machine to perform the operations of any one of claims 8-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210376910.0A CN103716288B (en) | 2012-09-29 | 2012-09-29 | system and method for data processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210376910.0A CN103716288B (en) | 2012-09-29 | 2012-09-29 | system and method for data processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103716288A CN103716288A (en) | 2014-04-09 |
CN103716288B true CN103716288B (en) | 2018-08-07 |
Family
ID=50408875
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210376910.0A Active CN103716288B (en) | 2012-09-29 | 2012-09-29 | system and method for data processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103716288B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10484410B2 (en) | 2017-07-19 | 2019-11-19 | Cisco Technology, Inc. | Anomaly detection for micro-service communications |
CN113812116A (en) * | 2019-06-17 | 2021-12-17 | 西门子股份公司 | Network behavior model construction method and device and computer readable medium |
-
2012
- 2012-09-29 CN CN201210376910.0A patent/CN103716288B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN103716288A (en) | 2014-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||