CN104767744B - Protocol state machine active estimating method based on protocol knowledge - Google Patents
- Publication number
- CN104767744B (application CN201510134335.7A)
- Authority
- CN
- China
- Prior art keywords
- protocol
- state machine
- message
- sequence
- inquiry
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0254—Stateful filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
- H04L63/205—Network architectures or network communication protocols for network security for managing network security; network security policies in general involving negotiation or determination of the one or more network security mechanisms to be used, e.g. by negotiation between the client and the server or between peers or by selection according to the capabilities of the entities involved
Abstract
The present invention provides an active protocol state machine inference method based on protocol knowledge, comprising the following steps: message format extraction, observation table initialization, observation table closedness checking, invalid query sequence filtering, direct query response, candidate state machine construction, equivalence judgment, and extension of the observation table according to counterexamples. To address the relatively low efficiency of active protocol state machine inference, the invention extracts the ordering constraints between protocol messages from a set of protocol session samples and uses them to filter invalid queries; it also answers directly, from the same session sample set, the kinds of queries that have already occurred in the session samples. In addition, counterexamples to the candidate state machine are searched for efficiently by mutating positive samples, which improves the overall efficiency of active protocol state machine inference.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a method for automatically inferring the protocol state machine of a network protocol: starting from the network messages received and sent by a protocol entity program and from an analysis of specific knowledge about the protocol, the method interacts with the protocol entity to continuously supplement the protocol information it needs.
Background technology
To exchange data in an orderly way in a computer network, both communicating parties must comply with a network protocol. Network protocols are a key element in realizing network communication functions and a primary object of study in fields such as network services and network security. Many network security techniques, such as intrusion detection, fuzz testing, protocol reuse, and protocol vulnerability analysis, rely on detailed network protocol specification information.
A network protocol specification mainly comprises two parts: the protocol format and the protocol state machine. The protocol format concerns the composition and structure of each protocol field in a communication message. The protocol state machine concerns the number of protocol states in the protocol system and the rules by which the protocol system moves from one protocol state to another when it receives different inputs.
Network protocols include public protocols and proprietary protocols. The content and details of a public protocol are described in published standard documents, as with the HTTP and SMTP communication protocols. A proprietary protocol has no public documentation and is usually used by a specific network application; examples include the communication protocol of the QQ instant messaging software, the TNS network protocol used by the Oracle database, and the communication protocols used by some malware.
The widespread use of proprietary protocols severely restricts the applicability of network security techniques that depend on protocol specification information. To address the problem of unknown protocol information, researchers use protocol reverse engineering to obtain unknown protocol specifications. Protocol reverse engineering is the process of extracting the specification of a network protocol, without relying on a protocol description, by monitoring and analyzing the network input and output, system behavior, and instruction execution flow of a protocol entity.
Traditional protocol reverse engineering is performed manually; the process is tedious and time-consuming, and its accuracy depends on the skill and practical experience of the analyst. As networks grow and protocol types multiply, the demands on the accuracy and timeliness of reverse analysis keep rising, and manual protocol reverse analysis can no longer meet the needs of practical applications. Automatic protocol reverse engineering can greatly reduce manual analysis and improve the efficiency of analyzing proprietary protocols, and it is receiving increasing attention.
Most current research on automatic protocol reverse engineering concentrates on extracting the protocol format; the analysis results lack protocol state machine information, which limits their practical application. In recent years, as protocol format extraction techniques have matured, some researchers have begun to attempt reverse analysis of the protocol state machine, also called protocol state machine inference.
Depending on whether interaction with the protocol entity is needed during inference, protocol state machine inference methods fall into two classes: passive inference and active inference. Passive inference works from a given set of message samples and does not interact with the protocol entity during inference. Active inference starts from a known sample set, continually expands it using queries and the feedback from their responses, and obtains the protocol state machine information on that basis.
In the field of active protocol state machine inference, current methods are mainly based on the L* algorithm proposed by Angluin et al. The L* algorithm maintains a data structure called the observation table (Observation Table), defined as a triple (S, E, T), where S is a prefix-closed set and E a suffix-closed set of finite-length strings over the input alphabet Σ, T(s, e) is a function defined for s ∈ S ∪ S·Σ and e ∈ E, and the symbol "·" denotes string concatenation. T(s, e) = 1 means that the state machine accepts the string s·e, and T(s, e) = 0 means that it rejects s·e. The observation table is usually represented as a two-dimensional table whose rows are the elements of S ∪ S·Σ, whose columns are the elements of E, and whose entries are the values of T(s, e).
The observation table is required to satisfy two properties: closedness and consistency. For any s, t ∈ S ∪ S·Σ, s is said to be equivalent to t, written s ≅ t, if and only if T(s, e) = T(t, e) for all e ∈ E; the set of all rows equivalent to s is written [s]. If for every t ∈ S·Σ there exists s ∈ S such that s ≅ t, the observation table is said to be closed. If for any s, t ∈ S with s ≅ t it holds that s·i ≅ t·i for all i ∈ Σ, the observation table is said to be consistent.
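The closedness and consistency conditions above can be sketched as follows. This is a minimal illustration, not the patent's implementation: strings are modeled as Python tuples of symbols, T is a dictionary keyed by (s, e), and the names `row`, `is_closed`, `is_consistent`, `build_table`, as well as the example language (words with an even number of "a" symbols), are assumptions introduced here.

```python
from itertools import product

def row(T, E, s):
    """Row of prefix s: the values T(s, e) for every suffix e in E."""
    return tuple(T[(s, e)] for e in E)

def is_closed(S, E, T, alphabet):
    """Every row labelled by S.Sigma must equal the row of some s in S."""
    s_rows = {row(T, E, s) for s in S}
    return all(row(T, E, s + (a,)) in s_rows for s in S for a in alphabet)

def is_consistent(S, E, T, alphabet):
    """Equivalent rows in S must stay equivalent after appending any symbol."""
    for s, t in product(S, repeat=2):
        if row(T, E, s) != row(T, E, t):
            continue
        if any(row(T, E, s + (a,)) != row(T, E, t + (a,)) for a in alphabet):
            return False
    return True

def build_table(S, E, alphabet, member):
    """Fill T for all rows in S and S.Sigma using a membership function."""
    rows = set(S) | {s + (a,) for s in S for a in alphabet}
    return {(r, e): int(member(r + e)) for r in rows for e in E}

# Hypothetical target language: words with an even number of "a".
alphabet = ("a", "b")
even_a = lambda word: word.count("a") % 2 == 0
E = [()]
T1 = build_table([()], E, alphabet, even_a)
T2 = build_table([(), ("a",)], E, alphabet, even_a)
closed_before = is_closed([()], E, T1, alphabet)
closed_after = is_closed([(), ("a",)], E, T2, alphabet)
consistent_after = is_consistent([(), ("a",)], E, T2, alphabet)
```

With S = {ε} the row of "a" matches no row in S, so the table is not closed; moving "a" into S makes the table both closed and consistent for this toy language.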
The L* algorithm assumes the existence of an oracle (Oracle) that answers membership queries (membership query) and equivalence queries (equivalence query) accurately. The algorithm first uses membership queries to build an observation table that is closed and consistent, and then generates the corresponding candidate state machine M. It then uses an equivalence query to judge whether M is consistent with the real state machine: if so, inference ends; otherwise, inference continues from the counterexample (counterexample) supplied by the oracle.
The inference target of the traditional L* algorithm is a deterministic finite state machine without output. Such a machine considers only message input and not message output, and so ignores the inherent relationship between the input and output messages of a protocol system. A protocol system is a state transition system with output; if a deterministic finite state machine without output is taken as the inference target, the resulting state machine differs considerably from the actual protocol system.
Building on the L* algorithm, Li et al., in the paper "Integration testing of components guided by incremental state machine learning" (Keqin Li, Roland Groz and Muzammil Shahbaz. Integration testing of components guided by incremental state machine learning. In Testing: Academic and Industrial Conference - Practice and Research Techniques, 59-70, IEEE Computer Society, 2006), first proposed LM+, an inference algorithm for Mealy machines. The algorithm replaces membership queries with output queries (output query), which obtain the protocol state machine's output for an input symbol sequence. LM+ requires that the set S of the observation table contain no two equivalent elements, so the consistency of the observation table need not be considered during inference. LM+ also improves on how L* handles counterexamples: when processing a counterexample, it determines the distinguishing sequence of minimum length, which avoids adding excessive suffixes to the set E of the observation table.
Cho et al. inferred the state machine of the proprietary protocol used by the MegaD botnet, based on the protocol format reverse engineering method Dispatcher and the LM+ algorithm. Cho made two improvements to LM+. First, a cache was added and serial queries were changed to parallel queries, reducing the time spent waiting for queries to be sent. Second, query results are stored in the cache and queries containing self-loop structures are answered in advance, reducing the number of queries sent to the oracle. These improvements raised the efficiency of protocol state machine inference.
Because practical systems have no oracle that can answer every kind of decision problem, the alternative commonly used at present is to instantiate the symbol sequence of an output query as a message sequence, send it to the protocol entity, and observe its output. Test samples are likewise generated at random from the set of input message types to judge whether the inferred candidate state machine is approximately equivalent to the real state machine.
Compared with passive inference, active protocol state machine inference can continuously supplement the protocol entity information it needs during inference; the information collected is comprehensive, which gives high inference accuracy. The main drawbacks of such methods are high time overhead and low efficiency: the feedback to each query depends on the running efficiency of the protocol entity program and on network delay. In addition, the equivalence judgment must find valid counterexamples. Counterexamples help discover states missed during inference, but finding them is time-consuming, which in practice limits inference efficiency.
Overall, improving inference efficiency is the most prominent problem facing active protocol state machine inference. The efficiency problems of current methods appear mainly in the following respects: (1) During state machine inference, the message types are abstracted as mutually independent, meaningless symbols, and the ordering constraints between message types are ignored. Query messages are generated completely at random and sent to the protocol entity program, which judges a large number of them to be invalid messages, reducing inference efficiency. (2) The captured session sample set is not fully exploited during active inference. The responses to some queries can be inferred directly from the session sample set, so not every query needs to be sent to the protocol entity and waited on. (3) Judging whether the candidate state machine is equivalent to the real state machine requires constructing counterexample samples, but current judgment methods ignore the fact that negative samples often share a partial common prefix with positive samples. They generate test samples completely at random in the attempt to find a counterexample that distinguishes the candidate state machine from the real one, which produces a large number of invalid test samples and reduces the efficiency of the equivalence judgment.
Summary of the invention
In view of the problems in the prior art, the present invention aims to provide an active protocol state machine inference method based on protocol knowledge. To address the relatively low efficiency of active protocol state machine inference, the method builds on the active state machine inference algorithm LM+: from a set of protocol session samples it extracts the ordering constraints between protocol messages and uses them to filter invalid queries, and it answers directly, from the same sample set, the kinds of queries that have occurred in the session samples. In addition, it searches for counterexamples to the candidate state machine more effectively by mutating positive samples, improving the overall efficiency of active protocol state machine inference.
To achieve the above object, the technical solution adopted by the present invention is as follows:
An active protocol state machine inference method based on protocol knowledge, comprising the following steps:
(1) Message format extraction: a message format extraction method is used to obtain the protocol formats of the input and output messages; messages with the same format are grouped into one class, the class of each message is represented by an abstract symbol, and the session sequences formed by input and output messages are abstracted into string sequences of abstract symbols.
(2) Observation table initialization: the observation table is a triple (S, E, T). Its rows are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ denotes the set of abstract input symbols, and the symbol "·" denotes string concatenation; its columns are indexed by E. The function T maps S ∪ S·Σ and E to output strings, which are the values of the corresponding table entries. At initialization, S = {ε} and E = Σ, where ε denotes the empty string; for each table entry the corresponding input message sequence is generated, and the entry is assigned a value according to the corresponding output messages of the protocol entity.
(3) Observation table closedness check: determine whether the initialized observation table satisfies the closedness requirement. If the closedness condition is not satisfied, the observation table must be extended and the newly added entries assigned values. If it is satisfied, the candidate protocol state machine corresponding to the observation table is constructed.
(4) Invalid query sequence filtering: the ordering constraints between message types are extracted from the session sample set and corresponding filtering rules are formulated. If an input message sequence serving as a query is judged invalid, it is filtered out directly; if it satisfies the ordering constraints between messages, it proceeds to the next step.
(5) Direct query response: by learning from the session sample set, some input message sequences serving as queries can be answered directly, without sending the query to the protocol entity. If the response cannot be determined from the message sample set, the query is sent to the protocol entity program for interaction.
(6) Candidate state machine construction: once the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is represented as a Mealy machine whose input set and output set are filled with the information collected in the message format extraction stage; the remaining information is filled in from the observation table.
(7) Equivalence judgment: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences are generated, and the protocol entity's output for each test sequence is compared with the candidate protocol state machine's output for the same sequence. Test sequences are constructed from normal protocol string sequences, which avoids the large number of invalid test samples caused by random generation. A test sequence that produces different outputs is a counterexample, and state machine inference must continue on that basis.
(8) Observation table extension according to counterexamples: after a string serving as a counterexample is found, the set E of the observation table is extended with suffixes of the counterexample and the observation table is filled in on that basis, so that different protocol states are distinguished more fully. Steps (3)-(7) are repeated until a state machine equivalent to the real protocol state machine is output.
The workflow of the protocol message format extraction stage is as follows: the input and output messages of the target protocol entity program are obtained; a message format extraction method yields the protocol format information of the input and output messages; the messages are classified by format and the class of each message is represented by an abstract symbol, so that the session sequences formed by input and output messages are abstracted into string sequences of abstract symbols.
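The abstraction step described above can be illustrated with a short sketch. Format extraction itself is out of scope here: the function `format_key`, which maps a raw message to its format class, is a hypothetical stand-in (the example simply uses the first token of the payload), and the symbol names i1, o1, ... are likewise assumptions made for illustration.

```python
def abstract_sessions(sessions, format_key):
    """Replace each message by an abstract symbol for its (direction, format) class."""
    symbols = {}          # (direction, format class) -> abstract symbol
    abstracted = []
    for session in sessions:
        sequence = []
        for direction, payload in session:
            key = (direction, format_key(payload))
            if key not in symbols:
                prefix = "i" if direction == "in" else "o"
                # Number symbols separately for input and output classes.
                count = sum(1 for d, _ in symbols if d == direction)
                symbols[key] = f"{prefix}{count + 1}"
            sequence.append(symbols[key])
        abstracted.append(sequence)
    return abstracted, symbols

# Hypothetical captured sessions of a text protocol.
sessions = [
    [("in", b"HELO a"), ("out", b"250 ok"), ("in", b"QUIT"), ("out", b"221 bye")],
    [("in", b"HELO b"), ("out", b"250 hi")],
]
first_token = lambda payload: payload.split()[0]   # stand-in for real format extraction
abstracted, symbols = abstract_sessions(sessions, first_token)
```

Messages that share a format class, such as the two HELO requests, receive the same symbol, so both sessions begin with the same abstract prefix.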
The workflow of the observation table initialization stage is as follows: the observation table is a triple (S, E, T). The strings in the set S represent the states the protocol may have, and the strings in the set E serve to distinguish different protocol states. At initialization, once the rows and columns of the table are determined, a string is constructed for each table entry: its prefix is the row label of the entry and its suffix is the column label. The string is instantiated according to the message formats to generate the corresponding input message sequence for the protocol entity program, and the output messages of the protocol entity program for that input sequence are obtained. The output messages are abstracted into an output string according to the format information, and the value of the table entry is set to the suffix of the output string whose length equals the length of the entry's column label.
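A minimal sketch of this initialization, assuming the protocol entity returns exactly one output symbol per input symbol. The function `toy_entity` is a hypothetical stand-in for instantiating a query as messages, sending it to the real protocol entity program, and abstracting the replies; `init_table` is also a name introduced here.

```python
def toy_entity(word):
    """Hypothetical protocol entity: 'data' is only acknowledged after 'login'."""
    state, outputs = "init", []
    for symbol in word:
        if state == "init" and symbol == "login":
            state = "auth"
            outputs.append("ok")
        elif state == "auth" and symbol == "data":
            outputs.append("ack")
        else:
            outputs.append("err")
    return tuple(outputs)

def init_table(alphabet, output_query):
    """Build the initial table with S = {empty string} and E = Sigma."""
    S = [()]
    E = [(a,) for a in alphabet]
    T = {}
    rows = S + [s + (a,) for s in S for a in alphabet]
    for s in rows:
        for e in E:
            outputs = output_query(s + e)
            # Keep only the outputs produced by the suffix e (same length as e).
            T[(s, e)] = tuple(outputs[len(s):])
    return S, E, T

S, E, T = init_table(("login", "data"), toy_entity)
```

Each entry stores only the output suffix that corresponds to the column label, so rows reached through different prefixes can be compared directly.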
The workflow of the observation table closedness check stage is as follows: determine whether the initialized observation table satisfies the closedness requirement, i.e., whether for every t ∈ S·Σ there exists s ∈ S such that s and t are equivalent. If the condition is not satisfied, find a row t ∈ S·Σ that violates closedness, move t into S, and extend the set S·Σ of the observation table accordingly. To fill the newly created table entries, query messages are generated and the corresponding output information is obtained, from which the entries are assigned values. If the closedness condition is satisfied, the candidate protocol state machine corresponding to the observation table is constructed.
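The closedness loop can be sketched as below, under the same assumptions as before: words are tuples, the entity produces one output per input, and `toy_entity` is a hypothetical stand-in for the real protocol entity program. The helper names `row` and `make_closed` are introduced here for illustration.

```python
def toy_entity(word):
    """Hypothetical protocol entity: 'data' is only acknowledged after 'login'."""
    state, outputs = "init", []
    for symbol in word:
        if state == "init" and symbol == "login":
            state = "auth"
            outputs.append("ok")
        elif state == "auth" and symbol == "data":
            outputs.append("ack")
        else:
            outputs.append("err")
    return tuple(outputs)

def row(T, E, s):
    return tuple(T[(s, e)] for e in E)

def make_closed(S, E, T, alphabet, output_query):
    """Move unmatched rows of S.Sigma into S until the table is closed."""
    def fill(s):
        for e in E:
            if (s, e) not in T:
                T[(s, e)] = tuple(output_query(s + e)[len(s):])
    for s in S:
        fill(s)
        for a in alphabet:
            fill(s + (a,))
    while True:
        s_rows = {row(T, E, s) for s in S}
        unmatched = next((s + (a,) for s in S for a in alphabet
                          if row(T, E, s + (a,)) not in s_rows), None)
        if unmatched is None:
            return S
        S.append(unmatched)                 # t moves from S.Sigma into S
        for a in alphabet:
            fill(unmatched + (a,))          # new rows of S.Sigma need values

alphabet = ("login", "data")
E = [(a,) for a in alphabet]
T = {}
S = make_closed([()], E, T, alphabet, toy_entity)
```

For the toy entity, the row of "login" differs from the row of the empty string, so it is promoted to S; every other row then matches one of the two rows in S and the loop stops.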
The workflow of the invalid query sequence filtering stage is as follows: by learning from the session sample set of input and output messages of the protocol entity program, the ordering constraints between message types are extracted and corresponding filtering rules are formulated. If an input message sequence serving as a query is judged invalid, it is filtered out directly and not sent to the protocol entity program for processing; if it satisfies the ordering constraints between messages, it proceeds to the next step.
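One possible form of such ordering constraints is a precedence rule "type a must occur before type b". The sketch below learns these rules from abstract input sequences and uses them to reject queries. The text does not specify the rule-extraction procedure, so the approach and the names `extract_precedence` and `is_valid_query` are assumptions made for illustration.

```python
def extract_precedence(sessions):
    """Return pairs (a, b): in every sample containing b, a occurs before b's first occurrence."""
    symbols = {x for session in sessions for x in session}
    rules = set()
    for b in symbols:
        with_b = [s for s in sessions if b in s]
        for a in symbols - {b}:
            if all(a in s[:s.index(b)] for s in with_b):
                rules.add((a, b))
    return rules

def is_valid_query(query, rules):
    """Reject a query in which some symbol appears before a required predecessor."""
    seen = set()
    for symbol in query:
        if any(b == symbol and a not in seen for a, b in rules):
            return False
        seen.add(symbol)
    return True

# Hypothetical abstract input sequences taken from session samples.
sessions = [
    ["hello", "auth", "data", "quit"],
    ["hello", "auth", "quit"],
    ["hello", "quit"],
]
rules = extract_precedence(sessions)
```

Rules learned from a small sample set may be too strict, so in practice the sample set would need to cover the protocol's normal behavior well before the rules are used for filtering.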
The workflow of the direct query response stage is as follows: the protocol session samples, in the form of abstract string sequences, are learned first. The information of the session sample set is recorded in a data structure that stores session samples accurately and is easy to query, such as the extended prefix tree transducer EPTT (Extended Prefix Tree Transducer). When an input message sequence serving as a query arrives, the method first tries to infer the response from the information in the EPTT or a similar structure, without sending the query to the protocol entity; if the response cannot be determined from the message sample set, the query is sent to the protocol entity program.
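The lookup-then-fallback behavior can be sketched with a simplified prefix tree that stores, for each input symbol along a path, the output observed in the samples. This is not the EPTT as defined by the patent, whose details are not given here; the class `PrefixTreeTransducer`, the function `output_query`, and the callback `send_to_entity` are hypothetical names, and the sketch assumes the protocol is deterministic.

```python
class PrefixTreeTransducer:
    """Simplified prefix tree keyed by input symbols, storing observed outputs."""

    def __init__(self):
        self.root = {}

    def add_session(self, session):
        """Record one session given as a list of (input, output) symbol pairs."""
        node = self.root
        for inp, out in session:
            child = node.setdefault(inp, {"out": out, "next": {}})
            node = child["next"]

    def answer(self, query):
        """Return the recorded outputs for a query, or None if the path is unknown."""
        node, outputs = self.root, []
        for inp in query:
            if inp not in node:
                return None
            outputs.append(node[inp]["out"])
            node = node[inp]["next"]
        return tuple(outputs)

def output_query(query, tree, send_to_entity):
    """Answer from the sample tree when possible, otherwise ask the entity."""
    cached = tree.answer(query)
    return cached if cached is not None else send_to_entity(query)

tree = PrefixTreeTransducer()
tree.add_session([("login", "ok"), ("data", "ack")])
tree.add_session([("login", "ok"), ("quit", "bye")])

sent = []                                   # queries that reached the entity
def send_to_entity(query):                  # hypothetical network round trip
    sent.append(query)
    return ("err",) * len(query)

hit = output_query(("login", "data"), tree, send_to_entity)
miss = output_query(("data",), tree, send_to_entity)
```

Only the second query reaches the entity; the first is answered entirely from the recorded samples.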
The workflow of the candidate state machine construction stage is as follows: if the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is represented as a Mealy machine given by the six-tuple (Q_M, I, O, δ_M, λ_M, q0_M), where the input set I and the output set O are filled with the information collected in the message format extraction stage. The remaining components are filled in from the observation table: the finite state set is Q_M = {[s] | s ∈ S}; the initial state is q0_M = [ε]; the state transition function is δ_M([s], i) = [s·i] for any s ∈ S and i ∈ Σ, giving the target state reached after input i in state [s], namely the state in S equivalent to s·i; and for any input i the output function is λ_M([s], i) = T(s, i), the value of the table entry in row s and column i.
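The construction can be written out directly from these definitions. In the sketch below, states are numbered by their distinct rows, the table is assumed to be closed, and E is assumed to contain every single-symbol suffix (which holds when E is initialized to Σ). The function name `build_mealy` and the hand-written example table are assumptions for illustration.

```python
def build_mealy(S, E, T, alphabet):
    """Build (states, initial, delta, lambda) from a closed observation table."""
    def row(s):
        return tuple(T[(s, e)] for e in E)

    state_of = {}                            # row contents -> state id, i.e. [s]
    for s in S:
        state_of.setdefault(row(s), len(state_of))

    delta, lam = {}, {}
    for s in S:
        q = state_of[row(s)]
        for i in alphabet:
            delta[(q, i)] = state_of[row(s + (i,))]   # delta([s], i) = [s.i]
            lam[(q, i)] = T[(s, (i,))][-1]            # lambda([s], i) = T(s, i)
    return {"states": set(state_of.values()), "initial": state_of[row(())],
            "delta": delta, "lambda": lam}

# Hypothetical closed table for a two-state login protocol.
alphabet = ("login", "data")
S = [(), ("login",)]
E = [("login",), ("data",)]
T = {
    ((), ("login",)): ("ok",),                  ((), ("data",)): ("err",),
    (("login",), ("login",)): ("err",),         (("login",), ("data",)): ("ack",),
    (("data",), ("login",)): ("ok",),           (("data",), ("data",)): ("err",),
    (("login", "login"), ("login",)): ("err",), (("login", "login"), ("data",)): ("ack",),
    (("login", "data"), ("login",)): ("err",),  (("login", "data"), ("data",)): ("ack",),
}
machine = build_mealy(S, E, T, alphabet)
```

The resulting machine has two states: state 0 for the empty prefix and state 1 for the prefix "login".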
The workflow of the equivalence judgment stage is as follows: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated. After the test sequences are instantiated as input messages for the protocol entity, the output responses of the protocol entity program are obtained and compared with the candidate protocol state machine's outputs for the same test sequences. A test sequence for which the two outputs differ is a counterexample. Most test sequences generated completely at random are invalid. The method therefore starts from normal message sequences and mutates the strings by operations such as insertion, replacement, and deletion, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences. If no counterexample is found after a sufficient number of test sequences have been sent, the candidate protocol state machine is considered equivalent to the real protocol state machine within the tested range. If a counterexample is found, the candidate protocol state machine does not satisfy the equivalence condition, and the counterexample information must be added to the observation table for further state machine inference.
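A sketch of mutation-based counterexample search follows. For reproducibility it enumerates every single insertion, deletion, and replacement of each normal sample instead of sampling mutations at random; that choice, the two toy machines `REAL` and `GUESS`, and the function names are assumptions made for illustration.

```python
def mutants(sequence, alphabet):
    """Yield every sequence one insertion, deletion, or replacement away from a sample."""
    seq = tuple(sequence)
    for k in range(len(seq) + 1):
        for a in alphabet:
            yield seq[:k] + (a,) + seq[k:]              # insertion
    for k in range(len(seq)):
        yield seq[:k] + seq[k + 1:]                     # deletion
        for a in alphabet:
            if a != seq[k]:
                yield seq[:k] + (a,) + seq[k + 1:]      # replacement

def find_counterexample(samples, alphabet, candidate, entity):
    """Return the first mutated test whose outputs differ, or None."""
    for sample in samples:
        for test in mutants(sample, alphabet):
            if candidate(test) != entity(test):
                return test
    return None

def simulate(transitions, word, state="init"):
    """Run a Mealy machine given as {(state, input): (next_state, output)}."""
    outputs = []
    for symbol in word:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return tuple(outputs)

# Hypothetical real entity: a repeated 'login' resets the session.
REAL = {("init", "login"): ("auth", "ok"), ("init", "data"): ("init", "err"),
        ("auth", "login"): ("init", "err"), ("auth", "data"): ("auth", "ack")}
# Hypothetical candidate that misses the reset.
GUESS = dict(REAL)
GUESS[("auth", "login")] = ("auth", "err")

samples = [("login", "data")]
alphabet = ("login", "data")
counterexample = find_counterexample(samples, alphabet,
                                     lambda w: simulate(GUESS, w),
                                     lambda w: simulate(REAL, w))
no_counterexample = find_counterexample(samples, alphabet,
                                        lambda w: simulate(REAL, w),
                                        lambda w: simulate(REAL, w))
```

The candidate agrees with the entity on the normal sample itself, and the discrepancy shows up on a sequence one insertion away from it.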
The workflow of extending the observation table according to a counterexample is as follows: after a string serving as a counterexample is found, the minimal distinguishing suffix of the counterexample is determined. Let the counterexample be c = u·v, where u ∈ S ∪ S·I and u is the longest prefix of c whose output is the same as the output of the protocol entity program; correspondingly, v is the shortest suffix of the counterexample that distinguishes the candidate protocol state machine from the real state machine of the protocol entity. Once v is determined, v and all suffixes of v are added to the set E of the observation table, which ensures that the closure property of the observation table is satisfied and avoids unnecessary extension of the table. The observation table is filled in on this basis and the preceding inference steps are repeated until a state machine equivalent to the real protocol state machine is output.
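The split of a counterexample into u and v can be sketched as follows. This sketch takes one reading of the paragraph above: u is chosen as the longest prefix of c that labels a row of the table (an element of S or S·Σ), and v is the remainder. The names `split_counterexample` and `extend_suffixes` and the example values are assumptions.

```python
def split_counterexample(c, S, alphabet):
    """Split c into (u, v) where u is the longest prefix labelling a table row."""
    rows = set(S) | {s + (a,) for s in S for a in alphabet}
    for k in range(len(c), -1, -1):
        if c[:k] in rows:
            return c[:k], c[k:]
    raise ValueError("S must contain the empty word")

def extend_suffixes(E, v):
    """Add v and all of its non-empty suffixes to E, skipping duplicates."""
    for k in range(len(v)):
        suffix = v[k:]
        if suffix not in E:
            E.append(suffix)
    return E

# Hypothetical table state and counterexample.
alphabet = ("login", "data")
S = [(), ("login",)]
E = [("login",), ("data",)]
c = ("login", "login", "data", "data")
u, v = split_counterexample(c, S, alphabet)
E = extend_suffixes(E, v)
```

Only the suffixes of v are added, so E grows by at most len(v) columns per counterexample; the new columns must then be filled by output queries before the closedness check is repeated.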
As the above technical solution shows, the beneficial effects of the present invention are as follows: knowledge about the protocol communication is obtained from the input and output message sequences of the protocol entity program and is fully used during active protocol state machine inference. This reduces the sending of invalid query messages that do not conform to the protocol specification and, using the information in the message sample set, answers some input message sequences serving as queries directly, reducing interaction with the protocol entity; the efficiency of active protocol state machine inference can thus be improved significantly. In addition, constructing test sequences from normal message sequences improves their validity, avoids wasting resources on invalid test cases, and improves the overall accuracy of the inference results.
Brief description of the drawings
Fig. 1 is a flow diagram of the overall implementation of the present invention.
Fig. 2 is an example of an observation table in the present invention.
Fig. 3 is an example of an EPTT constructed in the present invention for answering queries directly; Fig. 3(a) is a session sample set after symbol abstraction, and Fig. 3(b) is the corresponding EPTT constructed from the session sample set S of Fig. 3(a).
Fig. 4 is the candidate protocol state machine constructed in the present invention from the observation table shown in Fig. 2.
Embodiment
For a better understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
As shown in Fig. 1, according to a preferred embodiment of the present invention, the active protocol state machine inference method based on protocol knowledge comprises the following steps:
(1) Message format extraction: a message format extraction method is used to obtain the protocol formats of the input and output messages; messages with the same format are grouped into one class, the class of each message is represented by an abstract symbol, and the session sequences formed by input and output messages are abstracted into string sequences of abstract symbols.
(2) Observation table initialization: the observation table is a triple (S, E, T). Its rows are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ denotes the set of abstract input symbols, and the symbol "·" denotes string concatenation; its columns are indexed by E. The function T maps S ∪ S·Σ and E to output strings, which are the values of the corresponding table entries. At initialization, S = {ε} and E = Σ, where ε denotes the empty string; for each table entry the corresponding input message sequence is generated, and the entry is assigned a value according to the corresponding output messages of the protocol entity.
(3) Observation table closedness check: determine whether the initialized observation table satisfies the closedness requirement. If the closedness condition is not satisfied, the observation table must be extended and the newly added entries assigned values. If it is satisfied, the candidate protocol state machine corresponding to the observation table is constructed.
(4) Invalid query sequence filtering: the ordering constraints between message types are extracted from the session sample set and corresponding filtering rules are formulated (for example, a message of type 4 must appear before a message of type 7). If an input message sequence serving as a query is judged invalid, it is filtered out directly; if it satisfies the ordering constraints between messages, it is processed further.
(5) Direct query response: by learning from the session sample set, if the response to an input message sequence serving as a query can be inferred directly from the session sample set, the query is answered directly without being sent to the protocol entity; if the response cannot be determined from the message sample set, the query is sent to the protocol entity program for interaction.
(6) Candidate state machine construction: once the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is represented as a Mealy machine whose input set and output set are filled with the information collected in the message format extraction stage; the remaining information is filled in from the observation table.
(7) Equivalence judgment: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated, and the protocol entity's output for each test sequence compared with the candidate protocol state machine's output for the same sequence. Test sequences are constructed from normal protocol symbol sequences, which avoids the large number of invalid test samples that random generation produces. A test sequence for which the outputs differ is a counterexample, and state machine inference must continue on the basis of it.
(8) Extending the observation table according to the counterexample: once a string serving as a counterexample is found, the E set of the observation table is extended with suffixes of the counterexample and the observation table is filled in on that basis, so that different protocol states are distinguished more fully. Steps (3)-(7) are repeated until a state machine equivalent to the real protocol state machine is output.
Referring to the overall flow shown in Fig. 1, the protocol state machine inference method of this embodiment comprises eight parts: message format extraction, observation table initialization, observation table closedness check, invalid query sequence filtering, direct query response, candidate state machine construction, equivalence judgment, and extension of the observation table according to counterexamples. Each part is described below.
(1) Message format extraction
The embodiment of the present invention first collects a large number of input and output message sequences produced by the network communication of the protocol entity program, and obtains the format information of the input and output messages using the message format extraction method of PI (Protocol Informatics Project). Input messages and output messages are classified separately according to message format, with messages of the same structure placed in the same class. Each class is identified by a unique Arabic numeral (such as 1, 2, 3).
On the basis of this classification, network communication behavior is abstracted session by session. A session represents a complete data exchange between communication participants and reflects how the protocol state migrates during communication. To facilitate state machine inference, the present invention converts each session sequence of input and output messages into a string sequence of abstract symbols.
(2) Observation table initialization
The observation table is a triple (S, E, T). Its rows are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract input symbols, and the symbol "·" denotes concatenation; its columns are indexed by E. The function T maps (S ∪ S·Σ) × E to output strings, which are the values of the corresponding table entries. The strings in S represent states the protocol may have, and the strings in E serve to distinguish different protocol states. Fig. 2 is an example of an observation table.
When the observation table is initialized, let S = {ε} and E = Σ, where ε denotes the empty string. Once the rows and columns are determined, a string is constructed for each table entry: its prefix is the row label of the entry and its suffix is the column label. The string is instantiated according to the message formats to generate an input message sequence for the protocol entity program, and the output messages of the protocol entity program for that input sequence are obtained. The output messages are abstracted into an output string according to the format information, and the value of the table entry is set to the suffix of the output string whose length equals that of the entry's column label.
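The initialization just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: symbol strings are modelled as tuples, `output_query` is a hypothetical callable standing in for "instantiate the string as messages, send them to the protocol entity program, and abstract the replies", and `toy_entity` is a made-up two-state entity, not one taken from the patent.

```python
EPSILON = ()  # the empty string ε, modelled as an empty tuple of symbols


def init_observation_table(sigma, output_query):
    """Return (S, E, T) for the initial table with S = {ε} and E = Σ.

    sigma        -- list of abstract input symbols (Σ)
    output_query -- hypothetical callable: takes a tuple of input symbols,
                    returns the tuple of output symbols the entity produced
    """
    S = [EPSILON]
    E = [(a,) for a in sigma]
    rows = S + [s + (a,) for s in S for a in sigma]  # S ∪ S·Σ
    T = {}
    for row in rows:
        for col in E:
            outputs = output_query(row + col)
            # The entry is the output suffix as long as the column label.
            T[(row, col)] = outputs[-len(col):]
    return S, E, T


def toy_entity(inputs):
    """Made-up two-state entity standing in for the real protocol program."""
    step = {(0, 1): (10, 1), (0, 2): (10, 0), (1, 1): (11, 1), (1, 2): (10, 0)}
    state, outputs = 0, []
    for symbol in inputs:
        out, state = step[(state, symbol)]
        outputs.append(out)
    return tuple(outputs)


S, E, T = init_observation_table([1, 2], toy_entity)
```

Keeping only the output suffix means that each entry describes what the columns reveal about the state reached by the row prefix, rather than the outputs produced on the way there.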
(3) Observation table closedness check
A protocol state machine can be constructed only from an observation table that satisfies the closedness requirement, so the table must be checked against it: for every t ∈ S·Σ there must exist an s ∈ S such that s and t are equivalent, that is, their rows are identical. For example, in Fig. 2 every row in S·Σ has an equivalent row in S, so the table satisfies the closedness condition.
If the closedness condition is not satisfied, a row t ∈ S·Σ that violates it is found and moved to S, and the set t·Σ is added to the S·Σ part of the table. To fill the table entries created by this extension, query messages are generated and the corresponding output information is obtained, from which the entries are assigned values. If the closedness condition is satisfied, a candidate protocol state machine corresponding to the observation table is constructed.
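A minimal sketch of the closedness loop follows, assuming the same tuple representation of strings and a hypothetical `output_query` callable that returns the entity's output symbols for an input sequence. Missing entries are filled lazily; the function name and structure are illustrative, not taken from the patent.

```python
def close_table(sigma, E, output_query):
    """Extend S until every row of S·Σ equals some row of S; return (S, T)."""
    S, T = [()], {}

    def row(prefix):
        # Fill missing entries by querying, then return the row as a tuple.
        for e in E:
            if (prefix, e) not in T:
                T[(prefix, e)] = output_query(prefix + e)[-len(e):]
        return tuple(T[(prefix, e)] for e in E)

    while True:
        s_rows = {row(s) for s in S}
        boundary = [s + (a,) for s in S for a in sigma if s + (a,) not in S]
        unmatched = next((t for t in boundary if row(t) not in s_rows), None)
        if unmatched is None:
            return S, T  # closed: every boundary row has an equal row in S
        S.append(unmatched)  # move t into S; t·Σ joins the boundary next pass
```

Each pass moves at most one offending row into S, so the loop ends once S holds one representative per distinguishable row, which is bounded by the number of states of a finite-state entity.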
(4) Invalid query sequence filtering
By learning from the session sample set of input and output messages of the protocol entity program, filtering rules can be formulated to filter out message sequences that clearly violate the protocol. The embodiment of the present invention formulates the filtering rules mainly from the ordering relations between message types.
The message types of a protocol are often subject to certain ordering relations. For example, the SMTP protocol begins a session with HELO (or EHLO) and ends it with QUIT.
The present invention analyzes proprietary protocols, for which the ordering relations between message types are not known in advance, so the information must be obtained from the message session sample set. For ease of analysis, filtering rules are extracted on the basis of message format extraction. For an input message sequence represented by abstract symbols, S = (... 4 ... 7 ... 10 ...), symbol 4 precedes symbol 7 and symbol 7 precedes symbol 10. The embodiment records the order information between symbols in message sequences in the form of a matrix.
If there are n input message types, the ordering constraints over the message sample set can be represented by an n × n matrix. The element m_jk in row j and column k of the matrix is the number of session samples in which symbol j precedes symbol k. If m_jk ≥ 10 (10 being the threshold set in this embodiment) and m_kj = 0, symbol j is considered to necessarily precede symbol k, and any test sequence in which symbol k appears before symbol j is considered invalid.
The threshold is set to 10 because the order of some symbols is not strictly constrained by the protocol: an ordering relation between two symbol types must be sufficiently prevalent in the sample set before it is used as a condition for judging invalidity.
Based on the constructed matrix, the ordering constraints between message types are extracted and the corresponding filtering rules formulated, for example that messages of type 4 and type 7 must appear before messages of type 10. If an input message sequence used as a query is judged invalid, it is filtered out directly without being sent to the protocol entity program, and a message indicating that the query is invalid is returned. If the sequence satisfies the ordering constraints between messages, it proceeds to the next step.
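The matrix-based filter can be sketched as follows. The function names are illustrative, the matrix is stored sparsely as a dictionary keyed by (j, k), and a session is assumed to be a tuple of abstract input symbols; only the threshold of 10 and the condition m_kj = 0 come from the text.

```python
from collections import defaultdict
from itertools import combinations


def order_counts(sessions):
    """m[(j, k)] = number of sessions in which symbol j occurs before k."""
    m = defaultdict(int)
    for session in sessions:
        seen = set()
        for a, b in combinations(range(len(session)), 2):
            if session[a] != session[b]:
                seen.add((session[a], session[b]))
        for pair in seen:  # count each ordered pair once per session
            m[pair] += 1
    return m


def must_precede(m, threshold=10):
    """Pairs (j, k) with m_jk >= threshold and m_kj == 0."""
    return {(j, k) for (j, k), n in m.items()
            if n >= threshold and m.get((k, j), 0) == 0}


def is_valid(query, rules):
    """Reject a query if some k appears before a j that must precede it."""
    for a, b in combinations(range(len(query)), 2):
        if (query[b], query[a]) in rules:
            return False
    return True
```

With ten samples of the session (4, 7, 10), the rules become {(4, 7), (4, 10), (7, 10)}, so the query (7, 4) is rejected while (4, 7, 10) passes; with only nine samples no rule is derived and nothing is filtered.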
(5) Direct query response
Many of the queries arising during state machine inference can be answered directly from information learned from the sample set, without interacting with the protocol entity program. To this end, the protocol session samples, in the form of abstract string sequences, are learned first, and the information of the session sample set is recorded in a data structure that records the session samples accurately and is simple to query.
The embodiment of the present invention records the session sample set in an extended prefix tree transducer, EPTT (Extended Prefix Tree Transducer). An EPTT is essentially a multiway tree and can be regarded as an initial state machine in which each branch represents the abstract symbol sequence of one session sample.
Fig. 3(b) shows an example of the EPTT constructed for the symbol-abstracted session sample set S of Fig. 3(a). In Fig. 3(b) the initial state is the root node, identified by the Arabic numeral 0. The state machine receives input starting from the initial state. If the state reached by an input symbol sequence does not yet exist in the state machine, a new state is created, uniquely identified by an Arabic numeral, and the corresponding output information is attached to the state transition; if the state already exists, the machine moves to it and receives the next input. This process is repeated until the EPTT has received all input symbol sequences in the sample set S.
After the EPTT is constructed, queries are answered directly by a depth-first traversal. Suppose the output query is (19, 11), which asks for the output response to input "11" in state [19]. Taking Fig. 3(b) as an example, state [19] corresponds to state 4, reached from initial state 0 on the input symbol sequence "19"; traversing Fig. 3(b) shows that the output response to input "11" in state 4 is "12", so the query can be answered directly and the table updated with T(19, 11) = 12. If the output query is (14, 11), the EPTT contains no corresponding information, the direct response fails, and further interaction with the protocol entity is needed.
In the direct query response stage, when an input message sequence arises as a query, an attempt is first made to answer it from the EPTT without sending the query to the protocol entity. If the response cannot be determined from the message sample information recorded in the EPTT, the query is sent to the protocol entity program and an accurate output is obtained through interaction. In addition, queries sent to the protocol entity and their outputs can be recorded to update the EPTT, so that similar queries can be answered directly later, further reducing the number of interactions with the protocol entity.
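A minimal sketch of an EPTT is given below, assuming each session sample is a list of (input symbol, output symbol) pairs. The class and method names are illustrative. Because the structure is a tree, this sketch answers a query by following the single path spelled by the query rather than running a general depth-first search.

```python
class EPTT:
    """Prefix tree whose edges carry an input symbol and its output."""

    def __init__(self):
        # state number -> {input symbol: (output symbol, next state)}
        self.edges = {0: {}}  # state 0 is the root (initial state)

    def add_session(self, pairs):
        """Insert one session given as [(input, output), ...]."""
        state = 0
        for inp, out in pairs:
            if inp not in self.edges[state]:
                new_state = len(self.edges)  # next unused numeral
                self.edges[new_state] = {}
                self.edges[state][inp] = (out, new_state)
            state = self.edges[state][inp][1]

    def answer(self, prefix, suffix):
        """Outputs for `suffix` after `prefix`, or None if not recorded."""
        state = 0
        for symbol in prefix:
            edge = self.edges[state].get(symbol)
            if edge is None:
                return None
            state = edge[1]
        outputs = []
        for symbol in suffix:
            edge = self.edges[state].get(symbol)
            if edge is None:
                return None
            outputs.append(edge[0])
            state = edge[1]
        return tuple(outputs)
```

A `None` result corresponds to a failed direct response: the query is then sent to the protocol entity, and the observed input/output pairs can be passed to `add_session` so that the same query is answered from the tree next time.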
(6) Candidate state machine construction
If the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is expressed as a Mealy machine given by the six-tuple (Q_M, I, O, δ_M, λ_M, q0_M), where the input set I and the output set O are filled in from the information collected in the message format extraction stage. The remaining components are filled in from the observation table: the finite state set Q_M = {[s] | s ∈ S}; the initial state q0_M = [ε]; the state transition function δ_M([s], i) = [s·i] for any s ∈ S and i ∈ Σ, which gives the target state reached from state [s] on input i, namely the state in S equivalent to s·i; and, for any input i, the output function λ_M([s], i) = T(s, i), the value of the table entry in row s and column i of the observation table.
Fig. 4 shows the candidate protocol state machine constructed from the observation table of Fig. 2. Q_M = {[ε], [1]}; the initial state is q0_M = [ε]; the state transition function δ_M is the one depicted in Fig. 4; and the output function is λ_M([ε], 1) = T(ε, 1) = 10, λ_M([ε], 2) = T(ε, 2) = 10, λ_M([1], 1) = T(1, 1) = 11, λ_M([1], 2) = T(1, 2) = 10.
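The construction of Q_M, q0_M, δ_M, and λ_M from a closed table can be sketched as below. The sketch assumes strings are tuples, that E contains every single-symbol column (true after initialization with E = Σ), and that ε is the first element of S; the function name is illustrative.

```python
def build_mealy(S, E, T, sigma):
    """Return (states, initial, delta, lam) from a closed observation table."""

    def row(prefix):
        return tuple(T[(prefix, e)] for e in E)

    # One representative string per distinct row; [s] is named by it.
    representative = {}
    for s in S:
        representative.setdefault(row(s), s)

    delta, lam = {}, {}
    for s in representative.values():
        for i in sigma:
            # delta([s], i) = [s·i]: the state in S whose row equals row(s·i)
            delta[(s, i)] = representative[row(s + (i,))]
            # lambda([s], i) = T(s, i)
            lam[(s, i)] = T[(s, (i,))]
    return set(representative.values()), (), delta, lam
```

Closedness is what makes the lookup `representative[row(s + (i,))]` safe: every row of S·Σ is guaranteed to equal the row of some element of S.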
(7) Equivalence judgment
To determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated. After the test sequences are instantiated as input messages for the protocol entity, the output responses of the protocol entity program to them are obtained and compared with the outputs of the candidate protocol state machine for the same test sequences.
Let r_i denote the number of test sequences that must be generated. The embodiment of the present invention sets r_i to the value given by Angluin in the paper "Learning regular sets from queries and counterexamples" (Angluin D. Learning regular sets from queries and counterexamples[J]. Information and Computation, 1987, 75(2): 87-106): r_i = (1/ε)(ln(1/δ) + (ln 2)(i + 1)). Here ε is the accuracy, meaning that the probability that a randomly generated symbol sequence is a counterexample is no greater than ε; δ is the confidence; and i is the number of candidate Mealy machines generated. Angluin uses this value in the paper to bound the number of membership queries; the application scenario of the present invention is similar, and the embodiment uses the value as the termination condition of the equivalence judgment based on output responses.
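As a quick numerical illustration, the bound can be computed directly. Rounding up to a whole number of tests is an assumption of this sketch, and the function name is illustrative.

```python
import math


def required_tests(epsilon, delta, i):
    """r_i = (1/ε)(ln(1/δ) + (ln 2)(i + 1)), rounded up to an integer."""
    return math.ceil((1 / epsilon) * (math.log(1 / delta)
                                      + math.log(2) * (i + 1)))
```

For ε = δ = 0.1 this gives 30 test sequences for i = 0 and 37 for i = 1: the budget grows linearly with the number of candidates already generated and inversely with ε.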
In the equivalence judgment process, a test sequence for which the two outputs differ is a counterexample. If test sequences were generated completely at random, most of them would be invalid. The embodiment of the present invention therefore starts from normal message sequences and mutates them by operations such as insertion, replacement, and deletion, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences.
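The mutation step can be sketched as follows. The choice of exactly one edit per test sequence, the uniform choice among the three operations, and the function names are assumptions of this sketch; the text only names insertion, replacement, and deletion.

```python
import random


def mutate(sequence, sigma, rng):
    """Apply one random insert, replace, or delete to a symbol sequence."""
    seq = list(sequence)
    op = rng.choice(("insert", "replace", "delete"))
    if op == "insert" or not seq:  # an empty sequence can only grow
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(sigma))
    elif op == "replace":
        seq[rng.randrange(len(seq))] = rng.choice(sigma)
    else:
        del seq[rng.randrange(len(seq))]
    return tuple(seq)


def generate_tests(normal_sequences, sigma, count, seed=0):
    """Derive `count` test sequences from recorded normal sequences."""
    rng = random.Random(seed)
    return [mutate(rng.choice(normal_sequences), sigma, rng)
            for _ in range(count)]
```

Each generated sequence differs from a recorded normal sequence by a single edit, so it stays close to valid protocol behavior while still probing transitions the samples did not cover.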
If no counterexample is found after a sufficient number of test sequences have been sent, the candidate protocol state machine is considered equivalent to the actual protocol state machine within the stated bounds. If a counterexample is found, the candidate protocol state machine does not satisfy the equivalence condition, and the counterexample information must be added to the observation table so that state machine inference can continue.
(8) Extending the observation table according to the counterexample
Once a string serving as a counterexample is found, the E set of the observation table must be extended with suffixes of the counterexample. The embodiment of the present invention first determines the minimal distinguishing suffix of the counterexample. Let the counterexample be c = uv, where u ∈ S ∪ S·I is the longest prefix of c for which the corresponding output agrees with the output of the protocol entity program. Accordingly, v is the shortest suffix of the counterexample that distinguishes the candidate protocol state machine from the real state machine of the protocol entity.
After v is determined, v and all of its suffixes are added to E. This ensures that the closedness of the observation table is satisfied and also avoids unnecessary extension of the table. For example, if v is 123, then 123, 23, and 3 should all be added to the E set of the observation table (an element already present in E need not be added again).
The observation table is filled in on this basis and the preceding steps are repeated until a state machine equivalent to the real protocol state machine is output.
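The counterexample handling can be sketched in two small functions. The names are illustrative, strings are tuples, `rows` stands for the set S ∪ S·I, and `hypothesis_output` and `entity_output` are hypothetical callables returning the candidate machine's and the protocol entity's outputs for an input sequence.

```python
def split_counterexample(c, rows, hypothesis_output, entity_output):
    """Split c into (u, v): u is the longest prefix of c that is a table
    row and on which candidate and entity agree; v is the rest of c."""
    u = ()
    for n in range(len(c) + 1):
        prefix = c[:n]
        if prefix in rows and hypothesis_output(prefix) == entity_output(prefix):
            u = prefix
    return u, c[len(u):]


def extend_columns(E, v):
    """Add v and all of its non-empty suffixes to E, skipping duplicates."""
    E = list(E)
    for k in range(len(v)):
        if v[k:] not in E:
            E.append(v[k:])
    return E
```

For v = (1, 2, 3) and E = [(1,), (2,), (3,)], `extend_columns` appends (1, 2, 3) and (2, 3) and leaves the existing column (3,) in place, matching the 123 / 23 / 3 example in the text.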
As the above technical solution shows, the protocol state machine active inference method based on protocol knowledge of the present invention builds on the active state machine inference algorithm LM+. Using the protocol session sample set, it extracts the ordering constraints between protocol messages to filter out invalid queries, and it answers directly the queries that already occur in the session samples. In addition, it searches for counterexamples to the candidate state machine more effectively through mutation of positive samples, improving the overall efficiency of active protocol state machine inference. Using the method requires access to the protocol entity program: it must be possible to run the program as needed, send it specific message sequences, and observe the corresponding output messages, which serve as the basis for protocol state machine inference.
In summary, the protocol state machine active inference method based on protocol knowledge of the present invention regards the protocol system as a state transition system with output and uses the input and output messages of the protocol program as the basis for state machine inference. It takes full account of the ordering constraints between message types, which avoids sending to the protocol entity program large numbers of invalid messages that violate basic characteristics of the protocol. Second, during state machine inference it uses the protocol session sample set to attempt direct responses to query messages, which reduces interaction with the protocol entity and improves inference efficiency. Finally, when judging whether the candidate state machine is equivalent to the real state machine, it constructs test message sequences from normal message sequences, exploiting the structural similarity between positive examples and counterexamples to improve the validity of the test sequences and thereby the overall accuracy of the inference result.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. A person of ordinary skill in the art to which the present invention belongs may make various modifications and variations without departing from its spirit and scope. The scope of protection of the present invention is therefore defined by the claims.
Claims (4)
1. A protocol state machine active inference method based on protocol knowledge, characterized by comprising the following steps:
(1) Message format extraction: the protocol formats of the input and output messages are obtained using a message format extraction method, messages having the same format are grouped into one class, the class of each message is represented by an abstract symbol, and each session sequence of input and output messages is abstracted into a string sequence of abstract symbols;
(2) Observation table initialization: the observation table is a triple (S, E, T); the rows of the observation table are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract input symbols, and the symbol "·" denotes concatenation; the columns of the observation table are indexed by E; the function T maps (S ∪ S·Σ) × E to output strings, which are the values of the corresponding table entries; when the observation table is initialized, let S = {ε} and E = Σ, where ε denotes the empty string, generate the corresponding input message sequence for each entry of the observation table, and assign each entry a value according to the corresponding output message information of the protocol entity;
(3) Observation table closedness check: determine whether the initialized observation table satisfies the closedness requirement; if the closedness condition is not satisfied, find a row t ∈ S·Σ that violates it, move t to S, and extend the S·Σ set of the observation table accordingly; to fill the newly created table entries, generate query messages and obtain the corresponding output information, from which the entries are assigned values;
if the closedness condition is satisfied, proceed to step (6) and construct the candidate protocol state machine corresponding to the observation table;
(4) Invalid query sequence filtering: from the session sample set, extract the ordering constraints between message types and formulate corresponding filtering rules; if an input message sequence used as a query is judged invalid, filter it out directly; if it satisfies the ordering constraints between messages, proceed to the next step;
(5) Direct query response: by learning from the session sample set, if the response to an input message sequence used as a query can be inferred directly from the session sample set, give the response directly without sending the query to the protocol entity; if the response message cannot be determined from the message sample set, send the query to the protocol entity program for interaction;
(6) Candidate state machine construction: once the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it; the protocol state machine is expressed as a Mealy machine, whose input set and output set are filled in from the information collected in the message format extraction stage, the remaining information being filled in from the observation table;
(7) Equivalence judgment: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, generate a sufficient number of test sequences and compare the protocol entity's output for each test sequence with the candidate protocol state machine's output for the same sequence; the test sequences are constructed from normal protocol symbol sequences, which avoids the large number of invalid test samples that random generation produces; a test sequence for which the outputs differ is a counterexample, and state machine inference must continue on the basis of it;
(8) Extending the observation table according to the counterexample: once a string serving as a counterexample is found, extend the E set of the observation table with suffixes of the counterexample and fill in the observation table on that basis, so that different protocol states are distinguished more fully; repeat steps (3)-(7) until a state machine equivalent to the real protocol state machine is output.
2. The protocol state machine active inference method based on protocol knowledge according to claim 1, characterized in that the workflow of the invalid query sequence filtering of step (4) is as follows: by learning from the session sample set of input and output messages of the protocol entity program, the ordering constraints between message types are extracted and corresponding filtering rules formulated; if an input message sequence used as a query is judged invalid according to the filtering rules, it is filtered out directly without being sent to the protocol entity program for processing; if the input message sequence used as a query satisfies the ordering constraints between messages, it proceeds to the next step.
3. The protocol state machine active inference method based on protocol knowledge according to claim 1, characterized in that the workflow of the direct query response of step (5) is as follows: the protocol session samples, consisting of abstract string sequences, are learned first, and the information of the session sample set is recorded in an extended prefix tree transducer EPTT (Extended Prefix Tree Transducer) data structure, the EPTT being a multiway tree that can be regarded as an initial state machine in which each branch represents the abstract symbol sequence of one session sample; when an input message sequence arises as a query, an attempt is first made to answer the query from the information in the EPTT without sending the query to the protocol entity; if the response message cannot be determined from the message sample set, the query is then sent to the protocol entity program.
4. The protocol state machine active inference method based on protocol knowledge according to claim 1, characterized in that the workflow of the equivalence judgment of step (7) is as follows: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated; after the test sequences are instantiated as input messages for the protocol entity, the output responses of the protocol entity program to them are obtained and compared with the outputs of the candidate protocol state machine for the same test sequences, a test sequence for which the two outputs differ being a counterexample;
starting from normal message sequences, the sequences are mutated by insertion, replacement, and deletion operations, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences;
if no counterexample is found after a sufficient number of test sequences have been sent, the candidate protocol state machine is considered equivalent to the actual protocol state machine within the stated bounds; if a counterexample is found, the candidate protocol state machine does not satisfy the equivalence condition, and the counterexample information must be added to the observation table so that state machine inference can continue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510134335.7A CN104767744B (en) | 2015-03-25 | 2015-03-25 | Protocol state machine active estimating method based on protocol knowledge |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510134335.7A CN104767744B (en) | 2015-03-25 | 2015-03-25 | Protocol state machine active estimating method based on protocol knowledge |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104767744A CN104767744A (en) | 2015-07-08 |
CN104767744B true CN104767744B (en) | 2018-05-15 |
Family
ID=53649351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510134335.7A Active CN104767744B (en) | 2015-03-25 | 2015-03-25 | Protocol state machine active estimating method based on protocol knowledge |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104767744B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446146B (en) * | 2018-11-09 | 2022-02-08 | 中国科学院长春光学精密机械与物理研究所 | State transition sequence generation method of application layer communication protocol |
CN110213130A (en) * | 2019-06-03 | 2019-09-06 | 南京莱克贝尔信息技术有限公司 | A kind of industry control protocol format analysis method based on iteration optimization |
CN112733155B (en) * | 2021-01-28 | 2024-04-16 | 中国人民解放军国防科技大学 | Software forced safety protection method based on external environment model learning |
CN113609344B (en) * | 2021-09-29 | 2022-01-14 | 北京泰迪熊移动科技有限公司 | Method and device for constructing byte stream state machine, electronic equipment and storage medium |
CN114172972B (en) * | 2021-11-11 | 2023-08-15 | 中国工程物理研究院计算机应用研究所 | Unknown protocol behavior reverse inference method based on optimized random converter model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102123413A (en) * | 2011-03-29 | 2011-07-13 | 杭州电子科技大学 | Network monitoring and protocol analysis system of wireless sensor network |
CN103200203A (en) * | 2013-04-24 | 2013-07-10 | 中国人民解放军理工大学 | Semantic-level protocol format inference method based on execution trace |
CN103441990A (en) * | 2013-08-09 | 2013-12-11 | 中国人民解放军理工大学 | Protocol state machine automatic inference method based on state fusion |
WO2014169227A1 (en) * | 2013-04-12 | 2014-10-16 | Northeastern University | Ontology-based waveform reconfiguration |
CN104142888A (en) * | 2014-07-14 | 2014-11-12 | 北京理工大学 | Regularization state machine model design method with stateful protocol |
-
2015
- 2015-03-25 CN CN201510134335.7A patent/CN104767744B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102123413A (en) * | 2011-03-29 | 2011-07-13 | 杭州电子科技大学 | Network monitoring and protocol analysis system of wireless sensor network |
WO2014169227A1 (en) * | 2013-04-12 | 2014-10-16 | Northeastern University | Ontology-based waveform reconfiguration |
CN103200203A (en) * | 2013-04-24 | 2013-07-10 | 中国人民解放军理工大学 | Semantic-level protocol format inference method based on execution trace |
CN103441990A (en) * | 2013-08-09 | 2013-12-11 | 中国人民解放军理工大学 | Protocol state machine automatic inference method based on state fusion |
CN104142888A (en) * | 2014-07-14 | 2014-11-12 | 北京理工大学 | Regularization state machine model design method with stateful protocol |
Non-Patent Citations (2)
Title |
---|
《Integration testing of components guided by incremental state machine learning》;Keqin Li et al.;《IEEE》;20061031;全文 * |
《网络协议状态机逆向工程方法的研究》;刘威;《中国优秀硕士学位论文全文数据库信息科技辑》;20140615(第6期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN104767744A (en) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104767744B (en) | Protocol state machine active estimating method based on protocol knowledge | |
CN105871882B (en) | Network security risk analysis method based on network node fragility and attack information | |
Siganos et al. | Analyzing BGP policies: Methodology and tool | |
CN109150640A (en) | A kind of method for discovering network topology and system based on double layer network agreement | |
CN107665191A (en) | A kind of proprietary protocol message format estimating method based on expanded prefix tree | |
CN108123931A (en) | Ddos attack defence installation and method in a kind of software defined network | |
CN105282123A (en) | Network protocol identification method and device | |
CN109508420A (en) | A kind of cleaning method and device of knowledge mapping attribute | |
CN101426000A (en) | General protocol parsing method and system | |
CN109697455B (en) | Fault diagnosis method and device for distribution network switch equipment | |
CN107204975A (en) | A kind of industrial control system network attack detection technology based on scene fingerprint | |
CN103457909B (en) | A kind of Botnet detection method and device | |
CN108011894A (en) | Botnet detecting system and method under a kind of software defined network | |
CN103441990B (en) | The automatic estimating method of protocol state machine based on state fusion | |
CN110336789A (en) | Domain-flux Botnet detection method based on blended learning | |
CN106452955A (en) | Abnormal network connection detection method and system | |
CN101883023A (en) | Firewall pressure testing method | |
CN108229578A (en) | Image data target identification method based on three layers of data, information and knowledge collection of illustrative plates framework | |
Maier | Identification of timed behavior models for diagnosis in production systems. | |
CN112134720A (en) | Network topology discovery method | |
CN112291226B (en) | Method and device for detecting abnormity of network flow | |
CN108121796A (en) | Electric energy metering device failure analysis methods and device based on confidence level | |
CN105553787B (en) | Edge net egress network Traffic anomaly detection method based on Hadoop | |
Hillston et al. | Formal techniques for performance analysis: blending SAN and PEPA | |
CN109861846A (en) | Using call relation acquisition methods, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |