CN104767744B - Protocol state machine active estimating method based on protocol knowledge - Google Patents



Publication number
CN104767744B
Authority
CN
China
Prior art keywords
protocol
state machine
message
sequence
inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510134335.7A
Other languages
Chinese (zh)
Other versions
CN104767744A (en)
Inventor
洪征
吴礼发
赖海光
李华波
王辰
郑成辉
黄康宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA University of Science and Technology
Priority to CN201510134335.7A
Publication of CN104767744A
Application granted
Publication of CN104767744B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0254 Stateful filtering
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L63/205 Network architectures or network communication protocols for network security for managing network security; network security policies in general involving negotiation or determination of the one or more network security mechanisms to be used, e.g. by negotiation between the client and the server or between peers or by selection according to the capabilities of the entities involved

Abstract

The present invention provides an active inference method for protocol state machines based on protocol knowledge, comprising the following steps: message format extraction, observation table initialization, observation table closedness check, invalid query sequence filtering, direct query response, candidate state machine construction, equivalence judgment, and observation table extension according to the counterexample. To address the low efficiency of active protocol state machine inference, the invention extracts the ordering constraints between protocol messages from a set of protocol session samples and uses them to filter invalid queries. Queries of kinds that have already occurred in the session samples are answered directly from the session sample set. In addition, counterexamples to the candidate state machine are searched for effectively by mutating positive samples. Together these measures improve the overall efficiency of active protocol state machine inference.

Description

Protocol state machine active estimating method based on protocol knowledge
Technical field
The present invention relates to the field of network technology, and in particular to a method for automatically inferring the protocol state machine of a network protocol. Starting from the network messages sent and received by a protocol entity program and from an analysis of protocol-specific knowledge, the method interacts with the protocol entity to continuously supplement the protocol information it needs.
Background art
To exchange data in an orderly manner in a computer network, both communicating parties must comply with a network protocol. Network protocols are a key element in implementing network communication and a primary object of study in fields such as network services and network security. Many network security techniques, such as intrusion detection, fuzz testing, protocol reuse, and protocol vulnerability analysis, rely on detailed protocol specification information.
A network protocol specification mainly comprises two parts: the protocol format and the protocol state machine. The protocol format concerns the composition and structure of the protocol fields in a message. The protocol state machine concerns the number of protocol states in the protocol system and the rules by which the system transitions from one protocol state to another on receiving different inputs.
Network protocols include public protocols and proprietary protocols. The content and details of public protocols are described in published standards documents, as with communication protocols such as HTTP and SMTP. Proprietary protocols have no public documentation and are often used by specific network applications. Examples include the communication protocol of the QQ instant messaging software, the TNS network protocol used by Oracle databases, and the communication protocols used by some malware.
The wide use of proprietary protocols in networks greatly restricts the applicability of network security techniques that depend on specification information. To address the problem of unknown protocol information, researchers use protocol reverse engineering to obtain unknown protocol specifications. Protocol reverse engineering is the process of extracting the specification of a network protocol, without relying on a protocol description, by monitoring and analyzing the network input and output, system behavior, and instruction execution flow of a protocol entity.
Traditional protocol reverse engineering is manual: the process is tedious and time-consuming, and its accuracy depends on the skill level and practical experience of the analyst. As networks grow and protocol types multiply, the demands on the accuracy and timeliness of reverse analysis keep rising, and manual protocol reverse analysis can no longer meet practical needs. Automatic protocol reverse engineering can greatly reduce manual analysis and improve the efficiency of analyzing proprietary protocols, and it is receiving increasing attention.
At present, most research on automatic protocol reverse engineering concentrates on extracting the protocol format. The analysis results lack protocol state machine information, which limits their practical application. In recent years, as protocol format extraction techniques have matured, some researchers have begun to attempt reverse analysis of protocol state machines, also called protocol state machine inference.
Depending on whether interaction with the protocol entity is needed during inference, protocol state machine inference methods are divided into passive inference and active inference. Passive inference works from a given set of message samples and requires no interaction with the protocol entity during inference. Active inference starts from a known sample set, continually expands it using queries and response feedback, and derives the protocol state machine information on that basis.
In the field of active protocol state machine inference, current methods are mainly based on the L* algorithm proposed by Angluin et al. L* maintains a data structure called the observation table (Observation Table), defined as a triple (S, E, T), where S and E are, respectively, a prefix-closed set and a suffix-closed set of finite-length strings over the input alphabet Σ, and T(s, e) is a function defined for s ∈ S ∪ S·Σ and e ∈ E. The symbol "·" denotes string concatenation. T(s, e) = 1 means that the state machine accepts the string s·e, and T(s, e) = 0 means that the state machine rejects the string s·e. The observation table is usually represented as a two-dimensional table whose rows are the elements of S ∪ S·Σ, whose columns are the elements of E, and whose entries are the values of T(s, e).
The observation table is required to satisfy two properties, closedness and consistency. For any s, t ∈ S ∪ S·Σ, s is said to be equivalent to t, written s ≅ t, if and only if T(s, e) = T(t, e) for all e ∈ E; the set of all rows equivalent to s is written [s]. If for every t ∈ S·Σ there exists an s ∈ S such that s ≅ t, the observation table is said to be closed. If for any s, t ∈ S with s ≅ t it holds that s·i ≅ t·i for all i ∈ Σ, the observation table is said to be consistent.
The L* algorithm assumes the existence of an oracle (Oracle) that can accurately answer membership queries (membership query) and equivalence queries (equivalence query). L* first uses membership queries to build a closed and consistent observation table and then generates the corresponding candidate state machine M. It then uses an equivalence query to determine whether M is consistent with the real state machine. If so, inference terminates; otherwise inference continues using the counterexample (counterexample) supplied by the oracle.
The inference target of the traditional L* algorithm is a deterministic finite state machine without output. Such a machine considers only message input, not message output, and ignores the inherent relationship between the input and output messages of a protocol system. A protocol system is a state transition system with output, so when a deterministic finite state machine without output is used as the inference target, the resulting state machine differs considerably from the actual protocol system.
Building on L*, Li et al. first proposed the LM+ algorithm for inferring Mealy machines in the paper "Integration testing of components guided by incremental state machine learning" (Keqin Li, Roland Groz and Muzammil Shahbaz. Integration testing of components guided by incremental state machine learning. In Testing: Academic and Industrial Conference - Practice and Research Techniques, 59-70, IEEE Computer Society, 2006). The algorithm replaces membership queries with output queries (output query), which obtain the protocol state machine's output for an input symbol sequence. LM+ requires that the set S of the observation table contain no two equivalent elements, so the consistency of the observation table need not be considered during inference. LM+ also improves on how L* handles counterexamples: when processing a counterexample it determines the distinguishing sequence of minimum length, which avoids adding too many suffixes to the set E of the observation table.
Cho et al. used the protocol format reverse-engineering method Dispatcher together with the LM+ algorithm to infer the state machine of the proprietary protocol used by the MegaD botnet. Cho made two improvements to LM+. First, a cache is added and serial queries are changed to parallel queries, which reduces the time queries spend waiting to be sent. Second, returned query results are stored in the cache and queries containing self-loop structures are answered in advance, which reduces the number of queries sent to the oracle. These improvements raised the efficiency of protocol state machine inference.
Because no oracle capable of answering all such decision problems exists in practical systems, the alternative commonly used at present is to instantiate the symbol sequence of an output query as a message sequence, send it to the protocol entity, and observe the output. Test samples are generated at random from the set of input message types to judge whether the inferred candidate state machine is approximately equivalent to the real state machine.
Compared with passive inference, active protocol state machine inference can continuously supplement the protocol entity information it needs during inference. The information collected is comprehensive and the inference results are highly accurate. The main drawbacks of such methods are high time overhead and low efficiency: the feedback to each query depends on the running efficiency of the protocol entity program and on network delay. In addition, effective counterexamples must be found during equivalence judgment. Counterexamples help discover states missed during inference, but searching for them incurs a high time overhead, which in practice limits inference efficiency.
Overall, improving inference efficiency is the most prominent problem facing active protocol state machine inference. The efficiency problems of current methods appear mainly in the following respects: (1) During state machine inference, the message types are abstracted as mutually independent, meaningless symbols, and the ordering constraints between message types are ignored. Query messages are generated completely at random and sent to the protocol entity program, so a large number of them are judged invalid by the protocol entity program, which lowers inference efficiency. (2) The captured session sample set is not fully used during active inference. The responses to some queries can be derived directly from the session sample set, so not every query needs to be sent to the protocol entity and waited on. (3) Judging whether the candidate state machine is equivalent to the real state machine requires constructing counterexample samples, but current judgment methods ignore the fact that negative samples often share part of a common prefix with positive samples. They generate test samples completely at random in an attempt to find a counterexample that distinguishes the candidate state machine from the real state machine, which produces a large number of invalid test samples and lowers the efficiency of equivalence judgment.
Summary of the invention
In view of the problems in the prior art, the object of the invention is to provide an active inference method for protocol state machines based on protocol knowledge. To address the low efficiency of active inference, the method builds on the LM+ active state machine inference algorithm. From a set of protocol session samples it extracts the ordering constraints between protocol messages and filters invalid queries; it directly answers, from the session sample set, queries of kinds that have appeared in the session samples; and it searches for counterexamples to the candidate state machine more effectively by mutating positive samples. Together these measures improve the overall efficiency of active protocol state machine inference.
To achieve the above object, the invention adopts the following technical solution:
An active inference method for protocol state machines based on protocol knowledge, comprising the following steps:
(1) Message format extraction: A message format extraction method is used to obtain the protocol formats of the input and output messages. Messages with the same format are grouped into one class, the class of each message is represented by an abstract symbol, and the session sequences formed by input and output messages are abstracted as string sequences of abstract symbols.
(2) Observation table initialization: The observation table is a triple (S, E, T). Its rows are the elements of S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract input symbols, and the symbol "·" denotes string concatenation; its columns are the elements of E. The function T maps a row in S ∪ S·Σ and a column in E to an output string, which is the value of the corresponding table entry. At initialization, S = {ε} and E = Σ, where ε is the empty string. The corresponding input message sequence is generated for each table entry, and the entry is assigned a value according to the output messages returned by the protocol entity.
(3) Observation table closedness check: Determine whether the initialized observation table satisfies the closedness requirement. If it does not, the observation table must be extended and the new table entries assigned values. If it does, a candidate protocol state machine corresponding to the observation table is constructed.
(4) Invalid query sequence filtering: The ordering constraints between message types are extracted from the session sample set and corresponding filtering rules are formulated. If an input message sequence serving as a query is judged invalid, it is filtered out directly. If it satisfies the ordering constraints between messages, it proceeds to the next step.
(5) Direct query response: By learning from the session sample set, some input message sequences serving as queries can be answered directly, without sending the query to the protocol entity. If the response cannot be determined from the message sample set, the query is sent to the protocol entity program for interaction.
(6) Candidate state machine construction: Once the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is expressed as a Mealy machine, whose input set and output set are filled in from the information collected during message format extraction and whose other components are filled in from the observation table.
(7) Equivalence judgment: To determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated, and the output of the protocol entity for each test sequence is compared with the output of the candidate protocol state machine for the same sequence. Test sequences are constructed from normal protocol string sequences, which avoids the large number of invalid test samples that random generation produces. A test sequence for which the outputs differ is a counterexample, and state machine inference must continue on that basis.
(8) Observation table extension according to the counterexample: After a string serving as a counterexample is found, the set E of the observation table is extended with suffixes of the counterexample and the table is filled in on that basis, so that different protocol states are distinguished more fully. Steps (3) to (7) are repeated until a state machine equivalent to the real protocol state machine is output.
The message format extraction stage works as follows. The input and output messages of the target protocol entity program are captured, and a message format extraction method is applied to obtain their protocol format information. The messages are divided into classes according to format, the class of each message is represented by an abstract symbol, and the session sequences of input and output messages are thereby abstracted as string sequences of abstract symbols.
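The abstraction step can be illustrated with a minimal Python sketch. It assumes that format extraction has already been solved elsewhere: the hypothetical `format_key` function simply treats the first whitespace-delimited token of a text message as its format class, which is only a stand-in for a real format extraction method. The function names and the symbol names `m0`, `m1`, ... are illustrative and do not come from the patent.

```python
from typing import Callable, Dict, List, Tuple

def format_key(message: bytes) -> bytes:
    """Hypothetical stand-in for format extraction: first token, case-folded."""
    parts = message.split()
    return parts[0].upper() if parts else b""

def abstract_sessions(
    sessions: List[List[bytes]],
    key: Callable[[bytes], bytes] = format_key,
) -> Tuple[List[List[str]], Dict[bytes, str]]:
    """Replace every message by the abstract symbol of its format class."""
    symbols: Dict[bytes, str] = {}
    abstracted: List[List[str]] = []
    for session in sessions:
        row: List[str] = []
        for message in session:
            k = key(message)
            if k not in symbols:
                # Allocate a fresh abstract symbol for a new format class.
                symbols[k] = f"m{len(symbols)}"
            row.append(symbols[k])
        abstracted.append(row)
    return abstracted, symbols

# Two captured sessions of alternating input and output messages (made-up data).
sessions = [
    [b"HELO a.example", b"250 ok", b"QUIT", b"221 bye"],
    [b"helo b.example", b"250 ok"],
]
abstracted, symbols = abstract_sessions(sessions)
```

Messages that share a format class map to the same symbol, so both sessions begin with the same two symbols even though the raw bytes differ.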
The observation table initialization stage works as follows. The observation table is a triple (S, E, T). The strings in S represent the states the protocol may have, and the strings in E serve to distinguish different protocol states. At initialization, once the rows and columns of the table are determined, a string is constructed for each table entry: its prefix is the row label of the entry and its suffix is the column label. The string is instantiated according to the message formats to produce an input message sequence for the protocol entity program, and the output messages of the protocol entity program for that sequence are obtained. The output messages are abstracted as an output string according to the format information, and the value of the table entry is set to the suffix of the output string whose length equals that of the entry's column label.
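A minimal Python sketch of this initialization is given below. It assumes strings are represented as tuples of abstract symbols and that an output query is a plain function from an input sequence to an output sequence. The `toy_entity` function is a hypothetical two-state protocol entity invented for illustration; in practice the query would be instantiated as real messages and sent over the network.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Word = Tuple[str, ...]
Table = Dict[Tuple[Word, Word], Tuple[str, ...]]

def toy_entity(inputs: Sequence[str]) -> List[str]:
    """Hypothetical protocol entity: 'data' is acknowledged only after 'login'."""
    logged_in, outputs = False, []
    for symbol in inputs:
        if symbol == "login":
            outputs.append("err" if logged_in else "ok")
            logged_in = True
        else:
            outputs.append("ack" if logged_in else "err")
    return outputs

def init_table(
    alphabet: Sequence[str],
    output_query: Callable[[Word], List[str]],
) -> Tuple[List[Word], List[Word], Table]:
    """Build the initial table with S = {ε} and E = Σ."""
    S: List[Word] = [()]                      # ε is the empty tuple
    E: List[Word] = [(a,) for a in alphabet]  # one column per input symbol
    rows = S + [s + (a,) for s in S for a in alphabet]
    T: Table = {}
    for row in rows:
        for e in E:
            outputs = output_query(row + e)
            # Keep only the output suffix whose length matches the column label.
            T[(row, e)] = tuple(outputs[-len(e):])
    return S, E, T

S, E, T = init_table(["login", "data"], toy_entity)
```

Storing only the output suffix means that two rows compare equal exactly when the entity reacts identically to every distinguishing suffix in E, regardless of what it output while reaching that row.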
The observation table closedness check stage works as follows. Determine whether the initialized observation table satisfies closedness, that is, whether for every t ∈ S·Σ there exists an s ∈ S equivalent to t. If not, find a row t ∈ S·Σ that violates closedness, move t into S, and extend the S·Σ part of the observation table accordingly. To fill the newly created table entries, query messages are generated and the corresponding outputs obtained, and the entries are assigned values from them. If closedness holds, a candidate protocol state machine corresponding to the observation table is constructed.
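The closing loop described above can be sketched in Python as follows. The sketch assumes tuples of symbols as strings and a hypothetical `toy_entity` in place of a real protocol entity; the helper names `row` and `close_table` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Word = Tuple[str, ...]
Table = Dict[Tuple[Word, Word], Tuple[str, ...]]

def toy_entity(inputs: Sequence[str]) -> List[str]:
    """Hypothetical protocol entity: 'data' is acknowledged only after 'login'."""
    logged_in, outputs = False, []
    for symbol in inputs:
        if symbol == "login":
            outputs.append("err" if logged_in else "ok")
            logged_in = True
        else:
            outputs.append("ack" if logged_in else "err")
    return outputs

def row(T: Table, E: List[Word], s: Word) -> Tuple[Tuple[str, ...], ...]:
    """The full row of table entries for row label s."""
    return tuple(T[(s, e)] for e in E)

def close_table(
    S: List[Word],
    E: List[Word],
    T: Table,
    alphabet: Sequence[str],
    output_query: Callable[[Word], List[str]],
) -> List[Word]:
    """Move rows of S·Σ that have no equivalent row in S into S until closed."""
    while True:
        extended = [s + (a,) for s in S for a in alphabet]
        for s in S + extended:
            for e in E:
                if (s, e) not in T:  # only query entries that are still empty
                    T[(s, e)] = tuple(output_query(s + e)[-len(e):])
        rows_in_S = {row(T, E, s) for s in S}
        unclosed = [t for t in extended if row(T, E, t) not in rows_in_S]
        if not unclosed:
            return S
        S.append(unclosed[0])  # promote one violating row, then re-check

alphabet = ["login", "data"]
T: Table = {}
S = close_table([()], [(a,) for a in alphabet], T, alphabet, toy_entity)
```

Promoting one violating row at a time keeps S free of duplicate rows, because each promoted row is by construction different from every row already in S.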
The invalid query sequence filtering stage works as follows. By learning from the session sample set of input and output messages of the protocol entity program, the ordering constraints between message types are extracted and corresponding filtering rules are formulated. If an input message sequence serving as a query is judged invalid, it is filtered out directly and is not sent to the protocol entity program for processing. If it satisfies the ordering constraints between messages, it proceeds to the next step.
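The patent does not spell out how the ordering constraints are learned, so the following Python sketch assumes one simple form of constraint: a symbol b is required before a symbol a if b occurs before the first a in every sample session that contains a. The function names are illustrative, and a rule learned this way is only as reliable as the sample set is representative.

```python
from typing import Dict, List, Sequence, Set

def learn_precedence(samples: List[Sequence[str]]) -> Dict[str, Set[str]]:
    """For each symbol, the symbols seen before its first occurrence in every sample."""
    required: Dict[str, Set[str]] = {}
    for session in samples:
        seen: Set[str] = set()
        before_first: Dict[str, Set[str]] = {}
        for symbol in session:
            before_first.setdefault(symbol, set(seen))
            seen.add(symbol)
        for symbol, before in before_first.items():
            if symbol in required:
                required[symbol] &= before  # keep only predecessors common to all samples
            else:
                required[symbol] = set(before)
    return required

def is_valid_query(query: Sequence[str], required: Dict[str, Set[str]]) -> bool:
    """A query passes if every symbol's required predecessors occur earlier in it."""
    seen: Set[str] = set()
    for symbol in query:
        if not required.get(symbol, set()) <= seen:
            return False
        seen.add(symbol)
    return True

samples = [
    ["login", "data", "logout"],
    ["login", "logout"],
    ["login", "data", "data", "logout"],
]
required = learn_precedence(samples)
```

Symbols that never occur in the samples carry no constraint in this sketch, so queries containing them are passed on rather than filtered.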
The direct query response stage works as follows. The protocol session samples, which consist of abstract string sequences, are learned first, and the information in the session sample set is recorded in a data structure that records the session samples accurately and is easy to query, such as the extended prefix tree transducer EPTT (Extended Prefix Tree Transducer). When an input message sequence serving as a query appears, an attempt is first made to derive the response from the information in the EPTT or a similar structure, without sending the query to the protocol entity. If the response cannot be determined from the message sample set, the query is sent to the protocol entity program.
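As a rough illustration, the Python sketch below implements a plain prefix tree transducer whose edges carry an input symbol and the observed output symbol. It is an assumption that this captures the core of the EPTT; any extensions the patent's EPTT has beyond a basic prefix tree are not modeled, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

class PrefixTreeTransducer:
    """Prefix tree over sample sessions; each edge is labeled input/output."""

    def __init__(self) -> None:
        # nodes[n] maps an input symbol to (output symbol, child node index).
        self.nodes: List[Dict[str, Tuple[str, int]]] = [{}]

    def add_session(self, session: Sequence[Tuple[str, str]]) -> None:
        node = 0
        for inp, out in session:
            edge = self.nodes[node].get(inp)
            if edge is None:
                self.nodes.append({})
                edge = (out, len(self.nodes) - 1)
                self.nodes[node][inp] = edge
            elif edge[0] != out:
                # Two samples disagree, so the abstraction is not deterministic here.
                raise ValueError(f"conflicting outputs for input {inp!r}")
            node = edge[1]

    def answer(self, query: Sequence[str]) -> Optional[List[str]]:
        """Return the recorded outputs, or None if the samples do not cover the query."""
        node, outputs = 0, []
        for inp in query:
            edge = self.nodes[node].get(inp)
            if edge is None:
                return None
            outputs.append(edge[0])
            node = edge[1]
        return outputs

tree = PrefixTreeTransducer()
tree.add_session([("login", "ok"), ("data", "ack")])
tree.add_session([("login", "ok"), ("logout", "bye")])
```

A return value of `None` signals that the query has to be sent to the protocol entity; any other return value is a response obtained without network interaction.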
The candidate state machine construction stage works as follows. If the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is expressed as a Mealy machine given by the six-tuple (Q_M, I, O, δ_M, λ_M, q0_M), in which the input set I and the output set O are filled in from the information collected during message format extraction. The remaining components are filled in from the observation table. The finite state set is Q_M = {[s] | s ∈ S}. The initial state is q0_M = [ε]. The state transition function is δ_M([s], i) = [s·i] for any s ∈ S and i ∈ Σ; it gives the target state reached after input i in state [s], which is the state in S equivalent to s·i. The output function is λ_M([s], i) = T(s, i) for any input i, that is, the value of the table entry in row s and column i.
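The construction can be sketched in Python as follows. The sketch assumes a closed table in which every single input symbol is a column of E, so that T(s, i) can be read directly, and it stores single output symbols as table values for brevity. The table contents are made-up data for a two-state login protocol, and `build_mealy` and `run` are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, ...]

def build_mealy(S, E, T, alphabet):
    """Derive (q0, delta, lam) from a closed observation table."""
    def row(s: Word) -> Tuple[str, ...]:
        return tuple(T[(s, e)] for e in E)

    # One state per distinct row of S: this is the set of classes [s].
    state_of: Dict[Tuple[str, ...], int] = {}
    for s in S:
        state_of.setdefault(row(s), len(state_of))

    delta: Dict[Tuple[int, str], int] = {}
    lam: Dict[Tuple[int, str], str] = {}
    for s in S:
        q = state_of[row(s)]
        for i in alphabet:
            delta[(q, i)] = state_of[row(s + (i,))]  # δ([s], i) = [s·i]
            lam[(q, i)] = T[(s, (i,))]               # λ([s], i) = T(s, i)
    return state_of[row(())], delta, lam

def run(mealy, inputs: Sequence[str]) -> List[str]:
    """Output sequence of the candidate machine for an input sequence."""
    q, delta, lam = mealy
    outputs = []
    for i in inputs:
        outputs.append(lam[(q, i)])
        q = delta[(q, i)]
    return outputs

alphabet = ["login", "data"]
S: List[Word] = [(), ("login",)]
E: List[Word] = [(a,) for a in alphabet]
observed = {  # row label -> outputs for the columns 'login' and 'data'
    (): ("ok", "err"),
    ("login",): ("err", "ack"),
    ("data",): ("ok", "err"),
    ("login", "login"): ("err", "ack"),
    ("login", "data"): ("err", "ack"),
}
T = {(s, e): out for s, outs in observed.items() for e, out in zip(E, outs)}
mealy = build_mealy(S, E, T, alphabet)
```

Closedness is what makes the lookup `state_of[row(s + (i,))]` safe: every row of S·Σ is guaranteed to equal some row of S.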
The equivalence judgment stage works as follows. To determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated. Each test sequence is instantiated as input messages for the protocol entity, the output responses of the protocol entity program are obtained, and they are compared with the outputs of the candidate protocol state machine for the same test sequences. A test sequence on which the two outputs differ is a counterexample. Most test sequences generated completely at random are invalid. The method therefore mutates normal message sequences by operations such as insertion, replacement, and deletion, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences. If no counterexample is found after a sufficient number of test sequences have been sent, the candidate protocol state machine is considered equivalent to the real protocol state machine within the tested bounds. If a counterexample is found, the candidate protocol state machine does not satisfy the equivalence condition, and the counterexample information must be added to the observation table so that state machine inference can continue.
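A minimal Python sketch of mutation-based counterexample search is shown below. It assumes one mutation per test sequence, uniform random choice of the operation, and a fixed test budget; none of these parameters come from the patent. The `toy_entity` and `wrong_candidate` functions are hypothetical stand-ins for the real protocol entity and a candidate machine that has missed a state.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Word = Tuple[str, ...]

def toy_entity(inputs: Sequence[str]) -> List[str]:
    """Hypothetical protocol entity: 'data' is acknowledged only after 'login'."""
    logged_in, outputs = False, []
    for symbol in inputs:
        if symbol == "login":
            outputs.append("err" if logged_in else "ok")
            logged_in = True
        else:
            outputs.append("ack" if logged_in else "err")
    return outputs

def wrong_candidate(inputs: Sequence[str]) -> List[str]:
    """Hypothetical one-state candidate that never leaves the initial state."""
    return ["ok" if symbol == "login" else "err" for symbol in inputs]

def mutate(seq: Sequence[str], alphabet: Sequence[str], rng: random.Random) -> Word:
    """Apply one random insertion, replacement, or deletion to a sequence."""
    items = list(seq)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not items:
        items.insert(rng.randrange(len(items) + 1), rng.choice(alphabet))
    elif op == "replace":
        items[rng.randrange(len(items))] = rng.choice(alphabet)
    else:
        del items[rng.randrange(len(items))]
    return tuple(items)

def find_counterexample(
    positives: List[Word],
    alphabet: Sequence[str],
    entity: Callable[[Word], List[str]],
    candidate: Callable[[Word], List[str]],
    budget: int = 200,
    seed: int = 0,
) -> Optional[Word]:
    """Mutate positive samples until entity and candidate disagree, or give up."""
    rng = random.Random(seed)
    for _ in range(budget):
        test = mutate(rng.choice(positives), alphabet, rng)
        if test and entity(test) != candidate(test):
            return test
    return None

positives = [("login", "data"), ("login", "data", "data")]
counterexample = find_counterexample(positives, ["login", "data"], toy_entity, wrong_candidate)
```

Returning `None` after the budget is exhausted corresponds to accepting the candidate as equivalent within the tested bounds, not to a proof of equivalence.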
The stage of extending the observation table according to the counterexample works as follows. After a string serving as a counterexample is found, its minimal distinguishing suffix is determined. Let the counterexample be c = u·v, where u ∈ S ∪ S·I is the longest prefix of c whose output agrees with that of the protocol entity program; v is then the shortest suffix of the counterexample that distinguishes the candidate protocol state machine from the real state machine of the protocol entity. Once v is determined, v and all suffixes of v are added to the set E of the observation table, which ensures that the closure requirement of the observation table is satisfied and avoids unnecessary extension of the table. The observation table is then filled in on this basis and the preceding inference steps are repeated until a state machine equivalent to the real protocol state machine is output.
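The split of the counterexample can be sketched in Python under a simplifying assumption: u is taken to be the longest prefix of c that is already a row label of the table, which is one reading of the condition u ∈ S ∪ S·I, and the output comparison with the protocol entity is not modeled. The function names and the sample counterexample are illustrative.

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, ...]

def split_counterexample(
    c: Word, S: Sequence[Word], alphabet: Sequence[str]
) -> Tuple[Word, Word]:
    """Split c into u·v with u the longest prefix of c found in S ∪ S·I."""
    rows = set(S) | {s + (a,) for s in S for a in alphabet}
    for k in range(len(c), -1, -1):
        if c[:k] in rows:
            return c[:k], c[k:]
    raise ValueError("S must contain the empty string")

def extend_suffixes(E: List[Word], v: Word) -> List[Word]:
    """Add v and every non-empty suffix of v to E, skipping duplicates."""
    for k in range(len(v)):
        suffix = v[k:]
        if suffix not in E:
            E.append(suffix)
    return E

S: List[Word] = [(), ("login",)]
alphabet = ["login", "data", "logout"]
E: List[Word] = [(a,) for a in alphabet]
c: Word = ("login", "data", "logout", "data")
u, v = split_counterexample(c, S, alphabet)
E = extend_suffixes(E, v)
```

Adding only the suffixes of v, rather than all suffixes of c, is what keeps the number of columns and therefore the number of new output queries small.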
As the above technical solution shows, the beneficial effects of the invention are as follows. The input and output message sequences of the protocol entity program are used to obtain knowledge about the protocol communication, and this protocol knowledge is fully exploited during active inference: fewer invalid query messages that violate the protocol specification are sent, and some input message sequences serving as queries are answered directly from the message sample set, which reduces interaction with the protocol entity. This can significantly improve the efficiency of active protocol state machine inference. In addition, constructing test sequences from normal message sequences improves their validity, avoids wasting resources on invalid test cases, and improves the overall accuracy of the inference results.
Brief description of the drawings
Fig. 1 is a flow diagram of the overall implementation of the invention.
Fig. 2 is an example of an observation table in the invention.
Fig. 3 is an example of an EPTT constructed in the invention for answering queries directly; Fig. 3(a) is a session sample set after symbol abstraction, and Fig. 3(b) is the EPTT constructed from the symbol-abstracted session sample set S in Fig. 3(a).
Fig. 4 is the candidate protocol state machine constructed from the observation table shown in Fig. 2.
Detailed description of the embodiments
For a better understanding of the technical content of the invention, specific embodiments are described below with reference to the accompanying drawings.
As shown in Fig. 1, according to a preferred embodiment of the invention, the active inference method for protocol state machines based on protocol knowledge comprises the following steps:
(1) Message format extraction: A message format extraction method is used to obtain the protocol formats of the input and output messages. Messages with the same format are grouped into one class, the class of each message is represented by an abstract symbol, and the session sequences formed by input and output messages are abstracted as string sequences of abstract symbols.
(2) Observation table initialization: The observation table is a triple (S, E, T). Its rows are the elements of S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract input symbols, and the symbol "·" denotes string concatenation; its columns are the elements of E. The function T maps a row in S ∪ S·Σ and a column in E to an output string, which is the value of the corresponding table entry. At initialization, S = {ε} and E = Σ, where ε is the empty string. The corresponding input message sequence is generated for each table entry, and the entry is assigned a value according to the output messages returned by the protocol entity.
(3) Observation table closedness check: Determine whether the initialized observation table satisfies the closedness requirement. If it does not, the observation table must be extended and the new table entries assigned values. If it does, a candidate protocol state machine corresponding to the observation table is constructed.
(4) Invalid query sequence filtering: The ordering constraints between message types are extracted from the session sample set and corresponding filtering rules are formulated (for example, a message of type 4 must appear before a message of type 7). If an input message sequence serving as a query is judged invalid, it is filtered out directly. If it satisfies the ordering constraints between messages, it is processed further.
(5) Direct query response: By learning from the session sample set, if the response to an input message sequence serving as a query can be derived directly from the session sample set, the response is given directly, without sending the query to the protocol entity. If the response cannot be determined from the message sample set, the query is sent to the protocol entity program for interaction.
(6) Candidate state machine construction: Once the observation table satisfies the closedness requirement, a candidate protocol state machine can be constructed from it. The protocol state machine is expressed as a Mealy machine, whose input set and output set are filled in from the information collected during message format extraction and whose other components are filled in from the observation table.
(7) Equivalence judgment: To determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated, and the output of the protocol entity for each test sequence is compared with the output of the candidate protocol state machine for the same sequence. Test sequences are constructed from normal protocol string sequences, which avoids the large number of invalid test samples that random generation produces. A test sequence for which the outputs differ is a counterexample, and state machine inference must continue on that basis.
(8) Observation table extension according to the counterexample: After a string serving as a counterexample is found, the set E of the observation table is extended with suffixes of the counterexample and the table is filled in on that basis, so that different protocol states are distinguished more fully. Steps (3) to (7) are repeated until a state machine equivalent to the real protocol state machine is output.
Referring to the overall flow shown in Fig. 1, the protocol state machine inference method of this embodiment comprises eight parts: message format extraction, observation table initialization, observation table closedness check, invalid query sequence filtering, direct query response, candidate state machine construction, equivalence judgment, and extension of the observation table according to the counterexample. Each part is described below.
(1) Message format extraction
The embodiment of the present invention first collects a large number of input and output message sequences produced by the network communication of the protocol entity program, and obtains the format information of the input and output messages using the message format extraction method of the PI (Protocol Informatics) project. Input messages and output messages are classified separately according to message format, and messages with the same structure are placed in one class. Each class is identified by a unique Arabic numeral (such as 1, 2, 3).
On the basis of message classification, network communication behavior is abstracted session by session. A session represents one complete data exchange between the communicating parties and reflects how the protocol state migrates during communication. To facilitate state machine inference, the present invention converts each session sequence of input and output messages into a string sequence of abstract symbols.
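The abstraction step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each message has already been reduced to a format key by some format extraction step, and the function names and the SMTP-like keys are hypothetical.

```python
from typing import Dict, List, Tuple

def assign_symbol(fmt: str, table: Dict[str, int]) -> int:
    """Return the numeric symbol of a format key, allocating the next one if new."""
    if fmt not in table:
        table[fmt] = len(table) + 1
    return table[fmt]

def abstract_sessions(sessions: List[List[Tuple[str, str]]]):
    """Turn sessions of (input format, output format) pairs into symbol sequences.

    Input and output messages are classified separately, so each side has
    its own symbol table.
    """
    in_syms: Dict[str, int] = {}
    out_syms: Dict[str, int] = {}
    abstracted = [
        [(assign_symbol(i, in_syms), assign_symbol(o, out_syms)) for i, o in session]
        for session in sessions
    ]
    return abstracted, in_syms, out_syms

# Hypothetical format keys standing in for extracted message formats.
sessions = [
    [("HELO", "250"), ("QUIT", "221")],
    [("HELO", "250"), ("MAIL", "250"), ("QUIT", "221")],
]
abstracted, in_syms, out_syms = abstract_sessions(sessions)
```

Each session becomes a list of (input symbol, output symbol) pairs, which is the form the later steps operate on.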
(2) Observation table initialization
The observation table is a triple (S, E, T). Its rows are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract symbols used as inputs, and the symbol "·" denotes string concatenation. Its columns are indexed by E. The function T maps each pair of a row in S ∪ S·Σ and a column in E to an output string, which is the value of the corresponding table entry. The strings in S represent the states the protocol may have, and the strings in E serve to distinguish different protocol states from one another. Fig. 2 is an example of an observation table.
At initialization, S = {ε} and E = Σ, where ε is the empty string. Once the rows and columns of the observation table are determined, a string is constructed for each table entry: its prefix is the row value of the entry and its suffix is the column value of the entry. The string is instantiated according to the message formats to generate an input message sequence for the protocol entity program, and the output messages of the protocol entity program for that input sequence are obtained. The output messages are abstracted into an output string according to their format information, and the value of the table entry is set to the suffix of that output string whose length equals the length of the entry's column value.
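A minimal sketch of this initialization is shown below. The protocol entity is replaced by a small hand-written Mealy machine (`TOY`), which is an assumption made only so the example runs; in the described method the answers come from the real protocol entity program.

```python
# Toy stand-in for the protocol entity (hypothetical):
# state -> {input symbol: (next state, output symbol)}
TOY = {0: {"1": (1, "10"), "2": (0, "10")},
       1: {"1": (1, "11"), "2": (0, "10")}}

def query(seq):
    """Send an input symbol sequence to the toy entity and collect all outputs."""
    state, outs = 0, []
    for sym in seq:
        state, out = TOY[state][sym]
        outs.append(out)
    return tuple(outs)

def init_table(sigma):
    """Build the initial observation table with S = {ε} and E = Σ."""
    S = [()]                                  # ε is the empty tuple
    E = [(a,) for a in sigma]
    T = {}
    rows = S + [s + (a,) for s in S for a in sigma]   # S ∪ S·Σ
    for prefix in rows:
        for e in E:
            # Entry value: output suffix whose length equals the column length.
            T[(prefix, e)] = query(prefix + e)[-len(e):]
    return S, E, T

S, E, T = init_table(["1", "2"])
```

Strings are represented as tuples of symbols so that multi-character symbols such as "10" stay unambiguous.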
(3) Observation table closedness check
A protocol state machine can be constructed only if the observation table is closed, so the table must be checked for closedness: for every t ∈ S·Σ, there must exist some s ∈ S such that s and t are equivalent. For example, in Fig. 2 every row in S·Σ has an equivalent row in S, so that observation table is closed.
If the table is not closed, a row t ∈ S·Σ that violates closedness is found and moved into S, and the set t·Σ is added to the S·Σ rows of the observation table. To fill the table entries created by this extension, query messages are generated and the corresponding output information is obtained, so that the entries can be assigned values. If the table is closed, a candidate protocol state machine corresponding to the observation table is constructed.
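The check-and-extend loop can be sketched as follows. As before, `TOY` and `query` are hypothetical stand-ins for the protocol entity, and two rows are treated as equivalent when their table entries agree on every column of E.

```python
# Toy stand-in for the protocol entity (hypothetical):
# state -> {input symbol: (next state, output symbol)}
TOY = {0: {"1": (1, "10"), "2": (0, "10")},
       1: {"1": (1, "11"), "2": (0, "10")}}

def query(seq):
    state, outs = 0, []
    for sym in seq:
        state, out = TOY[state][sym]
        outs.append(out)
    return tuple(outs)

def fill(T, E, prefix):
    """Assign every missing entry of one row from the entity's answers."""
    for e in E:
        if (prefix, e) not in T:
            T[(prefix, e)] = query(prefix + e)[-len(e):]

def row(T, E, prefix):
    return tuple(T[(prefix, e)] for e in E)

def find_unclosed(S, E, T, sigma):
    """Return a row t in S·Σ with no equivalent row in S, or None if closed."""
    s_rows = {row(T, E, s) for s in S}
    for s in S:
        for a in sigma:
            t = s + (a,)
            if t not in S and row(T, E, t) not in s_rows:
                return t
    return None

def make_closed(S, E, T, sigma):
    for s in S:
        fill(T, E, s)
        for a in sigma:
            fill(T, E, s + (a,))
    while (t := find_unclosed(S, E, T, sigma)) is not None:
        S.append(t)                  # move t into S
        for a in sigma:              # add the rows t·Σ and fill them
            fill(T, E, t + (a,))
    return S

S, E, T = [()], [("1",), ("2",)], {}
make_closed(S, E, T, ["1", "2"])
```

For this toy entity the row for "1" differs from the row for ε, so "1" is moved into S and the table becomes closed with two states.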
(4) Invalid query sequence filtering
By learning the session sample set of input and output messages of the protocol entity program, filtering rules can be formulated to filter out message sequences that clearly violate the protocol. The embodiment of the present invention formulates its filtering rules mainly from the ordering relations between message types.
Message types of a protocol often follow a certain order. For example, the SMTP protocol uses HELO (or EHLO) to begin a session and QUIT to end it.
The present invention analyzes proprietary protocols, for which the ordering relations between message types cannot be known in advance, so the information must be obtained from the message session sample set. For ease of analysis, filtering rules are extracted on the basis of message format extraction. For an input message sequence S = (... 4 ... 7 ... 10 ...) represented with abstract symbols, symbol 4 precedes symbol 7 and symbol 7 precedes symbol 10. The embodiment records the ordering information between symbols in message sequences in the form of a matrix.
If the number of input message types is n, the ordering constraints over the message sample set can be represented by an n × n matrix. The element m_jk in row j and column k of the matrix is the number of session samples in which symbol j appears before symbol k. If m_jk ≥ 10 (10 being the threshold set in this embodiment) and m_kj = 0, symbol j is considered to be required to appear before symbol k. If symbol k appears before symbol j in a test sequence, that test sequence is considered invalid.
The threshold is set to 10 because the protocol imposes no strict limit on the order in which some symbols appear: an ordering relation between two symbol types must be sufficiently general in the sample set before it can be used as a condition for judging a sequence invalid.
Based on the constructed matrix, the ordering constraints between message types are extracted and the corresponding filtering rules are formulated; for example, messages of type 4 and type 7 must appear before messages of type 10. If an input message sequence used as a query is judged invalid, it is filtered out directly and not sent to the protocol entity program, and a message indicating that the query is invalid is returned. If the input message sequence satisfies the ordering constraints between messages, it proceeds to the next processing step.
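The matrix and the resulting filter can be sketched as follows. This is an illustrative reading of the rule stated above; the function names are hypothetical, and the matrix is stored sparsely as a dictionary keyed by (j, k) rather than as an n × n array.

```python
from itertools import combinations

def order_matrix(sessions):
    """m[(j, k)] = number of session samples in which symbol j appears before k."""
    m = {}
    for seq in sessions:
        # Count each ordered pair at most once per session.
        pairs = {(seq[a], seq[b])
                 for a, b in combinations(range(len(seq)), 2)
                 if seq[a] != seq[b]}
        for p in pairs:
            m[p] = m.get(p, 0) + 1
    return m

def must_precede(m, threshold=10):
    """Rules (j, k): j must precede k, i.e. m_jk >= threshold and m_kj == 0."""
    return {(j, k) for (j, k), n in m.items()
            if n >= threshold and m.get((k, j), 0) == 0}

def is_valid(query, rules):
    """A query is invalid if some k appears before some j while (j, k) is a rule."""
    for idx, earlier in enumerate(query):
        for later in query[idx + 1:]:
            if (later, earlier) in rules:
                return False
    return True

# Hypothetical sample set: 12 sessions "4 7 10", 3 sessions "4 10",
# plus symbols 1 and 2 that occur in both orders.
samples = [[4, 7, 10]] * 12 + [[4, 10]] * 3 + [[1, 2]] * 10 + [[2, 1]]
rules = must_precede(order_matrix(samples))
```

Because symbols 1 and 2 occur in both orders in the samples, no rule is derived for them, which is the purpose of the m_kj = 0 condition.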
(5) Direct query response
Many of the queries raised during state machine inference can be answered directly from information learned from the sample set, without interacting with the protocol entity program. To this end, the protocol session samples, expressed as abstract string sequences, are learned first, and the information of the session sample set is recorded in a data structure that both records the samples accurately and is simple to query.
The embodiment of the present invention records the session sample set in an extended prefix tree transducer (EPTT). An EPTT is essentially a multiway tree and can be regarded as an initial state machine in which each branch represents the abstract symbol sequence of one session sample.
Fig. 3(b) shows the EPTT constructed for the symbol-abstracted session sample set S of Fig. 3(a). In Fig. 3(b), the initial state is the root node, identified by the Arabic numeral 0. The state machine receives input starting from the initial state. If the state reached by the input symbol sequence does not yet exist in the state machine, a new state is created, uniquely identified by an Arabic numeral, and the corresponding output information is attached to the state transition. If the state already exists, the machine moves to it and receives the next input. This process is repeated until the EPTT has received all input symbol sequences in the sample set S.
After the EPTT has been constructed, queries are answered directly by traversing it with a depth-first search. Suppose the output query is (19, 11), which asks for the output response to input "11" in state [19]. Taking Fig. 3(b) as an example, state [19] corresponds to state 4, which is reached from the initial state 0 after the input symbol sequence "19". Traversing Fig. 3(b) shows that the output response to input "11" in state 4 is "12", so the query can be answered directly and the table updated with T(19, 11) = 12. If the output query is (14, 11) and the EPTT contains no corresponding information, the direct response fails and further interaction with the protocol entity is required.
In the direct query response stage, when an input message sequence arrives as a query, the EPTT is first used to try to answer it without sending the query to the protocol entity. If the response cannot be determined from the message sample information recorded in the EPTT, the query is sent to the protocol entity program and the exact output is obtained through interaction. In addition, the queries sent to the protocol entity and their outputs can be recorded to update the EPTT, so that similar queries in later steps can be answered directly, further reducing the number of interactions with the protocol entity.
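A compact sketch of such a prefix tree is given below. It is a simplified stand-in for the EPTT: the class and method names are hypothetical, the sample sessions are invented rather than taken from Fig. 3, the samples are assumed to be deterministic, and a lookup simply walks down the single branch selected by the inputs instead of running a general depth-first search.

```python
class PrefixTreeTransducer:
    """Prefix tree over input symbols; each edge also stores the observed output."""

    def __init__(self):
        # edges[state][input symbol] = (next state, output symbol); state 0 is the root.
        self.edges = [{}]

    def add_session(self, session):
        """Insert one session given as a list of (input, output) symbol pairs."""
        state = 0
        for inp, out in session:
            if inp not in self.edges[state]:
                self.edges.append({})                       # create a new state
                self.edges[state][inp] = (len(self.edges) - 1, out)
            state = self.edges[state][inp][0]

    def respond(self, inputs):
        """Return the recorded outputs for an input sequence, or None if unknown."""
        state, outs = 0, []
        for inp in inputs:
            edge = self.edges[state].get(inp)
            if edge is None:
                return None          # direct response fails; ask the protocol entity
            state, out = edge
            outs.append(out)
        return outs

tree = PrefixTreeTransducer()
tree.add_session([(1, 10), (9, 10), (11, 12)])
tree.add_session([(1, 10), (2, 10)])
```

Answers later obtained from the protocol entity can be fed back through `add_session`, which corresponds to the EPTT update described above.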
(6) Candidate state machine construction
If the observation table is closed, a candidate protocol state machine can be constructed from it. The protocol state machine is expressed as a Mealy machine given by the six-tuple (Q_M, I, O, δ_M, λ_M, q_0M), in which the input set I and the output set O are filled with the information collected in the message format extraction stage. The remaining components are filled from the observation table. The finite state set is Q_M = {[s] | s ∈ S}. The initial state is q_0M = [ε]. The state transition function is δ_M([s], i) = [s·i] for any s ∈ S and i ∈ Σ; it gives the target state reached from state [s] on input i, and the target state corresponds to the state in S that is equivalent to s·i. For any input i, the output function is λ_M([s], i) = T(s, i), that is, the value of the table entry in row s and column i of the observation table.
Fig. 4 is the candidate protocol state machine constructed from the observation table shown in Fig. 2. Q_M = {[ε], [1]}; the initial state is q_0M = [ε]; the state transitions are those shown in Fig. 4; and the output function is λ_M([ε], 1) = T(ε, 1) = 10, λ_M([ε], 2) = T(ε, 2) = 10, λ_M([1], 1) = T(1, 1) = 11, λ_M([1], 2) = T(1, 2) = 10.
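The construction can be sketched as follows. The observation table used here is hypothetical: its entries for rows ε and 1 match the output values quoted for Fig. 4, but the remaining rows, and therefore the transitions, are assumptions made only to obtain a closed table.

```python
def build_mealy(S, E, T, sigma):
    """Build (states, initial state, delta, lam) from a closed observation table."""
    def row(prefix):
        return tuple(T[(prefix, e)] for e in E)

    rep = {}                              # distinct row -> representative string in S
    for s in S:
        rep.setdefault(row(s), s)
    delta, lam = {}, {}
    for s in rep.values():
        for a in sigma:
            delta[(s, a)] = rep[row(s + (a,))]   # [s·a]: the S row equivalent to s·a
            lam[(s, a)] = T[(s, (a,))]           # entry in row s, column a
    return set(rep.values()), (), delta, lam

# Hypothetical closed table over Σ = {"1", "2"} with E = Σ.
sigma = ["1", "2"]
S = [(), ("1",)]
E = [("1",), ("2",)]
rows = {(): ("10", "10"), ("1",): ("11", "10"), ("2",): ("10", "10"),
        ("1", "1"): ("11", "10"), ("1", "2"): ("10", "10")}
T = {(p, e): out for p, outs in rows.items() for e, out in zip(E, outs)}

states, q0, delta, lam = build_mealy(S, E, T, sigma)
```

States are named by their representative strings, so the empty tuple plays the role of [ε] and ("1",) the role of [1].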
(7) Equivalence judgment
To determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences must be generated. After the test sequences are instantiated as input messages for the protocol entity, the output responses of the protocol entity program to these test sequences are obtained and compared with the outputs of the candidate protocol state machine for the same test sequences.
Let r_i denote the number of test sequences to be generated. The embodiment of the present invention sets r_i to the value given by Angluin in the paper "Learning regular sets from queries and counterexamples" (Angluin D. Learning regular sets from queries and counterexamples [J]. Information and Computation, 1987, 75(2): 87-106): r_i = (1/ε)(ln(1/δ) + (ln 2)(i + 1)). Here ε is the accuracy, meaning that the probability that a randomly generated symbol sequence is a counterexample is at most ε; δ is the confidence; and i is the number of candidate Mealy machines generated so far. In the paper, Angluin uses this value to bound the number of membership queries; the application scenario of the present invention is similar, and the embodiment uses the value as the termination condition of the equivalence judgment based on output responses.
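The test budget follows directly from the formula. The short sketch below rounds the value up to a whole number of test sequences; the rounding and the function name are assumptions of this example.

```python
import math

def num_tests(eps: float, delta: float, i: int) -> int:
    """r_i = (1/eps) * (ln(1/delta) + ln(2) * (i + 1)), rounded up."""
    return math.ceil((math.log(1.0 / delta) + math.log(2.0) * (i + 1)) / eps)

# The budget grows with each new candidate machine (larger i).
budgets = [num_tests(0.1, 0.1, i) for i in range(3)]
```

Smaller ε or δ, or a later candidate, each increase the number of test sequences required before the candidate is accepted.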
During equivalence judgment, a test sequence for which the two outputs differ is a counterexample. If test sequences were generated completely at random, most of them would be invalid. The embodiment of the present invention instead starts from normal message sequences and mutates the strings through operations such as insertion, replacement, and deletion, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences.
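The mutation step can be sketched as follows. It applies one randomly chosen edit to a normal sequence; the single-edit policy, the function names, and the fixed seed are assumptions of this example rather than details given in the text.

```python
import random

def mutate(seq, sigma, rng):
    """Apply one random insertion, replacement, or deletion to a symbol sequence."""
    seq = list(seq)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not seq:            # an empty sequence can only grow
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(sigma))
    elif op == "replace":
        seq[rng.randrange(len(seq))] = rng.choice(sigma)
    else:
        del seq[rng.randrange(len(seq))]
    return seq

def make_test_sequences(samples, sigma, count, seed=0):
    """Generate `count` test sequences by mutating randomly chosen normal samples."""
    rng = random.Random(seed)
    return [mutate(rng.choice(samples), sigma, rng) for _ in range(count)]

candidates = make_test_sequences([[1, 2, 3]], [1, 2, 3, 4], 50)
```

Each generated sequence stays within one edit of a normal session, so most of it still follows the protocol's structure.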
If no counterexample is found after a sufficient number of test sequences has been sent, the candidate protocol state machine is considered equivalent to the real protocol state machine within the stated bounds. If a counterexample is found, the candidate protocol state machine fails the equivalence condition; the counterexample information must be added to the observation table and state machine inference continued.
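The comparison itself can be sketched as follows. Both machines are small hand-written dictionaries here; in the described method the `ENTITY` side would be the running protocol entity program, so its representation in this example is purely an assumption.

```python
def run(machine, seq, start=0):
    """Return the output sequence of a Mealy machine for an input sequence."""
    state, outs = start, []
    for sym in seq:
        state, out = machine[state][sym]
        outs.append(out)
    return outs

def find_counterexample(candidate, entity, tests):
    """Return the first test sequence on which the outputs differ, else None."""
    for seq in tests:
        if run(candidate, seq) != run(entity, seq):
            return seq
    return None

# Hypothetical machines: state -> {input: (next state, output)}.
ENTITY = {0: {"1": (1, "10"), "2": (0, "10")},
          1: {"1": (1, "11"), "2": (0, "10")}}
CANDIDATE = {0: {"1": (0, "10"), "2": (0, "10")}}   # too coarse: a single state

cex = find_counterexample(CANDIDATE, ENTITY, [["2"], ["1", "2"], ["1", "1"]])
```

Returning None after the whole test budget has been spent corresponds to accepting the candidate within the stated bounds.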
(8) Extending the observation table according to the counterexample
After a counterexample string is found, the E set of the observation table must be extended with suffixes of the counterexample. The embodiment of the present invention first determines the shortest distinguishing suffix of the counterexample. Let the counterexample be c = u·v, where u ∈ S ∪ S·I and u is the longest prefix of c whose output is the same as the output of the protocol entity program. Correspondingly, v is the shortest suffix of the counterexample that distinguishes the candidate protocol state machine from the real state machine of the protocol entity.
Once v is determined, v and all of its suffixes are added to the E set of the observation table. This ensures that the closedness of the observation table is preserved and also avoids unnecessary extension of the table. For example, if v is 123, then 123, 23, and 3 should all be added to the E set (an element that already exists in E is not added again).
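The suffix handling can be sketched as follows. Splitting the counterexample at the first position where the two output sequences differ is one simple reading of the rule for u and v above; the sketch does not check the additional condition u ∈ S ∪ S·I, and the function names are hypothetical.

```python
def split_counterexample(c, candidate_out, entity_out):
    """Split c into (u, v): u is the longest prefix on which both outputs agree."""
    k = 0
    while k < len(c) and candidate_out[k] == entity_out[k]:
        k += 1
    return tuple(c[:k]), tuple(c[k:])

def extend_E(E, v):
    """Add v and all of its non-empty suffixes to E, skipping existing elements."""
    for i in range(len(v)):
        suffix = tuple(v[i:])
        if suffix not in E:
            E.append(suffix)
    return E

# Example from the text: v = 123 adds 123, 23 and 3 (3 is already present here).
E = extend_E([("1",), ("2",), ("3",)], ("1", "2", "3"))

u, v = split_counterexample(("1", "1"), ["10", "10"], ["10", "11"])
```

After E has grown, every existing row needs values for the new columns, which is why the table is filled again before the closedness check is repeated.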
The observation table is then filled on this basis and the preceding steps are repeated until a state machine equivalent to the real protocol state machine is output.
As the above technical solution shows, the active protocol state machine inference method based on protocol knowledge of the present invention builds on the active state machine inference algorithm LM+. Using the protocol session sample set, it extracts the ordering constraints between protocol messages to filter out invalid queries, and it directly answers queries of the kinds that already occur in the session samples. In addition, a method based on mutating positive samples finds counterexamples to the candidate state machine more effectively, which improves the overall efficiency of active protocol state machine inference. Using the method requires access to the protocol entity program, which must be runnable on demand so that specific message sequences can be sent to it and the corresponding output messages observed as the basis for protocol state machine inference.
In summary, the active protocol state machine inference method based on protocol knowledge of the present invention treats the protocol system as a state transition system with output and uses the input and output messages of the protocol program as the basis for inference. It takes full account of the ordering constraints between message types, which avoids sending the protocol entity program large numbers of invalid messages that violate basic properties of the protocol. During state machine inference, it also makes full use of the protocol session sample set to try to answer query messages directly, which reduces interaction with the protocol entity and improves inference efficiency. Finally, when judging whether the candidate state machine is equivalent to the real state machine, it constructs test message sequences from normal message sequences, exploiting the structural similarity between positive samples and counterexamples to improve the validity of the test sequences and, in turn, the overall accuracy of the inferred state machine.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. A person of ordinary skill in the art to which the invention belongs may make various modifications and variations without departing from the spirit and scope of the invention. The scope of protection of the present invention is therefore defined by the claims.

Claims (4)

1. An active protocol state machine inference method based on protocol knowledge, characterized by comprising the following steps:
(1) message format extraction: a message format extraction method is used to obtain the protocol formats of the input and output messages; messages with the same format are grouped into one class, the class of each message is represented by an abstract symbol, and each session sequence formed by input and output messages is abstracted into a string sequence of abstract symbols;
(2) observation table initialization: the observation table is a triple (S, E, T); the rows of the observation table are indexed by S ∪ S·Σ, where S·Σ = {s·a | s ∈ S, a ∈ Σ}, Σ is the set of abstract symbols used as inputs, and the symbol "·" denotes string concatenation; the columns of the observation table are indexed by E; the function T maps S ∪ S·Σ and E to output strings, which are the values of the corresponding table entries; at initialization, S = {ε} and E = Σ, where ε is the empty string; the corresponding input message sequence is generated for each table entry, and the entry is assigned a value according to the corresponding output messages of the protocol entity;
(3) observation table closedness check: it is determined whether the initialized observation table is closed; if it is not, a row t ∈ S·Σ that violates closedness is found and moved into S, and the S·Σ set of the observation table is extended accordingly; to fill the newly created table entries, query messages are generated and the corresponding output information is obtained, so that the entries are assigned values;
if the table is closed, the method proceeds to step (6) and constructs a candidate protocol state machine corresponding to the observation table;
(4) invalid query sequence filtering: the ordering constraints between message types are extracted from the session sample set and the corresponding filtering rules are formulated; if an input message sequence used as a query is judged invalid, it is filtered out directly; if the input message sequence satisfies the ordering constraints between messages, it proceeds to the next processing step;
(5) direct query response: the session sample set is learned; if the response to an input message sequence used as a query can be inferred directly from the session sample set, the response is given directly and the query is not sent to the protocol entity; if the response message cannot be determined from the message sample set, the query is sent to the protocol entity program for interaction;
(6) candidate state machine construction: once the observation table is closed, a candidate protocol state machine is constructed from it; the protocol state machine is expressed as a Mealy machine, whose input set and output set are filled with the information collected in the message format extraction stage and whose remaining components are filled from the observation table;
(7) equivalence judgment: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences is generated, and the output of the protocol entity for each test sequence is compared with the output of the candidate protocol state machine for the same test sequence; the test sequences are constructed from normal protocol string sequences, which avoids the large number of invalid test samples produced by random generation; a test sequence that yields different outputs is a counterexample, on the basis of which state machine inference is continued;
(8) extending the observation table according to the counterexample: after a counterexample string is found, the E set of the observation table is extended with suffixes of the counterexample and the table is filled on this basis, so that different protocol states are distinguished more completely; steps (3) to (7) are repeated until a state machine equivalent to the real protocol state machine is output.
2. The active protocol state machine inference method based on protocol knowledge according to claim 1, characterized in that the invalid query sequence filtering of step (4) works as follows: the session sample set of input and output messages of the protocol entity program is learned, the ordering constraints between message types are extracted, and the corresponding filtering rules are formulated; if an input message sequence used as a query is judged invalid according to the filtering rules, it is filtered out directly and not sent to the protocol entity program for processing; if the input message sequence satisfies the ordering constraints between messages, it proceeds to the next processing step.
3. The active protocol state machine inference method based on protocol knowledge according to claim 1, characterized in that the direct query response of step (5) works as follows: the protocol session samples formed by abstract string sequences are learned first, and the information of the session sample set is recorded in an extended prefix tree transducer (EPTT) data structure, the EPTT being a multiway tree that can be regarded as an initial state machine in which each branch represents the abstract symbol sequence of one session sample; when an input message sequence arrives as a query, the information in the EPTT structure is first used to try to answer the query without sending it to the protocol entity; if the response message cannot be determined from the message sample set, the query is then sent to the protocol entity program.
4. The active protocol state machine inference method based on protocol knowledge according to claim 1, characterized in that the equivalence judgment of step (7) works as follows: to determine whether the candidate protocol state machine is equivalent to the real protocol state machine, a sufficient number of test sequences is generated; after the test sequences are instantiated as input messages for the protocol entity, the output responses of the protocol entity program to these test sequences are obtained and compared with the outputs of the candidate protocol state machine for the same test sequences, a test sequence for which the two outputs differ being a counterexample;
starting from normal message sequences, the strings are mutated through insertion, replacement, and deletion operations, exploiting the structural similarity between normal message sequences and counterexamples to improve the overall validity of the test sequences;
if no counterexample is found after a sufficient number of test sequences has been sent, the candidate protocol state machine is considered equivalent to the real protocol state machine within the stated bounds; if a counterexample is found, the candidate protocol state machine fails the equivalence condition, and the counterexample information must be added to the observation table and state machine inference continued.
CN201510134335.7A 2015-03-25 2015-03-25 Protocol state machine active estimating method based on protocol knowledge Active CN104767744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510134335.7A CN104767744B (en) 2015-03-25 2015-03-25 Protocol state machine active estimating method based on protocol knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510134335.7A CN104767744B (en) 2015-03-25 2015-03-25 Protocol state machine active estimating method based on protocol knowledge

Publications (2)

Publication Number Publication Date
CN104767744A CN104767744A (en) 2015-07-08
CN104767744B true CN104767744B (en) 2018-05-15

Family

ID=53649351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510134335.7A Active CN104767744B (en) 2015-03-25 2015-03-25 Protocol state machine active estimating method based on protocol knowledge

Country Status (1)

Country Link
CN (1) CN104767744B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446146B (en) * 2018-11-09 2022-02-08 中国科学院长春光学精密机械与物理研究所 State transition sequence generation method of application layer communication protocol
CN110213130A (en) * 2019-06-03 2019-09-06 南京莱克贝尔信息技术有限公司 A kind of industry control protocol format analysis method based on iteration optimization
CN112733155B (en) * 2021-01-28 2024-04-16 中国人民解放军国防科技大学 Software forced safety protection method based on external environment model learning
CN113609344B (en) * 2021-09-29 2022-01-14 北京泰迪熊移动科技有限公司 Method and device for constructing byte stream state machine, electronic equipment and storage medium
CN114172972B (en) * 2021-11-11 2023-08-15 中国工程物理研究院计算机应用研究所 Unknown protocol behavior reverse inference method based on optimized random converter model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102123413A (en) * 2011-03-29 2011-07-13 杭州电子科技大学 Network monitoring and protocol analysis system of wireless sensor network
CN103200203A (en) * 2013-04-24 2013-07-10 中国人民解放军理工大学 Semantic-level protocol format inference method based on execution trace
CN103441990A (en) * 2013-08-09 2013-12-11 中国人民解放军理工大学 Protocol state machine automatic inference method based on state fusion
WO2014169227A1 (en) * 2013-04-12 2014-10-16 Northeastern University Ontology-based waveform reconfiguration
CN104142888A (en) * 2014-07-14 2014-11-12 北京理工大学 Regularization state machine model design method with stateful protocol

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102123413A (en) * 2011-03-29 2011-07-13 杭州电子科技大学 Network monitoring and protocol analysis system of wireless sensor network
WO2014169227A1 (en) * 2013-04-12 2014-10-16 Northeastern University Ontology-based waveform reconfiguration
CN103200203A (en) * 2013-04-24 2013-07-10 中国人民解放军理工大学 Semantic-level protocol format inference method based on execution trace
CN103441990A (en) * 2013-08-09 2013-12-11 中国人民解放军理工大学 Protocol state machine automatic inference method based on state fusion
CN104142888A (en) * 2014-07-14 2014-11-12 北京理工大学 Regularization state machine model design method with stateful protocol

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《Integration testing of components guided by incremental state machine learning》;Keqin Li et al.;《IEEE》;20061031;全文 *
《网络协议状态机逆向工程方法的研究》;刘威;《中国优秀硕士学位论文全文数据库信息科技辑》;20140615(第6期);全文 *

Also Published As

Publication number Publication date
CN104767744A (en) 2015-07-08


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant