CN103825784B - Method and system for identifying fields of a non-public protocol - Google Patents

Method and system for identifying fields of a non-public protocol

Info

Publication number
CN103825784B
Authority
CN
China
Prior art keywords
field
message
identified
dynamic
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410110570.6A
Other languages
Chinese (zh)
Other versions
CN103825784A (en
Inventor
于宏毅
李林林
李青
张效义
林荣强
陶思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University
Priority to CN201410110570.6A
Publication of CN103825784A
Application granted
Publication of CN103825784B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Communication Control (AREA)
  • Computer And Data Communications (AREA)

Abstract

This application discloses a method for identifying fields of a non-public protocol. Binary sequence fields are selected from any one message in a message sample set and matched against known character fields; each binary sequence field that matches is determined to be a character field. The character field is then removed from every message, and the remainder is divided into multiple fields to be identified according to the same preset rule. For the fields to be identified that occupy the same position in all messages, mathematical features are computed; when the features satisfy a given condition the field is determined to be a dynamic field, and otherwise a non-dynamic field. The correlation between each bit of the non-dynamic fields and the character field is then calculated, consecutive bits whose correlation exceeds a first preset value are merged into one field, and the merged field is determined to be a static field. This completes the identification of the character fields, dynamic fields and static fields of the message.

Description

Method and system for identifying fields of a non-public protocol
Technical field
The present application relates to the technical field of reverse analysis of non-public protocols, and more specifically to a method and system for identifying fields of a non-public protocol.
Background art
A protocol is a set of rules, standards and conventions established for exchanging data over a network. Protocols are the core of computer data communication and a major object of study in network security. Many protocols in use today are non-public, that is, their data formats and interaction rules are not disclosed. Protocol reverse analysis is the process of extracting the protocol format and protocol state machine information, without relying on a protocol description, by monitoring and analyzing the network inputs and outputs, system behavior and instruction execution flow of a protocol entity. Message sequence analysis is one of the more common existing approaches to protocol reverse analysis.
However, existing message sequence analysis methods take the byte as the basic unit of analysis, and their analysis of the message format relies on format identification fields in the protocol messages. In practice, some protocols take the bit as the basic unit and have no identification (format) fields: their messages consist entirely of content fields, with no separators between fields and no flag bits within fields. Existing message sequence analysis methods are therefore unsuitable for analyzing such protocols.
Summary of the invention
In view of this, the present application provides a method and system for identifying fields of a non-public protocol, to solve the problem that existing message sequence analysis methods are unsuitable for messages of protocols that are defined bit by bit and carry no identification fields.
To achieve the above object, the following solution is proposed:
A method for identifying fields of a non-public protocol, comprising:
receiving a message sample set composed of messages of the same type;
for any one message in the message sample set, sliding from left to right to select binary sequence fields of a first preset length, and matching the selected binary sequence fields against character fields of known length, the length of the character fields being equal to the first preset length;
determining each binary sequence field that matches successfully to be a character field, the character field being used to state the name of a target;
dividing the part of every message that remains after removing the character field into multiple fields to be identified, according to the same preset rule;
computing mathematical features of the fields to be identified that occupy the same position in all messages;
when the mathematical features satisfy a preset condition, determining the field to be identified to be a dynamic field, and otherwise a non-dynamic field, the dynamic field being used to state dynamic information of the target;
merging consecutive dynamic fields into one field and determining the merged field to be one complete dynamic field;
calculating the correlation between each bit of the non-dynamic fields and the character field, merging consecutive bits whose correlation exceeds a first preset value into one field, and determining the merged field to be a static field, the static field being used to state inherent characteristics of the target.
Preferably, dividing the part of every message that remains after removing the character field into multiple fields to be identified according to the same preset rule is specifically:
selecting the fields to be identified by a sliding-window method: the window length is set to L and the step to 1 bit, and the window slides from left to right over the part of every message that remains after removing the character field, selecting multiple fields to be identified.
Preferably, the fields to be identified that occupy the same position in all messages are specifically:
the fields to be identified that, in each message, have the same starting bit position and the length L.
Preferably, computing the mathematical features of the fields to be identified that occupy the same position in all messages is specifically:
computing, for the fields to be identified that occupy the same position in all messages, the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance.
Preferably, computing the "0"/"1" ratio mean deviation comprises:
computing the probability that each bit of the field to be identified is "0" or "1":
a_j = N_j / m
where a_j is the probability that the j-th bit is "0" or "1", N_j is the number of message samples in which the j-th bit is "0" or "1", m is the total number of message samples in the message sample set, and n is the length of each message sample;
computing from the a_j the "0"/"1" ratio mean deviation x_1 of the field to be identified,
where x_1 takes values in [0, 1].
Preferably, computing the value coverage rate comprises:
the value coverage rate:
x_2 = K_L / 2^L
where K_L is the number of distinct values taken by a field of length L in the messages, K_L ∈ {1, 2, ..., 2^n}, and 2^L is the number of distinct values that any field of length L can take.
Preferably, computing the normalized value-distribution variance comprises:
computing the value-distribution variance from the counts N_i,
where N_i is the number of message samples, among the m message samples, in which the field to be identified takes the value i, i being the value of the field to be identified converted to decimal;
normalizing the value-distribution variance to obtain x_3,
where x_3 takes values in [0, 1].
Preferably, determining the field to be identified to be a dynamic field when the mathematical features satisfy the preset condition, and otherwise a non-dynamic field, is specifically:
weighting the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance and summing them:
Score(L) = ω_1·x_1 + ω_2·x_2 + ω_3·x_3
where ω_i (i = 1, 2, 3) is the weight of x_i (i = 1, 2, 3);
comparing Score(L) with a threshold σ: when Score(L) is greater than σ, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field.
A system for identifying fields of a non-public protocol, comprising:
a message sample set receiving unit, configured to receive a message sample set composed of messages of the same type;
a character matching unit, configured to slide from left to right over any one message in the message sample set to select binary sequence fields of a first preset length, to match the selected binary sequence fields against character fields of known length, the length of the character fields being equal to the first preset length, and then to determine each binary sequence field that matches successfully to be a character field, the character field being used to state the name of a target;
a field selection unit, configured to divide the part of every message that remains after removing the character field into multiple fields to be identified, according to the same preset rule;
a dynamic field determining unit, configured to compute mathematical features of the fields to be identified that occupy the same position in all messages, to judge whether the mathematical features satisfy a preset condition, to determine the field to be identified to be a dynamic field when they do and a non-dynamic field otherwise, the dynamic field being used to state dynamic information of the target, and finally to merge consecutive dynamic fields into one field and determine the merged field to be one complete dynamic field;
a static field determining unit, configured to calculate the correlation between each bit of the non-dynamic fields and the character field, to merge consecutive bits whose correlation exceeds a first preset value into one field, and to determine the merged field to be a static field, the static field being used to state inherent characteristics of the target.
Preferably, the dynamic field determining unit comprises a feature statistics unit, a feature judging unit and a merging unit, wherein:
the feature statistics unit is configured to compute the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance of the field to be identified;
the feature judging unit is configured to judge whether the weighted sum of the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance is greater than a threshold, and to determine the field to be identified to be a dynamic field when it is and a non-dynamic field otherwise, the dynamic field being used to state dynamic information of the target;
the merging unit is configured to merge consecutive dynamic fields into one field and to determine the merged field to be one complete dynamic field.
As can be seen from the above technical solution, in the method for identifying fields of a non-public protocol disclosed in the present application, binary sequence fields of a first preset length are selected by sliding over any one message in the message sample set and are matched against character fields of known length, the length of the character fields being equal to the first preset length; each binary sequence field that matches successfully is determined to be a character field. The character field is then removed from every message and the remainder is divided into multiple fields to be identified according to the same preset rule. For the fields to be identified that occupy the same position in all messages, mathematical features are computed; when the features satisfy a given condition the field is determined to be a dynamic field, and otherwise a non-dynamic field. Consecutive dynamic fields are merged into one field, and the merged field is determined to be one complete dynamic field, which states dynamic information of the target. Because static fields all describe static information about the same target, they are strongly correlated with one another; and because the character field states the name of the target, it can in a sense also be regarded as a static field. The correlation between each bit of the non-dynamic fields and the character field is therefore calculated, consecutive bits whose correlation exceeds the first preset value are merged into one field, and the merged field is determined to be a static field. This completes the identification of the character fields, dynamic fields and static fields of the message.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for identifying fields of a non-public protocol disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of selecting binary sequence fields of a message by sliding, disclosed in an embodiment of the present application;
Fig. 3 is a structural diagram of a system for identifying fields of a non-public protocol disclosed in an embodiment of the present application;
Fig. 4 is a structural diagram of the dynamic field determining unit disclosed in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
Traditional message sequence analysis methods take the byte as the basic unit of analysis and rely on format identification fields in the protocol messages to analyze the message format. They cannot analyze protocols that take the bit as the basic unit and have no identification fields. For such protocols, the present application proposes an analysis method based on statistical features. According to the application background of such protocols, the method first divides the fields of a message into three main classes: character fields, dynamic fields and static fields. It then obtains the final field format of the message by character matching and by analyzing the mathematical features of the three classes of fields. The embodiments are described below.
Embodiment one
Referring to Fig. 1, Fig. 1 is a flow chart of a method for identifying fields of a non-public protocol disclosed in an embodiment of the present application.
As shown in Fig. 1, the method includes:
Step 101: receive a message sample set composed of messages of the same type.
Specifically, messages of the same type are messages that have the same format. For example:
In the message "...00110110...", the first three bits indicate that the name of the target is A and the last five bits indicate that the speed of the target is V; in the message "...10110010...", the first three bits indicate that the name of the target is B and the last five bits indicate that the speed of the target is M. Although the two messages represent different target names and speeds, their format is the same: in both, the first three bits represent the name of the target and the last five bits represent its speed.
What is received here is therefore a message sample set composed of messages of the same type.
Step 102: for any one message in the message sample set, slide from left to right to select binary sequence fields of the first preset length, and match the selected binary sequence fields against character fields of known length.
Specifically, the length of the character fields is equal to the first preset length. The length and content of a number of character fields are known. Binary sequence fields of the message are selected by sliding from left to right with a window equal to the character field length, and each selected binary sequence field is matched against the character fields. For example:
Referring to Fig. 2, Fig. 2 is a schematic diagram of selecting binary sequence fields of a message by sliding, disclosed in an embodiment of the present application.
As shown in Fig. 2, assume that the known character fields are 6 bits long. Binary sequence fields are then selected by sliding from left to right, taking 6 bits as one unit and 1 bit as the step. Each selected field is matched in turn against the known character fields.
Step 103: determine each binary sequence field that matches successfully to be a character field, the character field being used to state the name of the target.
Specifically, if matching in the above manner shows that one of the selected binary sequence fields matches a known character field exactly, that binary sequence field is a character field and states the name of the target.
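Steps 102 and 103 can be sketched in Python as follows. This is only a minimal illustration: messages are assumed to be strings of "0" and "1", and the function names, the 6-bit character fields and the sample message are all hypothetical values chosen for the example, not data from the application.

```python
def slide_windows(bits, length, step=1):
    """Yield (start, window) for every window of `length` bits, left to right."""
    for start in range(0, len(bits) - length + 1, step):
        yield start, bits[start:start + length]


def find_character_fields(message, known_fields, length):
    """Return (start, field) for each window that exactly equals a known character field."""
    return [(start, window)
            for start, window in slide_windows(message, length)
            if window in known_fields]


# Hypothetical 6-bit character fields and a made-up 14-bit message.
known = {"101101", "010011"}
message = "00101101110010"
matches = find_character_fields(message, known, 6)
print(matches)  # start offsets are 0-based
```

Exact string equality is used for the match, which corresponds to the "matches exactly" condition of step 103; offsets in the sketch are 0-based, whereas the text counts bits from 1.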
Step 104: divide the part of every message that remains after removing the character field into multiple fields to be identified, according to the same preset rule.
Specifically, because all messages are of the same type, once the character field has been identified in any one message, the character fields of the other messages can be identified from it. The character field is then removed from the binary sequence of every message, and the remainder is divided into multiple fields to be identified according to the same preset rule. The preset rule is not limited here; it may be the method shown in Fig. 2 or some other method.
Step 105: compute mathematical features of the fields to be identified that occupy the same position in all messages.
Specifically, because all messages are divided according to the same rule, every message contains a field to be identified at each given position. For example, with the division of Fig. 2, in which every six bits form one field to be identified, every message contains a field to be identified made up of bits 1 to 6; these are fields to be identified at the same position. Likewise, every message contains a field to be identified made up of bits 2 to 7, which are also fields to be identified at the same position. In general, fields at the same position are fields to be identified that have the same starting bit position and the same length. Mathematical features are computed for the fields to be identified that occupy the same position in all messages.
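The grouping of same-position fields described in steps 104 and 105 might look like the sketch below. It assumes the character field has already been removed from each message, that messages are "0"/"1" strings, and that the sliding-window rule of Fig. 2 is the preset rule; `fields_by_position` and the three sample messages are hypothetical.

```python
from collections import defaultdict


def fields_by_position(messages, window_len, step=1):
    """Map each 0-based start bit to the window values found there, one per message."""
    groups = defaultdict(list)
    for msg in messages:
        for start in range(0, len(msg) - window_len + 1, step):
            groups[start].append(msg[start:start + window_len])
    return dict(groups)


# Made-up 8-bit remainders of three messages of the same type.
messages = ["00110110", "10110010", "01010111"]
groups = fields_by_position(messages, window_len=6)
for start, values in groups.items():
    print(start, values)
```

Each list in `groups` is one set of same-position fields to be identified, which is the unit on which the mathematical features are computed.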
Step 106: when the mathematical features satisfy the preset condition, determine the field to be identified to be a dynamic field, and otherwise a non-dynamic field, the dynamic field being used to state dynamic information of the target.
Specifically, a dynamic field is characterized by changing over time, so the possible values of each of its bits occur with comparable probability. The mathematical features are therefore computed, and when they satisfy a certain preset condition the field to be identified is considered a dynamic field; otherwise it is a non-dynamic field. A dynamic field states dynamic information of the target, for example the speed or the position of the target.
Step 107: merge consecutive dynamic fields into one field, and determine the merged field to be one complete dynamic field.
Specifically, the starting position and length of a dynamic field in the message are not known in advance, so the length of a field to be identified is not necessarily the length of the true dynamic field. The true dynamic field may be long while the fields obtained by division are short. Consecutive dynamic fields therefore need to be merged into one complete dynamic field; only the merged field is the true dynamic field.
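The merge in step 107 can be illustrated with a small interval-merging sketch. The text does not spell out what counts as "consecutive"; the sketch assumes it means windows that overlap or touch, which is an interpretation rather than something stated in the application. The function name and the sample start positions are hypothetical.

```python
def merge_dynamic_windows(starts, window_len):
    """Merge windows [s, s + window_len) that overlap or touch into (start, end) spans.

    `starts` are the 0-based start bits of windows judged dynamic; `end` is exclusive.
    """
    spans = []
    for s in sorted(starts):
        end = s + window_len
        if spans and s <= spans[-1][1]:
            # Window continues the previous span: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((s, end))
    return spans


spans = merge_dynamic_windows([3, 4, 5, 12, 13], window_len=4)
print(spans)
```

With a 1-bit step, neighbouring dynamic windows always overlap, so under this assumption each run of adjacent start positions collapses into one complete dynamic field.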
Step 108: calculate the correlation between each bit of the non-dynamic fields and the character field, merge consecutive bits whose correlation exceeds the first preset value into one field, and determine the merged field to be a static field, the static field being used to state inherent characteristics of the target.
Specifically, after the preceding steps the character fields and dynamic fields have all been determined. The remaining part contains static fields and other fields; the other fields are not of interest here, so only the static fields need to be identified. Static fields essentially describe static information about the same target, so this information is strongly correlated across the protocol message sample set. The character field determined earlier can likewise be regarded as a static attribute of the target. The correlation between each bit of the non-dynamic fields and the character field can therefore be calculated; consecutive bits whose correlation exceeds the first preset value are merged into one field, and the merged field is determined to be a static field, which states inherent characteristics of the target.
This completes the identification of the character fields, dynamic fields and static fields of a bit-oriented protocol message that has no identification fields.
In the method for identifying fields of a non-public protocol disclosed in this embodiment, binary sequence fields of a first preset length are selected by sliding over any one message in the message sample set and are matched against character fields of known length, the length of the character fields being equal to the first preset length; each binary sequence field that matches successfully is determined to be a character field. The character field is then removed from every message and the remainder is divided into multiple fields to be identified according to the same preset rule. For the fields to be identified that occupy the same position in all messages, mathematical features are computed; when the features satisfy a given condition the field is determined to be a dynamic field, and otherwise a non-dynamic field. Consecutive dynamic fields are merged into one field, and the merged field is determined to be one complete dynamic field, which states dynamic information of the target. Because static fields all describe static information about the same target, they are strongly correlated with one another; and because the character field states the name of the target, it can in a sense also be regarded as a static field. The correlation between each bit of the non-dynamic fields and the character field is therefore calculated, consecutive bits whose correlation exceeds the first preset value are merged into one field, and the merged field is determined to be a static field. This completes the identification of the character fields, dynamic fields and static fields of the message.
It should be noted that the above division of the part of a message that remains after removing the character field into multiple fields to be identified according to the same preset rule can use a sliding-window method: the window length is set to L and the window slides from left to right with a step of 1 bit, selecting multiple fields to be identified. The step can also be set to other values, depending on the actual situation.
If the fields to be identified are selected by this sliding-window method, the fields to be identified at the same position are those that, in each message, have the same starting bit position and a length equal to the window length L.
Embodiment two
This embodiment further explains the mathematical features computed during the identification of dynamic fields.
For the fields to be identified that occupy the same position in all messages, the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance are computed.
(1) The "0"/"1" ratio mean deviation is computed as follows:
The "0"/"1" probability is computed first. It is the probability that each bit is "0" or "1"; the probabilities of "0" and of "1" sum to 1, that is, p(0) + p(1) = 1.
a_j = N_j / m
where a_j is the probability that the j-th bit is "0" or "1", N_j is the number of message samples in which the j-th bit is "0" or "1", m is the total number of message samples in the message sample set, and n is the length of each message sample.
The "0"/"1" ratio mean deviation x_1 of the field to be identified is then computed from the a_j.
x_1 takes values in [0, 1]. When a_j equals 0.5, that is, when "0" and "1" are equally probable in every bit, x_1 attains its maximum, 1; when a_j equals 0 or 1, that is, when every bit is "0" or "1" with probability 1, x_1 attains its minimum, 0.
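The sketch below illustrates one way to compute a statistic with the properties stated for x_1. The per-bit probability a_j is taken as the fraction of samples in which bit j is "1". The expression used for x_1, namely 1 - (2/L)·Σ|a_j - 0.5|, is an assumption: it was chosen only because it equals 1 when every a_j is 0.5 and 0 when every a_j is 0 or 1, and it should not be read as the application's exact formula. Function names and sample data are hypothetical.

```python
def bit_one_probabilities(fields):
    """a_j: fraction of the m same-position fields whose j-th bit is "1"."""
    m = len(fields)
    length = len(fields[0])
    return [sum(f[j] == "1" for f in fields) / m for j in range(length)]


def ratio_mean_deviation(fields):
    """Assumed form of x_1: 1 - (2 / L) * sum(|a_j - 0.5|), which lies in [0, 1]."""
    probs = bit_one_probabilities(fields)
    return 1.0 - 2.0 * sum(abs(a - 0.5) for a in probs) / len(probs)


balanced = ["01", "10", "00", "11"]   # each bit is "1" in half of the samples
constant = ["01", "01", "01", "01"]   # every bit is fixed across samples
print(ratio_mean_deviation(balanced), ratio_mean_deviation(constant))
```

A field whose bits flip freely across samples scores near 1, and a field whose bits never change scores 0, matching the boundary behaviour described above.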
(2) The value coverage rate is computed as follows:
Any field of window length L can take 2^L different values. The value coverage rate is the ratio of the number of distinct values taken by the field of length L in the messages to 2^L:
x_2 = K_L / 2^L
where K_L ∈ {1, 2, ..., 2^n}.
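The value coverage rate follows directly from the description: count the distinct values seen at one position and divide by 2^L. A minimal sketch, with a hypothetical function name and made-up 2-bit fields:

```python
def value_coverage(fields):
    """x_2 = K_L / 2**L for the same-position fields of length L."""
    length = len(fields[0])
    distinct = len(set(fields))   # K_L: number of distinct values observed
    return distinct / 2 ** length


fields = ["00", "01", "01", "11"]
print(value_coverage(fields))
```

Here three of the four possible 2-bit values occur, so the coverage is 3/4. Note that K_L can never exceed the number of samples m, so long windows necessarily have low coverage when the sample set is small.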
(3) The normalized value-distribution variance is computed as follows:
The distribution of the values of the field to be identified is computed first, a value here being the binary value converted to decimal. The value-distribution variance is computed from the counts N_i,
where N_i is the number of message samples, among the m message samples, in which the field to be identified takes the value i, i being the value of the field to be identified converted to decimal.
The value-distribution variance is then normalized to give x_3.
x_3 takes values in [0, 1].
After the mathematical features have been computed, they are used to decide whether a field is dynamic, as follows:
For each of the three mathematical features above, a larger value indicates that the field to be identified is more likely to be a dynamic field. To use the three features together, the "0"/"1" ratio mean deviation, the value coverage rate and the normalized value-distribution variance are weighted and summed:
Score(L) = ω_1·x_1 + ω_2·x_2 + ω_3·x_3
where ω_i (i = 1, 2, 3) is the weight of x_i (i = 1, 2, 3).
Score(L) is compared with a threshold σ: when Score(L) is greater than σ, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field. The value of σ depends on the protocol data, so no universally optimal value can be given here. From experience it is generally taken as 0.5; the exact value should be adjusted according to the application background of the protocol.
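The weighted decision can be sketched as below. The threshold default of 0.5 comes from the text; the equal default weights are an assumption made only for illustration, since the text does not give weight values. The function names are hypothetical, and the three features are passed in as already-computed numbers.

```python
def dynamic_score(x1, x2, x3, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Score(L) as the weighted sum of the three features (weights are assumed)."""
    w1, w2, w3 = weights
    return w1 * x1 + w2 * x2 + w3 * x3


def is_dynamic(x1, x2, x3, weights=(1 / 3, 1 / 3, 1 / 3), sigma=0.5):
    """A field is judged dynamic when Score(L) is strictly greater than sigma."""
    return dynamic_score(x1, x2, x3, weights) > sigma


print(is_dynamic(0.9, 0.8, 0.7))   # high feature values
print(is_dynamic(0.1, 0.2, 0.1))   # low feature values
```

Because each feature lies in [0, 1], weights that sum to 1 keep the score in [0, 1] as well, which makes a threshold such as 0.5 easy to interpret.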
Embodiment three
This embodiment describes how the window length is set when the fields to be identified are selected by the sliding-window method.
The window length L should be neither too small nor too large. If L is too large, the field to be identified easily extends beyond the dynamic field, and the dynamic field may then be judged non-dynamic. Conversely, if L is too small, the range over which the value of the field to be identified can vary is too small, and a non-dynamic field is easily misjudged as a dynamic field.
The value coverage rate x_2 can be used to determine L:
Starting from 1, L is increased step by step until the value coverage rate of every field to be identified is less than 1. The value of L at that point is called L_max, and the value of L is then set from L_max.
Experiments verify that the recognition rate is highest when L is chosen in this way.
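The search for L_max described above can be sketched as follows: increase the window length from 1 and stop at the first length for which no position reaches a coverage of 1. The sketch stops at L_max and does not attempt the final choice of L. Messages are assumed to be "0"/"1" strings with the character field already removed; `find_l_max` and the sample data are hypothetical.

```python
def find_l_max(messages):
    """Return the smallest window length at which no position has full value coverage."""
    n = min(len(msg) for msg in messages)
    for window_len in range(1, n + 1):
        fully_covered = any(
            len({msg[start:start + window_len] for msg in messages}) == 2 ** window_len
            for start in range(n - window_len + 1)
        )
        if not fully_covered:
            return window_len
    return None  # every length up to n still had a fully covered position


samples = ["0000", "0101", "1010", "1111"]
l_max = find_l_max(samples)
print(l_max)
```

In this toy sample set, windows of length 1 and 2 still see every possible value at some position, while no 3-bit window does, so the search stops at 3.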
Example IV
For the identification process of static fields, we do description below:
Static fields mainly carry the static information of target, such as AIS(Automatic Identification System, Ship automatic identification system)Ship IMO numberings in protocol massages(IMO, International Maritime Organization, International Maritime Organization, ship IMO numberings are that International Maritime Organization is the number that each ship is compiled), captain, the beam Deng.Static fields in same message all describe the static information of same target substantially, therefore in protocol massages sample set This type of information shows very strong correlation.We need to recognize static fields according to the correlation between static information. The character field identified in the first stage in character match can equally regard a kind of static attribute of target as, in this stage Given information is considered as to use.By character field(Such as name of vessel)As benchmark field, by calculating other fields and benchmark The correlation of field identifies other static fields, if a certain field is related to character field each bit of this field Position is all associated, simultaneously because length and the position of field to be identified are not can determine that, so bitwise carrying out here Correlation analysis, most continuous related bits position merges composition static fields at last.
The protocol message sample set is first clustered by the benchmark field: messages with the same benchmark field are grouped into one class, giving $n_c$ groups in total. Then, within each group, the proportions of "0" and "1" at each bit position are calculated, and the larger of the two is taken as the correlation between that bit position and the benchmark field in the group. Finally, the correlation of each bit position is averaged over all groups to give the correlation between that bit and the benchmark field for this type of message:
$$p_{ij}^{0} = \frac{n_{ij}^{0}}{n_i}, \qquad p_{ij}^{1} = \frac{n_{ij}^{1}}{n_i}$$
$$\delta_{ij} = \max\left(p_{ij}^{0},\ p_{ij}^{1}\right)$$
$$\delta_j = \frac{1}{n_c} \sum_{i=1}^{n_c} \delta_{ij}$$
where $p_{ij}^{0}$ and $p_{ij}^{1}$ are the proportions of "0" and "1" at the j-th bit position in the i-th group of messages; $n_{ij}^{0}$ and $n_{ij}^{1}$ are the numbers of messages in the i-th group whose j-th bit is "0" and "1", respectively; and $n_i$ is the number of messages in the i-th group.
$\delta_{ij}$ is the correlation between the j-th bit position and the benchmark field in the i-th group of messages.
$\delta_j$ is the correlation between the j-th bit and the benchmark field for this type of message. The larger $\delta_j$ is, the stronger the correlation between the bit and the benchmark field; $\delta_j = 1$ means that the bit is perfectly correlated with the benchmark field and therefore belongs to a static field. Adjacent bits that are correlated with the benchmark field are finally combined into static fields. The decision threshold for $\delta_j$ can be adjusted according to the purity of the sample set: the purer the sample set, the larger the threshold should be. Empirically, the threshold is typically set to 0.9.
In addition, every type of message contains some bits that are rarely used, mainly because margin was left when the protocol was designed. For example, the most significant bit of the Speed field is rarely nonzero. Such bits interfere with the identification of static fields, so bits in which "0" or "1" occurs with a frequency above 95% can be excluded during static field identification.
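The bit-by-bit correlation analysis described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes that messages are equal-length strings of "0" and "1", that the benchmark (character) field sits at a known offset, and that every bit outside the benchmark field is a candidate (in the full method only bits of non-dynamic fields would be examined). The function and parameter names are hypothetical; the 0.9 threshold and the 95% exclusion limit are the values mentioned in the text.

```python
from collections import defaultdict


def bit_correlations(messages, ref_start, ref_len):
    """Return delta_j for every bit position j of equal-length bit strings."""
    # Cluster messages by the value of the benchmark field.
    groups = defaultdict(list)
    for msg in messages:
        groups[msg[ref_start:ref_start + ref_len]].append(msg)

    n_bits = len(messages[0])
    deltas = [0.0] * n_bits
    for group in groups.values():
        n_i = len(group)
        for j in range(n_bits):
            ones = sum(1 for msg in group if msg[j] == "1")
            # delta_ij is the larger of the "0" and "1" proportions.
            deltas[j] += max(ones, n_i - ones) / n_i
    # delta_j is the plain average of delta_ij over the n_c groups.
    return [d / len(groups) for d in deltas]


def rarely_used_bits(messages, limit=0.95):
    """Bit positions where "0" or "1" occurs in more than `limit` of all messages."""
    m = len(messages)
    rare = set()
    for j in range(len(messages[0])):
        ones = sum(1 for msg in messages if msg[j] == "1")
        if max(ones, m - ones) / m > limit:
            rare.add(j)
    return rare


def static_fields(messages, ref_start, ref_len, threshold=0.9, limit=0.95):
    """Merge contiguous correlated bits into (start, end) ranges, end exclusive."""
    deltas = bit_correlations(messages, ref_start, ref_len)
    skip = rarely_used_bits(messages, limit)
    skip |= set(range(ref_start, ref_start + ref_len))  # the benchmark field itself

    fields, start = [], None
    for j, delta in enumerate(deltas):
        correlated = j not in skip and delta > threshold
        if correlated and start is None:
            start = j
        elif not correlated and start is not None:
            fields.append((start, j))
            start = None
    if start is not None:
        fields.append((start, len(deltas)))
    return fields
```

For example, with a 2-bit benchmark field at offset 0, a call such as `static_fields(messages, 0, 2)` returns the bit ranges whose values are fixed whenever the benchmark field is fixed, excluding bits that almost never change across the whole sample set.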
Embodiment five
Fig. 3 is a structural diagram of a non-public protocol field identification system disclosed in an embodiment of the present application.
As shown in Fig. 3, the system includes:
A message sample set receiving unit 31, configured to receive a message sample set composed of messages of the same type;
A character matching unit 32, configured to slide over any one message in the message sample set from left to right, selecting binary sequence fields of a first preset length, and to match the resulting binary sequence fields against a character field of known length, where the length of the character field equals the first preset length; a binary sequence field that matches successfully is then determined to be the character field, and the character field expresses the target name;
A field selection unit 33, configured to divide the part of each message that remains after the character field is removed into multiple fields to be identified according to the same preset rule;
A dynamic field determining unit 34, configured to compute mathematical features of the fields to be identified that are at the same position in all messages and to judge whether the mathematical features satisfy a preset condition; if they do, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field, where a dynamic field expresses dynamic information of the target; finally, consecutive dynamic fields are merged into one field, and the merged field is determined to be one complete dynamic field;
A static field determining unit 35, configured to calculate the correlation between each bit of the non-dynamic fields and the character field, to merge consecutive bits whose correlation is greater than a first preset value into one field, and to determine the merged field to be a static field, where a static field expresses inherent characteristics of the target.
In the non-public protocol field identification system disclosed in this embodiment, the character matching unit 32 slides over any one message in the message sample set, selecting binary sequence fields of the first preset length, and matches the selected binary sequence fields against a character field of known length, where the length of the character field equals the first preset length. A binary sequence field that matches successfully is determined to be the character field. The field selection unit 33 then removes the character field from every message and divides the remainder into multiple fields to be identified according to the same preset rule. The dynamic field determining unit 34 computes mathematical features of the fields to be identified that are at the same position in all messages; when the mathematical features satisfy a given condition, the field is determined to be a dynamic field, and otherwise a non-dynamic field. Consecutive dynamic fields are merged into one field, and the merged field is determined to be one complete dynamic field, which expresses dynamic information of the target. Because static fields all describe static information of the same target, they are strongly correlated with one another, and the character field, which expresses the name of the target, can in a sense also be regarded as a static field. The static field determining unit 35 therefore calculates the correlation between each bit of the non-dynamic fields and the character field, merges consecutive bits whose correlation is greater than the first preset value into one field, and determines the merged field to be a static field. This completes the identification of the character field, the dynamic fields, and the static fields of the message.
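The character matching step performed by unit 32 can be sketched in Python as follows. This is only an illustration under stated assumptions: messages are strings of "0" and "1", the known character fields are supplied as bit strings of one common length (the first preset length), and the 8-bit ASCII encoding in `encode_ascii_bits` is a stand-in chosen for the example, since a real protocol such as AIS defines its own character encoding. All function names are hypothetical.

```python
def encode_ascii_bits(text):
    """Encode text as a bit string, 8 bits per character (assumed encoding)."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def find_character_field(message, known_fields):
    """Slide a window over `message` from left to right, one bit at a time.

    Returns (start, length) of the first window that equals one of the
    known character fields, or None if no window matches.
    """
    known = set(known_fields)
    lengths = {len(field) for field in known}
    if len(lengths) != 1:
        raise ValueError("known character fields must share one length")
    window = lengths.pop()  # the first preset length

    for start in range(len(message) - window + 1):
        if message[start:start + window] in known:
            return start, window
    return None


def strip_field(message, start, length):
    """Remove the matched character field, leaving the part still to be divided."""
    return message[:start] + message[start + length:]
```

The position returned by `find_character_field` for one message can then be passed to `strip_field` for every message in the sample set, which yields the remainders that the field selection unit divides into fields to be identified.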
It should be noted that Fig. 4 is a structural diagram of the dynamic field determining unit disclosed in an embodiment of the present application. As shown in Fig. 4, the dynamic field determining unit 34 includes a feature statistics unit 341, a feature judging unit 342, and a merging unit 343, where:
The feature statistics unit 341 is configured to compute the "0"/"1" ratio average deviation, the numerical coverage rate, and the normalized numeric distribution variance of the field to be identified;
The feature judging unit 342 is configured to judge whether the weighted sum of the "0"/"1" ratio average deviation, the numerical coverage rate, and the normalized numeric distribution variance is greater than a threshold; if it is, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field, where a dynamic field expresses dynamic information of the target;
The merging unit 343 is configured to merge consecutive dynamic fields into one field and to determine the merged field to be one complete dynamic field.
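The three sub-units can be illustrated with the following Python sketch. It should be read as one possible interpretation and not as the patented formulas: the numerical coverage rate follows the textual definition (number of observed values divided by 2^L), but the exact expressions used here for the "0"/"1" ratio average deviation x1 and the normalized numeric distribution variance x3 are assumptions chosen so that both lie in [0, 1] and grow as the field looks more random. The equal weights, the threshold `sigma`, and all function names are likewise hypothetical.

```python
from collections import Counter


def dynamic_score(values, length, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three features for one field position.

    `values` holds the field, as a bit string of `length` bits, taken from
    the same position in every message.
    """
    m = len(values)
    n_values = 2 ** length

    # x1 (assumed form): 1 minus the mean distance of each bit's
    # "1" probability a_j from 0.5, scaled to [0, 1].
    deviation = 0.0
    for j in range(length):
        a_j = sum(1 for v in values if v[j] == "1") / m
        deviation += abs(a_j - 0.5)
    x1 = 1.0 - 2.0 * deviation / length

    # x2: numerical coverage rate, observed distinct values over 2^L.
    counts = Counter(values)
    x2 = len(counts) / n_values

    # x3 (assumed form): 1 minus the variance of the value frequencies,
    # divided by the largest possible variance (all samples identical).
    sum_sq = sum((c / m) ** 2 for c in counts.values())
    x3 = 1.0 - (sum_sq - 1.0 / n_values) / (1.0 - 1.0 / n_values)

    w1, w2, w3 = weights
    return w1 * x1 + w2 * x2 + w3 * x3


def dynamic_fields(messages, length, sigma=0.6):
    """Slide a window of `length` bits in 1-bit steps and merge dynamic windows."""
    n_bits = len(messages[0])
    merged = []
    for start in range(n_bits - length + 1):
        values = [msg[start:start + length] for msg in messages]
        if dynamic_score(values, length) > sigma:
            end = start + length
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent dynamic windows form one field.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
    return merged
```

The messages passed to `dynamic_fields` are assumed to have had the character field removed already. The returned ranges use exclusive end indices, and every bit outside them is treated as non-dynamic and handed to the static field analysis.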
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", and any variants of them, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A non-public protocol field identification method, characterized by comprising:
receiving a message sample set composed of messages of the same type;
sliding over any one message in the message sample set from left to right to select binary sequence fields of a first preset length, and matching the multiple binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length;
determining a binary sequence field that matches successfully to be the character field, the character field being used to express the target name;
dividing the part of each message that remains after the character field is removed into multiple fields to be identified according to the same preset rule;
computing mathematical features of the fields to be identified that are at the same position in all messages;
when the mathematical features satisfy a preset condition, determining the field to be identified to be a dynamic field, and otherwise a non-dynamic field, the dynamic field being used to express dynamic information of the target;
merging consecutive dynamic fields into one field, and determining the merged field to be one complete dynamic field;
calculating the correlation between each bit of the non-dynamic fields and the character field, merging consecutive bits whose correlation is greater than a first preset value into one field, and determining the merged field to be a static field, the static field being used to express inherent characteristics of the target.
2. The method according to claim 1, characterized in that dividing the part of each message that remains after the character field is removed into multiple fields to be identified according to the same preset rule is specifically:
selecting fields to be identified with a sliding window method, in which the window length is set to L and the step is 1 bit, and the window slides from left to right over the part of each message that remains after the character field is removed, selecting multiple fields to be identified in turn.
3. The method according to claim 2, characterized in that the fields to be identified that are at the same position in all messages are specifically:
the fields to be identified that, in each message, have the same starting bit position and the length L.
4. The method according to claim 3, characterized in that computing mathematical features of the fields to be identified that are at the same position in all messages is specifically:
computing, for the fields to be identified that are at the same position in all messages, the "0"/"1" ratio average deviation, the numerical coverage rate, and the normalized numeric distribution variance.
5. The method according to claim 4, characterized in that computing the "0"/"1" ratio average deviation comprises:
computing the probability that each bit in the field to be identified is "0" or "1":
$$a_j = \frac{N_j}{m}$$
where $a_j$ is the probability that the j-th bit is "0" or "1", $N_j$ is the number of message samples whose j-th bit is "0" or "1", $m$ is the total number of message samples in the message sample set, and $n$ is the length of each message sample;
computing the "0"/"1" ratio average deviation $x_1$ of the field to be identified from the probabilities $a_j$,
where $x_1$ takes values in [0, 1].
6. The method according to claim 4, characterized in that computing the numerical coverage rate comprises:
computing the numerical coverage rate:
$$x_2 = \frac{K_L}{2^L}$$
where $K_L$ is the number of distinct values taken by a field of length L in the messages, $K_L \in \{1, 2, \ldots, 2^L\}$, and $2^L$ is the number of distinct values that any field of length L in the messages can take.
7. The method according to claim 4, characterized in that computing the normalized numeric distribution variance comprises:
computing the numeric distribution variance of the field to be identified from the quantities $N_i$,
where $N_i$ is the number of message samples, among the m message samples, in which the field to be identified has the value i, i being the value of the field to be identified converted to decimal, and $2^L$ is the number of distinct values that any field of length L in the messages can take;
normalizing the numeric distribution variance to obtain the normalized numeric distribution variance $x_3$,
where $x_3$ takes values in [0, 1].
8. The method according to claim 4, characterized in that determining the field to be identified to be a dynamic field when the mathematical features satisfy a preset condition, and otherwise a non-dynamic field, is specifically:
weighting the "0"/"1" ratio average deviation $x_1$, the numerical coverage rate $x_2$, and the normalized numeric distribution variance $x_3$ respectively and summing them:
$$\mathrm{Score}(L) = \sum_{i=1}^{3} \omega_i x_i$$
where $x_1$ takes values in [0, 1], $a_j$ is the probability that the j-th bit is "0" or "1", $N_j$ is the number of message samples whose j-th bit is "0" or "1", $m$ is the total number of message samples in the message sample set, and $n$ is the length of each message sample;
$K_L$ is the number of distinct values taken by a field of length L in the messages, $K_L \in \{1, 2, \ldots, 2^L\}$, and $2^L$ is the number of distinct values that any field of length L in the messages can take;
$x_3$ takes values in [0, 1], where $N_i$ is the number of message samples, among the m message samples, in which the field to be identified has the value i, i being the value of the field to be identified converted to decimal;
and $\omega_i$ (i = 1, 2, 3) is the weight of $x_i$ (i = 1, 2, 3);
comparing Score(L) with a threshold $\sigma$, and, when Score(L) is greater than $\sigma$, determining the field to be identified to be a dynamic field, and otherwise a non-dynamic field.
9. A non-public protocol field identification system, characterized by comprising:
a message sample set receiving unit, configured to receive a message sample set composed of messages of the same type;
a character matching unit, configured to slide over any one message in the message sample set from left to right, selecting binary sequence fields of a first preset length, and to match the multiple binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length, and then to determine a binary sequence field that matches successfully to be the character field, the character field being used to express the target name;
a field selection unit, configured to divide the part of each message that remains after the character field is removed into multiple fields to be identified according to the same preset rule;
a dynamic field determining unit, configured to compute mathematical features of the fields to be identified that are at the same position in all messages and to judge whether the mathematical features satisfy a preset condition, the field to be identified being determined to be a dynamic field if they do and a non-dynamic field otherwise, the dynamic field being used to express dynamic information of the target, and finally to merge consecutive dynamic fields into one field and determine the merged field to be one complete dynamic field;
a static field determining unit, configured to calculate the correlation between each bit of the non-dynamic fields and the character field, to merge consecutive bits whose correlation is greater than a first preset value into one field, and to determine the merged field to be a static field, the static field being used to express inherent characteristics of the target.
10. The system according to claim 9, characterized in that the dynamic field determining unit comprises a feature statistics unit, a feature judging unit, and a merging unit, wherein:
the feature statistics unit is configured to compute the "0"/"1" ratio average deviation, the numerical coverage rate, and the normalized numeric distribution variance of the field to be identified;
the feature judging unit is configured to judge whether the weighted sum of the "0"/"1" ratio average deviation, the numerical coverage rate, and the normalized numeric distribution variance is greater than a threshold, the field to be identified being determined to be a dynamic field if it is and a non-dynamic field otherwise, the dynamic field being used to express dynamic information of the target;
the merging unit is configured to merge consecutive dynamic fields into one field and to determine the merged field to be one complete dynamic field.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410110570.6A CN103825784B (en) 2014-03-24 2014-03-24 A kind of non-public protocol fields recognition methods and system

Publications (2)

Publication Number Publication Date
CN103825784A CN103825784A (en) 2014-05-28
CN103825784B true CN103825784B (en) 2017-08-08

Legal Events

Code Title
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 2017-08-08)