CN103825784A - Non-public protocol field identification method and system - Google Patents


Info

Publication number
CN103825784A
Authority
CN
China
Prior art keywords
field
identified
message
dynamic
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410110570.6A
Other languages
Chinese (zh)
Other versions
CN103825784B (en)
Inventor
于宏毅
李林林
李青
张效义
林荣强
陶思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University
Priority to CN201410110570.6A
Publication of CN103825784A
Application granted
Publication of CN103825784B
Legal status: Expired - Fee Related

Landscapes

  • Communication Control (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a non-public protocol field identification method. For any one message in a message sample set, binary sequence fields are selected and matched against known character fields, and a successfully matched binary sequence field is determined to be a character field. The character field is then removed from each message, and the rest of each message is divided into a plurality of fields to be identified according to the same preset rule. Mathematical features are computed for the fields to be identified that occupy the same position in all messages; if the mathematical features of a field meet a certain condition, the field is determined to be a dynamic field, and otherwise a non-dynamic field. Finally, the correlation between each bit of the non-dynamic fields and the character field is calculated, consecutive bits whose correlation is greater than a first preset value are merged into one field, and the merged field is determined to be a static field. Identification of the character fields, dynamic fields and static fields of the messages is thereby completed.

Description

Non-public protocol field identification method and system
Technical field
The present application relates to the technical field of reverse analysis of non-public protocols, and more particularly to a non-public protocol field identification method and system.
Background
A protocol is a set of rules, standards and conventions established for exchanging data over a network. It is the core of computer data communication and a key object of study in the field of network security. Many of the protocols in use today are non-public protocols, that is, protocols whose data formats and interaction rules have not been disclosed. Protocol reverse analysis is the process of extracting protocol format and protocol state machine information, without relying on a protocol description, by monitoring and analyzing the network input and output, system behavior and instruction execution flow of a protocol entity. A common existing approach to protocol reverse analysis is message sequence analysis.
However, existing message sequence analysis methods all take the byte as the basic unit of analysis, and their analysis of protocol message formats depends on format identification fields in the protocol messages. In practice, some protocols take the bit as the basic unit and have no identification (format) fields: their messages consist entirely of content fields, with no separators between fields and no flag bits within fields. Existing message sequence analysis methods are therefore not suitable for analyzing such protocols.
Summary of the invention
In view of this, the present application provides a non-public protocol field identification method and system, to solve the problem that existing message sequence analysis methods are not suitable for protocol messages that are defined bit by bit and carry no identification fields.
To achieve the above object, the following scheme is proposed:
A non-public protocol field identification method, comprising:
receiving a message sample set composed of messages of the same type;
for any one message in the message sample set, sliding from left to right to successively select binary sequence fields of a first preset length, and matching the selected binary sequence fields against character fields of known length, the length of each character field being equal to the first preset length;
determining a successfully matched binary sequence field to be a character field, the character field being used to express the target name;
dividing the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule;
computing mathematical features of the fields to be identified that occupy the same position in all messages;
when the mathematical features satisfy a preset condition, determining the field to be identified to be a dynamic field, and otherwise a non-dynamic field, the dynamic field being used to express dynamic information of the target;
merging consecutive dynamic fields into one field, and determining the merged field to be one complete dynamic field;
calculating the correlation between each bit of the non-dynamic fields and the character field, merging consecutive bits whose correlation is greater than a first preset value into one field, and determining the merged field to be a static field, the static field being used to express inherent characteristics of the target.
Preferably, dividing the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule is specifically:
selecting the fields to be identified with a sliding window method: the window length is set to L and the step length to 1 bit, and the window is slid from left to right over the part of each message that remains after the character field is removed, successively selecting a plurality of fields to be identified.
Preferably, the fields to be identified that occupy the same position in all messages are specifically:
the fields to be identified, one in each message, that have the same starting bit position and the same length L.
Preferably, computing the mathematical features of the fields to be identified that occupy the same position in all messages is specifically:
computing, for the fields to be identified that occupy the same position in all messages, the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance.
Preferably, computing the "0"/"1" ratio average deviation comprises:
computing the probability that each bit of the field to be identified is "0" or "1":
$$a_j = \frac{N_j}{m}, \quad j = 1, 2, \ldots, n$$
where $a_j$ is the probability that the j-th bit is "0" or "1", $N_j$ is the number of message samples in which the j-th bit is "0" or "1", m is the total number of message samples in the message sample set, and n is the length of each message sample;
the "0"/"1" ratio average deviation of the field to be identified is:
$$x_1 = \frac{1}{n} \sum_{j=1}^{n} \frac{0.5 - |a_j - 0.5|}{0.5}$$
where $x_1$ takes values in [0, 1].
Preferably, computing the numerical coverage comprises:
the numerical coverage is:
$$x_2 = \frac{K_L}{2^L}$$
where $K_L$ is the number of distinct values taken by the field of length L in the messages, $K_L \in \{1, 2, \ldots, 2^n\}$; $2^L$ is the number of different values that an arbitrary field of length L can take; and $x_2 \in \left\{\frac{1}{2^n}, \frac{2}{2^n}, \ldots, 1\right\}$.
Preferably, computing the normalized numerical distribution variance comprises:
the numerical distribution variance is:
$$R_L = \operatorname{var}\left(N_0, N_1, \ldots, N_{2^L - 1}\right)$$
$$R_L \in \left[0, \frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)\right]$$
where $N_i$ is the number of message samples, among the m message samples, in which the field to be identified has the value i, and i is the value of the field to be identified converted to decimal;
the normalized numerical distribution variance is:
$$x_3 = 1 - \frac{R_L}{\frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)}$$
where $x_3$ takes values in [0, 1].
Preferably, determining the field to be identified to be a dynamic field when the mathematical features satisfy the preset condition, and otherwise a non-dynamic field, is specifically:
weighting and summing the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance:
$$\mathrm{Score}(L) = \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_3, \qquad \sum_{i=1}^{3} \omega_i = 1$$
where $\omega_i$ (i = 1, 2, 3) is the weight of $x_i$ (i = 1, 2, 3);
comparing Score(L) with a threshold σ: when Score(L) is greater than σ, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field.
A non-public protocol field identification system, comprising:
a message sample set receiving unit, configured to receive a message sample set composed of messages of the same type;
a character matching unit, configured to slide from left to right over any one message in the message sample set to successively select binary sequence fields of a first preset length, to match the selected binary sequence fields against character fields of known length, the length of each character field being equal to the first preset length, and then to determine a successfully matched binary sequence field to be a character field, the character field being used to express the target name;
a field selection unit, configured to divide the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule;
a dynamic field determining unit, configured to compute mathematical features of the fields to be identified that occupy the same position in all messages, to judge whether the mathematical features satisfy a preset condition, to determine the field to be identified to be a dynamic field if so and a non-dynamic field otherwise, the dynamic field being used to express dynamic information of the target, and finally to merge consecutive dynamic fields into one field and determine the merged field to be one complete dynamic field;
a static field determining unit, configured to calculate the correlation between each bit of the non-dynamic fields and the character field, to merge consecutive bits whose correlation is greater than a first preset value into one field, and to determine the merged field to be a static field, the static field being used to express inherent characteristics of the target.
Preferably, the dynamic field determining unit comprises a feature statistics unit, a feature judging unit and a merging unit, wherein:
the feature statistics unit is configured to compute the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance of the field to be identified;
the feature judging unit is configured to judge whether the weighted sum of the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance is greater than a threshold, and to determine the field to be identified to be a dynamic field if so and a non-dynamic field otherwise, the dynamic field being used to express dynamic information of the target;
the merging unit is configured to merge consecutive dynamic fields into one field and to determine the merged field to be one complete dynamic field.
As can be seen from the above technical scheme, the non-public protocol field identification method disclosed in the present application slides over any one message in the message sample set to select binary sequence fields of a first preset length and matches them against character fields of known length, the length of each character field being equal to the first preset length. A successfully matched binary sequence field is determined to be a character field. The character field is then removed from every message, and the remainder is divided into a plurality of fields to be identified according to the same preset rule. Mathematical features are computed for the fields to be identified that occupy the same position in all messages; a field whose mathematical features satisfy a certain condition is determined to be a dynamic field, and otherwise a non-dynamic field. Consecutive dynamic fields are merged into one field, and the merged field is determined to be one complete dynamic field, which expresses dynamic information of the target. Because static fields all describe static information of the same target, there is strong correlation among them; the character field expresses the name of the target and can itself be regarded, in a sense, as a static field. The correlation between each bit of the non-dynamic fields and the character field is therefore calculated, consecutive bits whose correlation is greater than a first preset value are merged into one field, and the merged field is determined to be a static field. Identification of the character fields, dynamic fields and static fields of the messages is thereby completed.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a non-public protocol field identification method disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of selecting binary sequence fields of a message by sliding, as disclosed in an embodiment of the present application;
Fig. 3 is a structure diagram of a non-public protocol field identification system disclosed in an embodiment of the present application;
Fig. 4 is a structure diagram of a dynamic field determining unit disclosed in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Traditional message sequence analysis methods take the byte as the basic unit of analysis and depend on format identification fields in the protocol messages to analyze the message format. They cannot analyze protocols that take the bit as the basic unit and have no identification fields. For protocols of this kind, the present application proposes an analysis method based on the statistical features of fields. According to the application background of such protocols, the method first divides the fields of a message into three main classes: character fields, dynamic fields and static fields. The three classes are then identified by character matching and by analyzing the mathematical features of the fields, which yields the final field format of the message. The embodiments are described below.
Embodiment 1
Referring to Fig. 1, Fig. 1 is a flow chart of a non-public protocol field identification method disclosed in an embodiment of the present application.
As shown in Fig. 1, the method comprises:
Step 101: receive a message sample set composed of messages of the same type.
Specifically, messages of the same type here means messages with the same format. For example:
In the message "...00110110...", the first three bits indicate that the target name is A and the last five bits indicate that the target speed is V; in the message "...10110010...", the first three bits indicate that the target name is B and the last five bits indicate that the target speed is M. Although the two messages represent different target names and speeds, their format is the same: in both, the first three bits represent the target name and the last five bits represent the target speed.
Therefore, what is received here is a message sample set composed of messages of the same type.
Step 102: for any one message in the message sample set, slide from left to right to successively select binary sequence fields of a first preset length, and match the selected binary sequence fields against character fields of known length.
Specifically, the length of each character field equals the first preset length. The lengths and contents of a number of character fields are known in advance. Binary sequence fields of the message are selected by sliding from left to right with a window whose length is that of the character field, and each selected binary sequence field is then matched against the character fields. For example:
Referring to Fig. 2, Fig. 2 is a schematic diagram of selecting binary sequence fields of a message by sliding, as disclosed in an embodiment of the present application.
As shown in Fig. 2, suppose the known character field is 6 bits long. Binary sequence fields are then selected from left to right in units of 6 bits with a step length of 1 bit, and each selected field is matched against the known character field.
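The sliding match of step 102 can be sketched as follows. This is a minimal illustration, assuming messages are held as strings of "0" and "1" characters and that matching means exact equality with one of the known character fields; the function name `find_character_fields` and the sample bit patterns are hypothetical and do not come from the application.

```python
def find_character_fields(message: str, known_fields: set, length: int) -> list:
    """Return the 0-based start offsets at which a window of `length` bits
    equals one of the known character fields (assumed: exact match)."""
    hits = []
    # Slide the window from left to right with a step length of 1 bit.
    for start in range(len(message) - length + 1):
        window = message[start:start + length]
        if window in known_fields:
            hits.append(start)
    return hits


# Hypothetical 6-bit character field and a hypothetical 10-bit message.
known = {"101100"}
message = "0010110011"
print(find_character_fields(message, known, 6))
```

Offsets are counted from 0 here, whereas the text counts bits from 1.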
Step 103: determine a successfully matched binary sequence field to be a character field, the character field being used to express the target name.
Specifically, when matching is carried out as described above and one of the selected binary sequence fields matches a known character field completely, that binary sequence field is a character field, which expresses the name of the target.
Step 104: divide the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule.
Specifically, because all messages are of the same type, once the character field of any one message has been identified, the character fields of the other messages can be located accordingly. The character field is then removed from the binary sequence of every message, and the remainder is divided into a plurality of fields to be identified according to the same preset rule. The preset rule is not limited here; it may be the method shown in Fig. 2 or some other method.
Step 105: compute mathematical features of the fields to be identified that occupy the same position in all messages.
Specifically, because all messages are divided according to the same rule, every message contains a field to be identified at each given position. For example, with the division of Fig. 2, in which every six bits form one field to be identified, every message has a field to be identified made up of bits 1 to 6; these are fields to be identified at the same position. Likewise, every message has a field to be identified made up of bits 2 to 7, which are also fields to be identified at the same position. In general, "same position" means that the fields to be identified have the same starting bit position and the same length. The mathematical features are computed over the fields to be identified that occupy the same position in all messages.
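Collecting the fields to be identified at the same position can be sketched as below. The sketch assumes the character field has already been removed, that messages are equal-format bit strings, and that a sliding window is used as the preset rule; the names `fields_at_position` and `all_positions` are hypothetical.

```python
def fields_at_position(messages: list, start: int, length: int) -> list:
    """Return, for every message, the field with the given start bit and length."""
    return [m[start:start + length] for m in messages]


def all_positions(messages: list, length: int, step: int = 1) -> dict:
    """Map each 0-based start offset to the same-position fields of all messages."""
    n_bits = min(len(m) for m in messages)
    return {
        start: fields_at_position(messages, start, length)
        for start in range(0, n_bits - length + 1, step)
    }


# Two same-format messages taken from the example of step 101.
samples = ["00110110", "10110010"]
for start, fields in all_positions(samples, 6).items():
    print(start, fields)
```

Offset 0 here corresponds to bits 1 to 6 in the text, offset 1 to bits 2 to 7, and so on.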
Step 106: when the mathematical features satisfy a preset condition, determine the field to be identified to be a dynamic field, and otherwise a non-dynamic field, the dynamic field being used to express dynamic information of the target.
Specifically, a dynamic field is characterized by varying over time, so the probabilities with which each of its bits takes each value are relatively close. When the computed mathematical features satisfy a certain preset condition, the field to be identified can therefore be regarded as a dynamic field, and otherwise as a non-dynamic field. A dynamic field expresses dynamic information of the target, for example the speed or the position of the target.
Step 107: merge consecutive dynamic fields into one field, and determine the merged field to be one complete dynamic field.
Specifically, the starting position and length of a dynamic field in the message are not known in advance, so the length of a field to be identified is not necessarily the length of the true dynamic field. The true dynamic field may be long while the fields marked off are shorter. Consecutive dynamic fields therefore need to be merged into one complete dynamic field, and only the merged field is the true dynamic field.
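The merge of step 107 can be sketched as follows. The application does not spell out when two fields count as consecutive, so this sketch assumes that windows whose bit ranges overlap or touch are consecutive; the function name `merge_dynamic_windows` and the offsets are hypothetical.

```python
def merge_dynamic_windows(starts: list, length: int) -> list:
    """Merge windows judged dynamic into (start, end) bit ranges, end exclusive.

    Assumption: two windows are 'consecutive' when their ranges overlap or touch.
    """
    merged = []
    for start in sorted(starts):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Extend the current run to cover this window.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Windows of length 4 starting at bits 3, 4, 5 and 12 were judged dynamic.
print(merge_dynamic_windows([3, 4, 5, 12], 4))
```

With a step length of 1 bit, neighboring windows overlap heavily, so a long dynamic field shows up as a run of dynamic windows that collapses into one range.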
Step 108: calculate the correlation between each bit of the non-dynamic fields and the character field, merge consecutive bits whose correlation is greater than a first preset value into one field, and determine the merged field to be a static field, the static field being used to express inherent characteristics of the target.
Specifically, after the preceding steps the character field and the dynamic fields have been determined. The remaining part contains static fields and other fields; the other fields are not of interest here, so it suffices to identify the static fields. Static fields essentially all describe static information of the same target, so this kind of information shows strong correlation within the protocol message sample set. The character field determined earlier can likewise be regarded as a static attribute of the target. The correlation between each bit of the non-dynamic fields and the character field is therefore calculated, consecutive bits whose correlation is greater than the first preset value are merged into one field, and the merged field is determined to be a static field, which expresses inherent characteristics of the target.
This completes the identification of the character fields, dynamic fields and static fields of bit-oriented protocol messages without identification fields.
In summary, the method disclosed in this embodiment identifies the character field by sliding-window matching against character fields of known length, removes it from every message, divides the remainder into fields to be identified according to the same preset rule, classifies the same-position fields as dynamic or non-dynamic from their mathematical features, and merges consecutive dynamic fields into complete dynamic fields, which express dynamic information of the target. Because static fields all describe static information of the same target, there is strong correlation among them; the character field expresses the name of the target and can itself be regarded, in a sense, as a static field. The correlation between each bit of the non-dynamic fields and the character field is therefore calculated, consecutive bits whose correlation is greater than the first preset value are merged into one field, and the merged field is determined to be a static field. Identification of the character fields, dynamic fields and static fields of the messages is thereby completed.
It should be noted that the division of the part of a message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule can be carried out with a sliding window method: the window length is set to L and, with a step length of 1 bit, the window is slid from left to right to successively select a plurality of fields to be identified. The step length can of course be set to other values, depending on the actual situation.
When the fields to be identified are selected with this sliding window method, the fields to be identified at the same position are the fields, one in each message, that have the same starting bit position and whose length is the window length L.
Embodiment 2
This embodiment further explains the mathematical features computed when identifying dynamic fields.
For the fields to be identified that occupy the same position in all messages, the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance are computed.
(1) The "0"/"1" ratio average deviation is computed as follows.
First, the "0"/"1" probability is computed. This is the probability that each bit takes the value "0" or "1"; the probabilities of "0" and of "1" sum to 1, that is, p(0) + p(1) = 1.
$$a_j = \frac{N_j}{m}, \quad j = 1, 2, \ldots, n$$
where $a_j$ is the probability that the j-th bit is "0" or "1", $N_j$ is the number of message samples in which the j-th bit is "0" or "1", m is the total number of message samples in the message sample set, and n is the length of each message sample.
The "0"/"1" ratio average deviation of the field to be identified is:
$$x_1 = \frac{1}{n} \sum_{j=1}^{n} \frac{0.5 - |a_j - 0.5|}{0.5}$$
where $x_1$ takes values in [0, 1]. When $a_j$ equals 0.5, that is, when "0" and "1" occur with equal probability in every bit, $x_1$ attains its maximum of 1; when $a_j$ equals 0 or 1, that is, when every bit is "0" or "1" with probability 1, $x_1$ attains its minimum of 0.
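The first feature can be sketched in a few lines. The sketch assumes that the same-position fields are given as equal-length bit strings, that $a_j$ is taken as the fraction of samples whose j-th bit is "1" (by symmetry the result is the same if "0" is counted instead), and that n is the number of bits in the field; the function name `bit_balance` is hypothetical.

```python
def bit_balance(fields: list) -> float:
    """x1: average over bits of (0.5 - |a_j - 0.5|) / 0.5."""
    m = len(fields)          # number of message samples
    n = len(fields[0])       # number of bits in the field (assumed meaning of n)
    total = 0.0
    for j in range(n):
        a_j = sum(f[j] == "1" for f in fields) / m
        total += (0.5 - abs(a_j - 0.5)) / 0.5
    return total / n


print(bit_balance(["00", "01", "10", "11"]))  # every bit balanced
print(bit_balance(["00", "00"]))              # every bit constant
print(bit_balance(["10", "11"]))              # one constant bit, one balanced bit
```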
(2) The numerical coverage is computed as follows.
A field of window length L can take $2^L$ different values. The numerical coverage is the ratio of the number of distinct values taken by the field of length L in the messages to $2^L$:
$$x_2 = \frac{K_L}{2^L}$$
where $K_L \in \{1, 2, \ldots, 2^n\}$ and $x_2 \in \left\{\frac{1}{2^n}, \frac{2}{2^n}, \ldots, 1\right\}$.
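The numerical coverage can be sketched directly from its definition, assuming the same-position fields are equal-length bit strings; the function name `value_coverage` is hypothetical.

```python
def value_coverage(fields: list) -> float:
    """x2: number of distinct observed values divided by 2**L."""
    length = len(fields[0])
    return len(set(fields)) / 2 ** length


# Three distinct values observed out of the four possible 2-bit values.
print(value_coverage(["00", "01", "01", "11"]))
```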
(3) The normalized numerical distribution variance is computed as follows.
First, the distribution of values of the field to be identified is computed, where the value is the binary number converted to decimal. The numerical distribution variance is:
$$R_L = \operatorname{var}\left(N_0, N_1, \ldots, N_{2^L - 1}\right)$$
$$R_L \in \left[0, \frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)\right]$$
where $N_i$ is the number of message samples, among the m message samples, in which the field to be identified has the value i, and i is the value of the field to be identified converted to decimal.
The normalized numerical distribution variance is:
$$x_3 = 1 - \frac{R_L}{\frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)}$$
where $x_3$ takes values in [0, 1].
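A rough sketch of the third feature follows. It makes two assumptions that are not taken from the application: the variance is the population variance over all $2^L$ bins, and the normalizer is the largest value that variance can reach, $(m^2/2^L)(1 - 1/2^L)$, which occurs when every sample has the same value. A different constant in the normalizer only rescales the feature. The function name is hypothetical.

```python
from collections import Counter


def normalized_distribution_variance(fields: list) -> float:
    """x3 = 1 - R_L / max(R_L), with R_L the variance of the value counts."""
    m = len(fields)
    bins = 2 ** len(fields[0])
    counts = Counter(int(f, 2) for f in fields)   # N_i for each observed value i
    mean = m / bins
    # Assumption: population variance over all 2**L bins, including empty ones.
    variance = sum((counts[v] - mean) ** 2 for v in range(bins)) / bins
    # Assumption: normalize by the variance of the 'all samples in one bin' case.
    max_variance = (m * m / bins) * (1 - 1 / bins)
    return 1 - variance / max_variance


print(normalized_distribution_variance(["00", "01", "10", "11"]))  # evenly spread
print(normalized_distribution_variance(["00", "00"]))              # single value
```

The loop visits all $2^L$ bins, so the sketch is only practical for small window lengths.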
Once the mathematical features have been computed, they are used to decide whether a field is dynamic, as follows.
For each of the three features above, a larger value indicates a greater likelihood that the field to be identified is a dynamic field. To make combined use of the three features, the "0"/"1" ratio average deviation, the numerical coverage and the normalized numerical distribution variance are weighted and summed:
$$\mathrm{Score}(L) = \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_3, \qquad \sum_{i=1}^{3} \omega_i = 1$$
where $\omega_i$ (i = 1, 2, 3) is the weight of $x_i$ (i = 1, 2, 3).
Score(L) is compared with a threshold σ: when Score(L) is greater than σ, the field to be identified is determined to be a dynamic field, and otherwise a non-dynamic field. The value of σ depends on the protocol data, so no universally optimal value can be given here. Empirically it is generally taken as 0.5; the exact value should be adjusted to suit the application background of the protocol.
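The weighted decision can be sketched as below. The equal weights are an assumption made purely for illustration, since the application leaves the weights open; only the default threshold of 0.5 comes from the text. The function names are hypothetical.

```python
def dynamic_score(x1: float, x2: float, x3: float,
                  weights: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Score(L) = w1*x1 + w2*x2 + w3*x3, with the weights summing to 1."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * x1 + w2 * x2 + w3 * x3


def is_dynamic(score: float, sigma: float = 0.5) -> bool:
    """A field is judged dynamic when its score is strictly greater than sigma."""
    return score > sigma


score = dynamic_score(0.9, 0.75, 0.8, weights=(0.5, 0.25, 0.25))
print(score, is_dynamic(score))
```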
Embodiment 3
This embodiment describes how the window length is set when fields to be identified are selected with the sliding window method.
The window length L should be neither too large nor too small. If it is too large, the field to be identified easily extends beyond the range of the dynamic field, and a dynamic field may be judged to be non-dynamic. Conversely, if it is too small, the values of the field to be identified vary too little, and a non-dynamic field is easily misjudged as dynamic.
The numerical coverage $x_2$ can be used to determine the value of L:
starting from 1, the value of L is increased gradually until the numerical coverage of every field to be identified is less than 1. The value of L at that point is called $L_{max}$, and L is then taken as:
$$L = \max\left(5, \frac{L_{max}}{2}\right)$$
Experiments show that the identification rate is highest when L is chosen in this way.
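The window length rule can be sketched as follows, assuming messages are equal-format bit strings with the character field already removed. Two details are assumptions: $L_{max}/2$ is rounded down, and if full coverage never fails the search falls back to the message length. The function names are hypothetical.

```python
def find_l_max(messages: list) -> int:
    """Smallest L for which no same-position field reaches coverage 1."""
    n_bits = min(len(m) for m in messages)
    for length in range(1, n_bits + 1):
        any_full = False
        for start in range(n_bits - length + 1):
            values = {m[start:start + length] for m in messages}
            if len(values) == 2 ** length:      # coverage x2 equals 1
                any_full = True
                break
        if not any_full:
            return length
    return n_bits  # assumption: fall back to the message length


def choose_window_length(l_max: int, floor: int = 5) -> int:
    """L = max(5, L_max / 2), with the division rounded down (assumption)."""
    return max(floor, l_max // 2)


samples = ["0000", "0101", "1010", "1111"]
l_max = find_l_max(samples)
print(l_max, choose_window_length(l_max))
```

Because a set of m samples can contain at most m distinct values, the search stops no later than the first L with $2^L > m$.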
Embodiment tetra-
For the identifying of static fields, we do description below:
Static fields is mainly carried the static information of target, as AIS(Automatic Identification System, ship automatic identification system) ship IMO numbering (IMO in protocol massages, International Maritime Organization, International Maritime Organization, ship IMO numbering is that International Maritime Organization is the number that each ship is compiled), captain, the beam etc.Static fields in same message is substantially all described the static information of same target, and therefore in protocol massages sample set, this type of information shows very strong correlation.We need to identify static fields according to the correlation between static information.The character field identifying in first stage character match can be regarded a kind of static attribute of target equally as, can regard Given information as use in this stage.Using character field (as name of vessel) as benchmark field, identify other static fields by the correlation of calculating other field and benchmark field, if a certain field is relevant to character field, each bit of this field is all associated, simultaneously owing to can not determine the length and location of field to be identified, so bitwise carry out correlation analysis here, the most continuous related bits position merges composition static fields.
First, the protocol message sample set is clustered by the benchmark field: messages with the same benchmark field are placed in one group, giving $n_c$ groups in total. Then, for each group, the proportions of "0" and "1" at each bit are calculated, and the larger of the two is taken as the correlation between that bit and the benchmark field within the group. Finally, the correlation of each bit is averaged over all groups and used as the correlation between that bit and the benchmark field for this type of message, as shown below:
$$\delta^{0}_{ij} = \frac{n^{0}_{ij}}{n_i}, \qquad \delta^{1}_{ij} = \frac{n^{1}_{ij}}{n_i}$$
where $\delta^{0}_{ij}$ and $\delta^{1}_{ij}$ denote the proportions of "0" and "1", respectively, at the j-th bit in the i-th group of messages; $n^{0}_{ij}$ and $n^{1}_{ij}$ denote the numbers of messages in the i-th group whose j-th bit is "0" and "1", respectively; and $n_i$ denotes the number of messages in the i-th group.
$$\delta_{ij} = \max\left(\delta^{0}_{ij}, \delta^{1}_{ij}\right)$$
$\delta_{ij}$ denotes the correlation between the j-th bit and the benchmark field in the i-th group of messages.
$$\delta_j = \frac{1}{n_c}\sum_{i=1}^{n_c}\delta_{ij} = \frac{1}{n_c}\sum_{i=1}^{n_c}\max\left(\frac{n^{0}_{ij}}{n_i}, \frac{n^{1}_{ij}}{n_i}\right)$$
$\delta_j$ denotes the correlation between the j-th bit and the benchmark field for this type of message. The larger $\delta_j$ is, the stronger the correlation between the bit and the benchmark field. $\delta_j = 1$ means that the bit is completely correlated with the benchmark field, which indicates that the bit belongs to a static field. Adjacent bits that are correlated with the benchmark field are finally combined to form a static field. The decision threshold for $\delta_j$ can be adjusted according to the purity of the sample set: the purer the sample set, the larger the threshold should be. Empirically, the threshold is generally set to 0.9.
In addition, every type of message contains some bits that are rarely used, mainly because margin is left when the protocol is designed. For example, the highest bit of a speed field is rarely non-zero. Such bits interfere with the identification of static fields, so bits in which "0" or "1" occurs with a frequency greater than 95% can be rejected during static field identification.
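The bit-level correlation and the rejection of rarely used bits can be sketched as follows. This is a minimal illustration rather than the patented implementation: messages are assumed to be equal-length strings of "0" and "1" with the character field already removed, `keys` is assumed to hold the benchmark field (for example, the ship name) of each message, and all function names are hypothetical. The defaults 0.9 and 0.95 are the empirical values mentioned in the text.

```python
from collections import defaultdict


def bit_correlation(messages, keys):
    """Average over groups of max(share of '0', share of '1') for each bit."""
    groups = defaultdict(list)
    for msg, key in zip(messages, keys):
        groups[key].append(msg)  # cluster by benchmark field
    n_bits = len(messages[0])
    delta = [0.0] * n_bits
    for group in groups.values():
        n_i = len(group)
        for j in range(n_bits):
            ones = sum(msg[j] == "1" for msg in group)
            delta[j] += max(ones, n_i - ones) / n_i
    return [d / len(groups) for d in delta]


def rarely_used(messages, limit=0.95):
    """Flag bits whose '0' or '1' frequency over all messages exceeds limit."""
    m = len(messages)
    flags = []
    for j in range(len(messages[0])):
        ones = sum(msg[j] == "1" for msg in messages)
        flags.append(max(ones, m - ones) / m > limit)
    return flags


def static_fields(messages, keys, threshold=0.9, limit=0.95):
    """Merge consecutive correlated bits into (start, end) spans, end exclusive."""
    delta = bit_correlation(messages, keys)
    skip = rarely_used(messages, limit)
    spans, start = [], None
    for j, (d, s) in enumerate(zip(delta, skip)):
        if d > threshold and not s:
            if start is None:
                start = j
        elif start is not None:
            spans.append((start, j))
            start = None
    if start is not None:
        spans.append((start, len(delta)))
    return spans
```

A bit that is constant across the whole sample set has a correlation of 1 in every group, which is why the frequency filter is applied before the spans are merged.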
Embodiment five
Referring to Fig. 3, Fig. 3 is a structural diagram of a non-public protocol field identification system disclosed in an embodiment of the present application.
As shown in Fig. 3, the system comprises:
A message sample set receiving unit 31, configured to receive a message sample set composed of messages of the same type;
A character matching unit 32, configured to slide from left to right over any one message in the message sample set, selecting binary sequence fields of a first preset length in turn, to match the plurality of binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length, and then to determine the successfully matched binary sequence field as a character field, the character field being used to express the target name;
A field selection unit 33, configured to divide the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule;
A dynamic field determining unit 34, configured to compute the mathematical features of the fields to be identified that are at the same position in all messages, to judge whether the mathematical features satisfy a preset condition, to determine the field to be identified as a dynamic field when the judgment result is yes and otherwise as a non-dynamic field, the dynamic field being used to express dynamic information of the target, and finally to merge consecutive dynamic fields into one field and determine the merged field as a complete dynamic field;
A static field determining unit 35, configured to calculate the correlation between each bit in the non-dynamic fields and the character field, to merge consecutive bits whose correlation is greater than a first preset value into one field, and to determine the merged field as a static field, the static field being used to express inherent characteristics of the target.
In the non-public protocol field identification system disclosed in this embodiment, the character matching unit 32 slides over any one message in the message sample set to select binary sequence fields of the first preset length and matches the selected binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length. A binary sequence field that matches successfully is determined to be a character field. The field selection unit 33 then removes the character field from every message and divides the remainder into a plurality of fields to be identified according to the same preset rule. The dynamic field determining unit 34 computes the mathematical features of the fields to be identified that are at the same position in all messages; when the mathematical features satisfy a certain condition, the field is determined to be a dynamic field, and otherwise a non-dynamic field. Consecutive dynamic fields are merged into one field, and the merged field is determined to be a complete dynamic field, which expresses dynamic information of the target. Because static fields all describe static information of the same target, they are strongly correlated with one another; the character field expresses the name of the target and can, in a sense, also be regarded as a static field. The static field determining unit 35 therefore calculates the correlation between each bit in the non-dynamic fields and the character field, merges consecutive bits whose correlation is greater than the first preset value into one field, and determines the merged field to be a static field. This completes the identification of the character fields, dynamic fields and static fields of the message.
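The character matching step performed by unit 32 can be sketched as a sliding bit-level search. The sketch below rests on several assumptions that are not in the source: messages are strings of "0" and "1", the known character fields are supplied as a dictionary from bit pattern to name, all patterns have the first preset length, and `encode_name` uses 8-bit ASCII purely for illustration. All function names are hypothetical.

```python
def encode_name(name):
    # Hypothetical encoding: 8-bit ASCII. Real protocols may use another
    # character table (AIS, for instance, packs characters into 6 bits).
    return "".join(format(byte, "08b") for byte in name.encode("ascii"))


def find_character_field(message, known_fields, field_len):
    """Slide left to right, one bit at a time, and return (offset, name)."""
    for start in range(len(message) - field_len + 1):
        window = message[start:start + field_len]
        if window in known_fields:
            return start, known_fields[window]
    return None  # no window matched a known character field


def remove_field(message, start, field_len):
    """Return the message with the matched character field cut out."""
    return message[:start] + message[start + field_len:]
```

The remainder returned by `remove_field` is what the later stages divide into fields to be identified.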
It should be noted that, as shown in Fig. 4, which is a structural diagram of the dynamic field determining unit disclosed in an embodiment of the present application, the dynamic field determining unit 34 comprises a feature statistics unit 341, a feature judging unit 342 and a merging unit 343, wherein:
The feature statistics unit 341 is configured to compute the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance of the field to be identified;
The feature judging unit 342 is configured to judge whether the weighted sum of the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance is greater than a threshold, and to determine the field to be identified as a dynamic field when the judgment result is yes and otherwise as a non-dynamic field, the dynamic field being used to express dynamic information of the target;
The merging unit 343 is configured to merge consecutive dynamic fields into one field and to determine the merged field as a complete dynamic field.
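The three statistics, the weighted score and the merging step handled by units 341 to 343 can be sketched as follows. Several points are assumptions made for illustration only: the field values at one position are passed as a list of bit strings, the variance is the population variance over all 2^L possible values and is normalized by its maximum (m^2 / 2^L)(1 - 1/2^L) so that x3 spans [0, 1], the weights are placeholders that merely sum to 1, and "consecutive" windows are taken to be overlapping or adjacent ones. All function names are hypothetical.

```python
from collections import Counter


def field_features(values, length):
    """Return (x1, x2, x3) for the field values observed at one position."""
    m = len(values)
    # x1: "0" "1" ratio average deviation, 1 when every bit is balanced.
    x1 = sum(
        (0.5 - abs(sum(v[j] == "1" for v in values) / m - 0.5)) / 0.5
        for j in range(length)
    ) / length
    counts = Counter(values)
    size = 2 ** length
    # x2: numerical value coverage rate, distinct values over possible values.
    x2 = len(counts) / size
    # x3: one minus the normalized variance of the value counts.
    mean = m / size
    squared = sum((c - mean) ** 2 for c in counts.values())
    squared += (size - len(counts)) * mean ** 2  # values never observed
    variance = squared / size
    max_variance = (m ** 2 / size) * (1 - 1 / size)  # assumed normalizer
    x3 = 1 - variance / max_variance
    return x1, x2, x3


def score(values, length, weights=(0.4, 0.3, 0.3)):
    # Placeholder weights; the source only requires that they sum to 1.
    x1, x2, x3 = field_features(values, length)
    w1, w2, w3 = weights
    return w1 * x1 + w2 * x2 + w3 * x3


def merge_windows(starts, length):
    """Merge overlapping or adjacent dynamic windows into (start, end) spans."""
    spans = []
    for start in sorted(starts):
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], start + length))
        else:
            spans.append((start, start + length))
    return spans
```

A window would then be treated as dynamic when `score(values, L)` exceeds the chosen threshold, and the start positions of such windows are passed to `merge_windows`.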
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A non-public protocol field identification method, characterized by comprising:
Receiving a message sample set composed of messages of the same type;
Sliding from left to right over any one message in the message sample set to select binary sequence fields of a first preset length in turn, and matching the plurality of binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length;
Determining the successfully matched binary sequence field as a character field, the character field being used to express the target name;
Dividing the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule;
Computing the mathematical features of the fields to be identified that are at the same position in all messages;
When the mathematical features satisfy a preset condition, determining the field to be identified as a dynamic field, and otherwise as a non-dynamic field, the dynamic field being used to express dynamic information of the target;
Merging consecutive dynamic fields into one field, and determining the merged field as a complete dynamic field;
Calculating the correlation between each bit in the non-dynamic fields and the character field, merging consecutive bits whose correlation is greater than a first preset value into one field, and determining the merged field as a static field, the static field being used to express inherent characteristics of the target.
2. The method according to claim 1, characterized in that dividing the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule is specifically:
Selecting fields to be identified by a sliding window method, with the window length set to L and a step of 1 bit, sliding from left to right over the part of each message that remains after the character field is removed to select a plurality of fields to be identified in turn.
3. The method according to claim 2, characterized in that the fields to be identified that are at the same position in all messages are specifically:
The fields to be identified that, in each message, have the same starting bit position and a length of L.
4. The method according to claim 3, characterized in that computing the mathematical features of the fields to be identified that are at the same position in all messages is specifically:
Computing, for the fields to be identified that are at the same position in all messages, the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance.
5. The method according to claim 4, characterized in that computing the "0" "1" ratio average deviation comprises:
Computing the probability that each bit in the field to be identified is "0" or "1":
$$a_j = \frac{N_j}{m}, \quad (j = 1, 2, \ldots, n)$$
Wherein $a_j$ is the probability that the j-th bit is "0" or "1", $N_j$ denotes the number of message samples whose j-th bit is "0" or "1", m is the total number of message samples in the message sample set, and n is the length of each message sample;
The "0" "1" ratio average deviation of the field to be identified is:
$$x_1 = \frac{1}{n}\sum_{j=1}^{n}\frac{0.5 - |a_j - 0.5|}{0.5}$$
Wherein the value range of $x_1$ is [0, 1].
6. The method according to claim 4, characterized in that computing the numerical value coverage rate comprises:
The numerical value coverage rate is:
$$x_2 = \frac{K_L}{2^L}$$
Wherein $K_L$ denotes the number of distinct values taken by the field of length L in the messages, $K_L \in \{1, 2, \ldots, 2^n\}$, $2^L$ denotes the number of distinct values that an arbitrary field of length L can take, and $x_2 \in \left\{\frac{1}{2^n}, \frac{2}{2^n}, \ldots, 1\right\}$.
7. The method according to claim 4, characterized in that computing the normalized numeric distribution variance comprises:
The numeric distribution variance is:
$$R_L = \mathrm{var}\left(N_0, N_1, \ldots, N_{2^L - 1}\right)$$
$$R_L \in \left[0, \frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)\right]$$
Wherein $N_i$ denotes the number of message samples, among the m message samples, in which the value of the field to be identified is i, where i is the value of the field to be identified converted to decimal;
The normalized numeric distribution variance is:
$$x_3 = 1 - \frac{R_L}{\frac{m^2}{2^L} \cdot 2\left(1 - \frac{1}{2^L}\right)}$$
The value range of $x_3$ is [0, 1].
8. The method according to claim 4, 5, 6 or 7, characterized in that, when the mathematical features satisfy a preset condition, determining the field to be identified as a dynamic field, and otherwise as a non-dynamic field, is specifically:
Weighting and summing the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance:
$$Score(L) = \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_3, \qquad \sum_{i=1}^{3}\omega_i = 1$$
Wherein $\omega_i$ (i = 1, 2, 3) is the weight of $x_i$ (i = 1, 2, 3);
Comparing Score(L) with a threshold $\sigma$: when Score(L) is greater than $\sigma$, the field to be identified is determined as a dynamic field, and otherwise as a non-dynamic field.
9. A non-public protocol field identification system, characterized by comprising:
A message sample set receiving unit, configured to receive a message sample set composed of messages of the same type;
A character matching unit, configured to slide from left to right over any one message in the message sample set, selecting binary sequence fields of a first preset length in turn, to match the plurality of binary sequence fields against a character field of known length, the length of the character field being equal to the first preset length, and then to determine the successfully matched binary sequence field as a character field, the character field being used to express the target name;
A field selection unit, configured to divide the part of each message that remains after the character field is removed into a plurality of fields to be identified according to the same preset rule;
A dynamic field determining unit, configured to compute the mathematical features of the fields to be identified that are at the same position in all messages, to judge whether the mathematical features satisfy a preset condition, to determine the field to be identified as a dynamic field when the judgment result is yes and otherwise as a non-dynamic field, the dynamic field being used to express dynamic information of the target, and finally to merge consecutive dynamic fields into one field and determine the merged field as a complete dynamic field;
A static field determining unit, configured to calculate the correlation between each bit in the non-dynamic fields and the character field, to merge consecutive bits whose correlation is greater than a first preset value into one field, and to determine the merged field as a static field, the static field being used to express inherent characteristics of the target.
10. The system according to claim 9, characterized in that the dynamic field determining unit comprises a feature statistics unit, a feature judging unit and a merging unit, wherein:
The feature statistics unit is configured to compute the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance of the field to be identified;
The feature judging unit is configured to judge whether the weighted sum of the "0" "1" ratio average deviation, the numerical value coverage rate and the normalized numeric distribution variance is greater than a threshold, and to determine the field to be identified as a dynamic field when the judgment result is yes and otherwise as a non-dynamic field, the dynamic field being used to express dynamic information of the target;
The merging unit is configured to merge consecutive dynamic fields into one field and to determine the merged field as a complete dynamic field.
CN201410110570.6A 2014-03-24 2014-03-24 Non-public protocol field identification method and system Expired - Fee Related CN103825784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410110570.6A CN103825784B (en) 2014-03-24 2014-03-24 Non-public protocol field identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410110570.6A CN103825784B (en) 2014-03-24 2014-03-24 Non-public protocol field identification method and system

Publications (2)

Publication Number Publication Date
CN103825784A true CN103825784A (en) 2014-05-28
CN103825784B CN103825784B (en) 2017-08-08

Family

ID=50760629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410110570.6A Expired - Fee Related CN103825784B (en) 2014-03-24 2014-03-24 Non-public protocol field identification method and system

Country Status (1)

Country Link
CN (1) CN103825784B (en)

Also Published As

Publication number Publication date
CN103825784B (en) 2017-08-08


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170808