CN114520838A - K-nearest neighbor-based network message matching method for custom protocol application layer - Google Patents


Publication number
CN114520838A
Authority
CN
China
Prior art keywords
protocol
protocol type
storage structure
matching
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210029243.2A
Other languages
Chinese (zh)
Other versions
CN114520838B (en
Inventor
韩升
林友芳
万怀宇
王晶
董兴业
武志昊
吕凯
张硕
曹端鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202210029243.2A
Publication of CN114520838A
Application granted
Publication of CN114520838B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/18 Multiprotocol handlers, e.g. single devices capable of handling multiple protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Communication Control (AREA)

Abstract

The invention provides a K-nearest-neighbor-based network message matching method for a custom protocol application layer, belonging to the field of data transmission. The method comprises the following steps: constructing a protocol type storage structure from historical network messages and numbering its entries in timestamp order; obtaining a structural characteristic value and a value array for each protocol type; obtaining K neighbor prediction protocols and their characteristic values based on a K-nearest-neighbor model; inputting the characteristic values into a linear regression equation to obtain the similar weights between the protocol of the message to be matched and the K prediction protocols, and sorting by similar weight; obtaining the source IP and destination IP of the message to be matched, querying the protocol types corresponding to them, and combining these with the prediction protocols to form a protocol group U; matching the network message to be matched against the protocols in U one by one and, when matching succeeds, transmitting data and returning the protocol type; and, when matching fails, polling the local protocols outside U until a match is found. The invention improves data transmission efficiency while ensuring transmission reliability and security.

Description

K neighbor-based network message matching method for custom protocol application layer
Technical Field
The invention belongs to the field of message matching, and particularly relates to a network message matching method of a user-defined application layer based on K neighbor.
Background
The internet transmits data through a multi-layer protocol stack. As networks grow more complex, and especially in certain special scenarios, application layer protocols are commonly customized and used as communication carriers in order to ensure the security and efficiency of communication. Although custom application layer protocols are flexible, they make protocol matching of network messages difficult. A conventional transport layer protocol is standardized and stable, so it can be identified simply by its protocol number. A custom protocol is neither standardized nor stable: an old protocol may be deleted or a new protocol added at any time, and application layer protocols may also be mutually nested or nested in loops. For these reasons, custom application layer protocols generally use an identification field to identify the protocol of a network message.
Using an identification field changes the traditional way network messages are matched. At the transport layer, for example, a network message is matched to a protocol by checking the protocol number in the tenth byte of the message to determine the protocol, and the message is then parsed according to that protocol. With an identification field, by contrast, the local protocols are polled in sequence, the identification field of each is matched against the network message, and the message is parsed once a match is found. However, polling requires traversing the local protocols, which is time-consuming and performs poorly.
Disclosure of Invention
In view of the above defects or shortcomings in the prior art, the present invention aims to provide a K-nearest-neighbor-based network message matching method for a custom protocol application layer. The method improves the K-nearest-neighbor algorithm by trading space for time, which reduces the time needed to search for the K nearest neighbors; performs fuzzy prediction of the protocol type of an unknown network message; dynamically adjusts the K value according to the prediction result; and improves data transmission efficiency while ensuring transmission reliability and security.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
a network message matching method of a user-defined protocol application layer based on K neighbor comprises the following steps:
step S1, acquiring historical network messages, constructing a protocol type storage structure according to the historical network messages, numbering protocols in the protocol type storage structure according to timestamps in sequence, and acquiring a structure characteristic value and a value array of each protocol type;
step S2, acquiring K neighbor prediction protocols and feature values of the current network message to be matched based on a K neighbor model according to the protocol type storage structure, the structure feature value and the value array;
step S3, inputting the characteristic values of K neighbor prediction protocols into a linear regression equation, and acquiring the similar weight of the network message protocol to be matched and the K neighbor prediction protocols;
step S4, constructing a current application layer protocol group U; placing the K neighbor prediction protocols into the protocol group U in descending order of their similar weights; then acquiring the source IP and destination IP of the network message to be matched, querying the local historical matching record for all protocol types corresponding to that source IP and destination IP, and appending those protocol types to the tail of the protocol group U;
step S5, matching the network message to be matched with the protocol in U one by one, when the matching is successful, transmitting data according to the matched protocol, and returning the protocol type of the matching success; and when the matching fails, matching local protocols except the U in a polling mode until the matching is correct, transmitting data according to the matched protocol, and returning the protocol type which is successfully matched.
As a preferred embodiment of the present invention, the method further comprises:
step S6, optimizing a linear regression equation according to the returned protocol type successfully matched;
and step S7, updating the historical network message sequence, the protocol type storage structure and the K value according to the returned successfully matched protocol type.
As a preferred embodiment of the present invention, the step S1 constructs a protocol type storage structure, and obtains a structure characteristic value and a value array of each protocol type, including:
step S101, abstracting different protocol types of historical network messages into different letters, wherein the same protocol type adopts the same letter, and each letter represents a protocol type; continuously storing letters representing protocol types according to the time stamp sequence of the corresponding historical network message to form a protocol type storage structure, and numbering each letter in the protocol type storage structure in sequence;
step S102, counting the letter types in the protocol type storage structure as J types, and finding, in reverse timestamp order, all numbers $m_{ij}$ of the j-th letter in the protocol type storage structure, where $i \in [1, \ldots, I]$, $j \in [1, \ldots, J]$, $I$ is the number of occurrences of the j-th letter in the sequence, $I \le n$, $J \le n$, $I \times J \le n$, and j takes the current value;
step S103, obtaining the maximum number $m_{Ij}$ of the j-th letter, and taking $m_{Ij}$ as the structural characteristic value of the j-th protocol type, which the j-th letter represents;
step S104, letting $y_{ij} = m_{i,j} - m_{i-1,j}$, $i \in [2, \ldots, I]$; then $Y_j = [y_{2j}, \ldots, y_{ij}, \ldots, y_{Ij}]$ is the value array of the j-th protocol type.
As a preferred embodiment of the present invention, step S2 includes:
step S201, acquiring the protocol with the largest number in the protocol type storage structure, wherein the largest number is recorded as maxSeq, the corresponding protocol type is recorded as k, and cnt is the number of elements in the value array $Y_k$; adjusting the K value according to cnt: when K < cnt, K is set to cnt; when K ≥ cnt, K is left unchanged;
step S202, taking the protocols numbered $Z_a$ in the protocol type storage structure as the prediction protocols, recorded in sequence as $P_a$ (a = 1, 2, ..., K), where $Z_a$ is given by the formula in
Figure BDA0003464221040000031
step S203, for any prediction protocol $P_a$, obtaining the first characteristic, whose value is $\mathrm{maxSeq} - Z_a$;
step S204, for any prediction protocol $P_a$, obtaining the second characteristic, whose value is the timestamp corresponding to the protocol numbered maxSeq in the protocol type storage structure minus the timestamp corresponding to the protocol numbered $Z_a$;
step S205, for any prediction protocol $P_a$, obtaining the third characteristic, whose value is len, determined as follows:
in the protocol type storage structure, the condition is that the protocol type numbered $\mathrm{maxSeq} + 1 - j$ is equal to that of the number $Z_a$; j starts from 1 and increases by 1 at each step until the condition is no longer satisfied, at which point len is j minus 1.
As a preferred embodiment of the present invention, step S3 includes:
step S301, for any prediction protocol $P_a$, inputting its three characteristic values into a linear regression equation to obtain the similarity value of that neighbor;
step S302, forming a similarity value vector by the K similarity values corresponding to the K prediction protocols, inputting the similarity value vector into the softmax function, carrying out normalization processing, and obtaining the similarity weight.
As a preferred embodiment of the present invention, step S6 employs the gradient descent method to perform parameter optimization on the linear regression equation and computes the loss function value of softmax, where the loss function $\mathrm{Loss}_i$ is:
$\mathrm{Loss}_i = -\ln y_i$ (3)
Taking the derivative of the loss function $\mathrm{Loss}_i$ gives $y_i - 1$, and $y_i - 1$ is the gradient to be propagated backward; parameter optimization is performed on the linear regression equation according to this gradient.
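The update in step S6 can be illustrated with a minimal Python sketch. It assumes the similarity value of each neighbor is a plain weighted sum of its three characteristic values (no bias term, since a bias shared by all neighbors cancels under softmax), and that the index of the neighbor whose protocol actually matched is known. The names `sgd_step`, `true_index` and `lr` are hypothetical and do not come from the patent. The gradient with respect to the matched neighbor's similarity value is $y_i - 1$ as stated above; for the other neighbors the standard softmax cross-entropy gradient is $y_a$, which the text does not spell out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of similarity values."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(w, features, true_index, lr=0.01):
    """One gradient-descent step on Loss_i = -ln(y_i).

    w          -- current regression weights, one per characteristic (assumed form)
    features   -- list of K tuples of characteristic values, one per neighbor
    true_index -- index of the neighbor whose protocol really matched (assumed known)
    """
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
    y = softmax(scores)
    loss = -math.log(y[true_index])
    # Gradient w.r.t. each similarity value: y_a - 1 for the true neighbor, y_a otherwise.
    grad_scores = [ya - (1.0 if a == true_index else 0.0) for a, ya in enumerate(y)]
    # Chain rule through the linear model: dLoss/dw_d = sum_a grad_a * f_a[d].
    grad_w = [sum(g * f[d] for g, f in zip(grad_scores, features)) for d in range(len(w))]
    new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, loss


# Illustrative values only: two neighbors, three characteristics each.
features = [(7, 7, 2), (12, 12, 4)]
w1, loss1 = sgd_step([0.0, 0.0, 0.0], features, true_index=0)
w2, loss2 = sgd_step(w1, features, true_index=0)
```

Repeating the step with the same matched neighbor lowers the loss, which is the behavior the optimization in step S6 relies on.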
As a preferred embodiment of the present invention, step S7 includes:
step S701, if the protocol of the network message is not among the K neighbor prediction protocols but is in the protocol type storage structure, adding 1 to K; if, over the prediction of ω consecutive network messages, the real protocols of the network messages are all in the prediction result of the K-nearest-neighbor algorithm but are never $P_K$, subtracting 1 from K; where ω is defined as the amount of data per second in the actual environment;
step S702, filling the protocol of the predicted network message into the protocol type storage structure and numbering it maxSeq + 1;
step S703, taking the protocol type of the network message at the tail of the protocol type storage structure, recorded as q; appending the value $(\mathrm{maxSeq} + 1 - m_{Iq})$ to the tail of the value array $Y_q$ of letter type q; updating the structural characteristic value $m_{Iq}$ of the protocol type q corresponding to the current letter to maxSeq + 1; and returning to step S2 to predict the protocol type of the next network message.
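The bookkeeping in step S7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the storage structure is a plain list of letters, the prediction list is ordered $P_1, \ldots, P_K$ so that its last element is $P_K$, and K is never allowed to drop below 1. That floor, the empty value array created for a first-seen protocol type, and the function names `adjust_k` and `record_match` are all hypothetical additions, not taken from the patent.

```python
def adjust_k(k, predicted, true_protocol, known, streak, omega):
    """Return the updated (K, streak) after one matched message.

    predicted     -- prediction protocols in order P_1..P_K
    true_protocol -- protocol type that actually matched
    known         -- set of protocol types present in the storage structure
    streak        -- consecutive messages whose protocol was predicted but was not P_K
    omega         -- streak length that triggers K - 1
    """
    if true_protocol not in predicted:
        # Missed by the K neighbors although the type is known: widen the search.
        return (k + 1 if true_protocol in known else k), 0
    streak = streak + 1 if predicted[-1] != true_protocol else 0
    if streak >= omega:
        return max(1, k - 1), 0  # floor of 1 is an assumption of this sketch
    return k, streak


def record_match(letters, feature, value_array, q):
    """Append matched type q as number maxSeq + 1 and update Y_q and m_Iq."""
    new_number = len(letters)  # maxSeq + 1, because numbering starts at 0
    letters.append(q)
    if q in feature:
        value_array[q].append(new_number - feature[q])
    else:
        value_array[q] = []  # first occurrence: no gap yet (assumption)
    feature[q] = new_number


# Illustrative data only.
letters = list("BACM")
feature = {"B": 0, "A": 1, "C": 2, "M": 3}
value_array = {"B": [], "A": [], "C": [], "M": []}
record_match(letters, feature, value_array, "A")
```

Because only the tail of the list and two dictionary entries change, each update costs constant time, which fits the space-for-time approach described in the text.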
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
The K-nearest-neighbor-based message matching method for a custom protocol application layer performs fuzzy prediction of the protocol type of an unknown network message, thereby avoiding the traditional polling matching mode. In the fuzzy prediction model, an improved K-nearest-neighbor algorithm first finds the K neighbors and extracts the characteristics of each neighbor. A linear regression equation is then established over these characteristics to compute the matching degree weight of each neighbor, and softmax is applied to the weights, which increases the reliability of the weights of the prediction result. The improved K-nearest-neighbor method trades space for time, greatly reducing the time needed to search for the K neighbors. The K value can also be adjusted dynamically according to the prediction result, so that the regularity of the data is fully exploited, reliability is ensured, and efficiency is improved. The model was applied in a known high-speed real-time recording system, and experimental results show that the performance of the system is greatly improved. In polling matching mode, the peak parsing rate of the high-speed real-time recording system is 2000 packets per second; in frequency matching mode, the peak parsing rate is 3000 packets per second. The current system can receive up to 5000 network message packets per second, and with KNN prediction the parsing rate can reach 5000 packets per second, which breaks through the system performance bottleneck, improves the communication efficiency of an aircraft carrier combat system, improves the comprehensive combat capability of an aircraft carrier battle group, and supports China's national defense.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a network message matching method for a custom protocol application layer based on K nearest neighbors in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a protocol type storage structure constructed in an embodiment of the present invention;
Fig. 3 is a schematic diagram of input and output based on K-nearest-neighbor protocol prediction in an embodiment of the present invention.
Detailed Description
After finding the above problems, the inventors of the present application have conducted a detailed study on the existing network packet matching method of the custom application layer. Research shows that the matching times can be reduced and the transmission performance of the application layer data of the custom protocol can be greatly improved by a network message matching mode of predicting the protocol type of the network message of unknown type and then analyzing according to the prediction result.
Existing methods for predicting the protocol type of a network message fall into three categories: time series prediction methods, traditional machine learning methods, and deep learning prediction methods. Classical time series prediction methods include the Historical Average (HA), Vector Autoregression (VAR), the Autoregressive Integrated Moving Average (ARIMA), and their variants. These methods predict mainly by mining patterns along the time dimension of a traffic time series, and they generally require the time series to have a certain periodicity or regularity, so their prediction performance is poor when such regularity is absent. Deep learning prediction methods include recurrent neural network methods, graph neural network methods, and the like; because training the model is time-consuming, these methods cannot be applied to high-speed real-time scenarios. Among traditional machine learning methods, K-nearest neighbors is a lazy learning method that needs no trained network and can be applied directly to different scenarios. However, the K-nearest-neighbor algorithm classifies slowly, depends strongly on the capacity of the sample library, treats all features as equally important, and offers no accurate way to select the K value, so it needs to be improved.
It should be noted that the defects of the above prior art solutions were identified by the inventors through practice and careful study. Therefore, both the discovery of the above problems and the solutions to them proposed by the following embodiments of the present invention should be regarded as contributions made by the inventors in the course of the present invention.
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
It should be noted that like reference numbers and letters refer to like items in the following figures; once an item is defined in one figure, it need not be further defined and explained in subsequent figures. In the description of the present invention, the terms "first," "second," "third," "fourth," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Following the above analysis, the embodiment of the present application provides a K-nearest-neighbor-based network message matching method for a custom protocol application layer. It performs fuzzy prediction of the protocol type of an unknown network message using an improved K-nearest-neighbor method, dynamically adjusts the K value according to the prediction result, makes full use of the regularity of the data, and improves data transmission efficiency while ensuring transmission reliability and security. The K-nearest-neighbor search is improved by trading space for time, which greatly reduces the time needed to search for the K neighbors.
In this embodiment, a communication data set generated in a simulated communication environment, built with reference to the network communication system of a certain military system in China, is taken as an example. The data set contains up to hundreds of application layer protocols and various types of network messages, such as high-frequency small messages, high-frequency large messages, low-frequency large messages, randomly occurring messages, and instantaneous high-frequency messages. In this embodiment, a historical network message sequence is established for the network messages in the simulated environment, and the K-nearest-neighbor search is optimized by trading space for time. Finally, matching with the optimized K-nearest-neighbor result is compared with the traditional direct polling and frequency matching modes. On the data set from the existing simulated environment the result is a clear improvement over the prior art: the average number of match attempts per application layer network message falls from ten to two.
Fig. 1 is a flowchart of a network packet matching method for a custom protocol application layer based on K neighbors in this embodiment. As shown in fig. 1, the network packet matching method includes the following steps:
Step S1, initialization: acquiring historical network messages, constructing a protocol type storage structure according to the historical network messages, numbering the protocols in the protocol type storage structure in timestamp order, and acquiring the structural characteristic value and the value array of each protocol type.
In this step, the number of the acquired historical network messages is arbitrary, and usually enough samples are selected to cover all possible protocol types. The protocol type storage structure is a data basis for acquiring K neighbors.
Further, the constructing a protocol type storage structure and obtaining a structure characteristic value and a value array of each protocol type includes:
step S101, abstracting different protocol types of historical network messages into different letters, wherein the same protocol type adopts the same letter, and each letter represents a protocol type; and continuously storing letters representing the protocol types according to the time stamp sequence of the corresponding historical network message to form a protocol type storage structure, and numbering each letter in the protocol type storage structure in sequence.
As shown in Fig. 2, the letters representing the protocol types of the historical network messages are stored contiguously in timestamp order. If the protocol letters of the historical network messages are B, A, C, M, X, Y, B, C, M, Y, Z, D, Y, B, A, C, M in sequence, the protocol type storage structure is: BACMXYBCMYZDYBACM. The last M-type message is closest in time to the message to be matched and is the tail of the storage structure; the first B is farthest in time and is the head of the storage structure; three messages have protocol type M.
In this step, numbering usually starts from the protocol type letter of the historical network message whose timestamp is farthest from that of the network message to be matched; in the storage structure above, numbering starts from the first B. Numbers start at 0 and run up to the protocol type letter of the historical network message whose timestamp is nearest to that of the network message to be matched. For example, n historical network messages give n letters in the corresponding sequence; the letter farthest in time from the network message to be matched is numbered 0, and the nearest is numbered n-1.
Step S102, counting the letter types in the protocol type storage structure as J types, and finding, in reverse timestamp order, all numbers $m_{ij}$ of the j-th letter in the protocol type storage structure, where $i \in [1, \ldots, I]$, $j \in [1, \ldots, J]$, $I$ is the number of occurrences of the j-th letter in the sequence, $I \le n$, $J \le n$, $I \times J \le n$, and j takes the current value. It should be noted that the number of letter types is the number of protocol types, i.e. there are J protocol types.
Step S103, obtaining the maximum number $m_{Ij}$ of the j-th letter; $m_{Ij}$ is the structural characteristic value of the j-th protocol type, which the j-th letter represents.
Step S104, letting $y_{ij} = m_{i,j} - m_{i-1,j}$, $i \in [2, \ldots, I]$; then $Y_j = [y_{2j}, \ldots, y_{ij}, \ldots, y_{Ij}]$ is the value array of the j-th protocol type.
Correspondingly, for J letter types, J protocol types exist, and correspondingly J numeric arrays and J structural characteristic values exist. In this step, the numbers of all the historical network messages under each letter type, i.e. protocol type, are processed numerically, the difference between the numbers of the adjacent messages of the same type is calculated, and a sequence Y is formed.
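Steps S101 to S104 can be illustrated with a short Python sketch applied to the example sequence from Fig. 2. It assumes the storage structure is simply a string of letters numbered from 0, and it keeps the per-type results in dictionaries; the function name `build_structure` and the dictionary layout are hypothetical choices for illustration, not part of the patent.

```python
from collections import defaultdict


def build_structure(letters):
    """Derive the structural characteristic value and value array of each type.

    letters -- protocol type letters in timestamp order; position = number (0..n-1)
    Returns (feature, value_array):
      feature[p]     -- largest number at which type p occurs (m_Ij)
      value_array[p] -- gaps between consecutive occurrences of p (Y_j)
    """
    positions = defaultdict(list)  # letter -> ascending list of its numbers
    for number, letter in enumerate(letters):
        positions[letter].append(number)
    feature = {p: m[-1] for p, m in positions.items()}
    value_array = {
        p: [later - earlier for earlier, later in zip(m, m[1:])]
        for p, m in positions.items()
    }
    return feature, value_array


feature, value_array = build_structure("BACMXYBCMYZDYBACM")
print(feature["M"], value_array["M"])  # prints: 16 [5, 8]
```

Type M occurs at numbers 3, 8 and 16, so its structural characteristic value is 16 and its value array is [5, 8]. A type seen only once, such as X, has an empty value array.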
Step S2, acquiring the prediction protocols and characteristic values of the K neighbors of the current network message to be matched, based on the K-nearest-neighbor model, according to the protocol type storage structure, the structural characteristic value and the value array.
Step S2 comprises the following steps:
Step S201, acquiring the protocol with the largest number in the protocol type storage structure, wherein the largest number is recorded as maxSeq, the corresponding protocol type is recorded as k, and cnt is the number of elements in the value array $Y_k$; adjusting the K value according to cnt: when K < cnt, K is set to cnt; when K ≥ cnt, K is left unchanged.
Step S202, taking the protocols numbered $Z_a$ in the protocol type storage structure as the prediction protocols, recorded in sequence as $P_a$ (a = 1, 2, ..., K), where $Z_a$ is given by the formula in
Figure BDA0003464221040000081
Step S203, for any prediction protocol $P_a$, obtaining the first characteristic, whose value is $\mathrm{maxSeq} - Z_a$.
Step S204, for any prediction protocol $P_a$, obtaining the second characteristic, whose value is the timestamp corresponding to the protocol numbered maxSeq in the protocol type storage structure minus the timestamp corresponding to the protocol numbered $Z_a$.
Step S205, for any prediction protocol $P_a$, obtaining the third characteristic, whose value is len, determined as follows:
in the protocol type storage structure, the condition is that the protocol type numbered $\mathrm{maxSeq} + 1 - j$ is equal to that of the number $Z_a$; j starts from 1 and increases by 1 at each step until the condition is no longer satisfied, at which point len is j minus 1.
Correspondingly, three characteristic values of the prediction protocols of the K neighbors are respectively solved.
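The three characteristic values of steps S203 to S205 can be sketched in Python for a single candidate number $Z_a$. Two points are assumptions of this sketch rather than statements of the patent. First, the formula that defines $Z_a$ is an image that is not reproduced in this text, so the candidate number is simply passed in as the parameter `z`, and the values 9 and 4 used below are illustrative placeholders. Second, the comparison in step S205 is garbled in the source; the sketch assumes it walks backward in lockstep, comparing the type numbered maxSeq + 1 - j with the type numbered $Z_a - j$. The function name `neighbor_features` and the one-second timestamps are likewise hypothetical.

```python
def neighbor_features(letters, timestamps, z):
    """Return (first, second, len) for the prediction protocol numbered z.

    letters    -- protocol type letters, position = number (0..maxSeq)
    timestamps -- timestamp of each stored message, same indexing
    z          -- candidate number Z_a (supplied by the caller; formula not shown)
    """
    max_seq = len(letters) - 1
    first = max_seq - z                            # distance in numbers
    second = timestamps[max_seq] - timestamps[z]   # distance in time
    # ASSUMPTION: compare the history ending at maxSeq with the history
    # ending just before z, one step back at a time.
    j = 1
    while z - j >= 0 and letters[max_seq + 1 - j] == letters[z - j]:
        j += 1
    return first, second, j - 1


letters = "BACMXYBCMYZDYBACM"
timestamps = list(range(len(letters)))  # hypothetical: one message per second
print(neighbor_features(letters, timestamps, 9))  # prints: (7, 7, 2)
print(neighbor_features(letters, timestamps, 4))  # prints: (12, 12, 4)
```

Under this reading, len measures how long the run of types leading up to the candidate agrees with the run leading up to the message to be matched, so a larger len indicates a closer neighbor.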
Step S3, inputting the characteristic values of the K neighbor prediction protocols into a linear regression equation, and acquiring the similar weights between the protocol of the network message to be matched and the K neighbor prediction protocols.
Step S3 comprises the following steps:
Step S301, for any prediction protocol $P_a$, inputting its three characteristic values into a linear regression equation to obtain the similarity value of that neighbor.
Step S302, forming a similarity value vector by the K similarity values corresponding to the K prediction protocols, inputting the similarity value vector into the softmax function, carrying out normalization processing, and acquiring the similarity weight. The larger the weight percentage, the higher the similarity.
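Steps S301 and S302 can be sketched in Python as a linear score followed by softmax normalization. The patent does not give the form or coefficients of the linear regression equation, so the weighted sum plus bias used here, the parameter names `w` and `b`, and the numeric values are all assumptions chosen only for illustration.

```python
import math


def similarity_weights(features, w, b):
    """Map K feature tuples to K similar weights that sum to 1.

    features -- list of K tuples (first, second, len), one per prediction protocol
    w, b     -- regression coefficients and bias (hypothetical values)
    """
    # Step S301: linear regression gives one similarity value per neighbor.
    scores = [sum(wi * fi for wi, fi in zip(w, f)) + b for f in features]
    # Step S302: softmax normalization (shifted by the maximum for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative coefficients: distance lowers the score, matched length raises it.
weights = similarity_weights([(7, 7, 2), (12, 12, 4)], w=[-0.1, -0.1, 1.0], b=0.0)
```

With these example coefficients the second neighbor receives the larger weight, because its longer matched run outweighs its greater distance.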
Step S4, constructing a current application layer protocol group U; placing the K neighbor prediction protocols into the protocol group U in descending order of their similar weights; then acquiring the source IP and destination IP of the network message to be matched, querying the local historical matching record for all protocol types corresponding to that source IP and destination IP, and appending those protocol types to the tail of the protocol group U.
Fig. 3 is a schematic diagram of input and output based on a K-nearest neighbor model when performing protocol prediction in the network packet matching method. As shown in fig. 3, a historical network packet and a network packet to be predicted are input in the K neighbor model, K predicted network packet protocols are obtained through the above steps, and are arranged in a descending order according to similar weights, where the percentage shown in fig. 3 is the similar weight.
And the matched network message sequence is a protocol type corresponding to each pair of source IP and destination IP predicted in the local history matching process, and the protocol type is recorded under the corresponding source IP and destination IP pair or IP group. After each prediction is completed, the result of each prediction is also added into the local record. To ensure reliability of the predicted protocol for the network message, the prediction output should contain enough predicted protocols to ensure that the correct protocol containing the network message to be matched is contained. Before prediction, matching the source IP and the target IP of the message to be matched with the local historical IP pair, and recording all protocols of the network message transmitted before each pair of source IP and target IP, namely IP pair, in the local historical data; every time a network message to be matched is received, all protocols corresponding to the corresponding group of IP pairs are inquired, and the protocol which is not predicted by the K neighbor but corresponds to the IP group is added into output.
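The construction of the protocol group U can be sketched as follows. The dictionary `history`, keyed by (source IP, destination IP), is a hypothetical stand-in for the local historical matching record. Removing duplicates among the K predictions themselves is an assumption of this sketch.

```python
def build_protocol_group(predicted, weights, history, src_ip, dst_ip):
    """Return the ordered candidate list U for one message.

    predicted : protocol types proposed by the K-nearest-neighbor model.
    weights   : similarity weight of each predicted protocol.
    history   : assumed dict mapping (src_ip, dst_ip) to the protocol
                types previously matched for that IP pair.
    """
    # Predicted protocols first, highest similarity weight first.
    ranked = [p for p, _ in sorted(zip(predicted, weights),
                                   key=lambda pw: pw[1], reverse=True)]
    group = []
    # Then protocols recorded for the IP pair that were not predicted.
    for proto in ranked + list(history.get((src_ip, dst_ip), [])):
        if proto not in group:
            group.append(proto)
    return group


history = {("10.0.0.1", "10.0.0.2"): ["B", "D"]}
print(build_protocol_group(["A", "B", "C"], [0.2, 0.5, 0.3],
                           history, "10.0.0.1", "10.0.0.2"))
```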
Step S5, match the network message to be matched against the protocols in U one by one. When a match succeeds, transmit data according to the matched protocol and return the successfully matched protocol type. When no protocol in U matches, poll the local protocols outside U until a match is found, then transmit data according to the matched protocol and return the successfully matched protocol type.
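The two-stage matching of step S5 can be sketched as below. The callable `try_match` is a hypothetical placeholder for the actual protocol parser, which the text does not specify. Returning `None` when nothing matches is likewise an assumption, since the text presumes that some local protocol eventually matches.

```python
def match_message(message, group, local_protocols, try_match):
    """Return the first protocol that parses the message, or None.

    group           : ordered candidate list U.
    local_protocols : all protocols known locally.
    try_match       : assumed callable (message, protocol) -> bool.
    """
    # Stage 1: try the candidates in U in order.
    for proto in group:
        if try_match(message, proto):
            return proto
    # Stage 2: poll the remaining local protocols outside U.
    for proto in local_protocols:
        if proto not in group and try_match(message, proto):
            return proto
    return None


# Toy matcher: a protocol "matches" when the message starts with its name.
starts_with = lambda msg, proto: msg.startswith(proto)
print(match_message("D:payload", ["B", "C"], ["A", "B", "C", "D"], starts_with))
```

The benefit of the prediction comes from stage 1: when the true protocol sits near the head of U, only a few parse attempts are needed.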
Step S6, optimize the linear regression equation according to the returned successfully matched protocol type.
In this step, once the protocol type of the network message is known, the parameters of the linear regression equation are tuned by gradient descent. The loss is computed on the softmax output, with cross entropy as the loss function:
$$\mathrm{Loss} = -\sum_{i} t_i \ln y_i \quad (2)$$
In formula (2), t_i is the true value and y_i is the computed softmax value. When the i-th value is the correct prediction, t_i = 1, and the loss function becomes:
$$\mathrm{Loss}_i = -\ln y_i \quad (3)$$
Differentiating the loss function with respect to the i-th softmax input gives y_i - 1; y_i - 1 is the gradient to be propagated backward. The parameters of the linear regression equation are optimized according to this gradient.
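One gradient descent update can be sketched as below. The learning rate `lr` and the coefficient vector `coeffs` are hypothetical names. The sketch uses the standard result that the gradient of the cross-entropy loss with respect to the a-th softmax input is y_a - t_a, which equals y_i - 1 for the correct index i. A shared intercept is omitted because it cancels inside softmax.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss(coeffs, features, true_index):
    """Cross-entropy loss -ln(y_i) for the correct neighbor i."""
    scores = [sum(w * x for w, x in zip(coeffs, f)) for f in features]
    return -math.log(softmax(scores)[true_index])


def gradient_step(coeffs, features, true_index, lr=0.1):
    """Return updated coefficients after one gradient descent step."""
    scores = [sum(w * x for w, x in zip(coeffs, f)) for f in features]
    y = softmax(scores)
    # Gradient w.r.t. each softmax input: y_a - t_a (y_i - 1 at the true index).
    delta = [y_a - (1.0 if a == true_index else 0.0) for a, y_a in enumerate(y)]
    # Chain rule through score_a = coeffs . features[a].
    grad = [sum(d * f[k] for d, f in zip(delta, features))
            for k in range(len(coeffs))]
    return [w - lr * g for w, g in zip(coeffs, grad)]


features = [[1.0, 2.0, 0.0], [3.0, 1.0, 2.0]]
old = [0.0, 0.0, 0.0]
new = gradient_step(old, features, true_index=1)
print(loss(old, features, 1), loss(new, features, 1))
```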
Step S7, update the historical network message sequence, the protocol type storage structure, and the K value according to the returned successfully matched protocol type.
The specific steps are:
Step S701, if the protocol of the network message is not among the K nearest-neighbor predicted protocols but is present in the protocol type storage structure, K is increased by 1 so that the K-nearest-neighbor model can predict more candidate results. If, over ω consecutive network message predictions, the true protocol of every message is in the prediction result of the K-nearest-neighbor algorithm but is never P_K, K is decreased by 1. Here ω is defined as the amount of data per second in the actual environment and is set according to actual requirements.
Step S702, fill the predicted network message protocol into the protocol type storage structure, numbered maxSeq + 1.
Step S703, take the protocol type of the network message at the tail of the protocol type storage structure, denoted q. Append the value (maxSeq + 1 - m_{Iq}) to the tail of the value array Y_q of letter type q. Update the structure characteristic value m_{Iq} of the protocol type q corresponding to the current letter to maxSeq + 1, then return to step S2 to predict the protocol type of the next network message.
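The bookkeeping of step S7 can be sketched as below. The `streak` counter, its reset after K changes, the lower bound of 1 on K, and the handling of a type seen for the first time are assumptions of this sketch. The dictionaries `last_seen` (holding m_{Iq} per type) and `value_arrays` (holding Y_q per type) are hypothetical containers.

```python
def adjust_k(k, true_protocol, predicted, stored_types, streak, omega):
    """Return (new_k, new_streak) after one message has been matched."""
    if true_protocol not in predicted:
        if true_protocol in stored_types:
            k += 1                      # prediction list was too short
        return k, 0
    if true_protocol != predicted[-1]:  # hit, but not by the last candidate P_K
        streak += 1
        if streak >= omega:
            return max(1, k - 1), 0     # P_K unused for omega messages in a row
        return k, streak
    return k, 0                         # P_K was needed, so restart the count


def append_protocol(types, last_seen, value_arrays, q):
    """Append matched type q as entry number maxSeq + 1 and update Y_q, m_Iq."""
    new_number = len(types) + 1
    types.append(q)
    if q in last_seen:                  # gap to the previous occurrence of q
        value_arrays.setdefault(q, []).append(new_number - last_seen[q])
    last_seen[q] = new_number


types = list("ABA")
last_seen = {"A": 3, "B": 2}
value_arrays = {"A": [2], "B": []}
append_protocol(types, last_seen, value_arrays, "B")
print(types, last_seen, value_arrays)
```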
As can be seen from the above, the K-nearest-neighbor-based network message matching method for a custom protocol application layer according to the embodiment of the present invention extracts the regularities of application layer network messages in a complex network and, by building a protocol type storage structure over the historical network message sequence, uses those regularities to predict the type of an unknown datagram. The embodiment optimizes the K-nearest-neighbor method from several angles and combines it with regression analysis, which addresses the slow and unreliable nearest-neighbor search of the K-nearest-neighbor method, makes the algorithm practical for engineering use, and improves the efficiency of application layer network message matching.
The above describes only preferred embodiments of the invention and illustrates the technical principles applied; it is not intended to limit the scope of the claimed invention. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the spirit of the invention. All other embodiments that a person skilled in the art can derive from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Claims (7)

1. A K-nearest-neighbor-based network message matching method for a custom protocol application layer, characterized by comprising the following steps:
step S1, acquiring historical network messages, constructing a protocol type storage structure from the historical network messages, numbering the protocols in the protocol type storage structure sequentially by timestamp, and obtaining the structure characteristic value and the value array of each protocol type;
step S2, obtaining, based on a K-nearest-neighbor model and according to the protocol type storage structure, the structure characteristic values and the value arrays, the K nearest-neighbor predicted protocols of the current network message to be matched and their feature values;
step S3, inputting the feature values of the K nearest-neighbor predicted protocols into a linear regression equation to obtain the similarity weight between the protocol of the network message to be matched and each of the K predicted protocols;
step S4, constructing a current application layer protocol group U; adding the K nearest-neighbor predicted protocols to the protocol group U in descending order of similarity weight; then obtaining the source IP and destination IP of the network message to be matched, querying the local historical matching record for all protocol types corresponding to the source IP and destination IP, and appending these protocol types to the tail of the protocol group U;
step S5, matching the network message to be matched against the protocols in U one by one; when a match succeeds, transmitting data according to the matched protocol and returning the successfully matched protocol type; when no protocol in U matches, polling the local protocols outside U until a match is found, transmitting data according to the matched protocol, and returning the successfully matched protocol type.
2. The method of claim 1, further comprising:
step S6, optimizing a linear regression equation according to the returned protocol type successfully matched;
and step S7, updating the historical network message sequence, the protocol type storage structure and the K value according to the returned successfully matched protocol type.
3. The method for matching network messages of the custom protocol application layer according to claim 2, wherein constructing the protocol type storage structure and obtaining the structure characteristic value and the value array of each protocol type in step S1 comprises:
step S101, abstracting the different protocol types of the historical network messages into different letters, wherein the same protocol type uses the same letter and each letter represents one protocol type; storing the letters representing the protocol types consecutively in the timestamp order of the corresponding historical network messages to form the protocol type storage structure, and numbering each letter in the protocol type storage structure sequentially;
step S102, counting the letter types in the protocol type storage structure as J types, and finding, in reverse timestamp order, all numbers m_{ij} of the j-th letter in the protocol type storage structure, i ∈ [1, ..., i, ..., I], j ∈ [1, ..., j, ..., J], where I is the number of occurrences of the j-th letter in the sequence, I ≤ n, J ≤ n, I × J ≤ n, and j takes its current value;
step S103, obtaining the maximum number m_{Ij} of the j-th letter, and taking m_{Ij} as the structure characteristic value of the j-th protocol type corresponding to the j-th letter;
step S104, letting y_{ij} = m_{i,j} - m_{i-1,j}, i ∈ [2, ..., i, ..., I]; then Y_j = [y_{2j}, ..., y_{ij}, ..., y_{Ij}] is the value array of the j-th protocol type.
4. The method for matching network messages of the custom protocol application layer according to claim 3, wherein the step S2 includes:
step S201, obtaining the protocol with the largest number in the protocol type storage structure, wherein the largest number is denoted maxSeq and the corresponding protocol type is denoted k; cnt is determined from the value array Y_k, and the K value is adjusted according to cnt: when K < cnt, K = cnt; when K ≥ cnt, K = K;
step S202, taking the protocols numbered Z_a in the protocol type storage structure as the predicted protocols, denoted in order P_a (a = 1, 2, ..., K), where Z_a is given by:
[Formula image FDA0003464221030000021: definition of Z_a]
step S203, for any predicted protocol P_a, computing the first feature, wherein the first feature value is maxSeq - Z_a;
step S204, for any predicted protocol P_a, computing the second feature, wherein the second feature value is the timestamp of the protocol numbered maxSeq in the protocol type storage structure minus the timestamp of the protocol numbered Z_a;
step S205, for any predicted protocol P_a, computing the third feature, wherein the third feature value is len, determined as follows:
in the protocol type storage structure, j starts at 1 and is incremented by 1 for as long as the protocol type at number maxSeq + 1 - j equals the protocol type at the corresponding number counted back from Z_a; when the condition first fails, len is the current value of j minus 1.
5. The method for matching network messages of the custom protocol application layer according to claim 4, wherein the step S3 includes:
step S301, for any predicted protocol P_a, inputting its three feature values into the linear regression equation to obtain the similarity value of that neighbor;
step S302, forming a similarity value vector by the K similarity values corresponding to the K prediction protocols, inputting the similarity value vector into the softmax function, carrying out normalization processing, and acquiring the similarity weight.
6. The method for matching network messages of the custom protocol application layer according to claim 5, wherein in step S6 a gradient descent method is used to optimize the parameters of the linear regression equation; the loss is computed on the softmax output, and the loss function Loss_i is:
$$\mathrm{Loss}_i = -\ln y_i \quad (3)$$
differentiating the loss function Loss_i gives y_i - 1, and y_i - 1 is the gradient to be propagated backward; the parameters of the linear regression equation are optimized according to this gradient.
7. The method for matching network messages of the custom protocol application layer according to claim 6, wherein the step S7 includes:
step S701, if the protocol of the network message is not among the K nearest-neighbor predicted protocols but is present in the protocol type storage structure, adding 1 to K; if, over ω consecutive network message predictions, the true protocol of every message is in the prediction result of the K-nearest-neighbor algorithm but is never P_K, subtracting 1 from K; where ω is defined as the amount of data per second in the actual environment;
step S702, filling the predicted network message protocol into the protocol type storage structure, numbered maxSeq + 1;
step S703, taking the protocol type of the network message at the tail of the protocol type storage structure, denoted q; appending the value (maxSeq + 1 - m_{Iq}) to the tail of the value array Y_q of letter type q; updating the structure characteristic value m_{Iq} of the protocol type q corresponding to the current letter to maxSeq + 1, and returning to step S2 to predict the protocol type of the next network message.
CN202210029243.2A 2022-01-11 2022-01-11 K-nearest neighbor-based network message matching method for custom protocol application layer Active CN114520838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210029243.2A CN114520838B (en) 2022-01-11 2022-01-11 K-nearest neighbor-based network message matching method for custom protocol application layer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029243.2A CN114520838B (en) 2022-01-11 2022-01-11 K-nearest neighbor-based network message matching method for custom protocol application layer

Publications (2)

Publication Number Publication Date
CN114520838A true CN114520838A (en) 2022-05-20
CN114520838B CN114520838B (en) 2023-10-17

Family

ID=81597627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029243.2A Active CN114520838B (en) 2022-01-11 2022-01-11 K-nearest neighbor-based network message matching method for custom protocol application layer

Country Status (1)

Country Link
CN (1) CN114520838B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106850338A (en) * 2016-12-30 2017-06-13 西可通信技术设备(河源)有限公司 A kind of R+1 classes application protocol recognition method and device based on semantic analysis
CN109547409A (en) * 2018-10-19 2019-03-29 中国电力科学研究院有限公司 A kind of method and system for being parsed to industrial network transport protocol
WO2021103135A1 (en) * 2019-11-25 2021-06-03 中国科学院深圳先进技术研究院 Deep neural network-based traffic classification method and system, and electronic device
CN111756874A (en) * 2020-06-24 2020-10-09 北京天融信网络安全技术有限公司 Method and device for identifying type of DNS tunnel upper layer protocol
CN112702235A (en) * 2020-12-21 2021-04-23 中国人民解放军陆军炮兵防空兵学院 Method for automatically and reversely analyzing unknown protocol

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CAO SS 等: "Deep neural networks for learning graph representations", 《THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》, pages 1145 - 1152 *
DEEPAK NADIG 等: "Comparative Performance Evaluation of High-performance Data Transfer Tools", 《2018 IEEE INTERNATIONAL CONFERENCE ON ADVANCED NETWORKS AND TELECOMMUNICATIONS SYSTEMS (ANTS)》, pages 1 - 6 *
SCHOPPMANN, P 等: "Make Some ROOM for the Zeros: Data Sparsity in Secure Distributed Machine Learning", 《PROCEEDINGS OF THE 2019 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS '19)》, pages 1335 - 1350 *
冯文博;洪征;吴礼发;李毅豪;林培鸿;: "基于卷积神经网络的应用层协议识别方法", 计算机应用, no. 12, pages 3615 - 3621 *
殷亚博 等: "基于搜索改进的KNN文本分类算法", 《计算机工程与设计》, vol. 39, no. 9, pages 2923 - 2928 *
洪征;龚启缘;冯文博;李毅豪;: "自适应聚类的未知应用层协议识别方法", 计算机工程与应用, no. 05, pages 109 - 117 *
葛月月: "改进的LMS-KNN近邻分类方法研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, pages 3 *
谭骏;陈兴蜀;杜敏;: "基于特征加权与最近邻法的P2P协议识别算法", 四川大学学报(工程科学版), no. 04, pages 116 - 123 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499332A (en) * 2022-09-13 2022-12-20 科东(广州)软件科技有限公司 Network message monitoring method, device, equipment and medium
CN115499332B (en) * 2022-09-13 2023-12-15 科东(广州)软件科技有限公司 Method, device, equipment and medium for monitoring network message
CN116545772A (en) * 2023-07-04 2023-08-04 杭州海康威视数字技术股份有限公司 Protocol identification method, device and equipment for lightweight Internet of things traffic
CN116545772B (en) * 2023-07-04 2023-09-19 杭州海康威视数字技术股份有限公司 Protocol identification method, device and equipment for lightweight Internet of Things traffic

Also Published As

Publication number Publication date
CN114520838B (en) 2023-10-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant