CN115603980A

CN115603980A - Data packet aggregation method and device and electronic equipment

Info

Publication number: CN115603980A
Application number: CN202211215644.3A
Authority: CN
Inventors: 韦云川; 申勇; 万朝华
Original assignee: Hillstone Networks Co Ltd
Current assignee: Hillstone Networks Co Ltd
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2023-01-13

Abstract

The application discloses a data packet aggregation method and device and an electronic device. The method comprises: acquiring a data stream to be processed, the data stream consisting of N data packets, where N is an integer greater than 1; acquiring the acquisition time and the number of bytes of each data packet; determining a four-dimensional relevance vector of each data packet according to its acquisition time, number of bytes and transmission direction, the four-dimensional relevance vector representing relevance characteristics between two adjacent data packets; and dividing the N data packets into at least one data packet set according to the four-dimensional relevance vectors. The application solves the technical problem in the prior art that, when a plurality of data packets are divided into at least one data packet set, the quality of the division is poor.

Description

Data packet aggregation method and device and electronic equipment

Technical Field

The present application relates to the field of data processing and the field of information security, and in particular, to a method and an apparatus for aggregating data packets, and an electronic device.

Background

In the field of information security, analyzing the data streams sent by a device makes it possible to determine whether the device exhibits abnormal behavior, which serves the purpose of information security detection. Efficient and accurate analysis of data streams is therefore key to improving the quality and efficiency of information security detection.

To analyze a data stream more efficiently, the data packets in the stream are generally divided into at least one data packet set for analysis. However, in the prior art, this division is based solely on the acquisition time of the data packets; for example, all data packets acquired within 1 minute are grouped into one data packet set. This approach ignores correlations between data packets in other characteristic dimensions, resulting in poor division quality.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiments of the application provide a method and a device for aggregating data packets, and an electronic device, so as to at least solve the technical problem in the prior art that the division quality is poor when a plurality of data packets are divided into at least one data packet set.

According to an aspect of an embodiment of the present application, there is provided a method for aggregating data packets, including: acquiring a data stream to be processed, wherein the data stream consists of N data packets, and N is an integer greater than 1; acquiring the acquisition time of each data packet and the number of bytes of each data packet; determining a four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, wherein the four-dimensional relevance vector is used for representing relevance characteristics between two adjacent data packets; and dividing the N data packets into at least one data packet set according to the four-dimensional relevance vector.

Further, the aggregation method of the data packets further comprises: determining the acquisition time of the previous data packet of each data packet as a first time according to the arrangement sequence of the N data packets in the data stream; determining the acquisition time of the next data packet of each data packet as a second time; and determining a four-dimensional relevance vector of each data packet according to the first time, the second time, the number of bytes of each data packet and the transmission direction of each data packet.

Further, the aggregation method of the data packets further comprises: when the data packet is a first data packet in the N data packets, determining that first time corresponding to the first data packet is acquisition time of the first data packet; and when the data packet is the last data packet in the N data packets, determining that the second time corresponding to the last data packet is the acquisition time of the last data packet.

Further, the aggregation method of the data packets further comprises: determining the acquisition time of each data packet as target time; calculating the absolute value of the difference value between the target time and the first time to obtain a first interval duration; calculating the absolute value of the difference value between the target time and the second time to obtain a second interval duration; and determining the four-dimensional relevance vector of each data packet according to the first interval duration, the second interval duration, the number of bytes of each data packet and the transmission direction of each data packet.

Further, the method for aggregating data packets further comprises: after determining the four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, determining a target data packet from the N data packets, the target data packet being any one of the N data packets except the last data packet; determining the four-dimensional relevance vector corresponding to the target data packet as a first to-be-processed vector; determining the four-dimensional relevance vector corresponding to the next data packet after the target data packet as a second to-be-processed vector; determining a target included-angle cosine of the target data packet according to the first to-be-processed vector and the second to-be-processed vector; and constructing a relevance vector field according to the target included-angle cosines and the four-dimensional relevance vectors of the data packets, the relevance vector field representing the degree of relevance among the N data packets.

Further, the method for aggregating data packets further comprises: step one, starting from a first data packet in a data stream, determining S data packets according to the arrangement sequence of N data packets in the data stream, step two, detecting whether the S data packets meet preset conditions, and forming a data packet set by the S data packets when the S data packets meet the preset conditions, wherein S is an integer larger than 1, S is smaller than or equal to N, and the preset conditions are used for representing the association degree between the data packets in the data packet set; step three, removing S data packets forming the data packet set in the step two from the data stream to obtain a new data stream; and step four, updating the data stream based on the new data stream, and repeatedly executing the processes from the step one to the step three until all the N data packets are divided into the data packet set.

Further, the method for aggregating data packets further comprises: step 1, determining the four-dimensional relevance vector corresponding to the first data packet as a first vector; step 2, following the arrangement order and starting from the second data packet, determining the four-dimensional relevance vectors corresponding to S-1 data packets as second vectors, to obtain S-1 second vectors, where S is an integer greater than or equal to 2 and less than or equal to N; step 3, summing the first vector and the S-1 second vectors to obtain a first target vector; step 4, determining the four-dimensional relevance vector corresponding to the (S+1)-th data packet as a third vector; step 5, summing the first target vector and the third vector to obtain a second target vector; step 6, determining, according to the first vector, the first target vector and the second target vector, whether the S data packets satisfy a preset condition, the S data packets consisting of the first data packet and the S-1 data packets; and step 7, when the S data packets do not satisfy the preset condition, incrementing S by 1 and repeating steps 2 to 7 until the S data packets satisfy the preset condition, whereupon the S data packets form a data packet set.

Further, the method for aggregating data packets further comprises: calculating a cosine value of an included angle between the first target vector and the first vector to obtain a cosine value of the first included angle; calculating a cosine value of an included angle between the second target vector and the first vector to obtain a cosine value of a second included angle; and when the cosine values of the first included angle and the second included angle are opposite numbers, determining that the S data packets meet the preset condition.
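Steps 1 to 7, together with the cosine condition just described, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function names are hypothetical, "opposite numbers" is interpreted as "opposite signs" (the product of the two cosines is negative), and when no (S+1)-th data packet exists the remaining data packets are assumed to form the set.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the included angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def first_set_size(vectors: List[Vector]) -> int:
    """Return S, the number of leading packets that form one data packet set.

    vectors[k] is the four-dimensional relevance vector of the (k+1)-th packet.
    Assumptions (not stated in the source): "opposite numbers" means the two
    cosines have opposite signs, and if no (S+1)-th packet exists, all
    remaining packets form the set.
    """
    n = len(vectors)
    if n <= 2:
        return n                                   # assumption: nothing left to test
    first = vectors[0]                             # step 1: first vector
    # steps 2-3 with S = 2: first vector + the single second vector
    first_target = [a + b for a, b in zip(vectors[0], vectors[1])]
    s = 2
    while s < n:                                   # the (S+1)-th packet is vectors[s]
        # steps 4-5: add the third vector to get the second target vector
        second_target = [a + b for a, b in zip(first_target, vectors[s])]
        # step 6: compare the two included-angle cosines with the first vector
        if cosine(first_target, first) * cosine(second_target, first) < 0:
            return s
        # step 7: S <- S + 1; the new first target vector is the old second one
        first_target = second_target
        s += 1
    return n
```

For example, `first_set_size([(1, 0, 0, 1), (1, 0, 0, 1), (-5, 0, 0, -5)])` returns 2: the first target vector points the same way as the first vector (cosine 1), while adding the third vector flips the sum to the opposite direction (cosine -1). Keeping a running sum means each increment of S costs one vector addition instead of re-summing all S vectors.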

According to another aspect of the embodiments of the present application, there is also provided an apparatus for aggregating data packets, comprising a first acquisition module, a second acquisition module, a determining module and a dividing module. The first acquisition module is used for acquiring a data stream to be processed, the data stream consisting of N data packets, where N is an integer greater than 1; the second acquisition module is used for acquiring the acquisition time and the number of bytes of each data packet; the determining module is used for determining a four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, the four-dimensional relevance vector representing relevance characteristics between two adjacent data packets; and the dividing module is used for dividing the N data packets into at least one data packet set according to the four-dimensional relevance vectors.

According to another aspect of the embodiments of the present application, there is also provided an electronic device, comprising one or more processors and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the above-described method for aggregating data packets.

In the application, the four-dimensional relevance vector of each data packet is determined from its acquisition time, number of bytes and transmission direction. After a data stream to be processed is acquired, the acquisition time and the number of bytes of each data packet are obtained; the four-dimensional relevance vector of each data packet is then determined from the acquisition time, number of bytes and transmission direction; and finally the N data packets are divided into at least one data packet set according to the four-dimensional relevance vectors. The data stream consists of N data packets, where N is an integer greater than 1, and the four-dimensional relevance vector represents relevance characteristics between two adjacent data packets.

On this basis, because the four-dimensional relevance vector is determined from the acquisition time, the number of bytes and the transmission direction of the data packets, the relevance characteristics among the data packets are analyzed in multiple dimensions. Compared with the prior art, dividing the N data packets into at least one data packet set according to the four-dimensional relevance vectors allows the relevance among data packets to be determined from more characteristic dimensions, which improves the division quality and ensures that the data packets within each data packet set are strongly correlated.

Therefore, the technical solution of the application determines the relevance characteristics among data packets from multiple dimensions, thereby improving the division quality of data packet sets and solving the technical problem in the prior art that the division quality is poor when a plurality of data packets are divided into at least one data packet set.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 is a flowchart of an alternative method of aggregating data packets according to an embodiment of the present application;

Fig. 2 is a schematic diagram of the four characteristic dimensions of an alternative data packet according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an alternative relevance vector according to an embodiment of the present application;

Fig. 4 is a flowchart of detecting whether S data packets satisfy a preset condition according to an embodiment of the present application;

Fig. 5 is a schematic diagram of an alternative data packet aggregation apparatus according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to an embodiment of the present application, an embodiment of a method for aggregating data packets is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one given here.

Fig. 1 is a flowchart of an optional packet aggregation method according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:

Step S101, acquiring a data stream to be processed.

In step S101, the data stream consists of N data packets, where N is an integer greater than 1. Specifically, the data stream in the present application may be a Domain Name System (DNS) data stream in plaintext, or an encrypted DNS data stream, for example one encrypted using DNS over HTTPS (DoH). The data stream may also be based on other information transmission protocols.

In addition, a network security device may be used as an execution subject of the aggregation method of the data packets in the embodiment of the present application, where the network security device includes, but is not limited to, a firewall device and other security devices. The network security device can be connected with a plurality of terminal devices, and acquires data streams to be processed from the plurality of terminal devices. The terminal devices include, but are not limited to, desktop computers, industrial personal computers, servers and other devices.

Step S102, acquiring the acquisition time of each data packet and the number of bytes of each data packet.

In step S102, after obtaining the data stream to be processed, the network security device may obtain the byte number of each of N data packets in the data stream, where the byte numbers of any two data packets may be the same or different. A packet may be a request packet or a response packet.

In addition, each time the network security device obtains a data packet, it records the acquisition time of that data packet. For example, suppose the network security device is connected to terminal device A and terminal device B, and terminal device A generates three data packets (data packet 1, data packet 2 and data packet 3) and sends them to terminal device B. Before terminal device B receives them, the three data packets are first acquired by the network security device, which performs security detection on them and, after confirming that they are in a normal state, forwards them to terminal device B. When acquiring the three data packets, the network security device records the acquisition time of each one.

Step S103, determining a four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet.

In step S103, the four-dimensional relevance vector is used to characterize the relevance between two adjacent data packets. Specifically, among the N data packets, the transmission direction of each data packet is either a first direction or a second direction, the two being opposite; for example, when the first direction is the forward direction, the second direction is the reverse direction. In addition, based on the acquisition time of each data packet, the network security device calculates the interval between the acquisition time of each data packet and that of each adjacent data packet. Because each data packet has a preceding and a following neighbor, there are two interval durations per data packet. On this basis, each data packet can be described in four characteristic dimensions: the number of bytes of the data packet, the transmission direction of the data packet, the interval duration between the data packet and the previous data packet, and the interval duration between the data packet and the next data packet. It should be noted that the interval duration between the first data packet and its (nonexistent) previous data packet defaults to 0, and the interval duration between the last data packet and its (nonexistent) next data packet likewise defaults to 0.

Further, the four-dimensional relevance vector describes these four characteristic dimensions of each data packet, and the relevance among data packets can be determined from them, for example from the similarity of two data packets in each characteristic dimension.

Step S104, dividing the N data packets into at least one data packet set according to the four-dimensional relevance vector.

In step S104, since the four-dimensional relevance vector of each data packet characterizes at least the relevance between two adjacent data packets, the N data packets can be divided into at least one data packet set based on the four-dimensional relevance vectors. Each data packet set contains at least one data packet, and no data packet belongs to more than one data packet set. For example, assume there are 10 data packets: the 1st and 2nd data packets are divided into a first data packet set, the 3rd to 7th data packets into a second data packet set, and the 8th to 10th data packets into a third data packet set.
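The two structural constraints stated above (every set is non-empty, and no data packet belongs to two sets) can be checked mechanically. The sketch below is illustrative only: `is_valid_partition` is a hypothetical helper, and data packets are identified by their 1-based position, as in the 10-packet example.

```python
from typing import List


def is_valid_partition(sets: List[List[int]], n: int) -> bool:
    """True if `sets` divides packets 1..n into non-empty, disjoint sets
    that together cover every packet exactly once."""
    members = [p for s in sets for p in s]
    return all(len(s) > 0 for s in sets) and sorted(members) == list(range(1, n + 1))


# The 10-packet example from the text: {1, 2}, {3..7}, {8..10}
example = [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
print(is_valid_partition(example, 10))  # True
```

Sorting the flattened membership list and comparing it with 1..n catches both a data packet appearing in two sets and a data packet left out.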

Optionally, in the prior art, DNS is a commonly used information transmission protocol. Although plaintext DNS transmission facilitates security inspection and auditing by security devices such as firewalls, it also causes serious privacy and security problems. To address the information security problems of plaintext DNS transmission, DoH (DNS over HTTPS) may be used to encrypt DNS request packets and DNS response packets, preventing third-party applications from eavesdropping on or tampering with DNS packets in plaintext.

However, as DoH becomes more widely used, DoH-based tunnel attacks are becoming more frequent. A DoH tunnel attack is essentially a conventional DNS tunnel attack wrapped in an additional layer of HTTPS encapsulation.

It should be noted that, in actual network communication, outbound connections are inspected by various firewall devices, and if abnormal communication is found, the firewall device blocks it. To get around this, the data initiator encapsulates its data packets in a packet type or port that the firewall device allows, so the packets pass through the firewall device to reach the data receiver; the data receiver then restores (decapsulates) the data packets and sends the restored data packets to the corresponding server. This technique is called tunneling.

When an attacker carries out an intrusion, the attacker may use legitimate protocols such as DNS and ICMP to construct a hidden tunnel that conceals the illicit information transmitted by a device. Such an attack is called a tunnel attack, and the DoH tunnel attack mentioned above is a tunnel attack that combines DoH with the DNS protocol.

It should be further noted that, in the prior art, DNS tunnel attacks are usually detected based on domain name information. This works because a plaintext DNS data stream is unencrypted, so any third-party application can obtain the domain name information by parsing the DNS data stream. However, when the DNS data stream is encrypted using DoH, the contents of DNS requests and DNS responses are no longer visible, so existing domain-name-based DNS tunnel attack detection techniques cannot be applied.

To detect DoH tunnel attacks, a time-series analysis must be performed on the encrypted HTTPS data stream to lay the technical groundwork for further detection, and this time-series analysis requires the data packets in the encrypted data stream to be aggregated effectively. However, in the prior art, when a plurality of encrypted data packets are divided into data packet sets, the division is based solely on the acquisition time of the data packets; for example, the data packets acquired within 1 minute are grouped into one data set. This approach ignores correlations between data packets in other characteristic dimensions, resulting in poor division quality.
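For contrast, the prior-art grouping described above uses acquisition time alone. A minimal sketch, assuming fixed windows anchored at the first data packet of each group (one plausible reading of "data packets acquired within 1 minute"); `group_by_time_window` is a hypothetical name:

```python
from typing import List, Optional


def group_by_time_window(times: List[float], window: float = 60.0) -> List[List[int]]:
    """Prior-art style grouping by acquisition time only.

    `times` are acquisition times in seconds, in stream order. A new group
    starts whenever a packet arrives `window` seconds or more after the first
    packet of the current group. Returns 0-based packet indices per group.
    """
    groups: List[List[int]] = []
    start: Optional[float] = None
    for i, t in enumerate(times):
        if start is None or t - start >= window:
            groups.append([])
            start = t
        groups[-1].append(i)
    return groups
```

For example, acquisition times `[0, 10, 59, 60, 130]` give `[[0, 1, 2], [3], [4]]`. Byte counts and transmission directions never influence the result, which is the limitation the present method addresses.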

In the present method, the four-dimensional relevance vector is determined from the acquisition time, the number of bytes and the transmission direction of the data packets, so the relevance characteristics among data packets are analyzed in multiple dimensions. On this basis, dividing the N data packets into at least one data packet set according to the four-dimensional relevance vectors not only determines the relevance among data packets from more characteristic dimensions and improves the division quality, but also makes it possible to aggregate and analyze the data packets of an encrypted data stream without parsing domain name information, laying a data foundation for subsequent DoH tunnel attack detection.

In an alternative embodiment, to determine the four-dimensional association degree vector of each data packet, the network security device determines, according to the arrangement order of the N data packets in the data stream, the acquisition time of a previous data packet of each data packet as a first time, and also determines the acquisition time of a next data packet of each data packet as a second time, and finally determines, by the network security device, the four-dimensional association degree vector of each data packet according to the first time, the second time, the number of bytes of each data packet, and the transmission direction of each data packet.

It should be noted that the first packet has no previous packet and the last packet has no next packet. When the data packet is a first data packet of the N data packets, the network security device determines that a first time corresponding to the first data packet is an acquisition time of the first data packet. And when the data packet is the last data packet in the N data packets, the network security equipment determines that the second time corresponding to the last data packet is the acquisition time of the last data packet.

Specifically, to describe the technical solution of the embodiments more clearly, the data stream to be processed may be written as $\mathrm{Flow} = \{P_1, P_2, \ldots, P_I, \ldots, P_N\}$, $I \in [1, N]$, where $P_I$ is the $I$-th data packet.

The four-dimensional relevance vector of data packet $P_I$ is calculated as follows:

$$\vec{V}_I = \left(\alpha_1\,\mathrm{size}(P_I),\ \alpha_2\,\mathrm{intertime}(P_{I-1}, P_I),\ \alpha_3\,\mathrm{intertime}(P_I, P_{I+1}),\ \mathrm{direction}(P_I)\right)$$

where $\alpha_1 \in [0.001, 0.01]$, $\alpha_2 \in [0.1, 0.8]$ and $\alpha_3 \in [0.1, 0.8]$; $\mathrm{size}()$ is the number of bytes of a data packet; $\mathrm{intertime}()$ is the absolute value of the difference between the acquisition times of two data packets; $\mathrm{direction}()$ is the transmission direction of a data packet, recorded as 1 if the data packet is transmitted in the first direction and as $-1$ if it is transmitted in the second direction; when $I = 1$, $\mathrm{intertime}(P_{I-1}, P_I)$ is recorded as 0; and when $I = N$, $\mathrm{intertime}(P_I, P_{I+1})$ is recorded as 0.
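The four-dimensional relevance vector formula can be sketched in Python as below. The default weights are arbitrary picks inside the stated ranges, not values taken from the application; `relevance_vectors` is a hypothetical name, and transmission directions are assumed to be already encoded as 1 or -1.

```python
from typing import List, Sequence, Tuple


def relevance_vectors(
    times: Sequence[float],
    sizes: Sequence[int],
    directions: Sequence[int],
    a1: float = 0.005,   # illustrative, within [0.001, 0.01]
    a2: float = 0.5,     # illustrative, within [0.1, 0.8]
    a3: float = 0.5,     # illustrative, within [0.1, 0.8]
) -> List[Tuple[float, float, float, int]]:
    """Four-dimensional relevance vector of each packet, in stream order.

    directions[i] is 1 for the first direction and -1 for the second.
    """
    n = len(times)
    vectors = []
    for i in range(n):
        # intertime(P_{I-1}, P_I), recorded as 0 for the first packet
        prev_gap = abs(times[i] - times[i - 1]) if i > 0 else 0.0
        # intertime(P_I, P_{I+1}), recorded as 0 for the last packet
        next_gap = abs(times[i + 1] - times[i]) if i < n - 1 else 0.0
        vectors.append((a1 * sizes[i], a2 * prev_gap, a3 * next_gap, directions[i]))
    return vectors
```

The weights scale byte counts (often hundreds or thousands) and interval durations (often fractions of a second) onto comparable magnitudes, so that no single dimension dominates the included angles computed from these vectors.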

Specifically, $\mathrm{intertime}(P_{I-1}, P_I)$ is the first interval duration and $\mathrm{intertime}(P_I, P_{I+1})$ is the second interval duration. To obtain them, the network security device first takes the acquisition time of each data packet as the target time, then calculates the absolute value of the difference between the target time and the first time to obtain the first interval duration, and the absolute value of the difference between the target time and the second time to obtain the second interval duration. Finally, it determines the four-dimensional relevance vector of each data packet from the first interval duration, the second interval duration, the number of bytes and the transmission direction of that data packet.

Optionally, since the target time and the first time of the first data packet are both the acquisition time of the first data packet, the first interval duration of the first data packet is 0; that is, when $I = 1$, $\mathrm{intertime}(P_{I-1}, P_I)$ is recorded as 0. Likewise, since the target time and the second time of the last data packet are both the acquisition time of the last data packet, the second interval duration of the last data packet is 0; that is, when $I = N$, $\mathrm{intertime}(P_I, P_{I+1})$ is recorded as 0.
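The target time / first time / second time construction can be written directly. This is a minimal sketch with a hypothetical function name, assuming acquisition times are numbers given in stream order; it shows how setting the first packet's first time and the last packet's second time to the packet's own acquisition time yields the two zero intervals.

```python
from typing import List, Sequence, Tuple


def interval_durations(times: Sequence[float]) -> List[Tuple[float, float]]:
    """(first interval duration, second interval duration) for each packet."""
    n = len(times)
    result = []
    for i, target in enumerate(times):                        # target time
        first_time = times[i - 1] if i > 0 else target        # previous packet, or itself
        second_time = times[i + 1] if i < n - 1 else target   # next packet, or itself
        result.append((abs(target - first_time), abs(target - second_time)))
    return result
```

For acquisition times `[0.0, 2.0, 5.0]` this returns `[(0.0, 2.0), (2.0, 3.0), (3.0, 0.0)]`: the second interval duration of each data packet equals the first interval duration of the next one.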

In addition, the first interval duration, the second interval duration, the byte number and the transmission direction corresponding to each data packet are four characteristic dimensions corresponding to each data packet, and a four-dimensional association degree vector corresponding to each data packet can be constructed and obtained through the four characteristic dimensions.

Optionally, Fig. 2 shows a schematic diagram of the four characteristic dimensions of a data packet. As shown in Fig. 2, the data stream contains seven data packets, $P_1$ to $P_7$. Data packets $P_3$, $P_4$ and $P_5$ are transmitted in the first direction (indicated by dotted lines), and the remaining four data packets are transmitted in the second direction. Fig. 2 also shows the byte count $\mathrm{size}(P_2)$ of data packet $P_2$ and the interval duration $\mathrm{intertime}(P_3, P_4)$ between data packets $P_3$ and $P_4$; this interval is both the second interval duration of $P_3$ and the first interval duration of $P_4$.

In an alternative embodiment, after determining the four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, the network security device determines a target data packet from the N data packets, wherein the target data packet is any one of the N data packets except for the last data packet. Then, the network security equipment determines that the four-dimensional relevance vector corresponding to the target data packet is a first vector to be processed, and determines that the four-dimensional relevance vector corresponding to a next data packet of the target data packet is a second vector to be processed. And finally, the network security equipment determines a target included angle cosine value of the target data packet according to the first to-be-processed vector and the second to-be-processed vector, and constructs a relevance vector field according to the target included angle cosine value and the four-dimensional relevance vector of each data packet, wherein the relevance vector field is used for representing the relevance degree between the N data packets.

Optionally, the relevance vector field may be denoted by F, and its calculation formula is as follows: [formula not available in the source text]

optionally, the formula for calculating the cosine of the target included angle is as follows:

wherein,

for representing the first to-be-processed vector as described above,

for representing the second pending vector as described above.
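Below is a minimal Python sketch of the target included-angle cosine. It computes the cosine for every data packet except the last one, which has no next packet. The names `cos_angle` and `target_cosines` are hypothetical. The exact form of the relevance vector field F is not spelled out in the text, so the sketch stops at the cosines rather than guessing how F combines them with the vectors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cos_angle(u: Vector, v: Vector) -> float:
    """Cosine of the included angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    # The direction component is +1 or -1, so a relevance vector never has zero norm.
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def target_cosines(vectors: Sequence[Vector]) -> list[float]:
    """cos(theta_I) between P_I and P_(I+1) for I = 1..N-1."""
    return [cos_angle(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
```

For N data packets the result has N-1 entries, one per target data packet.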

Optionally, fig. 3 shows a schematic diagram of relevance vectors. In fig. 3, the fourth characteristic dimension (the transmission direction) is omitted. As shown in fig. 3, the abscissa is the component $\alpha_1\,\mathrm{size}(P_I)$ of $\vec{P_I}$ (the four-dimensional relevance vector of P_I), and the ordinate is its component $\alpha_2\,\mathrm{intertime}(P_{I-1},P_I)$. The angle between $\vec{P_I}$ and $\vec{P_{I+1}}$ is the arccosine of the target included-angle cosine. Fig. 3 also shows the component $\alpha_3\,\mathrm{intertime}(P_I,P_{I+1})$ of $\vec{P_I}$.

In an optional embodiment, the network security device may divide the N data packets into at least one data packet set by the following four steps.

Step one, starting from the first data packet in the data stream, determining S data packets according to the arrangement order of the N data packets in the data stream;

Step two, detecting whether the S data packets satisfy a preset condition, and forming the S data packets into a data packet set when they do. S is an integer greater than 1 and less than or equal to N, and the preset condition represents the degree of association between the data packets in the data packet set;

Step three, removing from the data stream the S data packets that formed the data packet set in step two, to obtain a new data stream;

Step four, updating the data stream with the new data stream, and repeating steps one to three until all N data packets have been divided into data packet sets.
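Steps one to four amount to repeatedly cutting a leading run of S data packets off the stream. The Python sketch below shows only that outer loop. The preset condition is passed in as a callable `condition(stream, s)`, which is a hypothetical interface. The text does not say what happens to a single leftover packet; placing it in its own set is an assumption.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def partition_stream(
    stream: Sequence[T], condition: Callable[[Sequence[T], int], bool]
) -> list[list[T]]:
    """Steps one to four: repeatedly cut a leading set of S packets off the stream."""
    sets: list[list[T]] = []
    remaining = list(stream)
    while remaining:                      # step four: repeat until every packet is assigned
        s = 2                             # step one: S starts at 2 (S > 1)
        # step two: grow S while an (S+1)-th packet exists and the condition fails
        while s < len(remaining) and not condition(remaining, s):
            s += 1
        s = min(s, len(remaining))        # assumption: leftover packets form the last set
        sets.append(remaining[:s])        # step two: the S packets form one data packet set
        remaining = remaining[s:]         # step three: the new data stream
    return sets
```

With a toy condition such as `lambda st, s: s == 3`, ten packets split into runs of three plus one leftover packet.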

Fig. 4 shows a flowchart of detecting whether S data packets satisfy the preset condition according to an embodiment of the present application. As shown in fig. 4, the detection includes the following steps:

Step 1, determining the four-dimensional relevance vector corresponding to the first data packet as a first vector;

Step 2, according to the arrangement order and starting from the second data packet, determining the four-dimensional relevance vectors corresponding to S-1 data packets as second vectors, obtaining S-1 second vectors. S is an integer greater than or equal to 2 and less than or equal to N;

Step 3, summing the first vector and the S-1 second vectors to obtain a first target vector;

Step 4, determining the four-dimensional relevance vector corresponding to the (S+1)-th data packet as a third vector;

Step 5, summing the first target vector and the third vector to obtain a second target vector;

Step 6, determining, according to the first vector, the first target vector and the second target vector, whether the S data packets satisfy the preset condition. The S data packets consist of the first data packet and the S-1 data packets;

Step 7, when the S data packets do not satisfy the preset condition, adding 1 to S to obtain an updated S, and repeating steps 2 to 7 in a loop until the S data packets satisfy the preset condition. The S data packets then form a data packet set.
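Steps 1 to 7 can be implemented with a running vector sum. The first target vector for S+1 is exactly the second target vector computed for S, so each increment of S costs one vector addition instead of re-summing S vectors. The Python sketch below assumes the preset condition is supplied as a callable taking (first vector, first target vector, second target vector). The names `find_set_size` and `vec_add` are hypothetical. The behaviour when no (S+1)-th packet remains is also an assumption, because the text does not specify it.

```python
from typing import Callable, Sequence

Vector = tuple[float, ...]
Condition = Callable[[Vector, Vector, Vector], bool]


def vec_add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Component-wise sum of two equal-length vectors."""
    return tuple(a + b for a, b in zip(u, v))


def find_set_size(vectors: Sequence[Sequence[float]], meets_condition: Condition) -> int:
    """Return S for the current data stream (steps 1 to 7).

    vectors[0] is the four-dimensional relevance vector of the first packet.
    """
    n = len(vectors)
    if n < 2:
        return n                                   # assumption: a lone packet forms its own set
    first = tuple(vectors[0])                      # step 1: first vector
    first_target = vec_add(first, vectors[1])      # steps 2-3 with S = 2
    s = 2
    while s < n:                                   # the (S+1)-th packet must exist
        third = vectors[s]                         # step 4: vector of the (S+1)-th packet
        second_target = vec_add(first_target, third)  # step 5
        if meets_condition(first, first_target, second_target):  # step 6
            return s
        # step 7: S <- S + 1; the new first target vector equals the old second target vector
        first_target = second_target
        s += 1
    return n                                       # assumption: remaining packets form one set
```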

Optionally, the preset condition is expressed in terms of the following vectors. $\vec{P_1}$ is the first vector, i.e., the four-dimensional relevance vector corresponding to the first data packet in the data stream. $\sum_{i=1}^{S}\vec{P_i}$ is the first target vector, i.e., the vector sum of the first vector and the S-1 second vectors. $\sum_{i=1}^{S+1}\vec{P_i}$ is the second target vector, i.e., the vector sum of the first target vector and the third vector.

Specifically, the network security device calculates the cosine of the angle between the first target vector and the first vector, which gives the first included-angle cosine. It also calculates the cosine of the angle between the second target vector and the first vector, which gives the second included-angle cosine. The device determines that the S data packets satisfy the preset condition when the first and second included-angle cosines are opposite numbers.

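In the formula representing the preset condition, $\cos\langle\vec{P_1},\sum_{i=1}^{S}\vec{P_i}\rangle$ is the first included-angle cosine and $\cos\langle\vec{P_1},\sum_{i=1}^{S+1}\vec{P_i}\rangle$ is the second included-angle cosine. Here $\vec{P_i}$ denotes the four-dimensional relevance vector of data packet P_i.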
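The preset condition can be checked directly from the two included-angle cosines. The Python sketch below reads "opposite numbers" literally (c1 = -c2). Exact floating-point equality is unlikely, so it allows a small tolerance `tol`; that tolerance is an assumption. If the intended reading were merely "opposite signs", the final check would become `c1 * c2 < 0` instead. The function names are hypothetical.

```python
import math
from typing import Sequence


def cos_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the included angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def meets_preset_condition(first, first_target, second_target, tol: float = 1e-6) -> bool:
    """True when cos<first, first_target> and cos<first, second_target> are opposite numbers."""
    c1 = cos_angle(first, first_target)    # first included-angle cosine
    c2 = cos_angle(first, second_target)   # second included-angle cosine
    # Hypothetical tolerance: exact equality c1 == -c2 is rare with floating point.
    return math.isclose(c1, -c2, rel_tol=0.0, abs_tol=tol)
```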

For a clearer illustration of the above scheme, refer to the following three steps:

Step A1, taking the first data packet $P_1$ as the starting point, calculating the vector sum $\sum_{i=1}^{s}\vec{P_i}$ of the four-dimensional relevance vectors of $P_1$ and the (s-1) consecutive data packets that follow it. Then calculating the cosine $\cos\langle\vec{P_1},\sum_{i=1}^{s}\vec{P_i}\rangle$ of the angle between $\vec{P_1}$ and this vector sum;

Step A2, starting s from 2 and increasing it by 1 at a time, and taking the s data packets $\{P_1,\dots,P_s\}$ as one aggregation when the preset condition is satisfied for the first time;

Step A3, taking the (s+1)-th data packet $P_{s+1}$ as the new starting point and repeating steps A1 and A2 until $P_N$ has been traversed, which completes the aggregation of all N data packets. This process corresponds to step three and step four described above.
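Steps A1 to A3 combine into a single loop. The following self-contained Python sketch returns the aggregations as lists of 1-based packet indices. It uses the literal "opposite numbers" reading of the preset condition with a hypothetical tolerance `tol`. It also assumes that when no $P_{s+1}$ remains, the remaining packets form the final aggregation. Neither choice is specified in the text, and `aggregate` is a hypothetical name.

```python
import math
from typing import Sequence


def cos_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the included angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def aggregate(vectors: Sequence[Sequence[float]], tol: float = 1e-6) -> list[list[int]]:
    """Steps A1-A3: split packets P_1..P_N into aggregations of 1-based indices."""
    sets: list[list[int]] = []
    start, n = 0, len(vectors)
    while start < n:
        first = vectors[start]                       # current starting packet
        if start + 1 == n:                           # assumption: a lone leftover packet
            sets.append([start + 1])
            break
        total = [a + b for a, b in zip(first, vectors[start + 1])]  # A1 with s = 2
        s = 2
        while start + s < n:                         # P_(s+1) exists in the current stream
            nxt = [a + b for a, b in zip(total, vectors[start + s])]
            # A2: stop the first time the two cosines are opposite numbers (within tol)
            if abs(cos_angle(first, total) + cos_angle(first, nxt)) <= tol:
                break
            total, s = nxt, s + 1
        sets.append(list(range(start + 1, start + s + 1)))
        start += s                                   # A3: restart from P_(s+1)
    return sets
```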

Example 2

According to an embodiment of the present application, an embodiment of a data packet aggregation apparatus is further provided. Fig. 5 is a schematic diagram of an optional data packet aggregation apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus includes a first obtaining module 501, configured to obtain a data stream to be processed. The data stream consists of N data packets, and N is an integer greater than 1.

Optionally, the data packet aggregation apparatus may be deployed in a network security device, which may be the execution body of the data packet aggregation method in embodiment 1. Network security devices include, but are not limited to, firewalls and other security devices. The network security device can be connected to multiple terminal devices and obtains the data streams to be processed from them. Terminal devices include, but are not limited to, desktop computers, industrial personal computers and servers.

The data stream in the present application may be a plaintext DNS (Domain Name System) data stream or an encrypted DNS data stream, for example one encrypted with the DoH (DNS over HTTPS) protocol. The data stream may also be based on other information transmission protocols.

The aggregation device of the data packets further comprises: a second obtaining module 502, configured to obtain the obtaining time of each data packet and the number of bytes of each data packet.

Optionally, after obtaining the data stream to be processed, the network security device may obtain the number of bytes of each of the N data packets in the data stream. Any two data packets may have the same or different numbers of bytes. A data packet may be a request packet or a response packet.

In addition, each time the network security device obtains a data packet, it records the acquisition time of that packet. For example, suppose the network security device is connected to terminal device A and terminal device B. Terminal device A generates three data packets (data packet 1, data packet 2 and data packet 3) and sends them to terminal device B. Before terminal device B receives them, the network security device obtains the three data packets and records the acquisition time of each. After security detection confirms that the packets are normal, the network security device forwards them to terminal device B.

The aggregation device of the data packets further comprises: a determining module 503, configured to determine a four-dimensional association degree vector of each data packet according to the obtaining time, the number of bytes, and the transmission direction of each data packet, where the four-dimensional association degree vector is used to characterize an association feature between two adjacent data packets.

Optionally, the four-dimensional association degree vector characterizes the association between two adjacent data packets. Specifically, each of the N data packets is transmitted in either a first direction or a second direction, and the two directions are opposite. For example, if the first direction is forward, the second direction is reverse. In addition, from the acquisition times, the network security device calculates the interval duration between each data packet and each of its adjacent data packets. Because a data packet has both a preceding and a following neighbor, there are two interval durations per packet. Each data packet can therefore be described along four characteristic dimensions:
- its number of bytes;
- its transmission direction;
- the interval duration to the previous data packet;
- the interval duration to the next data packet.

The interval duration between the first data packet and its previous data packet defaults to 0. Likewise, the interval duration between the last data packet and its next data packet defaults to 0.
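The two interval durations per data packet follow directly from the acquisition times. Below is a minimal Python sketch that assumes the times are already in acquisition order. The name `interval_durations` is hypothetical, and the millisecond timestamps are made up for illustration.

```python
from typing import Sequence


def interval_durations(times: Sequence[float]) -> tuple[list[float], list[float]]:
    """Return (first, second) interval durations for each packet, in acquisition order.

    first[i]  = |t_i - t_(i-1)|, defaulting to 0 for the first packet;
    second[i] = |t_(i+1) - t_i|, defaulting to 0 for the last packet.
    """
    n = len(times)
    if n < 2:
        raise ValueError("the data stream must contain N > 1 packets")
    first = [0.0] + [float(abs(times[i] - times[i - 1])) for i in range(1, n)]
    second = [float(abs(times[i + 1] - times[i])) for i in range(n - 1)] + [0.0]
    return first, second


first, second = interval_durations([10, 12, 15, 21])  # hypothetical times in ms
print(first, second)  # [0.0, 2.0, 3.0, 6.0] [2.0, 3.0, 6.0, 0.0]
```

Each packet's second interval is the next packet's first interval, matching the shared interval intertime(P₃, P₄) in fig. 2.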

Furthermore, the four-dimensional relevance vector describes these four characteristic dimensions of each data packet. The relevance between data packets can be determined from these dimensions, for example from the similarity of two data packets in each characteristic dimension.

The aggregation device of the data packets further comprises: a dividing module 504, configured to divide the N data packets into at least one data packet set according to the four-dimensional relevancy vector.

Optionally, the four-dimensional association degree vector of each data packet characterizes at least the association features between adjacent data packets. The N data packets can therefore be divided into at least one data packet set on the basis of these vectors. Each data packet set contains at least one data packet, and no data packet belongs to more than one set. For example, with 10 data packets, the 1st and 2nd may form a first data packet set, the 3rd to 7th a second set, and the 8th to 10th a third set.
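The properties of the division can be expressed as a simple check: every set is non-empty, no data packet is in two sets, and (per step four) every packet ends up in some set. This Python sketch is illustrative only. `is_valid_partition` is a hypothetical helper, and packets are identified by their 1-based positions.

```python
def is_valid_partition(sets: list[list[int]], n: int) -> bool:
    """Check the stated properties of a division of packets 1..n into sets."""
    assigned = [i for group in sets for i in group]
    return (
        all(len(group) >= 1 for group in sets)         # each set holds at least one packet
        and sorted(assigned) == list(range(1, n + 1))  # every packet appears exactly once
    )


# The 10-packet example: {1, 2}, {3, ..., 7}, {8, 9, 10}.
print(is_valid_partition([[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]], 10))  # True
```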

In addition, DNS is a common information transmission protocol in the prior art. Plaintext DNS transmission makes security checks and audits easier for security devices such as firewalls, but it also causes serious privacy and security problems. To address these problems, DNS request packets and DNS response packets may be encrypted with DoH. This prevents third-party applications from eavesdropping on or tampering with plaintext DNS packets.

However, as DoH is used ever more widely, DoH-based tunnel attacks are becoming more frequent. A DoH tunnel attack is essentially an original DNS tunnel attack with an additional layer of HTTPS encapsulation.

It should be noted that, in the prior art, DNS tunnel attacks are usually detected on the basis of domain name information. This works because a plaintext DNS data stream is unencrypted, so any third-party application can obtain the domain name information by analyzing the stream. When the DNS data stream is encrypted with DoH, however, the contents of DNS requests and DNS responses are no longer visible. Existing domain-name-based detection of DNS tunnel attacks therefore cannot be applied to detecting DoH tunnel attacks.

It should be noted that the first obtaining module 501, the second obtaining module 502, the determining module 503 and the dividing module 504 correspond to steps S101 to S104 in embodiment 1. The four modules share the implementation examples and application scenarios of the corresponding steps, but are not limited to what is disclosed in embodiment 1.

As described above, the four-dimensional relevance vector is determined from the acquisition time, number of bytes and transmission direction of the data packets, so the association features between data packets are analyzed along multiple dimensions. The N data packets are then divided into at least one data packet set according to the four-dimensional relevance vectors. In other words, the sets are divided according to deeper-level associations among the data packets.

Therefore, the technical solution of the present application determines the association features between data packets along multiple dimensions, which improves the quality of the division of the data packets. This solves the prior-art technical problem of poor division quality when multiple data packets are divided into at least one data packet set.

Optionally, the determining module further includes: a first determination unit, a second determination unit, and a third determination unit. The first determining unit is configured to determine, according to an arrangement order of the N data packets in the data stream, that an acquisition time of a previous data packet of each data packet is a first time; a second determining unit, configured to determine that an acquisition time of a subsequent data packet of each data packet is a second time; a third determining unit, configured to determine a four-dimensional association degree vector of each data packet according to the first time, the second time, the number of bytes of each data packet, and a transmission direction of each data packet.

Optionally, the first determining unit further includes a first determining submodule. When a data packet is the first of the N data packets, this submodule determines that the first time corresponding to that packet is the packet's own acquisition time.

Optionally, the second determining unit further includes a second determining submodule. When a data packet is the last of the N data packets, this submodule determines that the second time corresponding to that packet is the packet's own acquisition time.

Optionally, the third determining unit further includes: a third determination submodule, a first calculation submodule, a second calculation submodule, and a fourth determination submodule. The third determining submodule is used for determining the acquisition time of each data packet as the target time; the first calculation submodule is used for calculating the absolute value of the difference value between the target time and the first time to obtain a first interval duration; the second calculating submodule is used for calculating the absolute value of the difference value between the target time and the second time to obtain a second interval duration; and the fourth determining submodule is used for determining the four-dimensional association degree vector of each data packet according to the first interval duration, the second interval duration, the number of bytes of each data packet and the transmission direction of each data packet.

It should be noted that the first data packet has no previous data packet and the last data packet has no next data packet. Therefore, when a data packet is the first of the N data packets, the network security device takes the packet's own acquisition time as its first time. When a data packet is the last of the N data packets, the device takes the packet's own acquisition time as its second time. In both cases the corresponding interval duration is 0.

The four-dimensional relevance vector of the I-th data packet P_I (I = 1, …, N) is:

$$\vec{P_I}=\left(\alpha_1\,\mathrm{size}(P_I),\ \alpha_2\,\mathrm{intertime}(P_{I-1},P_I),\ \alpha_3\,\mathrm{intertime}(P_I,P_{I+1}),\ \mathrm{direction}(P_I)\right)$$

where:
- $\alpha_1\in[0.001,0.01]$, $\alpha_2\in[0.1,0.8]$ and $\alpha_3\in[0.1,0.8]$;
- size() is the number of bytes of a data packet;
- intertime() is the absolute value of the difference between the acquisition times of two data packets;
- direction() is the transmission direction of a data packet, recorded as 1 for the first direction and -1 for the second direction.

When I = 1, $\mathrm{intertime}(P_{I-1},P_I)$ is recorded as 0. When I = N, $\mathrm{intertime}(P_I,P_{I+1})$ is recorded as 0.
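The four-dimensional relevance vector can be evaluated per data packet in one pass. In this Python sketch, the default coefficients a1 = 0.005 and a2 = a3 = 0.5 are hypothetical values picked from inside the stated ranges. The component order (size, previous interval, next interval, direction) is an assumption consistent with fig. 3. `relevance_vectors` is a hypothetical name.

```python
from typing import Sequence


def relevance_vectors(
    times: Sequence[float],
    sizes: Sequence[int],
    directions: Sequence[int],
    a1: float = 0.005,
    a2: float = 0.5,
    a3: float = 0.5,
) -> list[tuple[float, float, float, float]]:
    """Four-dimensional relevance vector of every packet P_1..P_N."""
    n = len(times)
    if n < 2 or len(sizes) != n or len(directions) != n:
        raise ValueError("need N > 1 packets with matching times, sizes and directions")
    if not (0.001 <= a1 <= 0.01 and 0.1 <= a2 <= 0.8 and 0.1 <= a3 <= 0.8):
        raise ValueError("coefficients outside the stated ranges")
    vectors = []
    for i in range(n):
        if directions[i] not in (1, -1):
            raise ValueError("direction must be 1 (first) or -1 (second)")
        prev_gap = abs(times[i] - times[i - 1]) if i > 0 else 0.0      # intertime(P_(I-1), P_I)
        next_gap = abs(times[i + 1] - times[i]) if i < n - 1 else 0.0  # intertime(P_I, P_(I+1))
        vectors.append((a1 * sizes[i], a2 * prev_gap, a3 * next_gap, float(directions[i])))
    return vectors
```

The output can feed directly into the cosine and aggregation steps described in this application.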

Optionally, fig. 2 shows a schematic diagram of the four characteristic dimensions of a data packet. As shown in fig. 2, a data stream contains seven data packets, P₁ to P₇. P₃, P₄ and P₅ are transmitted in the first direction, and the remaining four data packets are transmitted in the second direction. Fig. 2 also shows the byte count size(P₂) of data packet P₂ and the interval duration intertime(P₃, P₄) between data packets P₃ and P₄. This interval is both the second interval duration of P₃ and the first interval duration of P₄.

Optionally, the data packet aggregation apparatus further includes a first determining module, a second determining module, a third determining module, a fourth determining module and a constructing module.
- The first determining module is configured to determine a target data packet from the N data packets. The target data packet is any of the N data packets except the last one.
- The second determining module is configured to determine the four-dimensional relevance vector corresponding to the target data packet as a first to-be-processed vector.
- The third determining module is configured to determine the four-dimensional relevance vector corresponding to the data packet following the target data packet as a second to-be-processed vector.
- The fourth determining module is configured to determine the target included-angle cosine of the target data packet from the first and second to-be-processed vectors.
- The constructing module is configured to construct a relevance vector field from the target included-angle cosines and the four-dimensional relevance vector of each data packet. The relevance vector field represents the degree of relevance among the N data packets.

Optionally, the relevance vector field may be denoted by F. In its calculation formula, $\vec{P_I}$ (the four-dimensional relevance vector of the target data packet P_I) can be regarded as the first to-be-processed vector. $\vec{P_{I+1}}$ (the four-dimensional relevance vector of the next data packet P_{I+1}) can be regarded as the second to-be-processed vector.

Optionally, fig. 3 shows a schematic diagram of relevance vectors. In fig. 3, the fourth characteristic dimension (the transmission direction) is omitted. As shown in fig. 3, the abscissa is the component $\alpha_1\,\mathrm{size}(P_I)$ of $\vec{P_I}$ (the four-dimensional relevance vector of P_I), and the ordinate is its component $\alpha_2\,\mathrm{intertime}(P_{I-1},P_I)$. The angle between $\vec{P_I}$ and $\vec{P_{I+1}}$ is the arccosine of the target included-angle cosine. Fig. 3 also shows the component $\alpha_3\,\mathrm{intertime}(P_I,P_{I+1})$ of $\vec{P_I}$.

Optionally, the dividing module further includes a first, a second, a third and a fourth execution module.
- The first execution module executes step one: starting from the first data packet in the data stream, it determines S data packets according to the arrangement order of the N data packets in the data stream.
- The second execution module executes step two: it detects whether the S data packets satisfy a preset condition and, when they do, forms them into a data packet set. S is an integer greater than 1 and less than or equal to N, and the preset condition represents the degree of association between the data packets in the set.
- The third execution module executes step three: it removes from the data stream the S data packets that formed the data packet set in step two, obtaining a new data stream.
- The fourth execution module executes step four: it updates the data stream with the new data stream and repeats steps one to three until all N data packets have been divided into data packet sets.

Optionally, the second execution module further includes a first, a second, a third, a fourth, a fifth, a sixth and a seventh execution submodule.
- The first execution submodule executes step 1: it determines the four-dimensional association degree vector corresponding to the first data packet as a first vector.
- The second execution submodule executes step 2: starting from the second data packet in the arrangement order, it determines the four-dimensional relevance vectors corresponding to S-1 data packets as second vectors, obtaining S-1 second vectors. S is an integer greater than or equal to 2 and less than or equal to N.
- The third execution submodule executes step 3: it sums the first vector and the S-1 second vectors to obtain a first target vector.
- The fourth execution submodule executes step 4: it determines the four-dimensional relevance vector corresponding to the (S+1)-th data packet as a third vector.
- The fifth execution submodule executes step 5: it sums the first target vector and the third vector to obtain a second target vector.
- The sixth execution submodule executes step 6: it determines, from the first vector, the first target vector and the second target vector, whether the S data packets satisfy the preset condition. The S data packets consist of the first data packet and the S-1 data packets.
- The seventh execution submodule executes step 7: when the S data packets do not satisfy the preset condition, it adds 1 to S to obtain an updated S and repeats steps 2 to 7 in a loop until the S data packets satisfy the preset condition. The S data packets then form a data packet set.

Optionally, the sixth execution submodule further includes a first calculating subunit, a second calculating subunit and a first determining subunit.
- The first calculating subunit calculates the cosine of the angle between the first target vector and the first vector to obtain the first included-angle cosine.
- The second calculating subunit calculates the cosine of the angle between the second target vector and the first vector to obtain the second included-angle cosine.
- The first determining subunit determines that the S data packets satisfy the preset condition when the first and second included-angle cosines are opposite numbers.

Optionally, the preset condition is expressed by a formula in which $\cos\langle\vec{P_1},\sum_{i=1}^{S}\vec{P_i}\rangle$ is the first included-angle cosine and $\cos\langle\vec{P_1},\sum_{i=1}^{S+1}\vec{P_i}\rangle$ is the second included-angle cosine. Here $\vec{P_i}$ denotes the four-dimensional relevance vector of data packet P_i.

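Step A1, taking the first data packet $P_1$ as the starting point, calculating the vector sum $\sum_{i=1}^{s}\vec{P_i}$ of the four-dimensional relevance vectors of $P_1$ and the (s-1) consecutive data packets that follow it. Then calculating the cosine $\cos\langle\vec{P_1},\sum_{i=1}^{s}\vec{P_i}\rangle$ of the angle between $\vec{P_1}$ and this vector sum;

Step A2, starting s from 2 and increasing it by 1 at a time, and taking the s data packets $\{P_1,\dots,P_s\}$ as one aggregation when the preset condition is satisfied for the first time;

Step A3, taking the (s+1)-th data packet $P_{s+1}$ as the new starting point and repeating steps A1 and A2 until $P_N$ has been traversed, which completes the aggregation of all N data packets. This process corresponds to step three and step four described above.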

From the above, the present application determines a four-dimensional association vector from the acquisition time, number of bytes and transmission direction of the data packets, and analyzes the association features between data packets along multiple dimensions. It then divides the N data packets into at least one data packet set according to the four-dimensional association vectors. In other words, the sets are divided according to deeper-level associations among the data packets.

Example 3

According to an embodiment of the present application, an electronic device is further provided. The device includes one or more processors and a storage apparatus storing one or more programs. When executed by the one or more processors, the programs cause the one or more processors to run a program configured to perform the above data packet aggregation method.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented as a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application may be embodied in the form of a software product. This applies to the essence of the solution, to the part that contributes over the prior art, or to all or part of the solution. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods in the embodiments of the present application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.

The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims

1. A method for aggregating packets, comprising:

acquiring a data stream to be processed, wherein the data stream consists of N data packets, and N is an integer greater than 1;

acquiring the acquisition time of each data packet and the number of bytes of each data packet;

determining a four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, wherein the four-dimensional relevance vector is used for representing relevance characteristics between two adjacent data packets;

and dividing the N data packets into at least one data packet set according to the four-dimensional relevance vector.

2. The method of claim 1, wherein determining the four-dimensional association degree vector of each data packet according to the obtaining time, the number of bytes and the transmission direction of each data packet comprises:

determining the acquisition time of the previous data packet of each data packet as a first time according to the arrangement sequence of the N data packets in the data stream;

determining the acquisition time of the next data packet of each data packet as a second time;

and determining the four-dimensional association degree vector of each data packet according to the first time, the second time, the byte number of each data packet and the transmission direction of each data packet.

3. The method of claim 2, further comprising:

when the data packet is a first data packet in the N data packets, determining that first time corresponding to the first data packet is acquisition time of the first data packet;

and when the data packet is the last data packet in the N data packets, determining that the second time corresponding to the last data packet is the acquisition time of the last data packet.

4. The method of claim 3, wherein determining the four-dimensional correlation vector for each packet according to the first time, the second time, the number of bytes for each packet, and the transmission direction of each packet comprises:

determining the acquisition time of each data packet as target time;

calculating the absolute value of the difference value between the target time and the first time to obtain a first interval duration;

calculating the absolute value of the difference value between the target time and the second time to obtain a second interval duration;

and determining the four-dimensional association degree vector of each data packet according to the first interval duration, the second interval duration, the number of bytes of each data packet and the transmission direction of each data packet.

5. The method of claim 1, wherein after determining the four-dimensional association degree vector of each data packet according to the obtaining time, the number of bytes and the transmission direction of each data packet, the method further comprises:

determining a target data packet from the N data packets, wherein the target data packet is any one of the N data packets except the last data packet;

determining the four-dimensional relevance vector corresponding to the target data packet as a first vector to be processed;

determining the four-dimensional relevance vector corresponding to the data packet that follows the target data packet as a second vector to be processed;

determining the cosine of a target included angle of the target data packet according to the first vector to be processed and the second vector to be processed;

and constructing a relevance vector field according to the cosine of the target included angle and the four-dimensional relevance vector of each data packet, wherein the relevance vector field represents the degree of relevance among the N data packets.
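The target included angle of claim 5 is the angle between the relevance vectors of two consecutive packets, so its cosine is the dot product divided by the product of the norms. The claim does not specify the data structure of the relevance vector field; purely as an assumption, the sketch stores it as a list of (vector, cosine to the next packet's vector) pairs, one per packet except the last. The names `cosine` and `relevance_field` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def relevance_field(vectors: Sequence[Sequence[float]]) -> List[Tuple[tuple, float]]:
    """Pair each packet's vector with the cosine to its successor's vector.

    ASSUMPTION: this list-of-pairs layout is illustrative; the last packet
    has no successor and therefore contributes no entry.
    """
    field: List[Tuple[tuple, float]] = []
    for i in range(len(vectors) - 1):
        first_pending = vectors[i]        # first vector to be processed
        second_pending = vectors[i + 1]   # second vector to be processed
        field.append((tuple(first_pending), cosine(first_pending, second_pending)))
    return field
```

A cosine near 1 means two adjacent packets have similar timing, size and direction profiles; a cosine near -1 means they point in opposite directions in the four-dimensional space.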

6. The method of claim 1, wherein dividing the N packets into at least one packet set according to the four-dimensional relevance vector comprises:

step one, starting from the first data packet in the data stream, determining S data packets according to the order in which the N data packets are arranged in the data stream;

and step four, updating the data stream based on the new data stream, and repeating steps one to three until all of the N data packets have been divided into data packet sets.
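Steps two and three of claim 6 are not reproduced in this excerpt, so the outer loop can only be sketched under an assumption: each pass removes the S leading packets that satisfy the preset condition as one packet set, and the remaining packets form the new data stream. The callback `choose_group_size` is hypothetical and stands in for the missing steps (it returns S for the current stream).

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def partition_stream(stream: Sequence[T],
                     choose_group_size: Callable[[Sequence[T]], int]) -> List[List[T]]:
    """Split a packet stream into consecutive packet sets.

    ASSUMPTION: `choose_group_size` (hypothetical) replaces the missing
    steps two and three; the "new data stream" of step four is taken to be
    whatever remains after the current set is removed.
    """
    remaining: List[T] = list(stream)
    groups: List[List[T]] = []
    while remaining:                          # repeat until every packet is assigned
        s = choose_group_size(remaining)      # step one: S packets from the front
        if not 1 <= s <= len(remaining):
            raise ValueError(f"invalid group size {s}")
        groups.append(remaining[:s])          # the S packets form one packet set
        remaining = remaining[s:]             # step four (assumed): rest is the new stream
    return groups
```

With a toy rule that always takes up to three packets, a stream of seven packets splits into sets of sizes 3, 3 and 1. Because every pass removes at least one packet, the loop always terminates.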

7. The method of claim 6, wherein detecting whether the S data packets meet the preset condition, and grouping the S data packets into a data packet set when they meet the preset condition, comprises:

step 1, determining the four-dimensional relevance vector corresponding to the first data packet as a first vector;

step 2, according to the arrangement order and starting from the second data packet, determining the four-dimensional relevance vectors corresponding to S-1 data packets as second vectors to obtain S-1 second vectors, wherein S is an integer greater than or equal to 2 and less than or equal to N;

step 3, summing the first vector and the S-1 second vectors to obtain a first target vector;

step 5, summing the first target vector and the third vector to obtain a second target vector;

step 6, determining, according to the first vector, the first target vector and the second target vector, whether the S data packets meet the preset condition, wherein the S data packets comprise the first data packet and the S-1 data packets;

and step 7, when the S data packets do not meet the preset condition, incrementing S by 1 to obtain an updated S and repeating steps 2 to 7 in a loop until the S data packets meet the preset condition, and then forming a data packet set from the S data packets.
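Step 4 of claim 7, which defines the third vector, is missing from this excerpt. The sketch below assumes, hypothetically, that the third vector is the relevance vector of packet S+1, the packet right after the S packets under test. Under that assumption, the first target vector for S+1 equals the second target vector for S, so a running sum avoids re-adding all S vectors on every pass. The preset condition is passed in as a callable so the loop does not depend on any particular test; `group_size` and `_add` are illustrative names, and the fallbacks for fewer than two packets and for no qualifying S are also assumptions.

```python
from typing import Callable, List, Sequence

Vec = Sequence[float]
Condition = Callable[[Vec, Vec, Vec], bool]


def _add(u: Vec, v: Vec) -> List[float]:
    """Component-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def group_size(vectors: Sequence[Vec], condition: Condition) -> int:
    """Return S, the number of leading packets that form one packet set.

    ASSUMPTION: the third vector (step 4, not in the excerpt) is the
    relevance vector of packet S+1.
    """
    n = len(vectors)
    if n < 2:
        return n                                    # assumed: a lone packet is its own set
    first = list(vectors[0])                        # step 1: first vector
    first_target = _add(first, vectors[1])          # steps 2-3 with S = 2
    s = 2
    while s < n:
        third = vectors[s]                          # step 4 (assumed): next packet's vector
        second_target = _add(first_target, third)   # step 5
        if condition(first, first_target, second_target):  # step 6
            return s
        first_target = second_target                # step 7: S+1, packet S+1 joins the sum
        s += 1
    return n                                        # assumed: no S < N qualified; take the rest
```

The `condition` argument is where the claim 8 test would be plugged in; a toy predicate such as `lambda f, t1, t2: t2[0] < 0` is enough to exercise the loop.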

8. The method of claim 7, wherein determining whether the S data packets meet the preset condition according to the first vector, the first target vector and the second target vector comprises:

calculating the cosine of the angle between the first target vector and the first vector to obtain a first included-angle cosine value;

calculating the cosine of the angle between the second target vector and the first vector to obtain a second included-angle cosine value;

and when the first included-angle cosine value and the second included-angle cosine value are opposite numbers (that is, one is the negative of the other), determining that the S data packets meet the preset condition.

9. An apparatus for aggregating data packets, comprising:

a first acquisition module, configured to acquire a data stream to be processed, wherein the data stream consists of N data packets and N is an integer greater than 1;

a second acquisition module, configured to acquire the acquisition time and the number of bytes of each data packet;

a determining module, configured to determine a four-dimensional relevance vector of each data packet according to the acquisition time, the number of bytes and the transmission direction of each data packet, wherein the four-dimensional relevance vector characterizes the relevance between two adjacent data packets;

and a dividing module, configured to divide the N data packets into at least one data packet set according to the four-dimensional relevance vectors.

10. An electronic device, comprising: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method for aggregating data packets according to any one of claims 1 to 8.