CN117857146A - Method for identifying V2Ray flow

Info

Publication number: CN117857146A
Application number: CN202311759275.9A
Authority: CN
Inventors: 林飞; 聂冰; 刘俊; 曾文杰; 易永波; 古元; 毛华阳; 华仲峰
Original assignee: Beijing Act Technology Development Co ltd
Current assignee: Beijing Act Technology Development Co ltd
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-04-09

Abstract

The invention provides a method for identifying V2Ray traffic. Traffic is first collected, and suspected V2Ray traffic is screened by calculating the entropy of TCP packet lengths. Active probing is then carried out based on the instruction part handled in the interaction process of the open-source V2Ray protocol code: the first packet after the TCP handshake in the suspected V2Ray traffic is extracted, its first 64 bytes are captured and used to construct attack data, the attack data is sent, the variable-length bytes are analyzed, and the service type is judged. Unlike prior detection schemes, which are essentially probability-based guesses, the method combines passive detection with active probing: it screens suspicious V2Ray traffic, constructs and sends attack data that can bypass V2Ray protection and traverse the target plaintext space, and judges whether the service is V2Ray according to the variable-length value, thereby effectively improving the accuracy of V2Ray traffic identification.

Description

Method for identifying V2Ray flow

Technical Field

The invention relates to the technical field of electric digital data processing, in particular to a method for identifying V2Ray flow.

Background

With the rapid development of internet technology and the growing value of personal privacy and other big data, internet users' demand for encrypted data transmission is steadily increasing. V2Ray, a symmetric encryption protocol for encrypting TCP network traffic, performs well in feature obfuscation, platform compatibility, running speed and other respects, and is widely used in the field of encrypted transmission. At present, encrypted traffic identification methods are mainly divided into rule-matching-based methods, machine-learning-based methods and deep-learning-based methods.

(1) Rule matching-based method

The rule-matching-based method identifies encrypted communication software by comparing traffic against encrypted-traffic characteristics stored in a database, such as port information and specific byte information. The method has simple steps and a very fast judging process, but the accuracy of port-based identification has dropped sharply with the emergence of techniques such as port forwarding, random port allocation and traffic disguise.

(2) Deep packet inspection method

The deep packet inspection (DPI) method distinguishes encrypted traffic by identifying and analyzing key features, such as handshake protocol fields, during the interaction. However, because V2Ray software encrypts and obfuscates its traffic, DPI often cannot analyze the payload.

(3) Deep neural network-based method

The deep-learning-based V2Ray traffic identification method can automatically learn and extract the feature information contained in encrypted traffic without manual feature extraction and selection, so it is favored by the industry; among such methods, the convolutional neural network is the most widely applied.

The existing methods share the same problem: their recognition accuracy is very low.

Disclosure of Invention

The invention provides a method for identifying V2Ray traffic, which aims to solve the problem of the low identification rate of V2Ray traffic. Traffic is first collected, and suspected V2Ray traffic is screened by calculating the entropy of TCP packet lengths. Active probing is then carried out based on the instruction part handled in the interaction process of the open-source V2Ray protocol code: the first packet after the TCP handshake in the suspected V2Ray traffic is extracted, its first 64 bytes are captured and used to construct attack data, the attack data is sent, the variable-length bytes are analyzed, and the service type is judged. Unlike prior detection schemes, which are essentially probability-based guesses, the method combines passive detection with active probing: it screens suspicious V2Ray traffic, constructs and sends attack data that can bypass V2Ray protection and traverse the target plaintext space, and judges whether the service is V2Ray according to the variable-length value, thereby effectively improving the accuracy of V2Ray traffic identification.

The invention provides a method for identifying V2Ray flow, which comprises the following steps:

S1, collecting traffic, extracting the payload of the data packets, calculating the data packet lengths, calculating the entropy of the data packet length according to the occurrence frequency or probability distribution of each data packet length, and judging according to the data packet length entropy whether the collected traffic is suspected V2Ray traffic; if yes, entering step S2, and if not, continuing to collect traffic;

S2, extracting the first data packet after the TCP handshake in the suspected V2Ray traffic to obtain an original data packet, and capturing its first K bytes;

S3, modifying the instruction of the original data packet to construct a probe payload and performing active probing, analyzing the measured variable values, and judging whether the difference between the maximum value and the minimum value of the variable values is X and no variable value is repeated; if so, the target service is a V2Ray service, and if not, the target service is a non-V2Ray service; the method for identifying V2Ray traffic is then complete.

In the method for identifying V2Ray traffic according to the invention, as a preferred mode, step S1 comprises the following steps:

S11, collecting traffic and acquiring the data packets of a TCP stream;

S12, parsing the captured data packets to obtain the content of the TCP stream, and extracting the payload of each data packet;

S13, acquiring the payload length of each data packet;

S14, merging the data packets of the TCP stream into a data set, wherein the data set comprises the data packet lengths, and calculating the data packet length entropy H(X) from the occurrence frequency or probability distribution of each data packet length;

S15, judging whether the data packet length entropy H(X) is larger than a threshold T; if so, judging that the TCP stream is suspected V2Ray traffic and entering step S2, and if not, returning to step S11.
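Steps S13 to S15 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `(flow_key, payload_length)` input format, the choice of flow key and the decision to skip zero-length payloads are all assumptions made for the example. Only the entropy formula and the comparison against T come from the text.

```python
import math
from collections import Counter, defaultdict

THRESHOLD_T = 0.35  # preferred value of T given in the text


def length_entropy(lengths):
    """Shannon entropy (in bits) of the empirical packet-length distribution."""
    counts = Counter(lengths)
    total = len(lengths)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def screen_flows(packets, threshold=THRESHOLD_T):
    """Return the flow keys whose packet-length entropy exceeds the threshold.

    `packets` is an iterable of (flow_key, payload_length) pairs. The flow key
    (for example an address/port 4-tuple) is a hypothetical choice.
    """
    by_flow = defaultdict(list)
    for flow_key, payload_len in packets:
        if payload_len > 0:  # assumption: only packets carrying a payload count
            by_flow[flow_key].append(payload_len)
    return {key for key, lens in by_flow.items() if length_entropy(lens) > threshold}
```

A stream whose packets all have the same length has entropy 0 and is never flagged; any stream flagged here would then go on to step S2.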

In the method for identifying V2Ray traffic, as a preferred mode, in step S11, the data packets of the TCP stream are captured through libpcap;

in step S12, the captured data packets are parsed using libpcap.
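The text performs capture and parsing with libpcap itself. Purely to illustrate what steps S12 and S13 extract, the following sketch reads a capture that has already been saved in the classic pcap file format and yields the TCP payload length of each packet, using only the Python standard library. The function name is hypothetical, and the sketch assumes Ethernet framing and IPv4; it is a stand-in for illustration, not a replacement for libpcap.

```python
import struct


def tcp_payload_lengths(pcap_bytes):
    """Yield the TCP payload length of every IPv4/TCP packet in a classic pcap."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written in big-endian byte order
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short, or EtherType is not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 6 or len(ip) < ihl + 20:
            continue  # not TCP, or TCP header truncated
        total_len = struct.unpack("!H", ip[2:4])[0]
        data_offset = (ip[ihl + 12] >> 4) * 4  # TCP header length in bytes
        yield total_len - ihl - data_offset
```

The resulting lengths are the values that step S14 merges into a data set.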

In the method for identifying V2Ray traffic, as a preferred mode, in step S15, the data packet length entropy H(X) is:

H(X) = -Σ p(x) · log2(p(x)),

where p(x) is the probability of data packet length x.
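As a worked illustration of the formula (the packet lengths are invented for the example), consider a TCP stream of four packets with payload lengths 100, 100, 200 and 300 bytes:

```latex
\begin{align*}
p(100) &= \tfrac{2}{4}, \qquad p(200) = \tfrac{1}{4}, \qquad p(300) = \tfrac{1}{4} \\
H(X) &= -\left( \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} \right) \\
     &= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits}
\end{align*}
```

A stream in which every packet has the same length gives H(X) = 0.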

In the method for identifying V2Ray traffic, as a preferred mode, in step S15, T is 0.35.

In the method for identifying V2Ray traffic, as a preferred mode, in step S2, after the original data packet is extracted, it is analyzed based on the interaction process of the open-source V2Ray protocol code: the server-side session of the V2Ray protocol is processed and the client request header is parsed; parsing the client request header includes a decryption operation and a verification operation;

in step S3, the probe payload is constructed according to the decryption operation and the verification operation identified in step S2.

In the method for identifying V2Ray traffic, as a preferred mode, in step S2, the structure of the original data packet comprises: authentication, instruction, and variable length;

in step S3, the instruction in the original data packet is modified to obtain a modified instruction, wherein the modified instruction comprises a version number, a data encryption vector, a data encryption key, response authentication, options, a margin, an encryption mode, a reserved field and an instruction; the authentication, the modified instruction and the variable length form the probe data; in the probe data, the last byte of the data encryption key is assigned a traversal value x, and the modified instruction, the authentication and the variable length are combined into the probe payload.

In the method for identifying V2Ray traffic, as a preferred mode, in step S2, K is 64, the length of the instruction is 48 bytes, and the structure of the modified instruction is: version number 1 byte, data encryption vector 16 bytes, data encryption key 16 bytes, response authentication 1 byte, option 1 byte, margin, encryption mode 1 byte, reserved 2 bytes, and instruction 2 bytes.
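The placement of the traversal value can be sketched as follows. The sketch relies only on the sizes of the first three instruction fields listed above (version number 1 byte, data encryption vector 16 bytes, data encryption key 16 bytes) to locate the last byte of the data encryption key. The function name is hypothetical, and writing directly into the captured bytes is an assumption: the text does not state at which layer the modification is applied. The variable-length part is left out, so the probe is 64 bytes long.

```python
AUTH_LEN = 16               # authentication part
INSTR_LEN = 48              # instruction part
K = AUTH_LEN + INSTR_LEN    # first K = 64 bytes captured in step S2

# Offsets inside the instruction part, derived from the listed field sizes:
# version number (1 byte), then data encryption vector (16 bytes),
# then data encryption key (16 bytes).
KEY_START = 1 + 16
KEY_LAST = KEY_START + 16 - 1   # index of the last key byte within the instruction


def build_probe(first_packet: bytes, x: int) -> bytes:
    """Return a 64-byte probe in which the key's last byte is the traversal value x."""
    if len(first_packet) < K:
        raise ValueError("need at least the first 64 bytes of the packet")
    if not 0 <= x <= 255:
        raise ValueError("the traversal value must fit in one byte")
    auth = first_packet[:AUTH_LEN]
    instr = bytearray(first_packet[AUTH_LEN:K])
    instr[KEY_LAST] = x   # assign the traversal value
    return auth + bytes(instr)
```

Calling `build_probe(packet, x)` once per traversal value yields the set of probes that step S3 sends to the target service.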

In the method for identifying V2Ray traffic according to the invention, as a preferred mode, step S3 comprises the following steps:

S31, modifying the instruction in the original data packet to obtain a modified instruction;

S32, generating X pieces of probe data according to the modified instruction, the authentication and the variable length, wherein the last byte of the data encryption key is assigned a traversal value x, the data encryption key is likewise assigned the traversal value x, and x is traversed one by one from 0 to X;

S33, sending the probe data with traversal value x = 1 to the target service;

S34, recording the number of bytes successfully transmitted;

S35, setting the traversal value to x+1 and returning to step S33 until X pieces of probe data have been transmitted in total, and recording all the measured variable values;

S36, analyzing all the recorded variable values to find the maximum value and the minimum value, and judging whether the difference between the maximum value and the minimum value of the variable values is X and whether no variable value is repeated; if both conditions hold, the target service is a V2Ray service, and if either condition fails, the target service is not a V2Ray service; the method for identifying V2Ray traffic is then complete.
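The send-and-record loop of steps S32 to S35 can be sketched as below. The two callables are hypothetical placeholders: `make_probe` builds the probe data for a given traversal value, and `measure` sends one probe to the target service and returns the value recorded for it. Starting the traversal at 0 is an assumption; the text gives the range as 0 to X and starts sending at x = 1.

```python
from typing import Callable, List


def run_probes(make_probe: Callable[[int], bytes],
               measure: Callable[[bytes], int],
               count: int = 32) -> List[int]:
    """Send `count` probes one by one and record one measured value per probe."""
    values = []
    for x in range(count):              # one traversal value per probe
        probe = make_probe(x)           # S32: probe data carrying traversal value x
        values.append(measure(probe))   # S33/S34: send it and record the result
    return values
```

The returned list is the set of variable values that step S36 analyzes.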

In the method for identifying V2Ray traffic, as a preferred mode, in step S3, X is 32.

The invention has the following advantages:

Existing detection schemes are essentially probability-based guesses, and their accuracy is very low, basically below 20%. The invention provides a detection mode that combines passive detection with active probing: suspicious V2Ray traffic is screened, attack data that can bypass V2Ray protection and traverse the target plaintext space is constructed and sent, and whether the service is V2Ray is judged according to the variable-length value. The accuracy of V2Ray traffic identification is thereby effectively improved to close to 100%, and misjudgment is basically avoided.

Drawings

FIG. 1 is a flow chart of the method for identifying V2Ray traffic;

FIG. 2 is a flow chart of collecting traffic and screening suspected V2Ray traffic in the method for identifying V2Ray traffic;

FIG. 3 is a flow chart of the active probing in the method for identifying V2Ray traffic.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention.

Example 1

As shown in FIGS. 1-3, a method for identifying V2Ray traffic identifies V2Ray traffic by collecting traffic, screening suspected V2Ray traffic, constructing attack data, sending the data, analyzing the variable values and judging the service type.

The method mainly comprises the following steps:

1. Collecting traffic, calculating the entropy of the TCP packets and screening out suspected V2Ray traffic, as shown in FIG. 2, comprises the following steps:

1.1 acquiring TCP stream data packets: first, the packets of the TCP stream are captured through libpcap.

1.2 parsing the data packets: the captured packets are parsed using libpcap to obtain the contents of the TCP stream, and the payload of each data packet is extracted.

1.3 calculating the data packet length: for each data packet, a payload length of the data packet is obtained. These length values are recorded for later analysis.

1.4 calculating entropy: entropy is a concept in information theory that is used to measure uncertainty or randomness of a data set. In calculating entropy, the method comprises the following steps:

1.4.1 merging packets of a TCP stream into one data set, the packet length can be part of the data set.

1.4.2 calculating the frequency of occurrence or probability distribution for each packet length.

1.4.3 calculating the entropy of the packet length using the calculation formula of entropy. The calculation formula of the entropy is as follows:

H(X) = -Σ p(x) · log2(p(x)),

where p(x) is the probability of data packet length x.

1.5 analysis results: the calculated packet lengths and entropy value can be used to analyze the characteristics of the TCP stream. TCP traffic with an entropy greater than 0.35 is treated as suspected V2Ray protocol data.
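To give a feel for how low the 0.35 threshold sits, the following sketch evaluates the entropy formula for a simplified stream that contains only two distinct packet lengths. The two-length model and both function names are assumptions made for illustration; they are not part of the method.

```python
import math


def two_length_entropy(p_minor: float) -> float:
    """Entropy (bits) of a stream with two packet lengths, one with share p_minor."""
    if p_minor in (0.0, 1.0):
        return 0.0  # only one length actually occurs
    q = 1.0 - p_minor
    return -(p_minor * math.log2(p_minor) + q * math.log2(q))


def is_suspected(entropy: float, threshold: float = 0.35) -> bool:
    """Apply the screening rule: entropy strictly greater than the threshold."""
    return entropy > threshold
```

Under this model, a stream in which 1 packet in 16 has a different length stays below the threshold (about 0.337 bits), while 1 packet in 15 just exceeds it (about 0.353 bits). The screening stage therefore passes most streams with even modest length variation on to active probing.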

2. The active probing is based on problems in the interaction process of the open-source V2Ray protocol code. The main interaction process of the V2Ray server is as follows:

2.1 session history: historical session IDs are tracked to prevent replay attacks. A periodic task regularly clears the expired session IDs.

2.2 server session: the server-side session that handles the V2Ray protocol. It is responsible for processing the session text, including encryption and other operations.

2.3 parsing the request header: parsing the client request header includes reading the user information, decrypting the request data, and the like.

2.4 parsing the request body: a buffer reader interface for reading the decrypted request body is returned based on the request header.

2.5 encoding the response header: the server-side response header is encoded, including operations such as encryption.

2.6 encoding the response body: based on the request header and the server response header, a buffer writer interface for writing the encrypted response body is returned.

Step 2.2 in the interaction process mainly realizes decryption and verification processes, and active detection can be realized by constructing an instruction part in step 2.2.
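The session-history mechanism of item 2.1 can be illustrated with a small sketch. This is a hypothetical Python model written for this description, not the V2Ray source code: the class name, method names and the 180-second lifetime are all assumptions.

```python
import time


class SessionHistory:
    """Remember recently seen session IDs and reject repeats (replay protection)."""

    def __init__(self, lifetime_seconds=180.0, clock=time.monotonic):
        self._lifetime = lifetime_seconds   # hypothetical retention period
        self._clock = clock
        self._expiry = {}                   # session ID -> expiry timestamp

    def add_if_new(self, session_id: bytes) -> bool:
        """Record the ID and return True, or return False if it is a replay."""
        now = self._clock()
        expires = self._expiry.get(session_id)
        if expires is not None and expires > now:
            return False
        self._expiry[session_id] = now + self._lifetime
        return True

    def purge_expired(self) -> None:
        """Drop expired IDs; intended to be called by a periodic task."""
        now = self._clock()
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
```

A check of this kind rejects a byte-for-byte replay of a captured packet, which is consistent with the method varying a byte in each probe rather than resending the original packet unchanged.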

3. Extract the first packet after the TCP handshake in the suspected V2Ray traffic and capture its first 64 bytes.

4. The active probe data is constructed and the active probing is performed as follows, as shown in FIG. 3:

The V2Ray packet structure is as follows:

Authentication	Instruction	Variable length
16 bytes	48 bytes	Variable-length bytes

4.1 modifying instruction part:

4.2, generating detection data: taking 32 transmissions as an example, the traversal value x is traversed one by one from 0 to 32. According to the given rule, probe data of length 16 + 48 = 64 bytes is constructed, in which the last byte of the data encryption key is assigned the traversal value x, and the data encryption key is likewise assigned the traversal value x. In this way, V2Ray protection can be bypassed and the target plaintext space is traversed.

4.3, transmitting data: the generated probe data is transmitted to the target service.

4.4, recording a transmission value: after each transmission, the number of bytes successfully transmitted is recorded.

4.5, repeating the steps: the above procedure is repeated 32 times, so that a total of 32 pieces of probe data are transmitted, and all the measured variable values are recorded.

4.6, analyzing variable values: all the recorded variable values are analyzed to find the maximum and minimum values. If the difference between the maximum value and the minimum value of the variable values is 32 and there is no repeated variable value, it may be determined that the target service is the V2Ray service. A difference of 32 means that there is a certain difference in each transmitted data, and that there is no repeated variable value indicates that the target service has a definite response to different probe data.
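The decision rule of step 4.6 can be written as a short check. The function name and the handling of an empty list are assumptions; the two conditions (difference between maximum and minimum equal to 32, and no repeated value) are those stated above.

```python
def looks_like_v2ray(values, expected_span=32):
    """Return True if max - min equals expected_span and no value repeats."""
    if not values:
        return False  # assumption: no measurements means no positive verdict
    no_repeats = len(set(values)) == len(values)
    span = max(values) - min(values)
    return no_repeats and span == expected_span
```

For example, a list of recorded values that are all identical fails both conditions, so the target is judged not to be a V2Ray service.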

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical scheme and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.

Claims

1. A method for identifying V2Ray traffic, characterized in that the method comprises the following steps:

S1, collecting traffic, extracting the payload of the data packets, calculating the data packet lengths, calculating the entropy of the data packet length according to the occurrence frequency or probability distribution of each data packet length, and judging according to the data packet length entropy whether the collected traffic is suspected V2Ray traffic; if yes, entering step S2, and if not, continuing to collect traffic;

S2, extracting the first data packet after the TCP handshake in the suspected V2Ray traffic to obtain an original data packet, and capturing its first K bytes;

S3, modifying the instruction of the original data packet to construct a probe payload and performing active probing, analyzing the measured variable values, and judging whether the difference between the maximum value and the minimum value of the variable values is X and no variable value is repeated; if so, the target service is a V2Ray service, and if not, the target service is a non-V2Ray service; the method for identifying V2Ray traffic is then complete.

2. A method of identifying V2Ray traffic as claimed in claim 1, wherein: the step S1 comprises the following steps:

S11, collecting traffic and acquiring the data packets of a TCP stream;

S12, parsing the captured data packets to obtain the content of the TCP stream, and extracting the payload of each data packet;

S13, acquiring the payload length of each data packet;

S14, merging the data packets of the TCP stream into a data set, wherein the data set comprises the data packet lengths, and calculating the data packet length entropy H(X) from the occurrence frequency or probability distribution of each data packet length;

S15, judging whether the data packet length entropy H(X) is larger than a threshold T; if so, determining that the TCP stream is the suspected V2Ray traffic and entering step S2, and if not, returning to step S11.

3. A method of identifying V2Ray traffic as claimed in claim 2, wherein:

in step S11, capturing a data packet of the TCP flow through the libpcap;

in step S12, the captured packet is parsed using libpcap.

4. A method of identifying V2Ray traffic as claimed in claim 2, wherein: in step S15, the packet length entropy H (X) is:

H(X) = -Σ p(x) · log2(p(x)),

where p(x) is the probability of data packet length x.

5. A method of identifying V2Ray traffic as claimed in claim 2, wherein: in step S15, T is 0.35.

6. A method of identifying V2Ray traffic as claimed in claim 1, wherein: in step S2, after extracting the original data packet, analyzing based on a V2Ray protocol open source code interaction process, and processing a server-side session of the V2Ray protocol to analyze a client-side request header; resolving the client request header includes a decryption operation and a verification operation;

in step S3, the probe payload is constructed according to the decryption operation and the verification operation extracted in step S2.

7. A method of identifying V2Ray traffic as claimed in claim 1, wherein: in step S2, the structure of the original data packet includes: authentication, instructions, and variable length;

in step S3, modifying the instruction in the original data packet to obtain a modified instruction, where the modified instruction includes a version number, a data encryption vector, a data encryption key, a response authentication, an option, a margin, an encryption mode, a reservation and an instruction, and the authentication, the modified instruction and a variable length form the probe data; and in the detection data, the last byte of the data encryption key is assigned to be a traversal value x, and the modified instruction, the authentication and the variable length are combined into the detection load.

8. A method of identifying V2Ray traffic as in claim 7, wherein: in step S2, K is 64, the instruction length is 48 bytes, and the structure of the modified instruction is: version number 1 byte, data encryption vector 16 bytes, data encryption key 16 bytes, response authentication 1 byte, option 1 byte, margin, encryption mode 1 byte, reserved 2 bytes, and instruction 2 bytes.

9. A method of identifying V2Ray traffic as in claim 7, wherein: step S3 comprises the steps of:

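S31, modifying the instruction in the original data packet to obtain the modified instruction;

S32, generating X pieces of probe data according to the modified instruction, the authentication and the variable length, wherein the last byte of the data encryption key is assigned a traversal value x, the data encryption key is likewise assigned the traversal value x, and x is traversed one by one from 0 to X;

S33, sending the probe data with traversal value x = 1 to the target service;

S34, recording the number of bytes successfully transmitted;

S35, setting the traversal value to x+1 and returning to step S33 until X pieces of probe data have been transmitted in total, and recording all the measured variable values;

S36, analyzing all the recorded variable values to find the maximum value and the minimum value, and judging whether the difference between the maximum value and the minimum value of the variable values is X and whether no variable value is repeated; if both conditions hold, the target service is a V2Ray service, and if either condition fails, the target service is not a V2Ray service; the method for identifying V2Ray traffic is then complete.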

10. A method of identifying V2Ray traffic as claimed in claim 1, wherein: in step S3, X is 32.