CN112187774A - Encrypted data length reduction method based on HTTP/2 transmission characteristics

Info

Publication number: CN112187774A
Authority: CN (China)
Prior art keywords: data, length, len, http, tls
Legal status: Granted; Active
Application number: CN202011012391.0A
Other languages: Chinese (zh)
Other versions: CN112187774B (en)
Inventors: 吴桦, 李欣, 程光
Current Assignee: Southeast University
Original Assignee: Southeast University
Application filed by Southeast University
Priority to CN202011012391.0A
Publication of CN112187774A
Application granted
Publication of CN112187774B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for restoring the plaintext length of encrypted data (an encrypted data length reduction method) based on HTTP/2 transmission characteristics. Based on collected encrypted traffic and partial plaintext data, a linear regression model is used to calculate the length of the header information added by the TLS protocol, and a convolutional neural network model is trained to calculate the length of the information added by the HTTP/2 protocol. When side-channel analysis is performed on HTTP/2 encrypted data, the trained linear regression model and neural network model are used to subtract the additional information lengths of the TLS and HTTP/2 protocols from the encrypted traffic, so that the application data length before encryption can be accurately restored. The method restores the length of encrypted data transmitted over HTTP/2, fills the gap in existing ciphertext length restoration work for HTTP/2, and is well suited to encrypted-traffic side-channel analysis that uses data length as the key feature as HTTP/2 becomes more widely deployed.

Description

Encrypted data length reduction method based on HTTP/2 transmission characteristics
Technical Field
The invention belongs to the technical field of computer network security. It relates to techniques for restoring the plaintext length of encrypted data, and in particular to a method for restoring the length of encrypted data based on HTTP/2 transmission characteristics.
Background
As awareness of network security grows, more and more network application service providers encrypt their network traffic. This protects user privacy to some extent but also poses challenges for network supervision. Service providers often need to know what application data is being transported in the network, either to assess users' Quality of Experience (QoE) for an application or to meet network security monitoring needs. Since encrypted traffic cannot be decrypted directly, side-channel analysis is usually used instead: the data length restored from the encrypted traffic (that is, the length of the data in plaintext) serves as a key feature for studying the content of the traffic. Whether the plaintext data length before encryption can be accurately recovered from the encrypted network traffic therefore directly affects the accuracy of the side-channel analysis.
HTTP, the most widely used application layer protocol on the Internet, is one of the key factors affecting how the length of encrypted data changes. Since HTTPS based on HTTP/1.1 was introduced, many researchers have studied length restoration of encrypted data for HTTP/1.1. However, with the rapid development of networks, the shortcomings of HTTP/1.1 in network latency and security have become increasingly prominent, and HTTP/2 was developed to address them and has gradually become widespread. HTTP/2 is not a simple update of HTTP/1.1 but a complete redesign of its transport mechanism: although it remains compatible with HTTP/1.1 semantics, the protocol format differs greatly. HTTP/2 adds a binary framing layer in which all request and response data are divided into smaller frames; it abandons the text-based transmission of HTTP/1.1 in favor of a binary format, which makes parsing more efficient. HTTP/2 also uses multiplexing, so that one connection can carry multiple HTTP requests; this avoids frequently creating and closing connections and greatly improves transmission performance. HTTP/2 improves bandwidth utilization while significantly reducing latency and has therefore been widely adopted by major service providers. Although the HTTP/2 protocol itself does not mandate TLS, all browsers currently support HTTP/2 only over TLS, which makes HTTP/2 the de facto new HTTPS transport standard, and more and more encrypted traffic is carried over HTTP/2.
In the published literature, length restoration methods for encrypted traffic are applied mainly to streaming media. Some studies analyze why the ciphertext length is offset from the plaintext length, calculate the offset introduced during encrypted transmission, and restore the length on that basis. However, most existing methods target HTTP/1.1. Because HTTP/2 differs significantly from HTTP/1.1, none of the existing encrypted-traffic length restoration methods can be applied directly to data encrypted under HTTP/2, and research in this area remains a gap.
Disclosure of Invention
To solve the above problems, the invention discloses an encrypted data length restoration method based on HTTP/2 transmission characteristics. The method first collects traffic data of a target application. Using the collected encrypted data and partial plaintext data, it then uses a linear regression model to calculate the length of the header information added by the TLS protocol during encrypted transmission, and trains a convolutional neural network model to calculate the length of the information added by the HTTP/2 protocol. When length restoration is needed, the trained linear regression model and neural network model are used to accurately restore the plaintext data length before encryption from the encrypted traffic.
In order to achieve the purpose, the invention provides the following technical scheme:
An encrypted data length restoration method based on HTTP/2 transmission characteristics comprises the following steps:
(1) acquiring encrypted traffic and the corresponding plaintext data of a target application;
(2) finding encrypted response data transmitted using only a single TLS fragment, together with its corresponding plaintext data, and calculating the length of the header information added by the TLS protocol using a linear regression model;
(3) extracting a TLS payload length data set from the ciphertext data using the header length obtained from the linear regression model in step (2), and training a convolutional neural network model on this data set to calculate the number of HTTP/2 frames contained in the encrypted data;
(4) when length restoration is required for traffic of the target application, collecting and storing the ciphertext data to be restored using the acquisition device;
(5) using the trained linear regression model and neural network model, removing the header information lengths added by the TLS protocol and the HTTP/2 protocol from the ciphertext data obtained in step (4), and restoring the corresponding plaintext length.
Further, step (1) specifically includes the following sub-steps:
(1.1) selecting an application that transmits data using the HTTP/2 protocol as the target application; connecting the acquisition device and the terminal running the target application to the same wireless network, so that the terminal's network traffic passes through the acquisition device during transmission; the acquisition device has a proxy application and a network traffic capture application installed, used to collect plaintext data and ciphertext data respectively;
(1.2) establishing a list of content to be collected for the target application, and setting the current content to be collected to the first item in the list;
(1.3) configuring the network to which the terminal is connected so that the HTTP/2 protocol is disabled;
(1.4) opening the target application, starting the proxy application on the acquisition device, and starting collection;
(1.5) browsing the current content to be collected in the target application, and closing the application after browsing;
(1.6) stopping collection in the proxy application and saving the collected plaintext data file;
(1.7) re-enabling HTTP/2 by removing the setting that disabled it, opening the target application, and clearing the application's cached data;
(1.8) starting the traffic capture application on the acquisition device, repeating step (1.5), stopping traffic capture, and saving the captured ciphertext data file, which corresponds to the plaintext data file saved in step (1.6);
(1.9) if the content list still contains uncollected content, setting the current content to be collected to the next uncollected item and returning to step (1.3); otherwise, ending collection.
Further, in step (1.3), the target application is forced to transmit using HTTPS based on HTTP/1.1, so that the acquisition device can obtain the plaintext data through proxy parsing.
Further, the step (2) specifically includes the following sub-steps:
(2.1) extracting the plaintext length of each response data from the plaintext data file;
(2.2) finding, in the ciphertext data file, the request messages sent from the client to the server, treating all server response messages between two consecutive request messages as the response data of the earlier request, and extracting the TCP payload of the response data from the ciphertext data file;
(2.3) finding the encrypted response data that are transmitted using only a single TLS fragment and their corresponding plaintext data, for which:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times 1 - frame_{len} \times 1$$
where $ADU_{len}$ is the restored application data length, i.e. the corresponding plaintext data length; $TCPload_{len}$ is the sum of all TCP payload lengths in the corresponding ciphertext data; $TLSheader_{len}$ is the length of the additional header information of each TLS fragment; and $frame_{len}$ is the HTTP/2 frame header length;
calculating $TCPload_{len}$ from the TCP payload obtained in (2.2), labeling each value in order with the plaintext data length $ADU_{len}$, and training a linear regression model to obtain $TLSheader_{len}$.
Further, for response data containing only one TLS segment, the response data is encapsulated in one HTTP/2 frame at the time of transmission.
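Sub-step (2.2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the capture has already been reduced to an ordered list of `(from_client, payload_len)` tuples for one TCP connection, and the function name `group_responses` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

def group_responses(packets: Iterable[Tuple[bool, int]]) -> List[int]:
    """Return TCPload_len (summed server payload bytes) for each response.

    Each packet is (from_client, payload_len). A client packet that carries
    payload is treated as a request; all server payload seen before the next
    request is attributed to the earlier request.
    """
    totals: List[int] = []
    current: Optional[int] = None  # None until the first request is seen
    for from_client, payload_len in packets:
        if payload_len == 0:
            continue  # pure ACKs carry no payload
        if from_client:
            if current:  # close the previous response if it had any data
                totals.append(current)
            current = 0
        elif current is not None:
            current += payload_len
    if current:
        totals.append(current)
    return totals
```

This grouping follows the rule stated in the text. Because HTTP/2 multiplexes several requests on one connection, it only separates responses cleanly when requests are issued one after another.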
Further, the step (3) specifically includes the following sub-steps:
(3.1) obtaining each TLS fragment from the TCP payload of the ciphertext data, and using the TLS header information length $TLSheader_{len}$ obtained from the linear regression model in step (2) to remove the additional length from each TLS fragment, giving the payload length of each TLS fragment; extracting the payload length of every TLS fragment in each response and collecting them into a data set;
(3.2) calculating, from the plaintext data, the number of HTTP/2 frames contained in all TLS fragments of the response data using the following formula, and using the calculated frame count as the label for the TLS payload length data set:
$$N_{h2} = (TLSload_{len} - ADU_{len}) / frame_{len}$$
where $N_{h2}$ is the calculated number of HTTP/2 frames, $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments in the response data, $ADU_{len}$ is the plaintext length corresponding to the response data, and $frame_{len}$ is the HTTP/2 frame header length;
(3.3) training a one-dimensional convolutional neural network on the data set; the trained convolutional neural network model can calculate the number of HTTP/2 frames contained from the payload lengths of the TLS fragments in the encrypted data.
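The labeling rule in sub-step (3.2) can be sketched as below. The function name `h2_frame_count` and the divisibility check are assumptions added for illustration; the source only gives the formula.

```python
FRAME_HEADER_LEN = 9  # HTTP/2 frame header size in bytes (RFC 7540)

def h2_frame_count(tls_payload_lens, adu_len, frame_len=FRAME_HEADER_LEN):
    """Label for one response: N_h2 = (TLSload_len - ADU_len) / frame_len."""
    tls_load_len = sum(tls_payload_lens)   # TLSload_len
    overhead = tls_load_len - adu_len      # bytes added by HTTP/2 framing
    # Assumption: a sample whose overhead is not a whole number of frame
    # headers is rejected rather than rounded.
    if overhead < 0 or overhead % frame_len != 0:
        raise ValueError("overhead is not a whole number of frame headers")
    return overhead // frame_len
```

Rejecting inconsistent samples rather than rounding keeps mismatched plaintext and ciphertext pairs out of the training labels. Whether the original work filters such samples is not stated.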
Further, the step (4) specifically includes the following sub-steps:
(4.1) connecting the acquisition equipment and the terminal with the target application to the same wireless network, and enabling the network flow of the terminal to pass through the acquisition equipment during transmission; installing a network flow acquisition application on the acquisition equipment;
(4.2) determining the ciphertext data to be restored, establishing a list of content to be collected accordingly, and setting the current content to be collected to the first item in the list;
(4.3) opening the target application, clearing application cache data, starting a flow acquisition application on acquisition equipment, and starting acquisition work;
(4.4) browsing the content to be acquired currently in the target application, and closing the application after browsing;
(4.5) stopping the flow collection work, and storing the ciphertext data file collected currently;
(4.6) if the content list has the content which is not collected, setting the current content to be collected as the next content which is not collected, and entering the step (4.3), otherwise, finishing the collection work.
Further, in step (5), the following formula is used to restore the length from the ciphertext data:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS} - frame_{len} \times N_{h2}$$
where $ADU_{len}$ is the restored application data length, i.e. the plaintext data length; $TCPload_{len}$ is the sum of all TCP payload lengths during transmission of the application data; $TLSheader_{len}$ is the length of the additional header information of each TLS fragment; $N_{TLS}$ is the number of TLS fragments in the transmission; $frame_{len}$ is the HTTP/2 frame header length; and $N_{h2}$ is the number of HTTP/2 frames into which the application data is packed for transmission.
Further, the specific length restoration steps are as follows:
(5.1) finding, in the ciphertext data to be restored, the request messages sent by the client to the server, treating all server response messages between two consecutive request messages as the response data of the earlier request, and extracting the TCP payloads of the response data;
(5.2) calculating the sum of the lengths of all TCP payload data in the response data, $TCPload_{len}$, and obtaining the number of TLS fragments contained in these payloads, $N_{TLS}$, together with the TLS fragment information; using the TLS fragment header information length $TLSheader_{len}$ obtained from the linear regression model in step (2), subtracting the additional length of all TLS fragments from the sum of the TCP payload lengths to obtain the sum of the payload lengths of all TLS fragments, i.e.
$$TLSload_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS}$$
where $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments, which is also the total length of all HTTP/2 data frames;
(5.3) subtracting the additional header information length from the length of each TLS fragment to obtain the payload length of each TLS fragment; calculating the number of HTTP/2 frames $N_{h2}$ contained in the TLS fragments using the convolutional neural network model trained in step (3);
(5.4) removing the length occupied by the HTTP/2 frame headers from the TLS payload length to obtain the restored plaintext data length, namely:
$$ADU_{len} = TLSload_{len} - frame_{len} \times N_{h2}$$
Further, according to RFC 7540, $frame_{len}$ is fixed at 9 bytes.
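Sub-steps (5.2) to (5.4) can be combined into one small function, sketched below under stated assumptions: the TCP payload of a response consists entirely of TLS fragments, so $TCPload_{len}$ is the sum of the fragment lengths, and the trained network is represented by a caller-supplied function `predict_frame_count`, a hypothetical stand-in rather than the patent's model.

```python
from typing import Callable, List, Sequence

def restore_adu_len(
    tls_fragment_lens: Sequence[int],
    tls_header_len: int,
    predict_frame_count: Callable[[List[int]], int],
    frame_len: int = 9,
) -> int:
    """Restore the plaintext length ADU_len of one response."""
    n_tls = len(tls_fragment_lens)                 # N_TLS
    tcp_load_len = sum(tls_fragment_lens)          # TCPload_len (assumption above)
    # (5.2) TLSload_len = TCPload_len - TLSheader_len * N_TLS
    tls_load_len = tcp_load_len - tls_header_len * n_tls
    # (5.3) per-fragment payload lengths are the model input
    payload_lens = [n - tls_header_len for n in tls_fragment_lens]
    n_h2 = predict_frame_count(payload_lens)       # N_h2
    # (5.4) ADU_len = TLSload_len - frame_len * N_h2
    return tls_load_len - frame_len * n_h2

# Illustrative call: the fragment sizes and the constant predictor are made up.
example = restore_adu_len([6000, 6357], 22, lambda payloads: 2)
```

Passing the predictor as a parameter keeps the arithmetic testable without the trained network.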
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method can be used directly for data analysis under the HTTP/2 protocol. It restores the length of encrypted data transmitted over HTTP/2, fills the gap in existing ciphertext length restoration work for HTTP/2, and is well suited to encrypted-traffic side-channel analysis that uses data length as a key feature as HTTP/2 becomes more widely deployed.
(2) The offset introduced into the data length during encrypted transmission is not a fixed value. The invention restores the length based on the causes of this offset, using a linear regression model and a convolutional neural network model to obtain the additional lengths of the TLS protocol and the HTTP/2 protocol respectively, so that the total offset introduced during encryption can be calculated accurately.
(3) The invention has a degree of generality, because the way the application data length is offset during transmission is approximately the same for applications of the same type. On this basis, only part of the application data needs to be collected to train the models, after which the trained models can efficiently restore all encrypted traffic of applications of the same type.
Drawings
FIG. 1 is the framework of the encrypted traffic length restoration method;
FIG. 2 is a topology diagram of the data collection environment;
FIG. 3 shows the encapsulation process and the corresponding restoration process for response data transmitted using a single TLS fragment;
FIG. 4 shows part of the TLS payload length data set;
FIG. 5 is a flowchart of length restoration for ciphertext data.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
Since current length restoration work is carried out mainly to identify encrypted video streams, and the HTTP/2 protocol is widely used to transmit video data, the method is described below using Facebook as a representative video application. The framework of the encrypted data length restoration method based on HTTP/2 transmission characteristics provided by the invention is shown in FIG. 1, and the method specifically comprises the following steps:
(1) Video data of the target application Facebook, which transmits data using the HTTP/2 protocol, is collected, including encrypted traffic and the corresponding plaintext data.
The specific process of this step is as follows:
(1.1) A computer is used as the acquisition device, and a smartphone with the Facebook application installed is used as the mobile terminal. A wireless network is created on the computer using a wireless network card, and the mobile terminal is connected to it, so that the terminal's network traffic passes through the acquisition device during transmission, as shown in FIG. 2. Fiddler is installed on the acquisition device as the proxy application and Wireshark as the network traffic capture application; they are used to collect plaintext data and ciphertext data respectively.
(1.2) A list of content to be collected is established for the target application; here it is a list of Facebook videos. Five different videos are selected from the Facebook video list, and each is collected 5 times at each of 5 different resolutions, giving a collection list of 125 video items (including repeats). The current video to be collected is set to the first item in the list.
(1.3) Because the proxy currently supports parsing only HTTP/1.x, the network to which the terminal is connected is configured to disable the HTTP/2 protocol in order to obtain plaintext data. This forces the Facebook server to transmit using HTTPS based on HTTP/1.x, so that the acquisition device can obtain the plaintext data through proxy parsing.
(1.4) The Facebook application is opened, the current video to be collected is searched for and found, and Fiddler is started to begin collection.
(1.5) The current content to be collected is browsed in the Facebook application; in this example, the video is clicked and played, and the Facebook application is closed after playback finishes.
(1.6) Collection in Fiddler is stopped and the collected plaintext data file is saved.
(1.7) HTTP/2 is re-enabled by removing the setting that disabled it, Facebook is opened, and the application's cached data is cleared.
(1.8) The current video to be collected is searched for and found, Wireshark is started to capture data, step (1.5) is repeated, the Wireshark capture is stopped, and the captured ciphertext data file is saved; this file corresponds to the plaintext data file from step (1.6).
(1.9) If the content list still contains uncollected content, the current video to be collected is set to the next uncollected item and the process returns to step (1.3); otherwise, collection ends.
(2) Encrypted response data transmitted using only a single TLS fragment and the corresponding plaintext data are located, and the length of the header information added by the TLS protocol is calculated using a linear regression model.
This step specifically comprises the following:
(2.1) The plaintext length of each response is extracted from the plaintext data file. On the Facebook server, each video is cut into several video segments, and each server response transmits one video segment; the information about these segments is stored in an MPD (Media Presentation Description) file. The MPD file is sent to the client when the client first requests the video, so the first item in the response data is the MPD file. Once the MPD file is obtained from the capture collected by Fiddler, the sizes and order of all segments of the video can be extracted from it, giving the plaintext data length of each response.
Part of the video segment information extracted from the captured plaintext data file is shown in Table 1.
Table 1. Plaintext information for a portion of the video segments
[Table 1 appears as an image (BDA0002697910740000071) in the original publication.]
(2.2) The request messages sent by the client to the Facebook server are located in the ciphertext data file; all server response messages between two consecutive request messages are treated as the response data of the earlier request, and the TCP payloads of these response messages are extracted from the ciphertext data file.
(2.3) During transmission of application data, the data of one HTTP/2 frame may be split across several TLS fragments. However, if a response contains only one TLS fragment, the response data is encapsulated in exactly one HTTP/2 frame.
Therefore, the encrypted response data transmitted using only a single TLS fragment and the corresponding plaintext data are located. The encapsulation process during transmission and the corresponding length restoration process are shown in FIG. 3, where:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times 1 - frame_{len} \times 1$$
where $ADU_{len}$ is the restored application data length (i.e., the restored plaintext data length), $TCPload_{len}$ is the sum of all TCP payload lengths in the corresponding ciphertext data, $TLSheader_{len}$ is the length of the additional header information of each TLS fragment, and $frame_{len}$ is the HTTP/2 frame header length (according to RFC 7540, $frame_{len}$ is fixed at 9 bytes).
$TCPload_{len}$ is calculated from the TCP payload obtained in (2.2), and each value is labeled in order with the plaintext data length ($ADU_{len}$) obtained in (2.1), giving a training data set (part of the data set is shown in Table 2). A linear regression model trained on this data set gives a header information length of 22 bytes for each TLS fragment ($TLSheader_{len}$).
Table 2. Part of the data set
[Table 2 appears as images (BDA0002697910740000072, BDA0002697910740000081) in the original publication.]
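The regression in step (2) can be sketched as follows. The source does not state the exact form of the model; this sketch assumes an ordinary least-squares fit of $TCPload_{len}$ against $ADU_{len}$ for single-fragment responses, so the slope should be close to 1 and the intercept equals $TLSheader_{len} + frame_{len}$. The function names are hypothetical, and the data points are synthetic, constructed only to reproduce the 22-byte value reported above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_tls_header_len(adu_lens, tcp_load_lens, frame_len=9):
    """Estimate TLSheader_len from single-fragment responses.

    For such responses TCPload_len = ADU_len + TLSheader_len + frame_len,
    so the fitted intercept minus frame_len is the per-fragment overhead.
    """
    slope, intercept = fit_line(adu_lens, tcp_load_lens)
    return slope, round(intercept - frame_len)

# Synthetic single-fragment samples (not measurements from the source).
adu_samples = [500, 1200, 3100, 8000]
tcp_samples = [a + 22 + 9 for a in adu_samples]
slope, header_len = estimate_tls_header_len(adu_samples, tcp_samples)
print(header_len)  # 22 for this synthetic data
```

A slope far from 1 on real captures would suggest that the selected responses are not all single-fragment or that plaintext and ciphertext samples are misaligned.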
(3) A TLS payload length data set is obtained from the ciphertext data using the header length from the linear regression model in step (2), and a convolutional neural network model is trained on this data set to calculate the number of HTTP/2 frames contained in the encrypted data.
This step specifically comprises the following:
(3.1) Each TLS fragment is obtained from the TCP payload of the ciphertext data, and the TLS header information length ($TLSheader_{len}$) obtained from the linear regression model in step (2) is used to remove the additional length from each TLS fragment, giving the payload length of each TLS fragment. The TLS payload lengths of each response in the ciphertext data are extracted and collected into a data set (part of the data set is shown in FIG. 4).
(3.2) The number of HTTP/2 frames contained in all TLS fragments of the response data is calculated from the plaintext data using the following formula, and the calculated frame count is used as the label for the TLS payload length data set:
$$N_{h2} = (TLSload_{len} - ADU_{len}) / frame_{len}$$
where $N_{h2}$ is the calculated number of HTTP/2 frames, $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments in the response data (and also the total length of all HTTP/2 data frames), $ADU_{len}$ is the plaintext length corresponding to the response data, and $frame_{len}$ is the HTTP/2 frame header length (according to RFC 7540, $frame_{len}$ is fixed at 9 bytes).
(3.3) A one-dimensional convolutional neural network is trained on the data set; the accuracy of the trained model is 89.84%. Using the trained convolutional neural network model, the number of HTTP/2 frames contained can be calculated from the TLS payload lengths in the encrypted data.
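Sub-step (3.1) needs the size of each TLS fragment in the reassembled TCP payload. The sketch below assumes that the response's TCP payload is available as one contiguous byte string and that every fragment begins with the standard 5-byte TLS record header (content type, protocol version, 2-byte length). The function names are hypothetical, and the 22-byte overhead is the value reported in step (2), not something derived here.

```python
import struct

TLS_RECORD_HEADER_LEN = 5  # content type (1) + version (2) + length (2)

def tls_record_sizes(stream: bytes):
    """Return the on-the-wire size of each complete TLS record in the stream."""
    sizes = []
    offset = 0
    while offset + TLS_RECORD_HEADER_LEN <= len(stream):
        _ctype, _version, body_len = struct.unpack_from("!BHH", stream, offset)
        size = TLS_RECORD_HEADER_LEN + body_len
        if offset + size > len(stream):
            break  # trailing record is incomplete in this capture
        sizes.append(size)
        offset += size
    return sizes

def tls_payload_lens(stream: bytes, tls_header_len: int = 22):
    """Per-fragment payload length: fragment size minus TLSheader_len."""
    return [size - tls_header_len for size in tls_record_sizes(stream)]
```

The resulting list of payload lengths for one response is one sample of the data set; this sketch does not filter by record content type, which a real pipeline might need to do.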
(4) When the encrypted traffic of the Facebook application needs to be restored, the ciphertext data to be restored is collected and saved using the acquisition device.
The ciphertext data to be restored is collected as follows:
(4.1) A computer is used as the acquisition device, and a smartphone with the Facebook application installed is used as the mobile terminal. A wireless network is created on the computer using a wireless network card, and the mobile terminal is connected to it, so that the terminal's network traffic passes through the acquisition device during transmission, as shown in FIG. 2. Wireshark is installed on the acquisition device as the network traffic capture application.
(4.2) The person collecting the data establishes a list of video content to be collected as required, and sets the current video to be collected to the first item in the list.
(4.3) Facebook is opened, the application's cached data is cleared, the current video to be collected is searched for and found, and Wireshark is started to begin capture.
(4.4) The current content to be collected is browsed in the Facebook application, that is, the video is clicked and played, and the Facebook application is closed after playback finishes.
(4.5) The Wireshark capture is stopped and the captured ciphertext data file is saved.
(4.6) If the video content list still contains uncollected content, the current content to be collected is set to the next uncollected item and the process returns to step (4.3); otherwise, collection ends.
(5) The length of the ciphertext data from step (4) is restored using the trained linear regression model and the trained neural network model. In this step, the length is restored according to the following formula:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS} - frame_{len} \times N_{h2}$$
where $ADU_{len}$ is the restored video data length (i.e., the plaintext data length), $TCPload_{len}$ is the sum of all TCP payload lengths during encrypted transmission of the video, $TLSheader_{len}$ is the length of the additional header information of each TLS fragment, $N_{TLS}$ is the number of TLS fragments in the encrypted transmission of the video, $frame_{len}$ is the HTTP/2 frame header length (according to RFC 7540, $frame_{len}$ is fixed at 9 bytes), and $N_{h2}$ is the number of HTTP/2 frames into which the video data is packed for transmission.
The length restoration process is shown in Fig. 5 and comprises the following steps:
(5.1) In the ciphertext data file to be restored, find the request messages sent by the client to the Facebook server, treat all server response messages between two consecutive request messages as the response data of the earlier request, and extract the TCP payload of these response messages.
(5.2) Calculate the sum of the lengths of all TCP payload data in one response data (i.e. $TCPload_{len}$), and obtain the number of TLS fragments contained in these payloads ($N_{TLS}$) together with the TLS fragment information. Obtain $TLSheader_{len}$ from the linear regression model in step (2), and subtract the additional information length of all TLS fragments from the total TCP payload length to obtain the sum of the payload lengths of all TLS fragments of the response data, namely:
$$TLSload_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS}$$
where $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments and is also the total length of all HTTP/2 data frames in the response data.
In one example of captured encrypted video data, part of the extracted $TLSload_{len}$ values are shown in Table 3:
Table 3. Partial extracted $TLSload_{len}$ values
| No. | $TCPload_{len}$ | $N_{TLS}$ | $TLSload_{len}$ |
|---|---|---|---|
| 1 | 12357 | 2 | 12313 |
| 2 | 21437 | 3 | 21371 |
| 3 | 12427 | 2 | 12383 |
| 4 | 24164 | 3 | 24098 |
| 5 | 12324 | 2 | 12280 |
| 6 | 139881 | 19 | 139463 |
| 7 | 12302 | 2 | 12258 |
| 8 | 137614 | 20 | 137174 |
| 9 | 12306 | 2 | 12262 |
| 10 | 93918 | 14 | 93610 |
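Steps (5.1) and (5.2) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Packet` record, the function names and the individual packet sizes are hypothetical, and the 22-byte `TLS_HEADER_LEN` is an assumption inferred from Table 3 (every row gives (TCPload_len − TLSload_len) / N_TLS = 22). The packet sizes were chosen only so that the two response totals match rows 1 and 2 of Table 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    from_client: bool      # True for client-to-server packets (hypothetical field)
    tcp_payload_len: int   # TCP payload bytes carried by this packet


def response_payload_totals(packets: List[Packet]) -> List[int]:
    """Step (5.1): attribute server payload bytes to the preceding request.

    Simplification: consecutive client packets are treated as one request.
    """
    totals: List[int] = []
    prev_from_client = False
    for p in packets:
        if p.tcp_payload_len == 0:
            continue  # skip pure ACKs, they carry no payload
        if p.from_client:
            if not prev_from_client:
                totals.append(0)  # a new request opens a new response bucket
            prev_from_client = True
        else:
            if totals:  # ignore server data seen before the first request
                totals[-1] += p.tcp_payload_len
            prev_from_client = False
    return totals


def tls_payload_total(tcp_load_len: int, n_tls: int, tls_header_len: int) -> int:
    """Step (5.2): TLSload_len = TCPload_len - TLSheader_len * N_TLS."""
    return tcp_load_len - tls_header_len * n_tls


TLS_HEADER_LEN = 22  # assumption inferred from Table 3, not stated in the text

# Hypothetical packet sequence: two requests, each followed by its response.
packets = (
    [Packet(True, 300)] + [Packet(False, 1460)] * 8 + [Packet(False, 677)]
    + [Packet(True, 310)] + [Packet(False, 1460)] * 14 + [Packet(False, 997)]
)
tcp_totals = response_payload_totals(packets)
n_tls_per_response = [2, 3]  # N_TLS values from Table 3, rows 1 and 2
tls_totals = [
    tls_payload_total(t, n, TLS_HEADER_LEN)
    for t, n in zip(tcp_totals, n_tls_per_response)
]
print(tcp_totals, tls_totals)
```

In a real capture the direction and payload length of each packet, and the TLS fragment count, would come from the parsed trace rather than from hand-written lists.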
(5.3) Subtract the additional TLS header information length ($TLSheader_{len}$) from the length of each TLS fragment to obtain the payload length of each TLS fragment. Using the convolutional neural network model trained in step (3), take the payload length of each TLS fragment as input and calculate the number of HTTP/2 frames contained in these TLS fragments (i.e. $N_{h2}$).
(5.4) Remove the length occupied by the HTTP/2 frame headers from the TLS payload length to obtain the corresponding plaintext data length, namely:
$$ADU_{len} = TLSload_{len} - frame_{len} \times N_{h2}$$
In one example of captured encrypted video data, the extracted feature values and part of the restored plaintext lengths are shown in Table 4:
Table 4. Extracted feature values and partial restored plaintext lengths
| No. | $TLSload_{len}$ | $N_{h2}$ | $ADU_{len}$ |
|---|---|---|---|
| 1 | 12313 | 1 | 12304 |
| 2 | 21371 | 2 | 21353 |
| 3 | 12383 | 1 | 12374 |
| 4 | 24098 | 2 | 24080 |
| 5 | 12280 | 1 | 12271 |
| 6 | 139463 | 10 | 139373 |
| 7 | 12258 | 2 | 12240 |
| 8 | 137174 | 10 | 137084 |
| 9 | 12262 | 1 | 12253 |
| 10 | 93610 | 7 | 93547 |
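Steps (5.3) and (5.4) can be sketched as below, under stated assumptions. The trained convolutional neural network is not reproduced; `predict_n_h2` is a hypothetical stand-in callable that returns the frame count. The 22-byte `TLS_HEADER_LEN` is inferred from Table 3 rather than stated in the text, and the two fragment lengths 8214 and 4143 are a hypothetical split whose sum equals the $TCPload_{len}$ of response No. 1. The last lines recompute Table 4 and invert the same relation, which is how the frame-count labels of step (3.2) are obtained from known plaintext lengths.

```python
from typing import Callable, List, Sequence

FRAME_LEN = 9         # HTTP/2 frame header length in bytes (RFC 7540)
TLS_HEADER_LEN = 22   # assumption inferred from Table 3, not stated in the text


def restore_plaintext_len(
    fragment_lens: Sequence[int],
    predict_n_h2: Callable[[List[int]], int],
    tls_header_len: int = TLS_HEADER_LEN,
) -> int:
    """Apply steps (5.3) and (5.4) to the TLS fragments of one response."""
    # Step (5.3): per-fragment payload lengths, the input of the frame-count model.
    payload_lens = [n - tls_header_len for n in fragment_lens]
    n_h2 = predict_n_h2(payload_lens)
    # Step (5.4): strip one 9-byte frame header per predicted HTTP/2 frame.
    return sum(payload_lens) - FRAME_LEN * n_h2


# Hypothetical split of response No. 1 into its two TLS fragments; the lambda
# stands in for the trained CNN and returns the N_h2 value listed in Table 4.
restored = restore_plaintext_len([8214, 4143], lambda payloads: 1)

# (TLSload_len, N_h2, ADU_len) rows copied from Table 4.
TABLE_4 = [
    (12313, 1, 12304), (21371, 2, 21353), (12383, 1, 12374),
    (24098, 2, 24080), (12280, 1, 12271), (139463, 10, 139373),
    (12258, 2, 12240), (137174, 10, 137084), (12262, 1, 12253),
    (93610, 7, 93547),
]
# Every row satisfies ADU_len = TLSload_len - frame_len * N_h2.
rows_consistent = all(tls - FRAME_LEN * n == adu for tls, n, adu in TABLE_4)
# Inverse relation of step (3.2): N_h2 = (TLSload_len - ADU_len) / frame_len.
labels = [(tls - adu) // FRAME_LEN for tls, _, adu in TABLE_4]
print(restored, rows_consistent, labels)
```

Passing the predictor as a parameter keeps the arithmetic of the method separate from whichever model supplies $N_{h2}$.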
The technical means disclosed in the solution of the present invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also regarded as falling within the scope of protection of the present invention.

Claims (10)

1. An encrypted data length restoring method based on HTTP/2 transmission characteristics is characterized by comprising the following steps:
(1) acquiring encrypted flow and corresponding plaintext data of a target application;
(2) finding out encrypted response data transmitted by only using a single TLS fragment and corresponding plaintext data thereof, and calculating the length of header information attached to the TLS protocol by using a linear regression model;
(3) extracting a TLS payload length data set from the ciphertext data by using the linear regression model in the step (2), training a convolutional neural network model based on the data set, and using the model to calculate the number of HTTP/2 frames contained in the encrypted data;
(4) when the length reduction is needed to be carried out on the flow of the target application, collecting ciphertext data to be reduced by using collection equipment and storing the ciphertext data;
(5) removing, by using the trained linear regression model and neural network model, the header information lengths added by the TLS protocol and the HTTP/2 protocol from the ciphertext data obtained in the step (4), and restoring the corresponding plaintext length.
2. The encrypted data length reduction method based on HTTP/2 transmission characteristics as claimed in claim 1, wherein the step (1) specifically includes the following sub-steps:
(1.1) selecting an application for data transmission by using an HTTP/2 protocol as a target application, connecting the acquisition equipment and a terminal with the target application to the same wireless network, and enabling the network flow of the terminal to pass through the acquisition equipment during transmission; the acquisition equipment is provided with an agent application and a network flow acquisition application which are respectively used for the acquisition work of plaintext data and ciphertext data;
(1.2) establishing a content list to be acquired according to the target application, and setting the content to be acquired currently as the first content in the list;
(1.3) configuring the network to which the terminal is connected so that the HTTP/2 protocol is disabled;
(1.4) opening the target application, starting the agent application on the acquisition equipment, and starting acquisition work;
(1.5) browsing the content to be acquired currently in the target application, and closing the application after browsing;
(1.6) stopping the collection of the proxy application and storing the currently collected plaintext data file;
(1.7) removing the setting that disables HTTP/2, opening the target application, and clearing the application cache data;
(1.8) starting a flow acquisition application on the acquisition equipment, repeating the step (1.5), stopping flow acquisition work, and storing a ciphertext data file acquired currently, wherein the ciphertext data file corresponds to the plaintext data file stored in the step (1.6);
(1.9) if the content list has the content which is not collected, setting the current content to be collected as the next content which is not collected, and entering the step (1.3), otherwise, finishing the collection work.
3. The encrypted data length reduction method based on HTTP/2 transmission characteristics according to claim 1, wherein in the step (1.3), the target application is forced to transmit using HTTPS based on HTTP/1.1, so that the acquisition equipment can obtain plaintext data by parsing through the proxy service.
4. The encrypted data length reduction method based on HTTP/2 transmission characteristics as claimed in claim 1, wherein the step (2) specifically includes the following sub-steps:
(2.1) extracting the plaintext length of each response data from the plaintext data file;
(2.2) finding out a request message sent from a client to a server in the ciphertext data file, taking all server response messages between two request messages as response data of a previous request, and extracting a TCP load part of the response data from the ciphertext data file;
(2.3) finding the encrypted response data that are transmitted using only a single TLS fragment, together with their corresponding plaintext data, in which case:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times 1 - frame_{len} \times 1$$
wherein $ADU_{len}$ is the restored application data length, i.e. the corresponding plaintext data length, $TCPload_{len}$ is the sum of all TCP payload lengths in the corresponding ciphertext data, $TLSheader_{len}$ is the length of the additional header information of each TLS fragment, and $frame_{len}$ is the HTTP/2 frame header length;
calculating $TCPload_{len}$ from the TCP payload part obtained in (2.2), labeling it in sequence with the plaintext data length $ADU_{len}$, and training a linear regression model to obtain $TLSheader_{len}$.
5. The encrypted data length reduction method according to claim 4, wherein the response data containing only one TLS segment is encapsulated in one HTTP/2 frame during transmission.
6. The encrypted data length reduction method based on HTTP/2 transmission characteristics as claimed in claim 1, wherein the step (3) specifically includes the following sub-steps:
(3.1) obtaining each TLS fragment from the TCP payload of the ciphertext data, and using the TLS header information length $TLSheader_{len}$ obtained by the linear regression model in the step (2) to remove the additional length from each TLS fragment, giving the payload length of each TLS fragment; extracting the payload length of each TLS fragment in each response data and collecting them into a data set;
(3.2) calculating, from the plaintext data, the number of HTTP/2 frames contained in all TLS fragments of the response data by the following formula, and using the calculated number of frames as the label of the TLS payload length data set:
$$N_{h2} = (TLSload_{len} - ADU_{len}) / frame_{len}$$
wherein $N_{h2}$ is the calculated number of HTTP/2 frames, $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments in the response data, $ADU_{len}$ is the plaintext length corresponding to the response data, and $frame_{len}$ is the HTTP/2 frame header length;
(3.3) training a one-dimensional convolutional neural network with the data set, the trained convolutional neural network model being able to calculate the number of contained HTTP/2 frames from the payload lengths of the TLS fragments in the encrypted data.
7. The encrypted data length reduction method based on HTTP/2 transmission characteristics as claimed in claim 1, wherein the step (4) specifically includes the following sub-steps:
(4.1) connecting the acquisition equipment and the terminal with the target application to the same wireless network, and enabling the network flow of the terminal to pass through the acquisition equipment during transmission; installing a network flow acquisition application on the acquisition equipment;
(4.2) establishing a list of content to be acquired according to the ciphertext data to be restored, and setting the current content to be acquired as the first content in the list;
(4.3) opening the target application, clearing application cache data, starting a flow acquisition application on acquisition equipment, and starting acquisition work;
(4.4) browsing the content to be acquired currently in the target application, and closing the application after browsing;
(4.5) stopping the flow collection work, and storing the ciphertext data file collected currently;
(4.6) if the content list has the content which is not collected, setting the current content to be collected as the next content which is not collected, and entering the step (4.3), otherwise, finishing the collection work.
8. The HTTP/2 transmission characteristic-based encrypted data length reduction method according to claim 1, wherein in the step (5), the ciphertext data is subjected to length reduction by using the following formula:
$$ADU_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS} - frame_{len} \times N_{h2}$$
wherein $ADU_{len}$ is the restored application data length, i.e. the plaintext data length, $TCPload_{len}$ is the sum of all TCP payload lengths during the application data transmission, $TLSheader_{len}$ is the length of the additional header information of each TLS fragment, $N_{TLS}$ is the number of TLS fragments in the application data transmission, $frame_{len}$ is the HTTP/2 frame header length, and $N_{h2}$ is the number of HTTP/2 frames into which the application data is divided at transmission time.
9. The encrypted data length reduction method based on HTTP/2 transmission characteristics as claimed in claim 8, wherein the specific length reduction step is:
(5.1) finding out a request message sent to a server by a client from ciphertext data to be restored, taking all server response messages between two request messages as response data of a previous request, and extracting TCP load parts of the response data;
(5.2) calculating the sum of the lengths of all TCP payload data in the response data, $TCPload_{len}$, and obtaining the number $N_{TLS}$ of TLS fragments contained in these payloads together with the TLS fragment information; using the header information length $TLSheader_{len}$ of the TLS fragment obtained from the linear regression model in the step (2), subtracting the additional length of all TLS fragments from the sum of the TCP payload lengths to obtain the sum of the payload lengths of all TLS fragments, i.e.
$$TLSload_{len} = TCPload_{len} - TLSheader_{len} \times N_{TLS}$$
wherein $TLSload_{len}$ is the sum of the payload lengths of all TLS fragments, and is also the total length of all HTTP/2 data frames;
(5.3) subtracting the additional header information length from the length of each TLS fragment to obtain the payload length of each TLS fragment; calculating the number $N_{h2}$ of HTTP/2 frames contained in the TLS fragments by using the convolutional neural network model trained in the step (3);
(5.4) removing the length occupied by the HTTP/2 frame headers from the TLS payload length to obtain the restored plaintext data length, namely:
$$ADU_{len} = TLSload_{len} - frame_{len} \times N_{h2}$$
10. The encrypted data length reduction method based on HTTP/2 transmission characteristics according to any one of claims 4, 6, 8 and 9, characterized in that, according to RFC 7540, said $frame_{len}$ is fixed at 9 bytes.
CN202011012391.0A 2020-09-23 2020-09-23 Encrypted data length reduction method based on HTTP/2 transmission characteristics Active CN112187774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011012391.0A CN112187774B (en) 2020-09-23 2020-09-23 Encrypted data length reduction method based on HTTP/2 transmission characteristics

Publications (2)

Publication Number Publication Date
CN112187774A true CN112187774A (en) 2021-01-05
CN112187774B CN112187774B (en) 2023-03-24

Family

ID=73956232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011012391.0A Active CN112187774B (en) 2020-09-23 2020-09-23 Encrypted data length reduction method based on HTTP/2 transmission characteristics

Country Status (1)

Country Link
CN (1) CN112187774B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602059A (en) * 2019-08-23 2019-12-20 东南大学 Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data
CN110855669A (en) * 2019-11-14 2020-02-28 北京理工大学 Video QoE index prediction method suitable for encrypted flow based on neural network
CN111131069A (en) * 2019-11-25 2020-05-08 北京理工大学 Abnormal encryption flow detection and classification method based on deep learning strategy
CN111428225A (en) * 2020-02-26 2020-07-17 深圳壹账通智能科技有限公司 Data interaction method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUA WU等: "Monitoring Video Resolution of Adaptive Encrypted Video Traffic Based on HTTP/2 Features", 《IEEE INFOCOM 2021 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113179223A (en) * 2021-04-23 2021-07-27 中山大学 Network application identification method and system based on deep learning and serialization features
CN113254975A (en) * 2021-06-15 2021-08-13 湖南三湘银行股份有限公司 Digital financial data sharing method
CN113254975B (en) * 2021-06-15 2021-09-28 湖南三湘银行股份有限公司 Digital financial data sharing method

Also Published As

Publication number Publication date
CN112187774B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant