CN110602059B - Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data - Google Patents

Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data

Info

Publication number
CN110602059B
Authority
CN
China
Prior art keywords
adu
plaintext
tls
data
ciphertext
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910782693.7A
Other languages
Chinese (zh)
Other versions
CN110602059A (en
Inventor
吴桦
吴秋艳
程光
于振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910782693.7A priority Critical patent/CN110602059B/en
Publication of CN110602059A publication Critical patent/CN110602059A/en
Application granted granted Critical
Publication of CN110602059B publication Critical patent/CN110602059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/123 Applying verification of the received information received data contents, e.g. message integrity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/16 Implementing security features at a particular protocol layer
    • H04L63/168 Implementing security features at a particular protocol layer above the transport layer

Abstract

The invention discloses a method for accurately restoring the plaintext length fingerprint of data transmitted under TLS protocol encryption. First, application data is captured on a proxy device to obtain the application's plaintext data, and the length features of that plaintext are extracted to form a plaintext dictionary of Application Data Units (ADUs). Ciphertext length fingerprints are then extracted from the ciphertext data corresponding to the ADU plaintext dictionary and labeled using the plaintext dictionary, and a regression model for accurately restoring the length fingerprint of TLS-encrypted data is obtained through machine learning. To use the model, ciphertext length fingerprints are extracted from ADUs transmitted under TLS encryption, and regression analysis with the trained model accurately restores the application-layer plaintext ADU length fingerprint corresponding to each ADU's ciphertext. The method is general: the restored plaintext length fingerprints can be used for content identification in the most common applications, which use TLS 1.2 for encrypted transmission.

Description

Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data
Technical Field
The invention belongs to the technical field of network security, and particularly relates to a method for accurately restoring a TLS protocol encrypted transmission data plaintext length fingerprint.
Background
The original design of the Internet has not kept pace with actual demand. Privacy protection and security protection must be considered for Internet applications, and encrypting data with the TLS (Transport Layer Security) protocol on top of TCP is the most common way to achieve secure transmission. Because data differ in importance, some applications encrypt only user login data while others encrypt all data; these measures give Internet applications good security protection. On the other hand, the growing proportion of encrypted traffic poses a significant challenge to Internet traffic supervision. How to extract the information required for network management and security management from encrypted data has become an urgent problem.
At present, analysis of encrypted traffic falls into two categories according to its target: type identification and content identification. Research on identifying the type of encrypted traffic began earlier and covers a wide range, including identifying the application type of network traffic, identifying malware traffic, and identifying the playback mode of encrypted video. Such analysis does not involve the specific content of user information, and most of these methods are similar: features highly relevant to the classification are extracted from the data, the training data are labeled, a classification model is obtained through machine learning, and the trained model then classifies the test data. Encrypted traffic type identification requires a labeled training set and yields a traffic type classification. The result, however, carries no information about the specific content of the upper-layer application: the specific content transmitted by the user and the user's interactions with the server cannot be identified.
Identifying the content of encrypted applications is the greatest need and the greatest challenge in network management and network security, and the popularity of encrypted transmission makes it more so for network regulators. Because the encryption algorithms are sound, the content of encrypted traffic cannot be identified directly. However, the length of the data after encryption corresponds closely to the length of the plaintext data; this length is generally called the length fingerprint of the data. Identifying encrypted content requires matching the ciphertext length fingerprint of the transmitted data against the plaintext length fingerprints in a fingerprint library. Existing matching algorithms match the ciphertext length fingerprint directly against the plaintext length fingerprint, so the identification error is large and the algorithms cannot be used with a large-scale fingerprint library. No published research addresses how to first restore the ciphertext length fingerprint to a plaintext length fingerprint and then match it. The invention therefore provides a new method that extracts TLS fragmentation features, restores the ciphertext length fingerprint to a plaintext length fingerprint, and then performs matching, and that can be applied to the identification of encrypted application content.
Disclosure of Invention
Purpose of the invention: in view of the above problems, the invention discloses a method for accurately restoring the plaintext length fingerprint of data transmitted under TLS protocol encryption. First, application data is captured on a proxy device to obtain the application's plaintext data, and the length features of that plaintext are extracted to form a plaintext dictionary of Application Data Units (ADUs). Ciphertext length fingerprints are then extracted from the ciphertext data corresponding to the ADU plaintext dictionary and labeled using the plaintext dictionary, and a regression model for accurately restoring the length fingerprint of TLS-encrypted data is obtained through machine learning. To use the model, ciphertext length fingerprints are extracted from ADUs transmitted under TLS encryption, and regression analysis with the trained model accurately restores the application-layer plaintext ADU length fingerprint corresponding to each ADU's ciphertext. The method is general: the restored plaintext length fingerprints can be used for content identification in the most common applications, which use TLS 1.2 for encrypted transmission.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows: a method for accurately restoring the clear text length fingerprint of TLS protocol encrypted transmission data specifically comprises the following steps:
(1) collecting plaintext data of the application and extracting its plaintext features to form an Application Data Unit (ADU) plaintext dictionary;
(2) collecting the ciphertext data corresponding to the ADU plaintext dictionary, and extracting from it the ciphertext transmission fingerprint features of each ADU, L_encrypted_ADU, N_TLS and L_HTTPhead_TLS, wherein L_encrypted_ADU is the data length of an ADU after encrypted transmission; N_TLS is the number of TLS encryption blocks used to transmit the encrypted data of an ADU; and L_HTTPhead_TLS is the length of the TLS block that carries the HTTP protocol header information when the encrypted data of an ADU is transmitted;
(3) looking up the plaintext dictionary using the ciphertext transmission fingerprint features extracted in step (2), taking the corresponding plaintext ADU length fingerprint L_plaintext_ADU as the label of the ciphertext to form a training set, and obtaining through machine learning a regression model of the relation between the ciphertext transmission fingerprint features and the plaintext ADU length fingerprint;
(4) when ciphertext data needs to be identified, collecting and storing the ciphertext data;
(5) extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS from the ciphertext data collected in step (4), using the same method as in step (2);
(6) substituting the L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (5) into the regression model obtained in step (3) to restore the corresponding plaintext length fingerprint L_plaintext_ADU.
Further, in step (1), the method for collecting the plaintext data of the application and extracting its plaintext features to form the ADU plaintext dictionary is as follows:
(1.1) connecting the mobile terminal to the network through a hotspot provided by a PC;
(1.2) running proxy software on the PC to obtain plaintext data;
(1.3) extracting the features of the application-layer ADU from the plaintext data.
Further, in step (2), the method for collecting the ciphertext data corresponding to the ADU plaintext dictionary and extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS of each ADU is as follows:
(2.1) connecting the mobile terminal to the network through a hotspot provided by a PC and running the encrypted application;
(2.2) collecting ciphertext data at the access point;
(2.3) opening the ciphertext data file and counting the volume of data transmitted over TLS between every pair of IP addresses; the video stream is the IP flow with the largest data volume, so the IP address pair with the largest traffic is extracted;
(2.4) separating the traffic between this pair of IP addresses into its TCP streams according to the five-tuple rule;
(2.5) extracting L_encrypted_ADU and N_TLS for the ADUs in each TCP stream:
Extraction of L_encrypted_ADU: the sum of the application-layer data lengths of all response messages returned for one ADU request is the ciphertext transmission fingerprint feature L_encrypted_ADU of that ADU.
Extraction of N_TLS: the total number of TLS blocks contained in all encrypted messages of one HTTP response is counted; the TLS block used to transmit the HTTP response header is not counted, so N_TLS is the total number of TLS blocks minus 1.
(2.6) each request corresponds to one ADU ciphertext, which comprises a plurality of TLS blocks; if the length of the first TLS block is between 400 bytes and 800 bytes, the ADU is a valid sample, and the length of the first TLS block is recorded as the L_HTTPhead_TLS of the ADU.
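Steps (2.3) and (2.4) can be sketched as follows. This is a minimal illustration rather than the inventors' implementation: it assumes the capture file has already been parsed into simple packet records, and the `Packet` type, its field names and the sample addresses are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, already-parsed packet record (not a real library type).
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload_len: int  # TCP payload bytes carried by this packet


def busiest_ip_pair(packets):
    """Return the unordered IP pair that exchanged the most payload bytes."""
    volume = defaultdict(int)
    for p in packets:
        volume[frozenset((p.src_ip, p.dst_ip))] += p.payload_len
    return max(volume, key=volume.get)


def tcp_streams_between(packets, ip_pair):
    """Group the packets of one IP pair by direction-independent five-tuple."""
    streams = defaultdict(list)
    for p in packets:
        if frozenset((p.src_ip, p.dst_ip)) != ip_pair:
            continue
        endpoints = frozenset(((p.src_ip, p.src_port), (p.dst_ip, p.dst_port)))
        streams[("TCP", endpoints)].append(p)
    return dict(streams)


# Made-up sample traffic: one heavy pair with two TCP streams, one light pair.
packets = [
    Packet("10.0.0.2", 50000, "203.0.113.7", 443, 300),
    Packet("203.0.113.7", 443, "10.0.0.2", 50000, 16000),
    Packet("203.0.113.7", 443, "10.0.0.2", 50001, 9000),
    Packet("10.0.0.2", 50002, "198.51.100.9", 443, 1200),
]
pair = busiest_ip_pair(packets)
streams = tcp_streams_between(packets, pair)
print(sorted(pair), len(streams))
```

Keying each stream on an unordered pair of endpoints keeps the request and response directions of one connection together, which the later per-request feature extraction relies on.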
Further, the specific method of step (3) is as follows:
(3.1) looking up every valid sample in the plaintext dictionary according to the ciphertext ADU features from step (2) to obtain the plaintext length fingerprint L_plaintext_ADU;
(3.2) labeling the three feature values L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (2) with the length of the corresponding plaintext ADU in the plaintext dictionary to form a training set;
(3.3) performing fitting training on the training set to obtain a regression model of the relation between the ciphertext transmission ADU length fingerprint and the plaintext ADU length fingerprint.
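Steps (3.1) and (3.2) amount to a join between the extracted ciphertext features and the plaintext dictionary. The sketch below is only an illustration under stated assumptions: the dictionary key `(video_id, segment)`, the field names and the identifier `demo_video` are hypothetical. The numbers of the first two samples are taken from rows 1 and 2 of the recovered-fingerprint table in this description, with the restored length standing in for the dictionary entry; the third sample is made up to show an invalid sample being dropped.

```python
def build_training_set(cipher_samples, plaintext_dict):
    """Label valid ciphertext samples with their plaintext ADU length."""
    rows = []
    for sample in cipher_samples:
        if not sample["b_accurate"]:
            continue  # first TLS block was outside the 400 to 800 byte range
        key = (sample["video_id"], sample["segment"])  # hypothetical lookup key
        if key not in plaintext_dict:
            continue  # no plaintext entry, so no label is available
        features = (
            sample["L_encrypted_ADU"],
            sample["N_TLS"],
            sample["L_HTTPhead_TLS"],
        )
        rows.append((features, plaintext_dict[key]))
    return rows


plaintext_dict = {("demo_video", 1): 78750, ("demo_video", 2): 83837}
cipher_samples = [
    {"video_id": "demo_video", "segment": 1, "L_encrypted_ADU": 79584,
     "N_TLS": 11, "L_HTTPhead_TLS": 515, "b_accurate": 1},
    {"video_id": "demo_video", "segment": 2, "L_encrypted_ADU": 84703,
     "N_TLS": 12, "L_HTTPhead_TLS": 518, "b_accurate": 1},
    # Made-up invalid sample: the first TLS block is far larger than 800 bytes.
    {"video_id": "demo_video", "segment": 3, "L_encrypted_ADU": 90000,
     "N_TLS": 5, "L_HTTPhead_TLS": 16413, "b_accurate": 0},
]
training_rows = build_training_set(cipher_samples, plaintext_dict)
print(training_rows)
```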
Further, in step (4), when ciphertext data needs to be identified, the ciphertext data is collected and stored as follows: the data acquisition equipment is started at the monitoring point; according to the video platforms to be monitored, the IP addresses of the corresponding servers are collected to form a list of IP addresses to be detected; and the data messages whose source or destination IP address falls within that list are captured and stored.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
(1) In the identification of encrypted application content, existing research focuses mainly on optimizing the algorithm that matches ciphertext length fingerprints against plaintext length fingerprints, but does not examine whether the input to the matching algorithm, that is, the data fingerprint extraction method, is reasonable and trustworthy. Because the ciphertext fingerprint length of the same plaintext depends on several factors in each actual transmission and is not fixed, existing methods that match ciphertext fingerprint lengths directly against plaintext fingerprint lengths must use a wide confidence interval, which gives the matching algorithm a high misjudgment rate. The invention provides a method for accurately restoring the plaintext fingerprint of data transmitted under TLS protocol encryption, so that the matching algorithm can match the accurately restored transmission fingerprint against the plaintext with a narrower confidence interval, reducing its misjudgment rate.
(2) For the most widely used TLS 1.2 protocol specification, the invention proposes using TLS fragment features as the main fitting features for fingerprint restoration. Three values are used as features: the data length of an application-layer data unit when encrypted and transmitted, the data length of the first TLS fragment that falls within the HTTP header length range, and the number of TLS fragments (not counting the fragment containing the HTTP header) that carry the encrypted data of a single ADU. The regression model is trained on these extracted features.
(3) The method is general: at present most applications that encrypt their transmissions use the TLS 1.2 protocol, so the method can be applied to the identification of most encrypted application content.
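The effect described in item (1) can be illustrated with a small tolerance-based lookup. This is a hedged sketch and not a matching algorithm from the source: the function `match`, the entry names and the tolerance values are hypothetical, while the lengths are taken from rows 1, 6 and 9 of the recovered-fingerprint table in this description.

```python
def match(length, library, tolerance):
    """Return the names of library entries within `tolerance` bytes of `length`."""
    return sorted(
        name for name, ref in library.items() if abs(ref - length) <= tolerance
    )


# Hypothetical fingerprint library of plaintext ADU lengths (bytes).
library = {"seg_a": 78750, "seg_b": 79062, "seg_c": 75102}

# Direct matching: the ciphertext length 79584 is 834 bytes longer than its
# plaintext, so a wide tolerance is needed and two entries qualify.
direct = match(79584, library, tolerance=900)

# Matching after restoration: the restored length 78750 matches exactly.
restored = match(78750, library, tolerance=0)

print(direct, restored)
```

With these numbers the direct lookup returns two candidates while the restored lookup returns one, which is the reduction in ambiguity that a narrower confidence interval provides.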
Drawings
FIG. 1 is a schematic diagram of a system configuration of the method of the present invention;
FIG. 2 shows how an Application Data Unit (ADU) is encapsulated by the HTTP protocol and the TLS protocol to form TCP packets;
FIG. 3 is a timing diagram of an HTTP request message from a client for one ADU and the HTTP response from the server returning the ADU, distributed across a plurality of response messages;
FIG. 4 illustrates the meaning of the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS of an ADU extracted from the ciphertext data;
FIG. 5 is a flow chart of extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS of an ADU from the ciphertext data.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The invention provides a method for accurately restoring a clear text fingerprint of TLS protocol encrypted transmission data, which specifically comprises the following steps:
(1) collecting plaintext data of the application through a proxy and extracting its plaintext features to form an Application Data Unit (ADU) plaintext dictionary;
(2) collecting the ciphertext data corresponding to the ADU plaintext dictionary, and extracting from it the ciphertext transmission fingerprint features of each ADU, L_encrypted_ADU, N_TLS and L_HTTPhead_TLS, wherein L_encrypted_ADU is the data length of an ADU after encrypted transmission; N_TLS is the number of TLS encryption blocks used to transmit the encrypted data of an ADU; and L_HTTPhead_TLS is the length of the TLS block that carries the HTTP protocol header information when the encrypted data of an ADU is transmitted;
(3) looking up the plaintext dictionary using the ciphertext transmission fingerprint features extracted in step (2), taking the corresponding plaintext ADU length fingerprint L_plaintext_ADU as the label of the ciphertext to form a training set, and obtaining through machine learning a regression model of the relation between the ciphertext transmission fingerprint features and the plaintext ADU length fingerprint;
(4) when ciphertext data needs to be identified, collecting and storing the ciphertext data;
(5) extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS from the ciphertext data collected in step (4), using the same method as in step (2);
(6) substituting the L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (5) into the regression model obtained in step (3) to restore the corresponding plaintext length fingerprint L_plaintext_ADU.
In one embodiment of the method of the invention, in step (1), the plaintext data of the application is collected through a proxy and its plaintext features are extracted to form the ADU plaintext dictionary as follows:
(1.1) connecting the mobile terminal to the network through a hotspot provided by a PC;
(1.2) running proxy software on the PC to obtain plaintext data;
(1.3) extracting the features of the application-layer ADU from the plaintext data.
Different applications call for different extraction methods. For example, for video transmitted with the DASH (Dynamic Adaptive Streaming over HTTP) mechanism, accurate plaintext information can be obtained from the MPD (Media Presentation Description) file found in the plaintext data captured during video transmission. The MPD file is the metadata file for the video segments in DASH mode and contains video segment information and the resource addresses of the video segments. When a video is transmitted in DASH mode, the MPD file for the corresponding resolution is transmitted at the start of playback and whenever the resolution is switched. By parsing the MPD file, the plaintext features of these video segments (video ADUs), including the data length L_plaintext_ADU of each segment, can be obtained and used to construct the ADU plaintext dictionary.
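As a rough illustration of the MPD parsing described above, the sketch below derives segment lengths from byte ranges. It assumes an MPD that lists segments as `SegmentURL` elements with a `mediaRange` attribute, which is one layout permitted by the DASH schema; whether a given platform's MPD exposes segment sizes this way is an assumption, and the sample document and the representation id `video_720p` are made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by DASH MPD documents.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def segment_lengths_from_mpd(mpd_text):
    """Map each Representation id to the byte lengths of its segments."""
    root = ET.fromstring(mpd_text)
    lengths_by_representation = {}
    for rep in root.iterfind(".//mpd:Representation", DASH_NS):
        lengths = []
        for seg in rep.iterfind(".//mpd:SegmentURL", DASH_NS):
            media_range = seg.get("mediaRange")
            if media_range is None:
                continue  # no byte range given, so no length can be derived
            first, last = (int(x) for x in media_range.split("-"))
            lengths.append(last - first + 1)  # byte ranges are inclusive
        lengths_by_representation[rep.get("id")] = lengths
    return lengths_by_representation


# Made-up MPD fragment for illustration only.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation id="video_720p">
        <SegmentList>
          <SegmentURL mediaRange="0-999"/>
          <SegmentURL mediaRange="1000-3499"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

plaintext_lengths = segment_lengths_from_mpd(SAMPLE_MPD)
print(plaintext_lengths)
```

Each derived length plays the role of L_plaintext_ADU for one video segment in the plaintext dictionary.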
In one embodiment of the method, the plaintext data of a video application is collected through a proxy and its plaintext features are extracted to form the ADU plaintext dictionary. Part of the ADU plaintext dictionary for the video numbered FBJOZakd2SezQ on the YouTube platform is as follows:
[Table: partial ADU plaintext dictionary for video FBJOZakd2SezQ]
In one embodiment of the method, in step (2), the ciphertext data corresponding to the ADU plaintext dictionary is collected and the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS of each ADU are extracted as follows:
(2.1) connecting the mobile terminal to the network through a hotspot provided by a PC and running the encrypted application; if the application is video on demand, a specific video is requested;
(2.2) collecting ciphertext data at the access point; in this application example, the ciphertext data can be collected on the PC;
(2.3) opening the ciphertext data file and counting the volume of data transmitted over TLS between every pair of IP addresses; in this application example the video stream is the IP flow with the largest data volume, so the IP address pair with the largest traffic is extracted;
(2.4) separating the traffic between this pair of IP addresses into its TCP streams according to the five-tuple rule (source IP, source port, protocol TCP, destination IP, destination port);
(2.5) extracting L_encrypted_ADU and N_TLS for the ADUs in each TCP stream:
Extraction of L_encrypted_ADU: in each TCP stream, for every HTTP request message from the client to the server, the server returns an HTTP response whose application-layer content is the HTTP response header followed by the ADU to be returned to the user. Because the message length in a local area network is limited, one response has to be carried by a plurality of data messages. The sum of the application-layer data lengths of all response messages returned for one ADU request is the ciphertext transmission fingerprint feature L_encrypted_ADU of that ADU.
Extraction of N_TLS: before transmission over TLS, the application layer combines the HTTP response header and the ADU plaintext and then encrypts and transmits them in blocks. Because the protocol specifies that each TLS data block cannot exceed 16 KB, the data of one ADU may be transmitted in a plurality of TLS data blocks, so the response messages belong to different TLS data blocks. Each TLS data block begins with description data that is not encrypted and gives the data length of the block, so the total number of TLS blocks contained in all encrypted messages of one HTTP response can be counted. During feature extraction the TLS block used to transmit the HTTP response header is not counted, so N_TLS is the total number of TLS blocks minus 1.
(2.6) each request corresponds to one ADU ciphertext, which comprises a plurality of TLS blocks; if the length of the first TLS block is between 400 bytes and 800 bytes, the ADU is a valid sample, and the length of the first TLS block is recorded as the L_HTTPhead_TLS of the ADU.
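Steps (2.5) and (2.6) can be sketched by walking the unencrypted TLS record headers of one reassembled response. This is a minimal sketch under several assumptions that the source does not spell out: the input is the server-to-client byte stream of a single HTTP response, it contains only complete records, and each block length is counted with its 5-byte record header included. The function names and the sample record sizes are hypothetical.

```python
import struct

TLS_APPLICATION_DATA = 23  # TLS record content type for application data
RECORD_HEADER_LEN = 5      # type (1 byte) + version (2 bytes) + length (2 bytes)


def extract_features(response_bytes, head_min=400, head_max=800):
    """Derive the three ciphertext features from one response's TLS records."""
    sizes = []
    offset = 0
    while offset + RECORD_HEADER_LEN <= len(response_bytes):
        content_type, _version, body_len = struct.unpack_from(
            "!BHH", response_bytes, offset
        )
        end = offset + RECORD_HEADER_LEN + body_len
        if end > len(response_bytes):
            break  # truncated record: stop rather than guess
        if content_type == TLS_APPLICATION_DATA:
            sizes.append(RECORD_HEADER_LEN + body_len)
        offset = end
    if not sizes:
        return None
    first = sizes[0]
    return {
        "L_encrypted_ADU": sum(sizes),        # all application-layer bytes
        "N_TLS": len(sizes) - 1,              # header block is not counted
        "L_HTTPhead_TLS": first,              # length of the first TLS block
        "b_accurate": head_min <= first <= head_max,
    }


def fake_record(body_len):
    """Build a dummy application-data record with a zero-filled body."""
    return struct.pack("!BHH", TLS_APPLICATION_DATA, 0x0303, body_len) + bytes(body_len)


# One small header-sized block followed by two large data blocks (made-up sizes).
stream = fake_record(510) + fake_record(16408) + fake_record(16408)
features = extract_features(stream)
print(features)
```

Only the record headers are read, so the sketch never needs key material; it relies on the fact stated above that the block description data is transmitted unencrypted.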
Processing one TCP stream in an encrypted video transmission example with the method of the invention yields ADUs, some of which are shown in the following table, where b_Accurate = 1 indicates a valid sample.
[Table: partial list of ADUs extracted from one TCP stream]
When the server responds, the HTTP header information is fetched directly from memory while the video data is read from the hard disk, so the two arrive at the buffer at different speeds. As a result the HTTP header is sent first as a separate TLS fragment that is significantly smaller than the other TLS fragments. Using this rule, the TLS fragment containing the HTTP header ciphertext can be identified in most cases and L_HTTPhead_TLS extracted.
In one example of the method of the invention, in step (3), the plaintext dictionary is looked up using the ciphertext transmission fingerprint features extracted in step (2), the corresponding plaintext ADU length fingerprint L_plaintext_ADU is taken as the label of the ciphertext to form a training set, and a regression model of the relation between the ciphertext transmission fingerprint features and the plaintext ADU length fingerprint is obtained through machine learning. The specific method is as follows:
(3.1) looking up every valid sample in the plaintext dictionary according to the ciphertext ADU features from step (2) to obtain the plaintext length fingerprint L_plaintext_ADU;
In one embodiment of the invention, the encrypted video data is collected according to the video labels already present in the plaintext dictionary, so the plaintext length fingerprint L_plaintext_ADU is obtained by looking up the plaintext video ADU that corresponds to each collected encrypted video ADU.
(3.2) labeling the three feature values L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (2) with the length of the corresponding plaintext ADU in the plaintext dictionary to form a training set;
In an encrypted video transmission example, part of the training set obtained with the method of the invention is shown in the following table:
[Table: partial training set]
(3.3) performing fitting training on the training set to obtain a regression model of the relation between the ciphertext transmission ADU length fingerprint and the plaintext ADU length fingerprint.
In an encrypted video transmission example, 12739 samples meeting the requirements of the fingerprint feature extraction method were used, split into a training set and a test set at a ratio of 7:3. The trained regression model is as follows:
[Formula: trained regression model]
Using this formula, the restored fingerprint Restored_L_ADU was calculated for the 12739 qualifying transmission fingerprints in the data set and compared with the plaintext fingerprints: 12738 results match the plaintext data exactly, and only one result differs from the plaintext, by 232 bytes, giving an accuracy of 99.99%.
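The fitting step can be sketched as an ordinary least-squares regression. The source does not name the regression algorithm, so least squares is an assumption here, and the data are only the ten rows of the recovered-fingerprint table later in this description, with Restored_L_ADU standing in for the plaintext label. The split of seven training rows and three test rows mirrors the 7:3 ratio; exact rational arithmetic is used so the result does not depend on floating-point rounding.

```python
from fractions import Fraction

# (L_encrypted_ADU, N_TLS, L_HTTPhead_TLS, Restored_L_ADU) from the table.
ROWS = [
    (79584, 11, 515, 78750),
    (84703, 12, 518, 83837),
    (143579, 21, 522, 142448),
    (110504, 14, 522, 109576),
    (83243, 12, 522, 82373),
    (79903, 11, 522, 79062),
    (141863, 19, 523, 140789),
    (118294, 16, 523, 117307),
    (75944, 11, 523, 75102),
    (129033, 18, 525, 127986),
]


def fit_least_squares(rows):
    """Ordinary least squares with intercept, solved exactly via normal equations."""
    xs = [[Fraction(a), Fraction(b), Fraction(c), Fraction(1)] for a, b, c, _ in rows]
    ys = [Fraction(y) for *_, y in rows]
    n = 4
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(x[i] * x[j] for x in xs) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def predict(weights, l_enc, n_tls, l_head):
    w_enc, w_n, w_head, bias = weights
    return w_enc * l_enc + w_n * n_tls + w_head * l_head + bias


train, test = ROWS[:7], ROWS[7:]
weights = fit_least_squares(train)
errors = [predict(weights, a, b, c) - y for a, b, c, y in test]
print([str(w) for w in weights], [int(e) for e in errors])
```

On these ten rows the fitted weights come out as 1 for L_encrypted_ADU, -29 for N_TLS, -1 for L_HTTPhead_TLS and 0 for the intercept, and the three held-out rows are reproduced with zero error. This is an observation about the ten published rows, not a statement of the formula shown in the patent's figure.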
In one example of the method of the invention, in step (4), when ciphertext data needs to be identified, the ciphertext data is collected and stored as follows:
the data acquisition equipment is started at the monitoring point; according to the video platforms to be monitored, the IP addresses of the corresponding servers are collected to form a list of IP addresses to be detected; and the data messages whose source or destination IP address falls within that list are captured and stored.
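The address filter of step (4) can be sketched with Python's standard `ipaddress` module. The watch-list entries, the packet tuples and the helper names are hypothetical, and a real capture tool would apply the filter while capturing rather than on an in-memory list.

```python
import ipaddress


def build_watch_list(entries):
    """Parse single addresses or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_watched(ip, watch_list):
    """True if `ip` falls inside any watched address or range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in watch_list)


def filter_packets(packets, watch_list):
    """Keep packets whose source or destination address is being monitored."""
    return [
        p for p in packets
        if is_watched(p[0], watch_list) or is_watched(p[1], watch_list)
    ]


# Made-up watch list and packets given as (source IP, destination IP, length).
watch_list = build_watch_list(["203.0.113.0/24", "198.51.100.9"])
packets = [
    ("10.0.0.2", "203.0.113.7", 300),
    ("10.0.0.2", "192.0.2.1", 500),
    ("198.51.100.9", "10.0.0.2", 1200),
]
kept = filter_packets(packets, watch_list)
print(kept)
```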
In one example of the method of the invention, in step (5), the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS are extracted from the ciphertext data collected in step (4) using the same method as in step (2).
In an example of a captured encrypted video transmission, the ciphertext transmission fingerprint features shown in the following table were extracted:
[Table: extracted ciphertext transmission fingerprint features]
In one example of the method of the invention, in step (6), the L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (5) are substituted into the regression model obtained in step (3) to restore the corresponding plaintext length fingerprint L_plaintext_ADU.
In one example of a captured encrypted transmission video, the corresponding plaintext length fingerprints recovered are shown in the following table:
ADU number  L_encrypted_ADU  b_Accurate  L_HTTPhead_TLS  N_TLS  Restored_L_ADU
1           79584            1           515             11     78750
2           84703            1           518             12     83837
3           143579           1           522             21     142448
4           110504           1           522             14     109576
5           83243            1           522             12     82373
6           79903            1           522             11     79062
7           141863           1           523             19     140789
8           118294           1           523             16     117307
9           75944            1           523             11     75102
10          129033           1           525             18     127986
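The ten rows above are consistent with a fixed overhead of 29 bytes per counted TLS block: in every row, Restored_L_ADU equals L_encrypted_ADU minus L_HTTPhead_TLS minus 29 times N_TLS. The sketch below checks this. The relation is inferred from the table only and is not quoted from the patent's formula figure. The reading of 29 bytes as a 5-byte record header plus an 8-byte explicit nonce plus a 16-byte authentication tag matches the per-record expansion of AES-GCM cipher suites in TLS 1.2, but the source does not state which cipher suite was used, so that interpretation is an assumption.

```python
RECORD_HEADER = 5    # TLS record header bytes
EXPLICIT_NONCE = 8   # per-record explicit nonce in TLS 1.2 AES-GCM (assumed suite)
AUTH_TAG = 16        # AES-GCM authentication tag bytes (assumed suite)
PER_RECORD_OVERHEAD = RECORD_HEADER + EXPLICIT_NONCE + AUTH_TAG


def restore_plaintext_length(l_encrypted_adu, n_tls, l_httphead_tls):
    """Subtract the header block and the per-block overhead from the total."""
    return l_encrypted_adu - l_httphead_tls - PER_RECORD_OVERHEAD * n_tls


# (L_encrypted_ADU, N_TLS, L_HTTPhead_TLS, Restored_L_ADU) from the table.
TABLE = [
    (79584, 11, 515, 78750),
    (84703, 12, 518, 83837),
    (143579, 21, 522, 142448),
    (110504, 14, 522, 109576),
    (83243, 12, 522, 82373),
    (79903, 11, 522, 79062),
    (141863, 19, 523, 140789),
    (118294, 16, 523, 117307),
    (75944, 11, 523, 75102),
    (129033, 18, 525, 127986),
]

mismatches = [
    row for row in TABLE
    if restore_plaintext_length(row[0], row[1], row[2]) != row[3]
]
print(PER_RECORD_OVERHEAD, len(mismatches))
```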
The above examples are only preferred embodiments of the invention. It should be noted that those skilled in the art can make various modifications and equivalent substitutions without departing from the spirit of the invention, and all such modifications and equivalents fall within the scope of the invention as defined in the claims.

Claims (5)

1. A method for accurately restoring the clear text length fingerprint of TLS protocol encrypted transmission data is characterized by comprising the following steps:
(1) collecting plaintext data of the application and extracting its plaintext features to form an Application Data Unit (ADU) plaintext dictionary;
(2) collecting the ciphertext data corresponding to the ADU plaintext dictionary, and extracting from it the ciphertext transmission fingerprint features of each ADU, L_encrypted_ADU, N_TLS and L_HTTPhead_TLS, wherein L_encrypted_ADU is the data length of an ADU after encrypted transmission; N_TLS is the number of TLS encryption blocks used to transmit the encrypted data of an ADU; and L_HTTPhead_TLS is the length of the TLS block that carries the HTTP protocol header information when the encrypted data of an ADU is transmitted;
(3) looking up the plaintext dictionary using the ciphertext transmission fingerprint features extracted in step (2), taking the corresponding plaintext ADU length fingerprint L_plaintext_ADU as the label of the ciphertext to form a training set, and obtaining through machine learning a regression model of the relation between the ciphertext transmission fingerprint features and the plaintext ADU length fingerprint;
(4) when ciphertext data needs to be identified, collecting and storing the ciphertext data;
(5) extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS from the ciphertext data collected in step (4), using the same method as in step (2);
(6) substituting the L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (5) into the regression model obtained in step (3) to restore the corresponding plaintext length fingerprint L_plaintext_ADU.
2. The method for accurately restoring the plaintext length fingerprint of TLS protocol encrypted transmission data according to claim 1, characterized in that, in step (1), collecting the plaintext data of the application and extracting its plaintext features to form the Application Data Unit (ADU) plaintext dictionary comprises the following steps:
(1.1) connecting the mobile terminal to the network through a hotspot provided by a PC;
(1.2) running proxy software on the PC to capture the plaintext data;
(1.3) extracting the features of the application-layer ADUs from the plaintext data.
3. The method for accurately restoring the plaintext length fingerprint of TLS protocol encrypted transmission data according to claim 1 or 2, characterized in that, in step (2), collecting the ciphertext data corresponding to the ADU plaintext dictionary and extracting the ciphertext transmission fingerprint features L_encrypted_ADU, N_TLS and L_HTTPhead_TLS of each ADU from the ciphertext data comprises the following steps:
(2.1) connecting the mobile terminal to the network through a hotspot provided by a PC (personal computer) and running the encrypted application;
(2.2) collecting the ciphertext data at the access point;
(2.3) opening the ciphertext data file, counting the amount of data transmitted over the TLS protocol between every pair of IP addresses, and extracting the IP address pair with the largest traffic, the video stream being the IP flow with the largest data volume;
(2.4) for the traffic between that IP address pair, separating out all TCP flows between the two addresses according to the five-tuple;
(2.5) extracting L_encrypted_ADU and N_TLS separately for the ADUs in each TCP flow:
Extraction of L_encrypted_ADU: the sum of the application-layer data lengths of all response messages returned for one ADU request is the ciphertext transmission fingerprint feature L_encrypted_ADU of that ADU;
Extraction of N_TLS: the total number of TLS blocks contained in all encrypted messages of one HTTP response is counted; the TLS block used to transmit the HTTP response header is not counted, so N_TLS is the total number of TLS blocks minus 1;
(2.6) each request corresponds to one ADU ciphertext, which comprises a plurality of TLS blocks; if the length of the first TLS block is between 400 bytes and 800 bytes, the ADU is regarded as a valid sample, and the length of the first TLS block is recorded as L_HTTPhead_TLS of the ADU.
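Steps (2.3), (2.5) and (2.6) can be illustrated with a minimal Python sketch. It assumes the capture has already been parsed into plain tuples and lists of lengths; the function names `busiest_ip_pair` and `adu_features`, the input shapes, and the inclusive treatment of the 400 to 800 byte bounds are illustrative assumptions and are not specified by the claims.

```python
from collections import defaultdict


def busiest_ip_pair(records):
    """Step (2.3): return the IP pair carrying the most TLS bytes.

    `records` is a hypothetical iterable of (src_ip, dst_ip, tls_bytes)
    tuples; both directions of a conversation count toward the same pair.
    """
    volume = defaultdict(int)
    for src, dst, tls_bytes in records:
        pair = tuple(sorted((src, dst)))  # direction-independent key
        volume[pair] += tls_bytes
    return max(volume, key=volume.get)


def adu_features(tls_block_lengths, response_payload_lengths,
                 head_min=400, head_max=800):
    """Steps (2.5)-(2.6): features of one ADU, or None for an invalid sample.

    tls_block_lengths: lengths of the TLS blocks of one HTTP response, in
        order; the first block is taken to carry the HTTP response header.
    response_payload_lengths: application-layer data length of every
        response message returned for the ADU request.
    The bounds are treated as inclusive, which is an assumption.
    """
    if not tls_block_lengths:
        return None
    head_len = tls_block_lengths[0]
    if not (head_min <= head_len <= head_max):
        return None  # first block outside 400-800 bytes: not a valid sample
    l_encrypted_adu = sum(response_payload_lengths)
    n_tls = len(tls_block_lengths) - 1  # header block is not counted
    return l_encrypted_adu, n_tls, head_len
```

Returning `None` for samples whose first TLS block falls outside the range keeps the validity filter of step (2.6) in one place, so later steps only ever see valid samples.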
4. The method for accurately restoring the plaintext length fingerprint of TLS protocol encrypted transmission data according to claim 3, characterized in that step (3) is performed as follows:
(3.1) for all valid samples, searching the plaintext dictionary according to the ciphertext ADU features obtained in step (2) to obtain the plaintext length fingerprint L_plaintext_ADU;
(3.2) then labelling the three feature values L_encrypted_ADU, N_TLS and L_HTTPhead_TLS extracted in step (2) with the L_plaintext_ADU value of the corresponding plaintext ADU in the plaintext dictionary to form a training set;
(3.3) performing fitting training on the training set to obtain a regression model of the relation between the transmitted ciphertext ADU length fingerprint and the plaintext ADU length fingerprint.
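Steps (3.1) to (3.3) can be sketched as follows. The claims do not name a particular regression algorithm, so ordinary least squares with an intercept is used here purely as an assumption. The shared key `adu_id` used to look a ciphertext sample up in the plaintext dictionary, and all function names, are likewise hypothetical.

```python
def build_training_set(samples, plaintext_dict):
    """Steps (3.1)-(3.2): label each valid ciphertext sample.

    samples: list of (adu_id, (L_encrypted_ADU, N_TLS, L_HTTPhead_TLS)).
    plaintext_dict: maps adu_id -> L_plaintext_ADU (hypothetical layout).
    Samples without a dictionary entry are skipped.
    """
    X, y = [], []
    for adu_id, feats in samples:
        if adu_id in plaintext_dict:
            X.append([1.0] + [float(f) for f in feats])  # 1.0 = intercept term
            y.append(float(plaintext_dict[adu_id]))
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear(X, y):
    """Step (3.3): least-squares fit via the normal equations X^T X w = X^T y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, feats):
    """Apply the fitted model to (L_encrypted_ADU, N_TLS, L_HTTPhead_TLS)."""
    return w[0] + sum(wi * f for wi, f in zip(w[1:], feats))
```

A linear form is a natural first guess because each TLS block adds a roughly fixed overhead to the plaintext it carries, but any regressor that accepts the same three features could be substituted. `predict` corresponds to step (6) of claim 1.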
5. The method for accurately restoring the plaintext length fingerprint of TLS protocol encrypted transmission data according to claim 1, 2 or 4, characterized in that, in step (4), when ciphertext data needs to be identified, the ciphertext data is collected and stored as follows: data acquisition equipment is started at a monitoring point; according to the video platforms to be monitored, the IP addresses of the corresponding servers are collected to form a list of IP addresses to be detected; and the data packets whose source IP address or destination IP address falls within the range of IP addresses to be detected are captured and stored.
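The packet selection rule of claim 5 can be sketched with Python's standard `ipaddress` module. The function names `build_watchlist` and `should_capture` are hypothetical, and accepting CIDR ranges in addition to single addresses is an assumption about how "the range of IP addresses to be detected" might be stored.

```python
import ipaddress


def build_watchlist(entries):
    """Parse single addresses or CIDR ranges into network objects.

    A bare address such as "198.51.100.7" becomes a one-host network.
    """
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def should_capture(src_ip, dst_ip, watchlist):
    """Keep a packet if its source or destination IP is in the watchlist."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return any(src in net or dst in net for net in watchlist)
```

In a real deployment the same rule would normally be pushed down into the capture filter of the acquisition equipment, so that unrelated traffic is never stored.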
CN201910782693.7A 2019-08-23 2019-08-23 Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data Active CN110602059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910782693.7A CN110602059B (en) 2019-08-23 2019-08-23 Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data

Publications (2)

Publication Number Publication Date
CN110602059A (en) 2019-12-20
CN110602059B (en) 2021-09-07

Family

ID=68855465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910782693.7A Active CN110602059B (en) 2019-08-23 2019-08-23 Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data

Country Status (1)

Country Link
CN (1) CN110602059B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112187774B (en) * 2020-09-23 2023-03-24 东南大学 Encrypted data length reduction method based on HTTP/2 transmission characteristics
CN114915566A (en) * 2021-01-28 2022-08-16 腾讯科技(深圳)有限公司 Application identification method, device, equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104009836A (en) * 2014-05-26 2014-08-27 南京泰锐斯通信科技有限公司 Encrypted data detection method and system
US20150229621A1 (en) * 2014-02-13 2015-08-13 Safe Frontier Llc One-time-pad data encryption in communication channels
CN109257358A (en) * 2018-09-28 2019-01-22 成都信息工程大学 A kind of In-vehicle networking intrusion detection method and system based on clock skew

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
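"Method for Identifying QoE Parameters of Encrypted YouTube Video Streams in Mobile Networks" (《移动网络加密YouTube视频流QoE参数识别方法》); 潘吴斌, 程光, et al.; Chinese Journal of Computers (《计算机学报》); 2017-06-01; full text *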

Also Published As

Publication number Publication date
CN110602059A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
US20220368703A1 (en) Method and device for detecting security based on machine learning in combination with rule matching
CN107483488B (en) Malicious Http detection method and system
US20110125748A1 (en) Method and Apparatus for Real Time Identification and Recording of Artifacts
CN106936667B (en) Host real-time identification method based on application program flow distributed analysis
US20120239652A1 (en) Hardware Accelerated Application-Based Pattern Matching for Real Time Classification and Recording of Network Traffic
WO2009093226A3 (en) A method and apparatus for fingerprinting systems and operating systems in a network
CN109275045B (en) DFI-based mobile terminal encrypted video advertisement traffic identification method
CN101639880A (en) File test method and device
CN110868409A (en) Passive operating system identification method and system based on TCP/IP protocol stack fingerprint
CN110602059B (en) Method for accurately restoring clear text length fingerprint of TLS protocol encrypted transmission data
WO2020199603A1 (en) Server vulnerability detection method and apparatus, device, and storage medium
CN106330584A (en) Identification method and identification device of business flow
CN105138709A (en) Remote evidence taking system based on physical memory analysis
CN109831448A (en) For the detection method of particular encryption web page access behavior
CN111147394A (en) Multi-stage classification detection method for remote desktop protocol traffic behavior
Sammour et al. DNS tunneling: A review on features
US20210185059A1 (en) Label guided unsupervised learning based network-level application signature generation
US8910281B1 (en) Identifying malware sources using phishing kit templates
CN102984242A (en) Automatic identification method and device of application protocols
CN112187774B (en) Encrypted data length reduction method based on HTTP/2 transmission characteristics
CN106982147B (en) Communication monitoring method and device for Web communication application
He et al. Mobile app identification for encrypted network flows by traffic correlation
Wu et al. SFIM: Identify user behavior based on stable features
CN112350986B (en) Shaping method and system for audio and video network transmission fragmentation
CN105703930A (en) Session log processing method and session log processing device based on application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant