IL270392B2 - System and method for estimating sizes of files transferred over encrypted connections
System and method for estimating sizes of files transferred over encrypted connections
- Publication number
- IL270392B2
- Authority
- IL
- Israel
- Prior art keywords
- file
- sequence
- packets
- size
- processor
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioethics (AREA)
- Communication Control (AREA)
- Information Transfer Between Computers (AREA)
Description
1011-1141.1
SYSTEM AND METHOD FOR ESTIMATING SIZES OF FILES TRANSFERRED OVER ENCRYPTED CONNECTIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to another application entitled "System and method for identifying exchanges of files" (attorney ref. no. 1011-1141), filed on even date herewith.
FIELD OF THE DISCLOSURE
The present disclosure relates to the monitoring of communication traffic generated by users of computer applications.
BACKGROUND OF THE DISCLOSURE
Many computer applications use encrypted protocols, such that the communication traffic exchanged by these applications is encrypted. Examples of such applications include WhatsApp, Skype, Line, and Dropbox. Examples of encrypted protocols include the Secure Sockets Layer (SSL) protocol and the Transport Layer Security (TLS) protocol.
US Patent Application Publication 2016/0285978, whose disclosure is incorporated herein by reference, describes a monitoring system that monitors traffic flows exchanged over a communication network. The system characterizes the flows in terms of their temporal traffic features, and uses this characterization to identify communication devices that participate in the same communication session. By identifying the communication devices that serve as endpoints in the same session, the system establishes correlations between the users of these communication devices. The monitoring system characterizes the flows using traffic features such as flow start time, flow end time, inter-burst time and burst size, and/or statistical properties of such features. The system typically generates compressed-form representations ("signatures") for the traffic flows based on the temporal traffic features, and finds matching flows by finding similarities between signatures.
SUMMARY OF THE DISCLOSURE
There is provided, in accordance with some embodiments of the present disclosure, a system including a data storage and a processor. The processor is configured to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The processor is further configured to group packets belonging to the connection into at least one sequence, in response to the positing. The processor is further configured to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in the data storage.
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
In some embodiments, the identifier is an Internet Protocol (IP) address.
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
In some embodiments, the indication includes a specification of a protocol used by a class of applications used for file transfers.
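The two positing criteria above (a known file-server identifier, or a protocol indication) may be sketched as follows. This is a minimal illustration only: the `PacketMeta` type, the address set, and the protocol label are hypothetical and do not come from the disclosure; the addresses are taken from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference data, assumed for illustration only.
KNOWN_FILE_SERVER_IPS = {"203.0.113.10", "198.51.100.7"}
FILE_TRANSFER_PROTOCOL_HINTS = {"example-file-proto"}


@dataclass
class PacketMeta:
    """Unencrypted metadata of one packet (hypothetical structure)."""
    src_ip: str
    dst_ip: str
    protocol_hint: Optional[str] = None  # protocol indication, if one was observed


def posit_file_transfer(packets: list[PacketMeta]) -> bool:
    """Return True if any packet's metadata suggests a file transfer."""
    for p in packets:
        # Criterion 1: an endpoint is a server known to service file exchanges.
        if p.src_ip in KNOWN_FILE_SERVER_IPS or p.dst_ip in KNOWN_FILE_SERVER_IPS:
            return True
        # Criterion 2: the packet indicates a protocol used for file transfers.
        if p.protocol_hint in FILE_TRANSFER_PROTOCOL_HINTS:
            return True
    return False
```

Only packet metadata is inspected; no payload is decrypted.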
In some embodiments, the processor is configured to group the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
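Demarcation by time gap may be sketched as below, assuming the packets of one direction are available as `(timestamp_seconds, size_bytes)` tuples sorted by time. The function name and the one-second default threshold are assumptions made for illustration; the disclosure does not specify a value.

```python
def split_by_time_gap(packets, max_gap_s=1.0):
    """Split same-direction packets into sequences wherever the gap between
    consecutive packets exceeds max_gap_s (an assumed threshold)."""
    sequences = []
    current = []
    for ts, size in packets:
        # A large gap since the previous packet closes the current sequence.
        if current and ts - current[-1][0] > max_gap_s:
            sequences.append(current)
            current = []
        current.append((ts, size))
    if current:
        sequences.append(current)
    return sequences
```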
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
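One possible reading of throughput-based demarcation is sketched below: bytes are accumulated per fixed time window, and the sequence is taken to end before the first window whose byte count falls below a fraction of the peak seen so far. The window length, the drop ratio, and the decision to cut before the low-throughput window are all assumptions, not details from the disclosure.

```python
def end_of_sequence_by_throughput(packets, window_s=1.0, drop_ratio=0.2):
    """Return the number of leading packets judged to belong to the sequence.

    packets: list of (timestamp_seconds, size_bytes), sorted by time.
    """
    if not packets:
        return 0
    start = packets[0][0]
    # Accumulate bytes per fixed-length window.
    buckets = {}
    for ts, size in packets:
        w = int((ts - start) // window_s)
        buckets[w] = buckets.get(w, 0) + size
    peak = 0
    for w in range(max(buckets) + 1):
        rate = buckets.get(w, 0)
        # A sharp drop relative to the peak marks the end of the sequence.
        if peak and rate < drop_ratio * peak:
            cutoff = start + w * window_s
            return sum(1 for ts, _ in packets if ts < cutoff)
        peak = max(peak, rate)
    return len(packets)
```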
In some embodiments, the processor is configured to compute the estimated size of the file by: computing a sum of the respective sizes, and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
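The computation just described amounts to a sum followed by a division. A minimal sketch follows; the function name and the example divisor of 1.05 are illustrative assumptions.

```python
def estimate_file_size(packet_sizes, inflation_divisor):
    """Estimate the file size as (sum of packet sizes) / inflation divisor."""
    if inflation_divisor <= 1:
        raise ValueError("inflation divisor must be greater than one")
    return sum(packet_sizes) / inflation_divisor


# Example with assumed numbers: 4200 observed bytes and a divisor of 1.05
# give an estimate of roughly 4000 bytes.
estimate = estimate_file_size([1500, 1500, 1200], 1.05)
```

The divisor exceeds one because the observed packets carry more bytes than the file itself, for example because of encryption and protocol overhead.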
In some embodiments, the predefined packet-size inflation divisor is expressed as a probability distribution, such that the processor is configured to compute the estimated size as another probability distribution.
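When the divisor is a distribution rather than a single value, the estimate becomes a distribution as well. The sketch below assumes a discrete representation, a mapping from candidate divisor to probability; the disclosure does not prescribe any particular representation, and the rounding to whole bytes is likewise an assumption.

```python
def estimate_size_distribution(packet_sizes, divisor_dist):
    """Map a discrete divisor distribution {divisor: probability} to a
    discrete size distribution {estimated_size_bytes: probability}."""
    total = sum(packet_sizes)
    size_dist = {}
    for divisor, prob in divisor_dist.items():
        size = round(total / divisor)  # whole bytes
        # Different divisors may round to the same size; merge their mass.
        size_dist[size] = size_dist.get(size, 0.0) + prob
    return size_dist
```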
In some embodiments, the sequence was downloaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify another sequence of other packets downloaded by the client, compute another sum of respective sizes of the other packets, posit that the other sequence carried another file having a known size, and in response to the positing, compute the packet-size inflation divisor by dividing the other sum by the known size.
In some embodiments, the processor is further configured to posit that the other file was downloaded by multiple other clients, and the processor is configured to posit that the other sequence carried the other file in response to a number of the other clients.
In some embodiments, the processor is further configured to communicate the other file to the client, such as to cause the other sequence to be downloaded by the client.
In some embodiments, the sequence was exchanged between a server and a client, and the processor is further configured to, prior to computing the estimated size of the file: infer one or more parameters from one or more of the connections belonging to the client, and select the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
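Calibrating the divisor from a transfer of a file of known size may be sketched as follows. The function name and the example figures are assumptions made for illustration.

```python
def calibrate_divisor(observed_packet_sizes, known_file_size):
    """Compute the inflation divisor as (sum of observed packet sizes)
    divided by the known size of the file those packets are posited to carry."""
    if known_file_size <= 0:
        raise ValueError("known file size must be positive")
    return sum(observed_packet_sizes) / known_file_size


# Example with assumed numbers: 2200 observed bytes for a 2000-byte file
# yield a divisor of about 1.1.
divisor = calibrate_divisor([1100, 1100], 2000)
```

The resulting divisor can then be applied to other sequences downloaded by the same client.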
In some embodiments, at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
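Selecting a divisor from a predefined association may be sketched as a table lookup keyed on inferred parameters. All keys and numeric values below are hypothetical placeholders, not measured or disclosed figures, and the fallback default is an added assumption.

```python
# Hypothetical association between (operating system, encryption protocol)
# and a packet-size inflation divisor. Values are placeholders.
PREDEFINED_DIVISORS = {
    ("android", "tls1.2"): 1.08,
    ("ios", "tls1.3"): 1.06,
}
DEFAULT_DIVISOR = 1.07  # assumed fallback when no association is defined


def select_divisor(os_name, encryption_protocol):
    """Look up the divisor associated with the inferred client parameters."""
    key = (os_name.lower(), encryption_protocol.lower())
    return PREDEFINED_DIVISORS.get(key, DEFAULT_DIVISOR)
```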
In some embodiments, the sequence was uploaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify, with respective levels of confidence, instances in which respective other files were communicated from the client to respective other clients, and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, compute the packet-size inflation divisor.
There is further provided, in accordance with some embodiments of the present disclosure, a method including, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, positing that at least one file was transferred over one connection of the connections. The method further includes, in response to the positing, grouping packets belonging to the connection into at least one sequence. The method further includes computing an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and storing the estimated size in a database.
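The summary does not state how the upload divisor is derived from the identified instances. One conceivable approach, offered purely as an assumption, is a confidence-weighted average: for each instance, the file size is estimated from the download side using the mean of the download-divisor distribution, and the upload byte count is divided by that estimate.

```python
def compute_upload_divisor(instances, download_divisor_mean):
    """Confidence-weighted estimate of the upload inflation divisor.

    instances: list of (upload_bytes, download_bytes, confidence), one per
    instance in which an upload is believed to match a later download.
    download_divisor_mean: mean of the predefined download-divisor
    distribution (using only the mean is a simplifying assumption).
    """
    weighted_sum = 0.0
    total_confidence = 0.0
    for upload_bytes, download_bytes, confidence in instances:
        # Estimate the file size from the download side.
        estimated_size = download_bytes / download_divisor_mean
        weighted_sum += confidence * (upload_bytes / estimated_size)
        total_confidence += confidence
    if total_confidence == 0:
        raise ValueError("at least one instance with non-zero confidence is required")
    return weighted_sum / total_confidence
```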
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The instructions further cause the processor to group packets belonging to the connection into at least one sequence, in response to the positing. The instructions further cause the processor to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in a database.
There is further provided, in accordance with some embodiments of the present disclosure, a system including a peripheral device and a processor. The processor is configured to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The processor is further configured to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file. The processor is further configured to generate an output to the peripheral device in response to the positing.
In some embodiments, the peripheral device is selected from the group of devices consisting of: a display, and a data storage.
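A measure of similarity between two estimated sizes can take many forms. The sketch below uses the ratio of the smaller estimate to the larger, with an assumed threshold of 0.99; neither the measure nor the threshold is specified in the disclosure.

```python
def size_similarity(size_a, size_b):
    """Return a similarity in [0, 1]: 1.0 for identical sizes, smaller as
    the sizes diverge. Non-positive sizes are treated as dissimilar."""
    if size_a <= 0 or size_b <= 0:
        return 0.0
    return min(size_a, size_b) / max(size_a, size_b)


def posit_same_file(size_a, size_b, threshold=0.99):
    """Posit that two transfers carried the same file if their estimated
    sizes are sufficiently similar (threshold is an assumption)."""
    return size_similarity(size_a, size_b) >= threshold
```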
In some embodiments, the first estimated size and the second estimated size are expressed as respective probability distributions.
In some embodiments, the first encrypted file content was uploaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is further configured to compute a difference between the second time and the first time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the output indicates that the first user communicated the file to the second user.
In some embodiments, the processor is configured to generate the output by increasing a relatedness score between the first user and the second user.
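The upload-then-download case and the relatedness score may be sketched together as follows. The record layout (dictionaries with `user`, `time`, and `size` keys), the one-hour delay threshold, the similarity threshold, and the unit increment are all assumptions for illustration.

```python
from collections import defaultdict

# (uploader, downloader) -> accumulated relatedness score
relatedness = defaultdict(float)


def record_possible_exchange(upload, download, max_delay_s=3600.0,
                             min_similarity=0.99, increment=1.0):
    """Increase the relatedness score if the download follows the upload
    within max_delay_s and the two estimated sizes are similar."""
    delay = download["time"] - upload["time"]
    # The download must be subsequent to the upload and within the threshold.
    if not (0 < delay < max_delay_s):
        return False
    small, large = sorted((upload["size"], download["size"]))
    if small <= 0 or small / large < min_similarity:
        return False
    relatedness[(upload["user"], download["user"])] += increment
    return True
```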
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is configured to generate the output in response to a metadata link having been communicated by the first user between the first time and the second time, and the output indicates that the metadata link pointed to the file and was communicated to the second user.
In some embodiments, the output indicates that a first user communicated the file to a second user, and the processor is configured to generate the output in response to a relatedness score between the first user and the second user.
In some embodiments, the processor is further configured to identify a frequency with which files having the first estimated size are communicated over the network, and the processor is configured to posit that the first encrypted file content and the second encrypted file content represent the same file with a likelihood that decreases with the frequency.
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time, the processor is further configured to compute a difference between the first time and the second time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the processor is further configured to: receive a query specifying a second-file-content transfer of the second encrypted file content, and identify a first-file-content transfer of the first encrypted file content in response to the query, the processor is configured to compute the measure of similarity in response to identifying the first-file-content transfer, and the output includes parameters of the first-file-content transfer.
In some embodiments, the second-file-content transfer was performed using a class of applications, and the processor is configured to identify the first-file-content transfer of the first encrypted file content by: retrieving, from a database, multiple other-file-content transfers of other encrypted file content, which were performed using the class of applications, and identifying the first-file-content transfer from among the other-file-content transfers.
In some embodiments, the processor is further configured to: identify multiple other-file-content transfers of other encrypted file content in response to the query, and posit that the file was transferred in each of the other-file-content transfers, and the processor is configured to generate the output by outputting a timeline of the other-file-content transfers and the first-file-content transfer.
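The dependence on frequency may be illustrated with a simple weighting in which a size match counts for less when that size is common on the network. The linear form below is an assumption chosen only because it decreases with frequency; the disclosure does not give a formula.

```python
def same_file_likelihood(similarity, size_frequency):
    """Scale a size similarity by how rare the estimated size is.

    size_frequency: assumed to be the fraction (0..1) of observed transfers
    whose estimated size is approximately the first estimated size.
    """
    if not 0.0 <= size_frequency <= 1.0:
        raise ValueError("size_frequency must lie in [0, 1]")
    return similarity * (1.0 - size_frequency)
```

Under this weighting, two transfers of an unusual size are more readily posited to be the same file than two transfers of a size that many unrelated files share.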
In some embodiments, at least some of the other-file-content transfers were performed using different respective applications.
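Query handling with a timeline output may be sketched as below. Stored transfers are assumed to be dictionaries with `user`, `app`, `app_class`, `time`, and `size` keys, and an in-memory list stands in for the database; both are hypothetical stand-ins rather than the disclosed data model.

```python
def handle_query(query, stored_transfers, min_similarity=0.99):
    """Return transfers posited to carry the same file as the queried
    transfer, ordered by time so that they form a timeline."""

    def similar(a, b):
        small, large = sorted((a, b))
        return small > 0 and small / large >= min_similarity

    matches = [
        t for t in stored_transfers
        # Restrict to the same class of applications; individual
        # applications within the class may differ.
        if t["app_class"] == query["app_class"]
        and similar(t["size"], query["size"])
    ]
    return sorted(matches, key=lambda t: t["time"])
```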
There is further provided, in accordance with some embodiments of the present disclosure, a method including computing a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The method further includes, based on the measure of similarity, positing that the first encrypted file content and the second encrypted file content represent the same file, and in response to the positing, generating an output.
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The instructions further cause the processor to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file, and to generate an output in response to the positing.
The present disclosure will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic illustration of a system for monitoring communication exchanged over a network, in accordance with some embodiments of the present disclosure;
Fig. 2 is a schematic illustration of a series of file exchanges that may be identified in accordance with some embodiments of the present disclosure;
Fig. 3 is a schematic illustration of a method for identifying file transfers, in accordance with some embodiments of the present disclosure;
Fig. 4 is a flow diagram for an algorithm for maintaining a file-transfer database, in accordance with some embodiments of the present disclosure;
Fig. 5 is a flow diagram for an algorithm for maintaining a relationship database, in accordance with some embodiments of the present disclosure; and
Figs. 6-7 are flow diagrams for algorithms for handling queries, in accordance with some embodiments of the present disclosure.
Claims (33)
1. A system, comprising: at least one network tap; a data storage; and a processor, configured to: receive, through the at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, posit that at least one file was transferred over one connection of the connections, in response to the positing, group encrypted packets belonging to the connection into at least one sequence, compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and store the estimated size in the data storage.
2. The system according to claim 1, wherein the processor is configured to posit that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
3. The system according to claim 2, wherein the identifier is an Internet Protocol (IP) address.
4. The system according to claim 1, wherein the processor is configured to posit that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
5. The system according to claim 4, wherein the indication includes a specification of a protocol used by a class of applications used for file transfers.
6. The system according to claim 1, wherein the processor is configured to group the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
7. The system according to claim 6, wherein the processor is configured to demarcate between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
8. The system according to claim 6, wherein the processor is configured to demarcate between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
9. The system according to claim 1, wherein the processor is configured to compute the estimated size of the file by: computing a sum of the respective sizes, and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
10. The system according to claim 9, wherein the predefined packet-size inflation divisor is expressed as a probability distribution, such that the processor is configured to compute the estimated size as another probability distribution.
11. The system according to claim 9, wherein the sequence was downloaded by one of the user devices, and wherein the processor is further configured to, prior to computing the estimated size of the file: identify another sequence of other packets downloaded by the client, compute another sum of respective sizes of the other packets, posit that the other sequence carried another file having a known size, and in response to the positing, compute the packet-size inflation divisor by dividing the other sum by the known size.
12. The system according to claim 11, wherein the processor is further configured to posit that the other file was downloaded by multiple other clients, and wherein the processor is configured to posit that the other sequence carried the other file in response to a number of the other clients.
13. The system according to claim 11, wherein the processor is further configured to communicate the other file to the client, such as to cause the other sequence to be downloaded by the client.
14. The system according to claim 9, wherein the sequence was exchanged between one of the servers and the user device connected thereto, and wherein the processor is further configured to, prior to computing the estimated size of the file: infer one or more parameters from one or more of the connections belonging to the user device, and select the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
15. The system according to claim 14, wherein at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
16. The system according to claim 9, wherein the sequence was uploaded by one of the user devices, and wherein the processor is further configured to, prior to computing the estimated size of the file: identify, with respective levels of confidence, instances in which respective other files were communicated from the user device to respective other user devices, and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, compute the packet-size inflation divisor.
17. A method, comprising: receiving, through at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, positing that at least one file was transferred over one connection of the connections; in response to the positing, grouping encrypted packets belonging to the connection into at least one sequence; computing an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence; and storing the estimated size in a database.
18. The method according to claim 17, wherein the positing comprises positing that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
19. The method according to claim 18, wherein the identifier is an Internet Protocol (IP) address.
20. The method according to claim 17, wherein the positing comprises positing that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
21. The method according to claim 20, wherein the indication includes a specification of a protocol used by a class of applications used for file transfers.
22. The method according to claim 17, wherein grouping the packets comprises grouping the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
23. The method according to claim 22, wherein demarcating between the sequence and the others of the packets comprises demarcating between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
24. The method according to claim 22, wherein demarcating between the sequence and the others of the packets comprises demarcating between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
25. The method according to claim 17, wherein computing the estimated size of the file comprises: computing a sum of the respective sizes; and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
26. The method according to claim 25, wherein the predefined packet-size inflation divisor is expressed as a probability distribution, such that computing the estimated size comprises computing the estimated size as another probability distribution.
27. The method according to claim 25, wherein the sequence was downloaded by one of the user devices, and wherein the method further comprises, prior to computing the estimated size of the file: identifying another sequence of other packets downloaded by the client; computing another sum of respective sizes of the other packets; positing that the other sequence carried another file having a known size; and in response to the positing, computing the packet-size inflation divisor by dividing the other sum by the known size.
28. The method according to claim 27, further comprising positing that the other file was downloaded by multiple other clients, wherein positing that the other sequence carried the other file comprises positing that the other sequence carried the other file in response to a number of the other clients.
29. The method according to claim 27, further comprising communicating the other file to the client, such as to cause the other sequence to be downloaded by the client.
30. The method according to claim 25, wherein the sequence was exchanged between one of the servers and the user device connected thereto, and wherein the method further comprises, prior to computing the estimated size of the file: inferring one or more parameters from one or more of the connections belonging to the user device; and selecting the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
31. The method according to claim 30, wherein at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
32. The method according to claim 25, wherein the sequence was uploaded by one of the user devices, and wherein the method further comprises, prior to computing the estimated size of the file: identifying, with respective levels of confidence, instances in which respective other files were communicated from the user device to respective other user devices; and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, computing the packet-size inflation divisor.
33. A computer software product comprising a tangible non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a processor, cause the processor to: receive, through at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, posit that at least one file was transferred over one connection of the connections, in response to the positing, group encrypted packets belonging to the connection into at least one sequence, compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and store the estimated size in a database.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL270392A IL270392B2 (en) | 2019-11-03 | 2019-11-03 | System and method for estimating sizes of files transferred over encrypted connections |
| US17/082,152 US11399016B2 (en) | 2019-11-03 | 2020-10-28 | System and method for identifying exchanges of encrypted communication traffic |
| PCT/IB2020/060102 WO2021084439A1 (en) | 2019-11-03 | 2020-10-28 | System and method for identifying exchanges of encrypted communication traffic |
| EP20803657.4A EP4046337A1 (en) | 2019-11-03 | 2020-10-28 | System and method for identifying exchanges of encrypted communication traffic |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL270392A IL270392B2 (en) | 2019-11-03 | 2019-11-03 | System and method for estimating sizes of files transferred over encrypted connections |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| IL270392A (en) | 2021-05-31 |
| IL270392B1 (en) | 2023-04-01 |
| IL270392B2 (en) | 2023-08-01 |
Family
ID=76584262
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| IL270392A IL270392B2 (en) | 2019-11-03 | 2019-11-03 | System and method for estimating sizes of files transferred over encrypted connections |
Country Status (1)
| Country | Link |
|---|---|
| IL (1) | IL270392B2 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120331556A1 (en) * | 2011-06-27 | 2012-12-27 | Dmitri Alperovitch | System and method for protocol fingerprinting and reputation correlation |
| US20180316638A1 (en) * | 2017-04-30 | 2018-11-01 | Verint Systems Ltd. | System and method for identifying relationships between users of computer applications |
| JP2019079280A (en) * | 2017-10-25 | 2019-05-23 | 富士ゼロックス株式会社 | File verification device, file transfer system and program |
Also Published As
| Publication number | Publication date |
|---|---|
| IL270392A (en) | 2021-05-31 |
| IL270392B1 (en) | 2023-04-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR101632187B1 (en) | Methods to combine stateless and stateful server load balancing | |
| CN107438994B (en) | Method, apparatus, and computer storage medium for server load balancing | |
| CN107360159A (en) | A kind of method and device for identifying abnormal encryption flow | |
| CA2947325C (en) | Protocol type identification method and apparatus | |
| WO2018094743A1 (en) | Method for processing packet, and computer device | |
| CN104618253A (en) | Dynamically changed transmission message processing method and device | |
| CN103986747B (en) | File Sharing and Downloading Method in P2P Protocol | |
| CN106716974B (en) | Access distribution method, apparatus and system | |
| US10298653B1 (en) | Methods for monitoring streaming video content quality of experience (QOE) and devices thereof | |
| CN113221146B (en) | Methods and devices for data transmission between blockchain nodes | |
| US10992702B2 (en) | Detecting malware on SPDY connections | |
| US11399016B2 (en) | System and method for identifying exchanges of encrypted communication traffic | |
| US10652626B2 (en) | Gateway, and method, computer program and storage means corresponding thereto | |
| CN106302661A (en) | P2P data accelerated method, device and system | |
| CN110177116B (en) | Secure data transmission method and device for Zhirong identification network | |
| CN108076149B (en) | Session maintaining method and device | |
| US8560715B1 (en) | System, method, and computer program product to automate the flagging of obscure flows as at least potentially unwanted | |
| IL270392B2 (en) | System and method for estimating sizes of files transferred over encrypted connections | |
| US9825942B2 (en) | System and method of authenticating a live video stream | |
| IL270391B (en) | System and method for detecting file transfers | |
| CN107113305A (en) | Apparatus and method for sending and verifying signature | |
| CN119728142A (en) | Method and device for generating actions based on TLS parameters | |
| CN116094738B (en) | Information processing method, device and communication equipment | |
| EP4075727A1 (en) | System and method for identifying services with which encrypted traffic is exchanged | |
| CN115412282A (en) | Message security check method based on MQTT protocol |