IL270392B2 - System and method for estimating sizes of files transferred over encrypted connections - Google Patents

System and method for estimating sizes of files transferred over encrypted connections

Info

Publication number
IL270392B2
IL270392B2 IL270392A IL27039219A IL270392B2 IL 270392 B2 IL270392 B2 IL 270392B2 IL 270392 A IL270392 A IL 270392A IL 27039219 A IL27039219 A IL 27039219A IL 270392 B2 IL270392 B2 IL 270392B2
Authority
IL
Israel
Prior art keywords
file
sequence
packets
size
processor
Prior art date
Application number
IL270392A
Other languages
Hebrew (he)
Other versions
IL270392A (en)
IL270392B1 (en)
Original Assignee
Cognyte Tech Israel Ltd
Priority date
Filing date
Publication date
Application filed by Cognyte Tech Israel Ltd
Priority to IL270392A (IL270392B2)
Priority to US17/082,152 (US11399016B2)
Priority to PCT/IB2020/060102 (WO2021084439A1)
Priority to EP20803657.4A (EP4046337A1)
Publication of IL270392A
Publication of IL270392B1
Publication of IL270392B2

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Communication Control (AREA)
  • Information Transfer Between Computers (AREA)

Description

1011-1141.1
SYSTEM AND METHOD FOR ESTIMATING SIZES OF FILES TRANSFERRED OVER ENCRYPTED CONNECTIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to another application entitled "System and method for identifying exchanges of files" (attorney ref. no. 1011-1141), filed on even date herewith.
FIELD OF THE DISCLOSURE
The present disclosure relates to the monitoring of communication traffic generated by users of computer applications.
BACKGROUND OF THE DISCLOSURE
Many computer applications use encrypted protocols, such that the communication traffic exchanged by these applications is encrypted. Examples of such applications include WhatsApp, Skype, Line, and Dropbox. Examples of encrypted protocols include the Secure Sockets Layer (SSL) protocol and the Transport Layer Security (TLS) protocol.
US Patent Application Publication 2016/0285978, whose disclosure is incorporated herein by reference, describes a monitoring system that monitors traffic flows exchanged over a communication network. The system characterizes the flows in terms of their temporal traffic features, and uses this characterization to identify communication devices that participate in the same communication session. By identifying the communication devices that serve as endpoints in the same session, the system establishes correlations between the users of these communication devices. The monitoring system characterizes the flows using traffic features such as flow start time, flow end time, inter-burst time and burst size, and/or statistical properties of such features. The system typically generates compressed-form representations ("signatures") for the traffic flows based on the temporal traffic features, and finds matching flows by finding similarities between signatures.
SUMMARY OF THE DISCLOSURE
There is provided, in accordance with some embodiments of the present disclosure, a system including a data storage and a processor. The processor is configured to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The processor is further configured to group packets belonging to the connection into at least one sequence, in response to the positing. The processor is further configured to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in the data storage.
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
In some embodiments, the identifier is an Internet Protocol (IP) address.
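As a rough illustration of positing a file transfer from a server identifier, the following minimal sketch checks the IP addresses seen on a connection against a set of servers known to service file exchanges. The set `KNOWN_FILE_SERVER_IPS`, the function name, and the example addresses (taken from documentation-only ranges) are hypothetical and do not come from the disclosure.

```python
from ipaddress import ip_address

# Hypothetical addresses of servers assumed to be known to service file exchanges.
KNOWN_FILE_SERVER_IPS = {ip_address("203.0.113.10"), ip_address("198.51.100.7")}


def posits_file_transfer(packet_ips):
    """Return True if any IP address observed in the packets belongs to a known file server."""
    return any(ip_address(ip) in KNOWN_FILE_SERVER_IPS for ip in packet_ips)
```

Only packet headers are inspected here, so no decryption of the payload is involved.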
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
In some embodiments, the indication includes a specification of a protocol used by a class of applications used for file transfers.
In some embodiments, the processor is configured to group the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
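The time-gap demarcation described above might be sketched as follows. This is an illustrative sketch only: the packet representation as `(timestamp, size)` pairs, the function name, and the `max_gap` threshold of one second are assumptions, and the throughput-based variant is not shown.

```python
def group_by_time_gap(packets, max_gap=1.0):
    """Split same-direction packets into sequences wherever the time gap exceeds max_gap.

    packets: list of (timestamp_seconds, size_bytes) tuples, sorted by timestamp.
    Returns a list of sequences, each a list of (timestamp, size) tuples.
    """
    sequences = []
    current = []
    for timestamp, size in packets:
        # A gap larger than max_gap since the previous packet closes the current sequence.
        if current and timestamp - current[-1][0] > max_gap:
            sequences.append(current)
            current = []
        current.append((timestamp, size))
    if current:
        sequences.append(current)
    return sequences
```

In practice the threshold would presumably be tuned per application, since a value that is too small splits one file into several sequences and a value that is too large merges consecutive files.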
In some embodiments, the processor is configured to compute the estimated size of the file by: computing a sum of the respective sizes, and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
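The sum-and-divide computation can be written in a few lines. In this sketch the function name and the example divisor are hypothetical; only the arithmetic (sum of packet sizes divided by a divisor greater than one) follows the text.

```python
def estimate_file_size(packet_sizes, inflation_divisor):
    """Estimate the file size as the total packet bytes divided by the inflation divisor."""
    if inflation_divisor <= 1:
        raise ValueError("the packet-size inflation divisor must be greater than one")
    return sum(packet_sizes) / inflation_divisor
```

For example, packets totalling 5000 bytes with an assumed divisor of 1.25 would yield an estimate of 4000 bytes; the divisor accounts for the headers and encryption overhead that inflate the observed packet sizes relative to the file itself.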
In some embodiments, the predefined packet-size inflation divisor is expressed as a probability distribution, such that the processor is configured to compute the estimated size as another probability distribution.
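One simple way to picture a divisor expressed as a probability distribution is a discrete distribution over candidate divisors, which maps directly to a discrete distribution over estimated sizes. The representation as `(value, probability)` pairs, the function name, and the example values are assumptions made for illustration; the disclosure does not specify the form of the distribution.

```python
def estimate_size_distribution(packet_sizes, divisor_distribution):
    """Map a discrete divisor distribution to a discrete estimated-size distribution.

    divisor_distribution: list of (divisor, probability) pairs, each divisor > 1.
    Returns a list of (estimated_size, probability) pairs.
    """
    total = sum(packet_sizes)
    return [(total / divisor, probability) for divisor, probability in divisor_distribution]


# Hypothetical divisor distribution for one client type.
example = estimate_size_distribution([2550, 2550], [(1.02, 0.6), (1.05, 0.3), (1.10, 0.1)])
```

Each candidate divisor keeps its probability, so the probabilities of the resulting size estimates still sum to one.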
In some embodiments, the sequence was downloaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify another sequence of other packets downloaded by the client, compute another sum of respective sizes of the other packets, posit that the other sequence carried another file having a known size, and in response to the positing, compute the packet-size inflation divisor by dividing the other sum by the known size.
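The calibration step described above amounts to a single division. The sketch below assumes the reference sequence and the known file size are already available; the function name and the numbers in the usage note are hypothetical.

```python
def calibrate_inflation_divisor(other_packet_sizes, known_file_size):
    """Compute the inflation divisor from a sequence posited to carry a file of known size."""
    if known_file_size <= 0:
        raise ValueError("the known file size must be positive")
    return sum(other_packet_sizes) / known_file_size
```

For instance, if seven 1500-byte packets are posited to have carried a file known to be 10000 bytes, the computed divisor is approximately 1.05, which could then be applied to other sequences downloaded by the same client.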
In some embodiments, the processor is further configured to posit that the other file was downloaded by multiple other clients, and the processor is configured to posit that the other sequence carried the other file in response to a number of the other clients.
In some embodiments, the processor is further configured to communicate the other file to the client, such as to cause the other sequence to be downloaded by the client.
In some embodiments, the sequence was exchanged between a server and a client, and the processor is further configured to, prior to computing the estimated size of the file: infer one or more parameters from one or more of the connections belonging to the client, and select the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
In some embodiments, at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
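A predefined association between inferred parameters and inflation divisors could be as simple as a lookup table. Everything in this sketch is hypothetical: the table `PREDEFINED_DIVISORS`, its keys, and its numeric values are placeholders rather than measured figures.

```python
# Hypothetical association: (client type, operating system, encryption protocol) -> divisor.
PREDEFINED_DIVISORS = {
    ("mobile", "Android", "TLS 1.2"): 1.06,
    ("mobile", "Android", "TLS 1.3"): 1.04,
    ("desktop", "Windows", "TLS 1.2"): 1.05,
}


def select_inflation_divisor(client_type, operating_system, protocol, default=None):
    """Select the predefined divisor associated with the inferred client parameters."""
    return PREDEFINED_DIVISORS.get((client_type, operating_system, protocol), default)
```

Returning a default for unknown parameter combinations leaves the caller free to fall back to another estimation strategy, such as calibrating against a file of known size.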
In some embodiments, the sequence was uploaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify, with respective levels of confidence, instances in which respective other files were communicated from the client to respective other clients, and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, compute the packet-size inflation divisor.
There is further provided, in accordance with some embodiments of the present disclosure, a method including, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, positing that at least one file was transferred over one connection of the connections. The method further includes, in response to the positing, grouping packets belonging to the connection into at least one sequence. The method further includes computing an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and storing the estimated size in a database.
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The instructions further cause the processor to group packets belonging to the connection into at least one sequence, in response to the positing. The instructions further cause the processor to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in a database.
There is further provided, in accordance with some embodiments of the present disclosure, a system including a peripheral device and a processor. The processor is configured to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The processor is further configured to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file. The processor is further configured to generate an output to the peripheral device in response to the positing.
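The disclosure does not fix a particular measure of similarity at this point. As one possible illustration, the sketch below uses a relative-difference measure on two point estimates and posits that both transfers carried the same file when the measure reaches a threshold. The formula, the function names, and the threshold of 0.98 are assumptions.

```python
def size_similarity(first_size, second_size):
    """Return a similarity in [0, 1]; 1.0 means the two estimated sizes are identical."""
    largest = max(first_size, second_size)
    if largest <= 0:
        raise ValueError("estimated sizes must be positive")
    return 1.0 - abs(first_size - second_size) / largest


def posits_same_file(first_size, second_size, threshold=0.98):
    """Posit that two encrypted transfers represent the same file if their sizes are similar enough."""
    return size_similarity(first_size, second_size) >= threshold
```

When the estimates are probability distributions rather than point values, a measure of overlap between the two distributions would presumably take the place of this relative difference.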
In some embodiments, the peripheral device is selected from the group of devices consisting of: a display, and a data storage.
In some embodiments, the first estimated size and the second estimated size are expressed as respective probability distributions.
In some embodiments, the first encrypted file content was uploaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is further configured to compute a difference between the second time and the first time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the output indicates that the first user communicated the file to the second user.
In some embodiments, the processor is configured to generate the output by increasing a relatedness score between the first user and the second user.
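The upload-then-download timing check and the relatedness update can be sketched together. This sketch assumes the two transfers have already been posited to represent the same file; the dictionary-based score store, the function name, the one-hour threshold, and the unit increment are all hypothetical.

```python
def record_possible_exchange(scores, uploader, upload_time, downloader, download_time,
                             max_delay=3600.0, increment=1.0):
    """Increase the relatedness score if the download followed the upload closely enough.

    scores: dict mapping (uploader, downloader) pairs to a relatedness score.
    Times are in seconds. Returns True if the score was increased.
    """
    delay = download_time - upload_time
    # The download must come after the upload and within the predefined threshold.
    if 0 < delay < max_delay:
        key = (uploader, downloader)
        scores[key] = scores.get(key, 0.0) + increment
        return True
    return False
```

Repeated matches between the same pair of users accumulate, so a single coincidental size match carries less weight than a consistent pattern.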
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is configured to generate the output in response to a metadata link having been communicated by the first user between the first time and the second time, and the output indicates that the metadata link pointed to the file and was communicated to the second user.
In some embodiments, the output indicates that a first user communicated the file to a second user, and the processor is configured to generate the output in response to a relatedness score between the first user and the second user.
In some embodiments, the processor is further configured to identify a frequency with which files having the first estimated size are communicated over the network, and the processor is configured to posit that the first encrypted file content and the second encrypted file content represent the same file with a likelihood that decreases with the frequency.
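The text only requires that the likelihood decrease with the frequency; it does not give a formula. The sketch below uses one simple decreasing function as a stand-in, with `size_frequency` taken to be the number of other transfers of roughly that size seen in some observation window. Both the formula and the parameter names are assumptions.

```python
def same_file_likelihood(similarity, size_frequency):
    """Discount a size-similarity score by how common files of that size are on the network."""
    if size_frequency < 0:
        raise ValueError("frequency must be non-negative")
    return similarity / (1.0 + size_frequency)
```

The intuition is that a match on a rare size is strong evidence of the same file, whereas a match on a size shared by many unrelated transfers is weak evidence.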
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time, the processor is further configured to compute a difference between the first time and the second time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the processor is further configured to: receive a query specifying a second-file-content transfer of the second encrypted file content, and identify a first-file-content transfer of the first encrypted file content in response to the query, the processor is configured to compute the measure of similarity in response to identifying the first-file-content transfer, and the output includes parameters of the first-file-content transfer.
In some embodiments, the second-file-content transfer was performed using a class of applications, and the processor is configured to identify the first-file-content transfer of the first encrypted file content by: retrieving, from a database, multiple other-file-content transfers of other encrypted file content, which were performed using the class of applications, and identifying the first-file-content transfer from among the other-file-content transfers.
In some embodiments, the processor is further configured to: identify multiple other-file-content transfers of other encrypted file content in response to the query, and posit that the file was transferred in each of the other-file-content transfers, and the processor is configured to generate the output by outputting a timeline of the other-file-content transfers and the first-file-content transfer.
In some embodiments, at least some of the other-file-content transfers were performed using different respective applications.
There is further provided, in accordance with some embodiments of the present disclosure, a method including computing a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The method further includes, based on the measure of similarity, positing that the first encrypted file content and the second encrypted file content represent the same file, and in response to the positing, generating an output.
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The instructions further cause the processor to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file, and to generate an output in response to the positing.
The present disclosure will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic illustration of a system for monitoring communication exchanged over a network, in accordance with some embodiments of the present disclosure;
Fig. 2 is a schematic illustration of a series of file exchanges that may be identified in accordance with some embodiments of the present disclosure;
Fig. 3 is a schematic illustration of a method for identifying file transfers, in accordance with some embodiments of the present disclosure;
Fig. 4 is a flow diagram for an algorithm for maintaining a file-transfer database, in accordance with some embodiments of the present disclosure;
Fig. 5 is a flow diagram for an algorithm for maintaining a relationship database, in accordance with some embodiments of the present disclosure; and
Figs. 6-7 are flow diagrams for algorithms for handling queries, in accordance with some embodiments of the present disclosure.

Claims (33)

270,392/ CLAIMS
1. A system, comprising: at least one network tap; a data storage; and a processor, configured to: receive, through the at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, posit that at least one file was transferred over one connection of the connections, in response to the positing, group encrypted packets belonging to the connection into at least one sequence, compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and store the estimated size in the data storage.
2. The system according to claim 1, wherein the processor is configured to posit that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
3. The system according to claim 2, wherein the identifier is an Internet Protocol (IP) address.
4. The system according to claim 1, wherein the processor is configured to posit that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
5. The system according to claim 4, wherein the indication includes a specification of a protocol used by a class of applications used for file transfers.
6. The system according to claim 1, wherein the processor is configured to group the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
7. The system according to claim 6, wherein the processor is configured to demarcate between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
8. The system according to claim 6, wherein the processor is configured to demarcate between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
9. The system according to claim 1, wherein the processor is configured to compute the estimated size of the file by: computing a sum of the respective sizes, and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
10. The system according to claim 9, wherein the predefined packet-size inflation divisor is expressed as a probability distribution, such that the processor is configured to compute the estimated size as another probability distribution.
11. The system according to claim 9, wherein the sequence was downloaded by one of the user devices, and wherein the processor is further configured to, prior to computing the estimated size of the file: identify another sequence of other packets downloaded by the client, compute another sum of respective sizes of the other packets, posit that the other sequence carried another file having a known size, and in response to the positing, compute the packet-size inflation divisor by dividing the other sum by the known size.
12. The system according to claim 11, wherein the processor is further configured to posit that the other file was downloaded by multiple other clients, and wherein the processor is configured to posit that the other sequence carried the other file in response to a number of the other clients.
13. The system according to claim 11, wherein the processor is further configured to communicate the other file to the client, such as to cause the other sequence to be downloaded by the client.
14. The system according to claim 9, wherein the sequence was exchanged between one of the servers and the user device connected thereto, and wherein the processor is further configured to, prior to computing the estimated size of the file: infer one or more parameters from one or more of the connections belonging to the user device, and select the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
15. The system according to claim 14, wherein at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
16. The system according to claim 9, wherein the sequence was uploaded by one of the user devices, and wherein the processor is further configured to, prior to computing the estimated size of the file: identify, with respective levels of confidence, instances in which respective other files were communicated from the user device to respective other user devices, and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, compute the packet-size inflation divisor.
17. A method, comprising: receiving, through at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, positing that at least one file was transferred over one connection of the connections; in response to the positing, grouping encrypted packets belonging to the connection into at least one sequence; computing an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence; and storing the estimated size in a database.
18. The method according to claim 17, wherein the positing comprises positing that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
19. The method according to claim 18, wherein the identifier is an Internet Protocol (IP) address.
20. The method according to claim 17, wherein the positing comprises positing that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
21. The method according to claim 20, wherein the indication includes a specification of a protocol used by a class of applications used for file transfers.
22. The method according to claim 17, wherein grouping the packets comprises grouping the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
23. The method according to claim 22, wherein demarcating between the sequence and the others of the packets comprises demarcating between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
24. The method according to claim 22, wherein demarcating between the sequence and the others of the packets comprises demarcating between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
25. The method according to claim 17, wherein computing the estimated size of the file comprises: computing a sum of the respective sizes; and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
26. The method according to claim 25, wherein the predefined packet-size inflation divisor is expressed as a probability distribution, such that computing the estimated size comprises computing the estimated size as another probability distribution.
27. The method according to claim 25, wherein the sequence was downloaded by one of the user devices, and wherein the method further comprises, prior to computing the estimated size of the file: identifying another sequence of other packets downloaded by the client; computing another sum of respective sizes of the other packets; positing that the other sequence carried another file having a known size; and in response to the positing, computing the packet-size inflation divisor by dividing the other sum by the known size.
28. The method according to claim 27, further comprising positing that the other file was downloaded by multiple other clients, wherein positing that the other sequence carried the other file comprises positing that the other sequence carried the other file in response to a number of the other clients.
29. The method according to claim 27, further comprising communicating the other file to the client, such as to cause the other sequence to be downloaded by the client.
30. The method according to claim 25, wherein the sequence was exchanged between one of the servers and the user device connected thereto, and wherein the method further comprises, prior to computing the estimated size of the file: inferring one or more parameters from one or more of the connections belonging to the user device; and selecting the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
31. The method according to claim 30, wherein at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
32. The method according to claim 25, wherein the sequence was uploaded by one of the user devices, and wherein the method further comprises, prior to computing the estimated size of the file: identifying, with respective levels of confidence, instances in which respective other files were communicated from the user device to respective other user devices; and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, computing the packet-size inflation divisor.
33. A computer software product comprising a tangible non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a processor, cause the processor to: receive, through at least one network tap, encrypted communication traffic passed over multiple connections, each of the connections being between a) one of a plurality of user devices and b) one of one or more servers, each server servicing an application on the user device connected thereto; by analyzing the encrypted communication traffic without decrypting the traffic, posit that at least one file was transferred over one connection of the connections, in response to the positing, group encrypted packets belonging to the connection into at least one sequence, compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and store the estimated size in a database.
IL270392A 2019-11-03 2019-11-03 System and method for estimating sizes of files transferred over encrypted connections IL270392B2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
IL270392A IL270392B2 (en) 2019-11-03 2019-11-03 System and method for estimating sizes of files transferred over encrypted connections
US17/082,152 US11399016B2 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic
PCT/IB2020/060102 WO2021084439A1 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic
EP20803657.4A EP4046337A1 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IL270392A IL270392B2 (en) 2019-11-03 2019-11-03 System and method for estimating sizes of files transferred over encrypted connections

Publications (3)

Publication Number Publication Date
IL270392A IL270392A (en) 2021-05-31
IL270392B1 IL270392B1 (en) 2023-04-01
IL270392B2 IL270392B2 (en) 2023-08-01

Family

ID=76584262

Family Applications (1)

Application Number Title Priority Date Filing Date
IL270392A IL270392B2 (en) 2019-11-03 2019-11-03 System and method for estimating sizes of files transferred over encrypted connections

Country Status (1)

Country Link
IL (1) IL270392B2 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120331556A1 (en) * 2011-06-27 2012-12-27 Dmitri Alperovitch System and method for protocol fingerprinting and reputation correlation
US20180316638A1 (en) * 2017-04-30 2018-11-01 Verint Systems Ltd. System and method for identifying relationships between users of computer applications
JP2019079280A (en) * 2017-10-25 2019-05-23 富士ゼロックス株式会社 File verification device, file transfer system and program

Also Published As

Publication number Publication date
IL270392A (en) 2021-05-31
IL270392B1 (en) 2023-04-01

Similar Documents

Publication Publication Date Title
KR101632187B1 (en) Methods to combine stateless and stateful server load balancing
CN107438994B (en) Method, apparatus, and computer storage medium for server load balancing
CN107360159A (en) A kind of method and device for identifying abnormal encryption flow
CA2947325C (en) Protocol type identification method and apparatus
WO2018094743A1 (en) Method for processing packet, and computer device
CN104618253A (en) Dynamically changed transmission message processing method and device
CN103986747B (en) File Sharing and Downloading Method in P2P Protocol
CN106716974B (en) Access distribution method, apparatus and system
US10298653B1 (en) Methods for monitoring streaming video content quality of experience (QOE) and devices thereof
CN113221146B (en) Methods and devices for data transmission between blockchain nodes
US10992702B2 (en) Detecting malware on SPDY connections
US11399016B2 (en) System and method for identifying exchanges of encrypted communication traffic
US10652626B2 (en) Gateway, and method, computer program and storage means corresponding thereto
CN106302661A (en) P2P data accelerated method, device and system
CN110177116B (en) Secure data transmission method and device for Zhirong identification network
CN108076149B (en) Session maintaining method and device
US8560715B1 (en) System, method, and computer program product to automate the flagging of obscure flows as at least potentially unwanted
IL270392B2 (en) System and method for estimating sizes of files transferred over encrypted connections
US9825942B2 (en) System and method of authenticating a live video stream
IL270391B (en) System and method for detecting file transfers
CN107113305A (en) Apparatus and method for sending and verifying signature
CN119728142A (en) Method and device for generating actions based on TLS parameters
CN116094738B (en) Information processing method, device and communication equipment
EP4075727A1 (en) System and method for identifying services with which encrypted traffic is exchanged
CN115412282A (en) Message security check method based on MQTT protocol