IL270391B - System and method for identifying exchanges of files - Google Patents

System and method for identifying exchanges of files

Info

Publication number
IL270391B
Authority
IL
Israel
Prior art keywords
file
content
user device
encrypted file
time
Prior art date
Application number
IL270391A
Other languages
Hebrew (he)
Other versions
IL270391A (en)
Original Assignee
Cognyte Tech Israel Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cognyte Tech Israel Ltd
Priority to IL270391A (IL270391B)
Priority to EP20803657.4A (EP4046337A1)
Priority to PCT/IB2020/060102 (WO2021084439A1)
Priority to US17/082,152 (US11399016B2)
Publication of IL270391A
Publication of IL270391B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/02 Digital function generators
    • G06F1/025 Digital function generators for functions having two-valued amplitude, e.g. Walsh functions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0061 Error detection codes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0061 Error detection codes
    • H04L1/0063 Single parity check
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0064 Concatenated codes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

1011-1141
SYSTEM AND METHOD FOR IDENTIFYING EXCHANGES OF FILES
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to another application entitled "System and method for estimating sizes of files transferred over encrypted connections" (attorney ref. no. 1011-1141.1), filed on even date herewith.
FIELD OF THE DISCLOSURE
The present disclosure relates to the monitoring of communication traffic generated by users of computer applications.
BACKGROUND OF THE DISCLOSURE
Many computer applications use encrypted protocols, such that the communication traffic exchanged by these applications is encrypted. Examples of such applications include WhatsApp, Skype, Line, and Dropbox. Examples of encrypted protocols include the Secure Sockets Layer (SSL) protocol and the Transport Layer Security (TLS) protocol.
US Patent Application Publication 2016/0285978, whose disclosure is incorporated herein by reference, describes a monitoring system that monitors traffic flows exchanged over a communication network. The system characterizes the flows in terms of their temporal traffic features, and uses this characterization to identify communication devices that participate in the same communication session. By identifying the communication devices that serve as endpoints in the same session, the system establishes correlations between the users of these communication devices. The monitoring system characterizes the flows using traffic features such as flow start time, flow end time, inter-burst time and burst size, and/or statistical properties of such features. The system typically generates compressed-form representations ("signatures") for the traffic flows based on the temporal traffic features, and finds matching flows by finding similarities between signatures.
SUMMARY OF THE DISCLOSURE
There is provided, in accordance with some embodiments of the present disclosure, a system including a data storage and a processor. The processor is configured to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The processor is further configured to group packets belonging to the connection into at least one sequence, in response to the positing. The processor is further configured to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in the data storage.
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to identifying, in one of the packets, an identifier of a server known to service file exchanges.
In some embodiments, the identifier is an Internet Protocol (IP) address.
In some embodiments, the processor is configured to posit that the file was transferred over the connection in response to an indication in one of the packets or in another packet that the connection was made by an application used for file transfers.
In some embodiments, the indication includes a specification of a protocol used by a class of applications used for file transfers.
In some embodiments, the processor is configured to group the packets by demarcating between the sequence and others of the packets that were communicated in the same direction as was the sequence.
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a time gap between the sequence and a closest one of the others of the packets.
In some embodiments, the processor is configured to demarcate between the sequence and the others of the packets based on a decrease in throughput at an end of the sequence.
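The time-gap demarcation described above can be sketched as follows. This is only an illustrative sketch: the `Packet` type, the function name `group_by_time_gap`, and the `max_gap` parameter are assumptions introduced here, and the disclosure does not prescribe a particular gap value or data structure.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    timestamp: float  # seconds since capture start (hypothetical unit)
    size: int         # observed packet size in bytes; the payload stays encrypted


def group_by_time_gap(packets: List[Packet], max_gap: float) -> List[List[Packet]]:
    """Split same-direction packets into sequences wherever the
    inter-packet gap exceeds max_gap (an assumed, tunable threshold)."""
    sequences: List[List[Packet]] = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        if sequences and pkt.timestamp - sequences[-1][-1].timestamp <= max_gap:
            sequences[-1].append(pkt)   # continues the current sequence
        else:
            sequences.append([pkt])     # a large gap starts a new sequence
    return sequences


packets = [Packet(0.00, 1400), Packet(0.02, 1400), Packet(0.05, 900),
           Packet(3.00, 1400), Packet(3.01, 600)]
sequences = group_by_time_gap(packets, max_gap=1.0)
```

With these made-up timestamps, the three packets near t=0 and the two packets near t=3 fall into separate sequences. A throughput-based demarcation would need a different criterion and is not shown.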
In some embodiments, the processor is configured to compute the estimated size of the file by: computing a sum of the respective sizes, and computing the estimated size of the file by dividing the computed sum by a predefined packet-size inflation divisor that is greater than one.
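As a minimal sketch of this computation, the function below sums the observed packet sizes and divides by the inflation divisor. The function name and the example divisor of 1.25 are hypothetical; the text only requires that the divisor be predefined and greater than one.

```python
from typing import Iterable


def estimate_file_size(packet_sizes: Iterable[int], inflation_divisor: float) -> float:
    """Estimated file size = (sum of packet sizes) / (inflation divisor)."""
    if inflation_divisor <= 1:
        raise ValueError("the inflation divisor is described as greater than one")
    return sum(packet_sizes) / inflation_divisor


# Hypothetical sequence of three packets and an assumed divisor of 1.25.
estimated = estimate_file_size([1400, 1400, 900], inflation_divisor=1.25)
```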
In some embodiments, the predefined packet-size inflation divisor is expressed as a probability distribution, such that the processor is configured to compute the estimated size as another probability distribution.
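One way to picture a divisor expressed as a probability distribution is a discrete table of candidate divisors, each with a probability. The sketch below propagates such a table to a distribution over estimated sizes. The discrete representation, the function name, and the example values are assumptions; the disclosure does not state which form of distribution is used.

```python
from typing import Dict


def estimate_size_distribution(total_bytes: int,
                               divisor_pmf: Dict[float, float]) -> Dict[float, float]:
    """Map each candidate divisor d (with probability p) to the
    estimated size total_bytes / d, carrying the same probability p."""
    size_pmf: Dict[float, float] = {}
    for divisor, prob in divisor_pmf.items():
        size = total_bytes / divisor
        size_pmf[size] = size_pmf.get(size, 0.0) + prob
    return size_pmf


# Hypothetical distribution: divisor 1.25 with probability 0.75, 1.5 with 0.25.
size_pmf = estimate_size_distribution(3000, {1.25: 0.75, 1.5: 0.25})
```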
In some embodiments, the sequence was downloaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify another sequence of other packets downloaded by the client, compute another sum of respective sizes of the other packets, posit that the other sequence carried another file having a known size, and in response to the positing, compute the packet-size inflation divisor by dividing the other sum by the known size.
In some embodiments, the processor is further configured to posit that the other file was downloaded by multiple other clients, and the processor is configured to posit that the other sequence carried the other file in response to a number of the other clients.
In some embodiments, the processor is further configured to communicate the other file to the client, such as to cause the other sequence to be downloaded by the client.
In some embodiments, the sequence was exchanged between a server and a client, and the processor is further configured to, prior to computing the estimated size of the file: infer one or more parameters from one or more of the connections belonging to the client, and select the packet-size inflation divisor from multiple predefined inflation divisors, based on a predefined association between the packet-size inflation divisor and the parameters.
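The calibration step in this embodiment (dividing the observed sum by the known size) can be illustrated with a short sketch. The function name and the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from typing import Iterable


def calibrate_inflation_divisor(observed_packet_sizes: Iterable[int],
                                known_file_size: int) -> float:
    """Divisor = (sum of packet sizes that carried a known file) / (known file size)."""
    if known_file_size <= 0:
        raise ValueError("known_file_size must be positive")
    return sum(observed_packet_sizes) / known_file_size


# Hypothetical calibration: 3750 observed bytes carried a 3000-byte file.
divisor = calibrate_inflation_divisor([1500, 1500, 750], known_file_size=3000)

# The calibrated divisor can then be applied to a later sequence from the same client.
later_estimate = sum([2500, 2500]) / divisor
```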
In some embodiments, at least one of the parameters is selected from the group of parameters consisting of: a type of the client, an operating system running on the client, and an encryption protocol used by the client.
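The predefined association between parameters and divisors could be held in a simple lookup table, as in the sketch below. The table contents, key layout, and function name are all hypothetical; actual divisor values would have to come from prior measurement.

```python
from typing import Dict, Tuple

# Key: (client type, operating system, encryption protocol).
ClientParams = Tuple[str, str, str]

# Hypothetical association table with made-up divisors.
INFLATION_DIVISORS: Dict[ClientParams, float] = {
    ("mobile", "android", "tls1.2"): 1.25,
    ("desktop", "windows", "tls1.3"): 1.125,
}


def select_inflation_divisor(client_type: str, operating_system: str,
                             encryption_protocol: str,
                             table: Dict[ClientParams, float] = INFLATION_DIVISORS) -> float:
    """Pick the divisor associated with the parameters inferred for a client."""
    key = (client_type, operating_system, encryption_protocol)
    if key not in table:
        raise KeyError(f"no inflation divisor is associated with {key}")
    return table[key]


chosen = select_inflation_divisor("mobile", "android", "tls1.2")
```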
In some embodiments, the sequence was uploaded by a client, and the processor is further configured to, prior to computing the estimated size of the file: identify, with respective levels of confidence, instances in which respective other files were communicated from the client to respective other clients, and based on the identified instances and on a predefined distribution of another packet-size inflation divisor for downloads, compute the packet-size inflation divisor.
There is further provided, in accordance with some embodiments of the present disclosure, a method including, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, positing that at least one file was transferred over one connection of the connections. The method further includes, in response to the positing, grouping packets belonging to the connection into at least one sequence. The method further includes computing an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and storing the estimated size in a database.
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to posit, by analyzing encrypted communication traffic passed over multiple connections without decrypting the traffic, that at least one file was transferred over one connection of the connections. The instructions further cause the processor to group packets belonging to the connection into at least one sequence, in response to the positing. The instructions further cause the processor to compute an estimated size of the file, based on respective sizes of those of the packets belonging to the sequence, and to store the estimated size in a database.
There is further provided, in accordance with some embodiments of the present disclosure, a system including a peripheral device and a processor. The processor is configured to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The processor is further configured to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file. The processor is further configured to generate an output to the peripheral device in response to the positing.
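The text above does not say which measure of similarity is used. As one possible illustration, the sketch below uses a relative-difference measure between two point estimates and a fixed threshold. The measure, the function names, and the default threshold of 0.99 are all assumptions made for this example.

```python
def size_similarity(size_a: float, size_b: float) -> float:
    """Assumed measure: 1 minus the relative difference, so that
    identical sizes score 1.0 and very different sizes approach 0.0."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("estimated sizes must be positive")
    return 1.0 - abs(size_a - size_b) / max(size_a, size_b)


def posit_same_file(size_a: float, size_b: float, min_similarity: float = 0.99) -> bool:
    """Posit that two transfers carried the same file when the
    estimated sizes are similar enough (threshold is hypothetical)."""
    return size_similarity(size_a, size_b) >= min_similarity


close_sizes = posit_same_file(2_000_000, 2_010_000)
far_sizes = posit_same_file(2_000_000, 3_000_000)
```

In practice a size match alone is weak evidence, which is why the embodiments that follow combine it with timing, relatedness, and size-frequency information.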
In some embodiments, the peripheral device is selected from the group of devices consisting of: a display, and a data storage.
In some embodiments, the first estimated size and the second estimated size are expressed as respective probability distributions.
In some embodiments, the first encrypted file content was uploaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is further configured to compute a difference between the second time and the first time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the output indicates that the first user communicated the file to the second user.
In some embodiments, the processor is configured to generate the output by increasing a relatedness score between the first user and the second user.
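The upload-then-download timing check and the relatedness-score update can be sketched together as follows. The sketch assumes the two transfers have already been judged similar in size; the in-memory score store, the function name, and the `max_delay` and `increment` values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical in-memory store: (uploader, downloader) -> relatedness score.
relatedness: DefaultDict[Tuple[str, str], float] = defaultdict(float)


def record_possible_exchange(uploader: str, downloader: str,
                             upload_time: float, download_time: float,
                             max_delay: float, increment: float = 1.0) -> bool:
    """If the download follows the upload by less than max_delay,
    increase the relatedness score between the two users."""
    delay = download_time - upload_time
    if 0 < delay < max_delay:
        relatedness[(uploader, downloader)] += increment
        return True
    return False


# Made-up users and times (seconds): the download follows the upload by 30 s.
matched = record_possible_exchange("alice", "bob", 100.0, 130.0, max_delay=60.0)
```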
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time subsequent to the first time, the processor is configured to generate the output in response to a metadata link having been communicated by the first user between the first time and the second time, and the output indicates that the metadata link pointed to the file and was communicated to the second user.
In some embodiments, the output indicates that a first user communicated the file to a second user, and the processor is configured to generate the output in response to a relatedness score between the first user and the second user.
In some embodiments, the processor is further configured to identify a frequency with which files having the first estimated size are communicated over the network, and the processor is configured to posit that the first encrypted file content and the second encrypted file content represent the same file with a likelihood that decreases with the frequency.
In some embodiments, the first encrypted file content was downloaded by a first user at a first time, the second encrypted file content was downloaded by a second user at a second time, the processor is further configured to compute a difference between the first time and the second time, and the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
In some embodiments, the processor is further configured to: receive a query specifying a second-file-content transfer of the second encrypted file content, and identify a first-file-content transfer of the first encrypted file content in response to the query, the processor is configured to compute the measure of similarity in response to identifying the first-file-content transfer, and the output includes parameters of the first-file-content transfer.
In some embodiments, the second-file-content transfer was performed using a class of applications, and the processor is configured to identify the first-file-content transfer of the first encrypted file content by: retrieving, from a database, multiple other-file-content transfers of other encrypted file content, which were performed using the class of applications, and identifying the first-file-content transfer from among the other-file-content transfers.
In some embodiments, the processor is further configured to: identify multiple other-file-content transfers of other encrypted file content in response to the query, and posit that the file was transferred in each of the other-file-content transfers, and the processor is configured to generate the output by outputting a timeline of the other-file-content transfers and the first-file-content transfer.
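The text states only that the likelihood decreases with the frequency of the estimated size; it gives no formula. The sketch below uses one arbitrary monotonically decreasing form, purely to illustrate the idea that a commonly seen size is weaker evidence of a match. The function name and the formula are assumptions.

```python
def same_file_likelihood(similarity: float, size_frequency: float) -> float:
    """Assumed form: similarity / (1 + frequency). Any function that
    decreases as size_frequency grows would fit the description."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity is assumed to lie in [0, 1]")
    if size_frequency < 0:
        raise ValueError("size_frequency must be non-negative")
    return similarity / (1.0 + size_frequency)


# The same size match counts for less when that size is seen often on the network.
rare = same_file_likelihood(0.9, size_frequency=0.0)
common = same_file_likelihood(0.9, size_frequency=9.0)
```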
In some embodiments, at least some of the other-file-content transfers were performed using different respective applications.
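The query handling described in the preceding embodiments can be pictured as a filter over stored transfer records followed by a sort by time. The sketch below is a simplified illustration: the `Transfer` record, the relative `tolerance` on estimated size, and the function name are assumptions, and a real system would query a database and could weigh additional evidence.

```python
from typing import List, NamedTuple


class Transfer(NamedTuple):
    user: str
    app: str
    time: float            # seconds (hypothetical unit)
    estimated_size: float  # bytes


def build_timeline(query: Transfer, stored: List[Transfer],
                   tolerance: float) -> List[Transfer]:
    """Return stored transfers whose estimated size is within a relative
    tolerance of the queried transfer, ordered by time."""
    limit = tolerance * query.estimated_size
    matches = [t for t in stored
               if t != query and abs(t.estimated_size - query.estimated_size) <= limit]
    return sorted(matches, key=lambda t: t.time)


# Made-up records spanning several applications.
stored = [
    Transfer("u1", "appA", 10.0, 2_000_000),
    Transfer("u2", "appB", 50.0, 2_005_000),
    Transfer("u3", "appA", 30.0, 9_000_000),
    Transfer("u4", "appC", 5.0, 1_998_000),
]
query = Transfer("u5", "appB", 60.0, 2_001_000)
timeline = build_timeline(query, stored, tolerance=0.01)
```

Here the timeline contains the transfers by u4, u1, and u2, in that order, across three different applications; the 9 MB transfer by u3 is excluded.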
There is further provided, in accordance with some embodiments of the present disclosure, a method including computing a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The method further includes, based on the measure of similarity, positing that the first encrypted file content and the second encrypted file content represent the same file, and in response to the positing, generating an output.
There is further provided, in accordance with some embodiments of the present disclosure, a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored. The instructions, when read by a processor, cause the processor to compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network and (ii) a second estimated size of second encrypted file content transferred over the network. The instructions further cause the processor to posit, based on the measure of similarity, that the first encrypted file content and the second encrypted file content represent the same file, and to generate an output in response to the positing.
The present disclosure will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic illustration of a system for monitoring communication exchanged over a network, in accordance with some embodiments of the present disclosure;
Fig. 2 is a schematic illustration of a series of file exchanges that may be identified in accordance with some embodiments of the present disclosure;
Fig. 3 is a schematic illustration of a method for identifying file transfers, in accordance with some embodiments of the present disclosure;
Fig. 4 is a flow diagram for an algorithm for maintaining a file-transfer database, in accordance with some embodiments of the present disclosure;
Fig. 5 is a flow diagram for an algorithm for maintaining a relationship database, in accordance with some embodiments of the present disclosure; and
Figs. 6-7 are flow diagrams for algorithms for handling queries, in accordance with some embodiments of the present disclosure.

Claims (28)

270,391/ CLAIMS
1. A system, comprising: a peripheral device; and a processor, configured to: compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network, via a first application on a first user device, over a first connection between the first user device and a first application server servicing the first application, and (ii) a second estimated size of second encrypted file content transferred over the network, via a second application on a second user device, over a second connection between the second user device and a second application server servicing the second application, based on the measure of similarity, posit that the first encrypted file content and the second encrypted file content represent the same file, and in response to the positing, generate an output to the peripheral device.
2. The system according to claim 1, wherein the peripheral device is selected from the group of devices consisting of: a display, and a data storage.
3. The system according to claim 1, wherein the first estimated size and the second estimated size are expressed as respective probability distributions.
4. The system according to claim 1, wherein the first encrypted file content was uploaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time subsequent to the first time, wherein the processor is further configured to compute a difference between the second time and the first time, and wherein the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
5. The system according to claim 4, wherein the output indicates that the first user device communicated the file to the second user device.
6. The system according to claim 4, wherein the processor is configured to generate the output by increasing a relatedness score between a first user of the first user device and a second user of the second user device.
7. The system according to claim 1, wherein the first encrypted file content was downloaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time subsequent to the first time, wherein the processor is configured to generate the output in response to a metadata link having been communicated by the first user device between the first time and the second time, and wherein the output indicates that the metadata link pointed to the file and was communicated to the second user device.
8. The system according to claim 1, wherein the output indicates that a first user communicated the file to a second user, and wherein the processor is configured to generate the output in response to a relatedness score between the first user and the second user.
9. The system according to claim 1, wherein the processor is further configured to identify a frequency with which files having the first estimated size are communicated over the network, and wherein the processor is configured to posit that the first encrypted file content and the second encrypted file content represent the same file with a likelihood that decreases with the frequency.
10. The system according to claim 1, wherein the first encrypted file content was downloaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time, wherein the processor is further configured to compute a difference between the first time and the second time, and wherein the processor is configured to generate the output responsively to the difference being less than a predefined threshold.
11. The system according to claim 1, wherein the processor is further configured to: receive a query specifying a second-file-content transfer of the second encrypted file content, and identify a first-file-content transfer of the first encrypted file content in response to the query, wherein the processor is configured to compute the measure of similarity in response to identifying the first-file-content transfer, and wherein the output includes parameters of the first-file-content transfer.
12. The system according to claim 11, wherein the second-file-content transfer was performed using a class of applications, and wherein the processor is configured to identify the first-file-content transfer of the first encrypted file content by: retrieving, from a database, multiple other-file-content transfers of other encrypted file content, which were performed using the class of applications, and identifying the first-file-content transfer from among the other-file-content transfers.
13. The system according to claim 11, wherein the processor is further configured to: identify multiple other-file-content transfers of other encrypted file content in response to the query, and posit that the file was transferred in each of the other-file-content transfers, wherein the processor is configured to generate the output by outputting a timeline of the other-file-content transfers and the first-file-content transfer.
14. The system according to claim 13, wherein at least some of the other-file-content transfers were performed using different respective applications.
15. A method, comprising: computing a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network, via a first application on a first user device, over a first connection between the first user device and a first application server servicing the first application, and (ii) a second estimated size of second encrypted file content transferred over the network, via a second application on a second user device, over a second connection between the second user device and a second application server servicing the second application; based on the measure of similarity, positing that the first encrypted file content and the second encrypted file content represent the same file; and in response to the positing, generating an output.
16. The method according to claim 15, wherein the first estimated size and the second estimated size are expressed as respective probability distributions.
17. The method according to claim 15, wherein the first encrypted file content was uploaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time subsequent to the first time, wherein the method further comprises computing a difference between the second time and the first time, and wherein generating the output comprises generating the output responsively to the difference being less than a predefined threshold.
18. The method according to claim 17, wherein the output indicates that the first user device communicated the file to the second user device.
19. The method according to claim 17, wherein generating the output comprises increasing a relatedness score between a first user of the first user device and a second user of the second user device.
20. The method according to claim 15, wherein the first encrypted file content was downloaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time subsequent to the first time, wherein generating the output comprises generating the output in response to a metadata link having been communicated by the first user device between the first time and the second time, and wherein the output indicates that the metadata link pointed to the file and was communicated to the second user device.
21. The method according to claim 15, wherein the output indicates that a first user communicated the file to a second user, and wherein generating the output comprises generating the output in response to a relatedness score between the first user and the second user.
22. The method according to claim 15, further comprising identifying a frequency with which files having the first estimated size are communicated over the network, wherein the positing comprises positing that the first encrypted file content and the second encrypted file content represent the same file with a likelihood that decreases with the frequency.
23. The method according to claim 15, wherein the first encrypted file content was downloaded by the first user device at a first time, wherein the second encrypted file content was downloaded by the second user device at a second time, wherein the method further comprises computing a difference between the first time and the second time, and wherein generating the output comprises generating the output responsively to the difference being less than a predefined threshold.
24. The method according to claim 15, further comprising: receiving a query specifying a second-file-content transfer of the second encrypted file content; and identifying a first-file-content transfer of the first encrypted file content in response to the query, wherein computing the measure of similarity comprises computing the measure of similarity in response to identifying the first-file-content transfer, and wherein the output includes parameters of the first-file-content transfer.
25. The method according to claim 24, wherein the second-file-content transfer was performed using a class of applications, and wherein identifying the first-file-content transfer of the first encrypted file content comprises: retrieving, from a database, multiple other-file-content transfers of other encrypted file content, which were performed using the class of applications, and identifying the first-file-content transfer from among the other-file-content transfers.
26. The method according to claim 24, further comprising: identifying multiple other-file-content transfers of other encrypted file content in response to the query; and positing that the file was transferred in each of the other-file-content transfers, wherein generating the output comprises generating the output by outputting a timeline of the other-file-content transfers and the first-file-content transfer.
27. The method according to claim 26, wherein at least some of the other-file-content transfers were performed using different respective applications.
28. A computer software product comprising a tangible non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a processor, cause the processor to: compute a measure of similarity between (i) a first estimated size of first encrypted file content transferred over a network, via a first application on a first user device, over a first connection between the first user device and a first application server servicing the first application, and (ii) a second estimated size of second encrypted file content transferred over the network, via a second application on a second user device, over a second connection between the second user device and a second application server servicing the second application, based on the measure of similarity, posit that the first encrypted file content and the second encrypted file content represent the same file, and in response to the positing, generate an output.
IL270391A 2019-11-03 2019-11-03 System and method for identifying exchanges of files IL270391B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
IL270391A IL270391B (en) 2019-11-03 2019-11-03 System and method for identifying exchanges of files
EP20803657.4A EP4046337A1 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic
PCT/IB2020/060102 WO2021084439A1 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic
US17/082,152 US11399016B2 (en) 2019-11-03 2020-10-28 System and method for identifying exchanges of encrypted communication traffic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IL270391A IL270391B (en) 2019-11-03 2019-11-03 System and method for identifying exchanges of files

Publications (2)

Publication Number Publication Date
IL270391A IL270391A (en) 2021-05-31
IL270391B IL270391B (en) 2022-08-01

Family

ID=76584261

Family Applications (1)

Application Number Title Priority Date Filing Date
IL270391A IL270391B (en) 2019-11-03 2019-11-03 System and method for identifying exchanges of files

Country Status (1)

Country Link
IL (1) IL270391B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120331556A1 (en) * 2011-06-27 2012-12-27 Dmitri Alperovitch System and method for protocol fingerprinting and reputation correlation
US20180316638A1 (en) * 2017-04-30 2018-11-01 Verint Systems Ltd. System and method for identifying relationships between users of computer applications
JP2019079280A (en) * 2017-10-25 2019-05-23 富士ゼロックス株式会社 File verification device, file transfer system and program

Also Published As

Publication number Publication date
IL270391A (en) 2021-05-31
