IL285479B2 - System and method for using a user-action log to learn to classify encrypted traffic - Google Patents

System and method for using a user-action log to learn to classify encrypted traffic

Info

Publication number
IL285479B2
Authority
IL
Israel
Prior art keywords
action
blocks
actions
classifier
time
Prior art date
Application number
IL285479A
Other languages
Hebrew (he)
Other versions
IL285479B1 (en)
IL285479A (en)
Original Assignee
Cognyte Tech Israel Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cognyte Tech Israel Ltd
Priority to IL285479A
Publication of IL285479A
Publication of IL285479B1
Publication of IL285479B2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/30 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/30 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L63/306 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Input From Keyboards Or The Like (AREA)

Description

1011-1143.1
SYSTEM AND METHOD FOR USING A USER-ACTION LOG TO LEARN TO CLASSIFY ENCRYPTED TRAFFIC
FIELD OF THE DISCLOSURE
The present disclosure is related to the monitoring of encrypted communication over communication networks, and to the application of machine-learning techniques to facilitate such monitoring.
BACKGROUND OF THE DISCLOSURE
Many applications, such as Gmail, Facebook, Twitter, and Instagram, use an encrypted protocol, such as the Secure Sockets Layer (SSL) protocol or the Transport Layer Security (TLS) protocol. An application that uses an encrypted protocol generates encrypted traffic when a user uses the application to perform a user action.
In some cases, marketing personnel may wish to learn more about a user’s online activities, in order to provide the user with relevant marketing material that is tailored to the user's behavioral and demographic profile. However, if the user’s traffic is mostly encrypted, it may be difficult to learn anything about the user’s online activities.
Conti, Mauro, et al. "Can't you hear me knocking: Identification of user actions on Android apps via traffic analysis," Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, ACM, 2015, describes an investigation into the extent to which it is feasible to identify the specific actions that a user is performing on mobile apps, by eavesdropping on their encrypted network traffic.
Saltaformaggio, Brendan, et al. "Eavesdropping on fine-grained user activities within smartphone apps over encrypted network traffic," Proc. USENIX Workshop on Offensive Technologies, 2016, demonstrates that a passive eavesdropper is capable of identifying fine-grained user activities within the wireless network traffic generated by apps. The paper presents a technique, called NetScope, that is based on the intuition that the highly specific implementation of each app leaves a fingerprint on its traffic behavior (e.g., transfer rates, packet exchanges, and data movement). By learning the subtle traffic behavioral differences between activities (e.g., "browsing" versus "chatting" in a dating app), NetScope is able to perform robust inference of users’ activities, for both Android and iOS devices, based solely on inspecting IP headers.
Grolman, Edita, et al., "Transfer Learning for User Action Identification in Mobile Apps via Encrypted Traffic Analysis," IEEE Intelligent Systems (2018), describes an approach for inferring user actions performed in mobile apps by analyzing the resulting encrypted network traffic. The approach generalizes across different app versions, mobile operating systems, and device models, collectively referred to as configurations. The different configurations are treated as a case for transfer learning, and the co-training method is adapted to support the transfer learning process. The approach leverages a small number of labeled instances of encrypted traffic from a source configuration, in order to construct a classifier capable of identifying a user’s actions in a different (target) configuration which is completely unlabeled.
Hanneke, Steve, et al., "Iterative Labeling for Semi-Supervised Learning," University of Illinois, 2004, proposes a unified perspective of a large family of semi-supervised learning algorithms, which select and label unlabeled data in an iterative process.
SUMMARY OF THE DISCLOSURE
There is provided, in accordance with some embodiments of the present disclosure, a system that includes a communication interface and a processor. The processor is configured to obtain a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed. The processor is further configured to, using the communication interface, obtain a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server. The processor is further configured to, based on the receipt times, define multiple non-overlapping blocks of consecutive ones of the packets. The processor is further configured to identify a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and, based on the identified correspondence, train a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
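As an illustration of defining non-overlapping blocks of consecutive packets from receipt times, the following minimal Python sketch starts a new block whenever the gap between successive packets exceeds a threshold. The gap-based rule, the `Packet` type, and the `max_gap` value are assumptions made for this example; the disclosure does not specify a particular aggregation rule at this point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    receipt_time: float  # seconds, as recorded by the monitoring point
    size: int            # bytes (metadata only, no payload)


def define_blocks(packets: List[Packet], max_gap: float = 0.5) -> List[List[Packet]]:
    """Split packets into non-overlapping blocks of consecutive packets.

    Assumed rule: a new block starts whenever the time since the previous
    packet exceeds max_gap seconds.
    """
    blocks: List[List[Packet]] = []
    for pkt in sorted(packets, key=lambda p: p.receipt_time):
        if blocks and pkt.receipt_time - blocks[-1][-1].receipt_time <= max_gap:
            blocks[-1].append(pkt)   # continue the current block
        else:
            blocks.append([pkt])     # gap too large: open a new block
    return blocks
```

Because every packet is appended to exactly one block, the resulting blocks cannot overlap.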
In some embodiments, the processor is configured to identify the correspondence and train the classifier by iteratively (i) using the classifier to select additional ones of the corresponding blocks, and augmenting a training set with the additional corresponding blocks, and (ii) using the augmented training set, retraining the classifier.
In some embodiments, the processor is configured to select the additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set: identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action, and using the classifier to select one of the candidate blocks as the block that corresponds to the action.
In some embodiments, the processor is configured to identify the candidate blocks by: defining a window of time that includes the action time of the action, and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
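The window-based identification of candidate blocks might be sketched as follows. The function name, the asymmetric window, and the default widths `before` and `after` are hypothetical choices for illustration; the text only requires that the window include the action time.

```python
from typing import List, Sequence


def candidate_blocks(block_start_times: Sequence[float], action_time: float,
                     before: float = 0.5, after: float = 2.0) -> List[int]:
    """Return indices of blocks that begin inside the window around an action.

    block_start_times holds the earliest receipt time of each block.
    The window is [action_time - before, action_time + after] (assumed widths).
    """
    lo, hi = action_time - before, action_time + after
    return [i for i, start in enumerate(block_start_times) if lo <= start <= hi]
```

For example, with block start times `[1.0, 9.8, 10.5, 11.9, 13.0]` and an action at `10.0`, the blocks at indices 1, 2, and 3 begin inside the window and become candidates.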
In some embodiments, the processor is configured to use the classifier to select one of the candidate blocks by: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
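A minimal sketch of this selection step is shown below. The `confidence` callable stands in for whatever scoring the trained classifier exposes (for instance, a per-class probability); its signature is an assumption for this example.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Block = TypeVar("Block")


def select_block(candidates: Sequence[Block], action_type: str,
                 confidence: Callable[[Block, str], float]
                 ) -> Optional[Tuple[int, float]]:
    """Return (index, confidence) of the candidate most likely to be of
    action_type, or None if there are no candidates."""
    if not candidates:
        return None
    scores = [confidence(block, action_type) for block in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```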
In some embodiments, the processor is configured to select the candidate block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and the processor is further configured to iteratively lower the level-of-confidence threshold when iteratively augmenting the training set.
In some embodiments, the processor is further configured to add the other candidate blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
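The iterative augmentation described above, with a threshold that is lowered on each pass and with rejected candidates labeled as corresponding to no action, can be sketched roughly as follows. This is an illustrative loop under stated assumptions, not the disclosed implementation: `fit` is a hypothetical callable that trains on `(block, label)` pairs and returns a confidence function, the threshold schedule is arbitrary, and `NO_ACTION` is a made-up label name.

```python
from typing import Any, Callable, List, Sequence, Tuple

NO_ACTION = "no-action"  # hypothetical label for blocks that match no action

Example = Tuple[Any, str]                  # (block, label)
Pending = Tuple[str, List[Any]]            # (action type, candidate blocks)
Confidence = Callable[[Any, str], float]   # (block, action type) -> score


def augment_iteratively(training_set: Sequence[Example],
                        pending: Sequence[Pending],
                        fit: Callable[[List[Example]], Confidence],
                        thresholds: Sequence[float] = (0.9, 0.8, 0.7)):
    """Grow the training set over several passes, lowering the threshold each pass."""
    training_set, pending = list(training_set), list(pending)
    confidence = fit(training_set)
    for threshold in thresholds:
        still_pending = []
        for action_type, candidates in pending:
            scores = [confidence(b, action_type) for b in candidates]
            best = max(range(len(scores)), key=scores.__getitem__, default=None)
            if best is None or scores[best] <= threshold:
                still_pending.append((action_type, candidates))  # retry later
                continue
            # Accept the best candidate; label the others as no-action blocks.
            for i, block in enumerate(candidates):
                training_set.append((block, action_type if i == best else NO_ACTION))
        pending = still_pending
        confidence = fit(training_set)  # retrain on the augmented set
    return confidence, training_set, pending
```

Starting with a strict threshold means that early additions are the ones the classifier is most sure about, so later, less certain decisions are made by a classifier trained on more data.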
In some embodiments, the processor is further configured to cause the user actions to be performed automatically.
In some embodiments, content of the packets is encrypted, and the properties of the packets do not include any of the encrypted content.
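To illustrate how a block might be described without touching encrypted content, the sketch below derives a small feature vector from packet metadata only. The `PacketMeta` fields and the particular features chosen (packet count, bytes per direction, duration) are assumptions for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PacketMeta:
    receipt_time: float  # seconds
    size: int            # bytes
    outbound: bool       # True if sent from the application toward the server


def block_features(block: Sequence[PacketMeta]) -> List[float]:
    """Summarise a non-empty block using packet metadata only (no payload)."""
    times = [p.receipt_time for p in block]
    sent = sum(p.size for p in block if p.outbound)
    received = sum(p.size for p in block if not p.outbound)
    # [packet count, bytes sent, bytes received, block duration in seconds]
    return [float(len(block)), float(sent), float(received), max(times) - min(times)]
```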
In some embodiments, the processor is further configured to, prior to identifying the correspondence between the actions and the respective corresponding ones of the blocks, inflate the action times.
In some embodiments, the processor is configured to inflate the action times by, for each unique action type: computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block, computing a median of the estimated communication delays, and adding the median to the respective action times of the subgroup.
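The inflation procedure can be sketched in Python as follows. The data layout (actions as `(action_type, action_time)` pairs, blocks represented by their earliest receipt times) and the handling of an action with no following block (it contributes no delay) are assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def inflate_action_times(actions: Sequence[Tuple[str, float]],
                         block_starts: Sequence[float]) -> List[Tuple[str, float]]:
    """Shift each action time by the median estimated delay for its action type.

    actions: (action_type, action_time) pairs.
    block_starts: earliest receipt time of each block.
    """
    starts = sorted(block_starts)
    delays: Dict[str, List[float]] = defaultdict(list)
    for action_type, t in actions:
        i = bisect_right(starts, t)  # nearest block starting strictly after t
        if i < len(starts):
            delays[action_type].append(starts[i] - t)  # estimated delay
    shift = {a: median(d) for a, d in delays.items()}
    return [(a, t + shift.get(a, 0.0)) for a, t in actions]
```

For three "post" actions at 10.0, 20.0, and 30.0 with following blocks starting at 10.5, 21.0, and 32.0, the estimated delays are 0.5, 1.0, and 2.0, the median is 1.0, and the inflated times are 11.0, 21.0, and 31.0. Using the median keeps a single unusually late block from skewing the shift.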
In some embodiments, the processor is further configured to: repeatedly define the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules, and select a best-performing one of the multiple classifiers for use.
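Selecting among classifiers trained under different packet-aggregation rules might look like the sketch below. The `train_and_score` callable is hypothetical: it is assumed to define the blocks under the given rules, train a classifier, and return it with a single performance score (for example, accuracy on held-out data). How performance is measured is an assumption here.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Rules = TypeVar("Rules")
Model = TypeVar("Model")


def select_best(rule_sets: Iterable[Rules],
                train_and_score: Callable[[Rules], Tuple[Model, float]]
                ) -> Tuple[Rules, Model, float]:
    """Train one classifier per rule set and return the best-scoring one."""
    best = None
    for rules in rule_sets:
        model, score = train_and_score(rules)
        if best is None or score > best[2]:
            best = (rules, model, score)
    if best is None:
        raise ValueError("no rule sets supplied")
    return best
```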
There is further provided, in accordance with some embodiments of the present disclosure, a method that includes obtaining a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed. The method further includes obtaining a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server. The method further includes, based on the receipt times, defining multiple non-overlapping blocks of consecutive ones of the packets. The method further includes identifying a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and, based on the identified correspondence, training a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
The present disclosure will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic illustration of a system for training a classifier to classify encrypted network traffic, in accordance with some embodiments of the present disclosure;
Fig. 2 is a schematic illustration of an example network-traffic report, in accordance with some embodiments of the present disclosure;
Fig. 3 is a flow diagram for a method for preprocessing a user-action log, in accordance with some embodiments of the present disclosure;
Fig. 4 is a flow diagram for a method for training multiple classifiers, in accordance with some embodiments of the present disclosure;
Fig. 5 is a flow diagram for a method for training a classifier, in accordance with some embodiments of the present disclosure; and
Fig. 6 pictorially illustrates various aspects of training a classifier, in accordance with some embodiments of the present disclosure.

Claims (14)

1. A system, comprising: a communication interface; and a processor, configured to: obtain a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed, using the communication interface, obtain a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server, based on the receipt times, define multiple non-overlapping blocks of consecutive ones of the packets, inflate the action times, by, for each unique action type, computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block, computing a median of the estimated communication delays, and adding the median to the respective action times of the subgroup; identify a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and based on the identified correspondence, train a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
2. The system according to claim 1, wherein the processor is configured to identify the correspondence and train the classifier by iteratively (i) using the classifier to select additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set; identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action, and using the classifier to select one of the candidate blocks as the block that corresponds to the action, and (ii) augmenting a training set with the additional corresponding blocks, and (iii) using the augmented training set, retraining the classifier.
3. The system according to claim 2, wherein the processor is configured to identify the candidate blocks by: defining a window of time that includes the action time of the action, and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
4. The system according to claim 2, wherein the processor is configured to use the classifier to select one of the candidate blocks by: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
5. The system according to claim 4, wherein the processor is configured to select the candidate block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and wherein the processor is further configured to iteratively lower the level-of-confidence threshold when iteratively augmenting the training set.
6. The system according to claim 4, wherein the processor is further configured to add the other candidate blocks as no-action blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
7. The system according to claim 1, wherein the processor is further configured to: repeatedly define the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules, and select a best-performing one of the multiple classifiers for use.
8. A method, comprising: obtaining a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed; obtaining a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server; based on the receipt times, defining multiple non-overlapping blocks of consecutive ones of the packets; inflating the action times by computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block; computing a median of the estimated communication delays; and adding the median to the respective action times of the subgroup; identifying a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times; and based on the identified correspondence, training a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
9. The method according to claim 8, wherein identifying the correspondence and training the classifier comprises iteratively (i) using the classifier to select additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set; identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action; and using the classifier to select one of the candidate blocks as the block that corresponds to the action; (ii) augmenting a training set with the additional corresponding blocks, and (iii) using the augmented training set, retraining the classifier.
10. The method according to claim 9, wherein identifying the candidate blocks comprises: defining a window of time that includes the action time of the action; and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
11. The method according to claim 9, wherein using the classifier to select one of the candidate blocks comprises: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
12. The method according to claim 11, wherein selecting the candidate block whose level of confidence is highest comprises selecting the block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and wherein iteratively augmenting the training set further comprises iteratively lowering the level-of-confidence threshold.
13. The method according to claim 11, wherein iteratively augmenting the training set further comprises adding the other candidate blocks as no-action blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
14. The method according to claim 8, further comprising: repeatedly defining the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules; and selecting a best-performing one of the multiple classifiers for use.

Priority Applications (1)

Application Number Priority Date Filing Date Title
IL285479A IL285479B2 (en) 2021-08-09 2021-08-09 System and method for using a user-action log to learn to classify encrypted traffic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IL285479A IL285479B2 (en) 2021-08-09 2021-08-09 System and method for using a user-action log to learn to classify encrypted traffic

Publications (3)

Publication Number Publication Date
IL285479A IL285479A (en) 2021-09-30
IL285479B1 IL285479B1 (en) 2023-04-01
IL285479B2 IL285479B2 (en) 2023-08-01

Family

ID=77989493

Family Applications (1)

Application Number Title Priority Date Filing Date
IL285479A IL285479B2 (en) 2021-08-09 2021-08-09 System and method for using a user-action log to learn to classify encrypted traffic

Country Status (1)

Country Link
IL (1) IL285479B2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140321290A1 (en) * 2013-04-30 2014-10-30 Hewlett-Packard Development Company, L.P. Management of classification frameworks to identify applications
IL248306B (en) * 2016-10-10 2019-12-31 Verint Systems Ltd System and method for generating data sets for learning to identify user actions

Also Published As

Publication number Publication date
IL285479B1 (en) 2023-04-01
IL285479A (en) 2021-09-30
