IL285479B2 - System and method for using a user-action log to learn to classify encrypted traffic - Google Patents
System and method for using a user-action log to learn to classify encrypted traffic
- Publication number
- IL285479B2
- Authority
- IL
- Israel
- Prior art keywords
- action
- blocks
- actions
- classifier
- time
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/30—Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/30—Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
- H04L63/306—Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Technology Law (AREA)
- Electrically Operated Instructional Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Input From Keyboards Or The Like (AREA)
Description
SYSTEM AND METHOD FOR USING A USER-ACTION LOG TO LEARN TO CLASSIFY ENCRYPTED TRAFFIC
FIELD OF THE DISCLOSURE
The present disclosure is related to the monitoring of encrypted communication over communication networks, and to the application of machine-learning techniques to facilitate such monitoring.
BACKGROUND OF THE DISCLOSURE
Many applications, such as Gmail, Facebook, Twitter, and Instagram, use an encrypted protocol, such as the Secure Sockets Layer (SSL) protocol or the Transport Layer Security (TLS) protocol. When a user uses such an application to perform a user action, the application generates encrypted traffic.
In some cases, marketing personnel may wish to learn more about a user’s online activities, in order to provide the user with relevant marketing material that is tailored to the user's behavioral and demographic profile. However, if the user’s traffic is mostly encrypted, it may be difficult to learn anything about the user’s online activities.
Conti, Mauro, et al. "Can't you hear me knocking: Identification of user actions on Android apps via traffic analysis," Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, ACM, 2015, investigates the extent to which it is feasible to identify the specific actions that a user is performing on mobile apps, by eavesdropping on their encrypted network traffic.
Saltaformaggio, Brendan, et al. "Eavesdropping on fine-grained user activities within smartphone apps over encrypted network traffic," Proc. USENIX Workshop on Offensive Technologies, 2016, demonstrates that a passive eavesdropper is capable of identifying fine-grained user activities within the wireless network traffic generated by apps. The paper presents a technique, called NetScope, that is based on the intuition that the highly specific implementation of each app leaves a fingerprint on its traffic behavior (e.g., transfer rates, packet exchanges, and data movement). By learning the subtle traffic behavioral differences between activities (e.g., "browsing" versus "chatting" in a dating app), NetScope is able to perform robust inference of users’ activities, for both Android and iOS devices, based solely on inspecting IP headers.
Grolman, Edita, et al., "Transfer Learning for User Action Identification in Mobile Apps via Encrypted Traffic Analysis," IEEE Intelligent Systems (2018), describes an approach for inferring user actions performed in mobile apps by analyzing the resulting encrypted network traffic. The approach generalizes across different app versions, mobile operating systems, and device models, collectively referred to as configurations. The different configurations are treated as a case for transfer learning, and the co-training method is adapted to support the transfer learning process. The approach leverages a small number of labeled instances of encrypted traffic from a source configuration, in order to construct a classifier capable of identifying a user’s actions in a different (target) configuration which is completely unlabeled.
Hanneke, Steve, et al., Iterative Labeling for Semi-Supervised Learning, University of Illinois, 2004, proposes a unified perspective of a large family of semi-supervised learning algorithms, which select and label unlabeled data in an iterative process.
SUMMARY OF THE DISCLOSURE
There is provided, in accordance with some embodiments of the present disclosure, a system that includes a communication interface and a processor. The processor is configured to obtain a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed. The processor is further configured to, using the communication interface, obtain a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server. The processor is further configured to, based on the receipt times, define multiple non-overlapping blocks of consecutive ones of the packets. The processor is further configured to identify a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and, based on the identified correspondence, train a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
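As a minimal sketch of the block-definition step, the following Python groups packets into non-overlapping blocks of consecutive packets using only their receipt times. The gap-based rule, the `max_gap` parameter, and the `Packet` fields are illustrative assumptions; the disclosure states only that the blocks are defined based on the receipt times.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    receipt_time: float  # seconds, as recorded at the monitoring point
    size: int            # bytes; the encrypted payload itself is not needed
    outbound: bool       # True if travelling from the application to the server


def define_blocks(packets: List[Packet], max_gap: float = 1.0) -> List[List[Packet]]:
    """Split packets into non-overlapping blocks of consecutive packets.

    Assumed rule: a new block starts whenever the gap between two
    consecutive receipt times exceeds max_gap seconds.
    """
    blocks: List[List[Packet]] = []
    for packet in sorted(packets, key=lambda p: p.receipt_time):
        if blocks and packet.receipt_time - blocks[-1][-1].receipt_time <= max_gap:
            blocks[-1].append(packet)   # continue the current block
        else:
            blocks.append([packet])     # gap too large: open a new block
    return blocks


packets = [
    Packet(0.0, 100, True),
    Packet(0.2, 1500, False),
    Packet(5.0, 80, True),
    Packet(5.3, 900, False),
    Packet(5.4, 60, True),
]
blocks = define_blocks(packets)
block_sizes = [len(b) for b in blocks]
```

Because every packet lands in exactly one block and the packets are processed in time order, the resulting blocks cannot overlap.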
In some embodiments, the processor is configured to identify the correspondence and train the classifier by iteratively (i) using the classifier to select additional ones of the corresponding blocks, and augmenting a training set with the additional corresponding blocks, and (ii) using the augmented training set, retraining the classifier.
In some embodiments, the processor is configured to select the additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set: identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action, and using the classifier to select one of the candidate blocks as the block that corresponds to the action.
In some embodiments, the processor is configured to identify the candidate blocks by: defining a window of time that includes the action time of the action, and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
In some embodiments, the processor is configured to use the classifier to select one of the candidate blocks by: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
In some embodiments, the processor is configured to select the candidate block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and the processor is further configured to iteratively lower the level-of-confidence threshold when iteratively augmenting the training set.
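The window-based identification of candidate blocks can be sketched as follows. The window bounds (`before`, `after`) and the representation of each block by its earliest receipt time are assumptions made for illustration; the disclosure requires only that the window include the action time.

```python
from typing import List


def candidate_blocks(block_start_times: List[float],
                     action_time: float,
                     before: float = 0.5,
                     after: float = 2.0) -> List[int]:
    """Return indices of blocks that begin inside a window around action_time.

    block_start_times[i] is the earliest receipt time of block i.
    The window [action_time - before, action_time + after] is a
    hypothetical choice.
    """
    window_start = action_time - before
    window_end = action_time + after
    return [i for i, start in enumerate(block_start_times)
            if window_start <= start <= window_end]


starts = [0.0, 4.8, 5.9, 9.0]
cands = candidate_blocks(starts, action_time=5.0)
no_cands = candidate_blocks(starts, action_time=20.0)
```

An action whose window contains no block start simply yields no candidates and can be revisited in a later iteration.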
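The selection rule with a decreasing threshold can be sketched as below. The confidence values are placeholders standing in for classifier outputs (for example, predicted class probabilities, which is an assumption), and the threshold schedule is hypothetical.

```python
from typing import Dict, Optional


def select_block(confidences: Dict[int, float], threshold: float) -> Optional[int]:
    """Pick the candidate block with the highest confidence.

    confidences maps a candidate block index to the classifier's level of
    confidence that the block matches the action's type. The block is
    returned only if its confidence exceeds the threshold.
    """
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] > threshold else None


# Placeholder confidences for two candidate blocks of one action.
conf = {1: 0.35, 2: 0.72}

# Lowering the threshold across iterations admits the block later on.
picks = [select_block(conf, t) for t in (0.9, 0.8, 0.7)]
```

Starting with a strict threshold and relaxing it means that the earliest additions to the training set are the ones the classifier is most sure about.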
In some embodiments, the processor is further configured to add the other candidate blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
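A small sketch of this labeling step: once one candidate has been selected for an action, the remaining candidates are added as negative examples. The label string `"no_action"` and the tuple layout are illustrative assumptions.

```python
from typing import List, Tuple

NO_ACTION = "no_action"  # hypothetical label for blocks matching no action


def label_candidates(candidates: List[int],
                     selected: int,
                     action_type: str) -> List[Tuple[int, str]]:
    """Label the selected block with the action type and the rest as no-action."""
    return [(block, action_type if block == selected else NO_ACTION)
            for block in candidates]


new_examples = label_candidates([1, 2, 3], selected=2, action_type="post_tweet")
```

The negative examples give the classifier a way to reject background traffic that happens to occur near an action.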
In some embodiments, the processor is further configured to cause the user actions to be performed automatically.
In some embodiments, content of the packets is encrypted, and the properties of the packets do not include any of the encrypted content.
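To illustrate what content-free properties might look like, the sketch below derives features for one block from packet metadata only. The specific features (packet count, bytes per direction, duration) are assumptions; the disclosure does not enumerate the properties here.

```python
from typing import Dict, List, Tuple

# (receipt_time_seconds, size_bytes, outbound) for each packet in a block
PacketMeta = Tuple[float, int, bool]


def block_features(block: List[PacketMeta]) -> Dict[str, float]:
    """Compute features from metadata only; no payload bytes are read."""
    times = [t for t, _, _ in block]
    return {
        "num_packets": len(block),
        "bytes_out": sum(size for _, size, out in block if out),
        "bytes_in": sum(size for _, size, out in block if not out),
        "duration": max(times) - min(times),
    }


features = block_features([(5.0, 80, True), (5.3, 900, False), (5.4, 60, True)])
```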
In some embodiments, the processor is further configured to, prior to identifying the correspondence between the actions and the respective corresponding ones of the blocks, inflate the action times.
In some embodiments, the processor is configured to inflate the action times by, for each unique action type: computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block, computing a median of the estimated communication delays, and adding the median to the respective action times of the subgroup.
In some embodiments, the processor is further configured to: repeatedly define the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules, and select a best-performing one of the multiple classifiers for use.
There is further provided, in accordance with some embodiments of the present disclosure, a method that includes obtaining a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed. The method further includes obtaining a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server. The method further includes, based on the receipt times, defining multiple non-overlapping blocks of consecutive ones of the packets. The method further includes identifying a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and, based on the identified correspondence, training a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
The present disclosure will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic illustration of a system for training a classifier to classify encrypted network traffic, in accordance with some embodiments of the present disclosure;
Fig. 2 is a schematic illustration of an example network-traffic report, in accordance with some embodiments of the present disclosure;
Fig. 3 is a flow diagram for a method for preprocessing a user-action log, in accordance with some embodiments of the present disclosure;
Fig. 4 is a flow diagram for a method for training multiple classifiers, in accordance with some embodiments of the present disclosure;
Fig. 5 is a flow diagram for a method for training a classifier, in accordance with some embodiments of the present disclosure; and
Fig. 6 pictorially illustrates various aspects of training a classifier, in accordance with some embodiments of the present disclosure.
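The inflation of action times can be sketched as follows: for each action, the delay to the nearest following block is estimated, and the per-type median delay is added to every action of that type. The data layout (tuples of action type and action time, and a list of block start times) and the example values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Action = Tuple[str, float]  # (action_type, action_time_seconds)


def inflate_action_times(actions: List[Action],
                         block_starts: List[float]) -> List[Action]:
    """Shift each action time by the median estimated delay of its action type.

    block_starts holds the earliest receipt time of each block.
    """
    delays: Dict[str, List[float]] = defaultdict(list)
    for action_type, action_time in actions:
        # Blocks whose earliest receipt time follows the action time.
        following = [start for start in block_starts if start > action_time]
        if following:
            # The closest following block gives the estimated delay.
            delays[action_type].append(min(following) - action_time)

    inflated = []
    for action_type, action_time in actions:
        type_delays = delays.get(action_type)
        shift = median(type_delays) if type_delays else 0.0
        inflated.append((action_type, action_time + shift))
    return inflated


actions = [("like", 10.0), ("like", 20.0), ("like", 30.0), ("post", 40.0)]
block_starts = [10.4, 20.6, 30.5, 41.0]
inflated = inflate_action_times(actions, block_starts)
```

In this example the estimated delays for "like" are roughly 0.4, 0.6, and 0.5 seconds, so the median of 0.5 seconds is added to each "like" action. Using the median rather than the mean keeps a single badly matched block from skewing the shift.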
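Selecting among classifiers trained under different packet-aggregation rules can be sketched as a simple search. The rule sets, the `train_and_score` callable, and the scoring used in the demonstration are all stand-ins invented for illustration; in practice the score would come from evaluating each trained classifier on held-out data.

```python
from typing import Callable, Dict, Tuple


def select_best(rule_sets: Dict[str, dict],
                train_and_score: Callable[[dict], Tuple[object, float]]
                ) -> Tuple[str, object]:
    """Train one classifier per rule set and return the best-scoring one.

    train_and_score(rules) is assumed to define blocks using the given
    rules, train a classifier, and return (classifier, score).
    """
    results = {name: train_and_score(rules) for name, rules in rule_sets.items()}
    best_name = max(results, key=lambda name: results[name][1])
    return best_name, results[best_name][0]


# Hypothetical rule sets, each with a different inter-packet gap limit.
rule_sets = {
    "tight": {"max_gap": 0.2},
    "medium": {"max_gap": 1.0},
    "loose": {"max_gap": 5.0},
}


def toy_train_and_score(rules: dict) -> Tuple[object, float]:
    # Stand-in only: pretends a gap near 1.0 s works best.
    return f"classifier[max_gap={rules['max_gap']}]", -abs(rules["max_gap"] - 1.0)


best_name, best_model = select_best(rule_sets, toy_train_and_score)
```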
Claims (14)
1. A system, comprising: a communication interface; and a processor, configured to: obtain a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed, using the communication interface, obtain a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server, based on the receipt times, define multiple non-overlapping blocks of consecutive ones of the packets, inflate the action times, by, for each unique action type, computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block, computing a median of the estimated communication delays, and adding the median to the respective action times of the subgroup; identify a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times, and based on the identified correspondence, train a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
2. The system according to claim 1, wherein the processor is configured to identify the correspondence and train the classifier by iteratively (i) using the classifier to select additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set; identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action, and using the classifier to select one of the candidate blocks as the block that corresponds to the action, and (ii) augmenting a training set with the additional corresponding blocks, and (iii) using the augmented training set, retraining the classifier.
3. The system according to claim 2, wherein the processor is configured to identify the candidate blocks by: defining a window of time that includes the action time of the action, and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
4. The system according to claim 2, wherein the processor is configured to use the classifier to select one of the candidate blocks by: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
5. The system according to claim 4, wherein the processor is configured to select the candidate block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and wherein the processor is further configured to iteratively lower the level-of-confidence threshold when iteratively augmenting the training set.
6. The system according to claim 4, wherein the processor is further configured to add the other candidate blocks as no-action blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
7. The system according to claim 1, wherein the processor is further configured to: repeatedly define the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules, and select a best-performing one of the multiple classifiers for use.
8. A method, comprising: obtaining a user-action log that specifies (i) a series of actions, of respective action types, performed using an application, and (ii) respective action times at which the actions were performed; obtaining a network-traffic report that specifies properties of a plurality of packets that were exchanged, while the series of actions were performed, between the application and a server for the application, the properties including respective receipt times at which the packets were received while en route between the application and the server; based on the receipt times, defining multiple non-overlapping blocks of consecutive ones of the packets; inflating the action times by computing, for a subgroup of the actions that are of the unique action type, respective estimated communication delays, by, for each action in the subgroup: identifying a block whose earliest receipt time follows the action time of the action and is closest to the action time of the action, relative to the other blocks, and computing the estimated communication delay for the action, by subtracting the action time of the action from the earliest receipt time of the identified block; computing a median of the estimated communication delays; and adding the median to the respective action times of the subgroup; identifying a correspondence between the actions and respective corresponding ones of the blocks, by correlating between the action times and the receipt times; and based on the identified correspondence, training a classifier to associate other blocks of packets with respective ones of the action types based on the properties of the other blocks.
9. The method according to claim 8, wherein identifying the correspondence and training the classifier comprises iteratively (i) using the classifier to select additional ones of the corresponding blocks by, for each action in a subset of the actions that do not yet belong to the training set; identifying one or more candidate blocks whose respective earliest receipt times correspond to the action time of the action; and using the classifier to select one of the candidate blocks as the block that corresponds to the action; (ii) augmenting a training set with the additional corresponding blocks, and (iii) using the augmented training set, retraining the classifier.
10. The method according to claim 9, wherein identifying the candidate blocks comprises: defining a window of time that includes the action time of the action; and identifying the candidate blocks in response to the candidate blocks beginning in the window of time.
11. The method according to claim 9, wherein using the classifier to select one of the candidate blocks comprises: using the classifier, computing respective levels of confidence for the candidate blocks being associated with the action type of the action, and selecting the candidate block whose level of confidence is highest, relative to the other candidate blocks.
12. The method according to claim 11, wherein selecting the candidate block whose level of confidence is highest comprises selecting the block whose level of confidence is highest provided that the highest level of confidence is greater than a level-of-confidence threshold, and wherein iteratively augmenting the training set further comprises iteratively lowering the level-of-confidence threshold.
13. The method according to claim 11, wherein iteratively augmenting the training set further comprises adding the other candidate blocks as no-action blocks, with respective labels indicating that the other candidate blocks do not correspond to any of the actions, to the training set.
14. The method according to claim 8, further comprising: repeatedly defining the blocks based on different respective sets of packet-aggregation rules, such that multiple classifiers are trained for the different respective sets of packet-aggregation rules; and selecting a best-performing one of the multiple classifiers for use.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL285479A IL285479B2 (en) | 2021-08-09 | 2021-08-09 | System and method for using a user-action log to learn to classify encrypted traffic |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL285479A IL285479B2 (en) | 2021-08-09 | 2021-08-09 | System and method for using a user-action log to learn to classify encrypted traffic |
Publications (3)
Publication Number | Publication Date |
---|---|
IL285479A (en) | 2021-09-30 |
IL285479B1 (en) | 2023-04-01 |
IL285479B2 (en) | 2023-08-01 |
Family
ID=77989493
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL285479A IL285479B2 (en) | 2021-08-09 | 2021-08-09 | System and method for using a user-action log to learn to classify encrypted traffic |
Country Status (1)
Country | Link |
---|---|
IL (1) | IL285479B2 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140321290A1 (en) * | 2013-04-30 | 2014-10-30 | Hewlett-Packard Development Company, L.P. | Management of classification frameworks to identify applications |
IL248306B (en) * | 2016-10-10 | 2019-12-31 | Verint Systems Ltd | System and method for generating data sets for learning to identify user actions |
-
2021
- 2021-08-09 IL IL285479A patent/IL285479B2/en unknown
Also Published As
Publication number | Publication date |
---|---|
IL285479B1 (en) | 2023-04-01 |
IL285479A (en) | 2021-09-30 |