WO2017115347A1 - Utilizing behavioral features to identify bot - Google Patents

Info

Publication number
WO2017115347A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavioral features
agent
page
interaction
bot
Prior art date
Application number
PCT/IL2016/051300
Other languages
French (fr)
Inventor
Yaron Oliker
Alon Dayan
Yaacov Fernandess
Original Assignee
Unbotify Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unbotify Ltd.
Priority to CN201680076727.5A (patent CN108604272A)
Priority to EP16881394.7A (patent EP3398106B1)
Publication of WO2017115347A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2133 Verifying human interaction, e.g., Captcha
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2151 Time stamp

Definitions

  • the present disclosure relates to bot detection in general, and to bot detection using behavior biometrics, in particular.
  • Bots are automatic software agents that access web pages of websites. Bots can potentially harm websites in various ways, such as data scraping, advertisement fraud, credential stuffing, Denial of Service (DoS) attacks, and more. Bots have become a substantial problem in today's World Wide Web. For example, some studies indicate that billions of dollars are stolen every year by bot fraudsters.
  • DoS Denial of Service
  • CAPTCHA Completely Automated Public Turing test to tell Computers and Humans Apart
  • a CAPTCHA is a challenge-response test used to determine whether or not the user is human.
  • a CAPTCHA presents a challenge to the agent accessing a web page.
  • the challenge may be a task to which automated tools are less likely to respond correctly, such as identifying numbers or letters in a distorted image, which may be a relatively easy task for a human and a relatively hard task for an automated agent.
  • One exemplary embodiment of the disclosed subject matter is a method comprising: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
  • the method may comprise verifying that the set of behavioral features are consistent with a human-generated interaction, and wherein in response to said verifying, performing said automatically estimating.
  • said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
  • the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
  • PIP Page Interaction Packet
  • the PIP and the PIPs are obtained from a same session.
  • the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
  • the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
  • the type of the pages is selected from the group consisting of: a search results page; an item display page; a verification page; a login page; a form page; and a checkout page.
  • the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
  • the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
  • SIP Session Interaction Packet
  • said automatically estimating comprises: obtaining a plurality of sets of behavioral features each of which is based on a Page Interaction Packet (PIP) of the agent, all of which were obtained during a same session, wherein the plurality of sets of behavioral features comprises the set of behavioral features; for each set of the plurality of sets of behavioral features, computing a statistical measurement with respect to the one or more additional sets of behavioral features; computing an aggregated statistical measurement based on the statistical measurement of each set of the plurality of sets of behavioral features; and estimating whether the agent is a bot based on the aggregated statistical measurement being above or below a threshold.
  • the set of behavioral features comprise at least one of: a speed of a movement of a pointing device; an acceleration of a movement of a pointing device; a curvature feature of a movement of a pointing device; one or more movement vectors of a pointing device; a pause pattern of a movement of a pointing device; a click pattern of one or more buttons of an input device; a time duration of the interaction; a flight time; a dwell time; an angle movement of a pointing device; acceleration measurement of a user device during the interaction; and an orientation of a user device during the interaction.
  • said obtaining comprises receiving, by a server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
  • the one or more additional sets of behavioral features were previously obtained based on interactions with the page.
  • Another exemplary embodiment of the disclosed subject matter is an apparatus having a processor, the processor being adapted to perform the steps of: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
  • said processor is further adapted to perform: verifying that the set of behavioral features are consistent with a human-generated interaction; wherein in response to said verifying, performing said automatically estimating.
  • said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
  • the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
  • the PIP and the PIPs are obtained from a same session.
  • the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
  • the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
  • the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
  • the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
  • the apparatus is a server connectable to computerized devices over a communication network, wherein said obtaining comprises receiving, by the server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
  • Yet another exemplary embodiment of the disclosed subject matter is a computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
  • Figure 1 shows an illustration of histograms representing the empirical distribution of a behavioral feature, in accordance with some exemplary embodiments of the disclosed subject matter.
  • Figures 2A-2D show flowchart diagrams of methods, in accordance with some exemplary embodiments of the disclosed subject matter.
  • Figures 3A-3B show flowchart diagrams of methods, in accordance with some exemplary embodiments of the disclosed subject matter.
  • Figure 4 shows an illustration of clusters and usage thereof, in accordance with some exemplary embodiments of the disclosed subject matter.
  • Figure 5 shows a block diagram of an apparatus, in accordance with some exemplary embodiments of the disclosed subject matter.
  • page may refer to a web page within a website; a page within a web application; a page within an application which may or may not be implemented using native language for the operating system, such as a native Android™ app, a desktop app, and a similar app; or the like.
  • a bot may perform a "replay attack" during which a recorded set of interactions originally performed by a human are injected by the bot, thereby mimicking a human behavior.
  • behavioral features alone may not be sufficient to distinguish between bot agents and human agents, as both agents generate interactions that can be characterized as human-generated.
  • Such bots may record genuine human-generated interactions and re-use them ("replay") on additional pages, thereby providing interactions which may cause one to believe that the agent is a human agent and not a bot.
  • a bot agent may use a Page Interaction Packet (PIP) generated based on a human agent's actions (i.e., "human-generated") and replay such interaction when accessing a page.
  • the PIP may comprise interaction data that an agent produces while interacting with a page.
  • the PIP may include, for example, mouse movements, keyboard strokes, touch gestures, device orientation data, or the like.
  • the PIP may be multiplied by the bot agent either as is without modification or while introducing stochastic noise to create variations on the PIP. The multiplied PIPs may then be used while interacting with a single website during a session, an operation which may be referred to as Intra-Session Replay Attack.
  • a Session Interaction Packet may comprise the set of PIPs generated during a single session.
  • a SIP may be compiled based on the multiplied PIPs, and used when the bot is accessing a website to simulate a human interaction during a session.
  • Some bot agents employ Inter-Session Replay Attacks.
  • a bot may utilize a same human-generated PIP in different sessions.
  • several human PIPs may be gathered, multiplied (with or without variations) and used in different SIPs. The different SIPs may be sent to websites in different sessions in order to fake human behavior.
  • a single session may comprise human-generated PIPs that are different from one another.
  • a same human-generated PIP (or variations thereof) may not be used in the same session more than once. However, looking at different sessions, the same human-generated PIP (or variation thereof) may be used a large number of times by the bot.
  • Some bot agents automate interaction with the page until reaching a verification page used to differentiate human agents from bot agents, such as a page comprising a CAPTCHA.
  • Verification pages may be introduced dynamically, such as based on a suspicion by the server that the agent may be a bot agent, for example, based on a volume of queries generated by the agent during a timeframe, or may be set a priori, such as in login pages, at entry points to websites or sections thereof, or the like.
  • a bot agent may detect a verification page and pass the session to a human user to overcome the verification page itself.
  • a CAPTCHA presented to the bot agent may be transferred to a CAPTCHA farm service, in which a human may be presented with the CAPTCHA and the interactions produced by the human during interaction with the verification page and the CAPTCHA component embedded therein, also referred to as Verification Interaction Packet (VIP), may be gathered.
  • the VIP may be sent by the bot agent to the verification page to overcome the CAPTCHA.
  • the bot agent may continue to traverse the website automatically, such as using other replayed human-generated interactions or using other forms of interactions.
  • the human-generated PIP may be modified to provide a desired result, such as pressing a specific link, interacting with a widget, or the like.
  • modification may not influence at all, or influence only in a negligible manner, behavioral features of the human-generated interactions.
  • One technical solution is to obtain behavioral features of the usage of an input device during an interaction of an agent with a page.
  • the behavioral features may then be used in order to estimate whether or not the agent is a bot agent.
  • the interaction is a human-generated interaction, the interaction by itself appears to be of human origin (i.e., behavioral features are consistent with a human user using the input device in such a manner).
  • Estimation of whether the agent is a bot agent may be performed by comparing the behavioral features with a corpus of behavioral features.
  • the corpus may be used to provide a baseline. A large deviation from the baseline may be indicative of the agent being a bot agent.
  • the deviation may be that the interactions are too similar to one another (e.g., indicative of a same PIP being reused).
  • the deviation may be that the interactions are too dissimilar from one another (e.g., indicative of PIPs generated by different human agents being compiled and used together by a same bot).
  • Figure 1 shows an illustration of histograms that describe the empirical distribution of a behavioral feature, where Graph 100 exemplifies a histogram that describes the average speed of a group of human agents on a certain website. Graph 110 exemplifies the same histogram on the same website, for a group of bot agents. As can be appreciated from such an example, the behavioral features may be used to differentiate between human agents and bot agents.
  • the histograms may represent any behavioral feature or statistical measurement of samples thereof, such as but not limited to average speed of the mouse movements of the agent, mouse pointer acceleration, the curvature of mouse movement, elapsed time, click patterns, pause pattern, movement vectors, average latency between the release of a keyboard key and the press of a next key (also referred to as "Flight Time"), average latency between the release of two successive keyboard keys, average latency between the press of two successive keys, and duration of a press on one key (also referred to as "Dwell Time").
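The two keyboard timings named above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `KeyEvent` record, its field names, and the millisecond timestamps are assumptions introduced here, and overlapping key presses (a next key pressed before the previous one is released) are simply not counted as flights.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KeyEvent:
    # Hypothetical event record; the field names are assumptions.
    key: str
    kind: str    # "down" or "up"
    t_ms: float  # timestamp in milliseconds


def dwell_and_flight(events):
    """Return (average dwell time, average flight time) in milliseconds.

    Dwell: time between the press and the release of the same key.
    Flight: time between the release of one key and the press of the next.
    """
    pressed = {}          # key -> timestamp of its pending press
    dwells, flights = [], []
    last_release = None   # timestamp of the most recent release not yet paired
    for ev in sorted(events, key=lambda e: e.t_ms):
        if ev.kind == "down":
            if last_release is not None:
                flights.append(ev.t_ms - last_release)
                last_release = None
            pressed[ev.key] = ev.t_ms
        elif ev.kind == "up" and ev.key in pressed:
            dwells.append(ev.t_ms - pressed.pop(ev.key))
            last_release = ev.t_ms

    def avg(values):
        return mean(values) if values else 0.0

    return avg(dwells), avg(flights)
```

For instance, pressing "a" for 100 ms, pausing 50 ms, then pressing "b" for 80 ms gives an average dwell of 90 ms and an average flight of 50 ms.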
  • the set of values for each behavioral feature (referred to generally as a set of behavioral features, also denoted as BF) of the interaction being examined may be compared with a corpus of additional BFs.
  • the additional BFs may be BFs that were observed from other interactions of the same agent during the same session. Using such BFs may be useful to identify Intra-Session Replay Attacks as the BF and additional BFs may be too similar (e.g., a similarity measurement being above an acceptable threshold). Additionally or alternatively, the additional BFs may be BFs that were observed from other interactions of agents during different sessions.
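The same-session comparison can be sketched as a pairwise similarity check. This is a hedged illustration only: the similarity measure (an inverse of Euclidean distance), the default threshold of 0.9, and the function names are assumptions made here, and the BFs are assumed to be numeric vectors already scaled to comparable ranges.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + math.dist(a, b))


def too_similar_pairs(session_bfs, threshold=0.9):
    """Return index pairs of same-session BFs whose similarity exceeds threshold.

    A non-empty result suggests that one interaction may have been reused
    within the session, possibly with small variations.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(session_bfs), 2)
        if similarity(a, b) > threshold
    ]
```

Any monotone similarity measure could be substituted; the point is only that near-duplicate BFs inside one session stand out against the variation a single human normally exhibits.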
  • BFs may be useful to identify Inter-Session Replay Attack, as a PIP that is re-used by a bot in different sessions may be identified as creating a relatively dense cluster in a multidimensional space defined based on the BFs.
  • Other metrics may be used instead of or in addition to density of clusters, in order to identify a PIP that was replayed.
  • Figure 2A shows a flowchart diagram of a method, in accordance with some exemplary embodiments of the disclosed subject matter.
  • client-side event acquisition may be performed.
  • event acquisition may be performed by a client-side script that audits, logs or otherwise monitors user interactions that are generated by the agent of the client device.
  • the client device may be, for example, a laptop, a Personal Computer, a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), or any other computerized device capable of accessing pages over a communication network, such as the Internet.
  • the client-side script may be a computer program executed by the client device.
  • the client-side script may be embedded and provided as part of a page being viewed using a page viewer, such as an internet browser, an app interpreter, or the like, and be executed by the page viewer.
  • the client-side script may be JavaScript code embedded within the web page being accessed.
  • the client-side script may be native code directly executable by the operating system of the device executing the page, such as Objective-C for iOS™.
  • the interactions may include various communication channels of data potentially associated with Graphical User Interface (GUI) input devices or peripherals of the client device, such as mouse movements, keyboard strokes, touch gestures, device orientation data (e.g., obtainable from gyroscope and accelerometer sensors of the device), or the like.
  • GUI Graphical User Interface
  • a stream of data may be monitored in each such channel and collected for analysis. Some or all of the analysis may be performed locally by the client-side script.
  • the analysis may be performed, in whole or in part, by a remote server, by a cloud-based server, or the like.
  • server-side analysis may be more robust and less susceptible to manipulation by a malicious agent tampering with the client-side code.
  • the client-side event acquisition may generate a log of events exhibited by the agent.
  • the log may be transmitted to a remote server for analysis.
  • the log may comprise tracking data regarding peripherals and other input devices during the interaction of the agent with the page.
  • complementary information, such as a copy of Hypertext Transfer Protocol (HTTP) traffic or HTTP metadata, may be collected as well, either by the client-side script or by the web server responding to such HTTP traffic.
  • the complementary information may include information useful for identifying geo-location of the agent, demographic information, connection method, session identification (e.g., session cookie), connectivity patterns, or the like.
  • HTTP session metadata may be collected.
  • information may be collected at PIP level, indicating information for a single page interaction. Additionally or alternatively, information may be collected at SIP level, aggregating information over several pages for a single session.
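The two collection levels can be pictured with a small data model. This is a sketch under assumptions: the class and field names (`InputEvent`, `PIP`, `SIP`, `session_id`, `page_url`) and the helper `group_into_sips` are hypothetical and are introduced here only to show how page-level packets aggregate into a session-level packet.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    channel: str   # e.g. "mouse", "keyboard", "touch", "orientation"
    t_ms: float    # timestamp in milliseconds
    data: dict     # channel-specific payload, e.g. {"x": 10, "y": 20}


@dataclass
class PIP:
    """Page Interaction Packet: events from one agent on one page."""
    session_id: str
    page_url: str
    events: list = field(default_factory=list)


@dataclass
class SIP:
    """Session Interaction Packet: all PIPs generated during one session."""
    session_id: str
    pips: list = field(default_factory=list)


def group_into_sips(pips):
    """Aggregate PIPs into one SIP per session identifier."""
    sips = {}
    for pip in pips:
        sips.setdefault(pip.session_id, SIP(pip.session_id)).pips.append(pip)
    return sips
```

The session identifier would typically come from the complementary information mentioned above, such as a session cookie.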
  • behavioral features may be extracted from the interactions data, such as from a PIP.
  • measurements of one or more features may be extracted from the event acquired at Step 200.
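As one illustration of extracting a measurement from acquired events, the average speed of a pointing device can be computed from timestamped positions. The sample format `(t_ms, x, y)` and the function name are assumptions made for this sketch.

```python
import math


def average_speed(samples):
    """Average pointer speed in pixels per millisecond.

    `samples` is a time-ordered list of (t_ms, x, y) tuples, a hypothetical
    format for pointer-movement events.
    """
    if len(samples) < 2:
        return 0.0
    # Path length: sum of straight-line distances between consecutive samples.
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))
    elapsed = samples[-1][0] - samples[0][0]
    return path / elapsed if elapsed > 0 else 0.0
```

Acceleration, curvature, and angle features could be derived from the same samples in a similar manner.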
  • the valuations of each behavioral feature utilized by the disclosed subject matter may be referred to as a "set of behavioral features".
  • a data stream (also referred to as a series) may be transformed into an empirical distribution statistic using a discretization method such as entropy of the empirical distribution of the series, moment of the empirical distribution of the series, or the like.
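The two statistics named above can be sketched for a numeric series as follows. The fixed-width binning used to form the empirical distribution and the parameter `bin_width` are assumptions of this sketch; the source does not specify how the series is discretized.

```python
import math
from collections import Counter


def empirical_entropy(series, bin_width):
    """Shannon entropy (in bits) of the series after fixed-width binning."""
    bins = Counter(math.floor(x / bin_width) for x in series)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def central_moment(series, k):
    """k-th central moment of the empirical distribution of the series."""
    mu = sum(series) / len(series)
    return sum((x - mu) ** k for x in series) / len(series)
```

Either function reduces a variable-length stream to a single number, which makes interactions of different durations comparable as fixed-length vectors.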
  • each behavioral feature may be descriptive of how the user utilizes the input device to interact with the page.
  • the behavioral features may comprise any of the following non-limiting examples: a speed of a movement of a pointing device, an acceleration of a movement of a pointing device, a curvature feature of a movement of a pointing device, movement vectors of a pointing device, a pause pattern of a movement of a pointing device, a click pattern of one or more buttons of an input device, a time duration of the interaction, an average angle of a movement of a pointing device during the interaction, a sharp turns count during the interaction (e.g.,
  • a set of behavioral features may be used as a biometric identifier of a user, which can be used to identify the same person at different interactions. Additionally or alternatively, a set of behavioral features may be used to differentiate human-generated interactions from interactions not generated by humans (e.g., bot-generated interactions). However, the set of behavioral features themselves may be ineffective to differentiate original human interactions (e.g., human-generated interactions generated by the human agent interacting with the page) from replayed human interactions (e.g., human-generated interactions recorded and reused by a bot agent interacting with the page).
  • a first classifier may be used.
  • the set of behavioral features may be applied on the classifier to determine whether the behavioral features are consistent with being human-generated or non-human-generated.
  • the first classifier may be a predictive classifier configured to label a set of behavioral features as either "BOT" or "HUMAN".
  • the first classifier may be trained using training data to be able to distinguish between human-generated interaction and non-human-generated interactions. The training may be based on a large number of instances of both human- and non-human-generated interactions, whose classification is known.
  • the first classifier may be used to label the set of behavioral features as either "BOT" or "HUMAN".
  • classification may be based on additional features, and may not be limited solely to behavioral features. For example, features relating to demographic information may be used and the prediction may be based on such features as well.
  • the label determination by the first classifier (220) may be used to determine whether the interaction is human-generated or not. It will be noted, however, that a human-generated interaction is not guaranteed to be an original interaction that is not being replayed. In case the interaction is determined to be non-human-generated, Step 232 may be performed. Otherwise, Step 230 may be performed.
  • On Step 230, a second classifier may be utilized. The set of behavioral features may be provided to the second classifier to determine whether the interaction is original or replayed. The behavioral features used in Step 230 may be the same as or different from those used in Step 210. Additionally or alternatively, the second classifier may utilize additional features that are not behavioral features, such as features extracted from the HTTP metadata.
  • the second classifier may be a classifier implementing non-supervised learning techniques or semi-supervised learning techniques.
  • each set of features used by the second classifier to represent an interaction by an agent may be viewed as an n-dimensional vector in an n-dimensional space.
  • the second classifier may inspect statistical measurement relating to the vector representing the interaction being examined.
  • in case the vector is in a relatively dense area of the n-dimensional space, such as an area for which the calculated density is above a predetermined threshold, above a designated percentile density in the space, or the like, it may be determined that the interaction is suspected as being replayed.
  • the second classifier may identify clusters in a multidimensional space and provide its estimation based thereon. Density and other statistical measurements may be computed with respect to the cluster of which the vector is a member.
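A density check of this kind can be sketched with a simple neighbor count. This is an illustration under assumptions, not the classifier of the disclosure: density is approximated here as the number of corpus vectors within a fixed `radius` of the examined vector, and `radius` and `density_threshold` are hypothetical tuning parameters.

```python
import math


def local_density(vec, corpus, radius):
    """Number of corpus vectors within `radius` of `vec` (a crude density)."""
    return sum(1 for other in corpus if math.dist(vec, other) <= radius)


def is_suspected_replay(vec, corpus, radius, density_threshold):
    """Flag the examined vector when it falls in a relatively dense area."""
    return local_density(vec, corpus, radius) > density_threshold
```

A percentile-based cutoff or a cluster-level density, both mentioned above, could replace the fixed threshold without changing the overall shape of the check.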
  • the classifier may compare the vector of the interaction with a subset of all vectors available thereto.
  • vectors of other recorded interactions may be available to the classifier and retained in a data storage. A subset of the interactions may be employed for the determination.
  • the space may consist of only other PIP-based vectors (e.g., excluding SIP-based vectors from the space).
  • in case the density in the area of the VUE is relatively high (e.g., density above a threshold, which is relative and based on the general density in the space),
  • the PIP may be deemed as a human-generated interaction which is replayed in the same session or in different sessions.
  • the PIP-based VUE may be compared with other PIP-based vectors obtained from interactions in the same session.
  • the interaction may be deemed as being reused in the same session.
  • the interaction may be deemed as inconsistent with the person performing other interactions in the same session, and is therefore indicative of a combination of different PIPs from different sessions and users that are being reused in the same session.
  • the PIP-based VUE may be compared with other PIP-based vectors obtained based on interactions of users with the same type of page as the interaction of the VUE.
  • interaction with a page may be substantially different depending on the type thereof. Interactions with pages of a same type, albeit being in different sites, may be relatively similar.
  • for example, users' interaction with a GOOGLE™ search results page may be substantially similar to the users' interaction with a BING™ search results page.
  • the type of page may be, for example, a search results page, an item display page, a verification page, a login page, a form page, a checkout page, or the like.
  • the item display page may be a page displaying information about an item, such as without limitation, a movie page in IMDB™, a book page in AMAZON™, a product page in ALIEXPRESS™, or the like.
  • the verification page may be a page comprising a mechanism to differentiate between humans and bots, such as CAPTCHA.
  • a login page may be a page comprising input fields to be used to provide credentials.
  • the login page may be categorized into sub-types based on the credentials type. As an example, a page requiring only site-specific credentials may be of different type than a page allowing Single Sign On (SSO) credentials.
  • SSO Single Sign On
  • a checkout page may be a page where the user performs a checkout activity, such as completing transaction created through the session.
  • the form page may be a page comprising a fillable form.
  • the fillable form may be comprised of input widgets such as radio button widgets, text input widgets, select widgets, or the like.
  • form pages having a same or similar (e.g., about ±30%; about ±3; or the like) number of widgets may be considered as pages of the same type. Additionally or alternatively, only form pages having a same or similar number of widgets of each type may be considered as pages of the same type.
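The widget-count comparison can be sketched as below. Several points are assumptions of this sketch rather than statements of the source: the example tolerances (about 30%, about 3 widgets) are treated as symmetric bounds, a pair of counts is accepted when either bound holds, the relative bound is taken against the larger count, and the names `similar_count` and `same_form_type` are hypothetical.

```python
def similar_count(a, b, rel_tol=0.30, abs_tol=3):
    """True if counts differ by at most abs_tol, or by at most rel_tol of the larger."""
    diff = abs(a - b)
    return diff <= abs_tol or diff <= rel_tol * max(a, b)


def same_form_type(widgets_a, widgets_b, per_widget_type=False):
    """Compare two form pages given as {widget kind: count} mappings.

    With per_widget_type=False only the total number of widgets is compared;
    with per_widget_type=True every widget kind must have a similar count.
    """
    if per_widget_type:
        kinds = set(widgets_a) | set(widgets_b)
        return all(
            similar_count(widgets_a.get(k, 0), widgets_b.get(k, 0)) for k in kinds
        )
    return similar_count(sum(widgets_a.values()), sum(widgets_b.values()))
```

Pages grouped this way can then share a corpus of vectors, so that a form interaction is compared only against interactions with comparable forms.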
  • the VUE may be compared with vectors that are based on other PIPs or SIPs in which the same page was visited. Such comparison may allow for identifying behaviors that are abnormal for the specific page. Such embodiments may enable treating the same VUE differently depending on the specific page in which the interaction occurred. The same may be applicable to sessions in the same site.
  • the disclosed subject matter may provide for a framework for checking interactions in different pages, in different sites, and not limited to a specific page or site, while still providing accuracy that is based on big data analytics of interactions that were viewed with each specific page or site.
  • the VUE may be a SIP-based vector.
  • the VUE may be compared with a space consisting of only other SIP-based vectors (e.g., excluding PIP-based vectors from the space).
  • high density may be indicative of a same SIP (or variations thereof) being reused by a bot in different sessions.
  • the classifier may obtain the vectors based on all the PIPs that were obtained in the same session as the interactions upon which the VUE is based. For each same session vector, a statistical measurement (e.g., density measurement) may be computed with respect to one or more additional vectors (e.g., based on PIPs from different sessions). An aggregated statistical measurement, such as average density, variance of density, skewness of density and kurtosis of density, or the like, may be computed, and used to determine whether the session comprises replayed interactions, such as created using Inter- or Intra-Session Replay Attack.
  • all PIPs of the same session may be used. Additionally or alternatively, all PIPs except for VIPs may be used.
  • the VIP itself may be original VIP generated by a CAPTCHA farm, rather than a replayed interaction. As a result, such original interaction should be overlooked when examining the entire session for determining whether the agent operating is a human or a bot.
  • any combination of the above mentioned examinations may be employed by the second classifier to estimate whether the interaction of the VUE is original or replayed.
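The per-vector density measurement and its aggregation over a session can be sketched as below. The density definition (number of reference vectors within a fixed radius), the radius value, and all names are assumptions made for illustration; the text above does not fix a particular density measure.

```python
import math
from statistics import fmean


def density(vector, reference_vectors, radius: float = 1.0) -> int:
    """Count reference vectors within `radius` of `vector` (assumed density proxy)."""
    return sum(1 for ref in reference_vectors if math.dist(vector, ref) <= radius)


def aggregate_density(session_vectors, reference_vectors, radius: float = 1.0) -> dict:
    """Compute mean, variance, skewness and kurtosis of the per-vector densities."""
    values = [float(density(v, reference_vectors, radius)) for v in session_vectors]
    mean = fmean(values)
    variance = fmean([(v - mean) ** 2 for v in values])
    if variance == 0.0:
        # All densities are equal; higher moments are reported as zero here.
        skewness = kurtosis = 0.0
    else:
        std = math.sqrt(variance)
        skewness = fmean([((v - mean) / std) ** 3 for v in values])
        kurtosis = fmean([((v - mean) / std) ** 4 for v in values])
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}
```

A second classifier could then compare any of these aggregated values with a threshold. To follow the variant that excludes VIPs, the caller would simply leave the VIP-based vectors out of `session_vectors`.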
  • In Step 240, based on the prediction provided by the second classifier, it may be determined whether the human-generated interaction is original or replayed. In case the interaction is original, normal processing may be performed (Step 245) without interference. In case the interaction is estimated to be replayed, responsive action may be performed (Step 232).
  • a responsive action is performed.
  • the responsive action may be designed to mitigate the risk from the bot operation.
  • the session may be discontinued to prevent the bot from acting.
  • the web server of the web page may blacklist the IP address of the agent to prevent the same IP address from being reused by the bot.
  • IP information may be used as part of the estimation process.
  • bot agents may be redirected to designated bot servers for handling additional requests therefrom.
  • different pages may be provided to bot agents, such as pages specifically designed for bots to provide the bots with false information about the site, thereby preventing impermissible data mining from the site, vulnerability detection in the site, or the like.
  • a verification page may be provided to the bot to verify the estimated decision. It will be noted, however, that verification pages may be circumvented, such as using the services of CAPTCHA farms.
  • actions taken by the bot agent may be disregarded, such as discounting credit accumulated to the agent, overlooking advertisement impressions served to the bot and Pay Per Click (PPC) advertisements presented to the bot in response to the bot performing an action (e.g., clicking on a banner), or the like.
  • the responsive action may be designed to allow for analysis of bot and non-bot sessions.
  • the responsive action may tag sessions with a bot or non-bot label, allowing for offline analysis of the session information.
  • session data may be utilized for website optimization, such as for A/B testing of different versions of the website. The optimization may be improved by relating solely to non-bot sessions, which may be achieved using the tagging.
  • a single classifier may be utilized to unify Steps 210 and 230.
  • the classifier may be configured to label each VUE as "HUMAN", "BOT" or "REPLAYED", indicating human agent, bot-generated interaction, or human-generated bot-replayed interaction.
  • the label of REPLAYED may be unified with the BOT label.
  • the first classifier may not be employed and instead only the second classifier may be used without first verifying that the interaction is consistent with human-generated interaction.
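The labeling variants described above (two classifiers in sequence, a unified REPLAYED/BOT label, or only the second classifier) can be sketched as one function. The two predicate arguments stand in for the first and second classifiers and are hypothetical placeholders, as are the parameter names.

```python
from enum import Enum
from typing import Callable, Sequence


class Label(Enum):
    HUMAN = "HUMAN"
    BOT = "BOT"
    REPLAYED = "REPLAYED"


def classify(vue: Sequence[float],
             looks_human: Callable[[Sequence[float]], bool],
             looks_replayed: Callable[[Sequence[float]], bool],
             unify_replayed: bool = False,
             skip_first: bool = False) -> Label:
    """Label a vector under examination (VUE).

    looks_human    -- stand-in for the first classifier (assumed interface)
    looks_replayed -- stand-in for the second classifier (assumed interface)
    """
    # First stage: is the interaction consistent with a human-generated one?
    if not skip_first and not looks_human(vue):
        return Label.BOT
    # Second stage: is the human-generated interaction original or replayed?
    if looks_replayed(vue):
        return Label.BOT if unify_replayed else Label.REPLAYED
    return Label.HUMAN
```

Setting `unify_replayed=True` merges the REPLAYED label into BOT, and `skip_first=True` corresponds to using only the second classifier.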
  • FIG. 3A showing a flowchart diagram of a method in accordance with some exemplary embodiments of the disclosed subject matter.
  • Web Agent 310 may be an agent interacting with a Website 300 over a session.
  • Web Agent 310 provides activity reports to Event Acquisition System (EAS) 320.
  • EAS 320 may be configured to obtain peripherals tracking data, such as of mouse, keyboard, touchpad, touch screen, orientation, or the like, originated from the device employed by Web Agent 310.
  • EAS 320 may further obtain HTTP session metadata of Web Agent 310 interacting with Website 300.
  • EAS 320 is deployed on a server, and obtains the data by receiving reports issued by a client-side script that is embedded in the page of Website 300 and executed by the device of Web Agent 310 while Web Agent 310 is interacting with Website 300.
  • Extract, Transform & Load (ETL) 330 may be a module configured to transform each event acquired by EAS 320 into a set of features to be used by Classifier 340.
  • the features may comprise, at least in part, behavioral features based on the peripherals tracking data.
  • the set of features generated by ETL 330 may be a multidimensional vector in a multidimensional space.
  • the vector of features generated by ETL 330 may be provided to Classifier 340 for training.
  • training may be performed with respect to human-verified Web Agent 310 or to bot-verified Web Agent 310.
  • the vector of features generated by ETL 330 may be added to a repository used by Classifier 340 for its estimation.
  • the predictive capabilities of Classifier 340 may be improved.
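The transformation performed by ETL 330, from raw peripheral events to a feature vector, can be illustrated with pointer events. The event layout `(time_seconds, x, y)` and the particular features chosen (duration, mean and maximum speed, mean absolute acceleration) are assumptions for this sketch; an actual embodiment may use many more behavioral features.

```python
import math


def pointer_features(events):
    """Turn time-ordered pointer events (t_seconds, x, y) into a small feature vector."""
    speeds, times = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
        times.append(t1)
    # Acceleration is approximated as the change in speed between consecutive segments.
    samples = list(zip(times, speeds))
    accels = [abs(s1 - s0) / (t1 - t0)
              for (t0, s0), (t1, s1) in zip(samples, samples[1:])]
    duration = events[-1][0] - events[0][0] if events else 0.0
    return [
        duration,
        sum(speeds) / len(speeds) if speeds else 0.0,
        max(speeds, default=0.0),
        sum(accels) / len(accels) if accels else 0.0,
    ]
```

Each returned list is one point in the multidimensional space; vectors from human-verified or bot-verified agents could then be used for training, and vectors from unknown agents for prediction.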
  • FIG. 3B showing a flowchart diagram of a method in accordance with some exemplary embodiments of the disclosed subject matter.
  • Classifier 340 is being used to provide a prediction.
  • Web Agent 311 is interacting with Website 301, which may or may not be the same website as Website 300.
  • EAS 320 may acquire events which are transformed by ETL 330 to features.
  • the features may be fed to Classifier 340 to provide a prediction.
  • the features are also added to the repository of Classifier 340 to be used for future predictions as well.
  • Bot Responsive Module 350 may be employed to implement a responsive action.
  • the responsive action may be determined based on rules, based on the identity of the Website 301 or web page therein being accessed, or the like.
  • Bot Responsive Module 350 may instruct Website 301 to alter its functionality in view of the bot detection, such as issue a redirect notification, provide different web pages to Web Agent 311, or the like.
  • Figure 4 shows an illustration of clusters that are automatically determined based on the set of behavioral features. Based on clustering process, Clusters 410, 420, 430, 440 may be identified.
  • a cluster may represent a typical type of behavior.
  • Cluster 410 may comprise interactions with very slow mouse movements, while interactions with very fast mouse movements may be grouped in Cluster 440.
  • centroids of each cluster may be computed and potentially stored to be retrieved when needed.
  • other statistical features of the clusters may be computed, such as, for example, a density of the clusters, a scattering and the number of elements in each cluster, an average distance of the elements in each cluster from their centroid, a standard deviation of the distances, or the like.
  • Such statistical features may be stored in a database for further usage.
  • Centroids of each cluster may be computed, such as Centroids 412, 422, 432, 442. Additionally or alternatively, density of Cluster 430 may be greater than that of Cluster 420. In some exemplary embodiments, dense clusters with small diameter may be indicative of bot agents replaying the same interaction over and over in different sessions. Though Cluster 440 may be denser than Cluster 430, it has a relatively larger diameter, indicating that the cluster has a large degree of variations, and therefore is not necessarily a bot cluster. In some exemplary embodiments, Cluster 410 may have relatively small number of instances scattered therein, indicating that the cluster is likely to comprise rare interactions that have relatively low correlation with one another and therefore may be indicative of non-standard human behavior, and potentially various interactions by bots.
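The per-cluster statistics mentioned above (centroid, number of elements, average distance from the centroid, standard deviation of the distances, diameter) can be computed as in the following sketch. Euclidean distance and the definition of diameter as the largest pairwise distance are assumptions; the text does not commit to a specific metric.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev


def cluster_stats(points):
    """Compute simple statistics for one non-empty cluster of equal-length vectors."""
    dims = len(points[0])
    centroid = tuple(fmean([p[i] for p in points]) for i in range(dims))
    distances = [math.dist(p, centroid) for p in points]
    # Diameter: largest distance between any two members (0.0 for a single point).
    diameter = max((math.dist(a, b) for a, b in combinations(points, 2)), default=0.0)
    return {
        "centroid": centroid,
        "size": len(points),
        "mean_distance": fmean(distances),
        "std_distance": pstdev(distances),
        "diameter": diameter,
    }
```

The resulting dictionary could be stored per cluster and retrieved when a new vector is examined, rather than recomputed on every request.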
  • clusters may be labeled.
  • clusters may be labeled as "HUMAN" or "BOT". Labeling may be performed manually, automatically or in a semi-automatic manner. In some exemplary embodiments, automatic labeling may be based on sessions in which bots were detected as indicating bot interactions which may infer a bot cluster. Additionally or alternatively, sessions in which a human agent was validated without a doubt may be used to infer a human cluster. As an example, a session in which a purchase order was completed and including making payment may be treated as a validated human agent session.
  • the session may be no longer assumed to be a human agent session if the purchase order was canceled and/or the payment was canceled subsequently to the purchase order being completed.
  • the training data for such semi-supervised method may not include information from validation mechanisms, such as CAPTCHA, as such mechanisms may be circumvented. Instead, the training data may include data that is validated in other manners, such as by performing actions which bot agents do not perform, by relying on human interaction with the human agent to verify her identity, or the like.
  • automatic labeling may be based on clusters' statistical features such as diameter, density, standard deviation, or the like. Such features may be compared with predetermined or computed thresholds, which may be absolute or relative.
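Threshold-based automatic labeling can be sketched with relative thresholds taken against the median over all clusters. The rule used here (small diameter together with high density suggests a bot cluster), the default factors, and the choice to leave every other cluster unlabeled for manual review are assumptions for illustration only.

```python
from statistics import median


def label_clusters(clusters, rel_diameter: float = 0.5, rel_density: float = 1.0):
    """clusters maps a cluster name to {"density": float, "diameter": float}.

    Returns a mapping of name to "BOT" or None (None: left for manual or
    semi-automatic labeling)."""
    med_diameter = median([c["diameter"] for c in clusters.values()])
    med_density = median([c["density"] for c in clusters.values()])
    labels = {}
    for name, c in clusters.items():
        small = c["diameter"] <= rel_diameter * med_diameter
        dense = c["density"] >= rel_density * med_density
        labels[name] = "BOT" if small and dense else None
    return labels
```

Absolute thresholds would replace the two median-based products with fixed constants; the structure of the check stays the same.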
  • Vectors 450, 460 and 470 may be analyzed.
  • Vector 450 is relatively close to Cluster 410. Therefore, it may be deduced that the chances of Vector 450 being produced by a bot agent are not high, even though it does not match any existing cluster.
  • the probability of Vector 460 being of a bot agent is higher, though it may be estimated to have some relation with Cluster 430.
  • Vector 470 is very distant from all obtained clusters, and it can therefore be estimated to be associated with a bot agent.
  • the distances may be weighted according to various factors, such as a number of elements (either normalized or not) in the clusters, cluster density, or the like. These weights may help in reflecting the clusters more accurately, as they reflect not only the typical human-behaviors, but also how typical they are.
  • the examined distance of the VUE from the centroids could be, for example, but not limited to, the minimum distance, the average distance, the weighted average distance, or any other distance heuristic.
  • each interaction may be examined from several data-perspectives, in accordance to the number of the behavioral features that are examined.
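The distance heuristics listed above can be computed side by side. Euclidean distance and the use of per-cluster weights (for example, element counts or densities) in a weighted average are assumptions of this sketch.

```python
import math


def centroid_distances(vue, centroids, weights=None):
    """Return minimum, average and weighted-average distance of a VUE from centroids.

    weights -- optional per-cluster weights, e.g. cluster sizes (assumed choice)."""
    distances = [math.dist(vue, c) for c in centroids]
    if weights is None:
        weights = [1.0] * len(distances)
    weighted = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
    return {
        "min": min(distances),
        "average": sum(distances) / len(distances),
        "weighted_average": weighted,
    }
```

For example, a VUE at the origin with centroids at (3, 4) and (0, 10) and weights 3 and 1 yields a minimum of 5, an average of 7.5 and a weighted average of 6.25, so heavier (more typical) clusters pull the score toward their own distance.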
  • the chances that an interaction is generated by a bot agent are higher when many of its vectors are too far from the obtained centroids.
  • a user that moves his mouse in a specific SIP in an irregular speed, for example, is not necessarily a bot.
  • the chances that it is a bot may be higher. Therefore, a large collection of behavioral features may reduce the chances of false positive determinations (i.e. erroneously flagging an interaction generated by a human agent as a bot agent).
  • the irregularity is local to a single VUE, for example, in one PIP, such irregularity may be insufficient to estimate, with a high confidence level, that the user is a bot.
  • the confidence level of the bot detection may be considered higher and therefore an estimation may be produced.
  • the estimation may be produced only in case that the confidence level is above a threshold.
  • a responsive action may be implemented to gather additional information to re-perform the determination.
  • the additional information may include causing the user to pass through a verification page, gathering additional PIP data to perform additional checks or the like.
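The confidence logic described above can be sketched as follows: an estimation is produced only when enough of the examined vectors are irregular, and otherwise more information is gathered. Using the fraction of far vectors as the confidence level, the threshold values, and the three outcome names are assumptions made for illustration.

```python
def estimate_agent(distances, far_threshold: float, confidence_threshold: float = 0.7):
    """distances: one centroid distance per examined behavioral-feature vector.

    Returns (outcome, confidence), where outcome is one of
    "BOT", "GATHER_MORE_DATA" or "NO_ACTION" (hypothetical names)."""
    far = sum(1 for d in distances if d > far_threshold)
    confidence = far / len(distances)
    if confidence >= confidence_threshold:
        return "BOT", confidence
    if far > 0:
        # Local irregularity only: e.g. show a verification page or collect more PIPs.
        return "GATHER_MORE_DATA", confidence
    return "NO_ACTION", confidence
```

A single irregular vector among many therefore does not flag the agent, which reflects the stated aim of reducing false positive determinations.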
  • the agent may be estimated as a bot, and may be prohibited from performing some actions (e.g., accessing specific data elements). Additionally or alternatively, bot estimated agent may be allowed to click on ads. However, in order to prevent click frauds, ad serving may be disabled or the economic effect of the ad serving may be disabled. In some exemplary embodiments, if the estimation is later on refuted, the economic effect may be retroactively enabled.
  • An apparatus 500 may be configured to perform any of the methods of Figures 2, 3A-3B, or the like.
  • Apparatus 500 may comprise a Processor 502.
  • Processor 502 may be a Central Processing Unit (CPU), a microprocessor, an electronic circuit, an Integrated Circuit (IC) or the like.
  • Processor 502 may be utilized to perform computations required by Apparatus 500 or any of its subcomponents.
  • Apparatus 500 may comprise an Input/Output (I/O) Module 505.
  • the I/O Module 505 may be utilized to provide an output to and receive input from a user.
  • user input may be provided in manual validation of cluster labels, in manual labeling of clusters, or the like. It will be understood, that Apparatus 500 may be configured to operate without user input or human interaction.
  • Apparatus 500 may comprise Memory 507.
  • Memory 507 may be a hard disk drive, a Flash disk, a Random Access Memory (RAM), a memory chip, or the like.
  • Memory 507 may retain program code operative to cause Processor 502 to perform acts associated with any of the subcomponents of Apparatus 500.
  • Memory 507 may retain clustering data, such as clusters, centroids or other statistical features thereof, labeling of clusters, or the like.
  • BF Determinator 510 may be configured to obtain and compute the set of behavioral features based on an interaction packet, such as PIP, SIP, CIP, or the like. In some exemplary embodiments, BF Determinator 510 may be configured to compute various BFs for a same interaction packet, each of which associated with a different group of behavioral features.
  • BF Distance Computer 520 may be configured to compute a distance measurement between two vectors representing BFs. In some exemplary embodiments, the distance measurement may be based on any distance metric. In some exemplary embodiments, the distance measurement may take into account information regarding a cluster of one of the BFs, such as in case a BF is compared with a centroid BF of a cluster.
  • Clustering Module 530 may be configured to compute clusters based on BF data. Clustering performed by Clustering Module 530 may utilize non-supervised machine learning techniques, semi-supervised machine learning techniques, or the like. In some exemplary embodiments, Clustering Module 530 may provide labeling for clusters, such as based on automated deduction, semi-automatic determination having manual validation, manual input by human users, or the like.
  • One embodiment of the disclosed subject matter is a method for classifying an HTTP session in a manner that distinguishes between a human agent and a bot agent.
  • Client-side events are acquired.
  • the client-side events are events generated by the client's device GUI peripherals, such as, pointing devices, keyboard devices and move sensors, on a given page in an HTTP session.
  • a machine learning classifier is provided with the vector.
  • the classifier distinguishes between pages that originated from a human agent and pages that originated from a bot agent. If the classifier determines the agent to be a bot agent, such indication is returned. Otherwise, the process continues.
  • the multi-dimensional numerical vector is embedded in a multi-dimensional space.
  • the vector's similarity/distance/density is compared to that of a control group of previously generated multi-dimensional numerical vectors.
  • the control group can be generated using events from pages as follows: (a) previous pages from the same session; (b) previous pages accessed by different sessions on the same page; (c) previous pages accessed by different sessions that are similar to the current page.
  • a rule engine may be employed.
  • the rule engine may implement any or all of the following non-limiting, exemplary rules. If the current page generates a highly dense or very sparse density with reference to control group (a), it may be concluded that the page originated from a bot agent.
  • If the current page generates a highly dense or very sparse density with reference to control group (b), it may be concluded that the page originated from a bot agent via multiple sessions. If the current page generates a highly dense or very sparse density with reference to control group (c), it may be concluded that the page originated from a bot agent. Additional control groups and rules may be used in the determination. In case no rule indicated that the current page was visited by a bot agent, an indication that the agent is a human agent may be returned.
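The exemplary rules over control groups (a), (b) and (c) can be sketched as a small rule engine. The density values are assumed to be computed elsewhere, and the `low` and `high` cut-offs that define "very sparse" and "highly dense" are hypothetical parameters.

```python
def rule_engine(densities, low: float, high: float):
    """densities maps a control-group key ("a", "b", "c") to the density of the
    current page's vector measured against that group.

    Returns ("BOT", group) for the first rule that fires, else ("HUMAN", None)."""
    for group in ("a", "b", "c"):
        value = densities.get(group)
        if value is None:
            continue  # control group not available for this page
        if value >= high or value <= low:
            return "BOT", group
    return "HUMAN", None
```

Additional control groups could be supported by extending the tuple of keys; the returned group identifies which comparison triggered the indication, for example group (b) for a bot acting across multiple sessions.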
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, apparatus and product for identifying a bot agent using behavioral features. The method comprising obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.

Description

UTILIZING BEHAVIORAL FEATURES TO IDENTIFY BOT
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of Provisional Patent Application US 62/272,065 filed on December 28, 2015, and of Provisional Patent Application US 62/272,058 filed on December 28, 2015, both of which are incorporated by reference in their entirety for all purposes, without giving rise to disavowment.
TECHNICAL FIELD
[0002] The present disclosure relates to bot detection in general, and to bot detection using behavior biometrics, in particular.
BACKGROUND
[0003] In the context of the present disclosure a "bot" is an automatic software agent that accesses web pages of websites. Bots can potentially harm websites in various ways, such as data scraping, advertisement fraud, credential stuffing, Denial of Service (DoS) attacks, and more. Bots have become a substantial problem in today's World Wide Web. For example, some studies indicate that billions of dollars are stolen every year by bot fraudsters.
[0004] One known method to distinguish bots from human interaction is Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). A CAPTCHA is a challenge-response test used to determine whether or not the user is human. A CAPTCHA presents a challenge to the agent accessing a web page. Typically, the challenge may be a task to which automated tools are less likely to respond correctly, such as identifying numbers or letters in a distorted image, which may be a relatively easy task for a human and a relatively hard task for an automated agent.
BRIEF SUMMARY
[0005] One exemplary embodiment of the disclosed subject matter is a method comprising: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
[0006] Optionally, the method may comprise verifying that the set of behavioral features are consistent with a human-generated interaction, and wherein in response to said verifying, performing said automatically estimating.
[0007] Optionally, said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
[0008] Optionally, the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
[0009] Optionally, the PIP and the PIPs are obtained from a same session.
[0010] Optionally, the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
[0011] Optionally, the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
[0012] Optionally, the type of the pages is selected from the group consisting of: a search results page; an item display page; a verification page; a login page; a form page; and a checkout page.
[0013] Optionally, the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
[0014] Optionally, the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
[0015] Optionally, said automatically estimating comprises: obtaining a plurality of sets of behavioral features each of which is based on a Page Interaction Packet (PIP) of the agent, all of which were obtained during a same session, wherein the plurality of sets of behavioral features comprises the set of behavioral features; for each set of the plurality of sets of behavioral features, computing a statistical measurement with respect to the one or more additional sets of behavioral features; computing an aggregated statistical measurement based on the statistical measurement of each set of the plurality of sets of behavioral features; and estimating whether the agent is a bot based on the aggregated statistical measurement being above or below a threshold.
[0016] Optionally, the set of behavioral features comprise at least one of: a speed of a movement of a pointing device; an acceleration of a movement of a pointing device; a curvature feature of a movement of a pointing device; one or more movement vectors of a pointing device; a pause pattern of a movement of a pointing device; a click pattern of one or more buttons of an input device; a time duration of the interaction; a flight time; a dwell time; an angle movement of a pointing device; acceleration measurement of a user device during the interaction; and an orientation of a user device during the interaction.
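Two of the listed features, dwell time and flight time, can be derived from keyboard events as sketched below. The definitions used are the ones customary in keystroke dynamics (dwell: key press to release of the same key; flight: release of one key to the press of the next) and are an assumption here, since the paragraph does not define them. The event layout `(time_ms, kind, key)` is likewise hypothetical.

```python
def keystroke_features(events):
    """events: time-ordered (time_ms, kind, key) tuples, kind being "down" or "up".

    Returns (dwell_times, flight_times) in milliseconds."""
    pressed = {}          # key -> time of its pending "down" event
    dwell, flight = [], []
    last_release = None   # time of the most recent unconsumed "up" event
    for t, kind, key in events:
        if kind == "down":
            if last_release is not None:
                flight.append(t - last_release)
                last_release = None  # each release yields at most one flight time
            pressed[key] = t
        elif kind == "up" and key in pressed:
            dwell.append(t - pressed.pop(key))
            last_release = t
    return dwell, flight
```

For the sequence a-down at 0 ms, a-up at 100 ms, b-down at 250 ms, b-up at 300 ms, the sketch yields dwell times of 100 and 50 ms and a single flight time of 150 ms. Overlapping key presses produce no flight time in this simplified version.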
[0017] Optionally, said obtaining comprises receiving, by a server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
[0018] Optionally, the one or more additional sets of behavioral features were previously obtained based on interactions with the page.
[0019] Another exemplary embodiment of the disclosed subject matter is an apparatus having a processor, the processor being adapted to perform the steps of: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
[0020] Optionally, said processor is further adapted to perform: verifying that the set of behavioral features are consistent with a human-generated interaction; wherein in response to said verifying, performing said automatically estimating.
[0021] Optionally, said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
[0022] Optionally, the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
[0023] Optionally, the PIP and the PIPs are obtained from a same session.
[0024] Optionally, the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
[0025] Optionally, the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
[0026] Optionally, the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
[0027] Optionally, the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
[0028] Optionally, the apparatus is a server connectable to computerized devices over a communication network, wherein said obtaining comprises receiving, by the server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
[0029] Yet another exemplary embodiment of the disclosed subject matter is a computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising: obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent; automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and in response to an estimation that the agent is a bot, performing a responsive action.
THE BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0030] The present disclosed subject matter will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which corresponding or like numerals or characters indicate corresponding or like components. Unless indicated otherwise, the drawings provide exemplary embodiments or aspects of the disclosure and do not limit the scope of the disclosure. In the drawings:
[0031] Figure 1 shows an illustration of histograms representing empirical distribution of behavioral feature, in accordance with some exemplary embodiments of the disclosed subject matter;
[0032] Figures 2A-2D show flowchart diagrams of methods, in accordance with some exemplary embodiments of the disclosed subject matter;
[0033] Figures 3A-3B show flowchart diagrams of methods, in accordance with some exemplary embodiments of the disclosed subject matter;
[0034] Figure 4 shows an illustration of clusters and usage thereof, in accordance with some exemplary embodiments of the disclosed subject matter; and
[0035] Figure 5 shows a block diagram of an apparatus, in accordance with some exemplary embodiments of the disclosed subject matter.
DETAILED DESCRIPTION
[0036] One technical problem dealt with by the disclosed subject matter is to distinguish human agents from bot agents based on their interaction with one or more pages. In the present disclosure, the term "page" may refer to a web page within a website; a page within a web application; a page within an application which may or may not be implemented using native language for the operating system, such as native Android™ app, desktop app, and a similar app; or the like.
[0037] In some exemplary embodiments, a bot may perform a "replay attack" during which a recorded set of interactions originally performed by a human are injected by the bot, thereby mimicking a human behavior. In such a case, behavioral features alone may not be sufficient to distinguish between bot agents and human agents, as both agents generate interactions that can be characterized as human-generated. Such bots may record genuine human-generated interactions and re-use them ("replay") on additional pages, thereby providing interactions which may cause one to believe that the agent is a human agent and not a bot.
[0038] In some exemplary embodiments, a bot agent may use a Page Interaction Packet (PIP) generated based on a human agent's actions (i.e., "human-generated") and replay such interaction when accessing a page. In some exemplary embodiments, the PIP may comprise interaction data that an agent produces while interacting with a page. The PIP may include, for example, mouse movements, keyboard strokes, touch gestures, device orientation data, or the like. The PIP may be multiplied by the bot agent either as is without modification or while introducing stochastic noise to create variations on the PIP. The multiplied PIPs may then be used while interacting with a single website during a session, an operation which may be referred to as Intra-Session Replay Attack.
[0039] In some exemplary embodiments, a Session Interaction Packet (SIP) may comprise the set of PIPs generated during a single session.
[0040] In some exemplary embodiments, during an Intra-Session Replay Attack, a SIP may be compiled based on the multiplied PIPs, and used when the bot is accessing a website to simulate a human interaction during a session.
[0041] Some bot agents employ Inter-Session Replay Attacks. During an Inter-Session Replay Attack a bot may utilize a same human-generated PIP in different sessions. In some cases, several human PIPs may be gathered, multiplied (with or without variations) and used in different SIPs. The different SIPs may be sent to websites in different sessions in order to fake human behavior. In Inter-Session Replay Attack, a single session may comprise human-generated PIPs that are different from one another. A same human-generated PIP (or variations thereof) may not be used in the same session more than once. However, looking at different sessions, the same human-generated PIP (or variation thereof) may be used a large number of times by the bot.
[0042] Some bot agents automate interaction with the page until reaching a verification page used to differentiate human agents from bot agents, such as a page comprising CAPTCHA. Verification pages may be introduced dynamically, such as based on a suspicion by the server that the agent may be a bot agent, for example, based on a volume of queries generated by the agent during a timeframe, or may be set a-priori, such as in login pages, at entry points to websites or sections thereof, or the like. A bot agent may detect a verification page and pass the session to a human user to overcome the verification page itself. For example, a CAPTCHA presented to the bot agent may be transferred to a CAPTCHA farm service, in which a human may be presented with the CAPTCHA and the interactions produced by the human during interaction with the verification page and the CAPTCHA component embedded therein, also referred to as Verification Interaction Packet (VIP), may be gathered. The VIP may be sent by the bot agent to the verification page to overcome the CAPTCHA. After the CAPTCHA is solved, the bot agent may continue to traverse the website automatically, such as using other replayed human-generated interactions or using other forms of interactions.
[0043] It will be noted that the human-generated PIP may be modified to provide a desired result, such as pressing a specific link, interacting with a widget, or the like. However, such modification may not influence at all, or influence only in a negligible manner, behavioral features of the human-generated interactions.
[0044] One technical solution is to obtain behavioral features of the usage of an input device during an interaction of an agent with a page. The behavioral features may then be used in order to estimate whether or not the agent is a bot agent. In some cases, as the interaction is a human-generated interaction, the interaction by itself appears to be of human origin (i.e., behavioral features are consistent with a human user using the input device in such a manner). Estimation of whether the agent is a bot agent may be performed by comparing the behavioral features with a corpus of behavioral features. The corpus may be used to provide a baseline. A large deviation from the baseline may be indicative of the agent being a bot agent. The deviation may be that the interactions are too similar to one another (e.g., indicative of a same PIP being reused). The deviation may be that the interactions are too dissimilar from one another (e.g., indicative of PIPs generated by different human agents being compiled and used together by a same bot).
[0045] Figure 1 shows an illustration of histograms that describe the empirical distribution of a behavioral feature, where Graph 100 exemplifies a histogram that describes the average speed of a group of human agents on a certain website. Graph 110 exemplifies the same histogram on the same website, for a group of bot agents. As can be appreciated from such example, the behavioral features may be used to differentiate between human agents and bot agents. Additionally or alternatively, the histograms may represent any behavioral feature or statistical measurement of samples thereof, such as but not limited to average speed of the mouse movements of the agent, mouse pointer acceleration, the curvature of mouse movement, elapsed time, click patterns, pause pattern, movement vectors, average latency between the release of a keyboard key and the pressure of a next key (also referred to as "Flight Time"), average latency between the release of two successive keyboard keys, average latency between the pressure of two successive keys, or duration of pressure on one key (also referred to as "Dwell Time").
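The keyboard timing features named above can be computed from timestamped key events. The following is a minimal sketch, assuming a hypothetical event format of (key, press time, release time) tuples ordered by press time; the event format and the function names are illustrative assumptions and are not taken from the disclosure.

```python
from statistics import mean

# Hypothetical event format (an assumption): (key, press_ms, release_ms),
# ordered by press time.
KeyEvent = tuple[str, float, float]

def dwell_times(events: list[KeyEvent]) -> list[float]:
    """Duration of pressure on each key (release minus press)."""
    return [release - press for _, press, release in events]

def flight_times(events: list[KeyEvent]) -> list[float]:
    """Latency between the release of one key and the pressure of the next key."""
    return [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]

events = [("h", 0.0, 80.0), ("i", 130.0, 200.0), ("!", 260.0, 350.0)]
avg_dwell = mean(dwell_times(events))    # average Dwell Time over the events
avg_flight = mean(flight_times(events))  # average Flight Time over the events
```

The averages may then serve as two coordinates of a set of behavioral features; other statistics of the same series could be used instead.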
[0046] In some exemplary embodiments, the set of values for each behavioral feature (generally referred to as a set of behavioral features, also denoted as BF) of the interaction being examined may be compared with a corpus of additional BFs. The additional BFs may be BFs that were observed from other interactions of the same agent during the same session. Using such BFs may be useful to identify Intra-Session Replay Attacks as the BF and additional BFs may be too similar (e.g., a similarity measurement being above an acceptable threshold). Additionally or alternatively, the additional BFs may be BFs that were observed from other interactions of agents during different sessions. Using such BFs may be useful to identify Inter-Session Replay Attack, as a PIP that is re-used by a bot in different sessions may be identified as creating a relatively dense cluster in a multidimensional space defined based on the BFs. Other metrics may be used instead of or in addition to density of clusters, in order to identify a PIP that was replayed.
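The same-session comparison can be sketched as a nearest-duplicate check. This is only an illustration under stated assumptions: the BFs are numeric vectors of equal length that have already been scaled to comparable ranges, Euclidean distance stands in for the similarity measurement, and the threshold value and the function name are hypothetical.

```python
import math

def too_similar(bf, session_bfs, max_distance=0.05):
    """Return True if bf nearly duplicates any other BF vector of the session.

    A small Euclidean distance is used here as a stand-in for a similarity
    measurement being above an acceptable threshold (an assumption).
    """
    return any(math.dist(bf, other) <= max_distance for other in session_bfs)

# Example: the first stored vector is a near copy of the examined one.
flagged = too_similar([1.0, 2.0], [[1.0, 2.01], [5.0, 5.0]])
```

A replayed PIP with stochastic noise would land close to, but not exactly on, an earlier vector, which is why a tolerance is used rather than an equality test.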
[0047] Referring now to Figure 2A showing a flowchart diagram of a method in accordance with some exemplary embodiments of the disclosed subject matter.
[0048] On Step 200, client-side event acquisition may be performed. In some exemplary embodiments, event acquisition may be performed by a client-side script that audits, logs or otherwise monitors user interactions that are generated by the agent of the client device. The client device may be, for example, a laptop, a Personal Computer, a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), or any other computerized device capable of accessing pages over a communication network, such as the Internet. The client-side script may be a computer program executed by the client device. As an example, the client-side script may be embedded and provided as part of a page being viewed using a page viewer, such as an internet browser, an app interpreter, or the like, and be executed by the page viewer. In one embodiment, the client-side script may be JavaScript code embedded within the web page being accessed. In another embodiment, the client-side script may be native code directly executable by the operating system of the device executing the page, such as Objective-C for iOS™. In some exemplary embodiments, the interactions may include various communication channels of data potentially associated with Graphical User Interface (GUI) input devices or peripherals of the client device, such as mouse movements, keyboard strokes, touch gestures, device orientation data (e.g., obtainable from gyroscope and accelerometer sensors of the device), or the like. A stream of data may be monitored in each such channel and collected for analysis. Some or all of the analysis may be performed locally by the client-side script. Additionally or alternatively, the analysis may be performed, in whole or in part, by a remote server, by a cloud-based server, or the like. In some exemplary embodiments, server-side analysis may be more robust and less susceptible to manipulation by a malicious agent tampering with the client-side code. In some exemplary embodiments, the client-side event acquisition may generate a log of events exhibited by the agent. The log may be transmitted to a remote server for analysis. The log may comprise tracking data regarding peripherals and other input devices during the interaction of the agent with the page.
[0049] In some exemplary embodiments, complementary information, such as a copy of Hypertext Transfer Protocol (HTTP) traffic or HTTP metadata, may be collected as well, either by the client-side script or by the web server responding to such HTTP traffic. The complementary information may include information useful for identifying geo-location of the agent, demographic information, connection method, session identification (e.g., session cookie), connectivity patterns, or the like. In some exemplary embodiments, HTTP session metadata may be collected.
[0050] In some exemplary embodiments, information may be collected at PIP level, indicating information for a single page interaction. Additionally or alternatively, information may be collected at SIP level, aggregating information over several pages for a single session.
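The two collection levels can be pictured as nested records. The sketch below is one possible in-memory layout, given only as an illustration; the class names, field names and event dictionaries are assumptions and do not describe a format used by the disclosed subject matter.

```python
from dataclasses import dataclass, field

@dataclass
class PIP:
    """Interaction data for a single page interaction (hypothetical layout)."""
    page_url: str
    events: list[dict] = field(default_factory=list)

@dataclass
class SIP:
    """All PIPs generated during a single session (hypothetical layout)."""
    session_id: str
    pips: list[PIP] = field(default_factory=list)

    def event_count(self) -> int:
        """Aggregate a simple statistic over every page of the session."""
        return sum(len(pip.events) for pip in self.pips)

sip = SIP("s1", [
    PIP("/search", [{"type": "mousemove"}, {"type": "click"}]),
    PIP("/item", [{"type": "keydown"}]),
])
total_events = sip.event_count()
```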
[0051] On Step 205, behavioral features may be extracted from the interactions data, such as from a PIP. In some exemplary embodiments, measurements of one or more features may be extracted from the events acquired at Step 200. The valuations of each behavioral feature utilized by the disclosed subject matter may be referred to as a "set of behavioral features". In some exemplary embodiments, a data stream (also referred to as a series) may be transformed into an empirical distribution statistic using a discretization method, such as entropy of the empirical distribution of the series, a moment of the empirical distribution of the series, or the like. In some exemplary embodiments, each behavioral feature may be descriptive of how the user utilizes the input device to interact with the page. Without loss of generality, the behavioral features may comprise any of the following non-limiting examples: a speed of a movement of a pointing device, an acceleration of a movement of a pointing device, a curvature feature of a movement of a pointing device, movement vectors of a pointing device, a pause pattern of a movement of a pointing device, a click pattern of one or more buttons of an input device, a time duration of the interaction, an average angle of a movement of a pointing device during the interaction, a sharp turns count during the interaction (e.g., movements having an angle below a threshold), a variance of a movement angle of a pointing device, an orientation of a user device during the interaction, or the like.
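A few of the listed features can be derived from raw pointer samples as sketched below. The sample format of (time in seconds, x, y), the function names, the 90 degree cutoff and the fixed-width binning are all assumptions made for illustration. In particular, "angle" is read here as the interior angle between two consecutive movement segments, so that a sharp turn has a small angle; the disclosure does not fix this definition.

```python
import math
from collections import Counter

# Hypothetical sample format (an assumption): (t_seconds, x, y), ordered by time.

def speeds(samples):
    """Per-segment pointer speed, in pixels per second."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip segments with identical or out-of-order timestamps
            out.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return out

def interior_angles(samples):
    """Angle in degrees at each middle sample between its two adjacent segments.

    180 means the pointer continued straight; values near 0 mean a reversal.
    """
    angles = []
    for (_, x0, y0), (_, x1, y1), (_, x2, y2) in zip(samples, samples[1:], samples[2:]):
        ax, ay = x0 - x1, y0 - y1
        bx, by = x2 - x1, y2 - y1
        if (ax == 0 and ay == 0) or (bx == 0 and by == 0):
            continue  # the pointer did not move, so the angle is undefined
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        angles.append(math.degrees(abs(math.atan2(cross, dot))))
    return angles

def sharp_turn_count(samples, max_angle_deg=90.0):
    """Count movements whose interior angle is below the threshold."""
    return sum(1 for a in interior_angles(samples) if a < max_angle_deg)

def series_entropy(series, bin_width):
    """Shannon entropy, in bits, of the series after fixed-width binning."""
    counts = Counter(math.floor(v / bin_width) for v in series)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

samples = [(0.0, 0, 0), (1.0, 3, 4), (2.0, 6, 8), (3.0, 6, 4)]
features = {
    "speed_entropy": series_entropy(speeds(samples), bin_width=1.0),
    "sharp_turns": sharp_turn_count(samples),
}
```

Each such value would become one coordinate of the set of behavioral features for the interaction.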
[0052] It will be noted that in some cases, a set of behavioral features may be used as a biometric identifier of a user, which can be used to identify the same person at different interactions. Additionally or alternatively, a set of behavioral features may be used to differentiate human-generated interactions from interactions not generated by humans (e.g., bot-generated interactions). However, the set of behavioral features themselves may be ineffective to differentiate original human interactions (e.g., human-generated interactions generated by the human agent interacting with the page) from replayed human interactions (e.g., human-generated interactions recorded and reused by a bot agent interacting with the page).
[0053] On Step 210, a first classifier may be used. The set of behavioral features may be applied on the classifier to determine whether the behavioral features are consistent with being human-generated or non-human-generated. In some exemplary embodiments, the first classifier may be a predictive classifier configured to label a set of behavioral features as either "BOT" or "HUMAN". The first classifier may be trained using training data to be able to distinguish between human-generated interaction and non-human-generated interactions. The training may be based on a large number of instances of both human- and non-human-generated interactions, whose classification is known. The first classifier may be used to label the set of behavioral features as either "BOT" or "HUMAN".
[0054] It will be noted that the classification may be based on additional features, and may not be limited solely to behavioral features. For example, features relating to demographic information may be used and the prediction may be based on such features as well.
[0055] The label determination by the first classifier (220) may be used to determine whether the interaction is human-generated or not. It will be noted, however, that a human-generated interaction does not guarantee an original interaction that is not being replayed. In case the interaction is determined to be non-human-generated, Step 232 may be performed. Otherwise, Step 230 may be performed.
[0056] On Step 230, a second classifier may be utilized. The set of behavioral features may be provided to the second classifier to determine whether the interaction is original or replayed. The behavioral features used in Step 230 may be the same or different than those used in Step 210. Additionally or alternatively, the second classifier may utilize additional features that are not behavioral features, such as features extracted from the HTTP metadata.
[0057] In some exemplary embodiments, the second classifier may be a classifier implementing non-supervised learning techniques or semi-supervised learning techniques. In some exemplary embodiments, each set of features used by the second classifier to represent an interaction by an agent, may be viewed as an n-dimensional vector in an n-dimensional space. The second classifier may inspect statistical measurement relating to the vector representing the interaction being examined. In some exemplary embodiments, in case the vector is in a relatively dense area of the n-dimensional space, such as an area for which calculated density is above a predetermined threshold, above a designated percentile density in the space, or the like, it may be determined that the interaction is suspected as being replayed. In some exemplary embodiments, other statistical measurement may be employed in addition to or instead of density. In some exemplary embodiments, the second classifier may identify clusters in a multidimensional space and provide its estimation based thereon. Density and other statistical measurement may be computed with respect to the cluster of which the vector is a member.
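One simple way to realize the density test is to count how many stored vectors fall within a fixed radius of the vector being examined and compare the count to a predetermined threshold. The sketch below uses that radius-count estimate as an assumed stand-in for "calculated density"; the radius, the threshold and the function names are hypothetical, and the linear scan would normally be replaced by a spatial index for a large corpus.

```python
import math

def local_density(vector, corpus, radius):
    """Number of corpus vectors within `radius` of the examined vector."""
    return sum(1 for v in corpus if math.dist(vector, v) <= radius)

def suspected_replay(vector, corpus, radius, max_neighbors):
    """Flag the interaction when its neighborhood is denser than the threshold."""
    return local_density(vector, corpus, radius) > max_neighbors

# Five near-identical vectors (as a replayed PIP with noise might produce)
# plus three unrelated ones.
corpus = [
    (0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.49, 0.50), (0.50, 0.49),
    (0.10, 0.90), (0.90, 0.20), (0.30, 0.10),
]
in_dense_area = suspected_replay((0.50, 0.50), corpus, radius=0.05, max_neighbors=3)
in_sparse_area = suspected_replay((0.90, 0.10), corpus, radius=0.05, max_neighbors=3)
```

A percentile-based cutoff, as also mentioned above, could replace the fixed `max_neighbors` value by deriving it from the densities observed across the space.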
[0058] In some exemplary embodiments, the classifier may compare the vector of the interaction with a subset of all vectors available thereto. In some exemplary embodiments, vectors of other recorded interactions may be available to the classifier and retained in a data storage. A subset of the interactions may be employed for the determination.
[0059] For example, in case the vector under examination (VUE) is a PIP-based vector, the space may consist of only other PIP-based vectors (e.g., excluding SIP-based vectors from the space). In some exemplary embodiments, in case the density of the area of the VUE is relatively high (e.g., density above a threshold, which is relative and based on general density in the space), it may be determined that the interaction is replayed and was viewed in many previous interactions (e.g., in the same session or in different sessions). The PIP may be deemed as a human-generated interaction which is replayed in the same session or in different sessions.
[0060] As another example, the PIP-based VUE may be compared with other PIP-based vectors obtained from interactions in the same session. In case of high density, the interaction may be deemed as being reused in the same session. Additionally or alternatively, in case of a low density, the interaction may be deemed as inconsistent with the person performing other interactions in the same session, and is therefore indicative of a combination of different PIPs from different sessions and users that are being reused in the same session.
[0061] As yet another example, the PIP-based VUE may be compared with other PIP-based vectors obtained based on interactions of users with the same type of page as the interaction of the VUE. In some exemplary embodiments, interaction with a page may be substantially different depending on the type thereof. Interactions with pages of a same type, albeit being in different sites, may be relatively similar. Consider for example, users' interaction with GOOGLE™ search results page, which may be substantially similar to the users' interaction with BING™ search results page. The type of page may be, for example, a search results page, an item display page, a verification page, a login page, a form page, a checkout page, or the like. The item display page may be a page displaying information about an item, such as without limitation, a movie page in IMDB™, a book page in AMAZON™, a product page in ALIEXPRESS™, or the like. The verification page may be a page comprising a mechanism to differentiate between humans and bots, such as CAPTCHA. A login page may be a page comprising input fields to be used to provide credentials. In some exemplary embodiments, the login page may be categorized into sub-types based on the credentials type. As an example, a page requiring only site-specific credentials may be of different type than a page allowing Single Sign On (SSO) credentials. Additionally or alternatively, different SSOs may be associated with different sub-types (e.g., type for GOOGLE™ SSO and a different type for FACEBOOK™ SSO). A checkout page may be a page where the user performs a checkout activity, such as completing transaction created through the session. In some exemplary embodiments, the form page may be a page comprising a fillable form. The fillable form may be comprised of input widgets such as radio button widgets, text input widgets, select widgets, or the like. In some exemplary embodiments, form pages having a same or similar (e.g., about ±30%; about ±3; or the like) number of widgets may be considered as pages of the same type. Additionally or alternatively, only form pages having a same or similar number of widgets of each type may be considered as pages of the same type.
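The widget-count comparison for form pages can be sketched as follows. The two example tolerances are combined here with a logical "or", and the relative tolerance is measured against the larger of the two counts; both choices, as well as the function names and the dictionary layout, are assumptions made for illustration.

```python
def similar_count(a: int, b: int, rel_tol: float = 0.30, abs_tol: int = 3) -> bool:
    """True if the counts differ by at most abs_tol or by at most rel_tol of the larger."""
    diff = abs(a - b)
    return diff <= abs_tol or diff <= rel_tol * max(a, b)

def same_form_type(widgets_a: dict[str, int], widgets_b: dict[str, int],
                   per_widget_type: bool = False) -> bool:
    """Compare two form pages given their widget counts keyed by widget type."""
    if not per_widget_type:
        return similar_count(sum(widgets_a.values()), sum(widgets_b.values()))
    kinds = set(widgets_a) | set(widgets_b)
    return all(similar_count(widgets_a.get(k, 0), widgets_b.get(k, 0)) for k in kinds)

# Same total number of widgets, but entirely different widget types.
loose = same_form_type({"text": 10}, {"radio": 10})
strict = same_form_type({"text": 10}, {"radio": 10}, per_widget_type=True)
```

The stricter per-type variant corresponds to the alternative in which only forms with a similar number of widgets of each type are treated as the same page type.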
[0062] As yet another example, the VUE may be compared with vectors that are based on other PIPs or SIPs in which the same page was visited. Such comparison may allow for identifying behaviors that are abnormal for the specific page. Such embodiments may enable treating the same VUE differently depending on the specific page in which the interaction occurred. The same may be applicable to sessions in the same site. The disclosed subject matter may provide for a framework for checking interactions in different pages, in different sites, and not limited to a specific page or site, while still providing accuracy that is based on big data analytics of interactions that were viewed with each specific page or site.
[0063] As yet another example, the VUE may be an SIP-based vector. The VUE may be compared with a space consisting of only other SIP-based vectors (e.g., excluding PIP-based vectors from the space). In such a case, high density may be indicative of a same SIP (or variations thereof) being reused by a bot in different sessions.
[0064] In some exemplary embodiments, the classifier may obtain the vectors based on all the PIPs that were obtained in the same session as the interactions upon which the VUE is based. For each same session vector, a statistical measurement (e.g., density measurement) may be computed with respect to one or more additional vectors (e.g., based on PIPs from different sessions). An aggregated statistical measurement, such as average density, variance of density, skewness of density and kurtosis of density, or the like, may be computed, and used to determine whether the session comprises replayed interactions, such as created using Inter- or Intra-Session Replay Attack.
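The aggregation step can be sketched with plain moment formulas. The sketch assumes that a density measurement has already been computed for each same-session vector and is passed in as a list of numbers; population (biased) moments are used, and the kurtosis is the non-excess form, both of which are choices made here for illustration.

```python
from statistics import fmean

def aggregate_density(densities):
    """Mean, variance, skewness and kurtosis of per-PIP density measurements."""
    n = len(densities)
    mu = fmean(densities)
    m2 = sum((d - mu) ** 2 for d in densities) / n  # population variance
    m3 = sum((d - mu) ** 3 for d in densities) / n
    m4 = sum((d - mu) ** 4 for d in densities) / n
    return {
        "mean": mu,
        "variance": m2,
        # Skewness and kurtosis are undefined for a constant series; 0.0 is
        # returned in that case as a convention of this sketch.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }

session_stats = aggregate_density([1, 2, 3, 4, 5])
```

The resulting values could be compared to thresholds or fed to a classifier as session-level features.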
[0065] In some exemplary embodiments, when using PIPs of the same session, all PIPs of the same session may be used. Additionally or alternatively, all PIPs except for VIPs may be used. In some exemplary embodiments, as the verification process of the VIP has passed, the VIP itself may be an original VIP generated by a CAPTCHA farm, rather than a replayed interaction. As a result, such original interaction should be overlooked when examining the entire session for determining whether the agent operating is a human or a bot.
[0066] In some exemplary embodiments, any combination of the above mentioned examinations may be employed by the second classifier to estimate whether the interaction of the VUE is original or replayed.
[0067] On Step 240, based on the prediction provided by the second classifier, it may be determined whether the human-generated interaction is original or replayed. In case the interaction is original, normal processing may be performed (Step 245) without interference. In case the interaction is estimated to be replayed, responsive action may be performed (Step 232).
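The control flow of Steps 210 through 245 can be summarized as a short function. The two classifiers are passed in as plain callables so that the sketch stays independent of any particular model; the function name, parameter names and return strings are hypothetical.

```python
from typing import Callable, Sequence

Features = Sequence[float]

def handle_interaction(features: Features,
                       looks_human: Callable[[Features], bool],
                       looks_replayed: Callable[[Features], bool]) -> str:
    """Return which branch of the flowchart an interaction would take."""
    if not looks_human(features):   # Steps 210 and 220: first classifier
        return "RESPONSIVE_ACTION"  # Step 232
    if looks_replayed(features):    # Steps 230 and 240: second classifier
        return "RESPONSIVE_ACTION"  # Step 232
    return "NORMAL_PROCESSING"      # Step 245

# Stub classifiers, for illustration only.
outcome = handle_interaction([0.4, 1.2], lambda f: True, lambda f: False)
```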
[0068] On Step 232, in response to a bot-detection, a responsive action is performed. The responsive action may be designed to mitigate the risk from the bot operation. In some exemplary embodiments, the session may be discontinued to prevent the bot from acting. Additionally or alternatively, the web server of the web page may blacklist the IP address of the agent to prevent the same IP from being reused by the bot. In some exemplary embodiments, instead of blacklisting, IP information may be used as part of the estimation process. Additionally or alternatively, in order to prevent DoS attacks, bot agents may be redirected to designated bot servers for handling additional requests therefrom. Additionally or alternatively, different pages may be provided to bot agents, such as pages specifically designed for bots to provide the bots with false information about the site, thereby preventing impermissible data mining from the site, vulnerability detection in the site, or the like. Additionally or alternatively, a verification page may be provided to the bot to verify the estimated decision. It will be noted, however, that verification pages may be circumvented, such as using the services of CAPTCHA farms. Additionally or alternatively, actions taken by the bot agent may be disregarded, such as discounting credit accumulated to the agent, overlooking advertisement impressions served to the bot and Pay Per Click (PPC) advertisements presented to the bot in response to the bot performing an action (e.g., clicking on a banner), or the like. In some exemplary embodiments, the responsive action may be designed to allow for analysis of bot and non-bot sessions. In some exemplary embodiments, the responsive action may tag sessions with a bot or non-bot label, allowing for offline analysis of the session information. In some exemplary embodiments, session data may be utilized for website optimization, such as for A/B testing of different versions of the website. The optimization may be improved by relating solely to non-bot sessions, which may be achieved using the tagging.
[0069] In some exemplary embodiments, a single classifier may be utilized to unify Steps 210 and 230. In some exemplary embodiments, the classifier may be configured to label each VUE as "HUMAN", "BOT" or "REPLAYED", indicating human agent, bot-generated interaction, or human-generated bot-replayed interaction. In some exemplary embodiments, the label of REPLAYED may be unified with the BOT label. In some exemplary embodiments, the first classifier may not be employed and instead only the second classifier may be used without first verifying that the interaction is consistent with human-generated interaction.
[0070] Referring now to Figure 3A showing a flowchart diagram of a method in accordance with some exemplary embodiments of the disclosed subject matter.
[0071] Web Agent 310 may be an agent interacting with a Website 300 over a session. Web Agent 310 provides activity reports to Event Acquisition System (EAS) 320. EAS 320 may be configured to obtain peripherals tracking data, such as of mouse, keyboard, touchpad, touch screen, orientation, or the like, originated from the device employed by Web Agent 310. EAS 320 may further obtain HTTP session metadata of Web Agent 310 interacting with Website 300. In some exemplary embodiments, EAS 320 is deployed on a server, and obtains the data by receiving reports issued by a client-side script embedded in the page of Website 300 that is being executed by the device of Web Agent 310 while Web Agent 310 is interacting with Website 300.
[0072] Extract, Transform & Load (ETL) 330 may be a module configured to transform each event acquired by EAS 320 into a set of features to be used by Classifier 340. The features may comprise, at least in part, behavioral features based on the peripherals tracking data. The set of features generated by ETL 330 may be a multidimensional vector in a multidimensional space.
[0073] The vector of features generated by ETL 330 may be provided to Classifier 340 for training. In some exemplary embodiments, training may be performed with respect to human-verified Web Agent 310 or to bot-verified Web Agent 310.
[0074] Additionally or alternatively, the vector of features generated by ETL 330 may be added to a repository used by Classifier 340 for its estimation. In some exemplary embodiments, after large amounts of vectors are generated by ETL 330 based on many monitored interactions, the predictive capabilities of Classifier 340 may be improved.
[0075] Referring now to Figure 3B showing a flowchart diagram of a method in accordance with some exemplary embodiments of the disclosed subject matter. In Figure 3B, Classifier 340 is being used to provide a prediction.
[0076] Web Agent 311 is interacting with Website 301, which may or may not be the same website as Website 300. EAS 320 may acquire events which are transformed by ETL 330 to features. The features may be fed to Classifier 340 to provide a prediction. In some exemplary embodiments, the features are also added to the repository of Classifier 340 to be used for future predictions as well. In case Classifier 340 estimates that Web Agent 311 is a bot, Bot Responsive Module 350 may be employed to implement a responsive action. The responsive action may be determined based on rules, based on the identity of the Website 301 or web page therein being accessed, or the like. In some exemplary embodiments, Bot Responsive Module 350 may instruct Website 301 to alter its functionality in view of the bot detection, such as issue a redirect notification, provide different web pages to Web Agent 311, or the like.
[0077] Figure 4 shows an illustration of clusters that are automatically determined based on the set of behavioral features. Based on a clustering process, Clusters 410, 420, 430, 440 may be identified.
[0078] A cluster may represent a typical type of behavior. For example, Cluster 410 may comprise interactions with very slow mouse movements, while interactions with very fast mouse movements may be grouped in Cluster 440.
[0079] In some exemplary embodiments, centroids of each cluster may be computed and potentially stored to be retrieved when needed. In some exemplary embodiments, other statistical features of the clusters may be computed, such as, for example, a density of the clusters, a scattering and the number of elements in each cluster, an average distance of the elements in each cluster from their centroid, a standard deviation of the distances, or the like. Such statistical features may be stored in a database for further usage.
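The cluster statistics listed above can be illustrated with a short Python sketch. The function name cluster_stats, the use of Euclidean distance, and the definition of diameter as twice the largest distance from the centroid are assumptions chosen for illustration; other metrics and definitions are equally possible.

```python
import math
from statistics import mean, pstdev


def cluster_stats(vectors):
    """Return centroid, size, and distance statistics for one non-empty cluster."""
    dims = len(vectors[0])
    centroid = [mean(v[i] for v in vectors) for i in range(dims)]
    dists = [math.dist(v, centroid) for v in vectors]
    return {
        "centroid": centroid,
        "size": len(vectors),
        "mean_distance": mean(dists),    # average distance from the centroid
        "std_distance": pstdev(dists),   # spread of those distances
        "diameter": 2 * max(dists),      # crude proxy: twice the largest radius
    }
```

The resulting dictionary could be stored per cluster and reused whenever a new vector is compared against the clusters.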
[0080] Centroids of each cluster may be computed, such as Centroids 412, 422, 432, 442. Additionally or alternatively, density of Cluster 430 may be greater than that of Cluster 420. In some exemplary embodiments, dense clusters with small diameter may be indicative of bot agents replaying the same interaction over and over in different sessions. Though Cluster 440 may be denser than Cluster 430, it has a relatively larger diameter, indicating that the cluster has a large degree of variations, and therefore is not necessarily a bot cluster. In some exemplary embodiments, Cluster 410 may have relatively small number of instances scattered therein, indicating that the cluster is likely to comprise rare interactions that have relatively low correlation with one another and therefore may be indicative of non-standard human behavior, and potentially various interactions by bots.
[0081] In some exemplary embodiments, clusters may be labeled. In some exemplary embodiments, clusters may be labeled as "HUMAN" or "BOT". Labeling may be performed manually, automatically or in a semi-automatic manner. In some exemplary embodiments, automatic labeling may be based on sessions in which bots were detected, as indicating bot interactions from which a bot cluster may be inferred. Additionally or alternatively, sessions in which a human agent was validated without a doubt may be used to infer a human cluster. As an example, a session in which a purchase order was completed and including making payment may be treated as a validated human agent session. In some exemplary embodiments, the session may be no longer assumed to be a human agent session if the purchase order was canceled and/or the payment was canceled subsequently to the purchase order being completed. In some exemplary embodiments, the training data for such semi-supervised method may not include information from validation mechanisms, such as CAPTCHA, as such mechanisms may be circumvented. Instead, the training data may include data that is validated in other manners, such as by performing actions which bot agents do not perform, by relying on human interaction with the human agent to verify her identity, or the like.
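One possible reading of the automatic labeling described above is sketched below. The session dictionary keys (bot_detected, purchase_completed, payment_made, purchase_cancelled, payment_cancelled) and the majority vote used to label a cluster are hypothetical choices for this example; the disclosure does not fix a data model or a voting rule.

```python
from collections import Counter


def session_label(session):
    """Label a session dict as 'HUMAN', 'BOT', or None when nothing is validated."""
    if session.get("bot_detected"):
        return "BOT"
    paid = session.get("purchase_completed") and session.get("payment_made")
    cancelled = session.get("purchase_cancelled") or session.get("payment_cancelled")
    if paid and not cancelled:
        return "HUMAN"
    return None  # CAPTCHA outcomes are deliberately not consulted


def cluster_label(sessions):
    """Label a cluster by majority vote over its validated sessions (an assumption)."""
    votes = Counter(lab for lab in map(session_label, sessions) if lab is not None)
    if not votes:
        return None  # no validated session in this cluster
    return votes.most_common(1)[0][0]
```

In a semi-automatic setting, a cluster for which cluster_label returns None could be forwarded for manual labeling.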
[0082] In some exemplary embodiments, automatic labeling may be based on clusters' statistical features such as diameter, density, standard deviation, or the like. Such features may be compared with predetermined or computed thresholds, which may be absolute or relative.
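A minimal sketch of threshold-based labeling follows, assuming that a cluster which is both tight and dense is treated as a likely replay by bot agents. The function name, the density proxy (size divided by diameter), and both default threshold values are invented for illustration and would need tuning on real data.

```python
def label_by_statistics(size, diameter, max_bot_diameter=0.1, min_bot_density=50.0):
    """Label a cluster 'BOT' when it is both tight and dense; otherwise return None."""
    if diameter <= 0:
        # All members coincide: repeated identical interactions look like replay.
        return "BOT" if size > 1 else None
    density = size / diameter  # simple one-number density proxy (assumption)
    if diameter <= max_bot_diameter and density >= min_bot_density:
        return "BOT"
    return None  # undecided; other evidence is needed
```

The thresholds here are absolute; a relative variant could compare each cluster against the median diameter and density of all clusters.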
[0083] Referring again to Figure 4, Vectors 450, 460 and 470 may be analyzed. Vector 450 is relatively close to Cluster 410. Therefore, it may be deduced that the chances of Vector 450 being produced by a bot agent are not high, even though it does not match any existing cluster. The probability of Vector 460 being of a bot agent is higher, though it may be estimated to have some relation with Cluster 430. Vector 470 is very distant from all obtained clusters, and it can therefore be estimated that it is associated with a bot agent.
[0084] In some embodiments, the distances may be weighted according to various factors, such as a number of elements (either normalized or not) in the clusters, cluster density, or the like. These weights may help in reflecting the clusters more accurately, as they reflect not only the typical human behaviors, but also how typical they are. Moreover, the examined distance of the VUE from the centroids could be, for example, but not limited to, the minimum distance, the average distance, the weighted average distance, or any other distance heuristic.
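The distance heuristics named above (minimum, average, weighted average) can be sketched as follows. Representing each cluster as a (centroid, size) pair, using Euclidean distance, and weighting by cluster size are assumptions for this example only.

```python
import math


def distance_to_clusters(vector, clusters, heuristic="min"):
    """Score a vector against clusters given as (centroid, size) pairs."""
    dists = [math.dist(vector, centroid) for centroid, _ in clusters]
    if heuristic == "min":
        return min(dists)
    if heuristic == "average":
        return sum(dists) / len(dists)
    if heuristic == "weighted":
        # Larger clusters (more typical behavior) contribute more to the score.
        total = sum(size for _, size in clusters)
        return sum(d * size for d, (_, size) in zip(dists, clusters)) / total
    raise ValueError(f"unknown heuristic: {heuristic}")
```

For example, with centroids at (0, 0) of size 1 and (10, 0) of size 3, the vector (4, 0) has distances 4 and 6, so the minimum is 4, the average is 5, and the size-weighted average is 5.5.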
[0085] Moreover, each interaction may be examined from several data-perspectives, in accordance with the number of the behavioral features that are examined. The chances that an interaction is generated by a bot agent are higher when many of its vectors are too far from the obtained centroids. A user that moves his mouse in a specific SIP in an irregular speed, for example, is not necessarily a bot. However, if its acceleration and movement curvature (and/or other features) are also irregular, the chances that it is a bot may be higher. Therefore, a large collection of behavioral features may reduce the chances of false positive determinations (i.e., erroneously flagging an interaction generated by a human agent as a bot agent). Similarly, if the irregularity is local to a single VUE, for example, in one PIP, such irregularity may be insufficient to estimate, with a high confidence level, that the user is a bot. However, if there are irregularities that are exhibited in different perspectives, such as when examining one PIP, sets of PIPs in the session, the SIP, or the like, the confidence level of the bot detection may be considered higher and therefore an estimation may be produced.
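To illustrate how irregularities across several perspectives might be combined, the sketch below computes the fraction of examined perspectives whose distance exceeds a per-perspective threshold. The function name bot_confidence, the dictionary layout, and the use of a simple fraction as the confidence value are assumptions, not the disclosed method.

```python
def bot_confidence(distances, thresholds):
    """Return (fraction of irregular perspectives, list of their names).

    distances and thresholds map a perspective name (e.g. 'speed') to a number;
    distances must be non-empty.
    """
    irregular = [name for name, d in distances.items() if d > thresholds[name]]
    return len(irregular) / len(distances), irregular
```

A single irregular perspective yields a low fraction, matching the observation that one irregular feature alone is weak evidence of a bot.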
[0086] In some exemplary embodiments, the estimation may be produced only in case that the confidence level is above a threshold. In some exemplary embodiments, if there is a bot estimation with a confidence level that is below a threshold, a responsive action may be implemented to gather additional information to re-perform the determination. The additional information may include causing the user to pass through a verification page, gathering additional PIP data to perform additional checks or the like. In some exemplary embodiments, during the phase in which additional information is gathered, the agent may be estimated as a bot, and may be prohibited from performing some actions (e.g., accessing specific data elements). Additionally or alternatively, bot estimated agent may be allowed to click on ads. However, in order to prevent click frauds, ad serving may be disabled or the economic effect of the ad serving may be disabled. In some exemplary embodiments, if the estimation is later on refuted, the economic effect may be retroactively enabled.
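The threshold-gated decision and the retroactive handling of ad clicks may be sketched as below. The function decide, the class ClickLedger, the action names, and the default threshold of 0.8 are all hypothetical and serve only to make the described flow concrete.

```python
def decide(is_bot_estimate, confidence, threshold=0.8):
    """Map a bot estimate and its confidence level to a next step."""
    if not is_bot_estimate:
        return "treat_as_human"
    if confidence >= threshold:
        return "respond_to_bot"
    return "gather_more_information"  # e.g. verification page or more PIP data


class ClickLedger:
    """Withholds the economic effect of ad clicks from suspected bots."""

    def __init__(self):
        self.billed = 0
        self.pending = {}  # agent_id -> number of withheld clicks

    def record_click(self, agent_id, suspected_bot):
        if suspected_bot:
            self.pending[agent_id] = self.pending.get(agent_id, 0) + 1
        else:
            self.billed += 1

    def resolve(self, agent_id, is_bot):
        withheld = self.pending.pop(agent_id, 0)
        if not is_bot:
            self.billed += withheld  # estimation refuted: enable retroactively
```

Clicks from a suspected agent stay pending until resolve is called, at which point they are either billed or silently dropped.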
[0087] Referring now to Figure 5 showing an apparatus in accordance with some exemplary embodiments of the disclosed subject matter. An apparatus 500 may be configured to perform any of the methods of Figures 2, 3A-3B, or the like.
[0088] In some exemplary embodiments, Apparatus 500 may comprise a Processor 502. Processor 502 may be a Central Processing Unit (CPU), a microprocessor, an electronic circuit, an Integrated Circuit (IC) or the like. Processor 502 may be utilized to perform computations required by Apparatus 500 or any of its subcomponents.
[0089] In some exemplary embodiments of the disclosed subject matter, Apparatus 500 may comprise an Input/Output (I/O) Module 505. The I/O Module 505 may be utilized to provide an output to and receive input from a user. In some exemplary embodiments, user input may be provided in manual validation of cluster labels, in manual labeling of clusters, or the like. It will be understood, that Apparatus 500 may be configured to operate without user input or human interaction.
[0090] In some exemplary embodiments, Apparatus 500 may comprise Memory 507. Memory 507 may be a hard disk drive, a Flash disk, a Random Access Memory (RAM), a memory chip, or the like. In some exemplary embodiments, Memory 507 may retain program code operative to cause Processor 502 to perform acts associated with any of the subcomponents of Apparatus 500. In some exemplary embodiments, Memory 507 may retain clustering data, such as clusters, centroids or other statistical features thereof, labeling of clusters, or the like.
[0091] BF Determinator 510 may be configured to obtain and compute the set of behavioral features based on an interaction packet, such as PIP, SIP, CIP, or the like. In some exemplary embodiments, BF Determinator 510 may be configured to compute various BFs for a same interaction packet, each of which associated with a different group of behavioral features.
[0092] BF Distance Computer 520 may be configured to compute a distance measurement between two vectors representing BFs. In some exemplary embodiments, the distance measurement may be based on any distance metric. In some exemplary embodiments, the distance measurement may take into account information regarding a cluster of one of the BFs, such as in case a BF is compared with a centroid BF of a cluster.
[0093] Clustering Module 530 may be configured to compute clusters based on BF data. Clustering performed by Clustering Module 530 may utilize non-supervised machine learning techniques, semi-supervised machine learning techniques, or the like. In some exemplary embodiments, Clustering Module 530 may provide labeling for clusters, such as based on automated deduction, semi-automatic determination having manual validation, manual input by human users, or the like.
AN EMBODIMENT
[0094] One embodiment of the disclosed subject matter is a method for classifying an HTTP session in a manner that distinguishes between a human agent and a bot agent.
[0095] (1) Client-side events are acquired. The client-side events are events generated by the client's device GUI peripherals, such as, pointing devices, keyboard devices and move sensors, on a given page in an HTTP session.
[0096] (2) The raw GUI events of a given page are transformed to a multidimensional numerical vector representing behavioral features and HTTP metadata features.
[0097] (3) A machine learning classifier is provided with the vector. The classifier distinguishes between pages that originated from a human agent and pages that originated from a bot agent. If the classifier determines the agent to be a bot agent, such indication is returned. Otherwise, the process continues.
[0098] (4) The multi-dimensional numerical vector is embedded in a multi-dimensional space. The vector's similarity/distance/density is compared to that of a control group of previously generated multi-dimensional numerical vectors. The control group can be generated using events from pages as follows: (a) previous pages from the same session; (b) previous pages accessed by different sessions on the same page; (c) previous pages accessed by different sessions that are similar to the current page.
[0099] (5) A rule engine may be employed. The rule engine may implement any or all of the following non-limiting, exemplary rules. If the current page generates a highly dense or very sparse density with reference to control group (a), it may be concluded that the page originated by a bot agent. If the current page generates a highly dense or very sparse density with reference to control group (b), it may be concluded that the page originated by a bot agent via multiple sessions. If the current page generates a highly dense or very sparse density with reference to control group (c), it may be concluded that the page originated by a bot agent. Additional control groups and rules may be used in the determination. In case no rule indicated that the current page was visited by a bot agent, an indication that the agent is a human agent may be returned.
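The density rules of steps (4) and (5) may be sketched as follows. Defining density as the fraction of control-group vectors within a fixed radius of the current vector is an assumption, as are the function names and the default values for radius, sparse, and dense; the embodiment does not specify how density is measured.

```python
import math


def local_density(vector, control_group, radius):
    """Fraction of control-group vectors lying within `radius` of `vector`."""
    if not control_group:
        return None  # an empty control group gives no evidence
    near = sum(1 for other in control_group if math.dist(vector, other) <= radius)
    return near / len(control_group)


def rule_engine(vector, control_groups, radius=1.0, sparse=0.05, dense=0.9):
    """Apply the dense/sparse rule per control group.

    control_groups maps a group name to a list of vectors.
    Returns ('BOT', group_name) for the first rule that fires, else ('HUMAN', None).
    """
    for name, group in control_groups.items():
        density = local_density(vector, group, radius)
        if density is None:
            continue
        if density >= dense or density <= sparse:
            return "BOT", name
    return "HUMAN", None
```

The dictionary keys could correspond to control groups (a), (b), and (c); because the loop returns on the first match, the order of the groups determines which rule is reported.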
[00100] The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
[00101] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
[00102] Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
[00103] Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
[00104] Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
[00105] These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
[00106] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
[00107] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
[00108] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[00109] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

CLAIMS What is claimed is:
1. A method comprising:
obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent;
automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and
in response to an estimation that the agent is a bot, performing a responsive action.
2. The method of Claim 1 further comprising:
verifying that the set of behavioral features are consistent with a human-generated interaction;
wherein in response to said verifying, performing said automatically estimating.
3. The method of Claim 1, wherein said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
4. The method of Claim 1, wherein the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
5. The method of Claim 4, wherein the PIP and the PIPs are obtained from a same session.
6. The method of Claim 4, wherein the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
7. The method of Claim 4, wherein the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
8. The method of Claim 7, wherein the type of the pages is selected from the group consisting of:
a search results page;
an item display page;
a verification page;
a login page;
a form page; and
a checkout page.
9. The method of Claim 7, wherein the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
10. The method of Claim 1, wherein the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
11. The method of Claim 1, wherein said automatically estimating comprises:
obtaining a plurality of sets of behavioral features each of which is based on a Page Interaction Packet (PIP) of the agent, all of which were obtained during a same session, wherein the plurality of sets of behavioral features comprises the set of behavioral features;
for each set of the plurality of sets of behavioral features, computing a statistical measurement with respect to the one or more additional sets of behavioral features;
computing an aggregated statistical measurement based on the statistical measurement of each set of the plurality of sets of behavioral features; and
estimating whether the agent is a bot based on the aggregated statistical measurement being above or below a threshold.
12. The method of Claim 1, wherein the set of behavioral features comprise at least one of:
a speed of a movement of a pointing device;
an acceleration of a movement of a pointing device;
a curvature feature of a movement of a pointing device;
one or more movement vectors of a pointing device;
a pause pattern of a movement of a pointing device;
a click pattern of one or more buttons of an input device;
a time duration of the interaction;
a flight time;
a dwell time;
an angle movement of a pointing device;
an acceleration measurement of a user device during the interaction; and
an orientation of a user device during the interaction.
13. The method of Claim 1, wherein said obtaining comprises receiving, by a server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
14. The method of Claim 1, wherein the one or more additional sets of behavioral features were previously obtained based on interactions with the page.
15. An apparatus having a processor, the processor being adapted to perform the steps of:
obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent;
automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and
in response to an estimation that the agent is a bot, performing a responsive action.
16. The apparatus of Claim 15, wherein said processor is further adapted to perform:
verifying that the set of behavioral features are consistent with a human-generated interaction;
wherein in response to said verifying, performing said automatically estimating.
17. The apparatus of Claim 15, wherein said automatically estimating is performed by a classifier, wherein the classifier obtains multidimensional vectors each of which is defined by a set of behavioral features, wherein said classifier is configured to provide a classification of a bot for a set of behavioral features based on a statistical measurement being above or below a threshold.
18. The apparatus of Claim 15, wherein the set of behavioral features is based on a Page Interaction Packet (PIP), wherein the one or more additional sets of behavioral features are based on PIPs.
19. The apparatus of Claim 18, wherein the PIP and the PIPs are obtained from a same session.
20. The apparatus of Claim 18, wherein the PIP is obtained from a first session, wherein the PIPs comprise at least a subset of PIPs that are obtained from a second session.
21. The apparatus of Claim 18, wherein the PIP and the PIPs originate from pages, wherein the pages are pages of a same type.
22. The apparatus of Claim 21, wherein the pages are pages of the same type from different websites, wherein the different websites are websites of a similar subject matter.
23. The apparatus of Claim 15, wherein the set of behavioral features is based on a Session Interaction Packet (SIP), wherein the one or more additional sets of behavioral features are based on SIPs.
24. The apparatus of Claim 15, wherein the apparatus is a server connectable to computerized devices over a communication network, wherein said obtaining comprises receiving, by the server, the set of behavioral features or indications thereof from a client-side script embedded in the page that is being executed by a computerized device of the agent while the agent is interacting with the page.
25. A computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising:
obtaining a set of behavioral features of a usage of an input device during an interaction of an agent with a page, wherein the set of behavioral features are consistent with a human-generated interaction, wherein the set of behavioral features are generated based on events obtained from a client device used by the agent;
automatically estimating whether the agent is a bot based on comparison of the set of behavioral features with one or more additional sets of behavioral features, wherein the one or more additional sets of behavioral features were previously obtained based on interactions of one or more agents with one or more pages; and
in response to an estimation that the agent is a bot, performing a responsive action.
PCT/IL2016/051300 2015-12-28 2016-12-06 Utilizing behavioral features to identify bot WO2017115347A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680076727.5A CN108604272A (en) 2015-12-28 2016-12-06 Robot is identified using behavioural characteristic
EP16881394.7A EP3398106B1 (en) 2015-12-28 2016-12-06 Utilizing behavioral features to identify bot

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201562272058P 2015-12-28 2015-12-28
US201562272065P 2015-12-28 2015-12-28
US62/272,058 2015-12-28
US62/272,065 2015-12-28

Publications (1)

Publication Number Publication Date
WO2017115347A1 true WO2017115347A1 (en) 2017-07-06

Family

ID=59086423

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2016/051300 WO2017115347A1 (en) 2015-12-28 2016-12-06 Utilizing behavioral features to identify bot

Country Status (4)

Country Link
US (1) US11003748B2 (en)
EP (1) EP3398106B1 (en)
CN (1) CN108604272A (en)
WO (1) WO2017115347A1 (en)

US10592648B2 (en) 2016-06-10 2020-03-17 OneTrust, LLC Consent receipt management systems and related methods
US10997318B2 (en) 2016-06-10 2021-05-04 OneTrust, LLC Data processing systems for generating and populating a data inventory for processing data access requests
US11366786B2 (en) 2016-06-10 2022-06-21 OneTrust, LLC Data processing systems for processing data subject access requests
US12052289B2 (en) 2016-06-10 2024-07-30 OneTrust, LLC Data processing systems for data-transfer risk identification, cross-border visualization generation, and related methods
US11636171B2 (en) 2016-06-10 2023-04-25 OneTrust, LLC Data processing user interface monitoring systems and related methods
US11343284B2 (en) 2016-06-10 2022-05-24 OneTrust, LLC Data processing systems and methods for performing privacy assessments and monitoring of new versions of computer code for privacy compliance
US11227247B2 (en) 2016-06-10 2022-01-18 OneTrust, LLC Data processing systems and methods for bundled privacy policies
US12118121B2 (en) 2016-06-10 2024-10-15 OneTrust, LLC Data subject access request processing systems and related methods
US11134086B2 (en) 2016-06-10 2021-09-28 OneTrust, LLC Consent conversion optimization systems and related methods
US11222142B2 (en) 2016-06-10 2022-01-11 OneTrust, LLC Data processing systems for validating authorization for personal data collection, storage, and processing
US10909265B2 (en) 2016-06-10 2021-02-02 OneTrust, LLC Application privacy scanning systems and related methods
US10740487B2 (en) 2016-06-10 2020-08-11 OneTrust, LLC Data processing systems and methods for populating and maintaining a centralized database of personal data
US11520928B2 (en) 2016-06-10 2022-12-06 OneTrust, LLC Data processing systems for generating personal data receipts and related methods
US11410106B2 (en) 2016-06-10 2022-08-09 OneTrust, LLC Privacy management systems and methods
US11416590B2 (en) 2016-06-10 2022-08-16 OneTrust, LLC Data processing and scanning systems for assessing vendor risk
US11238390B2 (en) 2016-06-10 2022-02-01 OneTrust, LLC Privacy management systems and methods
US12021831B2 (en) 2016-06-10 2024-06-25 Sophos Limited Network security
US10685140B2 (en) 2016-06-10 2020-06-16 OneTrust, LLC Consent receipt management systems and related methods
US11651104B2 (en) 2016-06-10 2023-05-16 OneTrust, LLC Consent receipt management systems and related methods
US11651106B2 (en) 2016-06-10 2023-05-16 OneTrust, LLC Data processing systems for fulfilling data subject access requests and related methods
US10467432B2 (en) 2016-06-10 2019-11-05 OneTrust, LLC Data processing systems for use in automatically generating, populating, and submitting data subject access requests
US11354434B2 (en) 2016-06-10 2022-06-07 OneTrust, LLC Data processing systems for verification of consent and notice processing and related methods
US11328092B2 (en) 2016-06-10 2022-05-10 OneTrust, LLC Data processing systems for processing and managing data subject access in a distributed environment
US11416798B2 (en) 2016-06-10 2022-08-16 OneTrust, LLC Data processing systems and methods for providing training in a vendor procurement process
US11416109B2 (en) 2016-06-10 2022-08-16 OneTrust, LLC Automated data processing systems and methods for automatically processing data subject access requests using a chatbot
US11392720B2 (en) 2016-06-10 2022-07-19 OneTrust, LLC Data processing systems for verification of consent and notice processing and related methods
US10318761B2 (en) 2016-06-10 2019-06-11 OneTrust, LLC Data processing systems and methods for auditing data request compliance
US11625502B2 (en) 2016-06-10 2023-04-11 OneTrust, LLC Data processing systems for identifying and modifying processes that are subject to data subject access requests
US11222139B2 (en) 2016-06-10 2022-01-11 OneTrust, LLC Data processing systems and methods for automatic discovery and assessment of mobile software development kits
US11336697B2 (en) 2016-06-10 2022-05-17 OneTrust, LLC Data processing systems for data-transfer risk identification, cross-border visualization generation, and related methods
US11416589B2 (en) 2016-06-10 2022-08-16 OneTrust, LLC Data processing and scanning systems for assessing vendor risk
US10510031B2 (en) 2016-06-10 2019-12-17 OneTrust, LLC Data processing systems for identifying, assessing, and remediating data processing risks using data modeling techniques
US10606916B2 (en) 2016-06-10 2020-03-31 OneTrust, LLC Data processing user interface monitoring systems and related methods
US11341447B2 (en) 2016-06-10 2022-05-24 OneTrust, LLC Privacy management systems and methods
US11188615B2 (en) 2016-06-10 2021-11-30 OneTrust, LLC Data processing consent capture systems and related methods
US11366909B2 (en) 2016-06-10 2022-06-21 OneTrust, LLC Data processing and scanning systems for assessing vendor risk
US11544667B2 (en) 2016-06-10 2023-01-03 OneTrust, LLC Data processing systems for generating and populating a data inventory
US11301796B2 (en) 2016-06-10 2022-04-12 OneTrust, LLC Data processing systems and methods for customizing privacy training
US10909488B2 (en) 2016-06-10 2021-02-02 OneTrust, LLC Data processing systems for assessing readiness for responding to privacy-related incidents
US11354435B2 (en) 2016-06-10 2022-06-07 OneTrust, LLC Data processing systems for data testing to confirm data deletion and related methods
US10878127B2 (en) 2016-06-10 2020-12-29 OneTrust, LLC Data subject access request processing systems and related methods
US11050809B2 (en) 2016-12-30 2021-06-29 JBF Interlude 2009 LTD Systems and methods for dynamic weighting of branched video paths
US11386349B1 (en) * 2017-05-16 2022-07-12 Meta Platforms, Inc. Systems and methods for distinguishing human users from bots
US10013577B1 (en) 2017-06-16 2018-07-03 OneTrust, LLC Data processing systems for identifying whether cookies contain personally identifying information
US10594836B2 (en) * 2017-06-30 2020-03-17 Microsoft Technology Licensing, Llc Automatic detection of human and non-human activity
CN109313645B (en) * 2017-08-25 2022-05-24 深圳市大富智慧健康科技有限公司 Artificial intelligence terminal system, server and behavior control method thereof
GB201715801D0 (en) * 2017-09-29 2017-11-15 Intechnica Ltd Method of processing web requests directed to a website
US10257578B1 (en) 2018-01-05 2019-04-09 JBF Interlude 2009 LTD Dynamic library display for interactive videos
WO2020227291A1 (en) * 2019-05-06 2020-11-12 SunStone Information Defense, Inc. Methods and apparatus for interfering with automated bots using a graphical pointer and page display elements
US11368483B1 (en) 2018-02-13 2022-06-21 Akamai Technologies, Inc. Low touch integration of a bot detection service in association with a content delivery network
US11245722B1 (en) * 2018-02-13 2022-02-08 Akamai Technologies, Inc. Content delivery network (CDN)-based bot detection service with stop and reset protocols
US10685655B2 (en) 2018-03-07 2020-06-16 International Business Machines Corporation Leveraging natural language processing
US11601721B2 (en) 2018-06-04 2023-03-07 JBF Interlude 2009 LTD Interactive video dynamic adaptation and user profiling
CN109145544A (en) * 2018-09-05 2019-01-04 郑州云海信息技术有限公司 A kind of human-computer behavior detection system and method
US10803202B2 (en) 2018-09-07 2020-10-13 OneTrust, LLC Data processing systems for orphaned data identification and deletion and related methods
US11544409B2 (en) 2018-09-07 2023-01-03 OneTrust, LLC Data processing systems and methods for automatically protecting sensitive data within privacy management systems
CN109284590B (en) * 2018-09-29 2021-06-25 武汉极意网络科技有限公司 Method, equipment, storage medium and device for access behavior security protection
FR3094518B1 (en) 2019-04-01 2021-02-26 Idemia Identity & Security France Method of detecting bots in a user network
US11972368B2 (en) 2019-09-20 2024-04-30 International Business Machines Corporation Determining source of interface interactions
CN112989295A (en) * 2019-12-16 2021-06-18 北京沃东天骏信息技术有限公司 User identification method and device
US11012492B1 (en) * 2019-12-26 2021-05-18 Palo Alto Networks (Israel Analytics) Ltd. Human activity detection in computing device transmissions
US11245961B2 (en) * 2020-02-18 2022-02-08 JBF Interlude 2009 LTD System and methods for detecting anomalous activities for interactive videos
US12096081B2 (en) 2020-02-18 2024-09-17 JBF Interlude 2009 LTD Dynamic adaptation of interactive video players using behavioral analytics
CN111641594B (en) * 2020-05-09 2021-11-30 同济大学 Method, system, medium and device for detecting fraudulent user based on page behavior
WO2021262159A1 (en) * 2020-06-24 2021-12-30 Google Llc Verifying content and interactions within webviews
US12047637B2 (en) 2020-07-07 2024-07-23 JBF Interlude 2009 LTD Systems and methods for seamless audio and video endpoint transitions
WO2022011142A1 (en) 2020-07-08 2022-01-13 OneTrust, LLC Systems and methods for targeted data discovery
EP4189569A1 (en) 2020-07-28 2023-06-07 OneTrust LLC Systems and methods for automatically blocking the use of tracking tools
WO2022032072A1 (en) 2020-08-06 2022-02-10 OneTrust, LLC Data processing systems and methods for automatically redacting unstructured data from a data subject access request
WO2022060860A1 (en) 2020-09-15 2022-03-24 OneTrust, LLC Data processing systems and methods for detecting tools for the automatic blocking of consent requests
US11526624B2 (en) 2020-09-21 2022-12-13 OneTrust, LLC Data processing systems and methods for automatically detecting target data transfers and target data processing
EP4241173A1 (en) 2020-11-06 2023-09-13 OneTrust LLC Systems and methods for identifying data processing activities based on data discovery results
US11687528B2 (en) 2021-01-25 2023-06-27 OneTrust, LLC Systems and methods for discovery, classification, and indexing of data in a native computing system
US11442906B2 (en) 2021-02-04 2022-09-13 OneTrust, LLC Managing custom attributes for domain objects defined within microservices
US20240111899A1 (en) 2021-02-08 2024-04-04 OneTrust, LLC Data processing systems and methods for anonymizing data samples in classification analysis
US20240098109A1 (en) 2021-02-10 2024-03-21 OneTrust, LLC Systems and methods for mitigating risks of third-party computing system functionality integration into a first-party computing system
US11775348B2 (en) 2021-02-17 2023-10-03 OneTrust, LLC Managing custom workflows for domain objects defined within microservices
WO2022178219A1 (en) 2021-02-18 2022-08-25 OneTrust, LLC Selective redaction of media content
EP4305539A1 (en) 2021-03-08 2024-01-17 OneTrust, LLC Data transfer discovery and analysis systems and related methods
US11562078B2 (en) 2021-04-16 2023-01-24 OneTrust, LLC Assessing and managing computational risk involved with integrating third party computing functionality within a computing system
US11882337B2 (en) 2021-05-28 2024-01-23 JBF Interlude 2009 LTD Automated platform for generating interactive videos
US11934477B2 (en) 2021-09-24 2024-03-19 JBF Interlude 2009 LTD Video player integration within websites
CN114548078A (en) * 2021-12-28 2022-05-27 城云科技(中国)有限公司 Method, device and application for automatically triggering data verification when submitting form
WO2023129824A1 (en) * 2021-12-29 2023-07-06 Voyetra Turtle Beach, Inc. Mouse device with detection function of non-human mouse events and detection method thereof
US20230344842A1 (en) * 2022-04-21 2023-10-26 Palo Alto Networks, Inc. Detection of user anomalies for software as a service application traffic with high and low variance feature modeling
US11620142B1 (en) 2022-06-03 2023-04-04 OneTrust, LLC Generating and customizing user interfaces for demonstrating functions of interactive user environments

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014159563A1 (en) * 2013-03-13 2014-10-02 University Of Pittsburgh Of The Commonwealth System Of Higher Education Usage modeling
US20140344927A1 (en) * 2010-11-29 2014-11-20 Biocatch Ltd. Device, system, and method of detecting malicious automatic script and code injection
US20150213251A1 (en) 2010-11-29 2015-07-30 Biocatch Ltd. Method, device, and system of protecting a log-in process of a computerized service
US20150242628A1 (en) * 2014-02-23 2015-08-27 Cyphort Inc. System and Method for Detection of Malicious Hypertext Transfer Protocol Chains

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
US20080201206A1 (en) * 2007-02-01 2008-08-21 7 Billion People, Inc. Use of behavioral portraits in the conduct of E-commerce
US8311876B2 (en) 2009-04-09 2012-11-13 Sas Institute Inc. Computer-implemented systems and methods for behavioral identification of non-human web sessions
US8832257B2 (en) * 2009-05-05 2014-09-09 Suboti, Llc System, method and computer readable medium for determining an event generator type
US20110131652A1 (en) 2009-05-29 2011-06-02 Autotrader.Com, Inc. Trained predictive services to interdict undesired website accesses
US8892896B2 (en) * 2010-01-08 2014-11-18 Microsoft Corporation Capability and behavior signatures
US8442863B2 (en) * 2010-06-17 2013-05-14 Microsoft Corporation Real-time-ready behavioral targeting in a large-scale advertisement system
US9544271B2 (en) * 2011-09-16 2017-01-10 Telecommunication Systems, Inc. Anonymous messaging conversation
US20140278947A1 (en) * 2011-10-31 2014-09-18 Pureclick Llc System and method for click fraud protection
US8990935B1 (en) * 2012-10-17 2015-03-24 Google Inc. Activity signatures and activity replay detection
US9313213B2 (en) * 2012-10-18 2016-04-12 White Ops, Inc. System and method for detecting classes of automated browser agents
US10447711B2 (en) * 2012-10-18 2019-10-15 White Ops Inc. System and method for identification of automated browser agents
US20150156084A1 (en) * 2012-12-02 2015-06-04 Bot Or Not, Llc System and method for reporting on automated browser agents
US9438539B1 (en) * 2013-09-03 2016-09-06 Xpliant, Inc. Apparatus and method for optimizing the number of accesses to page-reference count storage in page link list based switches
US8997226B1 (en) * 2014-04-17 2015-03-31 Shape Security, Inc. Detection of client-side malware activity
US9906544B1 (en) * 2014-12-02 2018-02-27 Akamai Technologies, Inc. Method and apparatus to detect non-human users on computer systems
US9986058B2 (en) * 2015-05-21 2018-05-29 Shape Security, Inc. Security systems for mitigating attacks from a headless browser executing on a client computer

Non-Patent Citations (1)

Title
ZI CHU ET AL.: "Blog Or Block: Detecting Blog Bots Through Behavioral Biometrics", COMPUTER NETWORKS, vol. 57, no. 3, 1 February 2013 (2013-02-01), pages 634 - 646, XP055129268, ISSN: 1389-1286, DOI: 10.1016/j.comnet.2012.10.005

Also Published As

Publication number Publication date
EP3398106A4 (en) 2019-07-03
US20170185758A1 (en) 2017-06-29
EP3398106B1 (en) 2021-04-21
EP3398106A1 (en) 2018-11-07
US11003748B2 (en) 2021-05-11
CN108604272A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
US11003748B2 (en) Utilizing behavioral features to identify bot
US10705904B2 (en) Detecting anomalous behavior in an electronic environment using hardware-based information
Kharraz et al. Surveylance: Automatically detecting online survey scams
CA2997583C (en) Systems and methods for detecting and preventing spoofing
Feizollah et al. A review on feature selection in mobile malware detection
Hupperich et al. On the robustness of mobile device fingerprinting: Can mobile users escape modern web-tracking mechanisms?
US20180012003A1 (en) Pointing device biometrics continuous user authentication
US20120204257A1 (en) Detecting fraud using touchscreen interaction behavior
Matyunin et al. Magneticspy: Exploiting magnetometer in mobile devices for website and application fingerprinting
US10148664B2 (en) Utilizing transport layer security (TLS) fingerprints to determine agents and operating systems
Shen et al. Touch-interaction behavior for continuous user authentication on smartphones
Hupperich et al. Leveraging sensor fingerprinting for mobile device authentication
Crichton et al. How do home computer users browse the web?
US10896252B2 (en) Composite challenge task generation and deployment
Basu et al. COPPTCHA: COPPA tracking by checking hardware-level activity
US10817601B2 (en) Hypervisor enforcement of cryptographic policy
EP3410328A1 (en) Method and system to distinguish between a human and a robot as a user of a mobile smart device
KR20150133370A (en) System and method for web service access control
AU2018218526B2 (en) Identifying human interaction with a computer
Annamalai et al. FP-Fed: privacy-preserving federated detection of browser fingerprinting
Sanchez-Rola et al. Rods with laser beams: understanding browser fingerprinting on phishing pages
WO2022130374A1 (en) Device, system, and method of determining personal characteristics of a user
Kim Poster: Detection and prevention of web-based device fingerprinting
JP2019074893A (en) Unauthorized login detection method
Eusanio Machine Learning for the Detection of Mobile Malware on Android Devices

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16881394; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 2016881394; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2016881394; Country of ref document: EP; Effective date: 20180730)