US20170070521A1 - Systems and methods for detecting and scoring anomalies - Google Patents

Systems and methods for detecting and scoring anomalies

Info

Publication number
US20170070521A1
Authority
US
United States
Prior art keywords
values
bucket
count
attribute
buckets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/256,597
Inventor
Christopher Everett Bailey
Randy Lukashuk
Gary Wayne Richardson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nudata Security Inc
Original Assignee
Nudata Security Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nudata Security Inc
Priority to US15/256,597
Assigned to NUDATA SECURITY INC. Assignment of assignors interest (see document for details). Assignors: BAILEY, CHRISTOPHER EVERETT; LUKASHUK, RANDY; RICHARDSON, GARY WAYNE
Publication of US20170070521A1
Assigned to Mastercard Technologies Canada ULC. Certificate of amalgamation. Assignors: NUDATA SECURITY INC.
Current legal status: Abandoned

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/316User authentication by observing the pattern of computer usage, e.g. typical user behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • G06F17/3033
    • G06F17/30598
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44Program or device authentication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/552Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/06Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
    • G06F7/08Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/20Network management software packages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/12Network monitoring probes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/16Threshold monitoring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • H04L63/1466Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/20Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/2866Architectures; Arrangements
    • H04L67/30Profiles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/2866Architectures; Arrangements
    • H04L67/30Profiles
    • H04L67/303Terminal profiles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/535Tracking the activity of the user
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2101Auditing as a secondary aspect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2137Time limited access, e.g. to a computer or data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2145Inheriting rights or properties, e.g., propagation of permissions or restrictions within a hierarchy
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/144Detection or countermeasures against botnets

Definitions

  • a large organization with an online presence often receives tens of thousands of requests per minute to initiate digital interactions.
  • a security system supporting multiple large organizations may handle millions of digital interactions at the same time, and the total number of digital interactions analyzed by the security system each week may easily exceed one billion.
  • a computer-implemented method for analyzing a plurality of digital interactions, the method comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
  • a computer-implemented method for analyzing a digital interaction, the method comprising acts of: identifying a plurality of attributes from a profile; for each attribute of the plurality of attributes, determining whether the digital interaction matches the profile with respect to the attribute, comprising: identifying, from the profile, at least one bucket of possible values of the attribute, the at least one bucket being indicative of anomalous behavior; identifying, from the digital interaction, a value of the attribute; and determining whether the value identified from the digital interaction falls into the at least one bucket, wherein the digital interaction is determined to match the profile with respect to the attribute if it is determined that the value identified from the digital interaction falls into the at least one bucket; and determining a penalty score based at least in part on a count of attributes with respect to which the digital interaction matches the profile.
  • a computer-implemented method for analyzing a digital interaction, the method comprising acts of: determining whether the digital interaction is suspicious; in response to determining that the digital interaction is suspicious, deploying a security probe of a first type to collect first data from the digital interaction; analyzing first data collected from the digital interaction by the security probe of the first type to determine if the digital interaction continues to appear suspicious; if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction continues to appear suspicious, deploying a security probe of a second type to collect second data from the digital interaction; and if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction no longer appears suspicious, deploying a security probe of a third type to collect third data from the digital interaction.
  • a system comprising at least one processor and at least one computer-readable storage medium having stored thereon instructions which, when executed, program the at least one processor to perform any of the above methods.
  • At least one computer-readable storage medium having stored thereon instructions which, when executed, program at least one processor to perform any of the above methods.
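The probe-deployment method above (a first probe, followed by either a second or a third probe depending on what the first probe's data indicates) might be sketched as follows. This is a minimal illustration only: the function name `run_probes`, the `Probe` type, and the callables passed in are hypothetical and do not come from the disclosure, which does not specify how probes or suspicion checks are implemented.

```python
from typing import Callable, Dict, List

# Hypothetical probe type: takes an interaction record, returns collected data.
Probe = Callable[[Dict], Dict]


def run_probes(interaction: Dict,
               is_suspicious: Callable[[Dict], bool],
               still_suspicious: Callable[[Dict], bool],
               first_probe: Probe,
               second_probe: Probe,
               third_probe: Probe) -> List[str]:
    """Return the names of the probe stages deployed, in order."""
    deployed: List[str] = []
    # No probe is deployed unless the interaction looks suspicious.
    if not is_suspicious(interaction):
        return deployed
    # Deploy the first probe type and analyze what it collected.
    first_data = first_probe(interaction)
    deployed.append("first")
    if still_suspicious(first_data):
        # Interaction continues to appear suspicious: second probe type.
        second_probe(interaction)
        deployed.append("second")
    else:
        # Interaction no longer appears suspicious: third probe type.
        third_probe(interaction)
        deployed.append("third")
    return deployed
```

Passing the checks and probes in as callables keeps the branching logic separate from whatever data each probe actually collects.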
  • FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments.
  • FIG. 1B shows an illustrative security system 14 for processing data collected from digital interactions, in accordance with some embodiments.
  • FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments.
  • FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments.
  • FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments.
  • FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments.
  • FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments.
  • FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A, in accordance with some embodiments.
  • FIG. 9 shows illustrative time periods 902 and 904, in accordance with some embodiments.
  • FIG. 10 shows an illustrative normalized histogram 1000, in accordance with some embodiments.
  • FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments.
  • FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments.
  • FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments.
  • FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments.
  • FIG. 15 shows an illustrative fuzzy profile 1500, in accordance with some embodiments.
  • FIG. 16 shows an illustrative fuzzy profile 1600, in accordance with some embodiments.
  • FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments.
  • FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments.
  • FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments.
  • FIG. 20 shows an example of a decision tree 2000 that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.
  • FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.
  • aspects of the present disclosure relate to systems and methods for detecting and scoring anomalies.
  • an attacker may coordinate multiple computers to carry out the attack.
  • the attacker may launch the attack using a “botnet.”
  • the botnet may include a network of virus-infected computers that the attacker may control remotely.
  • the inventors have recognized and appreciated various challenges in detecting web attacks. For instance, in a distributed attack, the computers involved may be located throughout the world, and may have different characteristics. As a result, it may be difficult to ascertain which computers are involved in the same attack. Additionally, in an attempt to evade detection, a sophisticated attacker may modify the behavior of each controlled computer slightly so that no consistent behavior profile may be easily discernible across the attack. Accordingly, in some embodiments, anomaly detection techniques are provided with improved effectiveness against an attack in which the participating computers exhibit different behaviors.
  • a trigger may be a pattern comprising an e-commerce user making a high-value order and shipping to a new address, or a new account making several orders with different credit card numbers.
  • an alert may be raised with respect to the user or account, and/or an action may be taken (e.g., suspending the transaction or account).
  • a trigger-based system may produce false positives (e.g., a trigger tripping on a legitimate event) and/or false negatives (e.g., triggers not tripped during an attack or being tripped too late, when significant damage has been done).
  • anomaly detection techniques are provided with reduced false positive rate and/or false negative rate.
  • one or more fuzzy profiles may be created. When an observation is made from a digital interaction, a score may be derived for each fuzzy profile, where the score is indicative of an extent to which the observation matches the fuzzy profile.
  • scores may be derived in addition to, or instead of, Boolean outputs of triggers as described above, and may provide a more nuanced set of data points for a decision logic that determines what, if any, action is to be taken in response to the observation.
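As a rough illustration of scoring an observation against a fuzzy profile, the sketch below counts the attributes for which the observed value falls into a bucket the profile marks as anomalous. The representation of a profile as a mapping from attribute name to a set of bucket indices, the name `penalty_score`, and the caller-supplied `bucket_of` function are all assumptions made for this example; the disclosure says only that the score is based at least in part on the count of matching attributes.

```python
from typing import Callable, Dict, Set


def penalty_score(profile: Dict[str, Set[int]],
                  interaction: Dict[str, str],
                  bucket_of: Callable[[str], int]) -> int:
    """Count attributes whose observed value lands in an anomalous bucket.

    profile     -- attribute name -> bucket indices flagged as anomalous
    interaction -- attribute name -> value observed in the interaction
    bucket_of   -- hypothetical function mapping a value to a bucket index
    """
    matches = 0
    for attribute, anomalous_buckets in profile.items():
        value = interaction.get(attribute)
        # An attribute missing from the interaction cannot match.
        if value is not None and bucket_of(value) in anomalous_buckets:
            matches += 1
    return matches
```

A decision logic could then treat the returned count as one data point alongside Boolean trigger outputs, rather than as a verdict on its own.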
  • the inventors have recognized and appreciated that, although many attacks exhibit known suspicious patterns, it may take time for such patterns to emerge. For instance, an attacker may gain control of multiple computers that are seemingly unrelated (e.g., computers that are associated with different users, different network addresses, different geographic locations, etc.), and may use the compromised computers to carry out an attack simultaneously. As a result, damage may have been done by the time any suspicious pattern is detected.
  • a security system may be able to flag potential attacks earlier by looking for anomalies that emerge in real time, rather than suspicious patterns that are defined ahead of time. For instance, in some embodiments, a security system may monitor digital interactions taking place at a particular web site and compare what is currently observed against what was observed previously at the same web site. As one example, the security system may compare a certain statistic (e.g., a count of digital interactions reporting a certain browser type) from a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.) against the same statistic from a past time period (e.g., the same time period a day ago, a week ago, a month ago, a year ago, etc.).
  • anomalies may be reported.
  • anomalies may be defined dynamically, based on activity patterns at the particular web site. Such flexibility may reduce false positive and/or false negative errors.
  • the security system may be able to detect attacks that do not exhibit any known suspicious pattern, and such detection may be possible before significant damage has been done.
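A comparison of a current statistic against the same statistic from a past time period might look like the sketch below. The ratio test, the default threshold of 3.0, and the handling of a zero historical count are illustrative assumptions; the disclosure does not prescribe a particular comparison rule.

```python
def is_anomalous(current_count: int,
                 historical_count: int,
                 max_ratio: float = 3.0) -> bool:
    """Flag a statistic whose current value deviates strongly from history.

    max_ratio is an assumed tuning parameter: the statistic is flagged when
    the current count is more than max_ratio times the historical count, or
    less than 1/max_ratio of it.
    """
    if historical_count == 0:
        # Assumption: any activity where none was seen before is flagged.
        return current_count > 0
    ratio = current_count / historical_count
    return ratio > max_ratio or ratio < 1.0 / max_ratio
```

For example, `current_count` could be the number of interactions reporting a certain browser type in the current hour, and `historical_count` the same figure for that hour one week earlier.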
  • a security system for detecting and scoring anomalies may process an extremely large amount of data.
  • a security system may analyze digital interactions for multiple large organizations.
  • the web site of each organization may handle hundreds of digital interactions per second, so that the security system may receive thousands, tens of thousands, or hundreds of thousands of requests per second to detect anomalies.
  • a few megabytes of data may be captured from each digital interaction (e.g., URL being accessed, user device information, keystroke recording, etc.) and, in evaluating the captured data, the security system may retrieve and analyze a few megabytes of historical, population, and/or other data.
  • the security system may analyze a few gigabytes of data per second just to support 1000 requests per second. Accordingly, in some embodiments, techniques are provided for aggregating data to facilitate efficient storage and/or analysis.
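The throughput estimate above can be checked with simple arithmetic. The sketch below assumes 2 MB captured and 2 MB of historical or population data retrieved per request; these specific figures are illustrative stand-ins for "a few megabytes", and decimal units (1 GB = 1000 MB) are assumed.

```python
def data_rate_gb_per_second(captured_mb: float,
                            historical_mb: float,
                            requests_per_second: float) -> float:
    """Approximate data volume analyzed per second, in gigabytes."""
    per_request_mb = captured_mb + historical_mb
    return per_request_mb * requests_per_second / 1000.0


# Assumed figures: 2 MB captured + 2 MB retrieved, at 1000 requests/second.
rate = data_rate_gb_per_second(2, 2, 1000)
```

Under these assumptions the rate works out to 4 GB per second, consistent with the "few gigabytes" estimate.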
  • a security system may begin to analyze a digital interaction as soon as an entity arrives at a web site. For instance, the security system may begin collecting data from the digital interaction before the entity even attempts to log into a certain account. In some embodiments, the security system may compare the entity's behaviors against population data.
  • the security system may be able to draw some inferences as to whether the entity is likely a legitimate user, or a bot or human fraudster, before the entity takes any substantive action.
  • Various techniques are described herein for performing such analyses in real time for a high volume of digital interactions.
  • a number of attributes may be selected for a particular web site, where an attribute may be a question that may be asked about a digital interaction, and a value for that attribute may be an answer to the question.
  • a question may be, “how much time elapsed between viewing a product and checking out?”
  • An answer may be a value (e.g., in seconds or milliseconds) calculated based on a timestamp of a request for a product details page and a timestamp of a request for a checkout page.
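The elapsed-time attribute described above could be computed along these lines. The function name `elapsed_ms` and the use of ISO 8601 timestamp strings are assumptions for illustration; the disclosure does not specify a timestamp format.

```python
from datetime import datetime


def elapsed_ms(product_view_ts: str, checkout_ts: str) -> float:
    """Milliseconds between the product details request and the checkout request.

    Both arguments are assumed to be ISO 8601 timestamps in the same timezone.
    """
    start = datetime.fromisoformat(product_view_ts)
    end = datetime.fromisoformat(checkout_ts)
    return (end - start).total_seconds() * 1000.0
```

A very small value (a checkout request arriving milliseconds after the product page request) is the kind of answer that might distinguish scripted from human behavior, although any such threshold would need to be derived from observed data.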
  • an attribute may include an anchor type that is observable from a digital interaction.
  • a security system may observe that data packets received in connection with a digital interaction indicate a certain source network address and/or a certain source device identifier. Additionally, or alternatively, the security system may observe that a certain email address is used to log in and/or a certain credit card is charged in connection with the digital interaction.
  • anchor types include, but are not limited to, account identifier, email address (e.g., user name and/or email domain), network address (e.g., IP address, sub address, etc.), phone number (e.g., area code and/or subscriber number), location (e.g., GPS coordinates, continent, country, territory, city, designated market area, etc.), device characteristic (e.g., brand, model, operating system, browser, device fingerprint, etc.), device identifier, etc.
  • a security system may maintain one or more counters for each possible value (e.g., Chrome, Safari, etc.) of an attribute (e.g., browser type). For instance, a counter for a possible attribute value (e.g., Chrome) may keep track of how many digital interactions with that particular attribute value (e.g., Chrome) are observed within some period of time (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.). Thus, to determine if there is an anomaly associated with an attribute, the security system may simply examine one or more counters.
  • the security system may compare a counter keeping track of the number of digital interactions reporting a Chrome browser since 3:00 pm, against a counter keeping track of the number of digital interactions reporting a Chrome browser between 3:00 pm and 4:00 pm on the previous day (or a week ago, a month ago, a year ago, etc.). This may eliminate or at least reduce on-the-fly processing of raw data associated with the attribute values, thereby improving responsiveness of the security system.
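One way to keep per-value counters that can be compared across time periods is sketched below. The class name `WindowedCounters`, the one-hour window size, and the in-memory storage are assumptions for this example; a production system would presumably use a shared, persistent store.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Tuple


class WindowedCounters:
    """Counts of attribute values, grouped into one-hour windows (assumed size)."""

    def __init__(self) -> None:
        # (date, hour) -> Counter mapping attribute value -> count
        self._counts: Dict[Tuple, Counter] = defaultdict(Counter)

    @staticmethod
    def _window(ts: datetime) -> Tuple:
        return (ts.date(), ts.hour)

    def record(self, ts: datetime, value: str) -> None:
        """Increment the counter for `value` in the window containing `ts`."""
        self._counts[self._window(ts)][value] += 1

    def count(self, ts: datetime, value: str) -> int:
        """Return the count for `value` in the window containing `ts`."""
        # .get avoids creating empty windows as a side effect of reads.
        return self._counts.get(self._window(ts), Counter())[value]
```

With such a structure, comparing the Chrome count since 3:00 pm today against the count for 3:00 pm to 4:00 pm on the previous day requires two lookups rather than a scan of raw interaction data.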
  • the security system may maintain one or more counters for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y. Thus, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).
  • a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirement and speed up retrieval and updates, but more information may be lost.
  • a hash function may be applied to attribute values and a modulo operation may be applied to divide the resulting hashes into a plurality of buckets, where there may be one bucket for each residue of the modulo operation.
  • An appropriate modulus may be chosen based on how many buckets are desired, and an appropriate hash function may be chosen to spread the attribute values roughly evenly across possible hashes. Examples of suitable hash functions include, but are not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.
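The hash-modding technique can be sketched with Python's standard `hashlib` module. SHA-256 (a member of the SHA-2 family named above) is used here as one possible choice; the function names `bucket_index` and `bucket_counts` and the UTF-8 encoding of values are assumptions for illustration.

```python
import hashlib
from typing import Iterable, List


def bucket_index(value: str, num_buckets: int) -> int:
    """Map an attribute value to a bucket: hash it, then take the residue."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Interpret the digest as a big integer and reduce modulo the bucket count.
    return int.from_bytes(digest, "big") % num_buckets


def bucket_counts(values: Iterable[str], num_buckets: int) -> List[int]:
    """Return one aggregate count per bucket for the given attribute values."""
    counts = [0] * num_buckets
    for value in values:
        counts[bucket_index(value, num_buckets)] += 1
    return counts
```

Because the mapping is deterministic, the same value always lands in the same bucket, so a histogram built for the current period can be compared bucket by bucket with one built for a past period. Choosing `num_buckets` is the precision and efficiency trade-off discussed above.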
  • Some security systems flag all suspicious digital interactions for manual review, which may cause delays in sending acknowledgements to users. Moderate delays may be acceptable to organizations selling physical goods over the Internet, because for each order there may be a time window during which the ordered physical goods are picked from a warehouse and packaged for shipment, and a manual review may be conducted during that time window.
  • many digital interactions involve sale of digital goods (e.g., music, game, etc.), transfer of funds, etc.
  • a security system may be expected to respond to each request in real time, for example, within hundreds or tens of milliseconds. Such quick responses may improve user experience. For instance, a user making a transfer or ordering a song, game, etc. may wish to receive real time confirmation that the transaction has gone through. Accordingly, in some embodiments, techniques are provided for automatically investigating suspicious digital interactions, thereby improving response time of a security system.
  • a security system may scrutinize the digital interaction more closely, even if there is not yet sufficient information to justify classifying the digital interaction as part of an attack.
  • the security system may scrutinize a digital interaction in a non-invasive manner so as to reduce user experience friction.
  • a security system may observe an anomalously high percentage of traffic at a retail web site involving a particular product or service, and may so indicate in a fuzzy profile.
  • a digital interaction with an attempted purchase of that product or service may be flagged as matching the fuzzy profile, but that pattern alone may not be sufficiently suspicious, as many users may purchase that product or service for legitimate reasons.
  • one approach may be to send the flagged digital interaction to a human operator for review.
  • Another approach may be to require one or more verification tasks (e.g., captcha challenge, security question, etc.) before approving the attempted purchase.
  • the inventors have recognized and appreciated that both of these approaches may negatively impact user experience.
  • a match with a fuzzy profile may trigger additional analysis that is non-invasive.
  • the security system may collect additional data from the digital interaction in a non-invasive manner and may analyze the data in real time, so that by the time the digital interaction progresses to a stage with potential for damage (e.g., charging a credit card), the security system may have already determined whether the digital interaction is likely to be legitimate.
  • one or more security probes may be deployed dynamically to obtain information from a digital interaction. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the digital interaction is sufficiently high to warrant an investigation (e.g., when the digital interaction matches a fuzzy profile comprising one or more anomalous attributes, or when the digital interaction represents a significant deviation from an activity pattern observed in the past for an anchor value, such as a device identifier, that is reported in the digital interaction).
  • the inventors have recognized and appreciated that by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.
  • multiple security probes may be deployed, where each probe may be designed to discover different information.
  • information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next.
  • the security system may be able to gain an in-depth understanding into network traffic (e.g., web site and/or application traffic).
  • the security system may be able to: classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate.
  • a result may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such a result may be more meaningful than just a numeric score.
  • client-side checks to collect information.
  • such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.).
  • an attacker may be able to adapt and evade detection.
  • techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce the number of probes deployed to, for example, one in hundreds of thousands of digital interactions), client-side functionality obfuscation may reduce the likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.
  • FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments.
  • the system 10 includes user devices 11 A-C, online systems 12 and 13 , and a security system 14 .
  • a user 15 may use the user devices 11 A-C to engage in digital interactions.
  • the user device 11 A may be a smart phone and may be used by the user 15 to check email and download music
  • the user device 11 B may be a tablet computer and may be used by the user 15 to shop and bank
  • the user device 11 C may be a laptop computer and may be used by the user 15 to watch TV and play games.
  • digital interactions are not limited to interactions that are conducted via an Internet connection.
  • a digital interaction may involve an ATM transaction over a leased telephone line.
  • user devices 11 A-C is provided solely for purposes of illustration, as the user 15 may use any suitable device or combination of devices to engage in digital interactions, and the user may use different devices to engage in a same type of digital interactions (e.g., checking email).
  • a digital interaction may involve an interaction between the user 15 and an online system, such as the online system 12 or the online system 13 .
  • the online system 12 may include an application server that hosts a backend of a banking app used by the user 15
  • the online system 13 may include a web server that hosts a retailer's web site that the user 15 visits using a web browser.
  • the user 15 may interact with other online systems (not shown) in addition to, or instead of the online systems 12 and 13 .
  • the user 15 may visit a pharmacy's web site to have a prescription filled and delivered, a travel agent's web site to book a trip, a government agency's web site to renew a license, etc.
  • behaviors of the user 15 may be measured and analyzed by the security system 14 .
  • the online systems 12 and 13 may report, to the security system 14 , behaviors observed from the user 15 .
  • the user devices 11 A-C may report, to the security system 14 , behaviors observed from the user 15 .
  • a web page downloaded from the web site hosted by the online system 13 may include software (e.g., a JavaScript snippet) that programs the browser running on one of the user devices 11 A-C to observe and report behaviors of the user 15 .
  • Such software may be provided by the security system 14 and inserted into the web page by the online system 13 .
  • an application running on one of the user devices 11 A-C may be programmed to observe and report behaviors of the user 15 .
  • the behaviors observed by the application may include interactions between the user 15 and the application, and/or interactions between the user 15 and another application.
  • an operating system running on one of the user devices 11 A-C may be programmed to observe and report behaviors of the user 15 .
  • software that observes and reports behaviors of a user may be written in any suitable language, and may be delivered to a user device in any suitable manner.
  • the software may be delivered by a firewall (e.g., an application firewall), a network operator (e.g., Comcast, Sprint, etc.), a network accelerator (e.g., Akamai), or any device along a communication path between the user device and an online system, or between the user device and a security system.
  • the security system 14 may be programmed to measure and analyze behaviors of many users across the Internet. Furthermore, it should be appreciated that the security system 14 may interact with other online systems (not shown) in addition to, or instead of the online systems 12 and 13 . The inventors have recognized and appreciated that, by analyzing digital interactions involving many different users and many different online systems, the security system 14 may have a more comprehensive and accurate understanding of how the users behave. However, aspects of the present disclosure are not limited to the analysis of measurements collected from different online systems, as one or more of the techniques described herein may be used to analyze measurements collected from a single online system. Likewise, aspects of the present disclosure are not limited to the analysis of measurements collected from different users, as one or more of the techniques described herein may be used to analyze measurements collected from a single user.
  • FIG. 1B shows an illustrative implementation of the security system 14 shown in FIG. 1A , in accordance with some embodiments.
  • the security system 14 includes one or more frontend systems and/or one or more backend systems.
  • the security system 14 may include a frontend system 22 configured to interact with user devices (e.g., the illustrative user device 11 C shown in FIG. 1A ) and/or online systems (e.g., the illustrative online system 13 shown in FIG. 1A ).
  • the security system 14 may include a backend system 32 configured to interact with a backend user interface 34 .
  • the backend user interface 34 may include a graphical user interface (e.g., a dashboard) for displaying current observations and/or historical trends regarding individual users and/or populations of users.
  • Such an interface may be delivered in any suitable manner (e.g., as a web application or a cloud application), and may be used by any suitable party (e.g., security personnel of an organization).
  • the security system 14 includes a log storage 24 .
  • the log storage 24 may store log files comprising data received by the frontend system 22 from user devices (e.g., the user device 11 C), online systems (e.g., the online system 13 ), and/or any other suitable sources.
  • a log file may include any suitable information. For instance, in some embodiments, a log file may include keystrokes and/or mouse clicks recorded from a digital interaction over some length of time (e.g., several seconds, several minutes, several hours, etc.). Additionally, or alternatively, a log file may include other information of interest, such as account identifier, network address, user device identifier, user device characteristics, URL accessed, Stock Keeping Unit (SKU) of viewed product, etc.
  • the log storage 24 may store log files accumulated over some suitable period of time (e.g., a few years), which may amount to tens of billions, hundreds of billions, or trillions of log files.
  • Each log file may be of any suitable size. For instance, in some embodiments, about 60 kilobytes of data may be captured from a digital interaction per minute, so that a log file recording a few minutes of user behavior may include a few hundred kilobytes of data, whereas a log file recording an hour of user behavior may include a few megabytes of data.
  • the log storage 24 may store petabytes of data overall.
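  • The size estimates above follow from simple arithmetic, sketched here. The capture rate of about 60 kilobytes per minute is from the text. The function name `log_size_kb`, the five-minute example, and the decimal conversion (1 MB = 1000 KB) are assumptions made for illustration.

```python
KB_PER_MINUTE = 60  # approximate capture rate stated in the text


def log_size_kb(minutes: float, kb_per_minute: float = KB_PER_MINUTE) -> float:
    """Estimated log file size in kilobytes for a recording of the given length."""
    return minutes * kb_per_minute


# Five minutes of behavior: 5 * 60 = 300 KB, i.e. "a few hundred kilobytes".
few_minutes_kb = log_size_kb(5)

# One hour of behavior: 60 * 60 = 3600 KB, i.e. about 3.6 MB ("a few megabytes").
one_hour_mb = log_size_kb(60) / 1000
```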
  • a log processing system 26 may be provided to filter, transform, and/or route data from the log storage 24 to one or more databases 28 .
  • the log processing system 26 may be implemented in any suitable manner.
  • the log processing system 26 may include one or more services configured to retrieve a log file from the log storage 24 , extract useful information from the log file, transform one or more pieces of extracted information (e.g., adding latitude and longitude coordinates to an extracted address), and/or store the extracted and/or transformed information in one or more appropriate databases (e.g., among the one or more databases 28 ).
  • the one or more services may include one or more services configured to route data from log files to one or more queues, and/or one or more services configured to process the data in the one or more queues.
  • each queue may have a dedicated service for processing data in that queue. Any suitable number of instances of the service may be run, depending on a volume of data to be processed in the queue.
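  • One way to picture the routing of log data to queues, each with its own processing service, is the single-threaded sketch below. The class `LogRouter`, the record shape with a `kind` field, and the handler interface are hypothetical. A real deployment would presumably run several instances of each service concurrently, as the text suggests.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Hypothetical record shape: each extracted item carries a "kind" used for routing.
Record = Dict[str, str]


class LogRouter:
    """Routes extracted records to one queue per kind, each with a dedicated handler."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Record]] = defaultdict(deque)
        self.handlers: Dict[str, Callable[[Record], None]] = {}

    def register(self, kind: str, handler: Callable[[Record], None]) -> None:
        self.handlers[kind] = handler

    def route(self, record: Record) -> None:
        self.queues[record["kind"]].append(record)

    def drain(self, kind: str) -> int:
        """Process everything currently in one queue; return how many records were handled."""
        handler = self.handlers[kind]
        queue = self.queues[kind]
        processed = 0
        while queue:
            handler(queue.popleft())
            processed += 1
        return processed


stored = []  # stands in for a database write
router = LogRouter()
router.register("network_address", lambda rec: stored.append(rec["value"]))
router.route({"kind": "network_address", "value": "203.0.113.7"})
router.route({"kind": "network_address", "value": "198.51.100.23"})
processed = router.drain("network_address")
```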
  • the one or more databases 28 may be accessed by any suitable component of the security system 14 .
  • the backend system 32 may query the one or more databases 28 to generate displays of current observations and/or historical trends regarding individual users and/or populations of users.
  • a data service system 30 may query the one or more databases 28 to provide input to the frontend system 22 .
  • the inventors have recognized and appreciated that some database queries may be time consuming. For instance, if the frontend system 22 were to query the one or more databases 28 each time a request to detect an anomaly is received, the frontend system 22 may be unable to respond to the request within 100 msec, 80 msec, 60 msec, 40 msec, 20 msec, or less. Accordingly, in some embodiments, the data service system 30 may maintain one or more data sources separate from the one or more databases 28. An example of a data source maintained by the data service system 30 is shown in FIG. 2A and discussed below.
  • a data source maintained by the data service system 30 may have a bounded size, regardless of how much data is analyzed to populate the data source. For instance, if there is a burst of activities from a certain account, an increased amount of data may be stored in the one or more databases 28 in association with that account. The data service system 30 may process the data stored in the one or more databases 28 down to a bounded size, so that the frontend system 22 may be able to respond to requests in constant time.
  • all possible network addresses may be divided into a certain number of buckets. Statistics may be maintained on such buckets, rather than individual network addresses. In this manner, a bounded number of statistics may be analyzed, even if an actual number of network addresses observed may fluctuate over time.
  • One or more other techniques may also be used in addition to, or instead of bucketing, such as maintaining an array of a certain size.
  • the data service system 30 may include a plurality of data services (e.g., implemented using a service-oriented architecture). For example, one or more data services may access the one or more databases 28 periodically (e.g., every hour, every few hours, every day, etc.), and may analyze the accessed data and populate one or more first data sources used by the frontend system 22 . Additionally, or alternatively, one or more data services may receive data from the log processing system 26 , and may use the received data to update one or more second data sources used by the frontend system 22 . Such a second data source may supplement the one or more first data sources with recent data that has arrived since the last time the one or more first data sources were populated using data accessed from the one or more databases 28 . In various embodiments, the one or more first data sources may be the same as, or different from, the one or more second data sources, or there may be some overlap.
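  • The relationship between a periodically rebuilt first data source and a continuously updated second data source might look like the following sketch. The class `CombinedCounts` and its methods are hypothetical. The sketch also assumes that everything recorded as recent before a rebuild is already reflected in the databases at rebuild time, which the text does not state.

```python
from collections import Counter
from typing import Dict


class CombinedCounts:
    """Batch snapshot (first data source) plus recent deltas (second data source)."""

    def __init__(self) -> None:
        self.snapshot: Counter = Counter()  # rebuilt periodically from the databases
        self.recent: Counter = Counter()    # updated as new data arrives

    def rebuild_snapshot(self, counts_from_databases: Dict[str, int]) -> None:
        # Assumption: the databases already include all data seen so far,
        # so the recent deltas can be cleared once the snapshot is rebuilt.
        self.snapshot = Counter(counts_from_databases)
        self.recent.clear()

    def record_recent(self, key: str, n: int = 1) -> None:
        self.recent[key] += n

    def total(self, key: str) -> int:
        # The frontend reads both sources; neither lookup touches the databases.
        return self.snapshot[key] + self.recent[key]


counts = CombinedCounts()
counts.rebuild_snapshot({"Chrome": 1000})
counts.record_recent("Chrome", 25)
before_rebuild = counts.total("Chrome")

counts.rebuild_snapshot({"Chrome": 1025})
after_rebuild = counts.total("Chrome")
```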
  • each of the frontend system 22 , the log processing system 26 , the data service system 30 , and the backend system 32 may be implemented in any suitable manner, such as using one or more parallel processors operating at a same location or different locations.
  • FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments.
  • the flow 40 may represent a sequence of activities conducted by a user on a merchant's web site. For instance, the user may log into the web site, change billing address, view a product details page of a first product, view a product details page of a second product, add the second product to a shopping cart, and then check out.
  • a security system may receive data captured from the digital interaction throughout the flow 40 .
  • the security system may receive log files from a user device and/or an online system involved in the digital interaction (e.g., as shown in FIG. 1B and discussed above).
  • the security system may use the data captured from the digital interaction in any suitable manner. For instance, as shown in FIG. 1B , the security system may process the captured data and populate one or more databases (e.g., the one or more illustrative databases 28 shown in FIG. 1B ). Additionally, or alternatively, the security system may populate one or more data sources adapted for efficient access. For instance, the security system may maintain current interaction data 42 in a suitable data structure (e.g., the illustrative data structure 220 shown in FIG. 2B ).
  • the security system may keep track of different network addresses observed at different points in the flow 40 (e.g., logging in and changing billing address via a first network address, viewing the first and second products via a second network address, and adding the second product to the cart and checking out via a third network address).
  • the security system may keep track of different credit card numbers used in the digital interaction (e.g., different credit cards being entered in succession during checkout).
  • the data structure may be maintained in any suitable manner (e.g., using the illustrative process 230 shown in FIG. 2C ) and by any suitable component of the security system (e.g., the illustrative frontend system 22 and/or the illustrative data service system 30 ).
  • the security system may maintain historical data 44 , in addition to, or instead of the current interaction data 42 .
  • the historical data 44 may include log entries for user activities observed during one or more prior digital interactions. Additionally, or alternatively, the historical data 44 may include one or more profiles associated respectively with one or more anchor values (e.g., a profile associated with a particular device identifier, a profile associated with a particular network address, etc.).
  • the security system may maintain population data 46 , in addition to, or instead of the current interaction data 42 and/or the historical data 44 .
  • the security system may update, in real time, statistics such as breakdown of web site traffic by user agent, geographical location, product SKU, etc.
  • the security system may use a hash-modding method to divide all known browser types into a certain number of buckets (e.g., 10 buckets, 100 buckets, etc.). For each bucket, the security system may calculate a percentage of overall web site traffic that falls within that bucket.
  • the security system may use a hash-modding method to divide all known product SKUs into a certain number of buckets (e.g., 10 buckets, 100 buckets) and calculate respective traffic percentages. Additionally, or alternatively, the security system may calculate respective traffic percentages for combinations of buckets (e.g., a combination of a bucket of browser types, a bucket of product SKUs, etc.).
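  • A rough sketch of computing traffic percentages for combinations of buckets follows. The function names, the use of SHA-256, the choice of 10 buckets per attribute, and the sample data are all assumptions for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple


def bucket_of(value: str, num_buckets: int) -> int:
    """Hash-modding: SHA-256 of the value, reduced modulo num_buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


def traffic_percentages(
    interactions: Iterable[Tuple[str, str]], num_buckets: int = 10
) -> Dict[Tuple[int, int], float]:
    """Map each (browser bucket, SKU bucket) combination to its share of traffic, in percent."""
    counts = Counter(
        (bucket_of(browser, num_buckets), bucket_of(sku, num_buckets))
        for browser, sku in interactions
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {combo: 100.0 * n / total for combo, n in counts.items()}


sample = [("Chrome", "SKU-1")] * 3 + [("Firefox", "SKU-2")]
percentages = traffic_percentages(sample)
```

  • With 10 buckets per attribute there are at most 100 combinations to track, regardless of how many browser types or SKUs actually appear.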
  • a security system may perform anomaly detection processing on an on-going basis and may continually create new fuzzy profiles and/or update existing fuzzy profiles. For instance, the security system may compare a certain statistic (e.g., a count of digital interactions reporting Chrome as browser type) from a current time period (e.g., 9:00 pm-10:00 pm today) against the same statistic from a past time period (e.g., 9:00 pm-10:00 pm yesterday, a week ago, a month ago, a year ago, etc.).
  • an anomaly may be reported, and the corresponding attribute (e.g., browser type) and attribute value (e.g., Chrome) may be stored in a fuzzy profile.
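  • The period-over-period comparison can be sketched as below. The text does not specify how a deviation is measured, so the ratio test, the default threshold of 2.0, the handling of a zero past count, and the dictionary layout of the fuzzy profile are all assumptions.

```python
from typing import Dict, Set


def deviates(current: int, past: int, threshold_ratio: float = 2.0) -> bool:
    """True if the current count is at least threshold_ratio times the past count."""
    if past == 0:
        # Assumption: any traffic where none was seen before counts as a deviation.
        return current > 0
    return current / past >= threshold_ratio


def update_fuzzy_profile(
    profile: Dict[str, Set[str]],
    attribute: str,
    value: str,
    current: int,
    past: int,
    threshold_ratio: float = 2.0,
) -> bool:
    """Store (attribute, value) in the profile when the statistic deviates; report whether it did."""
    if deviates(current, past, threshold_ratio):
        profile.setdefault(attribute, set()).add(value)
        return True
    return False


profile: Dict[str, Set[str]] = {}
# Counts for 9:00 pm-10:00 pm today versus the same hour yesterday (made-up numbers).
update_fuzzy_profile(profile, "browser type", "Chrome", current=5400, past=1800)
update_fuzzy_profile(profile, "browser type", "Firefox", current=900, past=850)
```

  • Because both counts are read from counters that already exist, the comparison itself does not require reprocessing raw log data.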
  • the security system may render any one or more aspects of the current interaction data 42 , the historical data 44 , and/or the population data 46 (e.g., via the illustrative backend user interface 34 shown in FIG. 1B ). For instance, the security system may render breakdown of web site traffic (e.g., with actual traffic measurements, or percentages of overall traffic) using a stacked area chart.
  • FIG. 1C also shows examples of time measurements in the illustrative flow 40 .
  • the security system may receive data captured throughout the flow 40 , and the received data may include log entries for user activities such as logging into the web site, changing billing address, viewing the product details page of the first product, viewing the product details page of the second product, adding the second product to the shopping cart, checking out, etc.
  • the log entries may include timestamps, which may be used by the security system to determine an amount of time that elapsed between two points in the digital interaction. For instance, the security system may use the appropriate timestamps to determine how much time elapsed between viewing the second product and adding the second product to the shopping cart, between adding the second product to the shopping cart and checking out, between viewing the second product and checking out, etc.
  • timing patterns may be indicative of illegitimate digital interactions.
  • a reseller may use bots to make multiple purchases of a product that is on sale, thereby circumventing a quantity restriction (e.g., one per customer) imposed by a retail web site.
  • Such a bot may be programmed to step through an order quickly, to maximize the total number of orders completed during a promotional period.
  • the resulting timing pattern may be noticeably different from that of a human customer browsing through the web site and taking time to read product details before making a purchase decision. Therefore, a timing pattern such as a delay between product view and checkout may be a useful attribute to monitor in digital interactions.
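  • Deriving a timing attribute from timestamped log entries might look like the sketch below. The activity names, the ISO 8601 timestamp format, and the sample times are hypothetical. If an activity appears more than once, this simplified version keeps only its last timestamp.

```python
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical log entries for the illustrative flow: (activity, timestamp).
log_entries = [
    ("login", "2016-09-04T15:00:00"),
    ("change_billing_address", "2016-09-04T15:00:40"),
    ("view_first_product", "2016-09-04T15:01:30"),
    ("view_second_product", "2016-09-04T15:02:10"),
    ("add_to_cart", "2016-09-04T15:02:55"),
    ("checkout", "2016-09-04T15:04:10"),
]


def elapsed_seconds(
    entries: Iterable[Tuple[str, str]], start_activity: str, end_activity: str
) -> float:
    """Seconds elapsed between two activities, based on their log timestamps."""
    times = {activity: datetime.fromisoformat(ts) for activity, ts in entries}
    return (times[end_activity] - times[start_activity]).total_seconds()


view_to_cart = elapsed_seconds(log_entries, "view_second_product", "add_to_cart")
cart_to_checkout = elapsed_seconds(log_entries, "add_to_cart", "checkout")
view_to_checkout = elapsed_seconds(log_entries, "view_second_product", "checkout")
```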
  • aspects of the present disclosure are not limited to the analysis of online purchases, as one or more of the techniques described herein may be used to analyze other types of digital interactions, including, but not limited to, opening a new account, checking email, transferring money, etc. Furthermore, it should be appreciated that aspects of the present disclosure are not limited to monitoring any particular timing attribute, or any timing attribute at all. In some embodiments, other attributes, such as various anchor types observed from a digital interaction, may be monitored in addition to, or instead of, timing attributes.
  • FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments.
  • the data structure 200 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A ) to record distinct anchor values of a same type that have been observed in a certain context.
  • the data structure 200 may be used to record other distinct values, instead of, or in addition to, anchor values.
  • the data structure 200 may be used to store up to N distinct anchor values of a same type (e.g., N distinct credit card numbers) that have been seen in a digital interaction.
  • the data structure 200 may include an array 205 of a certain size N. Once the array has been filled, a suitable method may be used to determine whether to discard a newly observed credit card number, or replace one of the stored credit card numbers with the newly observed credit card number. In this manner, only a bounded amount of data may be analyzed in response to a query, regardless of an amount of raw data that has been received.
  • the number N of distinct values may be chosen to provide sufficient information without using an excessive amount of storage space.
  • a security system may store more distinct values (e.g., 8-16) if precise values are useful for detecting anomalies, and fewer distinct values (e.g., 2-4) if precise values are less important.
  • N may be 8-16 for network addresses, 4-8 for credit card numbers, and 2-4 for user agents.
  • the security system may use the network addresses to determine if there is a legitimate reason for multiple network addresses being observed (e.g., a user traveling and connecting to a sequence of access points along the way), whereas the security system may only look for a simple indication that multiple user agents have been observed.
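  • A bounded store of up to N distinct values, in the spirit of the array 205, is sketched below. The text leaves the replacement method open. This example replaces stored values in round-robin order once the array is full, which is only one possible policy. The class name and the 0-based slot numbering are assumptions.

```python
from typing import List


class BoundedDistinctValues:
    """Stores at most `capacity` distinct values, so queries touch a bounded amount of data."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.values: List[str] = []
        self._next_to_replace = 0  # round-robin cursor (an assumed replacement policy)

    def observe(self, value: str) -> int:
        """Record a value and return the 0-based slot that holds it."""
        if value in self.values:
            return self.values.index(value)
        if len(self.values) < self.capacity:
            self.values.append(value)
            return len(self.values) - 1
        # Array is full: overwrite a stored value instead of growing.
        slot = self._next_to_replace
        self.values[slot] = value
        self._next_to_replace = (slot + 1) % self.capacity
        return slot


cards = BoundedDistinctValues(capacity=2)
slots = [cards.observe(v) for v in ["card-a", "card-b", "card-a", "card-c"]]
```

  • A linear scan is adequate here because N is small (2 to 16 in the examples above).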
  • a bit string 210 of length M may be stored in addition to, or instead of, N distinct observed values.
  • Each bit in the bit string 210 may correspond to a respective bucket, and may be initialized to 0. Whenever a value in a bucket is observed, the bit corresponding to that bucket may be set to 1.
  • Possible values may be divided into buckets in any suitable manner.
  • a hash function may be applied to possible values and a modulo operation (with modulus M) may be applied to divide the resulting hashes into M buckets.
  • the modulus M may be chosen to achieve a desired balance between precision and efficiency. For instance, a larger number of buckets may provide a higher resolution (e.g., fewer possible values being lumped together and becoming indistinguishable), but the bit string 210 may take up more storage space, and it may be computationally more complex to update and/or access the bit string 210 .
  • aspects of the present disclosure are not limited to the use of hash-modding to divide possible values into buckets, as other methods may also be suitable. For instance, in some embodiments, one or more techniques based on Bloom filters may be used.
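  • The bit string 210 can be sketched with a Python integer used as M bits. The class name `BucketBitString`, the use of SHA-256, and the choice of M = 64 are assumptions for illustration.

```python
import hashlib


class BucketBitString:
    """M bits, one per bucket; a bit is set to 1 once any value in that bucket is observed."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets  # the modulus M
        self.bits = 0                   # all M bits initialized to 0

    def _bucket(self, value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.num_buckets

    def observe(self, value: str) -> None:
        self.bits |= 1 << self._bucket(value)

    def maybe_seen(self, value: str) -> bool:
        # True means "some value in this bucket was observed", not necessarily this one.
        return bool((self.bits >> self._bucket(value)) & 1)

    def buckets_hit(self) -> int:
        return bin(self.bits).count("1")


seen = BucketBitString(num_buckets=64)
empty_before = seen.maybe_seen("203.0.113.7")
seen.observe("203.0.113.7")
seen.observe("198.51.100.23")
```

  • A cleared bit proves that no value from that bucket was observed, while a set bit only shows that at least one was. Raising M reduces such collisions at the cost of more storage.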
  • FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments.
  • the data structure 220 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A ) to record distinct anchor values that have been observed in a certain context.
  • the data structure 220 may be used to record other distinct values, instead of, or in addition to, anchor values.
  • the data structure 220 may be indexed by a session identifier and a flow identifier.
  • the session identifier may be an identifier assigned by a web server for a web session.
  • the flow identifier may identify a flow (e.g., the illustrative flow 40 shown in FIG. 1C), which may include a sequence of activities.
  • the security system may use the session and flow identifiers to match a detected activity to the digital interaction.
  • aspects of the present disclosure are not limited to the use of a session identifier and a flow identifier to identify a digital interaction.
  • the data structure 220 may include a plurality of components, such as components 222 , 224 , 226 , and 228 shown in FIG. 2B .
  • Each of the components 222 , 224 , 226 , and 228 may be similar to the illustrative data structure 200 shown in FIG. 2A .
  • the component 222 may store up to a certain number of distinct network addresses observed from the digital interaction
  • the component 224 may store up to a certain number of distinct user agents observed from the digital interaction
  • the component 226 may store up to a certain number of distinct credit card numbers observed from the digital interaction, etc.
  • the data structure 220 may include a relatively small number (e.g., 10, 20, 30, etc.) of components such as 222 , 224 , 226 , and 228 . In this manner, a relatively small amount of data may be stored for each on-going digital interaction, while still allowing a security system to conduct an effective sameness analysis.
  • the component 228 may store a list of lists of indices, where each list of indices may correspond to an activity that took place in the digital interaction. For instance, with reference to the illustrative flow 40 shown in FIG. 1C, a first list of indices may correspond to logging in, a second list of indices may correspond to changing billing address, a third list of indices may correspond to viewing the first product, a fourth list of indices may correspond to viewing the second product, a fifth list of indices may correspond to adding the second product to the shopping cart, and a sixth list of indices may correspond to checking out.
  • each list of indices may indicate anchor values observed from the corresponding activity. For instance, a list [1, 3, 2, . . . ] may indicate the first network address stored in the component 222 , the third user agent stored in the component 224 , the second credit card stored in the component 226 , etc. This may provide a compact representation of the anchor values observed from each activity.
  • one or more lists of indices including the anchor value being replaced may be updated. For instance, if the first network address stored in the component 222 is replaced by another network address, the list [1, 3, 2, . . . ] may be updated as [Φ, 3, 2, . . . ], where Φ is any suitable default value (e.g., N+1, where N is the capacity of the component 222).
  • a security system may use a list of lists of indices to determine how frequently an anchor value has been observed. For instance, the security system may count a number of lists in which the index 1 appears at the first position. This may indicate a number of times the first network address stored in the component 222 has been observed.
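  • The list of lists of indices, together with the update on replacement and the frequency count described above, can be sketched as follows. Indices are 1-based as in the text. The function names, the three-position lists, and the capacity N = 8 are assumptions for illustration.

```python
from typing import List

N = 8  # assumed capacity of the network address component
DEFAULT = N + 1  # default value marking a replaced anchor value, per the N+1 example

# One list per activity: [network address index, user agent index, credit card index].
activity_indices: List[List[int]] = [
    [1, 3, 2],  # logging in
    [1, 3, 2],  # changing billing address
    [2, 3, 2],  # viewing the first product
]


def count_observations(lists: List[List[int]], position: int, index: int) -> int:
    """Number of activities in which the given index appears at the given position."""
    return sum(1 for indices in lists if indices[position] == index)


def invalidate_index(
    lists: List[List[int]], position: int, replaced_index: int, default: int
) -> None:
    """After a stored value is replaced, point earlier references to it at the default value."""
    for indices in lists:
        if indices[position] == replaced_index:
            indices[position] = default


# How often was the first stored network address observed?
first_address_count = count_observations(activity_indices, position=0, index=1)

# The first stored network address is replaced by another address.
invalidate_index(activity_indices, position=0, replaced_index=1, default=DEFAULT)
count_after_replacement = count_observations(activity_indices, position=0, index=1)
```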
  • other types of component data structures may be used in addition to, or instead of, the illustrative data structure 200 shown in FIG. 2A .
  • FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments.
  • the process 230 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1A ) to record distinct values of a same type (e.g., N distinct credit card numbers) that have been observed in a certain context (e.g., in a certain digital interaction).
  • the distinct values may be recorded in a data structure such as the illustrative data structure 200 shown in FIG. 2A .
  • the security system may identify an anchor value X in a certain context. For instance, in some embodiments, the anchor value X may be observed from a certain digital interaction. In some embodiments, the security system may access a record of the digital interaction, and may identify from the record a data structure associated with a type T of the anchor value X. For instance, if the anchor value X is a credit card number, the security system may identify, from the record of the digital interaction, a data structure for storing credit card numbers observed from the digital interaction.
  • the security system may identify a bucket B to which the anchor value X belongs. For instance, in some embodiments, a hash-modding operation may be performed to map the anchor value X to the bucket B as described above in connection with FIG. 2A .
  • the security system may store an indication that at least one anchor value from the bucket B has been observed in connection with the digital interaction. For instance, the security system may operate on the data structure identified at act 231 . With reference to the example shown in FIG. 2A , the security system may identify, in the illustrative bit string 210 , a position that corresponds to the bucket B identified at act 232 and write 1 into that position.
  • the security system may determine whether the anchor value X has already been stored in connection with the relevant context. For instance, the security system may check if the anchor value X has already been stored in the data structure identified at act 231 . With reference to the example shown in FIG. 2A , the security system may look up the anchor value X in the illustrative array 205 . This lookup may be performed in any suitable manner. For instance, if the array 205 is sorted, the security system may perform a binary search to determine if the anchor value X is already stored in the array 205 .
  • the process 230 may end.
  • the security system may, in some embodiments, increment one or more counters for the anchor value X prior to ending the process 230 .
  • the security system may proceed to act 235 to determine whether to store the anchor value X.
  • the security system may, in some embodiments, store the anchor value X if the array 205 is not yet full. If the array 205 is full, the security system may determine whether to replace one of the stored anchor values with the anchor value X.
  • the security system may store in the array 205 the first N distinct anchor values of the type T observed from the digital interaction, and may discard every subsequently observed anchor value of the type T. As another example, the security system may replace the oldest stored anchor value with the newly observed anchor value, so that the array 205 stores the last N distinct values of the type T observed in the digital interaction.
  • the security system may store in the array 205 a suitable combination of N anchor values of the type T, such as one or more anchor values observed near a beginning of the digital interaction, one or more anchor values most recently observed from the digital interaction, one or more anchor values most frequently observed from the digital interaction (e.g., based on respective counters stored for anchor values, or lists of indices such as the illustrative component 228 shown in FIG. 2 B), and/or one or more other anchor values of interest (e.g., one or more credit card numbers previously involved in credit card cycling attacks).
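  • The acts of the process 230 may be sketched in Python as follows. This is only one possible reading: the class name AnchorStore, the use of SHA-256 for hash-modding, the 256-position bit string, and the capacity of four are assumptions made for illustration, and a dictionary lookup stands in for the array search described above. The keep parameter selects between the "first N" and "last N" storage policies.

```python
import hashlib

NUM_BUCKETS = 256  # assumed length of the bit string
CAPACITY = 4       # assumed N, the capacity of the array of anchor values


def bucket_of(value: str) -> int:
    """Hash-mod an anchor value into a bucket (SHA-256 is an assumption)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


class AnchorStore:
    """Hypothetical per-type store for one digital interaction."""

    def __init__(self, keep="first"):
        self.bits = [0] * NUM_BUCKETS  # one position per bucket
        self.values = []               # stored values, oldest first
        self.counts = {}               # per-value observation counters
        self.keep = keep               # "first" or "last" N distinct values

    def record(self, value: str) -> None:
        # Mark the bucket as observed, whether or not the value is stored.
        self.bits[bucket_of(value)] = 1
        # Already stored: increment its counter and stop.
        if value in self.counts:
            self.counts[value] += 1
            return
        # Not stored yet: store if there is room, else apply the policy.
        if len(self.values) < CAPACITY:
            self.values.append(value)
            self.counts[value] = 1
        elif self.keep == "last":
            oldest = self.values.pop(0)
            del self.counts[oldest]
            self.values.append(value)
            self.counts[value] = 1
        # With keep == "first", later distinct values are discarded.


first_n = AnchorStore(keep="first")
last_n = AnchorStore(keep="last")
for card in ["c1", "c2", "c1", "c3", "c4", "c5", "c6"]:
    first_n.record(card)
    last_n.record(card)

print(first_n.values)  # ['c1', 'c2', 'c3', 'c4']
print(last_n.values)   # ['c3', 'c4', 'c5', 'c6']
```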
  • FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments.
  • a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) monitors a plurality of digital interactions such as digital interactions 301 , 302 , 303 , etc. These digital interactions may take place via a same web site. However, that is not required, as one or more of the techniques described herein may be used to analyze digital interactions taking place across multiple web sites.
  • the security system monitors different types of attributes. For instance, the security system may record one or more anchor values for each digital interaction, such as network address (attribute 311 ), email address (attribute 312 ), account identifier (attribute 313 ), etc.
  • the security system may identify an anchor value from a digital interaction in any suitable manner.
  • the digital interaction may include an attempt to log in, and an email address may be submitted to identify an account associated with the email address. However, that is not required, as in some embodiments a separate account identifier may be submitted and an email address on record for that account may be identified.
  • the digital interaction may include an online purchase. A phone number may be submitted for scheduling a delivery, and a credit card number may be submitted for billing. However, that is not required, as in some embodiments a phone number and/or a credit card number may be identified from a record of the account from which the online purchase is made.
  • the security system may examine data packets received in connection with the digital interaction and extract, from the data packets, information such as a source network address and a source device identifier.
  • anchor types include, but are not limited to the following.
  • the security system may monitor one or more transaction attributes in addition to, or instead of, one or more anchor types.
  • the security system may identify transaction attribute values from a digital interaction in any suitable manner.
  • the digital interaction may include a purchase transaction, and the security system may identify information relating to the purchase transaction, such as a SKU for a product being purchased (attribute 321 ), a count of items in a shopping cart at time of checkout (attribute 322 ), an average value of items being purchased (attribute 323 ), etc.
  • the security system may monitor one or more timing attributes, such as time from product view to checking out (attribute 331 ), time from adding a product to cart to checking out (attribute 332 ), etc. Illustrative techniques for identifying timing attribute values are discussed in connection with FIG. 2 .
  • a digital interaction may include a transfer of funds, instead of, or in addition to, a purchase transaction.
  • transaction attributes for a transfer of funds include, but are not limited to, amount being transferred, name of recipient institution, recipient account number, etc.
  • FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments.
  • the process 400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) to monitor digital interactions taking place at a particular web site.
  • the security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.
  • the security system may identify a plurality of values of an attribute.
  • the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.
  • the security system may identify each value of the attribute from a respective digital interaction. For instance, the security system may monitor digital interactions taking place within a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.), and may identify a value of the attribute from each digital interaction.
  • aspects of the present disclosure are not limited to monitoring every digital interaction taking place within some time period.
  • digital interactions may be sampled (e.g., randomly) and attribute values may be identified from the sampled digital interactions.
  • possible values of an attribute may be divided into a plurality of buckets.
  • the security system may maintain a counter for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y.
  • in this manner, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).
  • a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirement and speed up retrieval and updates, but more information may be lost.
  • the security system may, at act 410 , divide the attribute values identified at act 405 into a plurality of buckets.
  • each bucket may be a multiset. For instance, if two different digital interactions report the same network address, that network address may appear twice in the corresponding bucket.
  • the security system may determine a count of the values that fall within a particular bucket. In some embodiments, a count may be determined for each bucket of the plurality of buckets. However, that is not required, as in some embodiments the security system may only keep track of one or more buckets of interest.
  • the security system may divide numerical attribute values (e.g., time measurements) into a plurality of ranges.
  • the security system may use a hash-modding technique to divide numerical and/or non-numerical attribute values into buckets.
  • Other techniques may also be used, as aspects of the present disclosure are not limited to any particular technique for dividing attribute values into buckets.
  • FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments.
  • the illustrative technique shown in FIG. 5 may be used by a security system to divide values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3 into a plurality of buckets.
  • the plurality of buckets include three buckets corresponding respectively to three ranges of time measurements.
  • bucket 581 may correspond to a range between 0 and 10 seconds
  • bucket 582 may correspond to a range between 10 and 30 seconds
  • bucket 583 may correspond to a range of greater than 30 seconds.
  • thresholds for dividing numeric measurements into buckets may be chosen based on observations from population data. For instance, the inventors have recognized and appreciated that the time from product view to checkout is rarely less than 10 seconds in a legitimate digital interaction, and therefore a high count for the bucket 581 may be a good indicator of an anomaly.
  • buckets may be defined based on a population mean and a population standard deviation. For instance, there may be a first bucket for values that are within one standard deviation of the mean, a second bucket for values that are between one and two standard deviations away from the mean, a third bucket for values that are between two and three standard deviations away from the mean, and a fourth bucket for values that are more than three standard deviations away from the mean.
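  • As a minimal sketch, the four buckets defined by the population mean and standard deviation may be computed as follows. The function name sigma_bucket, the example population statistics, and the treatment of values lying exactly on a boundary (assigned to the farther bucket) are assumptions made for illustration.

```python
def sigma_bucket(value: float, mean: float, std: float) -> int:
    """Return 0-3: how many whole standard deviations `value` is from `mean`.

    0: within one standard deviation of the mean
    1: between one and two standard deviations away
    2: between two and three standard deviations away
    3: more than three standard deviations away
    A value exactly on a boundary falls into the farther bucket (assumption).
    """
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    distance = abs(value - mean) / std
    return min(int(distance), 3)


# Hypothetical population statistics for time from product view to checkout.
MEAN_SECONDS = 600.0
STD_SECONDS = 200.0

buckets = [sigma_bucket(t, MEAN_SECONDS, STD_SECONDS) for t in (650, 950, 5, 1300)]
print(buckets)  # [0, 1, 2, 3]
```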
  • a bucket may be defined based on observations from known fraudsters, and/or a bucket may be defined based on observations from known legitimate users.
  • the security system may identify a plurality of values of the attribute 331 from a plurality of digital interactions. For instance, the security system may identify from each digital interaction an amount of time that elapsed between viewing a product details page for a product and checking out (e.g., as discussed in connection with FIGS. 1C and 3 ). In the example shown in FIG. 5 , nine digital interactions are monitored, and nine values of the attribute 331 (time from product view to checkout) are obtained. It should be appreciated that aspects of the present disclosure are not limited to monitoring any particular number of digital interactions. For instance, in some embodiments, some or all of the digital interactions taking place during a certain period of time may be monitored, and the number of digital interactions may fluctuate depending on a traffic volume at one or more relevant web sites.
  • the nine values are divided into the buckets 581 - 583 based on the corresponding ranges, resulting in four values (i.e., 10 seconds, 1 second, 2 seconds, and 2 seconds) in the bucket 581 , three values (i.e., 25 seconds, 15 seconds, and 30 seconds) in the bucket 582 , and two values (i.e., 45 seconds and 90 seconds) in the bucket 583 .
  • numerical data collected by the security system may be quantized to reduce a number of possible values for a particular attribute, for example, from thousands or more of possible values (3600 seconds, assuming time is recorded up to one hour) to three possible values (three ranges). This may allow the security system to analyze the collected data more efficiently.
  • aspects of the present disclosure are not limited to the use of any particular quantization technique, or any quantization technique at all.
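  • The range-based division of FIG. 5 may be sketched in Python as follows, using the nine values listed above. The order of the observations and the treatment of each upper bound as inclusive (consistent with 10 seconds falling in the bucket 581 and 30 seconds in the bucket 582 ) are assumptions, as are the names used.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds, in seconds, of the first two ranges; anything
# above the last bound falls into the final, open-ended range.
UPPER_BOUNDS = [10, 30]
BUCKET_LABELS = ["581", "582", "583"]


def range_bucket(seconds: float) -> int:
    """Map a time measurement to the index of its range."""
    return bisect_left(UPPER_BOUNDS, seconds)


# The nine values from the example (observation order is hypothetical).
observed = [10, 25, 1, 45, 2, 15, 90, 2, 30]

counts = Counter(range_bucket(t) for t in observed)
per_bucket = {BUCKET_LABELS[i]: counts[i] for i in range(len(BUCKET_LABELS))}
print(per_bucket)  # {'581': 4, '582': 3, '583': 2}
```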
  • FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments.
  • the histogram 700 may represent a result of dividing a plurality of time attribute values into a plurality of ranges, as discussed in connection with act 415 of FIG. 4 .
  • the time attribute values may be values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3 .
  • the histogram 700 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a range of time attribute values.
  • the height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the second bucket (between 1 and 5 minutes) may be higher than the count for the first bucket (between 0 and 1 minute), while the count for the third bucket (between 5 and 15 minutes) may be the highest, indicating that a delay between product view and checkout most frequently falls between 5 and 15 minutes.
  • a number M of buckets may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored.
  • the buckets may correspond to ranges of uniform length, or variable lengths. For instance, in some embodiments, smaller ranges may be used where attribute values tend to cluster (e.g., smaller ranges below 15 minutes), and/or larger ranges may be used where attribute values tend to be sparsely distributed (e.g., larger ranges above 15 minutes). As an example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into two or more smaller buckets.
  • as another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more adjacent buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.
  • FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments.
  • the illustrative technique shown in FIG. 6 may be used by a security system to divide values of the illustrative attribute 311 (IP addresses) shown in FIG. 3 into a plurality of buckets.
  • a hash-modding technique may involve hashing an input value and performing a modulo operation on the resulting hash value.
  • in the example shown in FIG. 6 , nine digital interactions are monitored, and nine values of the attribute 311 (IP address) are obtained. These nine IP addresses may be hashed to produce nine hash values, respectively. The following values may result from extracting the two least significant hexadecimal digits from each hash value: 93, 93, 41, 41, 9a, 9a, 9a, 9a, 9a.
  • This extraction process may be equivalent to performing a modulo operation (i.e., mod 256) on the hash values.
  • each residue of the modulo operation may correspond to a bucket of attribute values.
  • the residues 93, 41, and 9a correspond, respectively, to buckets 681 - 683 .
  • there may be two attribute values in each of the bucket 681 and the bucket 682 and five attribute values in the bucket 683 .
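  • As a minimal sketch of hash-modding, the following Python example hashes each IP address, reduces the hash modulo 256, and counts the values per residue. SHA-256 is an assumed hash function (the disclosure does not name one), the addresses are placeholders from a documentation range, and the function names are hypothetical. The example also illustrates that taking a hash modulo 256 is equivalent to extracting its two least significant hexadecimal digits.

```python
import hashlib
from collections import Counter


def hash_mod_bucket(value: str, modulus: int = 256) -> int:
    """Hash a value and reduce the hash modulo `modulus`."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


def last_two_hex_digits(value: str) -> str:
    """Extract the two least significant hexadecimal digits of the hash."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[-2:]


# Placeholder addresses: nine digital interactions, three distinct sources.
addresses = ["192.0.2.1"] * 2 + ["192.0.2.7"] * 2 + ["192.0.2.9"] * 5

# One counter per residue; identical addresses always share a bucket.
counts = Counter(hash_mod_bucket(a) for a in addresses)

for address in sorted(set(addresses)):
    residue = hash_mod_bucket(address)
    # mod 256 agrees with reading the last two hexadecimal digits.
    assert residue == int(last_two_hex_digits(address), 16)
    print(address, format(residue, "02x"), counts[residue])
```

Distinct addresses may collide in the same bucket, so a bucket count is an upper bound on the count for any single address in that bucket.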
  • FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • the histogram 720 may represent a result of dividing a plurality of attribute values into a plurality of buckets, as discussed in connection with act 415 of FIG. 4 .
  • the attribute values may be values of the illustrative attribute 311 (IP addresses) shown in FIG. 3 .
  • Each attribute value may be converted into a hash value, and a modulo operation may be applied to map each hash value to a residue, as discussed in connection with FIG. 6 .
  • the histogram 720 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a residue of the modulo operation.
  • the height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the third bucket (residue “02”) is higher than the count for the first bucket (residue “00”) and the count for the second bucket (residue “01”), indicating that one or more IP addresses that hash-mod to “02” are frequently observed.
  • a modulus M of the modulo operation (which determines how many buckets there are) may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored. Moreover, in some embodiments, buckets may be further divided and/or merged. As one example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into smaller buckets. For instance, the bucket for hash values ending in “00” may be divided into 16 buckets (for hash values ending, respectively, in “000,” “100,” . . . , “f00”), or into two buckets, the first for hash values ending in “000,” “100,” .
  • as another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more other buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.
  • the security system may, at act 420 , compare the count determined in act 415 against historical information.
  • the historical information may include an expected count for the same bucket, and the security system may compare the count determined in act 415 against the expected count.
  • the determination at act 415 and the comparison at act 420 may be performed for any number of one or more buckets.
  • a histogram obtained at act 415 (e.g., the illustrative histogram 720 shown in FIG. 7B ) may be compared against an expected histogram obtained from historical information.
  • FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • the expected histogram 820 may be calculated in a similar manner as the illustrative histogram 720 of FIG. 7B , except that attribute values used to calculate the expected histogram 820 may be obtained from a plurality of past digital interactions, such as digital interactions from a past period of time during which there is no known attack (or no known large-scale attack) on one or more relevant web sites.
  • the expected histogram 820 may represent an acceptable pattern.
  • FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A , in accordance with some embodiments.
  • FIG. 9 shows illustrative time periods 902 and 904 , in accordance with some embodiments.
  • attribute values that are used to calculate the illustrative histogram 720 of FIG. 7B may be obtained from digital interactions taking place during the time period 902
  • attribute values that are used to calculate the illustrative expected histogram 820 of FIG. 8A may be obtained from digital interactions taking place during the time period 904 .
  • the security system may perform anomaly detection processing on a rolling basis. Whenever anomaly detection processing is performed, the time period 902 may be near a current time, whereas the time period 904 may be in the past.
  • the time periods 902 and 904 may have a same length (e.g., 30 minutes, one hour, 90 minutes, 2 hours, etc.), and/or may fall at a same time of day, so that the comparison between the histogram 720 and the expected histogram 820 may be more meaningful.
  • multiple comparisons may be made using different expected histograms, such as expected histograms for a time period of a same length from an hour ago, two hours ago, etc., and/or a same time period from a day ago, a week ago, a month ago, a year ago, etc.
  • the security system may compare the histogram 720 against an expected histogram that is further back in time (e.g., a week ago, a month ago, a year ago, etc.). This may allow the security system to take into account cyclical patterns (e.g., higher traffic volume on Saturdays, before Christmas, etc.).
  • the security system may, at act 425 , determine if there is any anomaly associated with the attribute in question (e.g., time from product view to checkout, IP address, etc.). For instance, with reference to FIG. 8A , the third bar (residue “02”) in the histogram 720 may exceed the third bar in the expected histogram 820 by a significant amount (e.g., more than a selected threshold amount). Thus, the security system may infer a possible attack from an IP address that hash-mods into “02.” The security system may store the attribute (e.g., IP address) and the particular bucket exhibiting an anomaly (e.g., residue “02”) in a fuzzy profile.
  • incoming digital interactions may be analyzed against the fuzzy profile, and one or more security measures may be imposed on matching digital interactions (e.g., digital interactions involving IP addresses that hash-mod into “02”). For example, one or more security probes may be deployed to investigate the matching digital interactions.
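  • The comparison at act 420 , the determination at act 425 , and the matching of incoming digital interactions against a fuzzy profile may be sketched as follows. The counts, the threshold of 100, the dictionary layout of the fuzzy profile, and the function names are all hypothetical choices made for illustration.

```python
def find_anomalous_buckets(current, expected, threshold):
    """Return buckets whose count exceeds the expected count by more than
    `threshold`. Buckets absent from `expected` are treated as expecting 0."""
    return sorted(
        bucket
        for bucket, count in current.items()
        if count - expected.get(bucket, 0) > threshold
    )


# Hypothetical counts keyed by hash-mod residue.
current = {"00": 40, "01": 35, "02": 400}
expected = {"00": 42, "01": 30, "02": 45}

# The fuzzy profile records the attribute and its anomalous bucket(s).
fuzzy_profile = {
    "ip_address": find_anomalous_buckets(current, expected, threshold=100)
}
print(fuzzy_profile)  # {'ip_address': ['02']}


def matches_profile(profile, attribute, bucket):
    """Check whether an incoming interaction falls into an anomalous bucket."""
    return bucket in profile.get(attribute, [])


# An incoming interaction whose IP address hash-mods into "02" matches and
# may be subject to one or more security measures.
print(matches_profile(fuzzy_profile, "ip_address", "02"))  # True
```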
  • the illustrative techniques discussed in connection with FIG. 4 may provide flexibility in anomaly detection.
  • the expected histogram 820 may be customized for a web site, by using only digital interactions taking place on that web site.
  • expected histograms may evolve over time. For instance, on any given day, the security system may use digital interactions from the day before (or a week ago, a month ago, a year ago, etc.) to calculate an expected histogram. In this manner, expected histograms may follow trends on the web site and remain up-to-date.
  • an unexpected increase in traffic from a few IP addresses may be an indication of coordinated attack from computer resources controlled by an attacker.
  • an unexpected spike in a product SKU being ordered from a web site may be an indication of a potential pricing mistake and resellers ordering large quantities for that particular product SKU.
  • a security system that merely looks for known anomalous patterns may not be able to detect such emergent anomalies.
  • the security system may compute a normalized count for a bucket, which may be a ratio between a count for the individual bucket and a total count among all buckets. The normalized count may then be compared against an expected normalized count, in addition to, or instead of, comparing the count against an expected count as described in connection with FIG. 8B .
  • normalization may be used advantageously to reduce false positives. For instance, during traditional holiday shopping seasons, or during an advertised sales special, there may be an increase of shopping web site visits and checkout activities. Such an increase may lead to an increase of absolute counts across multiple buckets. A comparison between a current absolute count for an individual bucket and an expected absolute count (e.g., an absolute count for that bucket observed a week ago) may show that the current absolute count exceeds the expected absolute count by more than a threshold amount, which may lead to a false positive identification of anomaly. By contrast, a comparison between a current normalized count and an expected normalized count may remain reliable despite an across-the-board increase in activities.
  • FIG. 10 shows an illustrative normalized histogram 1000 , in accordance with some embodiments.
  • each bar in the histogram 1000 corresponds to a bucket
  • a height of the bar corresponds to a normalized count obtained by dividing an absolute count for the bucket by a sum of counts from all buckets.
  • the first bucket may account for 10% of all digital interactions, the second bucket 15%, the third bucket 30%, the fourth bucket 15%, etc.
  • a normalized histogram may be used at acts 415 - 420 of the illustrative process 400 of FIG. 4 , instead of, or in addition to, a histogram with absolute counts. For instance, with increased sales activities during a holiday shopping season, an absolute count in a bucket may increase significantly from a week or a month ago, but a normalized count may remain roughly the same. If, on the other hand, an attack is taking place via digital interactions originating from a small number of IP addresses, a bucket to which one or more of the malicious IP addresses are mapped (e.g., via hash-modding) may account for an increased percentage of all digital interactions.
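  • As a minimal sketch of why normalization may reduce false positives, the following Python example compares a hypothetical baseline against an across-the-board increase in traffic and against a concentrated attack. All counts and names are illustrative assumptions.

```python
def normalize(counts):
    """Divide each bucket count by the total count across all buckets."""
    total = sum(counts.values())
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: count / total for bucket, count in counts.items()}


# Hypothetical absolute counts per bucket from a week ago.
last_week = {"00": 100, "01": 150, "02": 300, "03": 450}

# Holiday traffic: every bucket triples, so absolute counts jump...
holiday = {bucket: 3 * count for bucket, count in last_week.items()}

# ...whereas an attack inflates a single bucket only.
attack = dict(last_week)
attack["02"] = 1500

baseline_share = normalize(last_week)["02"]
holiday_share = normalize(holiday)["02"]
attack_share = normalize(attack)["02"]

print(round(baseline_share, 3))  # 0.3
print(round(holiday_share, 3))   # 0.3
print(round(attack_share, 3))    # 0.682
```

In this sketch, the absolute count for bucket "02" rises in both scenarios, but only the attack changes the bucket's share of all digital interactions.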
  • each histogram may correspond to a separate window of time.
  • FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments.
  • the array 1100 includes 24 histograms, each corresponding to a one-hour window. For instance, there may be a histogram for a current time, a histogram for one hour prior, a histogram for two hours prior, etc. These histograms may show statistics for a same attribute, such as IP address.
  • the attribute may be IP address, and an IP address may be mapped to one of the four buckets based on a time zone associated with the IP address.
  • buckets 1120 , 1140 , 1160 , and 1180 may correspond, respectively, to Eastern, Central, Mountain, and Pacific.
  • the illustrative array 1100 shows peak activity levels in the bucket 1120 at hour markers -18, -19, and -20, which may be morning hours for the Eastern time zone.
  • the illustrative array 1100 also shows peak activity levels in the bucket 1160 at hour markers -16, -17, and -19, which may be morning hours for the Mountain time zone. These may be considered normal patterns. Although not shown, a spike of activity at nighttime may indicate an anomaly.
  • aspects of the present disclosure are not limited to the use of any particular time resolution (e.g., 24 one-hour windows), as other time resolutions may be used additionally, or alternatively, such as 12 five-minute windows, seven one-day windows, 14 one-day windows, four one-week windows, etc.
  • aspects of the present disclosure are not limited to the use of an array of histograms.
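  • One way to maintain an array of histograms such as the illustrative array 1100 is a fixed-length queue of per-window counters, where the oldest window is dropped as each new window begins. The following Python sketch assumes 24 windows by default; the class name HistogramArray, its methods, and the time-zone bucket labels used as keys are hypothetical.

```python
from collections import Counter, deque


class HistogramArray:
    """Hypothetical rolling array of histograms, one per time window."""

    def __init__(self, num_windows: int = 24):
        # A bounded deque discards the oldest window automatically.
        self.windows = deque(maxlen=num_windows)
        self.windows.append(Counter())

    def observe(self, bucket: str) -> None:
        """Count one digital interaction in the current window."""
        self.windows[-1][bucket] += 1

    def roll(self) -> None:
        """Start a new window (e.g., called once per hour)."""
        self.windows.append(Counter())

    def count(self, windows_ago: int, bucket: str) -> int:
        """Count for `bucket` in the window `windows_ago` before the current."""
        return self.windows[-1 - windows_ago][bucket]


array = HistogramArray(num_windows=24)
array.observe("Eastern")
array.observe("Eastern")
array.roll()  # one hour later
array.observe("Pacific")

print(array.count(0, "Pacific"))  # 1
print(array.count(1, "Eastern"))  # 2
```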
  • a profile may be generated with a plurality of attributes to increase accuracy and/or efficiency of anomaly detection. For instance, a plurality of attributes may be monitored, and the illustrative process 400 of FIG. 4 may be performed for each attribute to determine if that attribute is anomalous (e.g., by building a histogram, or an array of histograms as discussed in connection with FIG. 11 ). In this manner, risk assessment may be performed in multiple dimensions, which may improve accuracy.
  • one or more attributes may be selected so that a detected anomaly in any of the one or more attributes may be highly indicative of an attack.
  • the inventors have recognized and appreciated that, while anomalies in some attributes may be highly indicative of attacks, such anomalies may rarely occur, so that it may not be worthwhile to expend time and resources (e.g., storage, processor cycles, etc.) to monitor those attributes.
  • an attribute may be selected only if anomalies in that attribute are observed frequently in known attacks (e.g., in higher than a selected threshold percentage of attacks).
  • anomalies in one attribute may be correlated with anomalies in another attribute. For instance, there may be a strong correlation between time zone and language, so that an observation of an anomalous time zone value may not provide a lot of additional information if a corresponding language value is already known to be anomalous, or vice versa. Accordingly, in some embodiments, the plurality of attributes may be selected to be pairwise independent.
  • FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments.
  • the illustrative profile includes at least three attributes—time from product view to checkout, email domain, and product SKU.
  • Three illustrative histograms 1220 , 1240 , and 1260 may be built for these attributes, respectively.
  • each of the histograms 1220 , 1240 , and 1260 may be built based on recent digital interactions at a relevant web site, using one or more of the techniques described in connection with FIGS. 4-7B .
  • the histograms 1220 , 1240 , and 1260 are compared against three expected histograms, respectively.
  • an expected histogram may be calculated based on historical data.
  • each bar in an expected histogram may be calculated as a moving average over some length of time.
  • an expected histogram may be a histogram calculated from digital interactions that took place in a past period of time, for instance, as discussed in connection with FIGS. 8A-9 .
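  • As a minimal sketch, each bar of an expected histogram may be computed as an unweighted average of the corresponding bars in several past histograms. The choice of an unweighted mean over three past windows, the example counts, and the function name are assumptions; other averaging schemes may be used.

```python
def expected_histogram(past_histograms):
    """Average each bucket's count over a list of past histograms.

    A bucket missing from a past histogram contributes a count of 0.
    """
    if not past_histograms:
        return {}
    buckets = set().union(*past_histograms)
    window = len(past_histograms)
    return {
        bucket: sum(h.get(bucket, 0) for h in past_histograms) / window
        for bucket in sorted(buckets)
    }


# Hypothetical histograms from three past periods of the same length.
past = [
    {"a": 10, "b": 20},
    {"a": 14, "b": 22},
    {"a": 12, "b": 18, "c": 3},
]

expected = expected_histogram(past)
print(expected)  # {'a': 12.0, 'b': 20.0, 'c': 1.0}
```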
  • each of the histograms 1220 , 1240 , and 1260 has an anomalous value.
  • the third bucket for the histogram 1220 may show a count 1223 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1226 in the corresponding expected histogram.
  • the fourth bucket for the histogram 1240 may show a count 1244 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1248 in the corresponding expected histogram.
  • the last bucket for the histogram 1260 shows a count 1266 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1272 in the corresponding expected histogram.
  • different thresholds may be used to determine anomaly for different attributes, as some attributes may have counts that tend to fluctuate widely over time, while other attributes may have counts that tend to stay relatively stable.
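  • By way of a non-limiting illustration, the comparison of an observed histogram against an expected histogram using attribute-specific thresholds may be sketched as follows. The function name, attribute names, counts, and threshold values are assumptions made for this sketch and are not taken from FIG. 12.

```python
from typing import Dict, List

def anomalous_buckets(observed: List[int], expected: List[float],
                      threshold: float) -> List[int]:
    """Return indices of buckets whose observed count exceeds the expected
    count by more than `threshold` (an absolute amount)."""
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of buckets")
    return [i for i, (o, e) in enumerate(zip(observed, expected))
            if o - e > threshold]

# Hypothetical per-attribute thresholds: an attribute whose counts fluctuate
# widely is given a larger threshold than a relatively stable attribute.
THRESHOLDS: Dict[str, float] = {"email_domain": 50.0, "product_sku": 200.0}

observed = {"email_domain": [10, 12, 9, 120],
            "product_sku": [300, 310, 290, 450]}
expected = {"email_domain": [11.0, 10.0, 10.0, 15.0],
            "product_sku": [305.0, 300.0, 295.0, 320.0]}

flags = {attr: anomalous_buckets(observed[attr], expected[attr], THRESHOLDS[attr])
         for attr in observed}
```

  • In this sketch, the last email domain bucket is flagged because it exceeds its expected count by 105, whereas the last product SKU bucket, which exceeds its expected count by 130, is not flagged under the larger threshold assumed for that attribute.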
  • the inventors have recognized and appreciated that when information is collected from a digital interaction, not all of the collected information may be useful for anomaly detection. For instance, if a particular operating system has a certain vulnerability that is exploited in an attack, and the vulnerability exists in all versions of the operating system, a stronger anomalous pattern may emerge if all digital interactions involving that operating system are analyzed together, regardless of version number. If, by contrast, digital interactions are stratified by version number, each version number may deviate from a respective expected pattern only moderately, which may make the attack more difficult to detect.
  • an entropy reduction operation may be performed on an observation from a digital interaction to remove information that may not be relevant for assessing a level of risk associated with the digital interaction. In this manner, less information may be processed, which may reduce storage requirement and/or improve response time of a security system.
  • FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments.
  • the process 1300 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) to monitor digital interactions taking place at a particular web site.
  • the security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.
  • the security system may record a plurality of observations relating to an attribute.
  • the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.
  • the security system may record each observation from a respective digital interaction. Instead of dividing the observations into a plurality of buckets, the security system may, at act 1308 , perform an entropy reduction operation on each observation, thereby deriving a plurality of attribute values. The plurality of attribute values are then divided into buckets, for instance, as discussed in connection with act 410 of FIG. 4 . The remainder of the process 1300 may proceed as described in connection with FIG. 4 .
  • two observations relating to user agent may be recorded as follows:
  • the inventors have recognized and appreciated that the operating system Mac OS X may often be associated with attacks, regardless of version number (e.g., 10_11_6 versus 10_11_4).
  • the above strings may land in two different buckets.
  • an increase in traffic e.g., 1000 digital interactions per hour
  • each bucket may show a smaller increase (e.g., about 500 digital interactions per hour), and the security system may not be sufficiently confident to flag an anomaly.
  • the security system may strip the operating system version numbers from the above strings at act 1308 of the illustrative process 1300 of FIG. 13 .
  • the Mozilla version numbers “5.0” may be reduced to “5”
  • the AppleWebKit version numbers “537.36” may be reduced to “537”
  • the Chrome version numbers “52.0.2743.116 ” may be reduced to “52”
  • the Safari version numbers “537.36” may be reduced to “537.”
  • both of the above strings may be reduced to a common attribute value:
  • entropy reduction may be performed incrementally. For instance, the security system may first strip out operating system version numbers. If no discernible anomaly emerges, the security system may strip out AppleWebKit version numbers. This may continue until some discernible anomaly emerges, or all version numbers have been stripped out.
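  • A minimal sketch of such an entropy reduction operation on user agent strings is given below, under stated assumptions. The two example strings are assumed for illustration only (they are chosen to be consistent with the version numbers mentioned above and differ only in operating system version), and the function names and the single-step truncation of all product versions are simplifications introduced for this sketch.

```python
import re
from typing import Callable, Iterable, List

# Assumed example observations; they differ only in the OS version number.
UA_A = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36")
UA_B = UA_A.replace("10_11_6", "10_11_4")

def strip_os_version(ua: str) -> str:
    """Drop the Mac OS X version, e.g. 'Mac OS X 10_11_6' -> 'Mac OS X'."""
    return re.sub(r"(Mac OS X) [\d_.]+", r"\1", ua)

def truncate_product_versions(ua: str) -> str:
    """Keep only the major version, e.g. 'Chrome/52.0.2743.116' -> 'Chrome/52'."""
    return re.sub(r"(\w+)/(\d+)(?:\.\d+)*", r"\1/\2", ua)

# Steps are applied in order, from least to most aggressive.
REDUCTION_STEPS: List[Callable[[str], str]] = [
    strip_os_version,
    truncate_product_versions,
]

def reduce_incrementally(observations: Iterable[str],
                         is_anomalous: Callable[[List[str]], bool]) -> List[str]:
    """Apply one reduction step at a time, stopping as soon as the supplied
    detector reports an anomaly or all steps have been applied."""
    values = list(observations)
    for step in REDUCTION_STEPS:
        values = [step(v) for v in values]
        if is_anomalous(values):
            break
    return values

# Applying both steps maps the two observations to one common attribute value.
common_a = truncate_product_versions(strip_os_version(UA_A))
common_b = truncate_product_versions(strip_os_version(UA_B))
```

  • The `is_anomalous` callback stands in for whatever histogram comparison the security system uses; it is a placeholder and not a specific detector from the disclosure.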
  • an observation relating to display size may be recorded as follows:
  • the security system may sort the display dimensions in some appropriate order (e.g., low to high, or high to low), which may result in the following:
  • sorting may allow partial matching.
  • aspects of the present disclosure are not limited to sorting display dimensions.
  • the security system may reduce the display dimensions, for example, by dividing the display dimensions by 100 and then rounding (e.g., using a floor or ceiling function). This may result in the following:
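  • The sorting and reduction of display dimensions described above may be sketched as follows. The representation of an observation as a pair of integers, the function name, and the use of a floor function with a divisor of 100 are assumptions made for this sketch.

```python
from typing import Iterable, Tuple

def reduce_display_size(dimensions: Iterable[int], divisor: int = 100) -> Tuple[int, ...]:
    """Sort the dimensions low to high, then divide each by `divisor`
    and round down (floor)."""
    return tuple(d // divisor for d in sorted(dimensions))

# A portrait and a landscape report of the same display reduce to one value.
landscape = reduce_display_size((1920, 1080))
portrait = reduce_display_size((1080, 1920))
```

  • Because the dimensions are sorted before being reduced, observations that differ only in orientation may land in the same bucket.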
  • FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments.
  • the process 1400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) to determine if a digital interaction is likely part of an attack.
  • the fuzzy profile is built (e.g., using the illustrative process 400 shown in FIG. 4 ) for detecting illegitimate resellers.
  • the profile may store one or more attributes that are anomalous. Additionally, or alternatively, the profile may store, for each anomalous attribute, an attribute value that is anomalous, and/or an indication of an extent to which that attribute value deviates from expectation.
  • an anomalous attribute may be product SKU, and an anomalous attribute value may be a particular hash-mod bucket (e.g., the last bucket in the illustrative histogram 1260 shown in FIG. 12 ).
  • the profile may store an indication of an extent to which an observed count for that bucket (e.g., the count 1266 ) deviates from an expected count (e.g., the count 1272 ).
  • the profile may store a percentage by which the observed count exceeds the expected count.
  • the profile may store an amount by which the observed count exceeds the expected count.
  • the profile may store an indication of a distance between the observed count and the expected count.
  • the expected count may be an average count for the particular bucket over some period of time, and an expected interval may be defined based on a standard deviation (e.g., one standard deviation away from the average count, two standard deviations away, etc.).
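  • As a hedged illustration, the indications of deviation that a profile may store (a percentage, an amount, and a distance expressed in standard deviations) may be computed as sketched below. The function name, the dictionary keys, and the use of a population standard deviation over a list of historical counts are assumptions made for this sketch.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

def deviation_summary(observed: float, history: List[float]) -> Dict[str, Optional[float]]:
    """Summarize how far an observed bucket count lies from the expected
    count, taken here as the average of historical counts for that bucket."""
    expected = mean(history)
    spread = pstdev(history)  # population standard deviation of the history
    amount = observed - expected
    return {
        "expected": expected,
        "amount": amount,
        "percent": 100.0 * amount / expected if expected else None,
        "std_devs": amount / spread if spread > 0 else None,
    }

summary = deviation_summary(150, [90, 100, 110, 100])
```

  • In this sketch, an observed count of 150 against an average of 100 exceeds the expected count by 50 (i.e., by 50%), which lies well outside an expected interval of two standard deviations.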
  • the security system may, at act 1405 , identify a plurality of attributes from the fuzzy profile.
  • digital interactions with a retailer's web store may be analyzed to distinguish possible resellers from retail customers who purchase goods for their own use.
  • a reseller profile for use in such an analysis may contain attributes such as the following.
  • the security system may select an anomalous attribute (e.g., product SKU) and identify one or more values that are anomalous (e.g., one or more hash-mod buckets with anomalously high counts).
  • the security system may determine if the digital interaction that is being analyzed matches the fuzzy profile with respect to the anomalous attribute. For instance, the security system may identify a hash-mod bucket for a product SKU that is being purchased in the digital interaction, and determine whether that hash-mod bucket is among one or more anomalous hash-mod buckets stored in the profile for the product SKU attribute. If there is a match, the security system may so record.
  • the security system may determine if there is another anomalous attribute to be processed. If so, the security system may return to act 1410 . Otherwise, the security system may proceed to act 1425 to calculate a penalty score.
  • the penalty score may be calculated in any suitable manner. In some embodiments, the penalty score is determined based on a ratio between a count of anomalous attributes with respect to which the digital interaction matches the profile, and a total count of anomalous attributes. Illustrative code for calculating the penalty score is shown below.
  • an attribute penalty score may be determined for a matching attribute based on an extent to which an observed count for a matching bucket deviates from an expected count for that bucket.
  • An overall penalty score may then be calculated based on one or more attribute penalty scores (e.g., as a weighted sum).
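  • A minimal sketch of the two penalty score calculations described above (a ratio of matching anomalous attributes, and a weighted sum of attribute penalty scores) is given below. This sketch is not the illustrative code of the original filing; the function names, the number of buckets, the choice of SHA-256 as the hash, and the representation of a fuzzy profile as a mapping from attribute name to a set of anomalous bucket indices are all assumptions.

```python
import hashlib
from typing import Dict, Set

NUM_BUCKETS = 10  # assumed number of hash-mod buckets

def hash_mod_bucket(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map an attribute value to a bucket index via hash-then-modulo."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def penalty_score(interaction: Dict[str, str],
                  profile: Dict[str, Set[int]]) -> float:
    """Ratio of anomalous attributes matched by the interaction to the
    total number of anomalous attributes in the profile."""
    if not profile:
        return 0.0
    matches = sum(
        1 for attr, buckets in profile.items()
        if attr in interaction and hash_mod_bucket(interaction[attr]) in buckets
    )
    return matches / len(profile)

def weighted_penalty(attribute_scores: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Overall penalty as a weighted sum of per-attribute penalty scores;
    attributes without an explicit weight default to a weight of 1.0."""
    return sum(weights.get(attr, 1.0) * score
               for attr, score in attribute_scores.items())

# Hypothetical profile with two anomalous attributes.
profile = {
    "product_sku": {hash_mod_bucket("SKU-123")},
    "email_domain": {hash_mod_bucket("example.test")},
}
score = penalty_score({"product_sku": "SKU-123"}, profile)
```

  • Here the interaction matches one of the two anomalous attributes, so the ratio-based penalty score is 0.5.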
  • a penalty score calculated using a reseller profile may indicate a likelihood that a reseller is involved in a digital interaction.
  • a penalty score may be used in any suitable manner.
  • the web retailer may use the penalty score to decide whether to initiate one or more actions, such as canceling an order already placed by the suspected reseller, suspending the suspected reseller's account, and/or preventing creation of a new account by an entity linked to the suspected reseller's account.
  • the reseller profile described above in connection with FIG. 14 is provided merely for purposes of illustration. Aspects of the present disclosure are not limited to monitoring any particular attribute or combination of attributes to identify resellers, nor to the use of a reseller profile at all. In various embodiments, any suitable attribute may be monitored to detect any type of anomaly, in addition to, or instead of, reseller activity.
  • one or more past digital interactions may be identified, using any suitable method, as part of an attack.
  • Each such digital interaction may be associated with an anchor value (e.g., IP address, name, account ID, email address, device ID, device fingerprint, user ID, hashed credit card number, etc.), and the anchor value may in turn be associated with a behavior profile.
  • one or more behavior profiles may be identified as being associated with the attack and may be used to build a fuzzy profile.
  • a fuzzy profile may include any suitable combination of one or more attributes, which may, although need not, coincide with one or more attributes of the behavior profiles from which the fuzzy profile is built.
  • the fuzzy profile may store a range or limit of values for an attribute, where the range or limit may be determined based on values of the attribute stored in the behavior profiles.
  • FIG. 15 shows an illustrative fuzzy profile 1500 , in accordance with some embodiments.
  • three individual behaviors A, B, and C are observed in known malicious digital interactions.
  • each of the behaviors A, B, and C may be observed in 20% of known malicious digital interactions (although it should be appreciated that behaviors observed at different frequencies may also be analyzed together).
  • each of the behaviors A, B, and C may be a poor indicator of whether a digital interaction exhibiting that behavior is part of an attack
  • certain combinations of the behaviors A, B, and C may provide more reliable indicators. For example, if a digital interaction exhibits both behaviors A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack, whereas if a digital interaction exhibits both behaviors B and C, there may be a low likelihood (e.g., 40%) that the digital interaction is part of an attack.
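  • One possible way to estimate such likelihoods, offered only as a sketch, is to compute the fraction of labeled past digital interactions exhibiting a given combination of behaviors that were part of an attack. The function name, the data layout, and the sample data below are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Labeled = Tuple[Set[str], bool]  # (behaviors exhibited, was part of an attack)

def combination_likelihood(interactions: Iterable[Labeled],
                           combination: Iterable[str]) -> Optional[float]:
    """Fraction of interactions exhibiting every behavior in `combination`
    that were labeled as part of an attack; None if no interaction
    exhibits the combination."""
    combo: FrozenSet[str] = frozenset(combination)
    labels: List[bool] = [is_attack for behaviors, is_attack in interactions
                          if combo <= behaviors]
    if not labels:
        return None
    return sum(labels) / len(labels)

# Hypothetical labeled history.
history: List[Labeled] = (
    [({"A", "B"}, True)] * 4 + [({"A", "B"}, False)] * 1 +
    [({"B", "C"}, True)] * 2 + [({"B", "C"}, False)] * 3
)
likelihood_ab = combination_likelihood(history, {"A", "B"})
likelihood_bc = combination_likelihood(history, {"B", "C"})
```

  • With this sample data, the combination of behaviors A and B yields a likelihood of 0.8, and the combination of behaviors B and C yields a likelihood of 0.4.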
  • FIG. 16 shows an illustrative fuzzy profile 1600 , in accordance with some embodiments.
  • the fuzzy profile 1600 includes six individual behaviors A, B, C, X, Y, and Z, where behaviors A, B, and C each include an observed historical pattern, and behaviors X, Y, and Z each include a behavior observed during a current digital interaction.
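  • if a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting both historical patterns A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack.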
  • such a likelihood may be determined based on a percentage of malicious digital interactions that are also associated with an anchor value exhibiting both historical patterns A and B.
  • if a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting historical pattern C, and if both behaviors X and Y are observed during the current digital interaction, there may be an even higher likelihood (e.g., 98%) that the digital interaction is part of an attack. If, on the other hand, only behaviors Y and Z are observed during the current digital interaction, there may be a lower likelihood (e.g., 75%) that the digital interaction is part of an attack.
  • one or more behaviors observed in a new digital interaction may be checked against a fuzzy profile and a score may be computed that is indicative of a likelihood that the new digital interaction is part of an attack associated with the fuzzy profile.
  • an anchor value associated with the new digital interaction may be linked to a known malicious anchor value associated with the fuzzy profile.
  • fuzzy profiles may capture behavior characteristics that may be more difficult for an attacker to spoof, compared to other types of characteristics such as device characteristics.
  • a fuzzy profile may be used across multiple web sites and/or applications. For example, when an attack occurs against a particular web site or application, a fuzzy profile may be created based on that attack (e.g., to identify linked anchor values) and may be used to detect similar attacks on a different web site or application.
  • aspects of the present disclosure are not limited to the use of a fuzzy profile, as each of the techniques described herein may be used alone, or in combination with any one or more other techniques described herein.
  • SKUs Stock Keeping Units
  • Other types of identifiers may allow analysis of sales data by product/service, for example, to identify historical purchase trends. In some embodiments, techniques are provided for identifying unexpected sale patterns.
  • Although SKUs are used in some of the examples described herein, it should be appreciated that aspects of the present disclosure are not limited to the use of SKUs, as other types of identifiers for products and/or services may also be used.
  • a SKU may sometimes become incorrectly priced in a retailer's inventory management software. This may be the result of a glitch or bug in the software, or a human error.
  • a product that normally sells for $1,200.00 may be incorrectly priced at $120.00, which may lead to a sharp increase in the number of purchases of that product.
  • the retailer may inadvertently allow transactions to complete and ship goods at a loss. Examples of other problems that may lead to anomalous sales data include, but are not limited to, consumers exploiting unexpected coupon code interactions, consumers violating sale policies (e.g., limit one item per customer at a discounted price), and commercial resellers attempting to take advantage of consumer-only pricing.
  • a security system may be programmed to monitor purchase activity (e.g., per SKU or group of SKUs) and raise an alert when significant deviation from an expected baseline is observed.
  • the security system may use any suitable technique for detecting unexpected sale patterns, including, but not limited to, using a fuzzy profile as described herein.
  • one or more automated countermeasures may be implemented in response to an alert. For example, a retailer may automatically freeze sales transactions that are in progress, and/or remove a SKU from the website, until an investigation is conducted. Additionally, or alternatively, one or more recommendations may be made to a retailer (e.g., based on profit/loss calculations), so that the retailer may decide to allow or block certain activities depending on projected financial impact.
  • data relating to sales activities may be collected and stored in a database.
  • One or more metrics may then be derived from the stored data. Examples of metrics that may be computed for a particular SKU or group of SKUs include, but are not limited to, proportion of transactions including that SKU or group of SKUs (e.g., out of all transactions at a website or group of websites), average number of items of that SKU or group of SKUs purchased in a single transaction or over a certain period of time by a single buyer, etc.
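  • As a non-limiting sketch, two of the metrics mentioned above (the proportion of transactions including a SKU, and the average number of items of that SKU per transaction) may be derived as follows. The representation of a transaction as a mapping from SKU to quantity, the SKU names, the function names, and the multiplicative deviation test are assumptions made for this sketch.

```python
from typing import Dict, List, Optional

Transaction = Dict[str, int]  # SKU -> quantity purchased in the transaction

def sku_metrics(transactions: List[Transaction], sku: str) -> Dict[str, Optional[float]]:
    """Compute the proportion of transactions that include `sku` and the
    average quantity of `sku` among the transactions that include it."""
    containing = [t for t in transactions if t.get(sku, 0) > 0]
    proportion = len(containing) / len(transactions) if transactions else None
    avg_quantity = (sum(t[sku] for t in containing) / len(containing)
                    if containing else None)
    return {"proportion": proportion, "avg_quantity": avg_quantity}

def deviates(current: float, baseline: float, factor: float = 3.0) -> bool:
    """Treat a metric as deviating when it exceeds `factor` times its baseline."""
    return current > factor * baseline

# Hypothetical current sales data.
current = sku_metrics(
    [{"TV-1200": 1}, {"CABLE": 2}, {"TV-1200": 3, "CABLE": 1}, {"CABLE": 1}],
    "TV-1200",
)
alert = deviates(current["proportion"], baseline=0.05)
```

  • In this sketch, the SKU appears in half of the current transactions, which is well above three times an assumed historical proportion of 5%, so the deviation test returns true.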
  • one or more metrics derived from current sales activities may be compared against historical data.
  • JavaScript code running on a website may monitor one or more sales activities and compare one or more current metrics against historical data.
  • aspects of the present disclosure are not limited to the use of JavaScript, as any suitable client-side and/or server-side programs written in any suitable language may be used to implement any one or more of the functionalities described herein.
  • an alert may be raised if one or more current metrics represent a significant deviation from one or more historically observed baselines.
  • the one or more metrics may be derived in any suitable manner. For instance, in some embodiments, a metric may pertain to all transactions conducted over a web site or group of web sites, or may be specific to a certain anchor value such as a certain IP address or a certain user account. Additionally, or alternatively, a metric may be per SKU or group of SKUs.
  • an electronics retailer may sell a particular model of television for $1,200.00.
  • Historical sales data may indicate one or more of the following:
  • a system may be provided that is programmed to use historical data (e.g., one or more of the observations noted above) as a baseline to intelligently detect notable deviations. For instance, with reference to the television example described above, if the retailer's stock keeping system incorrectly priced the $1,200.00 model of television at $120.00, one or more of the following may be observed:
  • alerts may be triggered based on observations such as those described above.
  • any one of a designated set of observations may trigger an alert.
  • a threshold number of observations (e.g., two, three, etc.) may be required to trigger an alert.
  • one or more specific combinations of observations may trigger an alert.
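  • The alert policies described above (any one of a designated set of observations, a threshold number of observations, or a specific combination of observations) may be sketched as follows. The function name, parameter names, and observation labels are hypothetical and are introduced only for this sketch.

```python
from typing import Iterable, Optional, Set

def should_alert(observed: Iterable[str],
                 designated: Iterable[str] = (),
                 min_count: Optional[int] = None,
                 combinations: Iterable[Iterable[str]] = ()) -> bool:
    """Return True if any configured alert policy is satisfied."""
    seen: Set[str] = set(observed)
    # Policy 1: any one of a designated set of observations.
    if seen & set(designated):
        return True
    # Policy 2: at least a threshold number of observations.
    if min_count is not None and len(seen) >= min_count:
        return True
    # Policy 3: every observation in some specific combination.
    return any(set(combo) <= seen for combo in combinations)

# Hypothetical observation labels.
seen_now = {"sku_share_spike", "multi_unit_orders"}
by_count = should_alert(seen_now, min_count=2)
by_combo = should_alert(seen_now, combinations=[{"sku_share_spike", "new_accounts"}])
```

  • In this sketch, the two observations satisfy a threshold of two, but do not satisfy the configured combination because one of its members has not been observed.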
  • when an alert is raised, a retailer may be notified in real time. In this manner, the retailer may be able to investigate and correct one or more errors that led to the anomalous sales activities, before significant damage is done to the retailer's business.
  • the techniques described herein may be used in other scenarios as well.
  • one or more of the techniques described herein may be used to detect abuse of sale prices, new customer loss-leader deals, programming errors relating to certain coupon codes, resellers buying out stock, etc. Any of these and/or other anomalies may be detected from a population of transactions.
  • an online behavior scoring system may calculate a risk score for an anchor value, where the anchor value may be associated with an entity such as a human user or a bot.
  • the risk score may indicate a perceived likelihood that the associated entity is malicious (e.g., being part of an attack).
  • Risk scores may be calculated using any suitable combination of one or more techniques, including, but not limited to:
  • techniques are provided for monitoring online behavior in a manner that is transparent to entities being monitored.
  • one or more security probes may be deployed dynamically to obtain information regarding an entity. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the entity is sufficiently high to warrant an investigation (e.g., when recent activities of an entity represent a significant deviation from an activity pattern observed in the past for that entity).
  • the inventors have recognized and appreciated that by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.
  • FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments.
  • the process 1700 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) to determine if and when to deploy one or more security probes.
  • the security system may receive data regarding a digital interaction. For instance, as discussed in connection with FIG. 1B , the security system may receive log files comprising data recorded from digital interactions. The security system may process the received data and store salient information into an appropriate data structure, such as the illustrative data structure 220 shown in FIG. 2B . The stored information may be used, at act 1710 , to determine if the digital interaction is suspicious.
  • any suitable technique may be used to determine if the digital interaction is suspicious.
  • the illustrative process 1400 shown in FIG. 14 may be used to determine if the digital interaction matches a fuzzy profile that stores anomalous attributes. If a resulting penalty score is below a selected threshold, the security system may proceed to act 1715 to perform standard operation. Otherwise, the security system may proceed to act 1720 to deploy a security probe, and data collected by the security probe from the digital interaction may be analyzed at act 1725 to determine if further action is appropriate.
  • the penalty score threshold may be chosen in any suitable manner. For instance, the inventors have recognized and appreciated that, while it may be desirable to collect more data from digital interactions, the security system may have limited resources such as network bandwidth and processing power. Therefore, to conserve resources, security probes should be deployed judiciously. Moreover, the inventors have recognized and appreciated that frequent deployment of probes may allow an attacker to study the probes and learn how to evade detection. Accordingly, in some embodiments, a penalty score threshold may be selected to provide a desired tradeoff.
  • aspects of the present disclosure are not limited to the use of a fuzzy profile to determine if and when to deploy a security probe.
  • a profile associated with an anchor value observed from the digital interaction may be used to determine if the digital interaction is sufficiently similar to prior digital interactions from which the anchor value was observed. If it is determined that the digital interaction is not sufficiently similar to prior digital interactions from which the anchor value was observed, one or more security probes may be deployed to gather additional information from the digital interaction.
  • a security system may be configured to segment traffic over one or more dimensions, including, but not limited to, IP Address, XFF IP Address, C-Class IP Address, Input Signature, Account ID, Device ID, User Agent, etc.
  • each digital interaction may be associated with one or more anchor values, where each anchor value may correspond to a dimension for segmentation.
  • This may allow the security system to create segmented lists.
  • a segmented list may be created that includes all traffic reporting Chrome as the user agent.
  • a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent. In this manner, segmented lists may be created at any suitable granularity.
  • a segmented list may include all traffic reporting Mac OS X 10.9.2 as the operating system. Additionally, or alternatively, a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent and Mac OS X 10.9.2 as the operating system. In this manner, segmented lists may be created with any suitable combination of one or more anchor values.
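  • Segmentation of traffic at different granularities, and over combinations of anchor values, may be sketched as follows. The function name, the dictionary keys, and the sample interactions are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interaction = Dict[str, str]

def segment(interactions: List[Interaction],
            dimensions: Sequence[str]) -> Dict[Tuple, List[Interaction]]:
    """Group interactions into segmented lists keyed by their values
    along the selected dimensions."""
    segments: Dict[Tuple, List[Interaction]] = defaultdict(list)
    for interaction in interactions:
        key = tuple(interaction.get(dim) for dim in dimensions)
        segments[key].append(interaction)
    return dict(segments)

# Hypothetical traffic sample.
traffic = [
    {"user_agent": "Chrome", "os": "Mac OS X 10.9.2"},
    {"user_agent": "Chrome", "os": "Windows"},
    {"user_agent": "Firefox", "os": "Mac OS X 10.9.2"},
]
by_user_agent = segment(traffic, ["user_agent"])
by_user_agent_and_os = segment(traffic, ["user_agent", "os"])
```

  • Passing a longer list of dimensions yields finer-grained segmented lists, while a single dimension yields coarser ones.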
  • one or more metrics may be collected and stored for a segmented list.
  • a segmented list e.g., all traffic associated with a particular IP address or block of IP addresses
  • one or more metrics collected for that segmented list may be stored in association with the segment identifier.
  • metrics that may be collected include, but are not limited to, average risk score, minimum risk score, maximum risk score, number of accesses within some window of time (e.g., the last 5 minutes, 10 minutes, 15 minutes, 30 minutes, 45 minutes, hour, 2 hours, 3 hours, 6 hours, 12 hours, 24 hours, day, 2 days, 3 days, 7 days, two weeks, etc.), geographic data, etc.
  • a security system may use one or more metrics stored for a segmented list to determine whether a security probe should be deployed. For example, a security probe may be deployed when one or more metrics exceed corresponding thresholds. The security system may select one or more probes based on a number of different factors, such as which one or more metrics have exceeded the corresponding thresholds, by how much the one or more metrics have exceeded the corresponding thresholds, and/or which segmented list is implicated.
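  • A hedged sketch of using metrics stored for a segmented list to decide whether, and which, probes to deploy is given below. The metric names, threshold values, probe names, and the mapping from metrics to probes are hypothetical.

```python
from statistics import mean
from typing import Dict, List

def segment_metrics(risk_scores: List[float], access_count: int) -> Dict[str, float]:
    """Collect a few of the metrics that may be stored for a segmented list."""
    return {
        "avg_risk": mean(risk_scores),
        "min_risk": min(risk_scores),
        "max_risk": max(risk_scores),
        "accesses": access_count,
    }

def exceeded(metrics: Dict[str, float], thresholds: Dict[str, float]) -> Dict[str, float]:
    """Map each metric that exceeds its threshold to the amount of excess."""
    return {name: metrics[name] - limit
            for name, limit in thresholds.items()
            if name in metrics and metrics[name] > limit}

# Hypothetical mapping from an exceeded metric to a probe to deploy.
PROBES_BY_METRIC = {"max_risk": "probe_a", "accesses": "probe_b"}

def select_probes(excess: Dict[str, float]) -> List[str]:
    """Choose probes based on which metrics exceeded their thresholds."""
    return sorted(PROBES_BY_METRIC[name] for name in excess
                  if name in PROBES_BY_METRIC)

metrics = segment_metrics([10, 20, 90], access_count=500)
excess = exceeded(metrics, {"avg_risk": 50, "max_risk": 80, "accesses": 300})
probes = select_probes(excess)
```

  • In this sketch the average risk score stays below its threshold, while the maximum risk score and the access count both exceed theirs, so both hypothetical probes are selected. The amount of excess is retained so that a fuller implementation could also take it into account.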
  • Thresholds for metrics may be determined in any suitable manner. For instance, in some embodiments, one or more human analysts may examine historical data (e.g., general population data, data relating to traffic that turned out to be associated with an attack, data relating to traffic that was not identified as being associated with an attack, etc.), and may select the thresholds based on the historical data (e.g., to achieve a desired tradeoff between false positive errors and false negative errors). Additionally, or alternatively, one or more techniques described below in connection with threshold-type sensors may be used to select thresholds automatically.
  • client-side checks to collect information.
  • such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.).
  • an attacker may be able to adapt and evade detection.
  • techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce a number of probes deployed to, for example, one in hundreds of thousands of interactions), client-side functionality obfuscation may reduce a likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.
  • FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments.
  • one or more handlers may be programmed to read from a segmented list (e.g., by reading one or more metrics associated with the segmented list) and determine whether and/or how a probe should be deployed.
  • handlers include, but are not limited to, an initialization handler programmed to handle initialization requests and return HTML code, and/or an Ajax (asynchronous JavaScript and XML) handler programmed to respond to Ajax requests.
  • one or more handlers may be programmed to write to a segmented list (e.g., by updating one or more metrics associated with the segmented list, such as average, minimum, and/or maximum risk scores).
  • a score handler programmed to calculate risk scores may be programmed to write to a segmented list (e.g., by updating one or more metrics associated with the segmented list, such as average, minimum, and/or maximum risk scores).
  • aspects of the present disclosure are not limited to the use of handlers, as other types of programs may also be used to implement any of the functionalities described herein.
  • a write to a segmented list may trigger one or more reads from the segmented list. For example, whenever a score handler updates a risk score metric, a cycle may be started and an initialization handler and/or Ajax handler may read one or more segmented lists affected by the update. In this manner, whenever a new event takes place that affects a metric, a fresh determination may be made as to whether to deploy one or more probes.
  • aspects of the present disclosure are not limited to the implementation of such cycles, as in some embodiments a segmented list may be read periodically, regardless of observations from new events.
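  • The cycle in which a write to a segmented list triggers one or more reads may be sketched with a simple callback arrangement, as shown below. The class name, method names, metric name, and threshold are assumptions made for this sketch, and the reader function merely stands in for an initialization handler or Ajax handler.

```python
from typing import Callable, Dict, List

class SegmentedList:
    """Holds metrics for one segment and notifies readers on every write."""

    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self.metrics: Dict[str, float] = {}
        self._readers: List[Callable[["SegmentedList"], None]] = []

    def subscribe(self, reader: Callable[["SegmentedList"], None]) -> None:
        """Register a handler to be invoked after each write."""
        self._readers.append(reader)

    def write(self, name: str, value: float) -> None:
        """Update a metric, then start a cycle of reads."""
        self.metrics[name] = value
        for reader in self._readers:
            reader(self)

decisions: List[bool] = []

def probe_decision_reader(segment: SegmentedList) -> None:
    """Stand-in for a handler that decides whether to deploy a probe."""
    decisions.append(segment.metrics.get("avg_risk", 0.0) > 50.0)

segment = SegmentedList("ip:203.0.113.0/24")
segment.subscribe(probe_decision_reader)
segment.write("avg_risk", 30.0)  # a score handler updates the metric
segment.write("avg_risk", 75.0)  # a later update crosses the assumed threshold
```

  • Each write leads to a fresh determination; in this sketch the first update does not call for a probe and the second does.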
  • a probe may be deployed to one or more selected interactions only, as opposed to all interactions in a segmented list.
  • a probe may be deployed only to one or more suspected members in a segmented list (e.g., a member for which one or more measurements are at or above certain alert levels).
  • the result may be stored in association with the member and/or the segmented list, and the probe may not be sent again.
  • a probe may be deployed only a limited number of times, which may make it difficult for an attacker to detect what information the probe is collecting, or even the fact that a probe has been deployed.
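  • Limiting how often a probe is deployed to a member of a segmented list may be sketched as follows. The class name, method names, and the default limit of one deployment are assumptions made for this sketch.

```python
from typing import Any, Dict, Tuple

class ProbeTracker:
    """Tracks probe deployments and results per (member, probe) pair."""

    def __init__(self, max_deployments: int = 1) -> None:
        self.max_deployments = max_deployments
        self._counts: Dict[Tuple[str, str], int] = {}
        self._results: Dict[Tuple[str, str], Any] = {}

    def should_deploy(self, member: str, probe: str, suspected: bool) -> bool:
        """Deploy only to suspected members, never after a result has been
        stored, and never more than `max_deployments` times."""
        key = (member, probe)
        if not suspected or key in self._results:
            return False
        return self._counts.get(key, 0) < self.max_deployments

    def mark_deployed(self, member: str, probe: str) -> None:
        key = (member, probe)
        self._counts[key] = self._counts.get(key, 0) + 1

    def record_result(self, member: str, probe: str, result: Any) -> None:
        """Store the probe result in association with the member."""
        self._results[(member, probe)] = result

tracker = ProbeTracker()
first = tracker.should_deploy("device-1", "probe_a", suspected=True)
tracker.mark_deployed("device-1", "probe_a")
tracker.record_result("device-1", "probe_a", {"automation": True})
second = tracker.should_deploy("device-1", "probe_a", suspected=True)
```

  • After the first deployment and its stored result, the same probe is not sent to the same member again in this sketch.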
  • a probe may be deployed to every interaction, or one or more probes may be deployed in a targeted fashion, while one or more other probes may be deployed to every interaction.
  • a probe may use markup (e.g., image tag) already present on a web page to perform one or more functions. For example, any markup that requires a user agent to perform a computational action may be used as a probe. Additionally, or alternatively, a probe may include additional markup, JavaScript, and/or Ajax calls to a server. Some non-limiting examples of probes are described below.
  • a probe may be deployed and one or more results of the probe may be logged (e.g., in association with a segment identifier and/or alongside one or more metrics associated with the segment identifier). Such a result may be used to determine if a subsequent probe is to be deployed. Additionally, or alternatively, such a result may be used to facilitate scoring and/or classifying future digital interactions.
  • a same form of input pattern may be observed several times in a short window of time, which may represent an anomalously high rate. Additionally, it may be observed that a same user agent is involved in all or a significant portion of the digital interactions exhibiting the suspicious input pattern. This may indicate a potential high volume automated attack, and may cause one or more probes to be deployed to obtain more information about a potential automation method.
  • multiple security probes may be deployed, where each probe may be designed to discover different information.
  • information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next.
  • the security system may be able to gain an in-depth understanding into network traffic (e.g., website and/or application traffic).
  • the security system may be able to classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate. These results may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such results may be more meaningful than just a numeric score. For instance, if multiple probe results indicate a digital interaction is legitimate, a determination may be made that an initial identification of the digital interaction as being suspicious may be a false positive identification.
  • FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments.
  • the process 1900 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B ) to determine if and when to deploy one or more security probes.
  • Acts 1905 , 1910 , 1915 , and 1920 of the process 1900 may be similar to acts 1705 , 1710 , 1715 , and 1720 of the process 1700 , respectively.
  • the security system may analyze data collected by a probe of a first type (e.g., Probe 1 ) deployed at act 1720 to determine what type of probe to further deploy to the digital interaction. For example, if a result of Probe 1 is positive (e.g., a suspicious pattern is identified), a probe of a second type (e.g., Probe 2 ) may be deployed at act 1930 to further investigate the digital interaction.
  • the security system may analyze data collected by Probe 2 to determine what, if any, action may be appropriate.
  • a probe of a third type (e.g., Probe 3 ) may be deployed at act 1935 to further investigate the digital interaction.
  • the security system may analyze data collected by Probe 3 to determine what, if any, action may be appropriate.
  • a first probe may be deployed to verify if the client is running JavaScript.
  • This probe may include a JavaScript snippet, and may be deployed only in one or a small number of suspicious interactions, to make it more difficult for an attacker to detect the probe.
  • the security system may determine that an attacker may be employing some type of GUI macro, and a subsequent probe may be sent to confirm this hypothesis (e.g., by altering a layout of a form).
  • the security system may determine that an attacker may be employing some type of CLI script, and a subsequent probe may be sent to further discover one or more script capabilities and/or methods used to spoof form input. This decision-making pattern may be repeated until all desired information has been collected about the potential attack.
  • FIG. 20 shows an example of a decision tree that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.
  • some or all JavaScript code may be obfuscated before being sent to a client.
  • one or more obfuscation techniques may be used to hide logic for one or more probes. Examples of such techniques include, but are not limited to, symbol renaming and/or re-ordering, code minimization, logic shuffling, and fabrication of meaningless logic (e.g., additional decision and control statements that are not required for the probe to function as intended).
  • threshold-type sensors to trigger actions. For instance, a sensor may be set up to monitor one or more attributes of an entity and raise an alert when a value of an attribute falls above or below an expected threshold. Similarly, an expected range may be used, and an alert may be raised when the value of the attribute falls outside the expected range.
  • the threshold or range may be determined manually by one or more data scientists, for example, by analyzing a historical data set to identify a set of acceptable values and setting the threshold or range based on the acceptable values.
  • a security system is provided that is programmed to monitor one or more digital interactions and tune a sensor based on data collected from the digital interactions.
  • Such monitoring and tuning may be performed with or without human involvement.
  • the monitoring and tuning may be performed in real time, which may allow the security system to react to an attack as soon as the attack is suspected, rather than waiting for data to be accumulated and analyzed over several weeks. In this manner, one or more actions may be taken while the attack is still on-going to stop the attack and/or control damages.
  • real time tuning is not required, as data may alternatively, or additionally, be accumulated and analyzed after the attack.
  • a security system may be configured to use one or more sensors to collect data from one or more digital interactions.
  • the system may analyze the collected data to identify a baseline of expected behavior, and then use the identified baseline to tune the one or more sensors, thereby providing a feedback loop.
  • the system may accumulate the data collected by the one or more sensors over time and use the accumulated data to build a model of baseline behavior.
  • data collected by one or more sensors may be segmented.
  • segmentation may allow a security system to deal with large amounts of data more efficiently.
  • the security system may group observed entities and/or digital interactions into buckets based on certain shared characteristics.
  • each entity or digital interaction may be associated with one of several buckets based on a typing speed detected for the entity or digital interaction.
  • the buckets may be chosen in any suitable manner. For instance, more buckets may be used when finer-grained distinctions are desirable.
  • an entity or digital interaction may be associated with one of four different buckets based on typing speed: 0-30 words per minute, 31-60 words per minute, 61-90 words per minute, and 90+ words per minute.
  • segmentation may be performed on any type of measurements, including, but not limited to, typing speed, geo-location, user agent, and/or device ID.
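As a minimal sketch of the segmentation described above, the following Python snippet assigns each observed typing speed to one of the four buckets and counts the observations per bucket. The function names and bucket labels are hypothetical. Treating exactly 90 words per minute as part of the 61-90 bucket is an assumption, since the ranges as stated overlap at 90.

```python
from collections import Counter


def typing_speed_bucket(wpm: float) -> str:
    """Map a typing speed (words per minute) to one of four bucket labels."""
    if wpm <= 30:
        return "0-30"
    if wpm <= 60:
        return "31-60"
    if wpm <= 90:  # assumption: exactly 90 wpm belongs to the 61-90 bucket
        return "61-90"
    return "90+"


def segment(observations):
    """Count how many observations fall into each typing-speed bucket."""
    return Counter(typing_speed_bucket(w) for w in observations)


counts = segment([12, 45, 58, 75, 140, 33])
print(dict(counts))
```

Using more, narrower ranges in `typing_speed_bucket` would give the finer-grained distinctions mentioned above, at the cost of more buckets to track.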
  • data collected by one or more sensors may be quantized to reduce the number of possible values for a particular attribute, which may allow a security system to analyze the data more efficiently.
  • quantization may be performed using a hash-modding process, which may involve hashing an input value and performing a modulo operation on the resulting hash value.
  • a hashing technique may be used that produces a same hash value every time given a same input value, and the hash value may be such that it is difficult to reconstruct the input from the hash value alone.
  • a hash function may allow comparison of attribute values without exposing actual data.
  • a security system may hash a credit card number to produce an alphanumeric string such as the following:
  • the same credit card number may produce the same hash value.
  • the hash function may be selected such that no two inputs are mapped to the same hash value, or the number of such pairs is small.
  • the likelihood of two different credit card numbers producing the same hash value may be low, and the security system may be able to verify if a newly submitted credit card number is the same as a previously submitted credit card number by simply computing a hash value of the newly submitted credit card number and comparing the computed hash value against a stored hash value of the previously submitted credit card number, without having to store the previously submitted credit card number.
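The comparison described above might be sketched as follows. This is an illustration only: SHA-256 is picked as one possible hash function, and the `card_hash` function and `SeenCards` class are hypothetical names, not part of the disclosure. Only hash values are stored, never the card numbers themselves.

```python
import hashlib


def card_hash(card_number: str) -> str:
    """Return a hex digest of a card number (SHA-256 chosen for illustration)."""
    # Strip common separators so the same number always hashes identically.
    normalized = card_number.replace(" ", "").replace("-", "")
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


class SeenCards:
    """Remembers hashes of submitted card numbers, not the numbers themselves."""

    def __init__(self):
        self._hashes = set()

    def seen_before(self, card_number: str) -> bool:
        """Report whether this number was submitted earlier, then record it."""
        h = card_hash(card_number)
        if h in self._hashes:
            return True
        self._hashes.add(h)
        return False


cards = SeenCards()
print(cards.seen_before("4111 1111 1111 1111"))  # first submission
print(cards.seen_before("4111-1111-1111-1111"))  # same number, different formatting
```

Because card numbers come from a small input space, a deployed system would likely prefer a keyed or salted hash over the plain digest shown here; that choice is outside what the text specifies.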
  • a hash function may be used to convert input data (including non-numerical input data) into numerical values, while preserving a distribution of the input data.
  • a distribution of output hash values may approximate the distribution of the input data.
  • a modulo operation (e.g., mod M, where M is a large number) may be applied to a numerical value resulting from hashing or otherwise converting an input value. This may reduce a number of possible output values (e.g., to M, if the modulo operation is mod M). Some information on the distribution of the input data may be lost, as multiple input values may be mapped to the same number under the modulo operation. However, the inventors have recognized and appreciated that sufficient information may be retained for purposes of detecting anomalies.
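A minimal sketch of hash-modding, assuming SHA-256 as the hash function and a hypothetical helper name `hash_mod`: the input is hashed, the digest is interpreted as an integer, and the result is reduced modulo M so that at most M distinct output values remain.

```python
import hashlib


def hash_mod(value: str, m: int = 1000) -> int:
    """Hash an input value, then reduce the hash modulo m (0 <= result < m)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % m


# The same input always lands in the same bucket; different inputs may
# occasionally share a bucket, which is the information loss noted above.
bucket = hash_mod("22.231.113.64")
print(bucket == hash_mod("22.231.113.64"))
```

A larger M keeps more of the input distribution at the cost of more buckets to store and compare.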
  • a hash-modding process may be applied in analyzing network addresses.
  • the addresses may be physical addresses and/or logical addresses, as aspects of the present disclosure are not limited to the use of hash-modding to analyze any particular type of input data.
  • IPv4 Internet Protocol version 4
  • IPv6 Internet Protocol version 6
  • comparing such addresses against each other may require a significant amount of time and/or processing power. Therefore, it may be beneficial to reduce the length of each piece of data to be compared, while preserving the salient information contained in the data.
  • IP addresses may be observed.
  • a security system may only compare portions of the hash values. For instance, a security system may extract one or more digits from each hash value, such as one or more least significant digits (e.g., one, two, three, four, five, six, seven, eight, nine, ten, etc.), and compare the extracted digits. In the above example, two least significant digits may be extracted from each hash value, resulting in the values 93 and 41, respectively. It may be more efficient to compare 93 against 41, as opposed to comparing 22.231.113.64 against 194.66.82.11.
  • the extraction of one or more least significant digits may be equivalent to a modulo operation. For example, extracting one least significant hexadecimal digit may be equivalent to mod 16, extracting two least significant hexadecimal digits may be equivalent to mod 256, etc.
  • numbers in a base other than 16 (e.g., base 2) may be used instead of, or in addition to, base-16 numbers.
  • the inventors have recognized and appreciated that, if the extracted digits for two input IP addresses are different, the security system may infer, with 100% confidence, that the two input IP addresses are different. Thus, hash-modding may provide an efficient way to confirm that two input IP addresses are different.
  • the inventors have further recognized and appreciated that, if the extracted digits for two input IP addresses are the same, the security system may infer, with some level of confidence, that the two input IP addresses are the same.
  • a level of confidence that two input IP addresses are the same may be increased by extracting and comparing more digits. For instance, in response to determining that the extracted digits for two input IP addresses are the same, two more digits may be extracted from each input IP address and compared. This may be repeated until a suitable stopping condition is reached, for example, if the newly extracted digits are different, or some threshold number of digits have been extracted. The threshold number may be selected to provide a desired level of confidence that the two input IP addresses are the same. In this manner, additional processing to extract and compare more digits may be performed only if the processing that has been done does not yield a definitive answer. This may provide improved efficiency.
  • aspects of the present disclosure are not limited to extracting and comparing digits in two-digit increments, as in some embodiments extraction and comparison may be performed in one-digit increments, three-digit increments, four-digit increments, etc., or in some non-uniform manner. Furthermore, in some embodiments, all digits may be extracted and compared at once, with no incremental processing.
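The incremental comparison can be sketched as below, under stated assumptions: hash values are hexadecimal strings (SHA-256 here), digits are taken from the least significant end, and the names `digits_match` and `probably_same`, the step size and the digit threshold are all hypothetical choices.

```python
import hashlib


def digits_match(ha: str, hb: str, step: int = 2, max_digits: int = 8) -> bool:
    """Compare least significant digits in `step`-digit increments.

    Returns False at the first difference (the inputs are certainly different)
    and True once `max_digits` digits have matched (probably the same).
    """
    limit = min(max_digits, len(ha), len(hb))
    compared = 0
    while compared < limit:
        take = min(step, limit - compared)
        end_a, end_b = len(ha) - compared, len(hb) - compared
        if ha[end_a - take:end_a] != hb[end_b - take:end_b]:
            return False  # definitive answer, stop early
        compared += take
    return True


def probably_same(addr_a: str, addr_b: str, **kwargs) -> bool:
    """Hash two addresses and compare their digests incrementally."""
    ha = hashlib.sha256(addr_a.encode("utf-8")).hexdigest()
    hb = hashlib.sha256(addr_b.encode("utf-8")).hexdigest()
    return digits_match(ha, hb, **kwargs)


print(digits_match("ab12", "cd12", step=2, max_digits=2))  # True: last two digits agree
print(digits_match("ab12", "cd12", step=2, max_digits=4))  # False: next two digits differ
```

Raising `max_digits` corresponds to the higher threshold number of digits discussed above and therefore to a higher level of confidence in a "same" answer.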
  • IP addresses may cluster around certain points. For instance, a collection of IP addresses may share a certain prefix.
  • An example of clustered addresses is shown below:
  • the inventors have further recognized and appreciated that, by hashing IP addresses, the observations may be spread more evenly across a number line. For example, the following three addresses may be spread out after hashing, even though they share nine out of eleven digits.
  • the following two addresses may be hashed to the same value because they are identical, and that hash value may be spaced apart from the hash values for the above three addresses.
  • IP addresses may be hashed into a larger space, for example, to spread out the addresses more evenly, and/or to decrease the likelihood of collisions.
  • a 32-bit IPv4 address may be hashed into a 192-bit value, and likewise for a 128-bit IPv6 address.
  • any suitable hash function may be used, including, but not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.
  • hash-modding may be used to analyze any suitable type of input data, in addition to, or instead of, IP addresses.
  • hash-modding may provide a variable resolution with variable accuracy, which may allow storage requirement and/or efficiency to be managed. For instance, in some embodiments, a higher resolution (e.g., extracting and comparing more digits) may provide more certainty about an observed behavior, but even a lower resolution may provide sufficient information to label the observed behavior.
  • a security system may be able to differentiate, with a reasonable level of certainty, whether a user is typing the same password 10 times, or trying 10 different passwords, because the likelihood of 10 randomly chosen passwords all having the same last 10 bits after hash-modding may be sufficiently low.
  • a hash function may be used advantageously to anonymize input data; however, one or more other functions (e.g., a one-to-one function with numerical output values) may be used in addition to, or instead of, a hash function.
  • a modulo operation may be performed directly on an input, without first hashing the input (e.g., where the input is already a numerical value).
  • aspects of the present disclosure are not limited to the use of a modulo operation.
  • One or more other techniques for dividing numerical values into buckets may be used instead of, or in addition to, a modulo operation.
  • a security system may create a feedback loop to gain greater insight into historical trends. For example, the system may adapt a baseline for expected behavior and/or anomalous behavior (e.g., thresholds for expected and/or anomalous values) based on current population data and/or historical data. Thus, a feedback loop may allow the system to “teach” itself what an anomaly is by analyzing historical data.
  • a system may determine from historical data that a particular user agent is associated with a higher risk for fraud, and that the user agent makes up only a small percentage (e.g., 1%) of total traffic. If the system detects a dramatic increase in the percentage of traffic involving that user agent in a real-time data stream, the system may determine that a large-scale fraud attack is taking place. The system may continually update an expected percentage of traffic involving the user agent based on what the system observes over time. This may help to avoid false positives (e.g., resulting from the user agent becoming more common among legitimate digital interactions) and/or false negatives (e.g., resulting from the user agent becoming less common among legitimate digital interactions).
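One way to keep an expected traffic share up to date is an exponential moving average, sketched below. The text does not prescribe an update rule, so the moving average, the `ShareBaseline` class, the smoothing factor `alpha` and the alert multiplier `factor` are all assumptions made for illustration.

```python
class ShareBaseline:
    """Tracks the expected share of traffic for one user agent."""

    def __init__(self, expected: float, alpha: float = 0.1, factor: float = 5.0):
        self.expected = expected  # e.g., 0.01 for 1% of total traffic
        self.alpha = alpha        # weight given to each new observation
        self.factor = factor      # how far above baseline counts as a spike

    def observe(self, share: float) -> bool:
        """Return True if `share` is a dramatic increase over the baseline.

        Non-anomalous observations are folded into the baseline, so a gradual,
        legitimate drift in popularity does not keep raising alerts.
        """
        anomalous = share > self.factor * self.expected
        if not anomalous:
            self.expected = (1 - self.alpha) * self.expected + self.alpha * share
        return anomalous


baseline = ShareBaseline(expected=0.01)
print(baseline.observe(0.012))  # small drift, baseline adapts
print(baseline.observe(0.20))   # sudden jump to 20% of traffic
```

Leaving the baseline unchanged during a suspected attack is a design choice in this sketch; it prevents the attack traffic itself from redefining what counts as normal.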
  • the system may determine from historical data that a vast majority of legitimate digital interactions have a recorded typing speed between 30 and 80 words per minute. If the system detects that a large number of present digital interactions have an improbably high typing speed, the system may determine that a large-scale fraud attack is taking place.
  • the system may continually update an expected range of typing speed based on what the system observes over time. For example, at any given point in time, the expected range may be determined as a range that is centered at an average (e.g., mean, median, or mode) and just large enough to capture a certain percentage of all observations (e.g., 95%, 98%, 99%, etc.). Other techniques for determining an expected range may also be used, as aspects of the present disclosure are not limited to any particular manner of implementation.
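As a sketch of one way to compute such a range, the snippet below centers the range at the median and widens it symmetrically until at least the requested fraction of observations is covered. The function name `expected_range` and the choice of the median as the center are assumptions; the text allows the mean or mode as well.

```python
import math
import statistics


def expected_range(values, coverage: float = 0.95):
    """Return (low, high), centered at the median, covering >= `coverage` of values."""
    center = statistics.median(values)
    deviations = sorted(abs(v - center) for v in values)
    # Smallest half-width that captures at least `coverage` of the observations.
    k = max(1, math.ceil(coverage * len(deviations)))
    half_width = deviations[k - 1]
    return center - half_width, center + half_width


speeds = list(range(1, 101))  # illustrative typing speeds, 1..100 wpm
low, high = expected_range(speeds, coverage=0.95)
print(low, high)
```

Recomputing the range over a sliding window of recent observations would let it follow the kind of gradual change described above.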
  • a historical baseline may change for any number of legitimate reasons. For instance, the release of a new browser version may change the distribution of user agents. Likewise, a shift in site demographics or username/password requirements may change the mean typing speed.
  • the system may be able to redraw the historical baseline to reflect any “new normal.” In this manner, the system may be able to adapt itself automatically and with greater accuracy and speed than a human analyst.
  • FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.
  • the computer 5000 includes a processing unit 5001 having one or more processors and a non-transitory computer-readable storage medium 5002 that may include, for example, volatile and/or non-volatile memory.
  • the memory 5002 may store one or more instructions to program the processing unit 5001 to perform any of the functions described herein.
  • the computer 5000 may also include other types of non-transitory computer-readable medium, such as storage 5005 (e.g., one or more disk drives) in addition to the system memory 5002 .
  • the storage 5005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 5002 .
  • the computer 5000 may have one or more input devices and/or output devices, such as devices 5006 and 5007 illustrated in FIG. 21 . These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 5007 may include a microphone for capturing audio signals, and the output devices 5006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
  • the computer 5000 may also comprise one or more network interfaces (e.g., the network interface 5010 ) to enable communication via various networks (e.g., the network 5020 ).
  • networks include a local area network or a wide area network, such as an enterprise network or the Internet.
  • Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • the above-described embodiments of the present disclosure can be implemented in any of numerous ways.
  • the embodiments may be implemented using hardware, software or a combination thereof.
  • the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
  • the concepts disclosed herein may be embodied as a non-transitory computer-readable medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory, tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above.
  • the computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.
  • the terms “program” or “software” are used herein to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present disclosure as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present disclosure.
  • Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • functionality of the program modules may be combined or distributed as desired in various embodiments.
  • data structures may be stored in computer-readable media in any suitable form.
  • data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationship between the fields.
  • any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
  • the concepts disclosed herein may be embodied as a method, of which an example has been provided.
  • the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

Abstract

Systems and methods for detecting and scoring anomalies. In some embodiments, a method is provided, comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).

Description

    RELATED APPLICATION
  • This application claims the benefit under 35 U.S.C. §119 of U.S. Provisional Patent Application No. 62/214,969, filed on Sep. 5, 2015, which is hereby incorporated by reference in its entirety.
  • This application is filed on the same day as application Ser. No. ______, entitled “SYSTEMS AND METHODS FOR MATCHING AND SCORING SAMENESS,” bearing Attorney Docket No. L0702.70006US00, and application Ser. No. ______, entitled “SYSTEMS AND METHODS FOR DETECTING AND PREVENTING SPOOFING,” bearing Attorney Docket No. L0702.70003US01. Each of these applications is hereby incorporated by reference in its entirety.
  • BACKGROUND
  • A large organization with an online presence often receives tens of thousands of requests per minute to initiate digital interactions. A security system supporting multiple large organizations may handle millions of digital interactions at the same time, and the total number of digital interactions analyzed by the security system each week may easily exceed one billion.
  • As organizations increasingly demand real time results, a security system has to analyze a large amount of data and accurately determine whether a digital interaction is legitimate, all within fractions of a second. This presents tremendous technical challenges, especially given the large overall volume of digital interactions handled by the security system.
  • SUMMARY
  • In accordance with some embodiments, a computer-implemented method is provided for analyzing a plurality of digital interactions, the method comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
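Acts (A) through (E) might be sketched as follows. The summary does not say how the count is compared against historical information, so the comparison used here (the bucket's share of traffic against the historical mean, with a tolerance of a few standard deviations) is an assumption, as are the function names and the toy bucketing rule.

```python
import statistics
from collections import Counter


def bucket_counts(values, bucket_fn):
    """Acts (B) and (C): divide values into buckets and count values per bucket."""
    return Counter(bucket_fn(v) for v in values)


def is_anomalous(count, total, historical_fractions, num_std: float = 3.0) -> bool:
    """Acts (D) and (E): compare a bucket's share against its historical shares."""
    observed = count / total
    mean = statistics.mean(historical_fractions)
    std = statistics.pstdev(historical_fractions)
    return abs(observed - mean) > num_std * std


# Act (A): attribute values, one per digital interaction (illustrative data).
values = [1, 5, 9, 13, 2, 3]
counts = bucket_counts(values, bucket_fn=lambda v: v % 4)  # toy bucketing rule

# Hypothetical historical shares previously observed for bucket 1.
history = [0.24, 0.25, 0.26, 0.25]
print(is_anomalous(counts[1], len(values), history))
```

In this toy data, four of the six values fall into bucket 1, far above its historical share of about 25%, so the attribute would be flagged as anomalous.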
  • In accordance with some embodiments, a computer-implemented method is provided for analyzing a digital interaction, the method comprising acts of: identifying a plurality of attributes from a profile; for each attribute of the plurality of attributes, determining whether the digital interaction matches the profile with respect to the attribute, comprising: identifying, from the profile, at least one bucket of possible values of the attribute, the at least one bucket being indicative of anomalous behavior; identifying, from the digital interaction, a value of the attribute; and determining whether the value identified from the digital interaction falls into the at least one bucket, wherein the digital interaction is determined to match the profile with respect to the attribute if it is determined that the value identified from the digital interaction falls into the at least one bucket; and determining a penalty score based at least in part on a count of attributes with respect to which the digital interaction matches the profile.
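The profile-matching method above can be illustrated with a short sketch. The data layout (a profile mapping each attribute to its set of anomalous buckets), the per-attribute bucketing functions and the fixed penalty per matching attribute are hypothetical; the summary only requires that the penalty depend at least in part on the count of matching attributes.

```python
def count_matches(profile, interaction, bucketizers) -> int:
    """Count attributes whose value falls into an anomalous bucket of the profile."""
    matches = 0
    for attribute, anomalous_buckets in profile.items():
        if attribute not in interaction:
            continue  # nothing observed for this attribute
        bucket = bucketizers[attribute](interaction[attribute])
        if bucket in anomalous_buckets:
            matches += 1
    return matches


def penalty_score(profile, interaction, bucketizers, penalty_per_match: int = 10) -> int:
    """Assumed scoring rule: a fixed penalty for every matching attribute."""
    return penalty_per_match * count_matches(profile, interaction, bucketizers)


# Hypothetical profile: typing faster than 90 wpm, or activity at 2-3 a.m.
profile = {"typing_speed": {"90+"}, "hour_of_day": {2, 3}}
bucketizers = {
    "typing_speed": lambda wpm: "90+" if wpm > 90 else "normal",
    "hour_of_day": lambda hour: hour,
}
interaction = {"typing_speed": 150, "hour_of_day": 14}
print(penalty_score(profile, interaction, bucketizers))
```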
  • In accordance with some embodiments, a computer-implemented method is provided for analyzing a digital interaction, the method comprising acts of: determining whether the digital interaction is suspicious; in response to determining that the digital interaction is suspicious, deploying a security probe of a first type to collect first data from the digital interaction; analyzing first data collected from the digital interaction by the security probe of the first type to determine if the digital interaction continues to appear suspicious; if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction continues to appear suspicious, deploying a security probe of a second type to collect second data from the digital interaction; and if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction no longer appears suspicious, deploying a security probe of a third type to collect third data from the digital interaction.
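The branching in the probe-deployment method above can be sketched as follows. The probe labels and the callback used to stand in for "analyzing first data" are hypothetical placeholders; the sketch shows only the control flow, not how a probe collects data.

```python
from typing import Callable, List


def choose_probes(
    initially_suspicious: bool,
    still_suspicious_after_probe_1: Callable[[], bool],
) -> List[str]:
    """Return the probes deployed for one digital interaction, in order."""
    deployed: List[str] = []
    if not initially_suspicious:
        return deployed  # no probe is deployed
    deployed.append("probe_1")  # first type: collect first data
    if still_suspicious_after_probe_1():
        deployed.append("probe_2")  # second type: interaction still looks suspicious
    else:
        deployed.append("probe_3")  # third type: interaction no longer looks suspicious
    return deployed


print(choose_probes(True, lambda: True))
print(choose_probes(True, lambda: False))
print(choose_probes(False, lambda: True))
```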
  • In accordance with some embodiments, a system is provided, comprising at least one processor and at least one computer-readable storage medium having stored thereon instructions which, when executed, program the at least one processor to perform any of the above methods.
  • In accordance with some embodiments, at least one computer-readable storage medium is provided, having stored thereon instructions which, when executed, program at least one processor to perform any of the above methods.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments.
  • FIG. 1B shows an illustrative security system 14 for processing data collected from digital interactions, in accordance with some embodiments.
  • FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments.
  • FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments.
  • FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments.
  • FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments.
  • FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments.
  • FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments.
  • FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.
  • FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A, in accordance with some embodiments.
  • FIG. 9 shows illustrative time periods 902 and 904, in accordance with some embodiments.
  • FIG. 10 shows an illustrative normalized histogram 1000, in accordance with some embodiments.
  • FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments.
  • FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments.
  • FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments.
  • FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments.
  • FIG. 15 shows an illustrative fuzzy profile 1500, in accordance with some embodiments.
  • FIG. 16 shows an illustrative fuzzy profile 1600, in accordance with some embodiments.
  • FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments.
  • FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments.
  • FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments.
  • FIG. 20 shows an example of a decision tree 2000 that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.
  • FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.
  • DETAILED DESCRIPTION
  • Aspects of the present disclosure relate to systems and methods for detecting and scoring anomalies.
  • In a distributed attack on a web site or application, an attacker may coordinate multiple computers to carry out the attack. For example, the attacker may launch the attack using a “botnet.” In some instances, the botnet may include a network of virus-infected computers that the attacker may control remotely.
  • The inventors have recognized and appreciated various challenges in detecting web attacks. For instance, in a distributed attack, the computers involved may be located throughout the world, and may have different characteristics. As a result, it may be difficult to ascertain which computers are involved in the same attack. Additionally, in an attempt to evade detection, a sophisticated attacker may modify the behavior of each controlled computer slightly so that no consistent behavior profile may be easily discernible across the attack. Accordingly, in some embodiments, anomaly detection techniques are provided with improved effectiveness against an attack in which the participating computers exhibit different behaviors.
  • I. Dynamically Generated Fuzzy Profiles
  • Some security systems use triggers that trip on certain observed behaviors. For example, a trigger may be a pattern comprising an e-commerce user making a high-value order and shipping to a new address, or a new account making several orders with different credit card numbers. When one of these suspicious patterns is detected, an alert may be raised with respect to the user or account, and/or an action may be taken (e.g., suspending the transaction or account). However, the inventors have recognized and appreciated that a trigger-based system may produce false positives (e.g., a trigger tripping on a legitimate event) and/or false negatives (e.g., triggers not tripped during an attack or being tripped too late, when significant damage has been done).
  • Accordingly, in some embodiments, anomaly detection techniques are provided with reduced false positive rate and/or false negative rate. For example, one or more fuzzy profiles may be created. When an observation is made from a digital interaction, a score may be derived for each fuzzy profile, where the score is indicative of an extent to which the observation matches the fuzzy profile. In some embodiments, such scores may be derived in addition to, or instead of, Boolean outputs of triggers as described above, and may provide a more nuanced set of data points for a decision logic that determines what, if any, action is to be taken in response to the observation.
  • The inventors have recognized and appreciated that, although many attacks exhibit known suspicious patterns, it may take time for such patterns to emerge. For instance, an attacker may gain control of multiple computers that are seemingly unrelated (e.g., computers that are associated with different users, different network addresses, different geographic locations, etc.), and may use the compromised computers to carry out an attack simultaneously. As a result, damage may have been done by the time any suspicious pattern is detected.
  • The inventors have recognized and appreciated that a security system may be able to flag potential attacks earlier by looking for anomalies that emerge in real time, rather than suspicious patterns that are defined ahead of time. For instance, in some embodiments, a security system may monitor digital interactions taking place at a particular web site and compare what is currently observed against what was observed previously at the same web site. As one example, the security system may compare a certain statistic (e.g., a count of digital interactions reporting a certain browser type) from a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.) against the same statistic from a past time period (e.g., the same time period a day ago, a week ago, a month ago, a year ago, etc.). If the current value of the statistic deviates significantly from the past value of the statistic (e.g., by more than a selected threshold amount), an anomaly may be reported. In this manner, anomalies may be defined dynamically, based on activity patterns at the particular web site. Such flexibility may reduce false positive and/or false negative errors. Furthermore, the security system may be able to detect attacks that do not exhibit any known suspicious pattern, and such detection may be possible before significant damage has been done.
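  • The comparison described above can be sketched as follows. This is a minimal illustration only: the function name `is_anomalous`, the use of relative deviation, and the default threshold of 0.5 are assumptions made for the example and are not taken from the disclosure.

```python
def is_anomalous(current_count, past_count, threshold=0.5):
    """Return True if the current statistic deviates from the past
    statistic by more than the selected threshold (relative deviation).

    The relative-deviation formula and the 0.5 default are assumptions.
    """
    # Guard against division by zero when nothing was seen in the past period.
    baseline = max(past_count, 1)
    deviation = abs(current_count - past_count) / baseline
    return deviation > threshold


# Hypothetical counts of digital interactions reporting a certain browser type.
spike = is_anomalous(current_count=1800, past_count=1000)   # deviation 0.8
steady = is_anomalous(current_count=1100, past_count=1000)  # deviation 0.1
```

  • In this sketch the threshold is a fixed parameter; a deployment might instead select it per web site or per attribute.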
  • II. Techniques for Efficient Processing and Representation of Data
  • The inventors have recognized and appreciated that a security system for detecting and scoring anomalies may process an extremely large amount of data. For instance, a security system may analyze digital interactions for multiple large organizations. The web site of each organization may handle hundreds of digital interactions per second, so that the security system may receive thousands, tens of thousands, or hundreds of thousands of requests per second to detect anomalies. In some instances, a few megabytes of data may be captured from each digital interaction (e.g., URL being accessed, user device information, keystroke recording, etc.) and, in evaluating the captured data, the security system may retrieve and analyze a few megabytes of historical, population, and/or other data. Thus, the security system may analyze a few gigabytes of data per second just to support 1000 requests per second. Accordingly, in some embodiments, techniques are provided for aggregating data to facilitate efficient storage and/or analysis.
  • Some security systems perform a security check only when a user takes a substantive action such as changing one or more access credentials (e.g., account identifier, password, etc.), changing contact information (e.g., email address, phone number, etc.), changing shipping address, making a purchase, etc. The inventors have recognized and appreciated that such a security system may have collected little information by the time the security check is initiated. Accordingly, in some embodiments, a security system may begin to analyze a digital interaction as soon as an entity arrives at a web site. For instance, the security system may begin collecting data from the digital interaction before the entity even attempts to log into a certain account. In some embodiments, the security system may compare the entity's behaviors against population data. In this manner, the security system may be able to draw some inferences as to whether the entity is likely a legitimate user, or a bot or human fraudster, before the entity takes any substantive action. Various techniques are described herein for performing such analyses in real time for a high volume of digital interactions.
  • In some embodiments, a number of attributes may be selected for a particular web site, where an attribute may be a question that may be asked about a digital interaction, and a value for that attribute may be an answer to the question. As one example, a question may be, “how much time elapsed between viewing a product and checking out?” An answer may be a value (e.g., in seconds or milliseconds) calculated based on a timestamp of a request for a product details page and a timestamp of a request for a checkout page. As another example, an attribute may include an anchor type that is observable from a digital interaction. For instance, a security system may observe that data packets received in connection with a digital interaction indicate a certain source network address and/or a certain source device identifier. Additionally, or alternatively, the security system may observe that a certain email address is used to log in and/or a certain credit card is charged in connection with the digital interaction. Examples of anchor types include, but are not limited to, account identifier, email address (e.g., user name and/or email domain), network address (e.g., IP address, sub address, etc.), phone number (e.g., area code and/or subscriber number), location (e.g., GPS coordinates, continent, country, territory, city, designated market area, etc.), device characteristic (e.g., brand, model, operating system, browser, device fingerprint, etc.), device identifier, etc.
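  • As a small illustration of the first example, the elapsed-time attribute value can be derived from two request timestamps. The function name `elapsed_seconds` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def elapsed_seconds(product_view_ts, checkout_ts):
    """Attribute value: seconds between the request for a product details
    page and the request for a checkout page."""
    return (checkout_ts - product_view_ts).total_seconds()


# Hypothetical timestamps taken from two requests in one digital interaction.
view_ts = datetime(2016, 9, 4, 15, 0, 0)
checkout_ts = datetime(2016, 9, 4, 15, 2, 30)
value = elapsed_seconds(view_ts, checkout_ts)
```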
  • In some embodiments, a security system may maintain one or more counters for each possible value (e.g., Chrome, Safari, etc.) of an attribute (e.g., browser type). For instance, a counter for a possible attribute value (e.g., Chrome) may keep track of how many digital interactions with that particular attribute value (e.g., Chrome) are observed within some period of time (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.). Thus, to determine if there is an anomaly associated with an attribute, the security system may simply examine one or more counters. For instance, if the current time is 3:45 pm, the security system may compare a counter keeping track of the number of digital interactions reporting a Chrome browser since 3:00 pm, against a counter keeping track of the number of digital interactions reporting a Chrome browser between 3:00 pm and 4:00 pm on the previous day (or a week ago, a month ago, a year ago, etc.). This may eliminate or at least reduce on-the-fly processing of raw data associated with the attribute values, thereby improving responsiveness of the security system.
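  • One possible shape for such counters is sketched below, assuming one-hour windows aligned to the top of the hour. The class name `WindowedCounters`, its methods, and the in-memory `Counter` storage are illustrative assumptions rather than the described implementation.

```python
from collections import Counter
from datetime import datetime, timedelta


class WindowedCounters:
    """Keeps one counter per (attribute, value, hour window)."""

    def __init__(self):
        self._counts = Counter()

    @staticmethod
    def _window_start(ts):
        # Align the timestamp to the start of its one-hour window (assumption).
        return ts.replace(minute=0, second=0, microsecond=0)

    def record(self, attribute, value, ts):
        self._counts[(attribute, value, self._window_start(ts))] += 1

    def count(self, attribute, value, ts):
        # Counter returns 0 for keys that were never recorded.
        return self._counts[(attribute, value, self._window_start(ts))]


counters = WindowedCounters()
now = datetime(2016, 9, 4, 15, 45)
yesterday = now - timedelta(days=1)

for _ in range(3):
    counters.record("browser_type", "Chrome", now)
counters.record("browser_type", "Chrome", yesterday)

# Compare the window since 3:00 pm today against the 3:00 pm window yesterday.
current = counters.count("browser_type", "Chrome", now)
previous = counters.count("browser_type", "Chrome", yesterday)
```

  • Answering the anomaly question then requires only two counter lookups, with no processing of raw data at request time.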
  • The inventors have recognized and appreciated that as the volume of digital interactions processed by a security system increases, the collection of counters maintained by the security system may become unwieldy. Accordingly, in some embodiments, possible values of an attribute may be divided into a plurality of buckets. Rather than maintaining one or more counters for each attribute value, the security system may maintain one or more counters for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y. Thus, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).
  • In this manner, a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirement and speed up retrieval and updates, but more information may be lost.
  • The inventors have recognized and appreciated that it may be desirable to spread attribute values roughly evenly across a plurality of buckets. Accordingly, in some embodiments, a hash function may be applied to attribute values and a modulo operation may be applied to divide the resulting hashes into a plurality of buckets, where there may be one bucket for each residue of the modulo operation. An appropriate modulus may be chosen based on how many buckets are desired, and an appropriate hash function may be chosen to spread the attribute values roughly evenly across possible hashes. Examples of suitable hash functions include, but are not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.
  • For example, there may be tens of thousands of possible user agents. The inventors have recognized and appreciated that it may not be important to precisely keep track of which user agents have been seen. Therefore, it may be sufficient to apply a hash-modding technique to divide the tens of thousands of possible user agents into, say, a hundred or fewer buckets. In this manner, if multiple user agents have been seen, there may be a high probability of multiple buckets being hit, which may provide sufficient information for anomaly detection.
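  • The hash-modding technique can be sketched as follows, using SHA-256 (a member of the SHA-2 family named above) and a modulus of 100. The function names `bucket_for` and `bucket_counts`, and the choice of 100 buckets, are assumptions for illustration.

```python
import hashlib


def bucket_for(value, num_buckets=100):
    """Map an attribute value to a bucket: hash it, then take the residue
    of the hash modulo the number of buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


def bucket_counts(values, num_buckets=100):
    """Maintain one aggregate counter per bucket instead of per value."""
    counts = [0] * num_buckets
    for value in values:
        counts[bucket_for(value, num_buckets)] += 1
    return counts


# Hypothetical user agent strings observed in digital interactions.
observed = ["Mozilla/5.0 (X11; Linux)", "curl/7.50.1", "Mozilla/5.0 (X11; Linux)"]
counts = bucket_counts(observed, num_buckets=10)
```

  • The number of counters is fixed by `num_buckets`, regardless of how many distinct attribute values are observed.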
  • III. Dynamically Deployed Security Probes
  • Some security systems flag all suspicious digital interactions for manual review, which may cause delays in sending acknowledgements to users. Moderate delays may be acceptable to organizations selling physical goods over the Internet, because for each order there may be a time window during which the ordered physical goods are picked from a warehouse and packaged for shipment, and a manual review may be conducted during that time window. However, many digital interactions involve sale of digital goods (e.g., music, game, etc.), transfer of funds, etc. For such digital interactions, a security system may be expected to respond to each request in real time, for example, within hundreds or tens of milliseconds. Such quick responses may improve user experience. For instance, a user making a transfer or ordering a song, game, etc. may wish to receive real time confirmation that the transaction has gone through. Accordingly, in some embodiments, techniques are provided for automatically investigating suspicious digital interactions, thereby improving response time of a security system.
  • In some embodiments, if a digital interaction matches one or more fuzzy profiles, a security system may scrutinize the digital interaction more closely, even if there is not yet sufficient information to justify classifying the digital interaction as part of an attack. The security system may scrutinize a digital interaction in a non-invasive manner so as to reduce user experience friction.
  • As an example, a security system may observe an anomalously high percentage of traffic at a retail web site involving a particular product or service, and may so indicate in a fuzzy profile. A digital interaction with an attempted purchase of that product or service may be flagged as matching the fuzzy profile, but that pattern alone may not be sufficiently suspicious, as many users may purchase that product or service for legitimate reasons. To prevent a false positive, one approach may be to send the flagged digital interaction to a human operator for review. Another approach may be to require one or more verification tasks (e.g., captcha challenge, security question, etc.) before approving the attempted purchase. The inventors have recognized and appreciated that both of these approaches may negatively impact user experience.
  • Accordingly, in some embodiments, a match with a fuzzy profile may trigger additional analysis that is non-invasive. For example, the security system may collect additional data from the digital interaction in a non-invasive manner and may analyze the data in real time, so that by the time the digital interaction progresses to a stage with potential for damage (e.g., charging a credit card), the security system may have already determined whether the digital interaction is likely to be legitimate.
  • In some embodiments, one or more security probes may be deployed dynamically to obtain information from a digital interaction. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the digital interaction is sufficiently high to warrant an investigation (e.g., when the digital interaction matches a fuzzy profile comprising one or more anomalous attributes, or when the digital interaction represents a significant deviation from an activity pattern observed in the past for an anchor value, such as a device identifier, that is reported in the digital interaction).
  • The inventors have recognized and appreciated that by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.
  • In some embodiments, multiple security probes may be deployed, where each probe may be designed to discover different information. For example, information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next. In this manner, the security system may be able to gain an in-depth understanding into network traffic (e.g., web site and/or application traffic). For example, the security system may be able to: classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate. In some embodiments, a result may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such a result may be more meaningful than just a numeric score.
  • The inventors have recognized and appreciated that some online behavior scoring systems use client-side checks to collect information. In some instances, such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.). As a result, an attacker may be able to adapt and evade detection. Accordingly, in some embodiments, techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce the number of probes deployed to, for example, one in hundreds of thousands of digital interactions), client-side functionality obfuscation may reduce the likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.
  • IV. Further Descriptions
  • It should be appreciated that the techniques introduced above and discussed in greater detail below may be implemented in any of numerous ways, as the techniques are not limited to any particular manner of implementation. Examples of details of implementation are provided herein solely for illustrative purposes. Furthermore, the techniques disclosed herein may be used individually or in any suitable combination, as aspects of the present disclosure are not limited to the use of any particular technique or combination of techniques.
  • FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments. In this example, the system 10 includes user devices 11A-C, online systems 12 and 13, and a security system 14. A user 15 may use the user devices 11A-C to engage in digital interactions. For instance, the user device 11A may be a smart phone and may be used by the user 15 to check email and download music, the user device 11B may be a tablet computer and may be used by the user 15 to shop and bank, and the user device 11C may be a laptop computer and may be used by the user 15 to watch TV and play games.
  • It should be appreciated that the user 15 may engage in other types of digital interactions in addition to, or instead of, those mentioned above, as aspects of the present disclosure are not limited to the analysis of any particular type of digital interactions. Also, digital interactions are not limited to interactions that are conducted via an Internet connection. For example, a digital interaction may involve an ATM transaction over a leased telephone line.
  • Furthermore, it should be appreciated that the particular combination of user devices 11A-C is provided solely for purposes of illustration, as the user 15 may use any suitable device or combination of devices to engage in digital interactions, and the user may use different devices to engage in a same type of digital interactions (e.g., checking email).
  • In some embodiments, a digital interaction may involve an interaction between the user 15 and an online system, such as the online system 12 or the online system 13. For instance, the online system 12 may include an application server that hosts a backend of a banking app used by the user 15, and the online system 13 may include a web server that hosts a retailer's web site that the user 15 visits using a web browser. It should be appreciated that the user 15 may interact with other online systems (not shown) in addition to, or instead of, the online systems 12 and 13. For example, the user 15 may visit a pharmacy's web site to have a prescription filled and delivered, a travel agent's web site to book a trip, a government agency's web site to renew a license, etc.
  • In some embodiments, behaviors of the user 15 may be measured and analyzed by the security system 14. For instance, the online systems 12 and 13 may report, to the security system 14, behaviors observed from the user 15. Additionally, or alternatively, the user devices 11A-C may report, to the security system 14, behaviors observed from the user 15. As one example, a web page downloaded from the web site hosted by the online system 13 may include software (e.g., a JavaScript snippet) that programs the browser running on one of the user devices 11A-C to observe and report behaviors of the user 15. Such software may be provided by the security system 14 and inserted into the web page by the online system 13. As another example, an application running on one of the user devices 11A-C may be programmed to observe and report behaviors of the user 15. The behaviors observed by the application may include interactions between the user 15 and the application, and/or interactions between the user 15 and another application. As another example, an operating system running on one of the user devices 11A-C may be programmed to observe and report behaviors of the user 15.
  • It should be appreciated that software that observes and reports behaviors of a user may be written in any suitable language, and may be delivered to a user device in any suitable manner. For example, the software may be delivered by a firewall (e.g., an application firewall), a network operator (e.g., Comcast, Sprint, etc.), a network accelerator (e.g., Akamai), or any device along a communication path between the user device and an online system, or between the user device and a security system.
  • Although only one user (i.e., the user 15) is shown in FIG. 1A, it should be appreciated that the security system 14 may be programmed to measure and analyze behaviors of many users across the Internet. Furthermore, it should be appreciated that the security system 14 may interact with other online systems (not shown) in addition to, or instead of, the online systems 12 and 13. The inventors have recognized and appreciated that, by analyzing digital interactions involving many different users and many different online systems, the security system 14 may have a more comprehensive and accurate understanding of how the users behave. However, aspects of the present disclosure are not limited to the analysis of measurements collected from different online systems, as one or more of the techniques described herein may be used to analyze measurements collected from a single online system. Likewise, aspects of the present disclosure are not limited to the analysis of measurements collected from different users, as one or more of the techniques described herein may be used to analyze measurements collected from a single user.
  • FIG. 1B shows an illustrative implementation of the security system 14 shown in FIG. 1A, in accordance with some embodiments. In this example, the security system 14 includes one or more frontend systems and/or one or more backend systems. For instance, the security system 14 may include a frontend system 22 configured to interact with user devices (e.g., the illustrative user device 11C shown in FIG. 1A) and/or online systems (e.g., the illustrative online system 13 shown in FIG. 1A). Additionally, or alternatively, the security system 14 may include a backend system 32 configured to interact with a backend user interface 34. In some embodiments, the backend user interface 34 may include a graphical user interface (e.g., a dashboard) for displaying current observations and/or historical trends regarding individual users and/or populations of users. Such an interface may be delivered in any suitable manner (e.g., as a web application or a cloud application), and may be used by any suitable party (e.g., security personnel of an organization).
  • In the example shown in FIG. 1B, the security system 14 includes a log storage 24. The log storage 24 may store log files comprising data received by the frontend system 22 from user devices (e.g., the user device 11C), online systems (e.g., the online system 13), and/or any other suitable sources. A log file may include any suitable information. For instance, in some embodiments, a log file may include keystrokes and/or mouse clicks recorded from a digital interaction over some length of time (e.g., several seconds, several minutes, several hours, etc.). Additionally, or alternatively, a log file may include other information of interest, such as account identifier, network address, user device identifier, user device characteristics, URL accessed, Stock Keeping Unit (SKU) of viewed product, etc.
  • In some embodiments, the log storage 24 may store log files accumulated over some suitable period of time (e.g., a few years), which may amount to tens of billions, hundreds of billions, or trillions of log files. Each log file may be of any suitable size. For instance, in some embodiments, about 60 kilobytes of data may be captured from a digital interaction per minute, so that a log file recording a few minutes of user behavior may include a few hundred kilobytes of data, whereas a log file recording an hour of user behavior may include a few megabytes of data. Thus, the log storage 24 may store petabytes of data overall.
  • The inventors have recognized and appreciated that it may be impractical to retrieve and analyze log files from the log storage 24 each time a request is received to examine a digital interaction for anomalies. For instance, the security system 14 may be expected to respond to a request to detect an anomaly within 100 msec, 80 msec, 60 msec, 40 msec, 20 msec, or less. The security system 14 may be unable to identify and analyze all relevant log files from the log storage 24 within such a short window of time. Accordingly, in some embodiments, a log processing system 26 may be provided to filter, transform, and/or route data from the log storage 24 to one or more databases 28.
  • The log processing system 26 may be implemented in any suitable manner. For instance, in some embodiments, the log processing system 26 may include one or more services configured to retrieve a log file from the log storage 24, extract useful information from the log file, transform one or more pieces of extracted information (e.g., adding latitude and longitude coordinates to an extracted address), and/or store the extracted and/or transformed information in one or more appropriate databases (e.g., among the one or more databases 28).
  • In some embodiments, the one or more services may include one or more services configured to route data from log files to one or more queues, and/or one or more services configured to process the data in the one or more queues. For instance, each queue may have a dedicated service for processing data in that queue. Any suitable number of instances of the service may be run, depending on a volume of data to be processed in the queue.
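  • The routing arrangement above can be sketched with in-process queues, each paired with a dedicated handler. The class `LogRouter`, the queue name, and the normalization step are hypothetical; an actual deployment would likely use separate services and a message broker rather than in-memory deques.

```python
from collections import defaultdict, deque


class LogRouter:
    """Routes extracted log data to named queues; each queue has one
    dedicated handler that processes the data in that queue."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.handlers = {}

    def register(self, queue_name, handler):
        self.handlers[queue_name] = handler

    def route(self, queue_name, item):
        self.queues[queue_name].append(item)

    def drain(self, queue_name):
        # Process every item currently waiting in the queue, in arrival order.
        handler = self.handlers[queue_name]
        results = []
        while self.queues[queue_name]:
            results.append(handler(self.queues[queue_name].popleft()))
        return results


router = LogRouter()
# Hypothetical transform: normalize a network address extracted from a log file.
router.register("network_address", lambda addr: addr.strip().lower())
router.route("network_address", " 2001:DB8::1 ")
router.route("network_address", "192.0.2.7")
cleaned = router.drain("network_address")
```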
  • The one or more databases 28 may be accessed by any suitable component of the security system 14. As one example, the backend system 32 may query the one or more databases 28 to generate displays of current observations and/or historical trends regarding individual users and/or populations of users. As another example, a data service system 30 may query the one or more databases 28 to provide input to the frontend system 22.
  • The inventors have recognized and appreciated that some database queries may be time consuming. For instance, if the frontend system 22 were to query the one or more databases 28 each time a request to detect anomaly is received, the frontend system 22 may be unable to respond to the request within 100 msec, 80 msec, 60 msec, 40 msec, 20 msec, or less. Accordingly, in some embodiments, the data service system 30 may maintain one or more data sources separate from the one or more databases 28. An example of a data source maintained by the data service system 30 is shown in FIG. 2A and discussed below.
  • In some embodiments, a data source maintained by the data service system 30 may have a bounded size, regardless of how much data is analyzed to populate the data source. For instance, if there is a burst of activities from a certain account, an increased amount of data may be stored in the one or more databases 28 in association with that account. The data service system 30 may process the data stored in the one or more databases 28 down to a bounded size, so that the frontend system 22 may be able to respond to requests in constant time.
  • Various techniques are described herein for processing incoming data. For instance, in some embodiments, all possible network addresses may be divided into a certain number of buckets. Statistics may be maintained on such buckets, rather than individual network addresses. In this manner, a bounded number of statistics may be analyzed, even if an actual number of network addresses observed may fluctuate over time. One or more other techniques may also be used in addition to, or instead of, bucketing, such as maintaining an array of a certain size.
  • In some embodiments, the data service system 30 may include a plurality of data services (e.g., implemented using a service-oriented architecture). For example, one or more data services may access the one or more databases 28 periodically (e.g., every hour, every few hours, every day, etc.), and may analyze the accessed data and populate one or more first data sources used by the frontend system 22. Additionally, or alternatively, one or more data services may receive data from the log processing system 26, and may use the received data to update one or more second data sources used by the frontend system 22. Such a second data source may supplement the one or more first data sources with recent data that has arrived since the last time the one or more first data sources were populated using data accessed from the one or more databases 28. In various embodiments, the one or more first data sources may be the same as, or different from, the one or more second data sources, or there may be some overlap.
  • Although details of implementation are shown in FIG. 1B and discussed above, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular component, or combination of components, or to any particular arrangement of components. Furthermore, each of the frontend system 22, the log processing system 26, the data service system 30, and the backend system 32 may be implemented in any suitable manner, such as using one or more parallel processors operating at a same location or different locations.
  • FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments. In this example, the flow 40 may represent a sequence of activities conducted by a user on a merchant's web site. For instance, the user may log into the web site, change billing address, view a product details page of a first product, view a product details page of a second product, add the second product to a shopping cart, and then check out.
  • In some embodiments, a security system may receive data captured from the digital interaction throughout the flow 40. For instance, the security system may receive log files from a user device and/or an online system involved in the digital interaction (e.g., as shown in FIG. 1B and discussed above).
  • The security system may use the data captured from the digital interaction in any suitable manner. For instance, as shown in FIG. 1B, the security system may process the captured data and populate one or more databases (e.g., the one or more illustrative databases 28 shown in FIG. 1B). Additionally, or alternatively, the security system may populate one or more data sources adapted for efficient access. For instance, the security system may maintain current interaction data 42 in a suitable data structure (e.g., the illustrative data structure 220 shown in FIG. 2B). As one example, the security system may keep track of different network addresses observed at different points in the flow 40 (e.g., logging in and changing billing address via a first network address, viewing the first and second products via a second network address, and adding the second product to the cart and checking out via a third network address). As another example, the security system may keep track of different credit card numbers used in the digital interaction (e.g., different credit cards being entered in succession during checkout). The data structure may be maintained in any suitable manner (e.g., using the illustrative process 230 shown in FIG. 2C) and by any suitable component of the security system (e.g., the illustrative frontend system 22 and/or the illustrative data service system 30).
  • In some embodiments, the security system may maintain historical data 44, in addition to, or instead of, the current interaction data 42. In some embodiments, the historical data 44 may include log entries for user activities observed during one or more prior digital interactions. Additionally, or alternatively, the historical data 44 may include one or more profiles associated respectively with one or more anchor values (e.g., a profile associated with a particular device identifier, a profile associated with a particular network address, etc.). However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular type of historical data, or to any historical data at all. Moreover, any historical data used may be stored in any suitable manner.
  • In some embodiments, the security system may maintain population data 46, in addition to, or instead of, the current interaction data 42 and/or the historical data 44. For instance, the security system may update, in real time, statistics such as a breakdown of web site traffic by user agent, geographical location, product SKU, etc. As one example, the security system may use a hash-modding method to divide all known browser types into a certain number of buckets (e.g., 10 buckets, 100 buckets, etc.). For each bucket, the security system may calculate a percentage of overall web site traffic that falls within that bucket. As another example, the security system may use a hash-modding method to divide all known product SKUs into a certain number of buckets (e.g., 10 buckets, 100 buckets, etc.) and calculate respective traffic percentages. Additionally, or alternatively, the security system may calculate respective traffic percentages for combinations of buckets (e.g., a combination of a bucket of browser types, a bucket of product SKUs, etc.).
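  • The hash-modding and traffic-percentage calculations described above can be sketched as follows. This is a minimal illustration only: the choice of SHA-256 as the hash function, the bucket count of 10, the function names, and the sample traffic are all assumptions made for the example and are not specified in the present disclosure.

```python
import hashlib
from collections import Counter


def hash_mod_bucket(value: str, num_buckets: int) -> int:
    """Hash a value, then reduce it modulo num_buckets to get a bucket index."""
    # SHA-256 is an assumption; any stable hash function would serve.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def traffic_percentages(values, num_buckets: int = 10):
    """Return a list whose entry i is the percentage of values in bucket i."""
    counts = Counter(hash_mod_bucket(v, num_buckets) for v in values)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_buckets
    return [100.0 * counts[i] / total for i in range(num_buckets)]


def combined_percentages(pairs, num_buckets: int = 10):
    """Return traffic percentages keyed by (browser bucket, SKU bucket)."""
    counts = Counter(
        (hash_mod_bucket(browser, num_buckets), hash_mod_bucket(sku, num_buckets))
        for browser, sku in pairs
    )
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


# Hypothetical sample traffic: (browser type, product SKU) per interaction.
interactions = [
    ("Chrome", "SKU-1001"),
    ("Chrome", "SKU-1001"),
    ("Firefox", "SKU-2002"),
    ("Safari", "SKU-1001"),
]
browser_pct = traffic_percentages([b for b, _ in interactions])
combo_pct = combined_percentages(interactions)
```

  • In this sketch the per-attribute result always has exactly one entry per bucket, so its size does not grow with the number of distinct browser types observed. The combined result is stored sparsely, with at most one entry per pair of buckets.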
  • In some embodiments, a security system may perform anomaly detection processing on an on-going basis and may continually create new fuzzy profiles and/or update existing fuzzy profiles. For instance, the security system may compare a certain statistic (e.g., a count of digital interactions reporting Chrome as browser type) from a current time period (e.g., 9:00 pm-10:00 pm today) against the same statistic from a past time period (e.g., 9:00 pm-10:00 pm yesterday, a week ago, a month ago, a year ago, etc.). If the current value of the statistic deviates significantly from the past value of the statistic (e.g., by more than a selected threshold amount), an anomaly may be reported, and the corresponding attribute (e.g., browser type) and attribute value (e.g., Chrome) may be stored in a fuzzy profile.
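  • The period-over-period comparison described above can be sketched as follows. The representation of a fuzzy profile as a set of (attribute, attribute value) pairs, the function names, and the sample counts and threshold are assumptions made for illustration. The disclosure states only that a deviation beyond a selected threshold amount may be reported as an anomaly.

```python
def deviates(current: float, past: float, threshold: float) -> bool:
    """Return True if current differs from past by more than threshold."""
    return abs(current - past) > threshold


def update_fuzzy_profile(profile, attribute, value, current, past, threshold):
    """Record (attribute, value) in the profile when an anomaly is detected."""
    if deviates(current, past, threshold):
        profile.add((attribute, value))
        return True
    return False


# Hypothetical fuzzy profile, modeled here as a set of (attribute, value) pairs.
fuzzy_profile = set()

# Hypothetical counts of digital interactions reporting Chrome as browser
# type, 9:00 pm-10:00 pm today versus 9:00 pm-10:00 pm yesterday.
reported = update_fuzzy_profile(
    fuzzy_profile, "browser type", "Chrome",
    current=5200, past=3000, threshold=1000,
)
```

  • An absolute threshold is used here for simplicity. A relative threshold (e.g., a percentage of the past value) could be substituted without changing the overall structure.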
  • In some embodiments, the security system may render any one or more aspects of the current interaction data 42, the historical data 44, and/or the population data 46 (e.g., via the illustrative backend user interface 34 shown in FIG. 1B). For instance, the security system may render breakdown of web site traffic (e.g., with actual traffic measurements, or percentages of overall traffic) using a stacked area chart.
  • FIG. 1C also shows examples of time measurements in the illustrative flow 40. In some embodiments, the security system may receive data captured throughout the flow 40, and the received data may include log entries for user activities such as logging into the web site, changing billing address, viewing the product details page of the first product, viewing the product details page of the second product, adding the second product to the shopping cart, checking out, etc. The log entries may include timestamps, which may be used by the security system to determine an amount of time that elapsed between two points in the digital interaction. For instance, the security system may use the appropriate timestamps to determine how much time elapsed between viewing the second product and adding the second product to the shopping cart, between adding the second product to the shopping cart and checking out, between viewing the second product and checking out, etc.
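  • As a minimal sketch of how elapsed times might be derived from timestamped log entries, consider the following. The log entry format (an activity name paired with an ISO 8601 timestamp), the activity names, and the timestamps themselves are hypothetical and are not taken from the present disclosure.

```python
from datetime import datetime

# Hypothetical log entries for the flow: (activity, ISO 8601 timestamp).
log_entries = [
    ("login", "2016-09-04T21:00:00"),
    ("change_billing_address", "2016-09-04T21:00:40"),
    ("view_product_1", "2016-09-04T21:01:10"),
    ("view_product_2", "2016-09-04T21:01:30"),
    ("add_to_cart", "2016-09-04T21:01:35"),
    ("checkout", "2016-09-04T21:01:42"),
]


def elapsed_seconds(entries, start_activity, end_activity):
    """Seconds between the first occurrences of two activities in the log."""
    first_seen = {}
    for activity, timestamp in entries:
        # Keep only the first timestamp seen for each activity.
        first_seen.setdefault(activity, datetime.fromisoformat(timestamp))
    return (first_seen[end_activity] - first_seen[start_activity]).total_seconds()


view_to_cart = elapsed_seconds(log_entries, "view_product_2", "add_to_cart")
cart_to_checkout = elapsed_seconds(log_entries, "add_to_cart", "checkout")
view_to_checkout = elapsed_seconds(log_entries, "view_product_2", "checkout")
```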
  • The inventors have recognized and appreciated that certain timing patterns may be indicative of illegitimate digital interactions. For instance, a reseller may use bots to make multiple purchases of a product that is on sale, thereby circumventing a quantity restriction (e.g., one per customer) imposed by a retail web site. Such a bot may be programmed to step through an order quickly, to maximize the total number of orders completed during a promotional period. The resulting timing pattern may be noticeably different from that of a human customer browsing through the web site and taking time to read product details before making a purchase decision. Therefore, a timing pattern such as a delay between product view and checkout may be a useful attribute to monitor in digital interactions.
  • It should be appreciated that aspects of the present disclosure are not limited to the analysis of online purchases, as one or more of the techniques described herein may be used to analyze other types of digital interactions, including, but not limited to, opening a new account, checking email, transferring money, etc. Furthermore, it should be appreciated that aspects of the present disclosure are not limited to monitoring any particular timing attribute, or any timing attribute at all. In some embodiments, other attributes, such as various anchor types observed from a digital interaction, may be monitored in addition to, or instead of, timing attributes.
  • FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the data structure 200 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct anchor values of a same type that have been observed in a certain context. However, that is not required, as in some embodiments the data structure 200 may be used to record other distinct values, instead of, or in addition to, anchor values.
  • In some embodiments, the data structure 200 may be used to store up to N distinct anchor values of a same type (e.g., N distinct credit card numbers) that have been seen in a digital interaction. For instance, in some embodiments, the data structure 200 may include an array 205 of a certain size N. Once the array has been filled, a suitable method may be used to determine whether to discard a newly observed credit card number, or replace one of the stored credit card numbers with the newly observed credit card number. In this manner, only a bounded amount of data may be analyzed in response to a query, regardless of an amount of raw data that has been received.
  • In some embodiments, the number N of distinct values may be chosen to provide sufficient information without using an excessive amount of storage space. For instance, a security system may store more distinct values (e.g., 8-16) if precise values are useful for detecting anomalies, and fewer distinct values (e.g., 2-4) if precise values are less important. In some embodiments, N may be 8-16 for network addresses, 4-8 for credit card numbers, and 2-4 for user agents. The security system may use the network addresses to determine if there is a legitimate reason for multiple network addresses being observed (e.g., a user traveling and connecting to a sequence of access points along the way), whereas the security system may only look for a simple indication that multiple user agents have been observed.
  • It should be appreciated that aspects of the present disclosure are not limited to the use of an array to store distinct values. Other data structures, such as a linked list, a tree, etc., may also be used.
  • The inventors have recognized and appreciated that it may be desirable to store additional information in the data structure 200, beyond N distinct observed values. For instance, it may be desirable to store an indication of how many distinct values have been observed overall, and how such values are distributed. Accordingly, in some embodiments, possible values may be divided into a plurality of M buckets, and a bit string 210 of length M may be stored in addition to, or instead of, N distinct observed values. Each bit in the bit string 210 may correspond to a respective bucket, and may be initialized to 0. Whenever a value in a bucket is observed, the bit corresponding to that bucket may be set to 1.
  • Possible values may be divided into buckets in any suitable manner. For instance, in some embodiments, a hash function may be applied to possible values and a modulo operation (with modulus M) may be applied to divide the resulting hashes into M buckets. The modulus M may be chosen to achieve a desired balance between precision and efficiency. For instance, a larger number of buckets may provide a higher resolution (e.g., fewer possible values being lumped together and becoming indistinguishable), but the bit string 210 may take up more storage space, and it may be computationally more complex to update and/or access the bit string 210.
  • It should be appreciated that aspects of the present disclosure are not limited to the use of hash-modding to divide possible values into buckets, as other methods may also be suitable. For instance, in some embodiments, one or more techniques based on Bloom filters may be used.
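  • A data structure along the lines of the illustrative data structure 200 can be sketched as follows, with a bounded list standing in for the array 205 and an integer used as the bit string 210. The class and method names, the use of SHA-256, and the default values N=4 and M=32 are assumptions made for this example. Only the simplest storage policy (keep a value if there is room) is shown.

```python
import hashlib
from dataclasses import dataclass, field


def hash_mod_bucket(value: str, m: int) -> int:
    """Hash a value and apply a modulo operation with modulus m."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


@dataclass
class ObservationRecord:
    """Sketch of data structure 200: up to n distinct values plus m bits."""
    n: int = 4                                   # capacity of the array
    m: int = 32                                  # number of buckets
    values: list = field(default_factory=list)   # stands in for array 205
    bits: int = 0                                # stands in for bit string 210

    def mark_bucket(self, value: str) -> None:
        """Set the bit for the bucket to which the value belongs."""
        self.bits |= 1 << hash_mod_bucket(value, self.m)

    def is_marked(self, value: str) -> bool:
        """Return True if some value from this value's bucket was observed."""
        return bool(self.bits >> hash_mod_bucket(value, self.m) & 1)

    def remember(self, value: str) -> bool:
        """Store the value if it is new and the array is not yet full."""
        if value not in self.values and len(self.values) < self.n:
            self.values.append(value)
            return True
        return False

    def buckets_seen(self) -> int:
        """Count the buckets in which at least one value has been observed."""
        return bin(self.bits).count("1")


record = ObservationRecord()
for card in ["card-A", "card-B", "card-A"]:  # hypothetical anchor values
    record.mark_bucket(card)
    record.remember(card)
```

  • Because two distinct values may hash into the same bucket, the number of bits set is a lower bound on the number of distinct values observed, which is the loss of resolution discussed above.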
  • FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the data structure 220 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct anchor values that have been observed in a certain context. However, that is not required, as in some embodiments the data structure 220 may be used to record other distinct values, instead of, or in addition to, anchor values.
  • In the example shown in FIG. 2B, the data structure 220 may be indexed by a session identifier and a flow identifier. The session identifier may be an identifier assigned by a web server for a web session. The flow identifier may identify a flow (e.g., the illustrative flow 40 shown in FIG. 1C), which may include a sequence of activities. The security system may use the session and flow identifiers to match a detected activity to the digital interaction. However, it should be appreciated that aspects of the present disclosure are not limited to the use of a session identifier and a flow identifier to identify a digital interaction.
  • In some embodiments, the data structure 220 may include a plurality of components, such as components 222, 224, 226, and 228 shown in FIG. 2B. Each of the components 222, 224, 226, and 228 may be similar to the illustrative data structure 200 shown in FIG. 2A. For instance, the component 222 may store up to a certain number of distinct network addresses observed from the digital interaction, the component 224 may store up to a certain number of distinct user agents observed from the digital interaction, the component 226 may store up to a certain number of distinct credit card numbers observed from the digital interaction, etc.
  • In some embodiments, the data structure 220 may include a relatively small number (e.g., 10, 20, 30, etc.) of components such as 222, 224, 226, and 228. In this manner, a relatively small amount of data may be stored for each on-going digital interaction, while still allowing a security system to conduct an effective sameness analysis.
  • In some embodiments, the component 228 may store a list of lists of indices, where each list of indices may correspond to an activity that took place in the digital interaction. For instance, with reference to the illustrative flow 40 shown in FIG. 1C, a first list of indices may correspond to logging in, a second list of indices may correspond to changing billing address, a third list of indices may correspond to viewing the first product, a fourth list of indices may correspond to viewing the second product, a fifth list of indices may correspond to adding the second product to the shopping cart, and a sixth list of indices may correspond to checking out.
  • In some embodiments, each list of indices may indicate anchor values observed from the corresponding activity. For instance, a list [1, 3, 2, . . . ] may indicate the first network address stored in the component 222, the third user agent stored in the component 224, the second credit card stored in the component 226, etc. This may provide a compact representation of the anchor values observed from each activity.
  • In some embodiments, if an anchor value stored in a component is replaced by another anchor value, one or more lists of indices including the anchor value being replaced may be updated. For instance, if the first network address stored in the component 222 is replaced by another network address, the list [1, 3, 2, . . . ] may be updated as [φ, 3, 2, . . . ], where φ is any suitable default value (e.g., N+1, where N is the capacity of the component 222).
  • In some embodiments, a security system may use a list of lists of indices to determine how frequently an anchor value has been observed. For instance, the security system may count a number of lists in which the index 1 appears at the first position. This may indicate a number of times the first network address stored in the component 222 has been observed.
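  • The list of lists of indices, the frequency count, and the update performed when a stored anchor value is replaced can be sketched as follows. The sample index values, the capacity N=8, and the function names are assumptions made for illustration. Indices are 1-based to match the description above, while positions within each list use Python's 0-based convention.

```python
N_ADDRESSES = 8             # hypothetical capacity N of component 222
DEFAULT = N_ADDRESSES + 1   # the default value phi (here, N+1)
ADDRESS_POS = 0             # network address index is first in each list

# One list per activity: [network address, user agent, credit card] indices.
index_lists = [
    [1, 3, 2],  # logging in
    [1, 3, 2],  # changing billing address
    [2, 3, 2],  # viewing the first product
    [2, 3, 2],  # viewing the second product
    [1, 3, 1],  # adding the second product to the shopping cart
    [1, 3, 1],  # checking out
]


def count_observations(lists, position, index):
    """Count the lists in which the given index appears at the given position."""
    return sum(1 for indices in lists if indices[position] == index)


def invalidate_index(lists, position, index, default):
    """Return updated lists after the value at the given index is replaced."""
    return [
        [default if pos == position and idx == index else idx
         for pos, idx in enumerate(indices)]
        for indices in lists
    ]


# Number of times the first stored network address has been observed.
times_seen = count_observations(index_lists, ADDRESS_POS, 1)

# If the first stored network address is replaced, point old references at phi.
updated_lists = invalidate_index(index_lists, ADDRESS_POS, 1, DEFAULT)
```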
  • It should be appreciated that the components 222, 224, 226, and 228 are shown in FIG. 2B and discussed above solely for purposes of illustration, as aspects of the present disclosure are not limited to storing any particular information about a current digital interaction, or to any particular way of representing the stored information. For instance, other types of component data structures may be used in addition to, or instead of, the illustrative data structure 200 shown in FIG. 2A.
  • FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the process 230 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct values of a same type (e.g., N distinct credit card numbers) that have been observed in a certain context (e.g., in a certain digital interaction). The distinct values may be recorded in a data structure such as the illustrative data structure 200 shown in FIG. 2A.
  • At act 231, the security system may identify an anchor value X in a certain context. For instance, in some embodiments, the anchor value X may be observed from a certain digital interaction. In some embodiments, the security system may access a record of the digital interaction, and may identify from the record a data structure associated with a type T of the anchor value X. For instance, if the anchor value X is a credit card number, the security system may identify, from the record of the digital interaction, a data structure for storing credit card numbers observed from the digital interaction.
  • At act 232, the security system may identify a bucket B to which the anchor value X belongs. For instance, in some embodiments, a hash-modding operation may be performed to map the anchor value X to the bucket B as described above in connection with FIG. 2A.
  • At act 233, the security system may store an indication that at least one anchor value from the bucket B has been observed in connection with the digital interaction. For instance, the security system may operate on the data structure identified at act 231. With reference to the example shown in FIG. 2A, the security system may identify, in the illustrative bit string 210, a position that corresponds to the bucket B identified at act 232 and write 1 into that position.
  • At act 234, the security system may determine whether the anchor value X has already been stored in connection with the relevant context. For instance, the security system may check if the anchor value X has already been stored in the data structure identified at act 231. With reference to the example shown in FIG. 2A, the security system may look up the anchor value X in the illustrative array 205. This lookup may be performed in any suitable manner. For instance, if the array 205 is sorted, the security system may perform a binary search to determine if the anchor value X is already stored in the array 205.
  • If it is determined at act 234 that the anchor value X has already been stored, the process 230 may end. Although not shown, the security system may, in some embodiments, increment one or more counters for the anchor value X prior to ending the process 230.
  • If it is determined at act 234 that the anchor value X has not already been stored, the security system may proceed to act 235 to determine whether to store the anchor value X. With reference to the example shown in FIG. 2A, the security system may, in some embodiments, store the anchor value X if the array 205 is not yet full. If the array 205 is full, the security system may determine whether to replace one of the stored anchor values with the anchor value X.
  • As one example, the security system may store in the array 205 the first N distinct anchor values of the type T observed from the digital interaction, and may discard every subsequently observed anchor value of the type T. As another example, the security system may replace the oldest stored anchor value with the newly observed anchor value, so that the array 205 stores the last N distinct values of the type T observed in the digital interaction. As another example, the security system may store in the array 205 a suitable combination of N anchor values of the type T, such as one or more anchor values observed near a beginning of the digital interaction, one or more anchor values most recently observed from the digital interaction, one or more anchor values most frequently observed from the digital interaction (e.g., based on respective counters stored for anchor values, or lists of indices such as the illustrative component 228 shown in FIG. 2B), and/or one or more other anchor values of interest (e.g., one or more credit card numbers previously involved in credit card cycling attacks).
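  • The acts of the illustrative process 230 can be sketched as follows for two of the replacement policies described above: keeping the first N distinct values, and keeping the last N distinct values. The class name, the policy names "first" and "last", the per-value counters, and the use of SHA-256 are assumptions made for this example. A linear membership check is used in place of the binary search mentioned above, since N is small.

```python
import hashlib


def hash_mod_bucket(value: str, m: int) -> int:
    """Map a value to one of m buckets by hashing and taking a modulus."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


class AnchorStore:
    """Sketch of process 230 operating on a structure like data structure 200."""

    def __init__(self, n: int = 4, m: int = 32, policy: str = "first"):
        if policy not in ("first", "last"):
            raise ValueError("policy must be 'first' or 'last'")
        self.n, self.m, self.policy = n, m, policy
        self.values = []    # stored anchor values, oldest first
        self.counters = {}  # observation count per stored value
        self.bits = 0       # one bit per bucket

    def observe(self, x: str) -> None:
        # Acts 232-233: identify the bucket and record that it was observed.
        self.bits |= 1 << hash_mod_bucket(x, self.m)
        # Act 234: if x is already stored, increment its counter and stop.
        if x in self.values:
            self.counters[x] += 1
            return
        # Act 235: decide whether to store x.
        if len(self.values) < self.n:
            self._store(x)
        elif self.policy == "last":
            oldest = self.values.pop(0)  # replace the oldest stored value
            del self.counters[oldest]
            self._store(x)
        # Under the "first" policy, x is discarded once the array is full.

    def _store(self, x: str) -> None:
        self.values.append(x)
        self.counters[x] = 1


# Hypothetical credit card numbers observed during one digital interaction.
keep_first = AnchorStore(n=2, policy="first")
keep_last = AnchorStore(n=2, policy="last")
for card in ["card-A", "card-B", "card-C", "card-A"]:
    keep_first.observe(card)
    keep_last.observe(card)
```

  • Note that the bucket bit is set even for a value that is ultimately discarded, so the bit string continues to reflect every observation after the array is full.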
  • FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments. In this example, a security system (e.g., the illustrative security system 14 shown in FIG. 1B) monitors a plurality of digital interactions, such as digital interactions 301, 302, 303, etc. These digital interactions may take place via a same web site. However, that is not required, as one or more of the techniques described herein may be used to analyze digital interactions taking place across multiple web sites.
  • In the example shown in FIG. 3, the security system monitors different types of attributes. For instance, the security system may record one or more anchor values for each digital interaction, such as network address (attribute 311), email address (attribute 312), account identifier (attribute 313), etc.
  • The security system may identify an anchor value from a digital interaction in any suitable manner. As one example, the digital interaction may include an attempt to log in, and an email address may be submitted to identify an account associated with the email address. However, that is not required, as in some embodiments a separate account identifier may be submitted and an email address on record for that account may be identified. As another example, the digital interaction may include an online purchase. A phone number may be submitted for scheduling a delivery, and a credit card number may be submitted for billing. However, that is not required, as in some embodiments a phone number and/or a credit card number may be identified from a record of the account from which the online purchase is made. As another example, the security system may examine data packets received in connection with the digital interaction and extract, from the data packets, information such as a source network address and a source device identifier.
  • It should be appreciated that the examples described above are merely illustrative, as aspects of the present disclosure are not limited to the use of any particular anchor type, or any particular method for identifying an anchor value. Examples of anchor types include, but are not limited to, the following.
      • User information
        • account identifier
        • real name, social security number, driver's license number, passport number, etc.
        • email address
          • user name, country of user registration, date of user registration, etc.
          • email domain, DNS, server
          • status/type/availability/capabilities/software/etc., network details, domain registrar and associated details (e.g., country of domain registrant, contact information of domain registrant, etc.), age of domain, country of domain registration, etc.
        • phone number
          • subscriber number, country prefix, country of number, area code, state/province/parish/etc. of area code or number location, if the number is activated, if the number is forwarded, billing type (e.g. premium rate), ownership details (e.g., personal, business, and associated details regarding email, domain, network address, etc.), hardware changes, etc.
        • location
          • GPS coordinates, continent, country, territory, state, province, parish, city, time zone, designated market area, metropolitan statistical area, postal code, street name, street number, apartment number, address type (e.g., billing, shipping, home, etc.), etc.
        • payment
          • plain text or hash of number of credit card, payment card, debit card, bank card, etc., card type, primary account number (PAN), issuer identification number (IIN), IIN details (e.g., name, address, etc.), date of issue, date of expiration, etc.
      • Device information
        • brand, model, operating system, user agent, installed components, rendering artifacts, browser capabilities, installed software, available features, available external hardware (e.g. displays, keyboards, network and available associated data), etc.
        • device identifier, cookie/HTML storage, other device-based storage, secure password storage (e.g., iOS Keychain), etc.
        • device fingerprint (e.g., from network and environment characteristics)
      • Network information
        • network address (e.g., IP address, sub address, etc.), network identifier, network access identifier, international mobile equipment identity (IMEI), media access control (MAC) address, subscriber identity module (SIM), etc.
        • IP routing type (e.g. fixed connection, aol, pop, superpop, satellite, cache proxy, international proxy, regional proxy, mobile gateway, etc.), proxy type (e.g., anonymous, distorting, elite/concealing, transparent, http, service provider, socks/socks http, web, etc.), connection type (e.g., anonymized, VPN, Tor, etc.), network speed, network operator, autonomous system number (ASN), carrier, registering organization of network address, organization NAICS code, organization ISIC code, if the organization is a hosting facility, etc.
  • Returning to FIG. 3, the security system may monitor one or more transaction attributes in addition to, or instead of, one or more anchor types. The security system may identify transaction attribute values from a digital interaction in any suitable manner. As one example, the digital interaction may include a purchase transaction, and the security system may identify information relating to the purchase transaction, such as a SKU for a product being purchased (attribute 321), a count of items in a shopping cart at time of checkout (attribute 322), an average value of items being purchased (attribute 323), etc.
  • Alternatively, or additionally, the security system may monitor one or more timing attributes, such as time from product view to checking out (attribute 331), time from adding a product to cart to checking out (attribute 332), etc. Illustrative techniques for identifying timing attribute values are discussed in connection with FIG. 1C.
  • It should be appreciated that the attributes shown in FIG. 3 and discussed above are provided solely for purposes of illustration, as aspects of the present disclosure are not limited to the use of any particular attribute or combination of attributes. For instance, in some embodiments, a digital interaction may include a transfer of funds, instead of, or in addition to, a purchase transaction. Examples of transaction attributes for a transfer of funds include, but are not limited to, amount being transferred, name of recipient institution, recipient account number, etc.
  • FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments. For instance, the process 400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to monitor digital interactions taking place at a particular web site. The security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.
  • At act 405, the security system may identify a plurality of values of an attribute. As discussed in connection with FIG. 3, the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.
  • In some embodiments, the security system may identify each value of the attribute from a respective digital interaction. For instance, the security system may monitor digital interactions taking place within a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.), and may identify a value of the attribute from each digital interaction. However, it should be appreciated that aspects of the present disclosure are not limited to monitoring every digital interaction taking place within some time period. For instance, in some embodiments, digital interactions may be sampled (e.g., randomly) and attribute values may be identified from the sampled digital interactions.
  • The inventors have recognized and appreciated that it may be impractical to maintain statistics on individual attribute values. For instance, there may be billions of possible network addresses. It may be impractical to maintain a counter for each possible network address to keep track of how many digital interactions are reporting that particular network address. Accordingly, in some embodiments, possible values of an attribute may be divided into a plurality of buckets. Rather than maintaining a counter for each attribute value, the security system may maintain a counter for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y. Thus, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).
  • In this manner, a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirement and speed up retrieval and updates, but more information may be lost.
  • Returning to the example of FIG. 4, the security system may, at act 410, divide the attribute values identified at act 405 into a plurality of buckets. In some embodiments, each bucket may be a multiset. For instance, if two different digital interactions report the same network address, that network address may appear twice in the corresponding bucket.
  • At act 415, the security system may determine a count of the values that fall within a particular bucket. In some embodiments, a count may be determined for each bucket of the plurality of buckets. However, that is not required, as in some embodiments the security system may only keep track of one or more buckets of interest.
  • Various techniques may be used to divide attribute values into buckets. As one example, the security system may divide numerical attribute values (e.g., time measurements) into a plurality of ranges. As another example, the security system may use a hash-modding technique to divide numerical and/or non-numerical attribute values into buckets. Other techniques may also be used, as aspects of the present disclosure are not limited to any particular technique for dividing attribute values into buckets.
  • FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments. For instance, the illustrative technique shown in FIG. 5 may be used by a security system to divide values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3 into a plurality of buckets.
  • In the example shown in FIG. 5, the plurality of buckets includes three buckets corresponding respectively to three ranges of time measurements. For instance, bucket 581 may correspond to a range between 0 and 10 seconds, bucket 582 may correspond to a range between 10 and 30 seconds, and bucket 583 may correspond to a range of greater than 30 seconds.
  • In some embodiments, thresholds for dividing numeric measurements into buckets may be chosen based on observations from population data. For instance, the inventors have recognized and appreciated that the time from product view to checkout is rarely less than 10 seconds in a legitimate digital interaction, and therefore a high count for the bucket 581 may be a good indicator of an anomaly. In some embodiments, buckets may be defined based on a population mean and a population standard deviation. For instance, there may be a first bucket for values that are within one standard deviation of the mean, a second bucket for values that are between one and two standard deviations away from the mean, a third bucket for values that are between two and three standard deviations away from the mean, and a fourth bucket for values that are more than three standard deviations away from the mean. However, it should be appreciated that aspects of the present disclosure are not limited to the use of population mean and population standard deviation to define buckets. For instance, in some embodiments, a bucket may be defined based on observations from known fraudsters, and/or a bucket may be defined based on observations from known legitimate users.
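  • As a minimal sketch of the standard-deviation buckets described above, the following maps a value to one of four bands according to its distance from a population mean. The function name, the population statistics, and the treatment of values lying exactly on a band boundary are assumptions made for illustration only.

```python
def sigma_bucket(value: float, mean: float, std: float) -> int:
    """Return 0..3: index of the standard-deviation band containing value."""
    if std <= 0:
        raise ValueError("std must be positive")
    distance = abs(value - mean) / std  # distance from the mean in units of std
    # Bands: [0, 1), [1, 2), [2, 3), and 3 or more standard deviations.
    return min(int(distance), 3)

# Hypothetical population statistics (seconds), for illustration only.
MEAN_SECONDS, STD_SECONDS = 300.0, 100.0
bands = [sigma_bucket(v, MEAN_SECONDS, STD_SECONDS) for v in (320.0, 150.0, 80.0, 900.0)]
```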
  • In some embodiments, the security system may identify a plurality of values of the attribute 331 from a plurality of digital interactions. For instance, the security system may identify from each digital interaction an amount of time that elapsed between viewing a product details page for a product and checking out (e.g., as discussed in connection with FIGS. 1C and 3). In the example shown in FIG. 5, nine digital interactions are monitored, and nine values of the attribute 331 (time from product view to checkout) are obtained. It should be appreciated that aspects of the present disclosure are not limited to monitoring any particular number of digital interactions. For instance, in some embodiments, some or all of the digital interactions taking place during a certain period of time may be monitored, and the number of digital interactions may fluctuate depending on a traffic volume at one or more relevant web sites.
  • In the example shown in FIG. 5, the nine values are divided into the buckets 581-583 based on the corresponding ranges, resulting in four values (i.e., 10 seconds, 1 second, 2 seconds, and 2 seconds) in the bucket 581, three values (i.e., 25 seconds, 15 seconds, and 30 seconds) in the bucket 582, and two values (i.e., 45 seconds and 90 seconds) in the bucket 583. In this manner, numerical data collected by the security system may be quantized to reduce a number of possible values for a particular attribute, for example, from thousands or more of possible values (3600 seconds, assuming time is recorded up to one hour) to three possible values (three ranges). This may allow the security system to analyze the collected data more efficiently. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular quantization technique, or any quantization technique at all.
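  • The range-based division of FIG. 5 may be sketched as follows. The nine values and the three ranges come from the example above; the use of inclusive upper bounds is inferred from the placement of the values 10 and 30 in that example, and the function and variable names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds (seconds) of buckets 581 and 582; bucket 583 holds everything above 30.
UPPER_BOUNDS = [10, 30]
BUCKET_IDS = [581, 582, 583]

def range_bucket(seconds: float) -> int:
    # bisect_left makes each upper bound inclusive, so 10 -> 581 and 30 -> 582.
    return BUCKET_IDS[bisect_left(UPPER_BOUNDS, seconds)]

observed = [10, 1, 2, 2, 25, 15, 30, 45, 90]  # the nine values from FIG. 5
counts = Counter(range_bucket(v) for v in observed)
```

  • With these inputs, the counts for buckets 581, 582, and 583 are four, three, and two, respectively, matching the example.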
  • FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments. For instance, the histogram 700 may represent a result of dividing a plurality of time attribute values into a plurality of ranges, as discussed in connection with act 415 of FIG. 4. The time attribute values may be values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3.
  • In the example of FIG. 7A, the histogram 700 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a range of time attribute values. The height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the second bucket (between 1 and 5 minutes) may be higher than the count for the first bucket (between 0 and 1 minute), while the count for the third bucket (between 5 and 15 minutes) may be the highest, indicating that a delay between product view and checkout most frequently falls between 5 and 15 minutes.
  • In some embodiments, a number M of buckets may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored. Moreover, the buckets may correspond to ranges of uniform length, or variable lengths. For instance, in some embodiments, smaller ranges may be used where attribute values tend to cluster (e.g., smaller ranges below 15 minutes), and/or larger ranges may be used where attribute values tend to be sparsely distributed (e.g., larger ranges above 15 minutes). As an example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into two or more smaller buckets. As another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more adjacent buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.
  • FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments. For instance, the illustrative technique shown in FIG. 6 may be used by a security system to divide values of the illustrative attribute 311 (IP addresses) shown in FIG. 3 into a plurality of buckets.
  • In some embodiments, a hash-modding technique may involve hashing an input value and performing a modulo operation on the resulting hash value. In the example shown in FIG. 6, nine digital interactions are monitored, and nine values of the attribute 311 (IP address) are obtained. These nine IP addresses may be hashed to produce nine hash values, respectively. The following values may result from extracting the two least significant hexadecimal digits from each hash value: 93, 93, 41, 41, 9a, 9a, 9a, 9a, 9a. This extraction process may be equivalent to performing a modulo operation (i.e., mod 256) on the hash values.
  • In some embodiments, each residue of the modulo operation may correspond to a bucket of attribute values. For instance, in the example shown in FIG. 6, the residues 93, 41, and 9a correspond, respectively, to buckets 681-683. As a result, there may be two attribute values in each of the bucket 681 and the bucket 682, and five attribute values in the bucket 683.
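  • A minimal sketch of hash-modding follows. The disclosure does not name a hash function, so SHA-256 is an assumption here, as are the function name and the sample addresses (taken from documentation-only address ranges rather than from FIG. 6).

```python
import hashlib
from collections import Counter

MODULUS = 256  # two hexadecimal digits, as in the FIG. 6 example

def hash_mod_bucket(value: str, modulus: int = MODULUS) -> int:
    """Hash the value, then reduce the hash modulo the number of buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus

# Stand-in attribute values; repeated addresses always land in the same bucket.
addresses = ["192.0.2.1", "192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.9"]
counts = Counter(hash_mod_bucket(a) for a in addresses)
```

  • Because 256 is 16 squared, the residue computed this way equals the last two hexadecimal digits of the digest, which mirrors the digit extraction described above.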
  • FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments. For instance, the histogram 720 may represent a result of dividing a plurality of attribute values into a plurality of buckets, as discussed in connection with act 415 of FIG. 4. The attribute values may be values of the illustrative attribute 311 (IP addresses) shown in FIG. 3. Each attribute value may be converted into a hash value, and a modulo operation may be applied to map each hash value to a residue, as discussed in connection with FIG. 6.
  • In the example of FIG. 7B, the histogram 720 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a residue of the modulo operation. The height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the third bucket (residue “02”) is higher than the count for the first bucket (residue “00”) and the count for the second bucket (residue “01”), indicating that one or more IP addresses that hash-mod to “02” are frequently observed.
  • In some embodiments, a modulus M of the modulo operation (which determines how many buckets there are) may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored. Moreover, in some embodiments, buckets may be further divided and/or merged. As one example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into smaller buckets. For instance, the bucket for hash values ending in “00” may be divided into 16 buckets (for hash values ending, respectively, in “000,” “100,” . . . , “f00”), or into two buckets, the first for hash values ending in “000,” “100,” . . . , or “700,” the second for hash values ending in “800,” “900,” . . . , or “f00.” As another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more other buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.
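  • The division of the “00” bucket described above may be viewed as moving from a modulus of 256 to a modulus of 4096. The sketch below enumerates the refined residues; the helper name and its parameters are hypothetical.

```python
def split_bucket(residue: int, modulus: int, factor: int) -> list:
    """Residues modulo (modulus * factor) that refine `residue` modulo `modulus`."""
    return [residue + k * modulus for k in range(factor)]

# Bucket "00" (mod 256) split 16 ways gives residues "000", "100", ..., "f00" (mod 4096).
sub_buckets = split_bucket(0x00, 256, 16)
labels = [format(r, "03x") for r in sub_buckets]

# The coarser two-way split groups the same 16 residues into two halves.
first_half, second_half = sub_buckets[:8], sub_buckets[8:]
```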
  • Returning to the example of FIG. 4, the security system may, at act 420, compare the count determined in act 415 against historical information. In some embodiments, the historical information may include an expected count for the same bucket, and the security system may compare the count determined in act 415 against the expected count.
  • The determination at act 415 and the comparison at act 420 may be performed for any number of one or more buckets. For instance, in some embodiments, a histogram obtained at act 415 (e.g., the illustrative histogram 720 shown in FIG. 7B) may be compared against an expected histogram obtained from historical information.
  • FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments. The expected histogram 820 may be calculated in a similar manner as the illustrative histogram 720 of FIG. 7B, except that attribute values used to calculate the expected histogram 820 may be obtained from a plurality of past digital interactions, such as digital interactions from a past period of time during which there is no known attack (or no known large-scale attack) on one or more relevant web sites. Thus, the expected histogram 820 may represent an acceptable pattern.
  • FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A, in accordance with some embodiments. FIG. 9 shows illustrative time periods 902 and 904, in accordance with some embodiments. For instance, attribute values that are used to calculate the illustrative histogram 720 of FIG. 7B may be obtained from digital interactions taking place during the time period 902, whereas attribute values that are used to calculate the illustrative expected histogram 820 of FIG. 8A may be obtained from digital interactions taking place during the time period 904. In some embodiments, the security system may perform anomaly detection processing on a rolling basis. Whenever anomaly detection processing is performed, the time period 902 may be near a current time, whereas the time period 904 may be in the past.
  • In some embodiments, the time periods 902 and 904 may have a same length (e.g., 30 minutes, one hour, 90 minutes, 2 hours, etc.), and/or may fall at a same time of day, so that the comparison between the histogram 720 and the expected histogram 820 may be more meaningful. In some embodiments, multiple comparisons may be made using different expected histograms, such as expected histograms for a time period of a same length from an hour ago, two hours ago, etc., and/or a same time period from a day ago, a week ago, a month ago, a year ago, etc. For instance, if a significant deviation is detected between the histogram 720 and an expected histogram (e.g., a day ago), the security system may compare the histogram 720 against an expected histogram that is further back in time (e.g., a week ago, a month ago, a year ago, etc.). This may allow the security system to take into account cyclical patterns (e.g., higher traffic volume on Saturdays, before Christmas, etc.).
  • Returning to the example of FIG. 4, the security system may, at act 425, determine if there is any anomaly associated with the attribute in question (e.g., time from product view to checkout, IP address, etc.). For instance, with reference to FIG. 8B, the third bar (residue “02”) in the histogram 720 may exceed the third bar in the expected histogram 820 by a significant amount (e.g., more than a selected threshold amount). Thus, the security system may infer a possible attack from an IP address that hash-mods into “02.” The security system may store the attribute (e.g., IP address) and the particular bucket exhibiting an anomaly (e.g., residue “02”) in a fuzzy profile. As discussed in connection with FIG. 14, incoming digital interactions may be analyzed against the fuzzy profile, and one or more security measures may be imposed on matching digital interactions (e.g., digital interactions involving IP addresses that hash-mod into “02”). For example, one or more security probes may be deployed to investigate the matching digital interactions.
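  • The comparison at act 420 and the determination at act 425 may be sketched as a per-bucket threshold test. The counts, the threshold, and the dictionary layout of the fuzzy profile below are all hypothetical and chosen only to illustrate the idea.

```python
def anomalous_buckets(current: dict, expected: dict, threshold: int) -> list:
    """Buckets whose current count exceeds the expected count by more than threshold."""
    return sorted(
        bucket
        for bucket, count in current.items()
        if count - expected.get(bucket, 0) > threshold
    )

# Hypothetical counts keyed by hash-mod residue.
current = {"00": 110, "01": 95, "02": 480}
expected = {"00": 100, "01": 100, "02": 105}

# Record the attribute together with the buckets that exhibit an anomaly.
fuzzy_profile = {"ip_address": anomalous_buckets(current, expected, threshold=200)}
```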
  • The inventors have recognized and appreciated that the illustrative techniques discussed in connection with FIG. 4 may provide flexibility in anomaly detection. For instance, the expected histogram 820 may be customized for a web site, by using only digital interactions taking place on that web site. Moreover, expected histograms may evolve over time. For instance, on any given day, the security system may use digital interactions from the day before (or a week ago, a month ago, a year ago, etc.) to calculate an expected histogram. In this manner, expected histograms may follow trends on the web site and remain up-to-date.
  • The inventors have recognized and appreciated that the illustrative techniques discussed in connection with FIG. 4 may facilitate detection of unknown anomalies. As one example, an unexpected increase in traffic from a few IP addresses may be an indication of a coordinated attack from computer resources controlled by an attacker. As another example, an unexpected spike in a product SKU being ordered from a web site may be an indication of a potential pricing mistake and resellers ordering large quantities for that particular product SKU. A security system that merely looks for known anomalous patterns may not be able to detect such emergent anomalies.
  • Although details of implementation are shown in FIGS. 4-9 and discussed above, it should be appreciated that aspects of the present disclosure are not limited to such details. For instance, in some embodiments, the security system may compute a normalized count for a bucket, which may be a ratio between a count for the individual bucket and a total count among all buckets. The normalized count may then be compared against an expected normalized count, in addition to, or instead of, comparing the count against an expected count as described in connection with FIG. 8B.
  • The inventors have recognized and appreciated that normalization may be used advantageously to reduce false positives. For instance, during traditional holiday shopping seasons, or during an advertised sales special, there may be an increase of shopping web site visits and checkout activities. Such an increase may lead to an increase of absolute counts across multiple buckets. A comparison between a current absolute count for an individual bucket and an expected absolute count (e.g., an absolute count for that bucket observed a week ago) may show that the current absolute count exceeds the expected absolute count by more than a threshold amount, which may lead to a false positive identification of anomaly. By contrast, a comparison between a current normalized count and an expected normalized count may remain reliable despite an across-the-board increase in activities.
  • FIG. 10 shows an illustrative normalized histogram 1000, in accordance with some embodiments. In this example, each bar in the histogram 1000 corresponds to a bucket, and a height of the bar corresponds to a normalized count obtained by dividing an absolute count for the bucket by a sum of counts from all buckets. For instance, the first bucket may account for 10% of all digital interactions, the second bucket 15%, the third bucket 30%, the fourth bucket 15%, etc.
  • In some embodiments, a normalized histogram may be used at acts 415-420 of the illustrative process 400 of FIG. 4, instead of, or in addition to, a histogram with absolute counts. For instance, with increased sales activities during a holiday shopping season, an absolute count in a bucket may increase significantly from a week or a month ago, but a normalized count may remain roughly the same. If, on the other hand, an attack is taking place via digital interactions originating from a small number of IP addresses, a bucket to which one or more of the malicious IP addresses are mapped (e.g., via hash-modding) may account for an increased percentage of all digital interactions.
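  • The effect of normalization may be illustrated with a short sketch. The counts are hypothetical: one scenario triples traffic uniformly (as during a sale), while the other concentrates the increase in a single bucket. The function names are likewise assumptions.

```python
def normalize(counts: dict) -> dict:
    """Divide each bucket's count by the total count among all buckets."""
    total = sum(counts.values())
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: count / total for bucket, count in counts.items()}

def max_share_shift(current: dict, expected: dict) -> float:
    """Largest change in any bucket's share of the total."""
    cur, exp = normalize(current), normalize(expected)
    return max(abs(cur[b] - exp.get(b, 0.0)) for b in cur)

last_week = {"00": 100, "01": 150, "02": 250}
holiday = {bucket: 3 * count for bucket, count in last_week.items()}
# Hypothetical attack: bucket "02" grows far more than the others.
attack = {"00": 300, "01": 450, "02": 2250}
```

  • In the holiday scenario every absolute count triples but no share changes, whereas in the attack scenario the share of bucket “02” rises from 50% to 75%.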
  • The inventors have recognized and appreciated that it may be beneficial to examine how histograms for an attribute evolve over time. For instance, more digital interactions may be expected from a certain time zone during daytime for that time zone, and a deviation from that pattern may indicate an anomaly. Accordingly, in some embodiments, an array of histograms may be built, where each histogram may correspond to a separate window of time.
  • FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments. In this example, the array 1100 includes 24 histograms, each corresponding to a one-hour window. For instance, there may be a histogram for a current time, a histogram for one hour prior, a histogram for two hours prior, etc. These histograms may show statistics for a same attribute, such as IP address.
  • In the example shown in FIG. 11, there are four buckets for the attribute. For instance, the attribute may be IP address, and an IP address may be mapped to one of the four buckets based on a time zone associated with the IP address. For instance, buckets 1120, 1140, 1160, and 1180 may correspond, respectively, to Eastern, Central, Mountain, and Pacific.
  • The illustrative array 1100 shows peak activity levels in the bucket 1120 at hour markers −18, −19, and −20, which may be morning hours for the Eastern time zone. The illustrative array 1100 also shows peak activity levels in the bucket 1160 at hour markers −16, −17, and −19, which may be morning hours for the Mountain time zone. These may be considered normal patterns. Although not shown, a spike of activities at nighttime may indicate an anomaly.
  • Although a particular time resolution (i.e., 24 one-hour windows) is used in the example of FIG. 11, it should be appreciated that aspects of the present disclosure are not limited to any particular time resolution. One or more other time resolutions may be used additionally, or alternatively, such as 12 five-minute windows, seven one-day windows, 14 one-day windows, four one-week windows, etc. Furthermore, aspects of the present disclosure are not limited to the use of an array of histograms.
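  • One possible way to hold an array of histograms over time is a fixed-length queue of per-window counters, sketched below. The class and method names are hypothetical, and the caller is assumed to invoke `roll` at each window boundary (e.g., once per hour).

```python
from collections import Counter, deque

class HistogramArray:
    """Rolling array of per-window histograms, newest first."""

    def __init__(self, num_windows: int = 24):
        self.windows = deque([Counter()], maxlen=num_windows)

    def record(self, bucket: str) -> None:
        self.windows[0][bucket] += 1  # count into the current window

    def roll(self) -> None:
        # Start a new window; the oldest one is dropped once maxlen is reached.
        self.windows.appendleft(Counter())

    def count(self, windows_ago: int, bucket: str) -> int:
        return self.windows[windows_ago][bucket]

array = HistogramArray(num_windows=24)
array.record("Eastern")
array.record("Eastern")
array.roll()
array.record("Pacific")
```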
  • The inventors have recognized and appreciated that digital interactions associated with an attack may exhibit anomalies in multiple attributes. Accordingly, in some embodiments, a profile may be generated with a plurality of attributes to increase accuracy and/or efficiency of anomaly detection. For instance, a plurality of attributes may be monitored, and the illustrative process 400 of FIG. 4 may be performed for each attribute to determine if that attribute is anomalous (e.g., by building a histogram, or an array of histograms as discussed in connection with FIG. 11). In this manner, risk assessment may be performed in multiple dimensions, which may improve accuracy.
  • In some embodiments, one or more attributes may be selected so that a detected anomaly in any of the one or more attributes may be highly indicative of an attack. However, the inventors have recognized and appreciated that, while anomalies in some attributes may be highly indicative of attacks, such anomalies may rarely occur, so that it may not be worthwhile to expend time and resources (e.g., storage, processor cycles, etc.) to monitor those attributes. Accordingly, in some embodiments, an attribute may be selected only if anomalies in that attribute are observed frequently in known attacks (e.g., in higher than a selected threshold percentage of attacks).
  • The inventors have further recognized and appreciated that anomalies in one attribute may be correlated with anomalies in another attribute. For instance, there may be a strong correlation between time zone and language, so that an observation of an anomalous time zone value may not provide a lot of additional information if a corresponding language value is already known to be anomalous, or vice versa. Accordingly, in some embodiments, the plurality of attributes may be selected to be pairwise independent.
  • FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments. In this example, the illustrative profile includes at least three attributes—time from product view to checkout, email domain, and product SKU. Three illustrative histograms 1220, 1240, and 1260 may be built for these attributes, respectively. For instance, each of the histograms 1220, 1240, and 1260 may be built based on recent digital interactions at a relevant web site, using one or more of the techniques described in connection with FIGS. 4-7B.
  • In the example of FIG. 12, the histograms 1220, 1240, and 1260 are compared against three expected histograms, respectively. In some embodiments, an expected histogram may be calculated based on historical data. As one example, each bar in an expected histogram may be calculated as a moving average over some length of time. As another example, an expected histogram may be a histogram calculated from digital interactions that took place in a past period of time, for instance, as discussed in connection with FIGS. 8A-9.
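  • As a sketch of the moving-average option, each bar of an expected histogram may be computed as the mean of that bucket's count over a list of past histograms. The function name and the sample counts are hypothetical, and a bucket missing from a past window is assumed to have a count of zero.

```python
def moving_average_histogram(history: list) -> dict:
    """Average each bucket's count over a list of past histograms (dicts)."""
    if not history:
        return {}
    buckets = set().union(*history)  # every bucket seen in any window
    return {b: sum(h.get(b, 0) for h in history) / len(history) for b in buckets}

# Hypothetical counts from three past windows.
history = [{"a": 10, "b": 20}, {"a": 14, "b": 22}, {"a": 12, "b": 18, "c": 3}]
expected = moving_average_histogram(history)
```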
  • In the example of FIG. 12, each of the histograms 1220, 1240, and 1260 has an anomalous value. For instance, the third bucket for the histogram 1220 may show a count 1223 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1226 in the corresponding expected histogram, the fourth bucket for the histogram 1240 may show a count 1244 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1248 in the corresponding expected histogram, and the last bucket for the histogram 1260 shows a count 1266 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1272 in the corresponding expected histogram. In some embodiments, different thresholds may be used to determine anomaly for different attributes, as some attributes may have counts that tend to fluctuate widely over time, while other attributes may have counts that tend to stay relatively stable.
  • Although a particular combination of attributes is shown in FIG. 12 and described above, it should be appreciated that aspects of the present disclosure are not so limited. Any suitable one or more attributes may be used in a fuzzy profile for anomaly detection.
  • The inventors have recognized and appreciated that when information is collected from a digital interaction, not all of the collected information may be useful for anomaly detection. For instance, if a particular operating system has a certain vulnerability that is exploited in an attack, and the vulnerability exists in all versions of the operating system, a stronger anomalous pattern may emerge if all digital interactions involving that operating system are analyzed together, regardless of version number. If, by contrast, digital interactions are stratified by version number, each version number may deviate from a respective expected pattern only moderately, which may make the attack more difficult to detect.
  • Accordingly, in some embodiments, an entropy reduction operation may be performed on an observation from a digital interaction to remove information that may not be relevant for assessing a level of risk associated with the digital interaction. In this manner, less information may be processed, which may reduce storage requirement and/or improve response time of a security system.
  • FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments. Like the illustrative process 400 of FIG. 4, the process 1300 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to monitor digital interactions taking place at a particular web site. The security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.
  • At act 1305, the security system may record a plurality of observations relating to an attribute. As discussed in connection with FIG. 3, the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.
  • In some embodiments, the security system may record each observation from a respective digital interaction. Instead of dividing the observations into a plurality of buckets, the security system may, at act 1308, perform an entropy reduction operation on each observation, thereby deriving a plurality of attribute values. The plurality of attribute values are then divided into buckets, for instance, as discussed in connection with act 410 of FIG. 4. The remainder of the process 1300 may proceed as described in connection with FIG. 4.
  • As one example of entropy reduction, two observations relating to user agent may be recorded as follows:
      • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
      • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
  • The inventors have recognized and appreciated that the operating system Mac OS X may often be associated with attacks, regardless of version number (e.g., 10_11_6 versus 10_11_4).
  • If hash-modded directly, the above strings may land in two different buckets. As a result, an increase in traffic (e.g., 1000 digital interactions per hour) involving the above strings may be split between the two buckets, where each bucket may show a smaller increase (e.g., about 500 digital interactions per hour), and the security system may not be sufficiently confident to flag an anomaly.
  • Accordingly, in some embodiments, the security system may strip the operating system version numbers from the above strings at act 1308 of the illustrative process 1300 of FIG. 13. Additionally, or alternatively, the Mozilla version numbers “5.0” may be reduced to “5,” the AppleWebKit version numbers “537.36” may be reduced to “537,” the Chrome version numbers “52.0.2743.116” may be reduced to “52,” and the Safari version numbers “537.36” may be reduced to “537.” As a result, both of the above strings may be reduced to a common attribute value:
      • mozilla5macintoshintelmacosx10applewebkit537khtmllikegeckochrome52safari537
  • In this manner, digital interactions involving the two original strings may be aggregated into one bucket, which may accentuate anomalies and facilitate detection.
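  • One way to obtain the common attribute value shown above is to lowercase the string, truncate each dotted or underscored version number to its leading component, and drop all remaining punctuation and whitespace. The regular expressions and the function name below are assumptions, not a prescribed implementation.

```python
import re

VERSION = re.compile(r"(\d+)(?:[._]\d+)+")  # e.g. "52.0.2743.116" or "10_11_4"
NON_ALNUM = re.compile(r"[^a-z0-9]")

def reduce_user_agent(user_agent: str) -> str:
    text = user_agent.lower()
    text = VERSION.sub(r"\1", text)  # keep only the leading (major) number
    return NON_ALNUM.sub("", text)   # drop punctuation and whitespace

ua_a = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36")
ua_b = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36")
reduced_a, reduced_b = reduce_user_agent(ua_a), reduce_user_agent(ua_b)
```

  • Both observations reduce to the same string, so a subsequent hash-modding step would place them in the same bucket.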
  • In some embodiments, entropy reduction may be performed incrementally. For instance, the security system may first strip out operating system version numbers. If no discernible anomaly emerges, the security system may strip out AppleWebKit version numbers. This may continue until some discernible anomaly emerges, or all version numbers have been stripped out.
  • As another example of entropy reduction, an observation relating to display size may be recorded as follows:
      • 1024×768, 1440×900
  • There may be two sets of display dimensions because a computer used for the digital interaction may have two displays. In some embodiments, the security system may sort the display dimensions in some appropriate order (e.g., low to high, or high to low), which may result in the following:
      • 768, 900, 1024, 1440
  • The inventors have recognized and appreciated that sorting may allow partial matching. However, it should be appreciated that aspects of the present disclosure are not limited to sorting display dimensions.
  • In some embodiments, the security system may reduce the display dimensions, for example, by dividing the display dimensions by 100 and then rounding (e.g., using a floor or ceiling function). This may result in the following:
      • 8, 9, 10, 14
  • Thus, small differences in display dimensions may be removed. Such differences may occur due to changes in window sizes. For example, a height of a task bar may change, or the task bar may only be present sometimes. Such changes may be considered unimportant for anomaly detection.
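  • The sorting and reduction of display dimensions may be sketched as follows. The sketch rounds to the nearest integer, which is one rounding choice that reproduces the values listed above; a floor or ceiling function, as mentioned in the text, would give slightly different values. The function name and the `step` parameter are hypothetical.

```python
import re

def reduce_display_sizes(observation: str, step: int = 100) -> list:
    """Sort all display dimensions low to high, then coarsen each one."""
    dimensions = sorted(int(d) for d in re.findall(r"\d+", observation))
    return [round(d / step) for d in dimensions]

reduced = reduce_display_sizes("1024×768, 1440×900")
```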
  • Although the inventors have recognized and appreciated various advantages of entropy reduction, it should be appreciated that aspects of the present disclosure are not limited to any particular entropy reduction technique, or to the use of entropy reduction at all.
  • FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments. For instance, the process 1400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if a digital interaction is likely part of an attack.
  • In the example shown in FIG. 14, the fuzzy profile is built (e.g., using the illustrative process 400 shown in FIG. 4) for detecting illegitimate resellers. For instance, the profile may store one or more attributes that are anomalous. Additionally, or alternatively, the profile may store, for each anomalous attribute, an attribute value that is anomalous, and/or an indication of an extent to which that attribute value deviates from expectation.
  • In some embodiments, an anomalous attribute may be product SKU, and an anomalous attribute value may be a particular hash-mod bucket (e.g., the last bucket in the illustrative histogram 1260 shown in FIG. 12). The profile may store an indication of an extent to which an observed count for that bucket (e.g., the count 1266) deviates from an expected count (e.g., the count 1272). As one example, the profile may store a percentage by which the observed count exceeds the expected count. As another example, the profile may store an amount by which the observed count exceeds the expected count. As another example, the profile may store an indication of a distance between the observed count and the expected count. For instance, the expected count may be an average count for the particular bucket over some period of time, and an expected interval around that average may be defined based on a standard deviation (e.g., one standard deviation away from the average count, two standard deviations away, etc.).
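  • The three kinds of deviation indications mentioned above (percentage, amount, and distance in standard deviations) might be computed as in the following minimal Python sketch. The function name, the dictionary keys, and the use of a population standard deviation over historical counts are assumptions made for illustration and are not prescribed by the disclosure.

```python
from statistics import mean, pstdev


def deviation_summary(observed_count, historical_counts):
    """Describe how far an observed bucket count is from its expected count."""
    expected = mean(historical_counts)  # expected count: historical average
    spread = pstdev(historical_counts)  # population standard deviation
    excess = observed_count - expected
    return {
        "expected": expected,
        "excess_amount": excess,
        "excess_percent": 100.0 * excess / expected if expected else None,
        "std_devs_away": excess / spread if spread else None,
    }


summary = deviation_summary(150, [90, 100, 110, 100])
print(summary)
```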
  • Returning to FIG. 14, the security system may, at act 1405, identify a plurality of attributes from the fuzzy profile. In some embodiments, digital interactions with a retailer's web store may be analyzed to distinguish possible resellers from retail customers who purchase goods for their own use. A reseller profile for use in such an analysis may contain attributes such as the following.
      • Product SKU
      • Email Domain of purchaser
      • Browser Type
      • Web Session Interaction Time
  • At act 1410, the security system may select an anomalous attribute (e.g., product SKU) and identify one or more values that are anomalous (e.g., one or more hash-mod buckets with anomalously high counts). At act 1415, the security system may determine if the digital interaction that is being analyzed matches the fuzzy profile with respect to the anomalous attribute. For instance, the security system may identify a hash-mod bucket for a product SKU that is being purchased in the digital interaction, and determine whether that hash-mod bucket is among one or more anomalous hash-mod buckets stored in the profile for the product SKU attribute. If there is a match, the security system may so record.
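  • Acts 1410 and 1415 may be sketched in Python as follows. This is only one possible reading: the bucket count of 10, the use of SHA-256 as the hash, the attribute names, and the representation of a profile as a mapping from attribute name to a set of anomalous buckets are all assumptions introduced for the example.

```python
import hashlib

NUM_BUCKETS = 10  # assumed number of hash-mod buckets


def hash_mod_bucket(value, num_buckets=NUM_BUCKETS):
    """Map an attribute value to a bucket by hashing and taking a modulus."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def matching_attributes(interaction, profile):
    """Return the anomalous attributes for which the interaction matches the profile."""
    matched = []
    for attribute, anomalous_buckets in profile.items():
        value = interaction.get(attribute)
        if value is not None and hash_mod_bucket(value) in anomalous_buckets:
            matched.append(attribute)  # record the match
    return matched


# Hypothetical profile: one anomalous bucket for product SKU, none for email domain.
profile = {"product_sku": {hash_mod_bucket("SKU-12345")}, "email_domain": set()}
interaction = {"product_sku": "SKU-12345", "email_domain": "example.com"}
matched = matching_attributes(interaction, profile)
print(matched)
```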
  • At act 1420, the security system may determine if there is another anomalous attribute to be processed. If so, the security system may return to act 1410. Otherwise, the security system may proceed to act 1425 to calculate a penalty score. The penalty score may be calculated in any suitable manner. In some embodiments, the penalty score is determined based on a ratio between a count of anomalous attributes with respect to which the digital interaction matches the profile, and a total count of anomalous attributes. Illustrative code for calculating the penalty score is shown below.
  • PENALTY_MIN = 100
    PENALTY_MAX = 500
    PENALTY = 0
    PARAMETERS = array(sku_histograms, domain_histograms,
                       browser_histograms, time_histograms)
    NUM_MATCH = 0
    FOREACH PARAMETERS as PARAM
     IF isAnomalous(PARAM) NUM_MATCH++
    END
    // Minimum threshold for anomaly (e.g., two out of four match)
    // may be set in any suitable way
    // If threshold is met, linear interpolation between MIN and MAX
    IF (NUM_MATCH >= 2)
     RATIO = NUM_MATCH / COUNT(PARAMETERS)
     PENALTY = ((PENALTY_MAX - PENALTY_MIN) * RATIO) + PENALTY_MIN
     triggerSignal("RESELLER", PENALTY)
    END
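  • For readers who prefer a runnable form, the illustrative pseudocode above may be rendered in Python roughly as follows. The function name `reseller_penalty`, the `MATCH_THRESHOLD` constant, and the dictionary of per-attribute match flags are assumptions for this sketch; the signal-triggering step is omitted and the penalty is simply returned.

```python
PENALTY_MIN = 100
PENALTY_MAX = 500
MATCH_THRESHOLD = 2  # e.g., two out of four attributes must match


def reseller_penalty(match_flags):
    """Interpolate linearly between PENALTY_MIN and PENALTY_MAX by match ratio."""
    num_match = sum(1 for matched in match_flags.values() if matched)
    if num_match < MATCH_THRESHOLD:
        return 0  # below the threshold, no penalty is assigned
    ratio = num_match / len(match_flags)
    return (PENALTY_MAX - PENALTY_MIN) * ratio + PENALTY_MIN


flags = {"sku": True, "email_domain": True, "browser": False, "session_time": False}
penalty = reseller_penalty(flags)
print(penalty)
```

  • Under these constants, two matches out of four yield a penalty of 300, and four out of four yield the maximum of 500.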
  • Additionally, or alternatively, an attribute penalty score may be determined for a matching attribute based on an extent to which an observed count for a matching bucket deviates from an expected count for that bucket. An overall penalty score may then be calculated based on one or more attribute penalty scores (e.g., as a weighted sum).
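  • A weighted-sum variant might look like the following Python sketch. The relative-excess formula, the cap of 1.0, and the example weights are assumptions chosen only to make the idea concrete; any suitable scoring function could be substituted.

```python
def attribute_penalty(observed, expected, cap=1.0):
    """Score one matching attribute by its relative excess over the expected count."""
    if expected <= 0 or observed <= expected:
        return 0.0
    return min((observed - expected) / expected, cap)


def overall_penalty(attribute_penalties, weights):
    """Combine per-attribute penalty scores as a weighted sum."""
    return sum(weights[name] * score for name, score in attribute_penalties.items())


penalties = {
    "product_sku": attribute_penalty(observed=150, expected=100),
    "email_domain": attribute_penalty(observed=30, expected=20),
}
weights = {"product_sku": 0.6, "email_domain": 0.4}  # hypothetical weights
total = overall_penalty(penalties, weights)
print(total)
```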
  • In this example, a penalty score calculated using a reseller profile may indicate a likelihood that a reseller is involved in a digital interaction. Such a penalty score may be used in any suitable manner. For instance, the web retailer may use the penalty score to decide whether to initiate one or more actions, such as canceling an order already placed by the suspected reseller, suspending the suspected reseller's account, and/or preventing creation of a new account by an entity linked to the suspected reseller's account.
  • It should be appreciated that the reseller profile described above in connection with FIG. 14 is provided merely for purposes of illustration. Aspects of the present disclosure are not limited to monitoring any particular attribute or combination of attributes to identify resellers, nor to the use of a reseller profile at all. In various embodiments, any suitable attribute may be monitored to detect any type of anomaly, in addition to, or instead of, reseller activity.
  • In some embodiments, one or more past digital interactions may be identified, using any suitable method, as part of an attack. Each such digital interaction may be associated with an anchor value (e.g., IP address, name, account ID, email address, device ID, device fingerprint, user ID, hashed credit card number, etc.), and the anchor value may in turn be associated with a behavior profile. Thus, one or more behavior profiles may be identified as being associated with the attack and may be used to build a fuzzy profile.
  • In some embodiments, a fuzzy profile may include any suitable combination of one or more attributes, which may, although need not, coincide with one or more attributes of the behavior profiles from which the fuzzy profile is built. For instance, the fuzzy profile may store a range or limit of values for an attribute, where the range or limit may be determined based on values of the attribute stored in the behavior profiles.
  • FIG. 15 shows an illustrative fuzzy profile 1500, in accordance with some embodiments. In this example, three individual behaviors A, B, and C are observed in known malicious digital interactions. For instance, each of the behaviors A, B, and C may be observed in 20% of known malicious digital interactions (although it should be appreciated that behaviors observed at different frequencies may also be analyzed together).
  • The inventors have recognized and appreciated that although each of the behaviors A, B, and C, individually, may be a poor indicator of whether a digital interaction exhibiting that behavior is part of an attack, certain combinations of the behaviors A, B, and C may provide more reliable indicators. For example, if a digital interaction exhibits both behaviors A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack, whereas if a digital interaction exhibits both behaviors B and C, there may be a low likelihood (e.g., 40%) that the digital interaction is part of an attack. Thus, if a digital interaction exhibits behavior B, the fact that the digital interaction also exhibits behavior A may greatly increase the likelihood that the digital interaction is part of an attack, whereas the fact that the digital interaction also exhibits behavior C may increase that likelihood to a lesser extent. It should be appreciated that specific percentages are provided in the example of FIG. 15 merely for purposes of illustration, as other percentages may also be possible.
  • FIG. 16 shows an illustrative fuzzy profile 1600, in accordance with some embodiments. In this example, the fuzzy profile 1600 includes six individual behaviors A, B, C, X, Y, and Z, where behaviors A, B, and C each include an observed historical pattern, and behaviors X, Y, and Z each include a behavior observed during a current digital interaction. If a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting both historical patterns A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack. As discussed above in connection with FIG. 13, such a likelihood may be determined based on a percentage of malicious digital interactions that are also associated with an anchor value exhibiting both historical patterns A and B.
  • If a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting historical pattern C, and if both behaviors X and Y are observed during the current digital interaction, there may be an even higher likelihood (e.g., 98%) that the digital interaction is part of an attack. If, on the other hand, only behaviors Y and Z are observed during the current digital interaction, there may be a lower likelihood (e.g., 75%) that the digital interaction is part of an attack.
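  • One hedged way to picture the fuzzy profile 1600 is as a small table of rules, each pairing a combination of historical patterns and current behaviors with a likelihood. In the Python sketch below, the rule representation, the function name, the choice to return the highest matching likelihood, and the assumption that the 75% case also involves historical pattern C are all illustrative; the percentages are the example values given above.

```python
# Each rule: (required historical patterns, required current behaviors, likelihood).
FUZZY_PROFILE_RULES = [
    (frozenset({"A", "B"}), frozenset(), 0.80),
    (frozenset({"C"}), frozenset({"X", "Y"}), 0.98),
    (frozenset({"C"}), frozenset({"Y", "Z"}), 0.75),
]


def attack_likelihood(historical, current, rules=FUZZY_PROFILE_RULES):
    """Return the highest likelihood among rules fully satisfied by the inputs."""
    historical, current = set(historical), set(current)
    matches = [
        likelihood
        for need_hist, need_cur, likelihood in rules
        if need_hist <= historical and need_cur <= current
    ]
    return max(matches, default=0.0)


print(attack_likelihood({"C"}, {"X", "Y"}))
print(attack_likelihood({"C"}, {"Y", "Z"}))
```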
  • In some embodiments, one or more behaviors observed in a new digital interaction may be checked against a fuzzy profile and a score may be computed that is indicative of a likelihood that the new digital interaction is part of an attack associated with the fuzzy profile. In this manner, an anchor value associated with the new digital interaction may be linked to a known malicious anchor value associated with the fuzzy profile.
  • The inventors have recognized and appreciated that the use of fuzzy profiles to link anchor values may be advantageous. For instance, fuzzy profiles may capture behavior characteristics that may be more difficult for an attacker to spoof, compared to other types of characteristics such as device characteristics. Moreover, in some embodiments, a fuzzy profile may be used across multiple web sites and/or applications. For example, when an attack occurs against a particular web site or application, a fuzzy profile may be created based on that attack (e.g., to identify linked anchor values) and may be used to detect similar attacks on a different web site or application. However, it should be appreciated that aspects of the present disclosure are not limited to the use of a fuzzy profile, as each of the techniques described herein may be used alone, or in combination with any one or more other techniques described herein.
  • Some retailers use Stock Keeping Units (SKUs) or other types of identifiers to identify products and/or services sold. This may allow analysis of sales data by product/service, for example, to identify historical purchase trends. In some embodiments, techniques are provided for identifying unexpected sale patterns. Although SKUs are used in some of the examples described herein, it should be appreciated that aspects of the present disclosure are not limited to the use of SKUs, as other types of identifiers for products and/or services may also be used.
  • The inventors have recognized and appreciated that a SKU may sometimes become incorrectly priced in a retailer's inventory management software. This may be the result of a glitch or bug in the software, or a human error. As one example, a product that normally sells for $1,200.00 may be incorrectly priced at $120.00, which may lead to a sharp increase in the number of purchases of that product. In an automated retail environment, such as e-commerce, the retailer may inadvertently allow transactions to complete and ship goods at a loss. Examples of other problems that may lead to anomalous sales data include, but are not limited to, consumers exploiting unexpected coupon code interactions, consumers violating sale policies (e.g., limit one item per customer at a discounted price), and commercial resellers attempting to take advantage of consumer-only pricing.
  • Accordingly, in some embodiments, techniques are provided for detecting unexpected sale patterns and notifying retailers so that any underlying problems may be corrected. For example, a security system may be programmed to monitor purchase activity (e.g., per SKU or group of SKUs) and raise an alert when significant deviation from an expected baseline is observed. The security system may use any suitable technique for detecting unexpected sale patterns, including, but not limited to, using a fuzzy profile as described herein.
  • The inventors have recognized and appreciated that some systems only analyze historical sales data (e.g., sale patterns for previous month or year). As a result, retailers may not be able to discover issues such as those discussed above until the damage has been done (e.g., goods shipped and transactions closed). Accordingly, in some embodiments, techniques are provided for analyzing sales data and alerting retailers in real time (e.g., before sending confirmations to consumers, when payments are still being processed, before goods are shipped, before goods are received by consumers, before transactions are marked closed, etc.).
  • In some embodiments, one or more automated countermeasures may be implemented in response to an alert. For example, a retailer may automatically freeze sales transactions that are in progress, and/or remove a SKU from the website, until an investigation is conducted. Additionally, or alternatively, one or more recommendations may be made to a retailer (e.g., based on profit/loss calculations), so that the retailer may decide to allow or block certain activities depending on projected financial impact.
  • In some embodiments, data relating to sales activities may be collected and stored in a database. One or more metrics may then be derived from the stored data. Examples of metrics that may be computed for a particular SKU or group of SKUs include, but are not limited to, proportion of transactions including that SKU or group of SKUs (e.g., out of all transactions at a website or group of websites), average number of items of that SKU or group of SKUs purchased in a single transaction or over a certain period of time by a single buyer, etc.
  • In some embodiments, one or more metrics derived from current sales activities may be compared against historical data. For example, JavaScript code running on a website may monitor one or more sales activities and compare one or more current metrics against historical data. However, it should be appreciated that aspects of the present disclosure are not limited to the use of JavaScript, as any suitable client-side and/or server-side programs written in any suitable language may be used to implement any one or more of the functionalities described herein.
  • In some embodiments, an alert may be raised if one or more current metrics represent a significant deviation from one or more historically observed baselines. The one or more metrics may be derived in any suitable manner. For instance, in some embodiments, a metric may pertain to all transactions conducted over a web site or group of web sites, or may be specific to a certain anchor value such as a certain IP address or a certain user account. Additionally, or alternatively, a metric may be per SKU or group of SKUs.
  • As a non-limiting example, an electronics retailer may sell a particular model of television for $1,200.00. Historical sales data may indicate one or more of the following:
      • a small percentage (e.g., 1%) of transactions site-wide include this particular model of television;
      • a large percentage (e.g., 99%) of transactions including this model of television include only one television;
      • on average the retailer sells a moderate number (e.g., 30) of televisions of this model per month;
      • an average value of transactions including this model of television is $1,600.00 (or some other value close to the price of this model of television);
      • sales of this model of television spike during one or more specific time periods, such as on or around Black Friday or Boxing Day;
      • sales of this model of television drop during summer months;
      • etc.
  • In some embodiments, a system may be provided that is programmed to use historical data (e.g., one or more of the observations noted above) as a baseline to intelligently detect notable deviations. For instance, with reference to the television example described above, if the retailer's stock keeping system incorrectly priced the $1,200.00 model of television at $120.00, one or more of the following may be observed:
      • the proportion of transactions site-wide including this particular model of television increases sharply (e.g., from 1% to 4%);
      • transactions including this model of television suddenly start including multiple televisions;
      • the retailer has sold more televisions in the last 24 hours than the retailer does typically in one month;
      • the average value of transactions including this model of television drops substantially;
      • etc.
  • In some embodiments, alerts may be triggered based on observations such as those described above. As one example, any one of a designated set of observations may trigger an alert. As another example, a threshold number of observations (e.g., two, three, etc.) from a designated set of observations may trigger an alert. As yet another example, one or more specific combinations of observations may trigger an alert.
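  • The threshold-number approach to alerting might be sketched in Python as follows, using the television example. The metric names and the multipliers used to decide what counts as a "sharp" or "substantial" change are hypothetical and would need to be tuned to the retailer's own data.

```python
def sales_observations(current, baseline):
    """Flag deviations of current per-SKU metrics from a historical baseline."""
    return {
        "share_spike": current["share_of_transactions"]
        >= 3 * baseline["share_of_transactions"],
        "multi_unit_orders": current["avg_units_per_transaction"]
        > 1.5 * baseline["avg_units_per_transaction"],
        "volume_spike": current["units_last_24h"] > baseline["units_per_month"],
        "value_drop": current["avg_transaction_value"]
        < 0.5 * baseline["avg_transaction_value"],
    }


def should_alert(observations, min_observations=2):
    """Alert when at least a threshold number of observations are present."""
    return sum(observations.values()) >= min_observations


baseline = {"share_of_transactions": 0.01, "avg_units_per_transaction": 1.01,
            "units_per_month": 30, "avg_transaction_value": 1600.0}
mispriced = {"share_of_transactions": 0.04, "avg_units_per_transaction": 3.2,
             "units_last_24h": 45, "avg_transaction_value": 400.0}
normal = {"share_of_transactions": 0.011, "avg_units_per_transaction": 1.0,
          "units_last_24h": 1, "avg_transaction_value": 1580.0}

print(should_alert(sales_observations(mispriced, baseline)))
print(should_alert(sales_observations(normal, baseline)))
```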
  • In some embodiments, when an alert is raised, a retailer may be notified in real time. In this manner, the retailer may be able to investigate and correct one or more errors that led to the anomalous sales activities, before significant damage is done to the retailer's business.
  • Although an example is described above relating to mispriced items, it should be appreciated that the techniques described herein may be used in other scenarios as well. For example, one or more of the techniques described herein may be used to detect abuse of sale prices, new customer loss-leader deals, programming errors relating to certain coupon codes, resellers buying out stock, etc. Any of these and/or other anomalies may be detected from a population of transactions.
  • In some embodiments, an online behavior scoring system may calculate a risk score for an anchor value, where the anchor value may be associated with an entity such as a human user or a bot. The risk score may indicate a perceived likelihood that the associated entity is malicious (e.g., being part of an attack). Risk scores may be calculated using any suitable combination of one or more techniques, including, but not limited to:
      • Analyzing traffic volumes over one or more dimensions such as IP, UID stored in a cookie, device fingerprint, etc. Observations may be compared against a baseline, which may be derived from one or more legitimate samples.
      • Analyzing historical access patterns. For example, a system may detect a new user ID and device association (e.g., a user logging in from a newly purchased mobile phone). The system may observe a rate at which requests associated with the user ID are received from the new device, and may compare the newly observed rate against a rate at which requests associated with the user ID were received from a previous device. Additionally, or alternatively, the system may observe whether requests are distributed in a similar manner throughout different times of day (e.g., whether more or fewer requests are received at a certain time of day).
      • Checking reputation of origins, for example, using honeypots, IP blacklists, and/or TOR lists.
      • Using one-time-use tokens to detect replays of old communication.
      • Altering forms to detect GUI replay or screen macro agents, for example, by adding or removing fields, altering x/y coordinates of fields, etc.
  • The inventors have recognized and appreciated that a sophisticated attacker may be able to detect when some of the above-described techniques are deployed, and to react accordingly to avoid appearing suspicious. Accordingly, in some embodiments, techniques are provided for monitoring online behavior in a manner that is transparent to entities being monitored.
  • In some embodiments, one or more security probes may be deployed dynamically to obtain information regarding an entity. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the entity is sufficiently high to warrant an investigation (e.g., when recent activities of an entity represent a significant deviation from an activity pattern observed in the past for that entity). The inventors have recognized and appreciated that by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.
  • FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments. For instance, the process 1700 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if and when to deploy one or more security probes.
  • At act 1705, the security system may receive data regarding a digital interaction. For instance, as discussed in connection with FIG. 1B, the security system may receive log files comprising data recorded from digital interactions. The security system may process the received data and store salient information into an appropriate data structure, such as the illustrative data structure 220 shown in FIG. 2B. The stored information may be used, at act 1710, to determine if the digital interaction is suspicious.
  • Any suitable technique may be used to determine if the digital interaction is suspicious. For instance, the illustrative process 1400 shown in FIG. 14 may be used to determine if the digital interaction matches a fuzzy profile that stores anomalous attributes. If a resulting penalty score is below a selected threshold, the security system may proceed to act 1715 to perform standard operation. Otherwise, the security system may proceed to act 1720 to deploy a security probe, and data collected by the security probe from the digital interaction may be analyzed at act 1725 to determine if further action is appropriate.
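  • The branching among acts 1710 through 1725 may be summarized by the short Python sketch below. The function name and the callback parameters `deploy_probe` and `analyze` are placeholders assumed for this example, as are the stand-in probe and analysis functions and the threshold of 300.

```python
def run_process_1700(penalty_score, threshold, deploy_probe, analyze):
    """Deploy a probe only when the penalty score reaches the threshold."""
    if penalty_score < threshold:
        return "standard_operation"  # act 1715
    probe_data = deploy_probe()      # act 1720
    return analyze(probe_data)       # act 1725


deployments = []


def fake_probe():
    # Stand-in for a real probe; records that a deployment happened.
    deployments.append("probe")
    return {"js_engine_ok": False}


def fake_analyze(data):
    return "standard_operation" if data["js_engine_ok"] else "further_action"


low = run_process_1700(120, 300, fake_probe, fake_analyze)
high = run_process_1700(400, 300, fake_probe, fake_analyze)
print(low, high, len(deployments))
```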
  • The penalty score threshold may be chosen in any suitable manner. For instance, the inventors have recognized and appreciated that, while it may be desirable to collect more data from digital interactions, the security system may have limited resources such as network bandwidth and processing power. Therefore, to conserve resources, security probes should be deployed judiciously. Moreover, the inventors have recognized and appreciated that frequent deployment of probes may allow an attacker to study the probes and learn how to evade detection. Accordingly, in some embodiments, a penalty score threshold may be selected to provide a desired tradeoff.
  • It should be appreciated that aspects of the present disclosure are not limited to the use of a fuzzy profile to determine if and when to deploy a security probe. Additionally, or alternatively, a profile associated with an anchor value observed from the digital interaction may be used to determine if the digital interaction is sufficiently similar to prior digital interactions from which the anchor value was observed. If it is determined that the digital interaction is not sufficiently similar to prior digital interactions from which the anchor value was observed, one or more security probes may be deployed to gather additional information from the digital interaction.
  • In some embodiments, a security system may be configured to segment traffic over one or more dimensions, including, but not limited to, IP Address, XFF IP Address, C-Class IP Address, Input Signature, Account ID, Device ID, User Agent, etc. For instance, each digital interaction may be associated with one or more anchor values, where each anchor value may correspond to a dimension for segmentation. This may allow the security system to create segmented lists. As one example, a segmented list may be created that includes all traffic reporting Chrome as the user agent. Additionally, or alternatively, a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent. In this manner, segmented lists may be created at any suitable granularity. As another example, a segmented list may include all traffic reporting Mac OS X 10.9.2 as the operating system. Additionally, or alternatively, a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent and Mac OS X 10.9.2 as the operating system. In this manner, segmented lists may be created with any suitable combination of one or more anchor values.
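  • Segmentation over any combination of dimensions may be pictured as grouping interactions by a tuple of anchor values, as in the following Python sketch. The field names `user_agent` and `os`, the function name, and the sample records other than the Chrome and Mac OS X versions mentioned above are assumptions for illustration.

```python
from collections import defaultdict


def segment_traffic(interactions, dimensions):
    """Group interactions into segmented lists keyed by the chosen dimensions."""
    segments = defaultdict(list)
    for interaction in interactions:
        key = tuple(interaction.get(d) for d in dimensions)
        segments[key].append(interaction)
    return dict(segments)


interactions = [
    {"user_agent": "Chrome 36.0.1985.125", "os": "Mac OS X 10.9.2"},
    {"user_agent": "Chrome 36.0.1985.125", "os": "Windows 7"},
    {"user_agent": "Firefox 31.0", "os": "Mac OS X 10.9.2"},
]

by_user_agent = segment_traffic(interactions, ["user_agent"])
by_user_agent_and_os = segment_traffic(interactions, ["user_agent", "os"])
print(len(by_user_agent), len(by_user_agent_and_os))
```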
  • In some embodiments, one or more metrics may be collected and stored for a segmented list. For instance, a segmented list (e.g., all traffic associated with a particular IP address or block of IP addresses) may be associated with a segment identifier, and one or more metrics collected for that segmented list may be stored in association with the segment identifier. Examples of metrics that may be collected include, but are not limited to, average risk score, minimum risk score, maximum risk score, number of accesses within some window of time (e.g., the last 5 minutes, 10 minutes, 15 minutes, 30 minutes, 45 minutes, hour, 2 hours, 3 hours, 6 hours, 12 hours, 24 hours, day, 2 days, 3 days, 7 days, two weeks, etc.), geographic data, etc.
  • In some embodiments, a security system may use one or more metrics stored for a segmented list to determine whether a security probe should be deployed. For example, a security probe may be deployed when one or more metrics exceed corresponding thresholds. The security system may select one or more probes based on a number of different factors, such as which one or more metrics have exceeded the corresponding thresholds, by how much the one or more metrics have exceeded the corresponding thresholds, and/or which segmented list is implicated.
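  • A minimal Python sketch of per-segment metrics and a threshold check is given below. The class name, metric keys, and threshold values are hypothetical; the returned mapping of exceeded metrics to the amount of excess is one way to supply the factors (which metrics, and by how much) that may inform probe selection.

```python
class SegmentMetrics:
    """Risk-score metrics stored in association with a segment identifier."""

    def __init__(self, segment_id):
        self.segment_id = segment_id
        self.risk_scores = []

    def record(self, risk_score):
        self.risk_scores.append(risk_score)

    def summary(self):
        scores = self.risk_scores
        return {
            "avg_risk": sum(scores) / len(scores) if scores else 0.0,
            "min_risk": min(scores, default=0.0),
            "max_risk": max(scores, default=0.0),
            "accesses": len(scores),
        }


def exceeded_thresholds(summary, thresholds):
    """Return each metric above its threshold, with the amount of excess."""
    return {
        name: summary[name] - limit
        for name, limit in thresholds.items()
        if summary[name] > limit
    }


metrics = SegmentMetrics("ip:203.0.113.0/24")
for score in (10, 20, 90):
    metrics.record(score)

exceeded = exceeded_thresholds(metrics.summary(), {"avg_risk": 50, "max_risk": 80})
print(exceeded)
```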
  • Thresholds for metrics may be determined in any suitable manner. For instance, in some embodiments, one or more human analysts may examine historical data (e.g., general population data, data relating to traffic that turned out to be associated with an attack, data relating to traffic that was not identified as being associated with an attack, etc.), and may select the thresholds based on the historical data (e.g., to achieve a desired tradeoff between false positive errors and false negative errors). Additionally, or alternatively, one or more techniques described below in connection with threshold-type sensors may be used to select thresholds automatically.
  • The inventors have recognized and appreciated that some online behavior scoring systems use client-side checks to collect information. In some instances, such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.). As a result, an attacker may be able to adapt and evade detection. Accordingly, in some embodiments, techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce a number of probes deployed to, for example, one in hundreds of thousands of interactions), client-side functionality obfuscation may reduce a likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.
  • FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments. In this example, one or more handlers may be programmed to read from a segmented list (e.g., by reading one or more metrics associated with the segmented list) and determine whether and/or how a probe should be deployed. Examples of handlers include, but are not limited to, an initialization handler programmed to handle initialization requests and return HTML code, and/or an Ajax (asynchronous JavaScript and XML) handler programmed to respond to Ajax requests. Additionally, or alternatively, one or more handlers (e.g., a score handler programmed to calculate risk scores) may be programmed to write to a segmented list (e.g., by updating one or more metrics associated with the segmented list, such as average, minimum, and/or maximum risk scores). However, aspects of the present disclosure are not limited to the use of handlers, as other types of programs may also be used to implement any of the functionalities described herein.
  • In some embodiments, a write to a segmented list may trigger one or more reads from the segmented list. For example, whenever a score handler updates a risk score metric, a cycle may be started and an initialization handler and/or Ajax handler may read one or more segmented lists affected by the update. In this manner, whenever a new event takes place that affects a metric, a fresh determination may be made as to whether to deploy one or more probes. However, aspects of the present disclosure are not limited to the implementation of such cycles, as in some embodiments a segmented list may be read periodically, regardless of observations from new events.
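  • The write-triggers-read cycle resembles a simple observer arrangement, sketched below in Python. The class and method names, the example segment identifier, and the threshold of 80 are assumptions for illustration; the reader callback stands in for an initialization or Ajax handler.

```python
class SegmentedList:
    """Segmented list whose metric writes trigger reads by registered handlers."""

    def __init__(self, segment_id):
        self.segment_id = segment_id
        self.metrics = {}
        self._readers = []

    def on_write(self, reader):
        self._readers.append(reader)

    def write(self, metric, value):
        self.metrics[metric] = value
        for reader in self._readers:  # each write starts a fresh read cycle
            reader(self)


decisions = []


def initialization_handler(segment):
    # Re-evaluate whether to deploy a probe every time a metric changes.
    deploy = segment.metrics.get("max_risk", 0) > 80
    decisions.append((segment.segment_id, deploy))


segment = SegmentedList("ip:203.0.113.0/24")
segment.on_write(initialization_handler)
segment.write("max_risk", 40)  # e.g., written by a score handler
segment.write("max_risk", 95)
print(decisions)
```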
  • In some embodiments, a probe may be deployed to one or more selected interactions only, as opposed to all interactions in a segmented list. For example, a probe may be deployed only to one or more suspected members in a segmented list (e.g., a member for which one or more measurements are at or above certain alert levels). Once a result is received from the probe, the result may be stored in association with the member and/or the segmented list, and the probe may not be sent again. In this manner, a probe may be deployed only a limited number of times, which may make it difficult for an attacker to detect what information the probe is collecting, or even the fact that a probe has been deployed. However, it should be appreciated that aspects of the present disclosure are not limited to such targeted deployment of probes, as in some embodiments a probe may be deployed to every interaction, or one or more probes may be deployed in a targeted fashion, while one or more other probes may be deployed to every interaction.
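  • Targeted, one-time deployment may be sketched in Python as follows. The `ProbeTracker` class, its method names, and the alert level of 80 are hypothetical; the point illustrated is only that a probe goes to suspected members and is not sent again once a result has been stored.

```python
class ProbeTracker:
    """Track probe results so each suspected member is probed at most once."""

    def __init__(self, alert_level):
        self.alert_level = alert_level
        self.results = {}  # member identifier -> stored probe result

    def should_deploy(self, member_id, measurement):
        suspected = measurement >= self.alert_level
        return suspected and member_id not in self.results

    def record_result(self, member_id, result):
        self.results[member_id] = result


tracker = ProbeTracker(alert_level=80)
first = tracker.should_deploy("device-1", measurement=92)
tracker.record_result("device-1", {"headless": True})
second = tracker.should_deploy("device-1", measurement=97)
other = tracker.should_deploy("device-2", measurement=35)
print(first, second, other)
```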
  • In some embodiments, a probe may use markup (e.g., image tag) already present on a web page to perform one or more functions. For example, any markup that requires a user agent to perform a computational action may be used as a probe. Additionally, or alternatively, a probe may include additional markup, JavaScript, and/or Ajax calls to a server. Some non-limiting examples of probes are described below.
      • IsRealJavaScript
        • One or more JavaScript statements to perform a function may be included on a web page, where a result of executing the function is to be sent back to a server. If the result is not received or is received but not correct, it may be determined that the client is not running a real JavaScript engine.
      • IsRunningHeadlessBrowser
        • A widget may be programmed to request graphics card information, viewport information, and/or window information (e.g., window.innerHeight, document.body.clientWidth, etc.). Additionally, or alternatively, the widget may be programmed to watch for mouse movement inside a form. If one or more results are missing or anomalous, it may be determined that the client is running a headless browser.
      • IsCookieEnabled
        • One or more cookies with selected names and values may be set in a user's browser, where the one or more values are to be sent back to a server. If the one or more values are not received, it may be determined that the browser is not allowing cookies.
      • IsUASpoofing
        • One or more JavaScript statements that behave in a certain recognizable manner for a purported browser type and/or version may be included. If the expected anomalous behavior is not seen, it may be determined that the user agent is being spoofed.
      • IsDeviceIDSpoofing
        • The inventors have recognized and appreciated that a device ID may be a dynamic combination of certain elements (e.g., relating to browser and/or hardware characteristics). A formula for deriving the device ID may be altered during a probe (e.g., increasing/decreasing length, and/or adding/omitting one or more elements). If the newly derived device ID is not as expected, it may be determined that the device ID is being spoofed.
      • IsReadingIDs
        • One or more values may be modified one or more times during a digital interaction. For example, one or more system form IDs may be modified before being delivered as HTML, and again after associated JavaScript code loads. Depending on which version of the one or more IDs is obtained by an attacker, a security system may deduce when in a transaction cycle the attacker is reading the one or more IDs.
      • IsFabricatingInputBehavior
        • Software code for a widget may be randomly modified to use different symbols for key and/or mouse events. If one or more symbols do not match, it may be determined that the input data has been fabricated.
      • IsReferencingSystemJS
        • One or more system JavaScript functions may be duplicated and hidden, and one or more alarms may be added to one or more original system functions. If an alarm is triggered, it may be determined that a third party is invoking a system function.
      • IsReplayingGUIMouseEvents
        • An Ajax response may be altered to include a Document Object Model (DOM) manipulation instruction to manipulate a GUI field or object. As one example, the DOM manipulation instruction may move a GUI field (e.g., a required field such as a submit button) to a different location in a GUI, and place an invisible yet fully functional field (e.g., another submit button) at the original location. If a form is submitted using the invisible field, it may be determined that the GUI events are a result of a replay or macro. As another example, the GUI field may be moved, but there may be no replacement field at the original location. If a “click” event nonetheless occurs at the original location, it may be determined that the GUI events are a result of a replay or macro. As yet another example, the DOM manipulation instruction may replace a first GUI field (e.g., a “Submit” button) with a second GUI field of the same type (e.g., a “Submit1” button). A human user completing the form legitimately may click the second GUI field, which is visible. A bot completing the form using a replay script may instead “click” the first GUI field, which is invisible.
      • IsReplayingGUIKeyEvents
        • Similar to IsReplayingGUIMouseEvents, this probe may hide a text input field, and place a differently named field at the original location. If the invisible field receives a key event, it may be determined that the event is a replay.
      • IsReplayingRecordedAjaxCalls
        • This probe may change an endpoint address of an Ajax call. If an old address is used for an Ajax call, it may be determined that the Ajax call is a replay.
      • IsAssumingAjaxBehavior
        • This probe may instruct a client to make multiple Ajax calls and/or one or more delayed Ajax calls. If an unexpected Ajax behavior pattern is observed, it may be determined that an attacker is fabricating an Ajax behavior pattern.
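  • As a non-limiting illustration of the IsReplayingRecordedAjaxCalls probe described above, the server-side bookkeeping might be sketched as follows. The class name, the "/ajax/" prefix, and the three labels are hypothetical and are introduced here only for illustration; the disclosure does not prescribe any particular implementation.

```python
import secrets


class RotatingEndpoint:
    """Tracks the live Ajax endpoint path and remembers retired ones."""

    def __init__(self, prefix="/ajax/"):
        self.prefix = prefix      # hypothetical URL prefix
        self.retired = set()      # paths that were valid before a rotation
        self.current = self._new_path()

    def _new_path(self):
        # A random suffix makes the next address hard to guess in advance.
        return self.prefix + secrets.token_hex(8)

    def rotate(self):
        """Deploy the probe: retire the current path and issue a new one."""
        self.retired.add(self.current)
        self.current = self._new_path()
        return self.current

    def classify(self, requested_path):
        """Label a request by the address it was sent to."""
        if requested_path == self.current:
            return "current"
        if requested_path in self.retired:
            return "possible replay"
        return "unknown"
```

  • In this sketch, a call that arrives at a retired address after rotate() has been invoked is labeled a possible replay, which corresponds to the "old address" condition described for the probe.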
  • Although several examples of probes are discussed above, it should be appreciated that aspects of the present disclosure are not limited to the use of any one probe or combination of probes, or any probe at all. For instance, in some embodiments, a probe may be deployed and one or more results of the probe may be logged (e.g., in association with a segment identifier and/or alongside one or more metrics associated with the segment identifier). Such a result may be used to determine if a subsequent probe is to be deployed. Additionally, or alternatively, such a result may be used to facilitate scoring and/or classifying future digital interactions.
  • In one example, a same form of input pattern may be observed several times in a short window of time, which may represent an anomalously high rate. Additionally, it may be observed that a same user agent is involved in all or a significant portion of the digital interactions exhibiting the suspicious input pattern. This may indicate a potential high volume automated attack, and may cause one or more probes to be deployed to obtain more information about a potential automation method.
  • In some embodiments, multiple security probes may be deployed, where each probe may be designed to discover different information. For example, information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next. In this manner, the security system may be able to gain an in-depth understanding into network traffic (e.g., website and/or application traffic). For instance, the security system may be able to classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate. These results may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such results may be more meaningful than just a numeric score. For instance, if multiple probe results indicate a digital interaction is legitimate, a determination may be made that an initial identification of the digital interaction as being suspicious may be a false positive identification.
  • FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments. Like the illustrative process 1700 of FIG. 17, the process 1900 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if and when to deploy one or more security probes.
  • Acts 1905, 1910, 1915, and 1920 of the process 1900 may be similar to acts 1705, 1710, 1715, and 1720 of the process 1700, respectively. At act 1925, the security system may analyze data collected by a probe of a first type (e.g., Probe 1) deployed at act 1920 to determine what type of probe to further deploy to the digital interaction. For example, if a result of Probe 1 is positive (e.g., a suspicious pattern is identified), a probe of a second type (e.g., Probe 2) may be deployed at act 1930 to further investigate the digital interaction. At act 1940, the security system may analyze data collected by Probe 2 to determine what, if any, action may be appropriate.
  • If instead the result of Probe 1 is negative (e.g., no suspicious pattern identified) at act 1925, a probe of a third type (e.g., Probe 3) may be deployed at act 1935 to further investigate the digital interaction. At act 1945, the security system may analyze data collected by Probe 3 to determine what, if any, action may be appropriate.
  • As an example, a first probe may be deployed to verify if the client is running JavaScript. This probe may include a JavaScript snippet, and may be deployed only in one or a small number of suspicious interactions, to make it more difficult for an attacker to detect the probe. If a result of the first probe indicates that the client is running JavaScript, the security system may determine that an attacker may be employing some type of GUI macro, and a subsequent probe may be sent to confirm this hypothesis (e.g., by altering a layout of a form). If a result of the first probe indicates that the client is not running JavaScript, the security system may determine that an attacker may be employing some type of CLI script, and a subsequent probe may be sent to further discover one or more script capabilities and/or methods used to spoof form input. This decision-making pattern may be repeated until all desired information has been collected about the potential attack.
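  • The branching in the above example may be sketched, purely for illustration, as a small function. The pairing of each hypothesis with a specific follow-up probe name is an assumption made for this sketch; the description above only states that a layout-altering probe may follow in one branch and a capability-discovering probe in the other.

```python
def choose_follow_up(client_runs_javascript):
    """Map the first probe's result to a working hypothesis and a next probe."""
    if client_runs_javascript:
        # A real JavaScript engine suggests a GUI macro; altering the form
        # layout (as in IsReplayingGUIMouseEvents) may confirm this.
        return {
            "hypothesis": "GUI macro",
            "next_probe": "IsReplayingGUIMouseEvents",  # assumed choice
        }
    # No JavaScript suggests a CLI script; a capability probe may follow.
    return {
        "hypothesis": "CLI script",
        "next_probe": "IsCookieEnabled",  # assumed choice
    }
```

  • A security system might call such a function repeatedly, feeding each probe result into the next decision, until the desired information has been collected.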
  • It should be appreciated that aspects of the present disclosure are not limited to the use of the illustrative decision process described above. For instance, FIG. 20 shows an example of a decision tree that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.
  • In some embodiments, some or all JavaScript code may be obfuscated before being sent to a client. For instance, one or more obfuscation techniques may be used to hide logic for one or more probes. Examples of such techniques include, but are not limited to, symbol renaming and/or re-ordering, code minimization, logic shuffling, and fabrication of meaningless logic (e.g., additional decision and control statements that are not required for the probe to function as intended). The inventors have recognized and appreciated that one or more of these and/or other techniques may be applied so that a total amount of code (e.g., in terms of number of statements and/or number of characters) does not increase significantly despite the inclusion of one or more probes, which may reduce the likelihood of an attacker discovering a probe. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any probe obfuscation technique.
  • Some security systems use threshold-type sensors to trigger actions. For instance, a sensor may be set up to monitor one or more attributes of an entity and raise an alert when a value of an attribute falls above or below an expected threshold. Similarly, an expected range may be used, and an alert may be raised when the value of the attribute falls outside the expected range. The threshold or range may be determined manually by one or more data scientists, for example, by analyzing a historical data set to identify a set of acceptable values and setting the threshold or range based on the acceptable values.
  • The inventors have recognized and appreciated some disadvantages of the above-described approach for tuning sensors. For example:
      • The above-described approach assumes that a historical data set already exists or will be collected. Depending on the volume of digital interactions, it may take a month or more to collect a data set of an appropriate sample size.
      • In some instances, significant processing and modeling may be performed on the dataset, which may take more than one week.
      • The analysis of the historical data set may require a significant amount of human involvement.
  • Accordingly, in some embodiments, a security system is provided that is programmed to monitor one or more digital interactions and tune a sensor based on data collected from the digital interactions. Such monitoring and tuning may be performed with or without human involvement. In some embodiments, the monitoring and tuning may be performed in real time, which may allow the security system to react to an attack as soon as the attack is suspected, rather than waiting for data to be accumulated and analyzed over several weeks. In this manner, one or more actions may be taken while the attack is still on-going to stop the attack and/or control damages. However, it should be appreciated that real time tuning is not required, as data may alternatively, or additionally, be accumulated and analyzed after the attack.
  • In some embodiments, a security system may be configured to use one or more sensors to collect data from one or more digital interactions. The system may analyze the collected data to identify a baseline of expected behavior, and then use the identified baseline to tune the one or more sensors, thereby providing a feedback loop. For example, in some embodiments, the system may accumulate the data collected by the one or more sensors over time and use the accumulated data to build a model of baseline behavior.
  • In some embodiments, data collected by one or more sensors may be segmented. The inventors have recognized and appreciated that segmentation may allow a security system to deal with large amounts of data more efficiently. For instance, the security system may group observed entities and/or digital interactions into buckets based on certain shared characteristics. As one example, each entity or digital interaction may be associated with one of several buckets based on a typing speed detected for the entity or digital interaction. The buckets may be chosen in any suitable manner. For instance, more buckets may be used when finer-grained distinctions are desirable. In one example, an entity or digital interaction may be associated with one of four different buckets based on typing speed: 0-30 words per minute, 31-60 words per minute, 61-90 words per minute, and 90+ words per minute. Other configurations of buckets are also possible, as aspects of the present disclosure are not limited to the use of any particular configuration. Also, it should be appreciated that segmentation may be performed on any type of measurements, including, but not limited to, typing speed, geo-location, user agent, and/or device ID.
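  • The four-bucket typing speed example above may be sketched as follows. The function name is hypothetical, and two choices are assumptions of this sketch rather than part of the description: a speed of exactly 90 words per minute is placed in the 61-90 bucket, and a fractional speed is placed in the next bucket up from the boundary it exceeds.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three buckets, in words per minute.
UPPER_BOUNDS = [30, 60, 90]
LABELS = ["0-30", "31-60", "61-90", "90+"]


def typing_speed_bucket(words_per_minute):
    """Return the label of the bucket that a typing speed falls into."""
    if words_per_minute < 0:
        raise ValueError("typing speed cannot be negative")
    # bisect_left returns the index of the first bound >= the speed,
    # or len(UPPER_BOUNDS) when the speed exceeds every bound.
    return LABELS[bisect_left(UPPER_BOUNDS, words_per_minute)]
```

  • Finer-grained distinctions may be obtained in this sketch by adding entries to both lists.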
  • In some embodiments, data collected by one or more sensors may be quantized to reduce the number of possible values for a particular attribute, which may allow a security system to analyze the data more efficiently. In some embodiments, quantization may be performed using a hash-modding process, which may involve hashing an input value and performing a modulo operation on the resulting hash value. However, it should be appreciated that aspects of the present disclosure are not limited to the use of hash-modding, as other quantization methods may also be suitable.
  • In some embodiments, a hashing technique may be used that produces a same hash value every time given a same input value, and the hash value may be such that it is difficult to reconstruct the input from the hash value alone. Such a hash function may allow comparison of attribute values without exposing actual data. For example, a security system may hash a credit card number to produce an alphanumeric string such as the following:
  • 12KAY8XOOW0881PWBM81KJCUYPDXHG
  • If hashed again in the future, the same credit card number may produce the same hash value. Furthermore, the hash function may be selected so that no two inputs are mapped to the same hash value, or the number of such pairs is small. As a result, the likelihood of two different credit card numbers producing the same hash value may be low, and the security system may be able to verify if a newly submitted credit card number is the same as a previously submitted credit card number by simply computing a hash value of the newly submitted credit card number and comparing the computed hash value against a stored hash value of the previously submitted credit card number, without having to store the previously submitted credit card number.
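  • A minimal sketch of such a comparison is shown below. It uses a keyed hash (HMAC with SHA-256) rather than a plain hash; this is an assumption of the sketch, chosen because card numbers are drawn from a small space and an unkeyed hash could be reversed by enumeration. The function names and the key handling are hypothetical.

```python
import hashlib
import hmac


def card_fingerprint(card_number, key):
    """Return a hex fingerprint of a card number under a secret key."""
    # Keep only ASCII digits so that spacing and dashes do not matter.
    digits = "".join(ch for ch in card_number if ch in "0123456789")
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()


def matches_stored(card_number, key, stored_fingerprint):
    """Check a newly submitted number against a stored fingerprint."""
    candidate = card_fingerprint(card_number, key)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(candidate, stored_fingerprint)
```

  • In this sketch only the fingerprint is stored, so a later submission can be compared against it without the original number being retained.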
  • The inventors have recognized and appreciated that a hash function may be used to convert input data (including non-numerical input data) into numerical values, while preserving a distribution of the input data. For example, a distribution of output hash values may approximate the distribution of the input data.
  • In some embodiments, a modulo operation (e.g., mod M, where M is a large number) may be applied to a numerical value resulting from hashing or otherwise converting an input value. This may reduce a number of possible output values (e.g., to M, if the modulo operation is mod M). Some information on the distribution of the input data may be lost, as multiple input values may be mapped to the same number under the modulo operation. However, the inventors have recognized and appreciated that sufficient information may be retained for purposes of detecting anomalies.
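  • The hash-modding process described above may be sketched as follows. SHA-256 is used here only as one suitable hash function, and the function name and the UTF-8 encoding of the input are assumptions of this sketch.

```python
import hashlib


def hash_mod(value, modulus):
    """Quantize an arbitrary value into one of `modulus` buckets."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    # Hash the textual form of the value, then read the digest as an integer.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % modulus
```

  • For a given input the function returns the same bucket every time, and distinct inputs may share a bucket, which is the loss of information noted above.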
  • In some embodiments, a hash-modding process may be applied in analyzing network addresses. The addresses may be physical addresses and/or logical addresses, as aspects of the present disclosure are not limited to the use of hash-modding to analyze any particular type of input data. The inventors have recognized and appreciated that some network addresses are long. For example, an Internet Protocol version 4 (IPv4) address may include 32 bits, while an Internet Protocol version 6 (IPv6) address may include 128 bits (e.g., eight groups of four hexadecimal digits). The inventors have recognized and appreciated that comparing such addresses against each other (e.g., comparing a currently observed address against a set of previously observed addresses) may require a significant amount of time and/or processing power. Therefore, it may be beneficial to reduce the length of each piece of data to be compared, while preserving the salient information contained in the data.
  • In one illustrative example, the following IP addresses may be observed.
  • 22.231.113.64
  • 194.66.82.11
  • These addresses may be hashed to produce the following values, respectively.
  • 9678a5be1599cb7e9ea7174aceb6dc93
  • 6afd70b94d389a30cb34fb7f884e9941
  • In some embodiments, instead of comparing the input IP addresses against each other, or the hash values against each other, a security system may only compare portions of the hash values. For instance, a security system may extract one or more digits from each hash value, such as one or more least significant digits (e.g., one, two, three, four, five, six, seven, eight, nine, ten, etc.), and compare the extracted digits. In the above example, two least significant digits may be extracted from each hash value, resulting in the values 93 and 41, respectively. It may be more efficient to compare 93 against 41, as opposed to comparing 22.231.113.64 against 194.66.82.11.
  • The extraction of one or more least significant digits may be equivalent to a modulo operation. For example, extracting one least significant hexadecimal digit may be equivalent to mod 16, extracting two least significant hexadecimal digits may be equivalent to mod 256, etc. However, it should be appreciated that aspects of the present disclosure are not limited to the use of base-16 numbers, as one or more other numeral systems (e.g., base 2, base 8, base 10, base 64, etc.) may be used instead of, or in addition to, base 16.
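  • The equivalence between digit extraction and a modulo operation may be checked with a short sketch using the two illustrative hash values given above. The function names are hypothetical.

```python
def low_hex_digits(hex_hash, count):
    """Return the `count` least significant hexadecimal digits as text."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return hex_hash[-count:]


def mod_power_of_16(hex_hash, count):
    """Return the hash value modulo 16**count as an integer."""
    return int(hex_hash, 16) % (16 ** count)


HASHES = [
    "9678a5be1599cb7e9ea7174aceb6dc93",
    "6afd70b94d389a30cb34fb7f884e9941",
]

for h in HASHES:
    for count in (1, 2, 4):
        # Extracting digits and taking the remainder give the same number.
        assert int(low_hex_digits(h, count), 16) == mod_power_of_16(h, count)
```

  • With two digits, the extracted values 93 and 41 are hexadecimal and correspond to 147 and 65 in decimal, which are the remainders of the two hash values mod 256.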
  • The inventors have recognized and appreciated that, if the extracted digits for two input IP addresses are different, the security system may infer, with 100% confidence, that the two input IP addresses are different. Thus, hash-modding may provide an efficient way to confirm that two input IP addresses are different. The inventors have further recognized and appreciated that, if the extracted digits for two input IP addresses are the same, the security system may infer, with some level of confidence, that the two input IP addresses are the same.
  • In some embodiments, a level of confidence that two input IP addresses are the same may be increased by extracting and comparing more digits. For instance, in response to determining that the extracted digits for two input IP addresses are the same, two more digits may be extracted from each corresponding hash value and compared. This may be repeated until a suitable stopping condition is reached, for example, if the newly extracted digits are different, or some threshold number of digits have been extracted. The threshold number may be selected to provide a desired level of confidence that the two input IP addresses are the same. In this manner, additional processing to extract and compare more digits may be performed only if the processing that has been done does not yield a definitive answer. This may provide improved efficiency. However, it should be appreciated that aspects of the present disclosure are not limited to extracting and comparing digits in two-digit increments, as in some embodiments extraction and comparison may be performed in one-digit increments, three-digit increments, four-digit increments, etc., or in some non-uniform manner. Furthermore, in some embodiments, all digits may be extracted and compared at once, with no incremental processing.
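  • The incremental comparison described above may be sketched as follows, assuming that the two hash values are hexadecimal strings of equal length. The function name, the default increment of two digits, and the default threshold of eight digits are assumptions of this sketch.

```python
def compare_low_digits(hash_a, hash_b, step=2, max_digits=8):
    """Compare two hex strings from the least significant end, in increments.

    Returns (same_so_far, digits_compared). A False result is definitive;
    a True result only means no difference was found within max_digits.
    """
    if step < 1 or max_digits < 1:
        raise ValueError("step and max_digits must be at least 1")
    if len(hash_a) != len(hash_b) or len(hash_a) < max_digits:
        raise ValueError("hashes must have equal length of at least max_digits")
    compared = 0
    while compared < max_digits:
        take = min(step, max_digits - compared)
        end = len(hash_a) - compared
        # Stop as soon as the newly extracted digits differ.
        if hash_a[end - take:end] != hash_b[end - take:end]:
            return False, compared + take
        compared += take
    return True, compared
```

  • For the two illustrative hash values given earlier, the first increment (93 versus 41) already differs, so the sketch stops after two digits.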
  • The inventors have recognized and appreciated that observed IP addresses may cluster around certain points. For instance, a collection of IP addresses may share a certain prefix. An example of clustered addresses is shown below:
  • 1.22.231.113.64
  • 1.22.231.113.15
  • 1.22.231.113.80
  • 1.22.231.113.80
  • 1.22.231.113.52
  • The inventors have further recognized and appreciated that, by hashing IP addresses, the observations may be spread more evenly across a number line. For example, the following three addresses may be spread out after hashing, even though they share nine out of eleven digits.
  • 1.22.231.113.64
  • 1.22.231.113.15
  • 1.22.231.113.52
  • On the other hand, the following two addresses may be hashed to the same value because they are identical, and that hash value may be spaced apart from the hash values for the above three addresses.
  • 1.22.231.113.80
  • 1.22.231.113.80
  • In some embodiments, IP addresses may be hashed into a larger space, for example, to spread out the addresses more evenly, and/or to decrease the likelihood of collisions. For instance, a 32-bit IPv4 address may be hashed into a 192-bit value, and likewise for a 128-bit IPv6 address. However, it should be appreciated that aspects of the present disclosure are not limited to the use of 192-bit hash values. Moreover, any suitable hash function may be used, including, but not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.
  • In some embodiments, hash-modding may be used to analyze any suitable type of input data, in addition to, or instead of, IP addresses. The inventors have recognized and appreciated that hash-modding may provide a variable resolution with variable accuracy, which may allow storage requirement and/or efficiency to be managed. For instance, in some embodiments, a higher resolution (e.g., extracting and comparing more digits) may provide more certainty about an observed behavior, but even a lower resolution may provide sufficient information to label the observed behavior. For example, even with a relatively low resolution of 10 bits (and thus 2^10 = 1024 possible output values), a security system may be able to differentiate, with a reasonable level of certainty, whether a user is typing the same password 10 times, or trying 10 different passwords, because the likelihood of 10 randomly chosen passwords all having the same last 10 bits after hash-modding may be sufficiently low.
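  • The likelihood mentioned in the above example may be estimated with a short calculation, under the assumption (made for this sketch) that hash outputs for different passwords are independent and uniformly distributed over the buckets. The function name is hypothetical.

```python
from fractions import Fraction


def all_same_bucket_probability(num_values, bits):
    """Probability that num_values independent uniform values share a bucket."""
    if num_values < 1 or bits < 0:
        raise ValueError("need at least one value and a non-negative bit count")
    buckets = 2 ** bits
    # The first value may land anywhere; each later one must match it.
    return Fraction(1, buckets) ** (num_values - 1)
```

  • Under that assumption, 10 different passwords share the same last 10 bits with probability (1/1024)^9 = 2^-90, which is on the order of 10^-27.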
  • Although various techniques are described above for modeling any type of input data as a numerical data set, it should be appreciated that such examples are provided solely for purposes of illustration, and that other implementations may be possible. For instance, although a hash function may be used advantageously to anonymize input data, one or more other functions (e.g., a one-to-one function with numerical output values) may, alternatively, or additionally, be used to convert input data. Moreover, in some embodiments, a modulo operation may be performed directly on an input, without first hashing the input (e.g., where the input is already a numerical value). However, it should be appreciated that aspects of the present disclosure are not limited to the use of a modulo operation. One or more other techniques for dividing numerical values into buckets may be used instead of, or in addition to, a modulo operation.
  • In some embodiments, a security system may create a feedback loop to gain greater insight into historical trends. For example, the system may adapt a baseline for expected behavior and/or anomalous behavior (e.g., thresholds for expected and/or anomalous values) based on current population data and/or historical data. Thus, a feedback loop may allow the system to “teach” itself what an anomaly is by analyzing historical data.
  • As one example, a system may determine from historical data that a particular user agent is associated with a higher risk for fraud, and that the user agent makes up only a small percentage (e.g., 1%) of total traffic. If the system detects a dramatic increase in the percentage of traffic involving that user agent in a real-time data stream, the system may determine that a large-scale fraud attack is taking place. The system may continually update an expected percentage of traffic involving the user agent based on what the system observes over time. This may help to avoid false positives (e.g., resulting from the user agent becoming more common among legitimate digital interactions) and/or false negatives (e.g., resulting from the user agent becoming less common among legitimate digital interactions).
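  • One possible way to maintain such an expected percentage is sketched below. The use of an exponentially weighted moving average, the smoothing factor, and the alert ratio are all assumptions of this sketch; the description above does not specify how the expected percentage is updated.

```python
class ShareBaseline:
    """Tracks the expected share of traffic for one user agent."""

    def __init__(self, initial_share, alpha=0.1, alert_ratio=5.0):
        self.expected = initial_share  # e.g., 0.01 for 1% of traffic
        self.alpha = alpha             # assumed smoothing factor
        self.alert_ratio = alert_ratio # assumed multiple that triggers an alert

    def observe(self, share):
        """Record the share seen in one time window; return True if anomalous."""
        anomalous = share > self.alert_ratio * self.expected
        if not anomalous:
            # Drift the baseline toward what is currently observed.
            self.expected += self.alpha * (share - self.expected)
        return anomalous
```

  • In this sketch the baseline is not updated from a window that is flagged as anomalous, so that a sudden spike does not immediately become the new expected value. Whether that choice is appropriate may depend on the deployment.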
  • As another example, the system may determine from historical data that a vast majority of legitimate digital interactions have a recorded typing speed between 30 and 80 words per minute. If the system detects that a large number of present digital interactions have an improbably high typing speed, the system may determine that a large-scale fraud attack is taking place. The system may continually update an expected range of typing speed based on what the system observes over time. For example, at any given point in time, the expected range may be determined as a range that is centered at an average (e.g., mean, median, or mode) and just large enough to capture a certain percentage of all observations (e.g., 95%, 98%, 99%, etc.). Other techniques for determining an expected range may also be used, as aspects of the present disclosure are not limited to any particular manner of implementation.
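  • The range computation described above may be sketched as follows, taking the median as the center. The function name and the use of an integer percentage are assumptions of this sketch.

```python
import statistics


def expected_range(observations, coverage_percent=95):
    """Smallest range centered at the median that covers the given percentage."""
    if not observations:
        raise ValueError("at least one observation is required")
    if not 0 < coverage_percent <= 100:
        raise ValueError("coverage_percent must be in (0, 100]")
    center = statistics.median(observations)
    deviations = sorted(abs(x - center) for x in observations)
    # Number of observations the range must contain (ceiling division).
    needed = -(-coverage_percent * len(deviations) // 100)
    half_width = deviations[needed - 1]
    return center - half_width, center + half_width
```

  • For instance, for the ten typing speeds 40, 45, 50, 55, 60, 65, 70, 75, 80, and 300 words per minute and a coverage of 90%, the sketch returns the range 40 to 85, leaving the improbably high value of 300 outside the expected range.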
  • It should be appreciated that a historical baseline may change for any number of legitimate reasons. For instance, the release of a new browser version may change the distribution of user agents. Likewise, a shift in site demographics or username/password requirements may change the mean typing speed. By continually analyzing incoming observations, the system may be able to redraw the historical baseline to reflect any “new normal.” In this manner, the system may be able to adapt itself automatically and with greater accuracy and speed than a human analyst.
  • FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented. In the embodiment shown in FIG. 21, the computer 5000 includes a processing unit 5001 having one or more processors and a non-transitory computer-readable storage medium 5002 that may include, for example, volatile and/or non-volatile memory. The memory 5002 may store one or more instructions to program the processing unit 5001 to perform any of the functions described herein. The computer 5000 may also include other types of non-transitory computer-readable medium, such as storage 5005 (e.g., one or more disk drives) in addition to the system memory 5002. The storage 5005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 5002.
  • The computer 5000 may have one or more input devices and/or output devices, such as devices 5006 and 5007 illustrated in FIG. 21. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 5007 may include a microphone for capturing audio signals, and the output devices 5006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.
  • As shown in FIG. 21, the computer 5000 may also comprise one or more network interfaces (e.g., the network interface 5010) to enable communication via various networks (e.g., the network 5020). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the present disclosure. Accordingly, the foregoing description and drawings are by way of example only.
  • The above-described embodiments of the present disclosure can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
  • In this respect, the concepts disclosed herein may be embodied as a non-transitory computer-readable medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory, tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above. The computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.
  • The terms “program” and “software” are used herein to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present disclosure as discussed above. Additionally, it should be appreciated that, according to one aspect of this embodiment, one or more computer programs that, when executed, perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion among a number of different computers or processors to implement various aspects of the present disclosure.
  • Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
  • Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that convey the relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish a relationship between data elements.
  • Various features and aspects of the present disclosure may be used alone, in any combination of two or more, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
  • Also, the concepts disclosed herein may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
  • Use of ordinal terms such as “first,” “second,” “third,” etc. in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
  • Also, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

Claims (30)

What is claimed is:
1. A computer-implemented method for analyzing a plurality of digital interactions, the method comprising acts of:
(A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions;
(B) dividing the plurality of values into a plurality of buckets;
(C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket;
(D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and
(E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
2. The method of claim 1, wherein:
each value of the plurality of values comprises a time measurement between a first point and a second point in the corresponding digital interaction; and
each bucket of the plurality of buckets comprises a range of time measurements.
3. The method of claim 1, wherein:
the act (B) of dividing the plurality of values into a plurality of buckets comprises applying a hash-modding operation to each value of the plurality of values; and
each bucket of the plurality of buckets corresponds to a residue of the hash-modding operation.
4. The method of claim 3, further comprising acts of:
(F) recording a plurality of observations with respect to the attribute, each observation of the plurality of observations being recorded from a corresponding digital interaction of the plurality of digital interactions; and
(G) deriving each value of the plurality of values based on the observation recorded from the corresponding digital interaction.
5. The method of claim 1, wherein the historical information regarding the attribute comprises an expected count for the at least one bucket, and wherein the act (D) comprises:
comparing the count of values from the plurality of values that fall within the at least one bucket against the expected count for the at least one bucket.
6. The method of claim 5, wherein the act (E) comprises:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount, wherein the attribute is determined to be anomalous in response to determining that the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount.
7. The method of claim 5, wherein:
the plurality of digital interactions comprises a plurality of first digital interactions observed from a first time period;
the plurality of values comprises a plurality of first values of the attribute;
dividing a plurality of second values of the attribute into the plurality of buckets, each value of the plurality of second values corresponding respectively to a digital interaction of a plurality of second digital interactions;
the expected count for the at least one bucket comprises a count of values from the plurality of second values that fall within the at least one bucket;
the plurality of second digital interactions were observed from a second time period, the second time period having a same length as the first time period; and
the first time period occurs after the second time period.
8. The method of claim 5, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and
in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
9. The method of claim 1, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises:
determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and
comparing the ratio against the expected ratio for the at least one bucket.
10. The method of claim 1, further comprising acts of:
selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and
storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
11. A system comprising at least one processor and at least one computer-readable storage medium having stored thereon instructions which, when executed, program the at least one processor to perform a method for analyzing a plurality of digital interactions, the method comprising acts of:
(A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions;
(B) dividing the plurality of values into a plurality of buckets;
(C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket;
(D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and
(E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
12. The system of claim 11, wherein:
each value of the plurality of values comprises a time measurement between a first point and a second point in the corresponding digital interaction; and
each bucket of the plurality of buckets comprises a range of time measurements.
13. The system of claim 11, wherein:
the act (B) of dividing the plurality of values into a plurality of buckets comprises applying a hash-modding operation to each value of the plurality of values; and
each bucket of the plurality of buckets corresponds to a residue of the hash-modding operation.
14. The system of claim 13, wherein the method further comprises acts of:
(F) recording a plurality of observations with respect to the attribute, each observation of the plurality of observations being recorded from a corresponding digital interaction of the plurality of digital interactions; and
(G) deriving each value of the plurality of values based on the observation recorded from the corresponding digital interaction.
15. The system of claim 11, wherein the historical information regarding the attribute comprises an expected count for the at least one bucket, and wherein the act (D) comprises:
comparing the count of values from the plurality of values that fall within the at least one bucket against the expected count for the at least one bucket.
16. The system of claim 15, wherein the act (E) comprises:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount, wherein the attribute is determined to be anomalous in response to determining that the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount.
17. The system of claim 15, wherein:
the plurality of digital interactions comprises a plurality of first digital interactions observed from a first time period;
the plurality of values comprises a plurality of first values of the attribute;
dividing a plurality of second values of the attribute into the plurality of buckets, each value of the plurality of second values corresponding respectively to a digital interaction of a plurality of second digital interactions;
the expected count for the at least one bucket comprises a count of values from the plurality of second values that fall within the at least one bucket;
the plurality of second digital interactions were observed from a second time period, the second time period having a same length as the first time period; and
the first time period occurs after the second time period.
18. The system of claim 15, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and
in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
19. The system of claim 11, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises:
determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and
comparing the ratio against the expected ratio for the at least one bucket.
20. The system of claim 11, wherein the method further comprises acts of:
selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and
storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
21. At least one computer-readable storage medium having stored thereon instructions which, when executed, program at least one processor to perform a method for analyzing a plurality of digital interactions, the method comprising acts of:
(A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions;
(B) dividing the plurality of values into a plurality of buckets;
(C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket;
(D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and
(E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
22. The at least one computer-readable storage medium of claim 21, wherein:
each value of the plurality of values comprises a time measurement between a first point and a second point in the corresponding digital interaction; and
each bucket of the plurality of buckets comprises a range of time measurements.
23. The at least one computer-readable storage medium of claim 21, wherein:
the act (B) of dividing the plurality of values into a plurality of buckets comprises applying a hash-modding operation to each value of the plurality of values; and
each bucket of the plurality of buckets corresponds to a residue of the hash-modding operation.
24. The at least one computer-readable storage medium of claim 23, wherein the method further comprises acts of:
(F) recording a plurality of observations with respect to the attribute, each observation of the plurality of observations being recorded from a corresponding digital interaction of the plurality of digital interactions; and
(G) deriving each value of the plurality of values based on the observation recorded from the corresponding digital interaction.
25. The at least one computer-readable storage medium of claim 21, wherein the historical information regarding the attribute comprises an expected count for the at least one bucket, and wherein the act (D) comprises:
comparing the count of values from the plurality of values that fall within the at least one bucket against the expected count for the at least one bucket.
26. The at least one computer-readable storage medium of claim 25, wherein the act (E) comprises:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount, wherein the attribute is determined to be anomalous in response to determining that the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount.
27. The at least one computer-readable storage medium of claim 25, wherein:
the plurality of digital interactions comprises a plurality of first digital interactions observed from a first time period;
the plurality of values comprises a plurality of first values of the attribute;
dividing a plurality of second values of the attribute into the plurality of buckets, each value of the plurality of second values corresponding respectively to a digital interaction of a plurality of second digital interactions;
the expected count for the at least one bucket comprises a count of values from the plurality of second values that fall within the at least one bucket;
the plurality of second digital interactions were observed from a second time period, the second time period having a same length as the first time period; and
the first time period occurs after the second time period.
28. The at least one computer-readable storage medium of claim 25, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of:
determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and
in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
29. The at least one computer-readable storage medium of claim 21, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises:
determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and
comparing the ratio against the expected ratio for the at least one bucket.
30. The at least one computer-readable storage medium of claim 21, wherein the method further comprises acts of:
selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and
storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
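
The bucketing and comparison acts recited above (dividing values into buckets, counting per bucket, and comparing against an expected count or ratio) can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The function names, the bucket edges, the threshold, the sample timing values, and the use of SHA-256 for the hash-modding step are all assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import Counter
import hashlib


def range_bucket(value, edges):
    """Index of the range bucket that holds a time measurement.

    `edges` is a sorted list of hypothetical upper boundaries (seconds).
    """
    return bisect_right(edges, value)


def hash_mod_bucket(value, num_buckets):
    """Hash-modding: hash the value, then take the residue mod num_buckets.

    SHA-256 is an assumed choice; the claims do not name a hash function.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucket_counts(values, bucket_of):
    """Count how many values fall within each bucket."""
    return Counter(bucket_of(v) for v in values)


def anomalous_buckets(current, expected, threshold):
    """Buckets whose count exceeds the expected count by at least `threshold`."""
    return sorted(
        b for b, n in current.items() if n - expected.get(b, 0) >= threshold
    )


def bucket_ratios(counts):
    """Ratio of each bucket's count to the total count of values."""
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


# Hypothetical data: seconds between two points in each digital interaction.
edges = [1.0, 5.0, 30.0]
previous_period = [0.4, 2.1, 3.3, 7.9, 12.0, 45.0]    # earlier, same-length window
current_period = [0.2, 0.3, 0.3, 0.5, 0.6, 2.0, 8.0]  # later window

expected = bucket_counts(previous_period, lambda v: range_bucket(v, edges))
observed = bucket_counts(current_period, lambda v: range_bucket(v, edges))

flagged = anomalous_buckets(observed, expected, threshold=3)
attribute_is_anomalous = bool(flagged)
ratios = bucket_ratios(observed)
```

In this sketch the expected counts come from an earlier window of the same length, and an attribute is treated as anomalous when any bucket is flagged. For a non-numeric attribute, `hash_mod_bucket` could be passed to `bucket_counts` in place of `range_bucket`, so that each bucket corresponds to one residue.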

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/256,597 US20170070521A1 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and scoring anomalies

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562214969P 2015-09-05 2015-09-05
US15/256,597 US20170070521A1 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and scoring anomalies

Publications (1)

Publication Number Publication Date
US20170070521A1 (en) 2017-03-09

Family

ID=58187372

Family Applications (14)

Application Number Title Priority Date Filing Date
US15/256,611 Active US10212180B2 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and preventing spoofing
US15/256,597 Abandoned US20170070521A1 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and scoring anomalies
US15/256,607 Active US10129279B2 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and preventing spoofing
US15/256,612 Active US9813446B2 (en) 2015-09-05 2016-09-04 Systems and methods for matching and scoring sameness
US15/256,617 Active US9749358B2 (en) 2015-09-05 2016-09-04 Systems and methods for matching and scoring sameness
US15/256,610 Active US9979747B2 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and preventing spoofing
US15/256,616 Active US9680868B2 (en) 2015-09-05 2016-09-04 Systems and methods for matching and scoring sameness
US15/256,603 Active US9648034B2 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and scoring anomalies
US15/256,600 Active US9749356B2 (en) 2015-09-05 2016-09-04 Systems and methods for detecting and scoring anomalies
US15/256,613 Active US9749357B2 (en) 2015-09-05 2016-09-04 Systems and methods for matching and scoring sameness
US15/411,805 Active US9800601B2 (en) 2015-09-05 2017-01-20 Systems and methods for detecting and scoring anomalies
US15/617,542 Active US10965695B2 (en) 2015-09-05 2017-06-08 Systems and methods for matching and scoring sameness
US15/908,228 Active 2037-07-07 US10805328B2 (en) 2015-09-05 2018-02-28 Systems and methods for detecting and scoring anomalies
US16/232,789 Active US10749884B2 (en) 2015-09-05 2018-12-26 Systems and methods for detecting and preventing spoofing

Country Status (8)

Country Link
US (14) US10212180B2 (en)
EP (3) EP3345117A4 (en)
CN (3) CN108885666B (en)
AU (7) AU2016334816A1 (en)
CA (3) CA2997585C (en)
IL (3) IL257849B2 (en)
SG (1) SG10201909133YA (en)
WO (3) WO2017037542A1 (en)

US11349869B1 (en) * 2017-12-22 2022-05-31 Spins Ventures Llc Network device detection and verification protocol
FR3076922A1 (en) * 2018-01-12 2019-07-19 Ingenico Group Method for determining an association between a banking card and a communication terminal, device, system and program thereof
US11941491B2 (en) 2018-01-31 2024-03-26 Sophos Limited Methods and apparatus for identifying an impact of a portion of a file on machine learning classification of malicious content
US10719613B1 (en) 2018-02-23 2020-07-21 Facebook, Inc. Systems and methods for protecting neural network weights
US10699190B1 (en) * 2018-03-04 2020-06-30 Facebook, Inc. Systems and methods for efficiently updating neural networks
US11354617B1 (en) 2018-03-12 2022-06-07 Amazon Technologies, Inc. Managing shipments based on data from a sensor-based automatic replenishment device
US11137479B1 (en) 2018-03-20 2021-10-05 Amazon Technologies, Inc. Product specific correction for a sensor-based device
US10373118B1 (en) 2018-03-21 2019-08-06 Amazon Technologies, Inc. Predictive consolidation system based on sensor data
US11144921B2 (en) * 2018-04-05 2021-10-12 The Toronto-Dominion Bank Generation and provisioning of digital tokens based on dynamically obtained contextual data
US11361011B1 (en) * 2018-04-26 2022-06-14 Amazon Technologies, Inc. Sensor-related improvements to automatic replenishment devices
US11019059B2 (en) 2018-04-26 2021-05-25 Radware, Ltd Blockchain-based admission processes for protected entities
CN110535809B (en) * 2018-05-25 2021-08-31 腾讯科技(深圳)有限公司 Identification code pulling method, storage medium, terminal device and server
GB2574209B (en) * 2018-05-30 2020-12-16 F Secure Corp Controlling Threats on a Computer System by Searching for Matching Events on other Endpoints
US20200007565A1 (en) * 2018-07-02 2020-01-02 Ebay Inc. Passive automated content entry detection system
US11157837B2 (en) 2018-08-02 2021-10-26 Sas Institute Inc. Advanced detection of rare events and corresponding interactive graphical user interface
US11017100B2 (en) * 2018-08-03 2021-05-25 Verizon Patent And Licensing Inc. Identity fraud risk engine platform
US10715471B2 (en) * 2018-08-22 2020-07-14 Synchronoss Technologies, Inc. System and method for proof-of-work based on hash mining for reducing spam attacks
US10997290B2 (en) * 2018-10-03 2021-05-04 Paypal, Inc. Enhancing computer security via detection of inconsistent internet browser versions
US11647048B2 (en) * 2018-10-03 2023-05-09 Visa International Service Association Real-time feedback service for resource access rule configuration
US11947668B2 (en) 2018-10-12 2024-04-02 Sophos Limited Methods and apparatus for preserving information between layers within a neural network
US11558409B2 (en) * 2018-10-31 2023-01-17 SpyCloud, Inc. Detecting use of passwords that appear in a repository of breached credentials
US10904616B2 (en) * 2018-11-06 2021-01-26 International Business Machines Corporation Filtering of content in near real time
CN109685670A (en) * 2018-12-13 2019-04-26 平安医疗健康管理股份有限公司 Social security violation detection method, device, equipment and computer readable storage medium
US11743290B2 (en) * 2018-12-21 2023-08-29 Fireeye Security Holdings Us Llc System and method for detecting cyberattacks impersonating legitimate sources
US10885180B2 (en) 2018-12-21 2021-01-05 Paypal, Inc. Detection of emulated computer systems using variable difficulty challenges
RU2724783C1 (en) * 2018-12-28 2020-06-25 Акционерное общество "Лаборатория Касперского" Candidate fingerprint matching and comparison system and method
CN109738220B (en) * 2019-01-07 2021-01-08 哈尔滨工业大学(深圳) Sensor optimal arrangement method based on multi-load working condition structure response correlation
US11171978B2 (en) * 2019-03-27 2021-11-09 Microsoft Technology Licensing, Llc. Dynamic monitoring, detection of emerging computer events
US11886888B2 (en) * 2019-03-28 2024-01-30 Lenovo (Singapore) Pte. Ltd. Reduced application view during loading
US20220027438A1 (en) * 2019-04-04 2022-01-27 Hewlett-Packard Development Company, L.P. Determining whether received data is required by an analytic
CN110222525B (en) * 2019-05-14 2021-08-06 新华三大数据技术有限公司 Database operation auditing method and device, electronic equipment and storage medium
US11245724B2 (en) * 2019-06-07 2022-02-08 Paypal, Inc. Spoofed webpage detection
US11550891B2 (en) * 2019-06-19 2023-01-10 Preventice Solutions, Inc. Login token management
US11178163B2 (en) * 2019-07-02 2021-11-16 Easy Solutions Enterprises Corp. Location spoofing detection using round-trip times
US10498760B1 (en) * 2019-07-16 2019-12-03 ALSCO Software LLC Monitoring system for detecting and preventing a malicious program code from being uploaded from a client computer to a webpage computer server
CN113892104A (en) 2019-07-24 2022-01-04 惠普发展公司, 有限责任合伙企业 Access management and control of peripheral devices
US11700247B2 (en) * 2019-07-30 2023-07-11 Slack Technologies, Llc Securing a group-based communication system via identity verification
CA3149824A1 (en) * 2019-08-09 2021-02-18 Mastercard Technologies Canada ULC Determining a fraud risk score associated with a transaction
CN110505232A (en) * 2019-08-27 2019-11-26 百度在线网络技术(北京)有限公司 The detection method and device of network attack, electronic equipment, storage medium
US11269987B2 (en) 2019-09-09 2022-03-08 International Business Machines Corporation Security credentials management for client applications
CN110599135A (en) * 2019-09-16 2019-12-20 腾讯科技(深圳)有限公司 Method and device for evaluating third-party payment account of user and electronic equipment
US11297084B2 (en) * 2019-09-30 2022-04-05 Mcafee, Llc Methods and apparatus to perform malware detection using a generative adversarial network
US11475515B1 (en) * 2019-10-11 2022-10-18 Wells Fargo Bank, N.A. Adverse action methodology for credit risk models
US20210136059A1 (en) * 2019-11-05 2021-05-06 Salesforce.Com, Inc. Monitoring resource utilization of an online system based on browser attributes collected for a session
US11451396B2 (en) * 2019-11-05 2022-09-20 Microsoft Technology Licensing, Llc False positive reduction in electronic token forgery detection
CN110851839B (en) * 2019-11-12 2022-03-11 杭州安恒信息技术股份有限公司 Risk-based asset scoring method and system
US20210168171A1 (en) * 2019-12-03 2021-06-03 Microsoft Technology Licensing, Llc System for Calculating Trust of Client Session(s)
US11075905B2 (en) * 2019-12-09 2021-07-27 Google Llc Requesting and transmitting data for related accounts
US11423406B2 (en) * 2019-12-16 2022-08-23 Paypal, Inc. Multi-tiered approach to detect and mitigate online electronic attacks
US11444961B2 (en) * 2019-12-20 2022-09-13 Intel Corporation Active attack detection in autonomous vehicle networks
CN110995763B (en) * 2019-12-26 2022-08-05 深信服科技股份有限公司 Data processing method and device, electronic equipment and computer storage medium
US11050587B1 (en) 2020-02-04 2021-06-29 360 It, Uab Multi-part TCP connection over VPN
US11394582B2 (en) 2020-02-04 2022-07-19 360 It, Uab Multi-part TCP connection over VPN
US11863567B2 (en) * 2020-02-04 2024-01-02 Fastly, Inc. Management of bot detection in a content delivery network
CN111581641B (en) * 2020-04-03 2022-07-29 北京大学 Lightweight WebAPI protection method and device based on Hook
CN111506819B (en) * 2020-04-24 2023-05-16 成都安易迅科技有限公司 Hardware equipment recommendation method and device, server and storage medium
US11748460B2 (en) * 2020-04-27 2023-09-05 Imperva, Inc. Procedural code generation for challenge code
TWI734456B (en) * 2020-04-29 2021-07-21 正修學校財團法人正修科技大學 Process capability evaluation method
US20210342441A1 (en) * 2020-05-01 2021-11-04 Forcepoint, LLC Progressive Trigger Data and Detection Model
US11451562B2 (en) 2020-05-01 2022-09-20 Mastercard Technologies Canada ULC Recommending signals to monitor in a fraud prevention application
US11290480B2 (en) 2020-05-26 2022-03-29 Bank Of America Corporation Network vulnerability assessment tool
CN111385313B (en) * 2020-05-28 2020-09-11 支付宝(杭州)信息技术有限公司 Method and system for verifying object request validity
US11456917B2 (en) * 2020-06-01 2022-09-27 Cisco Technology, Inc. Analyzing deployed networks with respect to network solutions
US11574071B2 (en) 2020-07-28 2023-02-07 Bank Of America Corporation Reliability of information security controls for attack readiness
US11789982B2 (en) * 2020-09-23 2023-10-17 Electronic Arts Inc. Order independent data categorization, indication, and remediation across realtime datasets of live service environments
CN112016939B (en) * 2020-10-19 2021-02-26 耀方信息技术(上海)有限公司 Automatic maintenance user system
US20220269662A1 (en) * 2021-02-22 2022-08-25 Mastercard Technologies Canada ULC Event interval approximation
US20220270598A1 (en) * 2021-02-24 2022-08-25 International Business Machines Corporation Autonomous communication initiation responsive to pattern detection
US20220353284A1 (en) * 2021-04-23 2022-11-03 Sophos Limited Methods and apparatus for using machine learning to classify malicious infrastructure
US20220366039A1 (en) * 2021-05-13 2022-11-17 Microsoft Technology Licensing, Llc Abnormally permissive role definition detection systems
US20220398307A1 (en) * 2021-06-10 2022-12-15 Armis Security Ltd. Techniques for securing network environments by identifying device types based on naming conventions
US20230007017A1 (en) * 2021-06-30 2023-01-05 Fortinet, Inc. Enforcing javascript for mitb detection
US11847598B2 (en) * 2021-08-13 2023-12-19 Edgeverve Systems Limited Method and system for analyzing process flows for a process performed by users
US20230058569A1 (en) * 2021-08-23 2023-02-23 Fortinet, Inc. Systems and methods for quantifying file access risk exposure by an endpoint in a network environment
US11461492B1 (en) * 2021-10-15 2022-10-04 Infosum Limited Database system with data security employing knowledge partitioning
TWI789271B (en) * 2022-03-16 2023-01-01 中原大學 Packet information analysis method and network traffic monitoring device
US20230385837A1 (en) * 2022-05-25 2023-11-30 Dell Products L.P. Machine learning-based detection of potentially malicious behavior on an e-commerce platform
US11693958B1 (en) * 2022-09-08 2023-07-04 Radiant Security, Inc. Processing and storing event data in a knowledge graph format for anomaly detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110247071A1 (en) * 2010-04-06 2011-10-06 Triumfant, Inc. Automated Malware Detection and Remediation
US20130024339A1 (en) * 2011-07-21 2013-01-24 Bank Of America Corporation Multi-stage filtering for fraud detection with customer history filters

Family Cites Families (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774525A (en) 1995-01-23 1998-06-30 International Business Machines Corporation Method and apparatus utilizing dynamic questioning to provide secure access control
US5892900A (en) 1996-08-30 1999-04-06 Intertrust Technologies Corp. Systems and methods for secure transaction management and electronic rights protection
US5761652A (en) * 1996-03-20 1998-06-02 International Business Machines Corporation Constructing balanced multidimensional range-based bitmap indices
US5940751A (en) 1996-06-27 1999-08-17 Cellular Technical Services Company, Inc. System and method for detection of fraud in a wireless telephone system
US6272631B1 (en) * 1997-06-30 2001-08-07 Microsoft Corporation Protected storage of core data secrets
US7403922B1 (en) 1997-07-28 2008-07-22 Cybersource Corporation Method and apparatus for evaluating fraud risk in an electronic commerce transaction
US5870752A (en) * 1997-08-21 1999-02-09 Lucent Technologies Inc. Incremental maintenance of an approximate histogram in a database system
US20050114705A1 (en) 1997-12-11 2005-05-26 Eran Reshef Method and system for discriminating a human action from a computerized action
US6829711B1 (en) 1999-01-26 2004-12-07 International Business Machines Corporation Personal website for electronic commerce on a smart java card with multiple security check points
US20010020228A1 (en) * 1999-07-09 2001-09-06 International Business Machines Corporation Umethod, system and program for managing relationships among entities to exchange encryption keys for use in providing access and authorization to resources
US7159237B2 (en) * 2000-03-16 2007-01-02 Counterpane Internet Security, Inc. Method and system for dynamic network intrusion monitoring, detection and response
WO2001080525A1 (en) 2000-04-14 2001-10-25 Sun Microsystems, Inc. Network access security
EP1340178A4 (en) 2000-11-02 2005-06-08 Cybersource Corp Method and apparatus for evaluating fraud risk in an electronic commerce transaction
FI20002814A0 (en) 2000-12-21 2000-12-21 Nokia Mobile Phones Ltd Context-based communication backup method and arrangement, communication network and communication network terminal
US20040006532A1 (en) 2001-03-20 2004-01-08 David Lawrence Network access risk management
US7142651B2 (en) 2001-11-29 2006-11-28 Ectel Ltd. Fraud detection in a distributed telecommunications networks
AU2003223379A1 (en) * 2002-03-29 2003-10-13 Global Dataguard, Inc. Adaptive behavioral intrusion detection systems and methods
US20040030764A1 (en) * 2002-08-08 2004-02-12 International Business Machines Corporation Identity assertion token principal mapping for common secure interoperability
US8201252B2 (en) 2002-09-03 2012-06-12 Alcatel Lucent Methods and devices for providing distributed, adaptive IP filtering against distributed denial of service attacks
US20040215574A1 (en) 2003-04-25 2004-10-28 First Data Corporation Systems and methods for verifying identities in transactions
US20040254793A1 (en) 2003-06-12 2004-12-16 Cormac Herley System and method for providing an audio challenge to distinguish a human from a computer
US20050039057A1 (en) * 2003-07-24 2005-02-17 Amit Bagga Method and apparatus for authenticating a user using query directed passwords
JP4778899B2 (en) 2003-09-12 2011-09-21 イーエムシー コーポレイション System and method for risk-based authentication
WO2005032111A1 (en) * 2003-10-02 2005-04-07 Viralg Oy Limiting use of unauthorized digital content in a content-sharing peer-to-peer network
US8578462B2 (en) 2003-12-12 2013-11-05 Avaya Inc. Method and system for secure session management in a web farm
US20050144067A1 (en) 2003-12-19 2005-06-30 Palo Alto Research Center Incorporated Identifying and reporting unexpected behavior in targeted advertising environment
US8782405B2 (en) 2004-03-18 2014-07-15 International Business Machines Corporation Providing transaction-level security
US8572254B2 (en) * 2004-04-08 2013-10-29 Worldextend, Llc Systems and methods for establishing and validating secure network sessions
WO2005119648A2 (en) 2004-06-01 2005-12-15 Dna Digital Media Group Character branding employing voice and speech recognition technology
JP4874251B2 (en) 2004-08-18 2012-02-15 マスターカード インターナシヨナル インコーポレーテツド Method and apparatus for authenticating a transaction using a dynamic authentication code
US20080010678A1 (en) 2004-09-17 2008-01-10 Jeff Burdette Authentication Proxy
US9535679B2 (en) * 2004-12-28 2017-01-03 International Business Machines Corporation Dynamically optimizing applications within a deployment server
US20060230039A1 (en) 2005-01-25 2006-10-12 Markmonitor, Inc. Online identity tracking
WO2006094275A2 (en) 2005-03-02 2006-09-08 Markmonitor, Inc. Trust evaluation systems and methods
US20060206941A1 (en) 2005-03-08 2006-09-14 Praesidium Technologies, Ltd. Communications system with distributed risk management
US7630924B1 (en) 2005-04-20 2009-12-08 Authorize.Net Llc Transaction velocity counting for fraud detection
AU2006242555A1 (en) 2005-04-29 2006-11-09 Oracle International Corporation System and method for fraud monitoring, detection, and tiered user authentication
US8578500B2 (en) 2005-05-31 2013-11-05 Kurt James Long System and method of fraud and misuse detection
US7945952B1 (en) 2005-06-30 2011-05-17 Google Inc. Methods and apparatuses for presenting challenges to tell humans and computers apart
GB2429094B (en) 2005-08-09 2010-08-25 Royal Bank Of Scotland Group P Online transaction systems and methods
US8082349B1 (en) * 2005-10-21 2011-12-20 Entrust, Inc. Fraud protection using business process-based customer intent analysis
US8392963B2 (en) 2005-11-28 2013-03-05 Imperva, Inc. Techniques for tracking actual users in web application security systems
US20070124201A1 (en) 2005-11-30 2007-05-31 Hu Hubert C Digital content access system and methods
US8549651B2 (en) 2007-02-02 2013-10-01 Facebook, Inc. Determining a trust level in a social network environment
US20070140131A1 (en) * 2005-12-15 2007-06-21 Malloy Patrick J Interactive network monitoring and analysis
US20070162761A1 (en) * 2005-12-23 2007-07-12 Davis Bruce L Methods and Systems to Help Detect Identity Fraud
US20070226053A1 (en) 2006-03-21 2007-09-27 Kevin Carl System for uploading video advertisements, solicit user feedback, and create ratings/rankings
US7818264B2 (en) 2006-06-19 2010-10-19 Visa U.S.A. Inc. Track data encryption
US8650080B2 (en) 2006-04-10 2014-02-11 International Business Machines Corporation User-browser interaction-based fraud detection system
US8001597B2 (en) 2006-05-15 2011-08-16 Fair Isaac Corporation Comprehensive online fraud detection system and method
US20080082662A1 (en) 2006-05-19 2008-04-03 Richard Dandliker Method and apparatus for controlling access to network resources based on reputation
US7680891B1 (en) 2006-06-19 2010-03-16 Google Inc. CAPTCHA-based spam control for content creation systems
US8601538B2 (en) 2006-08-22 2013-12-03 Fuji Xerox Co., Ltd. Motion and interaction based CAPTCHA
US20080049969A1 (en) 2006-08-25 2008-02-28 Jason David Koziol Methods And Systems For Generating A Symbol Identification Challenge For An Automated Agent
US8145560B2 (en) 2006-11-14 2012-03-27 Fmr Llc Detecting fraudulent activity on a network
US20080133321A1 (en) 2006-12-01 2008-06-05 Yahoo! Inc. System and method for measuring awareness of online advertising using captchas
US20080133348A1 (en) 2006-12-01 2008-06-05 Yahoo! Inc. System and method for delivering online advertisements using captchas
US20080133347A1 (en) 2006-12-01 2008-06-05 Yahoo! Inc. System and method for providing semantic captchas for online advertising
US7523016B1 (en) 2006-12-29 2009-04-21 Google Inc. Detecting anomalies
US8788419B2 (en) * 2006-12-30 2014-07-22 First Data Corporation Method and system for mitigating risk of fraud in internet banking
US20110047605A1 (en) 2007-02-06 2011-02-24 Vidoop, Llc System And Method For Authenticating A User To A Computer System
WO2008121945A2 (en) * 2007-03-30 2008-10-09 Netqos, Inc. Statistical method and system for network anomaly detection
US7620596B2 (en) 2007-06-01 2009-11-17 The Western Union Company Systems and methods for evaluating financial transaction risk
US8200959B2 (en) * 2007-06-28 2012-06-12 Cisco Technology, Inc. Verifying cryptographic identity during media session initialization
US20090012855A1 (en) 2007-07-06 2009-01-08 Yahoo! Inc. System and method of using captchas as ads
US7958228B2 (en) * 2007-07-11 2011-06-07 Yahoo! Inc. Behavioral predictions based on network activity locations
US8510795B1 (en) 2007-09-04 2013-08-13 Google Inc. Video-based CAPTCHA
US8880435B1 (en) 2007-10-26 2014-11-04 Bank Of America Corporation Detection and tracking of unauthorized computer access attempts
US20090113294A1 (en) 2007-10-30 2009-04-30 Yahoo! Inc. Progressive captcha
US8352598B2 (en) 2007-11-27 2013-01-08 Inha-Industry Partnership Institute Method of providing completely automated public turing test to tell computer and human apart based on image
DE102008003531A1 (en) 2008-01-08 2009-07-09 Giesecke & Devrient Gmbh Software identification
US20090249477A1 (en) 2008-03-28 2009-10-01 Yahoo! Inc. Method and system for determining whether a computer user is human
US9842204B2 (en) 2008-04-01 2017-12-12 Nudata Security Inc. Systems and methods for assessing security risk
EP3382934A1 (en) 2008-04-01 2018-10-03 Nudata Security Inc. Systems and methods for implementing and tracking identification tests
US20090328163A1 (en) 2008-06-28 2009-12-31 Yahoo! Inc. System and method using streaming captcha for online verification
US20100077482A1 (en) 2008-09-23 2010-03-25 Robert Edward Adams Method and system for scanning electronic data for predetermined data patterns
CN101482847B (en) * 2009-01-19 2011-06-29 北京邮电大学 Detection method based on safety bug defect mode
KR101048991B1 (en) 2009-02-27 2011-07-12 (주)다우기술 Botnet Behavior Pattern Analysis System and Method
CA2760769A1 (en) 2009-05-04 2010-11-11 Visa International Service Association Determining targeted incentives based on consumer transaction history
US8312157B2 (en) * 2009-07-16 2012-11-13 Palo Alto Research Center Incorporated Implicit authentication
US20110016052A1 (en) 2009-07-16 2011-01-20 Scragg Ernest M Event Tracking and Velocity Fraud Rules for Financial Transactions
US8713705B2 (en) * 2009-08-03 2014-04-29 Eisst Ltd. Application authentication system and method
WO2011026604A1 (en) * 2009-09-01 2011-03-10 Nec Europe Ltd. Method for monitoring a network and network including a monitoring functionality
US10360039B2 (en) * 2009-09-28 2019-07-23 Nvidia Corporation Predicted instruction execution in parallel processors with reduced per-thread state information including choosing a minimum or maximum of two operands based on a predicate value
EP2315465A1 (en) * 2009-10-20 2011-04-27 ETH Zurich Method for secure communication between devices
US20120137367A1 (en) 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
US8370278B2 (en) * 2010-03-08 2013-02-05 Microsoft Corporation Ontological categorization of question concepts from document summaries
US8375427B2 (en) * 2010-04-21 2013-02-12 International Business Machines Corporation Holistic risk-based identity establishment for eligibility determinations in context of an application
US8868651B2 (en) * 2010-08-16 2014-10-21 Avon Products, Inc. Web community pre-population method and system
US8898759B2 (en) * 2010-08-24 2014-11-25 Verizon Patent And Licensing Inc. Application registration, authorization, and verification
US9361597B2 (en) 2010-10-19 2016-06-07 The 41St Parameter, Inc. Variable risk engine
US9329699B2 (en) * 2010-10-22 2016-05-03 Southern Methodist University Method for subject classification using a pattern recognition input device
US20120123821A1 (en) * 2010-11-16 2012-05-17 Raytheon Company System and Method for Risk Assessment of an Asserted Identity
US9665703B2 (en) * 2010-11-29 2017-05-30 Biocatch Ltd. Device, system, and method of detecting user identity based on inter-page and intra-page navigation patterns
GB2491101B (en) * 2011-04-15 2013-07-10 Bluecava Inc Detection of spoofing of remote client system information
RU2477929C2 (en) 2011-04-19 2013-03-20 Закрытое акционерное общество "Лаборатория Касперского" System and method for prevention safety incidents based on user danger rating
US10380585B2 (en) 2011-06-02 2019-08-13 Visa International Service Association Local usage of electronic tokens in a transaction processing system
US8756651B2 (en) * 2011-09-27 2014-06-17 Amazon Technologies, Inc. Policy compliance-based secure data access
US8881289B2 (en) * 2011-10-18 2014-11-04 Mcafee, Inc. User behavioral risk assessment
CN103067337B (en) * 2011-10-19 2017-02-15 中兴通讯股份有限公司 Identity federation method, identity federation intrusion detection & prevention system (IdP), identity federation service provider (SP) and identity federation system
US9141914B2 (en) * 2011-10-31 2015-09-22 Hewlett-Packard Development Company, L.P. System and method for ranking anomalies
WO2013119739A1 (en) 2012-02-07 2013-08-15 Visa International Service Association Mobile human challenge-response test
US9256715B2 (en) 2012-03-09 2016-02-09 Dell Products L.P. Authentication using physical interaction characteristics
EP2789243B1 (en) * 2012-03-30 2017-11-22 Fuji Oil Company Limited Cheese-like food article
US8898766B2 (en) 2012-04-10 2014-11-25 Spotify Ab Systems and methods for controlling a local application through a web page
US8997230B1 (en) * 2012-06-15 2015-03-31 Square, Inc. Hierarchical data security measures for a mobile device
US20130339186A1 (en) 2012-06-15 2013-12-19 Eventbrite, Inc. Identifying Fraudulent Users Based on Relational Information
US20130339736A1 (en) * 2012-06-19 2013-12-19 Alex Nayshtut Periodic platform based web session re-validation
CA2818439A1 (en) * 2012-07-05 2014-01-05 Cyber-Ark Software Ltd. System and method for out-of-band application authentication
US9485118B1 (en) 2012-09-28 2016-11-01 Juniper Networks, Inc. Penalty-box policers for network device control plane protection
CN102970383B (en) * 2012-11-13 2018-07-06 中兴通讯股份有限公司 A kind of method and device, method and device of information processing for distributing IP address
US9560014B2 (en) * 2013-01-23 2017-01-31 Mcafee, Inc. System and method for an endpoint hardware assisted network firewall in a security environment
US20140229414A1 (en) 2013-02-08 2014-08-14 Ebay Inc. Systems and methods for detecting anomalies
US8925056B2 (en) 2013-03-18 2014-12-30 Rawllin International Inc. Universal management of user profiles
US20150382084A1 (en) 2014-06-25 2015-12-31 Allied Telesis Holdings Kabushiki Kaisha Path determination of a sensor based detection system
US9558056B2 (en) 2013-07-28 2017-01-31 OpsClarity Inc. Organizing network performance metrics into historical anomaly dependency data
US20150052616A1 (en) * 2013-08-14 2015-02-19 L-3 Communications Corporation Protected mode for securing computing devices
US9432375B2 (en) 2013-10-10 2016-08-30 International Business Machines Corporation Trust/value/risk-based access control policy
US20150135316A1 (en) * 2013-11-13 2015-05-14 NetCitadel Inc. System and method of protecting client computers
US9608981B2 (en) * 2013-12-11 2017-03-28 Red Hat, Inc. Strong user authentication for accessing protected network
WO2015168203A1 (en) 2014-04-29 2015-11-05 PEGRight, Inc. Characterizing user behavior via intelligent identity analytics
US10412050B2 (en) * 2014-05-23 2019-09-10 Citrix Systems, Inc. Protect applications from session stealing/hijacking attacks by tracking and blocking anomalies in end point characteristics throughout a user session
US10171491B2 (en) 2014-12-09 2019-01-01 Fortinet, Inc. Near real-time detection of denial-of-service attacks
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US9794229B2 (en) 2015-04-03 2017-10-17 Infoblox Inc. Behavior analysis based DNS tunneling detection and classification framework for network security
US10142353B2 (en) * 2015-06-05 2018-11-27 Cisco Technology, Inc. System for monitoring and managing datacenters
EP3345117A4 (en) 2015-09-05 2019-10-09 Nudata Security Inc. Systems and methods for detecting and preventing spoofing
US9990487B1 (en) 2017-05-05 2018-06-05 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots
US10007776B1 (en) 2017-05-05 2018-06-26 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots
US10127373B1 (en) 2017-05-05 2018-11-13 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots

Cited By (128)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10997284B2 (en) 2008-04-01 2021-05-04 Mastercard Technologies Canada ULC Systems and methods for assessing security risk
US10839065B2 (en) 2008-04-01 2020-11-17 Mastercard Technologies Canada ULC Systems and methods for assessing security risk
US9842204B2 (en) 2008-04-01 2017-12-12 Nudata Security Inc. Systems and methods for assessing security risk
US9946864B2 (en) 2008-04-01 2018-04-17 Nudata Security Inc. Systems and methods for implementing and tracking identification tests
US11036847B2 (en) 2008-04-01 2021-06-15 Mastercard Technologies Canada ULC Systems and methods for assessing security risk
US11783216B2 (en) 2013-03-01 2023-10-10 Forcepoint Llc Analyzing behavior in light of social time
US10832153B2 (en) 2013-03-01 2020-11-10 Forcepoint, LLC Analyzing behavior in light of social time
US10860942B2 (en) 2013-03-01 2020-12-08 Forcepoint, LLC Analyzing behavior in light of social time
US10776708B2 (en) 2013-03-01 2020-09-15 Forcepoint, LLC Analyzing behavior in light of social time
US10884891B2 (en) * 2014-12-11 2021-01-05 Micro Focus Llc Interactive detection of system anomalies
US20170192872A1 (en) * 2014-12-11 2017-07-06 Hewlett Packard Enterprise Development Lp Interactive detection of system anomalies
US11042591B2 (en) 2015-06-23 2021-06-22 Splunk Inc. Analytical search engine
US11113342B2 (en) * 2015-06-23 2021-09-07 Splunk Inc. Techniques for compiling and presenting query results
US11868411B1 (en) 2015-06-23 2024-01-09 Splunk Inc. Techniques for compiling and presenting query results
US10129279B2 (en) 2015-09-05 2018-11-13 Mastercard Technologies Canada ULC Systems and methods for detecting and preventing spoofing
US10805328B2 (en) 2015-09-05 2020-10-13 Mastercard Technologies Canada ULC Systems and methods for detecting and scoring anomalies
US9979747B2 (en) 2015-09-05 2018-05-22 Mastercard Technologies Canada ULC Systems and methods for detecting and preventing spoofing
US9800601B2 (en) 2015-09-05 2017-10-24 Nudata Security Inc. Systems and methods for detecting and scoring anomalies
US10965695B2 (en) 2015-09-05 2021-03-30 Mastercard Technologies Canada ULC Systems and methods for matching and scoring sameness
US10749884B2 (en) 2015-09-05 2020-08-18 Mastercard Technologies Canada ULC Systems and methods for detecting and preventing spoofing
US10212180B2 (en) 2015-09-05 2019-02-19 Mastercard Technologies Canada ULC Systems and methods for detecting and preventing spoofing
US11068798B2 (en) * 2016-02-09 2021-07-20 Upside Services, Inc Systems and methods for short identifier behavioral analytics
US20210334686A1 (en) * 2016-02-09 2021-10-28 Upside Services, Inc Systems and methods for short identifier behavioral analytics
US11842293B2 (en) 2016-02-09 2023-12-12 Upside Services, Inc Systems and methods for short identifier behavioral analytics
US11605014B2 (en) * 2016-02-09 2023-03-14 Upside Services, Inc Systems and methods for short identifier behavioral analytics
US11238057B2 (en) 2016-09-26 2022-02-01 Splunk Inc. Generating structured metrics from log data
US11188550B2 (en) 2016-09-26 2021-11-30 Splunk Inc. Metrics store system
US11314758B2 (en) 2016-09-26 2022-04-26 Splunk Inc. Storing and querying metrics data using a metric-series index
US11314759B2 (en) 2016-09-26 2022-04-26 Splunk Inc. In-memory catalog for searching metrics data
US11200246B2 (en) * 2016-09-26 2021-12-14 Splunk Inc. Hash bucketing of data
US10419268B2 (en) * 2017-01-27 2019-09-17 Bmc Software, Inc. Automated scoring of unstructured events in information technology environments
US20180219890A1 (en) * 2017-02-01 2018-08-02 Cisco Technology, Inc. Identifying a security threat to a web-based resource
US10574679B2 (en) * 2017-02-01 2020-02-25 Cisco Technology, Inc. Identifying a security threat to a web-based resource
US10305773B2 (en) * 2017-02-15 2019-05-28 Dell Products, L.P. Device identity augmentation
US20180285854A1 (en) * 2017-03-29 2018-10-04 International Business Machines Corporation Sensory data collection in an augmented reality system
US11093927B2 (en) * 2017-03-29 2021-08-17 International Business Machines Corporation Sensory data collection in an augmented reality system
US20180302425A1 (en) * 2017-04-17 2018-10-18 Splunk Inc. Detecting fraud by correlating user behavior biometrics with other data sources
US11811805B1 (en) 2017-04-17 2023-11-07 Splunk Inc. Detecting fraud by correlating user behavior biometrics with other data sources
US11315010B2 (en) 2017-04-17 2022-04-26 Splunk Inc. Neural networks for detecting fraud based on user behavior biometrics
US11372956B2 (en) 2017-04-17 2022-06-28 Splunk Inc. Multiple input neural networks for detecting fraud
US11102225B2 (en) * 2017-04-17 2021-08-24 Splunk Inc. Detecting fraud by correlating user behavior biometrics with other data sources
US10127373B1 (en) 2017-05-05 2018-11-13 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots
US9990487B1 (en) 2017-05-05 2018-06-05 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots
US10007776B1 (en) 2017-05-05 2018-06-26 Mastercard Technologies Canada ULC Systems and methods for distinguishing among human users and software robots
US11601441B2 (en) 2017-05-15 2023-03-07 Forcepoint Llc Using indicators of behavior when performing a security operation
US11888863B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Maintaining user privacy via a distributed framework for security analytics
US11902295B2 (en) 2017-05-15 2024-02-13 Forcepoint Llc Using a security analytics map to perform forensic analytics
US11902296B2 (en) 2017-05-15 2024-02-13 Forcepoint Llc Using a security analytics map to trace entity interaction
US11902293B2 (en) 2017-05-15 2024-02-13 Forcepoint Llc Using an entity behavior catalog when performing distributed security operations
US11902294B2 (en) 2017-05-15 2024-02-13 Forcepoint Llc Using human factors when calculating a risk score
US11621964B2 (en) 2017-05-15 2023-04-04 Forcepoint Llc Analyzing an event enacted by a data entity when performing a security operation
US11888861B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Using an entity behavior catalog when performing human-centric risk modeling operations
US11528281B2 (en) 2017-05-15 2022-12-13 Forcepoint Llc Security analytics mapping system
US11888859B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Associating a security risk persona with a phase of a cyber kill chain
US11838298B2 (en) 2017-05-15 2023-12-05 Forcepoint Llc Generating a security risk persona using stressor data
US11888860B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Correlating concerning behavior during an activity session with a security risk persona
US11888864B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Security analytics mapping operation within a distributed security analytics environment
US11516225B2 (en) 2017-05-15 2022-11-29 Forcepoint Llc Human factors framework
US11888862B2 (en) 2017-05-15 2024-01-30 Forcepoint Llc Distributed framework for security analytics
US11563752B2 (en) 2017-05-15 2023-01-24 Forcepoint Llc Using indicators of behavior to identify a security persona of an entity
US11546351B2 (en) 2017-05-15 2023-01-03 Forcepoint Llc Using human factors when performing a human factor risk operation
US11843613B2 (en) 2017-05-15 2023-12-12 Forcepoint Llc Using a behavior-based modifier when generating a user entity risk score
US20190020676A1 (en) * 2017-07-12 2019-01-17 The Boeing Company Mobile security countermeasures
US11095678B2 (en) * 2017-07-12 2021-08-17 The Boeing Company Mobile security countermeasures
US10642995B2 (en) * 2017-07-26 2020-05-05 Forcepoint Llc Method and system for reducing risk score volatility
US11244070B2 (en) 2017-07-26 2022-02-08 Forcepoint, LLC Adaptive remediation of multivariate risk
US11250158B2 (en) 2017-07-26 2022-02-15 Forcepoint, LLC Session-based security information
US11132461B2 (en) 2017-07-26 2021-09-28 Forcepoint, LLC Detecting, notifying and remediating noisy security policies
US11379608B2 (en) 2017-07-26 2022-07-05 Forcepoint, LLC Monitoring entity behavior using organization specific security policies
US11379607B2 (en) 2017-07-26 2022-07-05 Forcepoint, LLC Automatically generating security policies
US20230239316A1 (en) * 2017-08-24 2023-07-27 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US11621971B2 (en) * 2017-08-24 2023-04-04 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US10601849B2 (en) * 2017-08-24 2020-03-24 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US20190068623A1 (en) * 2017-08-24 2019-02-28 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US20210385240A1 (en) * 2017-08-24 2021-12-09 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US11108801B2 (en) * 2017-08-24 2021-08-31 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
US10769283B2 (en) 2017-10-31 2020-09-08 Forcepoint, LLC Risk adaptive protection
US10803178B2 (en) 2017-10-31 2020-10-13 Forcepoint Llc Genericized data model to perform a security analytics operation
US20190306170A1 (en) * 2018-03-30 2019-10-03 Yanlin Wang Systems and methods for adaptive data collection using analytics agents
US11314787B2 (en) 2018-04-18 2022-04-26 Forcepoint, LLC Temporal resolution of an entity
US20200019698A1 (en) * 2018-07-11 2020-01-16 Vmware, Inc. Entropy based security detection system
US10860712B2 (en) * 2018-07-11 2020-12-08 Vmware, Inc. Entropy based security detection system
US11755584B2 (en) 2018-07-12 2023-09-12 Forcepoint Llc Constructing distributions of interrelated event features
US10949428B2 (en) 2018-07-12 2021-03-16 Forcepoint, LLC Constructing event distributions via a streaming scoring operation
US11810012B2 (en) 2018-07-12 2023-11-07 Forcepoint Llc Identifying event distributions using interrelated events
US11755585B2 (en) 2018-07-12 2023-09-12 Forcepoint Llc Generating enriched events using enriched data and extracted features
US11544273B2 (en) 2018-07-12 2023-01-03 Forcepoint Llc Constructing event distributions via a streaming scoring operation
US11436512B2 (en) 2018-07-12 2022-09-06 Forcepoint, LLC Generating extracted features from an event
US11025638B2 (en) 2018-07-19 2021-06-01 Forcepoint, LLC System and method providing security friction for atypical resource access requests
US20220217222A1 (en) * 2018-08-02 2022-07-07 Paul Swengler User and client device registration with server
US11310343B2 (en) * 2018-08-02 2022-04-19 Paul Swengler User and user device registration and authentication
US11496586B2 (en) * 2018-08-02 2022-11-08 Paul Swengler User and client device registration with server
US11411973B2 (en) 2018-08-31 2022-08-09 Forcepoint, LLC Identifying security risks using distributions of characteristic features extracted from a plurality of events
US11811799B2 (en) 2018-08-31 2023-11-07 Forcepoint Llc Identifying security risks using distributions of characteristic features extracted from a plurality of events
US11010257B2 (en) * 2018-10-12 2021-05-18 EMC IP Holding Company LLC Memory efficient perfect hashing for large records
US11595430B2 (en) 2018-10-23 2023-02-28 Forcepoint Llc Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors
US11025659B2 (en) 2018-10-23 2021-06-01 Forcepoint, LLC Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors
US11171980B2 (en) 2018-11-02 2021-11-09 Forcepoint Llc Contagion risk detection, analysis and protection
US11620180B2 (en) 2018-11-29 2023-04-04 Vmware, Inc. Holo-entropy adaptive boosting based anomaly detection
US11258655B2 (en) 2018-12-06 2022-02-22 Vmware, Inc. Holo-entropy based alarm scoring approach
EP3931731A4 (en) * 2019-03-01 2022-11-23 Mastercard Technologies Canada ULC Feature drift hardened online application origination (oao) service for fraud prevention systems
US11410187B2 (en) 2019-03-01 2022-08-09 Mastercard Technologies Canada ULC Feature drift hardened online application origination (OAO) service for fraud prevention systems
WO2020176978A1 (en) * 2019-03-01 2020-09-10 Mastercard Technologies Canada ULC Feature drift hardened online application origination (oao) service for fraud prevention systems
US11003525B2 (en) * 2019-03-23 2021-05-11 AO Kaspersky Lab System and method of identifying and addressing anomalies in a system
US11037428B2 (en) * 2019-03-27 2021-06-15 International Business Machines Corporation Detecting and analyzing actions against a baseline
US11475125B2 (en) * 2019-05-01 2022-10-18 EMC IP Holding Company LLC Distribution-based aggregation of scores across multiple events
US11086991B2 (en) * 2019-08-07 2021-08-10 Advanced New Technologies Co., Ltd. Method and system for active risk control based on intelligent interaction
US20200233958A1 (en) * 2019-08-07 2020-07-23 Alibaba Group Holding Limited Method and system for active risk control based on intelligent interaction
US11928683B2 (en) 2019-10-01 2024-03-12 Mastercard Technologies Canada ULC Feature encoding in online application origination (OAO) service for a fraud prevention system
US11368464B2 (en) * 2019-11-28 2022-06-21 Salesforce.Com, Inc. Monitoring resource utilization of an online system based on statistics describing browser attributes
US11381570B2 (en) * 2019-12-20 2022-07-05 Beijing Didi Infinity Technology And Development Co., Ltd. Identity and access management dynamic control and remediation
US11489862B2 (en) 2020-01-22 2022-11-01 Forcepoint Llc Anticipating future behavior using kill chains
US11223646B2 (en) 2020-01-22 2022-01-11 Forcepoint, LLC Using concerning behaviors when performing entity-based risk calculations
US11570197B2 (en) 2020-01-22 2023-01-31 Forcepoint Llc Human-centric risk modeling framework
US11630901B2 (en) 2020-02-03 2023-04-18 Forcepoint Llc External trigger induced behavioral analyses
WO2021155471A1 (en) * 2020-02-07 2021-08-12 Mastercard Technologies Canada ULC Automated web traffic anomaly detection
US11736505B2 (en) 2020-02-07 2023-08-22 Mastercard Technologies Canada ULC Automated web traffic anomaly detection
US11080109B1 (en) 2020-02-27 2021-08-03 Forcepoint Llc Dynamically reweighting distributions of event observations
US11429697B2 (en) 2020-03-02 2022-08-30 Forcepoint, LLC Eventually consistent entity resolution
US11836265B2 (en) 2020-03-02 2023-12-05 Forcepoint Llc Type-dependent event deduplication
US11080032B1 (en) 2020-03-31 2021-08-03 Forcepoint Llc Containerized infrastructure for deployment of microservices
US11568136B2 (en) 2020-04-15 2023-01-31 Forcepoint Llc Automatically constructing lexicons from unlabeled datasets
US11516206B2 (en) 2020-05-01 2022-11-29 Forcepoint Llc Cybersecurity system having digital certificate reputation system
US11544390B2 (en) 2020-05-05 2023-01-03 Forcepoint Llc Method, system, and apparatus for probabilistic identification of encrypted files
US11895158B2 (en) 2020-05-19 2024-02-06 Forcepoint Llc Cybersecurity system having security policy visualization
US11704387B2 (en) 2020-08-28 2023-07-18 Forcepoint Llc Method and system for fuzzy matching and alias matching for streaming data sets
US11190589B1 (en) 2020-10-27 2021-11-30 Forcepoint, LLC System and method for efficient fingerprinting in cloud multitenant data loss prevention
US11157614B1 (en) * 2021-01-27 2021-10-26 Malwarebytes Inc. Prevention of false positive detection of malware

Also Published As

Publication number Publication date
SG10201909133YA (en) 2019-11-28
CN108885659B (en) 2022-02-11
AU2016315900B2 (en) 2019-11-21
US20170070525A1 (en) 2017-03-09
CA2997597C (en) 2021-01-26
US20170070527A1 (en) 2017-03-09
AU2016315900A1 (en) 2018-03-29
EP3345349A2 (en) 2018-07-11
IL257849B2 (en) 2023-12-01
AU2016314061A1 (en) 2018-03-29
US20170070534A1 (en) 2017-03-09
US20170339184A1 (en) 2017-11-23
US10212180B2 (en) 2019-02-19
US9648034B2 (en) 2017-05-09
IL257852B (en) 2021-04-29
US20170070522A1 (en) 2017-03-09
AU2019271892A1 (en) 2019-12-19
US20170070517A1 (en) 2017-03-09
US9680868B2 (en) 2017-06-13
IL257849A (en) 2018-04-30
EP3345117A4 (en) 2019-10-09
CA2997583C (en) 2021-04-20
WO2017037542A1 (en) 2017-03-09
WO2017037544A2 (en) 2017-03-09
US9749357B2 (en) 2017-08-29
US10965695B2 (en) 2021-03-30
US20180367555A9 (en) 2018-12-20
US20170070526A1 (en) 2017-03-09
US20190149567A1 (en) 2019-05-16
US9979747B2 (en) 2018-05-22
US20180191762A1 (en) 2018-07-05
IL257852A (en) 2018-04-30
EP3345117A1 (en) 2018-07-11
EP3345099A4 (en) 2019-08-28
WO2017037544A3 (en) 2017-04-20
CA2997597A1 (en) 2017-04-13
IL257849B1 (en) 2023-08-01
US9800601B2 (en) 2017-10-24
EP3345099A2 (en) 2018-07-11
US9749356B2 (en) 2017-08-29
AU2019232865B2 (en) 2021-03-18
AU2019271891A1 (en) 2019-12-19
US10805328B2 (en) 2020-10-13
EP3345099B1 (en) 2021-12-01
CN108885659A (en) 2018-11-23
US20170070523A1 (en) 2017-03-09
US10129279B2 (en) 2018-11-13
CN108885666A (en) 2018-11-23
US20170134414A1 (en) 2017-05-11
IL257844B (en) 2021-04-29
US10749884B2 (en) 2020-08-18
IL257844A (en) 2018-04-30
US9813446B2 (en) 2017-11-07
AU2019232865A1 (en) 2019-10-10
AU2016314061B2 (en) 2020-02-27
US20170070524A1 (en) 2017-03-09
CN108780479A (en) 2018-11-09
US9749358B2 (en) 2017-08-29
WO2017060778A3 (en) 2017-07-20
EP3345349A4 (en) 2019-08-14
US20170070533A1 (en) 2017-03-09
AU2019271890A1 (en) 2019-12-19
AU2019271890B2 (en) 2021-02-18
AU2019271892B2 (en) 2021-02-18
CN108780479B (en) 2022-02-11
CA2997585C (en) 2021-02-23
AU2016334816A1 (en) 2018-03-29
CN108885666B (en) 2022-06-10
WO2017060778A2 (en) 2017-04-13
CA2997585A1 (en) 2017-03-09
AU2019271891B2 (en) 2021-02-18
CA2997583A1 (en) 2017-03-09

Similar Documents

Publication Publication Date Title
AU2019232865B2 (en) Systems and methods for detecting and scoring anomalies
US10467294B2 (en) Systems and methods of using a bitmap index to determine bicliques
US10296918B1 (en) Providing risk assessments to compromised payment cards

Legal Events

Date Code Title Description
AS Assignment
Owner name: NUDATA SECURITY INC., CANADA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BAILEY, CHRISTOPHER EVERETT; LUKASHUK, RANDY; RICHARDSON, GARY WAYNE; REEL/FRAME: 040594/0526
Effective date: 20161123

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment
Owner name: MASTERCARD TECHNOLOGIES CANADA ULC, CANADA
Free format text: CERTIFICATE OF AMALGAMATION; ASSIGNOR: NUDATA SECURITY INC.; REEL/FRAME: 045997/0492
Effective date: 20180101