WO2013006071A1 - System and method for intrusion detection through keystroke dynamics - Google Patents


Info

Publication number
WO2013006071A1
Authority
WO
WIPO (PCT)
Application number
PCT/PT2011/000022
Other languages
French (fr)
Inventor
João Pedro SOARES DA SILVA FERREIRA
Original Assignee
Critical Software, S.A.
Universidade Do Minho
Application filed by Critical Software, S.A. and Universidade Do Minho
Priority to PCT/PT2011/000022
Publication of WO2013006071A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour

Abstract

The invention relates to a system and method for detecting intrusions based on passive, continuous analysis of each user's unique typing patterns on a computer keyboard. It requires no additional peripheral equipment and no dedicated user interaction: it runs in the background on legitimate users' computers, monitoring the unrestrained text input they produce. In addition to periodic intrusion checks, it also allows asynchronous intrusion detection to be triggered whenever a critical operation is performed. It comprises event logging and periodic verification (1), asynchronous verification (2), sample processing (3), a user profile database (4), score calculation (5), decision (6), progressive learning (8), and intrusion reporting (7).

Description

"SYSTEM AND METHOD FOR INTRUSION DETECTION THROUGH KEYSTROKE DYNAMICS"

Technical field of the invention

The invention relates to a system and method for detecting intrusions based on passive, continuous analysis of each user's unique typing patterns on a computer keyboard. It requires no additional peripheral equipment and no dedicated user interaction, running continuously in the background on legitimate users' computers and monitoring the unrestrained text input they produce. In addition to periodic intrusion checks, the method also allows asynchronous intrusion detection to be triggered whenever a critical operation is performed.

Background

Critical information is increasingly handled in digital format, a consequence of the current Information Era. As the value of digital information grows, cyber-attacks become both more threatening and more common, so protecting that information is an increasingly pressing need. Among the security technologies that have emerged, User Authentication (within the context of Access Control Policies) plays a very important role as a first control over user/machine interaction. Password-based authentication is currently the most common way to ensure that users are who they claim to be. For the trusted user, authentication mechanisms offer a reasonable layer of protection against intruders. However, once the authentication phase is passed, the user is considered identified and no further proof of identity is usually required. This lack of continuous identity verification is a severe access-control vulnerability that allows opportunistic attacks, especially from insiders. Insiders are unavoidable and trusted, and they have both access and opportunity. Studies show that 60 to 70 percent of cyber-attacks come from insiders (Lynch, 2006), and even these statistics undercount them, since a significant number of insider attacks exploit some form of password abuse and go undetected (Schultz, 2002).

For example, whenever users leave their workstations logged in, a nearby attacker can use them to access critical information; in another ordinary scenario, an intruder can persuade a legitimate user to lend him the computer just to read e-mail or perform some other apparently innocent task, and then maliciously do something else. Mitigating this threat with more frequent authentication challenges is not a viable option, since it would inconvenience users, who might ultimately look for workarounds that pose even greater security risks. A better solution is therefore a technique that passively and continuously monitors the user's interactions, searching for evidence of intrusion.

Host-based Intrusion Detection Systems (HIDSs) satisfy most of these conditions. However, current IDSs focus on the system rather than on the user. As a consequence, system-safe actions are considered legal (no matter who is really behind them), and it remains possible (in fact, very easy) for an attacker to execute malicious actions even with such a security control in place.

Biometric features can serve two main purposes: identification and authentication. In identification, a biometric trait is matched against the entire content of a previously captured biometric database, which can be a huge problem, especially for very large databases. In authentication, a biometric trait is checked against a single trait previously stored for the user enrolled in the system; the matching process is simpler, but precision is a major concern in order to avoid false negatives (rejection of a legitimate user) and false positives (acceptance of an impersonator).

Authentication mechanisms are typically divided into three classes (Liu and Silverman, 2001):

■ Based on something the user knows (knowledge-based): the most common means of authentication, including passwords and personal identification numbers (PINs). These can easily be duplicated, even without the user's consent. Complex passwords can be forgotten (and are therefore often stored or written down, increasing the risk of theft), while simple passwords may be easily guessed, cracked or handed over to an ill-intentioned con artist.

■ Based on something the user has (possession-based): keys and authentication tokens. Usually used together with PINs or passwords, they help increase security, but can easily be lost, borrowed or stolen.

■ Based on something the user is (identity-based): the field of biometric security. Although currently the least commonly deployed mechanisms in computer systems, they are believed to be an effective means of authentication, with the added benefit that the authentication data is better protected from duplication, loss or theft, since its source is the user in question. For user intrusion detection based on authentication, biometrics is the only technology that effectively links users to their respective credentials. It is therefore on biometric analysis that this invention focuses.

Biometric features are commonly divided into two categories: physiological and behavioral. Physiological features include face, retinal or iris patterns, fingerprints, palm topology, hand geometry, wrist veins and thermal images. Behavioral features, on the other hand, include voiceprints, keystroke dynamics, handwritten signatures and gait (Bergadano, Gunetti and Picardi, 2002).

Physiological features are currently the most successfully implemented, owing to the high variability of behavioral features, which can vary greatly between consecutive samplings since they depend on a human user's performance. Another drawback of many biometric techniques is that they require specific equipment, such as scanners or special cameras, to sample the required characteristics. Most users are also wary of intrusive equipment (such as retinal scanners). Keystroke dynamics research has demonstrated that distinctive neurophysiological factors influence the typing patterns of human users interacting with a keyboard (Marsters, 2009). Using the keyboard as a source of biometric information is especially appealing because typing rhythms are always available, regardless of whether an authentication phase has been passed (or fooled).

Regarding the user's typing dynamics, on a standard keyboard connected to a personal computer it is possible to extract the amount of time each key is held down (the dwell time) and the time elapsed between the release of one key and the depression of the next (the flight time) (Monrose and Rubin, 2000). These atomic features are usually combined into n-graphs representing consecutive keystrokes (digraphs, trigraphs and fourgraphs being the most widely sampled). Recent laptop computers include 3-D accelerometer chips, and research on vibration-sensitive keystroke analysis has shown promising results (Lopatka and Peetz, 2009; Iwasaki, Miyaki and Rekimoto, 2009). However, this metric would not be available in most situations (desktop computers, external keyboards, older laptops), harming the desired generality. Special keyboards, such as those claimed in documents GB2470579A and US6442692B1, share the same drawback. The use of the distance between keys as a metric has also been studied, to no avail (Magalhaes, 2005).
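The two atomic features can be illustrated with a short sketch that pairs each keydown with its matching keyup. It is a minimal illustration, assuming a chronological stream of events with millisecond timestamps; the `KeyEvent` type and `dwell_and_flight` function are hypothetical names, not part of the patent. Note how flight becomes negative when the next key is pressed before the previous one is released.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str    # key label, e.g. "a" (hypothetical representation)
    kind: str   # "down" or "up"
    t: float    # timestamp in milliseconds


def dwell_and_flight(events: list[KeyEvent]):
    """Return (dwell, flight) lists from a chronological keydown/keyup stream."""
    pressed: dict[str, float] = {}
    strokes: list[tuple[str, float, float]] = []   # (key, down_t, up_t)
    for ev in events:
        if ev.kind == "down":
            pressed.setdefault(ev.key, ev.t)
        elif ev.kind == "up" and ev.key in pressed:
            strokes.append((ev.key, pressed.pop(ev.key), ev.t))
    strokes.sort(key=lambda s: s[1])                # order keystrokes by press time
    # Dwell: how long each key was held down.
    dwell = [(k, up - down) for k, down, up in strokes]
    # Flight: next keydown minus previous keyup; negative when keys overlap.
    flight = [(a[0] + b[0], b[1] - a[2]) for a, b in zip(strokes, strokes[1:])]
    return dwell, flight
```

For the events "a" down at 0, "p" down at 90, "a" up at 120 and "p" up at 200, the dwell times are 120 and 110 ms and the "ap" flight is -30 ms (overlapped typing).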

Apart from the natural unreliability of the human user as a data source, a factor intrinsic to all behavioral biometric measurement, an authentication system based solely on keystroke timing information is also susceptible to other problems. A considerable number of potential noise sources might shift the user's behavior away from their normal typing profile. J.-D. Marsters (2009) listed some examples: weather conditions (on a cold day a typist's fingers may move more slowly), fatigue and stress, injury, and even simple distraction (common in an office environment). All these factors can add a significant amount of noise to otherwise consistent typing.

These variations will inevitably induce false positives and false negatives in any system based on behavioral biometrics. Viable solutions using keystroke dynamics must take them into account and mitigate their occurrence as much as possible.

To minimize the aforementioned typing instability, most published research controls the text used to produce samples, asking for usernames, passwords or fixed text paragraphs (e.g. Robinson et al., 1996; Lopatka and Peetz, 2009; Jiang, Shieh and Liu, 2007; documents US6895514B1, US20040059950A1, US20060242424A1, US20040187037, US20090150992). Users are believed to type familiar, well-practiced phrases with a more consistent rhythm (Robinson et al., 1998).

There are currently very few approaches to keystroke analysis of unrestrained text. Ahmed and Traore (2005, and document US20060224898) claimed a False Acceptance Rate (FAR) of 1.312% and a False Rejection Rate (FRR) of 0.651%, using trigraph-based keystroke analysis in conjunction with pointer dynamics.

Dowland and Furnell (2004) monitored 35 subjects for 3 months (nearly 6 million samples), obtaining an FRR of 4.9% at an FAR of 0% through a computationally heavy process that would be impossible to implement in a system that must respond quickly.

J.-D. Marsters (2009) developed a system for transparent keystroke analysis with an FAR of around 2% at a near-zero FRR, on a small user base of 10 participants.

Finally, Gunetti and Picardi (2005) thoroughly researched the impact of multiple parameter variations, obtaining (for a user base of 205 participants) an FRR of less than 5% and an FAR of less than 0.005% for their best-scoring implementation, which adopts two-factor (absolute and relative) scoring, a concept shared by the proposed solution. However, their data set was obtained by having participants fill in an online text box (which guarantees that nearly every sample consists of real text), so it cannot be considered unrestrained text input, which includes coding, browsing, gaming, and writing in different contexts and languages.

General Description of the invention

As mentioned above, current Host-based Intrusion Detection Systems (HIDSs) are focused on the system instead of the user. As a consequence, system-safe actions are considered legal (no matter who is really behind those actions), and it is still possible (in fact, very easy) for an attacker to execute malicious actions even with such a security control in place.

One solution to this issue is to extend the IDS concept to the user authentication level, using anomaly-based detection to distinguish benign from malicious activity. Applying this concept with a focus on the user requires tracking user profiles, which leads naturally to biometric features.

Also, as noted above, in the HIDS context a biometric technique can only be used if the analyzed trait can be continuously sampled, which is rare for this kind of technique. An attractive biometric technique would operate transparently and continuously without any additional equipment.

The system and method described here aim to passively and continuously monitor the user's interactions with a computer keyboard.

It comprises Event Logging (0) and Periodic Verification Trigger (1), Sample Processing (3), User Profile Database (4), Scores Calculation (5), Decision (6), Progressive Learning (8), Prevention and Reporting (7), and Asynchronous Verification Trigger (2), operating in two phases: Enrollment (11) and Validation (10).

In the Enrollment Phase (11), the legitimate user typically types a few lines of text so that sufficient sample data is gathered. When the system is ready to recognize legitimate use, it enters the Validation Phase (10), where it remains during operation.

The user's interactions with the computer keyboard are logged, typically continuously (Event Logging for Periodic Verification, 1), generating structured sequences referred to here as user samples (Sample Processing, 3). Every validation attempt (every typed sample) is matched against the user's typing profile, producing a score that helps determine whether the attempt is valid (Scores Calculation, 5). In the Decision Module (6), an attempt that passes validation is added to the stored profile (4), so that the profile is constantly updated and follows the evolution of the user's typing dynamics (Progressive Learning, 8). Attempts that fail validation reveal the presence of an intruder and trigger the Reporting Module (7).

The described system runs at a predefined periodicity set by the Periodic Verification Trigger Module (1). However, whenever an immediate validation is needed (e.g. when critical actions are performed on the computer), the Asynchronous Verification Trigger Module (2) triggers the validation process described above.

Description of the figures

The following figures show preferred embodiments illustrating the description and should not be seen as limiting the scope of the invention.

Figure 1: Schematic representation of a first preferred embodiment, where:

(1) represents a periodic verification trigger module,

(2) represents an asynchronous verification trigger module,

(3) represents a sample processing module,

(4) represents a database module,

(5) represents a score calculation module,

(6) represents a decision module,

(7) represents an intrusion detection module, and

(8) represents a progressive learning module.

Figure 2: Schematic representation of a second preferred embodiment, where:

(0) represents an event logging module,

(1) represents a periodic verification trigger module,

(2) represents an asynchronous verification trigger module,

(3) represents a sample processing module,

(4) represents a database module,

(5) represents a score calculation module,

(6) represents a decision module,

(7) represents an intrusion detection module,

(8) represents a progressive learning module,

(9) represents a security and/or administration console,

(10) represents the modules involved in validation, and

(11) represents the modules involved in enrolment.

Detailed description of the invention

Described herein is a system and method for passively and continuously monitoring the user's interactions with a computer keyboard, searching for evidence of intrusion in order to strengthen the security of the human user's link to his or her digital identity. The keyboard, as known in the art, can have physical keys of various types (namely mechanical or optical), be an on-screen keyboard (namely capacitive or resistive), or even be a virtual keyboard (namely laser projection or visual finger-motion keyboards).

The system implementing this method comprises several modules and their respective processes, namely Event Logging (0) and Periodic Verification Trigger (1), Sample Processing (3), Scores Calculation (5), Decision (6), Progressive Learning (8), Reporting (7), and Asynchronous Verification Trigger (2), which operate in two phases: Enrollment (11) and Validation (10).

An initial Enrollment Phase (11) is required so that the system can learn the unique typing characteristics of the legitimate human user.

The user typically types a few lines of text, so that sufficient data about his or her unique typing characteristics is gathered; the present invention, however, introduces novel aspects in this respect, as will become clear.

During this phase, the system will employ the user's samples with the sole purpose of learning, without deciding over their authenticity.

In this phase, it is therefore required that the user be the only person effectively typing on the computer keyboard or, equivalently, that only the intended user's typing be accepted, e.g. through adequate filtering.

Prior art in this field includes patent applications that require the user to actively engage in the learning process, namely by prompting the user to type a dedicated block of text in order to validate his or her unique biometric pattern (e.g. document GB2470579A). In contrast, the invention described here can perform the same task without asking the user for dedicated input, collecting ordinary typing rhythms from the user's daily routine (e.g. emails, IM conversations, coding) with no restrictions or particular requirements. When the system is ready to recognize legitimate use, the transition from the Enrollment Phase to the Validation Phase is equally seamless and can be triggered in any suitable way: typically after a predetermined number of stored samples (e.g. 5 to 20, 10 to 18, or around 15), after a predetermined time (a few minutes of active typing), or once the stored samples converge (i.e. new samples change the stored profile by less than a predetermined level). Preferably, this criterion is user- or system-configurable.
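The three transition criteria can be combined in a small predicate. This is a minimal sketch under stated assumptions: the convergence measure (mean relative change of per-graph mean times between two profile snapshots) and all default values, such as 300 s of active typing for "a few minutes" and a 2% change limit, are hypothetical choices for illustration, not values given in the text beyond the "around 15" sample count.

```python
def profile_change(before: dict[str, float], after: dict[str, float]) -> float:
    """Hypothetical convergence measure: mean relative change of the per-graph
    mean times shared by two profile snapshots."""
    shared = before.keys() & after.keys()
    if not shared:
        return float("inf")  # nothing to compare: treat as not converged
    total = sum(abs(after[g] - before[g]) / max(abs(before[g]), 1e-9) for g in shared)
    return total / len(shared)


def ready_for_validation(n_samples: int, active_typing_s: float, change: float,
                         min_samples: int = 15, min_typing_s: float = 300.0,
                         max_change: float = 0.02) -> bool:
    """Leave enrollment when any of the three criteria holds (all limits assumed)."""
    return (n_samples >= min_samples          # enough stored samples
            or active_typing_s >= min_typing_s  # enough active typing time
            or change < max_change)           # profile has converged
```

In this sketch any single criterion suffices; a deployment could equally require a combination of them, which is the kind of choice the text leaves configurable.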

Once in the Validation Phase, the system is able to decide whether an authentication attempt was performed by the legitimate user or by an intruder.

Samples generated in this Phase will be matched against the legitimate user's stored biometric profile, outputting a score that will determine if the attempt is valid.

Successful and unsuccessful attempts will trigger different modules, responsible for performing the adequate actions in case of legitimate access or intrusion.

Having described the two phases of the system's operation, the seven modules shown in Figure 1 will now be detailed.

Event Logging (0) and Periodic Verification (1)

This module is a lightweight software agent running in the background on every registered user's computer. It continuously logs the sequence of keystroke events (keydowns and keyups) generated by the user's typing, along with elapsed-time measurements (with an accuracy of 0.01 milliseconds). This module is also responsible for filtering out unwanted events, such as function keys, modifier keys, auto-repeat events (when a key is held down for more than half a second) and writing breaks. Table 1 shows an example of the log.

The computational weight of this module must be kept to a minimum, since holding keystroke events for an excessive amount of time could cause noticeable system-wide delays in keystroke responsiveness. This "typing lag" would bother the user and undermine the desired transparency of the system. The heavy processing work is therefore delegated to the Sample Processing Module once a full sample has been logged.

Table 1: Log from typing the word "apples"

Establishing the size of a sample is an important decision. Longer samples contain more n-graphs, which improves accuracy, since more shared n-graphs will be available for comparison (Gunetti and Picardi, 2005). On the other hand, shorter samples allow the process of the proposed solution to run at a higher frequency. A typical sample size may range from 300 to 3000 events, or from 600 to 2000, or from 1000 to 1800, e.g. around 1500. The number of stored samples is also important and may typically range from 5 to 500 samples, or from 10 to 100, or from 15 to 30. Sample size and the number of stored samples are interrelated: both should be as small as possible while still maintaining sufficient accuracy. Preferably, both are user- or system-configurable.
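The logging agent's filtering and hand-off can be sketched as a small buffer that keeps its own work minimal and passes a complete sample downstream. This is a minimal sketch, assuming hypothetical key labels and a callback for the Sample Processing Module; `SampleBuffer`, `IGNORED_KEYS` and `on_sample` are illustrative names. It drops modifier and function keys, collapses repeated keydowns, discards keys held longer than half a second (auto-repeat), and counts the sample size in events (one keydown plus one keyup per keystroke). Writing-break detection is not handled here.

```python
from typing import Callable

# Hypothetical key labels; a real agent would use the OS or toolkit key codes.
IGNORED_KEYS = {"Shift", "Control", "Alt", "Meta", "CapsLock"} | {f"F{i}" for i in range(1, 13)}


class SampleBuffer:
    """Filters raw key events and hands off a sample once it holds `sample_size` events."""

    def __init__(self, on_sample: Callable[[list], None],
                 sample_size: int = 1500, repeat_ms: float = 500.0):
        self.on_sample = on_sample
        self.sample_size = sample_size  # counted in events: one keydown + one keyup per keystroke
        self.repeat_ms = repeat_ms      # keys held longer than this are treated as auto-repeat
        self._pressed: dict[str, float] = {}
        self._strokes: list[tuple[str, float, float]] = []

    def feed(self, key: str, kind: str, t: float) -> None:
        if key in IGNORED_KEYS:
            return                                   # drop modifier and function keys
        if kind == "down":
            self._pressed.setdefault(key, t)         # repeated keydowns keep the first timestamp
        elif kind == "up" and key in self._pressed:
            down_t = self._pressed.pop(key)
            if t - down_t > self.repeat_ms:
                return                               # held too long: discard as auto-repeat
            self._strokes.append((key, down_t, t))
            if 2 * len(self._strokes) >= self.sample_size:
                sample = sorted(self._strokes, key=lambda s: s[1])  # order by keydown time
                self._strokes = []
                self.on_sample(sample)               # heavy work happens downstream
```

Keeping `feed` to a few dictionary operations reflects the requirement that the logger add no perceptible typing lag.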

This module can provide samples for both periodically (1) and asynchronously (2) triggered verifications.

Sample Processing (3)

This module is responsible for converting the received raw sample log into a structured, normalized sample. Table 2 illustrates how the logged samples (listed in the leftmost table) generate the subsamples used by this solution for the different n-graphs.

Table 2: Subsamples generated from the raw sample log

Note the negative values in the flight measurements. These are examples of overlapped typing (pressing the next key before releasing the previous one), which is very common in most users' typing samples. Some users overlap certain key combinations with extreme consistency, a differentiating factor worth examining in subsequent steps of the invention. These negative values are taken into account and may be used to differentiate users, for example when comparing key combinations that a legitimate user routinely overlaps and an intruder does not.
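Since Table 2 is not reproduced here, the following sketch shows one plausible way to split a keystroke sequence into subsamples. It assumes that keystrokes arrive as `(key, down_t, up_t)` tuples ordered by keydown time, and that an n-graph's duration is measured from the first keydown to the last keydown; that timing definition and the subsample names ("dwell", "flight", "2-graph", ...) are assumptions for illustration, not taken from the patent.

```python
from collections import defaultdict


def build_subsamples(strokes, max_n=4):
    """strokes: list of (key, down_t, up_t) ordered by keydown time.
    Returns {subsample name: {graph: [times]}}."""
    sub = {"dwell": defaultdict(list), "flight": defaultdict(list)}
    for key, down, up in strokes:
        sub["dwell"][key].append(up - down)
    for a, b in zip(strokes, strokes[1:]):
        # Negative when b was pressed before a was released (overlapped typing).
        sub["flight"][a[0] + b[0]].append(b[1] - a[2])
    for n in range(2, max_n + 1):
        graphs = sub.setdefault(f"{n}-graph", defaultdict(list))
        for i in range(len(strokes) - n + 1):
            window = strokes[i:i + n]
            name = "".join(k for k, _, _ in window)
            # Assumed n-graph duration: first keydown to last keydown.
            graphs[name].append(window[-1][1] - window[0][1])
    return {kind: dict(times) for kind, times in sub.items()}
```

For the keystrokes a (0 to 100 ms), p (80 to 150 ms) and p (200 to 260 ms), the "ap" flight is -20 ms, which is exactly the kind of overlap a consistent typist might reproduce and an intruder might not.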

Once the subsamples are built, execution proceeds by filtering outliers, which are typical of all behavioral biometric traits given that human users are naturally unpredictable data sources. Outlier times are filtered out, namely by applying the widely used Interquartile Range (IQR) rule. However, this rule requires at least 4 occurrences, and in a full sample many graphs are typed only once. In these cases, to determine whether a timing measure is in fact an outlier, it is correlated through temporal analysis with every other n-graph in its subsample, an approach to profile purification we call Individual Outlier Filtering, unknown in prior art embodiments. For example, if all trigraph times in a user's sample vary between 20000 and 40000 (assuming around 500 trigraphs per sample, or another reasonably sufficient amount), it is reasonably safe to assume that a trigraph with a time of 80000 is not natural to that user.
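A minimal sketch of the two-level filter follows. Graphs with at least four occurrences are filtered with the standard Tukey IQR fences on their own times. For graphs typed fewer than four times, the text only says their times are correlated with every other n-graph in the subsample; here that is approximated, as an assumption, by applying IQR fences computed over the pooled times of the whole subsample. The function names and the `k = 1.5` multiplier are illustrative choices.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_outliers(subsample: dict[str, list[float]], k: float = 1.5,
                    min_occ: int = 4) -> dict[str, list[float]]:
    # Pool every time in the subsample; used for graphs with too few occurrences
    # (an assumed stand-in for the "Individual Outlier Filtering" correlation).
    pooled = [t for times in subsample.values() for t in times]
    if len(pooled) >= min_occ:
        pool_lo, pool_hi = iqr_bounds(pooled, k)
    else:
        pool_lo, pool_hi = float("-inf"), float("inf")
    filtered = {}
    for graph, times in subsample.items():
        lo, hi = iqr_bounds(times, k) if len(times) >= min_occ else (pool_lo, pool_hi)
        kept = [t for t in times if lo <= t <= hi]
        if kept:
            filtered[graph] = kept
    return filtered
```

With ten trigraphs typed once each between 200 and 290 and one more typed once at 800, the pooled fences are roughly [130, 370], so the 800 occurrence is dropped even though its own graph has a single value.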

Finally, this module calculates the mean and standard deviation of each graph and outputs a fully processed user sample (subsamples similar to those in Table 2, now with fields for mean, standard deviation and number of occurrences instead of time). Other measures of centrality and dispersion may be used, provided they can reasonably be updated. If the user is new to the system (Enrollment Phase), the sample goes straight to the user's profile stored in the database (4). In the proposed solution, a single stored sample is enough for the system to finish the Enrollment Phase and proceed to the Validation Phase. However, the accuracy of the decisions increases with more stored samples, provided the user is willing to certify that he or she is the only one typing on the computer (a requirement during the Enrollment Phase) for a longer period of time. The number of stored samples needed to enter the Validation Phase is customizable. If the user is already registered, the system is in the Validation Phase; consequently, the sample is labeled as an authentication attempt and sent to the Scores Calculation module.
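The per-graph summary step can be sketched as follows. It is a minimal illustration, assuming the population standard deviation (so that a graph typed once gets a deviation of 0 rather than an error); the `GraphStats` and `summarize` names are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphStats:
    mean: float
    sd: float   # population standard deviation (assumed)
    n: int      # number of occurrences (hit count)


def summarize(subsample: dict[str, list[float]]) -> dict[str, GraphStats]:
    """Replace each graph's list of times by its mean, deviation and count."""
    return {
        graph: GraphStats(statistics.fmean(times), statistics.pstdev(times), len(times))
        for graph, times in subsample.items()
        if times
    }
```

Mean, deviation and count are enough to merge samples later without keeping the raw times, which fits the requirement that these measures be "reasonably updatable".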

Scores Calculation (5)

At this stage, there is a structured sample (the attempt) ready for evaluation. All samples from the user profile, held in a centrally managed database, are imported into the application to perform the necessary comparisons. Sample length is a very important factor in the accuracy of a free-text keystroke dynamics algorithm (Gunetti and Picardi, 2005; Hempstalk, 2009), mainly because longer profile samples share more n-graphs with the attempts. On the other hand, long samples force longer enrollment periods, make the system less responsive in passive mode (longer listening periods and consequently fewer validation stages), and track the evolution of the user's typing only coarsely. Published solutions therefore end up compromising on an average sample length, never benefiting from the advantages of both short and long samples.

The proposed scoring module derives from the idea that it is possible to benefit from the longest possible user sample for comparison while keeping the system iterating with reasonably short samples. During score calculation, the (short) attempt sample is compared against a unified user signature: a long sample merging the information of every attempt sample previously validated and stored for that user. This maximizes the number of n-graphs shared with the attempt, which in turn improves accuracy. In short, every previously validated attempt sample is continuously merged into a sufficiently long stored sample, at a very low processing cost.

Each sample n-graph contains its average (μ), standard deviation (σ) and hit count (n). For a merged sample combining (C) two samples (samples 1 and 2), these values are obtained with the formulas shown in Formula 1.

Formula 1: Obtaining the average, standard deviation and hit count for a merged sample
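Formula 1 itself is not reproduced in this text. A minimal sketch, assuming the standard exact pooling of population statistics (which may differ from the patent's own formula): $n_C = n_1 + n_2$, $\mu_C = (n_1\mu_1 + n_2\mu_2)/n_C$ and $\sigma_C^2 = (n_1\sigma_1^2 + n_2\sigma_2^2)/n_C + n_1 n_2(\mu_1 - \mu_2)^2/n_C^2$. Only the three stored numbers per graph are needed, which is what keeps the merge cheap. `GraphStats`, `merge_stats` and `merge_samples` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphStats:
    mean: float
    sd: float   # population standard deviation
    n: int      # hit count


def merge_stats(a: GraphStats, b: GraphStats) -> GraphStats:
    """Exact pooled mean/deviation/count of two groups of occurrences."""
    n = a.n + b.n
    mean = (a.n * a.mean + b.n * b.mean) / n
    var = (a.n * a.sd ** 2 + b.n * b.sd ** 2) / n + a.n * b.n * (a.mean - b.mean) ** 2 / n ** 2
    return GraphStats(mean, math.sqrt(var), n)


def merge_samples(s1: dict[str, GraphStats], s2: dict[str, GraphStats]) -> dict[str, GraphStats]:
    """Merge two processed samples graph by graph; unshared graphs are copied."""
    merged = dict(s1)
    for graph, stats in s2.items():
        merged[graph] = merge_stats(merged[graph], stats) if graph in merged else stats
    return merged
```

Merging the summaries of [100, 110, 120] and [130, 150] gives mean 122, count 5 and deviation √296, the same as computing them over all five raw times.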

With the merged sample and the attempt sample ready for comparison, the next step is to filter out all graphs that are not shared between them: since both samples come from unrestrained text input, they are unlikely to share every n-graph.

The scoring algorithm is preferably based on two measurements: absolute and relative comparisons. While absolute comparisons rely solely on timing values for identity evaluation, relative comparisons concern the order of a user's typing speeds, the underlying rationale being that if a legitimate user is known to type a certain n-graph faster than another (e.g. the trigraph "for" faster than "spa"), he or she will keep doing so consistently, regardless of overall typing speed. Most prior art implementations perform keystroke dynamics analysis based only on absolute comparisons, such as the one claimed in document US20060224898.

The global score of a sample is a weighted average of the absolute and relative scores, as detailed below. The weights can vary, typically from 40/60 to 60/40, and are usually around 50/50.

In order to obtain a likeness score, the algorithm is based on two comparison measures: absolute and relative.

Regarding the absolute comparisons, for each shared graph, the averages of the attempt and of the stored sample are compared. If their difference falls below a certain threshold, the graph comparison is labeled a success, and a failure otherwise. Threshold setting is a key decision in this measurement.

The human user does not type every text segment with the same timing stability. Users become so used to performing certain finger movements that these become almost automatic (hence the reliance on fixed, frequently typed samples in most keystroke dynamics solutions). This behavior can be exploited by reducing the acceptance threshold for graphs the user is known to type consistently (that is, those with the lowest standard deviations). While the legitimate user will still easily qualify on these graphs, an attacker will most likely fail because of the narrower acceptance window. These graphs are called consistent graphs.

For each registered user, the database keeps and updates a record of their most consistent graphs (2% to 20%, or 5% to 15%, or around the 10% most consistent, preferably user or system customizable), on every subsample. This record is retrieved when the user profile is imported for comparison. When comparing a shared graph, the threshold to apply will be determined by presence on that record. By default, we preferably use values selected from or between 1.03, 1.05, 1.10, and 1.25 as the regular threshold value, and preferably use values selected from or between 1.01, 1.03, 1.05, and 1.10 as the threshold for the consistent graphs. These are preferably user or system customizable. These thresholds will then be adjusted depending on each user's timing stability.
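Selecting the consistent graphs of a subsample can be sketched as taking the given fraction with the lowest standard deviation. A minimal sketch, assuming stats are stored as `(mean, sd, hit_count)` tuples; the `min_hits` guard, which keeps graphs seen only once (whose deviation is trivially 0) out of the record, is a hypothetical addition not described in the text.

```python
import math


def consistent_graphs(stats: dict[str, tuple[float, float, int]],
                      fraction: float = 0.10, min_hits: int = 2) -> set[str]:
    """Return the `fraction` of graphs with the lowest standard deviation.
    stats maps graph -> (mean, sd, hit_count)."""
    candidates = [(sd, g) for g, (_mean, sd, n) in stats.items() if n >= min_hits]
    if not candidates:
        return set()
    k = max(1, math.ceil(fraction * len(candidates)))  # keep at least one graph
    return {g for _sd, g in sorted(candidates)[:k]}
```

With ten candidate graphs and the default 10%, exactly one graph (the one with the smallest deviation) is marked consistent; raising the fraction to 20% marks two.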

The absolute score of each subsample is the success ratio of the comparisons carried out. Finally, the definitive "absolute score" of a sample is the weighted sum of the subsample scores. The weight of each subsample (currently all equal) will be determined by a later impact analysis.

This process is illustrated on Table 3.

Table 3: Illustration of absolute score comparisons (* marks a consistent graph)

In this table, there is an example of a comparison that successfully made the regular threshold (<1.25) but was labeled as a failure because that specific trigraph was marked as a consistent graph and its difference is greater than threshold 1.10, the threshold for this case. The score of this example would be the success ratio, 1/4 = 0.25.
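The absolute comparison can be sketched as follows. Because the thresholds are ratios (e.g. 1.25 and 1.10), this sketch assumes, as an interpretation, that each shared graph's larger mean is divided by its smaller mean and the result compared with the applicable threshold. That interpretation only works for positive timings (dwell and n-graph durations); flight subsamples with negative values would need a different comparison, which is not covered here. Function names are hypothetical, and the test data below is made up to mirror the one-in-four outcome described for Table 3.

```python
def absolute_subscore(attempt: dict[str, float], profile: dict[str, float],
                      consistent: set[str], regular_t: float = 1.25,
                      consistent_t: float = 1.10) -> float:
    """Success ratio over the graphs shared by attempt and profile (mean times)."""
    shared = attempt.keys() & profile.keys()   # unshared graphs are filtered out
    if not shared:
        return 0.0
    hits = 0
    for g in shared:
        t = consistent_t if g in consistent else regular_t  # stricter for consistent graphs
        lo, hi = sorted((attempt[g], profile[g]))
        if lo > 0 and hi / lo < t:                          # assumed ratio comparison
            hits += 1
    return hits / len(shared)


def absolute_score(attempt_subs, profile_subs, consistent_subs, weights=None) -> float:
    """Weighted sum of subsample scores; equal weights by default."""
    kinds = list(attempt_subs)
    weights = weights or {k: 1 / len(kinds) for k in kinds}
    return sum(weights[k] * absolute_subscore(attempt_subs[k], profile_subs.get(k, {}),
                                              consistent_subs.get(k, set()))
               for k in kinds)
```

In the test data, "the" passes the regular 1.25 ratio (1.2) but is marked consistent and fails the 1.10 limit, so only one of the four shared trigraphs succeeds and the subscore is 0.25.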

As far as the relative comparisons are concerned, the shared graphs of each sample are ordered by their times. Differences in the positions of the shared graphs represent the list's disorder, which, divided by the maximum possible disorder, yields the final relative score. The maximum possible disorder corresponds to a list whose members are in inverse order, and is therefore given by $X^2/2$ if $X$ is even and $(X^2-1)/2$ if $X$ is odd, where $X$ is the number of shared graphs. This implementation includes an additional feature: a degree of tolerance that mitigates the accumulation of marginal errors caused by a single displaced item in the ordered list (as depicted in Table 4).
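The disorder measure can be sketched as the sum of absolute position differences between the two orderings, divided by the maximum disorder given above. The tolerance is implemented here, as an assumption, by ignoring displacements of at most `tolerance` positions; with one item moved from first to last place, the remaining items each shift by one, and a tolerance of 1 removes exactly those marginal errors. Because a disorder of 0 means identical ordering, the hypothetical `global_score` helper converts it to a likeness (1 - disorder) before taking the weighted average with the absolute score; that conversion is also an assumption.

```python
def normalized_disorder(attempt: dict[str, float], profile: dict[str, float],
                        tolerance: int = 0) -> float:
    """Disorder between the speed rankings of shared graphs, in [0, 1]."""
    shared = attempt.keys() & profile.keys()
    x = len(shared)
    if x < 2:
        return 0.0
    # Rank shared graphs from fastest to slowest in each sample (ties broken by name).
    rank_a = sorted(shared, key=lambda g: (attempt[g], g))
    pos_p = {g: i for i, g in enumerate(sorted(shared, key=lambda g: (profile[g], g)))}
    disorder = 0
    for i, g in enumerate(rank_a):
        d = abs(i - pos_p[g])
        if d > tolerance:           # assumed tolerance: ignore small displacements
            disorder += d
    max_disorder = x * x // 2 if x % 2 == 0 else (x * x - 1) // 2
    return disorder / max_disorder


def global_score(absolute: float, disorder: float, w_abs: float = 0.5) -> float:
    """Weighted average of absolute score and relative likeness (1 - disorder)."""
    return w_abs * absolute + (1 - w_abs) * (1 - disorder)
```

For five graphs where only the fastest one becomes the slowest, the raw disorder is 8 out of a maximum of 12; with a tolerance of 1 it drops to 4 out of 12.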


Table 4: Example of two attempt sample disorder scores, in which the less scrambled sample erroneously has the higher disorder score due to the accumulation of marginal errors

Decision (6)

The captured sample finally arrives at the Decision module with its score already calculated. At this stage an "identity acceptance threshold" is defined, which determines whether the attempt score is valid. This threshold is obtained by calculating the score of each previous attempt sample in the database against the merged sample, with that attempt sample's own occurrences filtered out of the merged sample to avoid skewing its score. These scores provide the data needed to predict the highest threshold that the legitimate user is sure to meet. The identity acceptance threshold is therefore different for each user, adapting to their unique typing stability. This clearly distinguishes the method from most prior art implementations, namely from document US20060224898.

Progressive learning (8)

This module is triggered when an attempt is successfully validated. A valid attempt sample is stored in the database alongside the rest of the user's samples, ensuring that the user profile keeps up with changes in the user's typing dynamics over time. The number of attempt samples stored for each user takes into account the computational weight of the scoring process, since storing a new validated sample implies recalculating the acceptance threshold as described above. When the stored profile has already reached the defined sample limit, the oldest sample is erased to make way for the new one.
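The Decision and Progressive learning steps can be sketched together, because storing a validated sample triggers the threshold recalculation. This is a minimal Python sketch under stated assumptions: merging is assumed to average each graph's time across the stored samples, the score is any callable where higher means a closer match (such as a success ratio), and the threshold is taken as the lowest leave-one-out score, which is one reading of "the highest threshold that the legitimate user is sure to meet". `merge`, `acceptance_threshold`, `UserProfileStore` and the toy `ratio_score` are hypothetical names.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, Dict, List, Optional

Sample = Dict[str, float]                    # graph identifier -> time in ms
ScoreFn = Callable[[Sample, Sample], float]  # (attempt, profile) -> score, higher = closer


def merge(samples: List[Sample]) -> Sample:
    """Hypothetical merge rule: mean time of each graph over the samples containing it."""
    times: Dict[str, List[float]] = {}
    for sample in samples:
        for graph, t in sample.items():
            times.setdefault(graph, []).append(t)
    return {graph: mean(ts) for graph, ts in times.items()}


def acceptance_threshold(samples: List[Sample], score: ScoreFn) -> float:
    """Leave-one-out: score each stored sample against a profile merged from the
    other samples and keep the lowest score, which every stored sample meets."""
    if len(samples) < 2:
        raise ValueError("at least two stored samples are needed")
    return min(score(s, merge(samples[:i] + samples[i + 1:]))
               for i, s in enumerate(samples))


class UserProfileStore:
    """Validated attempt samples of one user, capped at max_samples (oldest dropped first)."""

    def __init__(self, max_samples: int, score: ScoreFn) -> None:
        # A deque with maxlen discards the oldest entry when appending at capacity.
        self.samples: Deque[Sample] = deque(maxlen=max_samples)
        self.score = score
        self.threshold: Optional[float] = None

    def decide(self, attempt: Sample) -> bool:
        """Decision: accept the attempt if its score meets the user's threshold."""
        if self.threshold is None:
            raise RuntimeError("no acceptance threshold yet (enrolment phase)")
        return self.score(attempt, merge(list(self.samples))) >= self.threshold

    def learn(self, validated: Sample) -> None:
        """Progressive learning: store the sample and recalculate the threshold."""
        self.samples.append(validated)
        if len(self.samples) >= 2:
            self.threshold = acceptance_threshold(list(self.samples), self.score)


def ratio_score(attempt: Sample, profile: Sample) -> float:
    """Toy score for the example: share of common graphs within a 1.25 time ratio."""
    shared = attempt.keys() & profile.keys()
    hits = sum(max(attempt[g], profile[g]) / min(attempt[g], profile[g]) < 1.25
               for g in shared)
    return hits / len(shared) if shared else 0.0


store = UserProfileStore(max_samples=3, score=ratio_score)
for s in ({"th": 100.0, "he": 120.0}, {"th": 110.0, "he": 118.0}, {"th": 105.0, "he": 150.0}):
    store.learn(s)
print(store.threshold)                           # 0.5
print(store.decide({"th": 104.0, "he": 130.0}))  # True
print(store.decide({"th": 200.0, "he": 300.0}))  # False
```

Recomputing the leave-one-out threshold costs one merge and one score per stored sample. This is why the description ties the sample limit to the computational weight of the scoring process. A real deployment might also subtract a safety margin from the minimum.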

Reporting (7)

The Reporting module is triggered when an attempt sample fails to validate, indicating the probable presence of an intruder.

Intrusions are reported to a security log supervised by a security administrator. This solution uses the distance between the attempt sample's score and the user's acceptance threshold as the indicator of the Alarm Level of an intrusion detection. The rationale is that someone who scores far below the acceptance threshold is much more likely to be an intruder than someone who scores below it by a small margin. With intrusions logged at three alarm levels (Yellow, Orange, and Red, in ascending severity), the security administrator can prioritize the response to the detected intrusions, and the problem of excessive false alarms, endemic to all IDSs (Axelsson, 1999), can be alleviated through this mechanism.
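The mapping from score shortfall to alarm level can be sketched in a few lines of Python. The description names the three levels but gives no boundaries, so the `orange_gap` and `red_gap` values below are hypothetical defaults. The sketch assumes that higher scores mean a closer match and that a score equal to the threshold validates.

```python
from enum import Enum
from typing import Optional


class AlarmLevel(Enum):
    YELLOW = 1  # narrowly below the acceptance threshold
    ORANGE = 2
    RED = 3     # far below the threshold: most likely an intruder


def alarm_level(score: float, threshold: float,
                orange_gap: float = 0.10, red_gap: float = 0.25) -> Optional[AlarmLevel]:
    """Map the shortfall below the user's acceptance threshold to an alarm level.

    The gap boundaries are hypothetical defaults, not values from the description.
    """
    shortfall = threshold - score
    if shortfall <= 0:
        return None  # attempt validated: nothing to report
    if shortfall >= red_gap:
        return AlarmLevel.RED
    if shortfall >= orange_gap:
        return AlarmLevel.ORANGE
    return AlarmLevel.YELLOW


print(alarm_level(0.80, 0.75))  # None (validated)
print(alarm_level(0.70, 0.75))  # AlarmLevel.YELLOW
print(alarm_level(0.40, 0.75))  # AlarmLevel.RED
```

Because the shortfall is measured against each user's own threshold, the same absolute score can raise different alarm levels for users with different typing stability.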

Asynchronous Verification (2)

Prior art solutions, as well as the one described, trigger an identity verification procedure each time a sample is logged; these are called synchronous verifications. However, the larger number of n-graphs shared per sample (resulting from this solution's use of a merged imported user profile) allows a trustworthy identity verification even with shorter samples, as long as a certain number of shared n-graphs is detected. This number can be user or system customizable.
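The precondition for an asynchronous verification can be sketched in Python. This is an illustration under assumptions: the partially logged sample and the profile are plain dictionaries keyed by graph, `min_shared` stands for the customizable minimum number of shared n-graphs, and the score function and threshold come from elsewhere in the system. The function name is hypothetical.

```python
from typing import Callable, Dict, Optional

Sample = Dict[str, float]  # graph identifier -> time in ms


def asynchronous_verification(partial_attempt: Sample, profile: Sample,
                              min_shared: int,
                              score: Callable[[Sample, Sample], float],
                              threshold: float) -> Optional[bool]:
    """Verify a partially logged sample if it shares enough n-graphs with the profile.

    Returns True or False when a decision is possible, or None when too few
    shared n-graphs have been logged so far to decide reliably.
    """
    shared = partial_attempt.keys() & profile.keys()
    if len(shared) < min_shared:
        return None
    return score(partial_attempt, profile) >= threshold
```

Returning `None` instead of `False` keeps "not enough evidence yet" separate from "evidence of an intruder", so a caller, for example a hypothetical hook that runs when a document is saved, does not raise an alarm just because the sample is still short.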

This creates an opportunity for prevention mechanisms that integrate with existing software applications to trigger asynchronous verifications. For example, whenever a user saves a document or sends an e-mail, a verification process can be triggered to ensure that the action was performed by the legitimate user. At that point the system is most likely still logging an attempt sample, but it may nevertheless be able to reach a valid decision. Conversely, if too few shared n-graphs are detected during the asynchronous verification, this indicates (together with the last sample's timestamp) that the last synchronous decision may be recent enough to be used. This process also differs from the approaches of prior art implementations.

Moreover, the described method features novel safety measures, motivated by the fact that this biometric assessment, which relies on a keystroke monitoring agent, is in itself a reason for user apprehension (particularly since malicious keystroke loggers have become ubiquitous). The described method would pose a greater security risk than the original problem it tries to address if the text typed by every user could be reproduced. Therefore, ensuring that such text reproduction is not possible is critical to the relevance of the method and to the acceptability of the solution.

Keystrokes are stored as hash values, and the original keystroke identifiers are masked before hashing (otherwise known-plaintext attacks would be trivial). Moreover, keystroke identifiers are generated uniquely for every user. Samples are sent to the server in large chunks of data, which prevents typing sequences from being recovered. Without text reproduction, these samples are unusable even if stolen. Furthermore, studies have shown that even with full knowledge of a user's typing habits, learning and reproducing them by hand is very difficult (Rundhaug, 2007). These strict security precautions also set this method apart from solutions (e.g. Patent No. US20100115610) that require handling private information.

Key masks and hash values are not reflected in the tables and figures of this description, to make them easier to understand. Typically these are based on known methods such as SHA-256 with a salt, or any of the commonly used alternatives. Preferably, the salt is based on the user login name, ensuring that the hashed information differs even if two users type the same text. The following claims set out particular embodiments of the invention.
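The masking and salted hashing can be sketched with Python's standard `hashlib` and `secrets` modules. The form of the mask is an assumption: here it is a random permutation of key identifiers, generated once per user and kept with that user's profile. The text encoding of an n-graph before hashing is also an assumption. Only SHA-256 and the login-name salt come from the description. `make_user_mask` and `hash_graph` are hypothetical names.

```python
import hashlib
import secrets
from typing import Dict, Sequence


def make_user_mask(key_codes: Sequence[int]) -> Dict[int, int]:
    """Hypothetical per-user mask: a random permutation of the key identifiers.

    Generated once per user with a cryptographically secure RNG (assumed to be
    kept with the user's profile); it replaces each key identifier before hashing.
    """
    shuffled = list(key_codes)
    secrets.SystemRandom().shuffle(shuffled)
    return dict(zip(key_codes, shuffled))


def hash_graph(keys: Sequence[int], mask: Dict[int, int], login: str) -> str:
    """SHA-256 of the masked n-graph, salted with the user's login name."""
    masked = ",".join(str(mask[k]) for k in keys)
    return hashlib.sha256(f"{login}:{masked}".encode("utf-8")).hexdigest()


mask = make_user_mask(range(256))
digest = hash_graph([84, 72, 69], mask, login="alice")  # ASCII codes of "T", "H", "E"
print(len(digest))  # 64 hex characters
```

A login name is a predictable salt, so in this sketch the protection against known-plaintext attacks comes mainly from the secret per-user mask. The salt mainly ensures that two users typing the same text produce different hashes.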

Claims

1. System for intrusion detection through keystroke dynamics of typing samples of individual users, comprising:
- a periodic verification trigger module (1) for triggering periodic verifications, when a predetermined periodic condition occurs;
- an asynchronous verification trigger module (2), for triggering asynchronous verifications, when a predetermined specific condition occurs;
- a database module (4);
- a sample processing module (3) for processing unrestrained text input, configured such that, during the enrolment phase, its output is directly fed to the database module (4) merged validated samples;
- a score calculation module (5) for verification of a current attempt sample, from said sample processing module (3), against the database module (4) merged validated samples;
- a decision module (6) for comparing the calculated score of a current attempt sample against an identity acceptance threshold for each previously enrolled individual user;
- an intrusion detection module (7), configured to be triggered by the decision module (6) if the decision does not validate the current attempt sample;
- a progressive learning module (8), configured to be triggered by the decision module (6) if the decision does validate the current attempt sample, for storing and merging the current attempt sample with the database module (4) merged validated samples.
2. System according to the previous claim wherein the predetermined periodic condition occurs after a predetermined number of samples and/or a predetermined time interval and/or a predetermined number of graph subsamples of said samples and/or a predetermined number of typing events of said samples.
3. System according to any previous claim wherein the predetermined specific condition occurs after a predetermined user operation and/or user elevated-security operation and/or user file operation and/or user database operation.
4. System according to any previous claim wherein the system and the sample processing module (3) are configured such that the transition from enrolment phase is transparently performed without user intervention after a predetermined number of samples and/or a predetermined time interval and/or a predetermined number of graph subsamples of said samples and/or a predetermined number of typing events of said samples.
5. System according to any previous claim wherein the system and the sample processing module (3) are configured to process the samples into graph subsamples.
6. System according to the previous claim wherein the system and the sample processing module (3) are configured to filter outliers from said graph subsamples.
7. Method for intrusion detection through keystroke dynamics of typing samples of individual users, comprising:
- triggering periodic verifications (1), when a predetermined periodic condition occurs;
- triggering asynchronous verifications (2), when a predetermined specific condition occurs;
- processing unrestrained text input samples (3), such that, during the enrolment phase, its output is directly fed to merged validated samples in a database module (4);
- calculating a score (5) for verification of a current attempt sample, from said sample processing (3), against the database module (4) merged validated samples;
- comparing the calculated score of a current attempt sample against an identity acceptance threshold for each previously enrolled individual user for a decision (6);
- triggering an intrusion detection (7), if the decision (6) does not validate the current attempt sample;
- triggering progressive learning (8), if the decision (6) does validate the current attempt sample, storing and merging the current attempt sample with the database module (4) merged validated samples.
8. Method according to claim 7 wherein the predetermined periodic condition occurs after a predetermined number of samples and/or a predetermined time interval and/or a predetermined number of graph subsamples of said samples and/or a predetermined number of typing events of said samples.
9. Method according to claim 7 wherein the predetermined specific condition occurs after a predetermined user operation and/or user elevated-security operation and/or user file operation and/or user database operation.
10. Method according to any of claims 7 to 9 wherein the transition from enrolment phase is transparently performed without user intervention after a predetermined number of samples and/or a predetermined time interval and/or a predetermined number of graph subsamples of said samples and/or a predetermined number of typing events of said samples.
11. Method according to any of claims 7 to 10 wherein the samples are processed into graph subsamples.
12. Method according to the previous claim wherein outliers are filtered from said graph subsamples.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/PT2011/000022 WO2013006071A1 (en) 2011-07-07 2011-07-07 System and method for intrusion detection through keystroke dynamics

Publications (1)

Publication Number Publication Date
WO2013006071A1 (en) 2013-01-10

Family

ID=44774093

Country Status (1)

Country Link
WO (1) WO2013006071A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6442692B1 (en) 1998-07-21 2002-08-27 Arkady G. Zilberman Security method and apparatus employing authentication by keystroke dynamics
US20040059950A1 (en) 2002-09-24 2004-03-25 Bender Steven S. Key sequence rhythm recognition system and method
US20040187037A1 (en) 2003-02-03 2004-09-23 Checco John C. Method for providing computer-based authentication utilizing biometrics
US20060224898A1 (en) 2003-05-02 2006-10-05 Ahmed Ahmed E System and method for determining a computer user profile from a motion-based input device
US20060242424A1 (en) 2004-04-23 2006-10-26 Kitchens Fred L Identity authentication based on keystroke latencies using a genetic adaptive neural network
US20060271790A1 (en) * 2005-05-25 2006-11-30 Wenying Chen Relative latency dynamics for identity authentication
US20090150992A1 (en) 2007-12-07 2009-06-11 Kellas-Dicks Mechthild R Keystroke dynamics authentication techniques
US20100115610A1 (en) 2008-11-05 2010-05-06 Xerox Corporation Method and system for providing authentication through aggregate analysis of behavioral and time patterns
GB2470579A (en) 2009-05-27 2010-12-01 Univ Abertay Dundee A behavioural biometric security system using keystroke metrics

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014200667A1 (en) * 2013-06-13 2014-12-18 Motorola Mobility Llc Method and apparatus for electronic device access
US9369870B2 (en) 2013-06-13 2016-06-14 Google Technology Holdings LLC Method and apparatus for electronic device access
WO2020009847A1 (en) * 2018-07-02 2020-01-09 Ebay Inc. Passive automated content entry detection system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11767483

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11767483

Country of ref document: EP

Kind code of ref document: A1