CN110502883A - PCA-based keystroke behavior anomaly detection method - Google Patents

PCA-based keystroke behavior anomaly detection method

Info

Publication number
CN110502883A
Authority
CN
China
Prior art keywords
data
keystroke
pca
abnormal
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910785323.9A
Other languages
Chinese (zh)
Other versions
CN110502883B (en)
Inventor
刘录
常清雪
文有庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-23
Filing date
2019-08-23
Publication date
2019-11-26
2019-08-23 Application filed by Sichuan Changhong Electric Co Ltd
2019-08-23 Priority to CN201910785323.9A
2019-11-26 Publication of CN110502883A
Application granted
2022-08-19 Publication of CN110502883B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Monitoring of user actions
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Software Systems (AREA)
  • Collating Specific Patterns (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

The invention discloses a PCA-based keystroke behavior anomaly detection method comprising the following steps: A. collecting the user's keystroke data, including keystroke durations and keystroke intervals; B. preprocessing the keystroke data by handling missing or malformed records, then centering and normalizing the data; C. building a PCA anomaly detection model to obtain a comprehensive anomaly score; D. setting an anomaly score threshold and judging the keystroke behavior to be abnormal if the anomaly score of the sample under test exceeds the threshold. The method builds a model from a normal user's keystroke data and detects abnormal behavior without requiring a large amount of training data; the PCA algorithm involves little computation and yields a simple model.

Description

PCA-based keystroke behavior anomaly detection method
Technical Field
The invention relates to the technical fields of machine learning and network security, and in particular to a PCA-based keystroke behavior anomaly detection method.
Background
Biometric authentication verifies identity using physiological characteristics specific to each person, such as fingerprints, palm prints, faces and irises, or behavioral characteristics, such as handwriting and voice. Because these features are largely unique and difficult to imitate, the risk of a user being impersonated is greatly reduced. As the technology has matured it has been applied successfully in many fields, but its adoption is limited by the need for additional, costly biometric acquisition devices. Keystroke characteristics have clear advantages over other biometric methods: the keyboard itself serves as the acquisition device, only recognition software needs to be embedded in the computer system, and the cost is low. In addition, keystroke authentication merges the login process with the authentication process and does not affect how the user works.
In past research, keystroke pattern recognition has mainly used methods such as support vector machines (SVMs) and BP neural networks. An SVM builds a binary classification model from a normal user's keystroke data and abnormal data. Because all keystroke data other than the user's own counts as abnormal, abnormal behavior data are highly varied and cannot be collected comprehensively, which degrades the model; the computational complexity of the SVM model is also high. Moreover, abnormal keystroke detection requires a separate model for each user, so with many users the SVM models occupy excessive storage space. BP neural network models likewise have a complex structure, and their training requires a large amount of data, so model accuracy suffers when a user's data are insufficient.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the background art by providing a PCA-based keystroke behavior anomaly detection method that builds a model from a normal user's keystroke data and detects abnormal behavior, does not require a large amount of training data, and benefits from the low computational cost and simple model of the PCA algorithm.
To achieve these technical effects, the invention adopts the following technical solution:
A PCA-based keystroke behavior anomaly detection method comprises the following steps:
A. collecting user keystroke data, including keystroke durations and keystroke intervals;
B. preprocessing the keystroke data: handling missing or malformed records, then centering and normalizing the data. Centering simplifies the notation of the subsequent formulas without affecting the eigenvalue decomposition; normalization brings the variances of different variables to the same scale and removes the effect of different units, making the variables comparable;
C. building a PCA anomaly detection model to obtain a comprehensive anomaly score. After eigenvalue decomposition, the eigenvectors give the directions along which the original data vary, and each eigenvalue is the variance of the data along its eigenvector. The eigenvector of the largest eigenvalue is therefore the direction of maximum variance, and that of the smallest eigenvalue the direction of minimum variance. If a single sample does not match the overall structure of the data, for example if it deviates strongly from the other samples along certain directions, it is likely an anomaly;
D. setting an anomaly score threshold, and judging the keystroke behavior to be abnormal if the anomaly score of the detected sample is greater than the threshold.
Further, in step A, the keystroke duration is the difference between the time the key is released and the time it is pressed:
$$H_i = t_i^{\mathrm{up}} - t_i^{\mathrm{down}}$$
and the keystroke interval is calculated as:
$$I_i = t_{i+1}^{\mathrm{down}} - t_i^{\mathrm{down}}$$
where $t_i^{\mathrm{down}}$ denotes the moment the $i$-th key is pressed and $t_i^{\mathrm{up}}$ the moment it is released. In this method the keystroke interval is the time between pressing the next key and pressing the previous key, which keeps the interval well defined even when the user presses the second key before the previous key has been released.
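A minimal Python sketch of step A follows. It assumes each keystroke sequence is supplied as two lists of press and release timestamps in the same unit, ordered by press time; the function name `keystroke_features` and this input layout are illustrative assumptions, not part of the patent, which also does not specify how the features are assembled into a per-sample vector.

```python
from typing import List, Sequence, Tuple


def keystroke_features(
    press_times: Sequence[float], release_times: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (hold durations, press-to-press intervals) for one keystroke sequence.

    press_times[i] and release_times[i] are the press and release timestamps of
    the i-th key, in the same unit (e.g. milliseconds), ordered by press time.
    """
    if len(press_times) != len(release_times):
        raise ValueError("press_times and release_times must have the same length")
    # Duration: release time minus press time of the same key.
    durations = [up - down for down, up in zip(press_times, release_times)]
    # Interval: press time of the next key minus press time of the previous key.
    intervals = [press_times[i + 1] - press_times[i] for i in range(len(press_times) - 1)]
    return durations, intervals


# Hypothetical timestamps in ms: the second key is pressed (150) before the
# first key is released (180), yet the press-to-press interval stays positive.
durations, intervals = keystroke_features([0, 150, 310], [180, 230, 400])
print(durations)  # [180, 80, 90]
print(intervals)  # [150, 160]
```

Measuring from press to press is what makes the overlapping case harmless: a release-to-press interval for the same pair would be 150 - 180 = -30 ms.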
Further, step B applies the following processing to the keystroke data:
$$x' = \frac{x - \mu}{\sigma}$$
where $x'$ is the processed data, $x$ is the original data, $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data. Preprocessing thus yields standardized data with mean 0 and standard deviation 1, which serve as the training data for the model in the next step.
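The standardization of step B can be sketched with NumPy as below. The patent does not say how missing or malformed records are handled or whether the population or sample standard deviation is used; dropping rows that contain NaN and using `np.std` with its default `ddof=0` are assumptions of this sketch, and `standardize` is a hypothetical name.

```python
import numpy as np


def standardize(X):
    """Step B sketch: drop incomplete rows, then z-score each feature column.

    X has shape (n_samples, n_features). Returns the standardized matrix and the
    per-feature mean and standard deviation, which must be reused when scoring
    new samples.
    """
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]  # assumption: discard rows with missing values
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # population std (ddof=0), an assumption
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sigma, mu, sigma


X = np.array([[100.0, 200.0], [120.0, 220.0], [140.0, np.nan], [140.0, 240.0]])
Z, mu, sigma = standardize(X)
print(mu)  # [120. 220.]
```

Returning `mu` and `sigma` matters because a sample scored later must be transformed with the training statistics, not with its own.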
Further, the step C includes:
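C1. Solving the eigenvalues and eigenvectors of the keystroke data: compute the covariance matrix of the centered $n \times m$ data matrix $X$,
$$C = \frac{1}{n-1} X^{\top} X$$
and solve for its eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ and eigenvectors $e_1, e_2, \ldots, e_m$;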
C2. Forming the transformation matrix: arrange the eigenvectors from left to right in descending order of their eigenvalues to form the eigenvector matrix $P$;
C3. Reducing the dimensionality of the data $X$: determine the reduced dimension $k$ and transform the data. The eigenvector of the largest eigenvalue is the direction of maximum data variance, which carries the most original information; the eigenvector of the smallest eigenvalue is the direction of minimum variance, which carries the least. The reduced dimension $k$ is the smallest value for which the information utilization rate reaches 99%:
$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \geq 0.99$$
Take the eigenvectors corresponding to the first $k$ eigenvalues to form $P_k$; the dimension-reduced data are then $Y = X P_k$;
C4. Calculating the anomaly score of the data: for an eigenvector $e_j$, the degree of deviation $d_{ij}$ of data sample $X_i$ along that direction is
$$d_{ij} = \frac{(X_i^{\top} e_j)^2}{\lambda_j}$$
After the deviations in all directions are calculated, they are summed to obtain the comprehensive anomaly score:
$$\mathrm{Score}(X_i) = \sum_{j=1}^{m} d_{ij} = \sum_{j=1}^{m} \frac{(X_i^{\top} e_j)^2}{\lambda_j}$$
The variance of the data along different directions reflects its intrinsic structure; if a single sample is inconsistent with the structure of the data as a whole and deviates strongly, it is identified as an anomaly.
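Steps C1 to C4 can be sketched with NumPy as follows, assuming `Z` is the standardized $n \times m$ matrix from step B (one row per keystroke sample). `fit_pca`, `anomaly_scores` and the `k` parameter are hypothetical names. `np.linalg.eigh` is used because the covariance matrix is symmetric, and the $1/(n-1)$ normalization is an assumption that changes neither the eigenvectors nor the 99% ratio. With `k=None` the score sums over all $m$ directions as the text describes; passing the reduced dimension from C3 restricts the sum to the retained directions, which avoids dividing by near-zero eigenvalues. Which variant the inventors use is not stated, so treat that choice as an assumption.

```python
import numpy as np


def fit_pca(Z: np.ndarray, variance_ratio: float = 0.99):
    """Steps C1-C3 on standardized data Z of shape (n_samples, n_features).

    Returns the eigenvalues in descending order, the matching eigenvector
    matrix P (one eigenvector per column) and the reduced dimension k.
    """
    n = Z.shape[0]
    C = Z.T @ Z / (n - 1)  # C1: covariance matrix of the centred data
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]  # C2: sort from largest to smallest
    eigvals, P = eigvals[order], eigvecs[:, order]
    # C3: smallest k whose cumulative variance share reaches variance_ratio.
    ratios = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratios, variance_ratio)) + 1, len(eigvals))
    return eigvals, P, k


def anomaly_scores(Z: np.ndarray, eigvals: np.ndarray, P: np.ndarray, k=None) -> np.ndarray:
    """Step C4: Score(X_i) = sum_j (X_i . e_j)^2 / lambda_j for every row X_i of Z.

    k=None sums over all directions; an integer k sums over the first k only.
    """
    k = len(eigvals) if k is None else k
    proj = Z @ P[:, :k]  # projections X_i . e_j (with k from C3 this is Y = X P_k)
    d = proj ** 2 / eigvals[:k]  # deviation degree d_ij
    return d.sum(axis=1)  # comprehensive anomaly score per sample
```

Calling `fit_pca` on the training matrix and then `anomaly_scores(Z, eigvals, P, k)` yields one score per training sample, which step D turns into a threshold.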
Further, in step D, after the user's anomaly scores are obtained, their mean $\bar{s}$ and standard deviation $\sigma_s$ are calculated, and the mean plus three times the standard deviation, $T = \bar{s} + 3\sigma_s$, is used as the anomaly score threshold.
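Step D reduces to two small helpers, sketched below. The names `fit_threshold` and `is_abnormal` are hypothetical, and using the population standard deviation (`np.std` with its default `ddof=0`) is an assumption, since the patent does not specify it.

```python
import numpy as np


def fit_threshold(scores, n_std: float = 3.0) -> float:
    """Step D: mean plus n_std standard deviations of one user's anomaly scores."""
    s = np.asarray(scores, dtype=float)
    return float(s.mean() + n_std * s.std())  # np.std defaults to ddof=0


def is_abnormal(score: float, threshold: float) -> bool:
    """Flag a keystroke sample whose anomaly score exceeds the threshold."""
    return score > threshold


t = fit_threshold([1.0, 2.0, 3.0, 4.0, 5.0])  # mean 3, std sqrt(2)
print(round(t, 4))  # 7.2426
```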
Compared with the prior art, the invention has the following beneficial effects:
In the PCA-based keystroke behavior anomaly detection method, the PCA algorithm is used to build the keystroke behavior anomaly detection model. No large training set is required, which addresses the heavy computation and large data requirements of existing keystroke anomaly recognition models; the method involves little computation and yields a simple model.
Drawings
FIG. 1 is a schematic diagram of keystroke data extraction of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Embodiments:
Embodiment 1:
A PCA-based keystroke behavior anomaly detection method comprises the following steps:
Step 1: collect the user's keystroke data, including keystroke durations and keystroke intervals, extracted in the manner shown in FIG. 1.
The keystroke duration is:
$$H_i = t_i^{\mathrm{up}} - t_i^{\mathrm{down}}$$
where $t_i^{\mathrm{down}}$ denotes the moment the $i$-th key is pressed and $t_i^{\mathrm{up}}$ the moment the $i$-th key is released;
the keystroke interval is:
$$I_i = t_{i+1}^{\mathrm{down}} - t_i^{\mathrm{down}}$$
that is, the time between pressing the next key and pressing the previous key.
Step 2: preprocess the keystroke data.
Missing or malformed data are handled, and the data are then centered and normalized to serve as training data for the model in the next step, calculated as:
$$x' = \frac{x - \mu}{\sigma}$$
where $x'$ is the processed data, $x$ is the original data, $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data. Preprocessing the original keystroke data thus yields standardized data with mean 0 and standard deviation 1, which are used as the training data for the model in the next step.
Step 3: build the PCA anomaly detection model.
3.1 Solve the eigenvalues and eigenvectors of the data:
compute the covariance matrix
$$C = \frac{1}{n-1} X^{\top} X$$
and solve for its eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ and eigenvectors $e_1, e_2, \ldots, e_m$.
3.2 Form the transformation matrix: arrange the eigenvectors from left to right in descending order of their eigenvalues to form the eigenvector matrix $P$.
3.3 Reduce the dimensionality of the data $X$:
first determine the reduced dimension $k$. The eigenvector of the largest eigenvalue is the direction of maximum data variance and carries the most original information; the eigenvector of the smallest eigenvalue is the direction of minimum variance and carries the least. $k$ is the smallest value for which the information utilization rate reaches 99%, namely:
$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \geq 0.99$$
Take the eigenvectors corresponding to the first $k$ eigenvalues to form $P_k$; then reduce the dimensionality of $X$, the transformed data being $Y = X P_k$.
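As a purely illustrative check of the 99% rule (choose the smallest $k$ whose cumulative eigenvalue share is at least 0.99), take hypothetical eigenvalues, not data from the patent, $\lambda = (4.0, 1.5, 0.45, 0.05)$, whose sum is $6.0$:

```latex
\frac{4.0}{6.0} \approx 0.667, \qquad
\frac{4.0 + 1.5}{6.0} \approx 0.917, \qquad
\frac{4.0 + 1.5 + 0.45}{6.0} = \frac{5.95}{6.0} \approx 0.992 \geq 0.99
\;\Rightarrow\; k = 3
```

Three of the four directions are kept, and only the direction with $\lambda_4 = 0.05$ is discarded.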
3.4 Calculate the anomaly score of the data:
for an eigenvector $e_j$, the degree of deviation $d_{ij}$ of data sample $X_i$ along that direction is
$$d_{ij} = \frac{(X_i^{\top} e_j)^2}{\lambda_j}$$
After the deviations in all directions are calculated, they are summed to obtain the comprehensive anomaly score:
$$\mathrm{Score}(X_i) = \sum_{j=1}^{m} d_{ij}$$
The variance of the data along different directions reflects its intrinsic structure: if a single sample is inconsistent with the structure of the data as a whole and deviates strongly, it is identified as an anomaly.
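A small hypothetical two-dimensional example (values chosen for illustration only) shows why dividing by $\lambda_j$ matters when the score is taken as $\sum_j (X_i^{\top} e_j)^2 / \lambda_j$. Let $\lambda_1 = 4$, $\lambda_2 = 1$ with $e_1 = (1,0)^{\top}$, $e_2 = (0,1)^{\top}$, and compare two samples at the same Euclidean distance from the mean:

```latex
\begin{aligned}
X_a = (2, 0)^{\top}: &\quad \mathrm{Score}(X_a) = \frac{2^2}{4} + \frac{0^2}{1} = 1 \\
X_b = (0, 2)^{\top}: &\quad \mathrm{Score}(X_b) = \frac{0^2}{4} + \frac{2^2}{1} = 4
\end{aligned}
```

The sample that deviates along the low-variance direction $e_2$ receives four times the score, which is the sense in which a sample inconsistent with the structure of the data as a whole stands out. Summed over all $m$ directions, this score equals the squared Mahalanobis distance $X_i^{\top} C^{-1} X_i$, because $C^{-1} = \sum_j e_j e_j^{\top} / \lambda_j$.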
Step 4: set an anomaly score threshold, and judge the keystroke behavior to be abnormal if its anomaly score is greater than the threshold.
For example, compute the anomaly scores of 20 keystroke behavior samples of a user:
$\mathrm{Score} = [\mathrm{score}_0, \mathrm{score}_1, \ldots, \mathrm{score}_{19}]$
Compute the mean $\bar{s}$ and standard deviation $\sigma_s$ of the anomaly scores and set the threshold to the mean plus three times the standard deviation:
$$T = \bar{s} + 3\sigma_s$$
If the anomaly score of a detected sample is greater than the threshold, the keystroke behavior is judged to be abnormal.
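An end-to-end sketch of the embodiment on synthetic data is given below. Everything in it is illustrative: the generated data (20 samples of 5 correlated timing features, mirroring the 20 scores in the example), the function names `train_user_model` and `score_samples`, and the choice to score over the $k$ retained directions are assumptions, not the inventors' implementation.

```python
import numpy as np


def train_user_model(X_train: np.ndarray, variance_ratio: float = 0.99, n_std: float = 3.0) -> dict:
    """Fit one user's model: z-score parameters, PCA directions and threshold."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    Z = (X_train - mu) / sigma
    C = Z.T @ Z / (len(Z) - 1)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]  # descending eigenvalues
    eigvals, P = eigvals[order], eigvecs[:, order]
    ratios = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratios, variance_ratio)) + 1, len(eigvals))
    model = {"mu": mu, "sigma": sigma, "eigvals": eigvals[:k], "P_k": P[:, :k]}
    train_scores = score_samples(model, X_train)
    # Step 4: threshold = mean + n_std * standard deviation of the user's scores.
    model["threshold"] = float(train_scores.mean() + n_std * train_scores.std())
    return model


def score_samples(model: dict, X: np.ndarray) -> np.ndarray:
    """Standardize with training statistics, then sum (X_i . e_j)^2 / lambda_j over kept directions."""
    Z = (np.atleast_2d(X) - model["mu"]) / model["sigma"]
    proj = Z @ model["P_k"]
    return (proj ** 2 / model["eigvals"]).sum(axis=1)


rng = np.random.default_rng(0)
base = rng.normal(size=(20, 1))  # hypothetical shared "typing speed" factor
# 20 hypothetical samples of 5 correlated timing features (ms)
X_train = 100 + 10 * base + rng.normal(scale=3, size=(20, 5))
model = train_user_model(X_train)

typical = X_train.mean(axis=0)  # the user's average profile
odd = np.array([180.0, 30.0, 180.0, 30.0, 180.0])  # breaks the usual correlation
print(score_samples(model, typical) > model["threshold"])  # [False]
print(score_samples(model, odd) > model["threshold"])
```

The user's mean profile scores exactly 0, while a sample that breaks the usual correlation between features should land far above the threshold. Here the threshold is computed from the training samples' own scores, which is the simplest reading of the embodiment; scores on held-out samples would give a less optimistic threshold.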
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (5)

1. A PCA-based keystroke behavior anomaly detection method, characterized by comprising the following steps:
A. collecting user keystroke data, including keystroke durations and keystroke intervals;
B. preprocessing the keystroke data: handling missing or malformed data, and then centering and normalizing the keystroke data;
C. building a PCA anomaly detection model to obtain a comprehensive anomaly score;
D. setting an anomaly score threshold, and judging the keystroke behavior to be abnormal if the anomaly score of the detected sample is greater than the threshold.
2. The PCA-based keystroke behavior anomaly detection method of claim 1, wherein in step A the keystroke duration is calculated as $H_i = t_i^{\mathrm{up}} - t_i^{\mathrm{down}}$ and the keystroke interval is calculated as $I_i = t_{i+1}^{\mathrm{down}} - t_i^{\mathrm{down}}$, where $t_i^{\mathrm{down}}$ denotes the moment the $i$-th key is pressed and $t_i^{\mathrm{up}}$ the moment the $i$-th key is released.
3. The PCA-based keystroke behavior anomaly detection method of claim 2, wherein step B applies the following processing to the keystroke data: $x' = (x - \mu)/\sigma$, where $x'$ is the processed data, $x$ is the original data, $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data, so that preprocessing the original keystroke data yields standardized data with mean 0 and standard deviation 1, which are used as the training data for the model in the next step.
4. The PCA-based keystroke behavior anomaly detection method of claim 3, wherein step C comprises:
C1. solving the eigenvalues and eigenvectors of the keystroke data, including computing the covariance matrix $C = \frac{1}{n-1} X^{\top} X$ and solving for its eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ and eigenvectors $e_1, e_2, \ldots, e_m$;
C2. forming the transformation matrix: arranging the eigenvectors from left to right in descending order of their eigenvalues to form the eigenvector matrix $P$;
C3. reducing the dimensionality of the data $X$, including determining the reduced dimension $k$ and transforming the data, wherein the eigenvector of the largest eigenvalue is the direction of maximum data variance, the eigenvector of the smallest eigenvalue is the direction of minimum data variance, and the reduced dimension $k$ is the smallest value satisfying $\sum_{j=1}^{k} \lambda_j / \sum_{j=1}^{m} \lambda_j \geq 0.99$; taking the eigenvectors corresponding to the first $k$ eigenvalues to form $P_k$; and then reducing the dimensionality of $X$, the transformed data being $Y = X P_k$;
C4. calculating the anomaly score of the data: for an eigenvector $e_j$, the degree of deviation of data sample $X_i$ along that direction is $d_{ij} = (X_i^{\top} e_j)^2 / \lambda_j$; after the deviations in all directions are calculated, they are summed to obtain the comprehensive anomaly score $\mathrm{Score}(X_i) = \sum_{j=1}^{m} d_{ij}$.
5. The PCA-based keystroke behavior anomaly detection method of claim 4, wherein in step D, after the user's anomaly scores are obtained, the mean and standard deviation of all of the user's anomaly scores are calculated, and the mean plus three times the standard deviation is used as the anomaly score threshold.
CN201910785323.9A 2019-08-23 2019-08-23 PCA-based keystroke behavior anomaly detection method Active CN110502883B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785323.9A CN110502883B (en) 2019-08-23 2019-08-23 PCA-based keystroke behavior anomaly detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910785323.9A CN110502883B (en) 2019-08-23 2019-08-23 PCA-based keystroke behavior anomaly detection method

Publications (2)

Publication Number Publication Date
CN110502883A (en) 2019-11-26
CN110502883B (en) 2022-08-19

Family

ID=68589339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785323.9A Active CN110502883B (en) 2019-08-23 2019-08-23 PCA-based keystroke behavior anomaly detection method

Country Status (1)

Country Link
CN (1) CN110502883B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984952A (en) * 2020-09-03 2020-11-24 Sichuan Changhong Electric Co Ltd HMM-based user input behavior anomaly identification method
CN114509690A (en) * 2022-04-19 2022-05-17 Hangzhou Yugu Technology Co Ltd PCA (principal component analysis) decomposition-based lithium battery cell charging and discharging anomaly detection method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833619A (en) * 2010-04-29 2010-09-15 Xi'an Jiaotong University Method for judging identity based on keyboard-mouse cross authentication
US20110320816A1 (en) * 2009-03-13 2011-12-29 Rutgers, The State University Of New Jersey Systems and method for malware detection
CN105389486A (en) * 2015-11-05 2016-03-09 Tongji University Authentication method based on mouse behavior
CN105933267A (en) * 2015-08-21 2016-09-07 China UnionPay Co Ltd Identity authentication method and device
CN106101116A (en) * 2016-06-29 2016-11-09 Northeastern University User behavior anomaly detection system and method based on principal component analysis
CN109145554A (en) * 2018-07-12 2019-01-04 Cangnan Research Institute of Wenzhou University Keystroke-feature abnormal user recognition method and system based on support vector machines
CN109308306A (en) * 2018-09-29 2019-02-05 Chongqing University User electricity consumption anomaly detection method based on isolation forest
CN109377409A (en) * 2018-09-29 2019-02-22 Chongqing University User electricity consumption anomaly detection method based on BP neural network
CN109447099A (en) * 2018-08-28 2019-03-08 Xi'an University of Technology Multi-classifier fusion method based on PCA dimensionality reduction
CN109815655A (en) * 2017-11-22 2019-05-28 Beijing Institute of Nanoenergy and Nanosystems Identification and verification system, method, apparatus and computer-readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
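Ignacio de Mendizabal-Vázquez et al.: "Supervised classification methods applied to Keystroke Dynamics through Mobile Devices", 2014 International Carnahan Conference on Security Technology (ICCST) *
Wu Mengxi (吴梦溪): "Research on user identity authentication based on input features", China Master's Theses Full-text Database, Information Science and Technology *
Wang Tao (王焘) et al.: "A fault detection method for cloud computing systems based on adaptive monitoring", Chinese Journal of Computers (计算机学报) *
Guo Zhimin (郭志民) et al.: "Host anomaly detection method based on user and network behavior analysis", Journal of Beijing Jiaotong University (北京交通大学学报) *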

Also Published As

Publication number Publication date
CN110502883B (en) 2022-08-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant