CN109145554A - Method and system for identifying abnormal users from keystroke characteristics based on a support vector machine - Google Patents


Publication number
CN109145554A
Authority
CN
China
Prior art keywords
sample
behavioural characteristic
data
user
keystroke
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810763718.4A
Other languages
Chinese (zh)
Inventor
戴大蒙
单鹏飞
陆岚
夏海江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Cangnan Institute Of Cangnan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cangnan Institute Of Cangnan filed Critical Cangnan Institute Of Cangnan
Priority to CN201810763718.4A priority Critical patent/CN109145554A/en
Publication of CN109145554A publication Critical patent/CN109145554A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Abstract

The invention discloses a method and system for identifying abnormal users from keystroke characteristics based on a support vector machine. First sample behavioural feature data are obtained from sample users typing a preset sample, including the number of keystroke character errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability, so that behavioural features that better reflect individual differences in typing, and therefore user identity, form the behavioural feature library. In addition, behavioural feature data missing from the collected data are completed before the data are used as the preset behavioural feature sample library, which makes the sample users' behavioural feature data more complete. As a result, the recognition rate is greatly improved over the prior art.

Description

Method and system for identifying abnormal users from keystroke characteristics based on a support vector machine
Technical field
The present invention relates to the field of pattern recognition, and in particular to a method and system for identifying abnormal users from keystroke characteristics based on a support vector machine.
Background art
At present, passphrases, passwords, and user names are the main means of user authentication in network operations. The biggest problem with this mechanism is that personal private information is easily leaked. The development of machine learning, deep learning, and other techniques that support biometrics offers a new approach to Internet authentication.
Biometrics uses each person's distinctive physiological information and characteristic behavioural information. This biological information is highly distinctive and unique, so abnormal users can be screened effectively by biometric means.
Because people differ in their typing habits and personal character, each person forms a unique keystroke pattern when entering a password or typing a passage of text. A keystroke pattern reflects the force with which a person strikes the keys, the typing speed, pausing habits, and so on, and these features are difficult to imitate. Further studies by a foundation in Washington, DC have confirmed the uniqueness of a person's keystroke characteristics, so a keystroke pattern can represent a user's identity. In practice, however, collecting statistics in real time makes the loss of some data very likely, which tends to reduce the recognition rate.
Summary of the invention
Therefore, the present invention provides a method and system for identifying abnormal users from keystroke characteristics based on a support vector machine, which solve the prior-art problem of a low recognition rate when identifying users by their keystroke characteristics.
A keystroke-characteristic abnormal user identification method based on a support vector machine provided by an embodiment of the present invention comprises the following steps: obtaining behavioural feature data of a user to be identified typing a preset sample; and identifying the behavioural feature data of the user to be identified according to a preset classification model and a preset behavioural feature sample library to generate a recognition result. The preset behavioural feature sample library is established by the following steps: obtaining first sample behavioural feature data of sample users typing the preset sample; and performing data completion on the behavioural feature data missing from the first sample behavioural feature data to form second sample behavioural feature data, which are used as the preset behavioural feature sample library.
Preferably, the first sample behavioural feature data include at least one of the following: the number of keystroke character errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability.
Preferably, the step of performing data completion on the behavioural feature data missing from the first sample behavioural feature data specifically comprises: normalizing the first sample behavioural feature data; judging, from the normalized first sample behavioural feature data, whether any behavioural feature data of a sample user are missing; taking the sample users with no missing behavioural feature data, together with their first sample behavioural feature data, as the training set, and the sample users with missing behavioural feature data, together with their first sample behavioural feature data, as the test set, and determining, from the training set, the test set, and a lasso regression model, the weights between the behavioural features with no missing data and the behavioural features with missing data; and completing the missing behavioural feature data according to the weights and the training set to form the second sample behavioural feature data.
Preferably, the step of completing the missing behavioural feature data according to the weights and the training set specifically comprises: computing the feature values of the behavioural features with missing data according to the weights and the training set; and completing the missing behavioural feature data according to those feature values.
Preferably, the weights between the non-missing behavioural features and the missing feature are calculated by the following formula:
where J(w) is the loss function, X is the feature values of the non-missing behavioural features in the training set, N is the number of sample users, y is the feature values, in the training set, of the behavioural feature that is missing in the test set, w is the weights between the non-missing behavioural features and the missing feature, and α is a hyperparameter.
Preferably, in the step of completing the missing behavioural feature data according to the weights and the training set to form the second sample behavioural feature data, the feature value of the behavioural feature with missing data is calculated by the following formula:
where the left-hand side is the feature value of the behavioural feature missing from the test set, w is the weights between the non-missing behavioural features and the missing feature, and b is the bias.
Preferably, after the step of completing the missing behavioural feature data according to the weights and the training set to form the second sample behavioural feature data, the method further comprises:
performing correlation analysis on the second sample behavioural feature data to obtain an analysis result;
screening the second sample behavioural features according to the analysis result; performing dimensionality reduction on the screened data by principal component analysis; and using the dimension-reduced data as the user behavioural feature sample library.
Preferably, dimensionality reduction is performed on the screened data by the following formula:
where x^(i) is the feature vector in the current dimension, x^(i)_approx is the feature vector after dimensionality reduction, α is a preset threshold, and m is the number of sample users.
Preferably, the step of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result specifically comprises:
obtaining the set of sample users of the preset behavioural feature sample library; taking each sample user in the set in turn as the positive set and the other sample users as the negative set; obtaining recognition results from the dimension-reduced behavioural feature data of the user to be identified and the support vector machine classification model; sorting the recognition results and taking the identity of the sample user that served as the positive set for the maximum recognition result as the identity of the user to be identified; judging whether the identity of the user to be identified belongs to the set of sample users; and, when it does not, determining that the user to be identified is an abnormal user.
Preferably, the step of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result specifically comprises:
obtaining the set of sample users of the user behavioural feature sample library; taking each sample user in the set in turn as the positive set and the other sample users as the negative set; identifying the dimension-reduced behavioural feature data of the user to be identified with the support vector machine classification model to obtain recognition results; sorting the recognition results and judging whether the maximum recognition result is greater than a preset value; when the maximum is greater than the preset value, taking the identity of the sample user that served as the positive set as the identity of the user to be identified; and, when the maximum is less than the preset value, determining that the user to be identified is an abnormal user.
An embodiment of the present invention also provides a keystroke-characteristic abnormal user identification system based on a support vector machine, comprising: a behavioural feature extraction module, configured to obtain behavioural feature data of a user to be identified typing a preset sample; and a classification module, configured to classify the behavioural feature data of the user to be identified according to a preset classification model and a preset behavioural feature sample library to generate a classification result. The preset behavioural feature sample library is established by the following steps:
obtaining first sample behavioural feature data of sample users typing the preset sample; and performing data completion on the behavioural feature data missing from the first sample behavioural feature data to form second sample behavioural feature data, which are used as the preset behavioural feature sample library.
An embodiment of the present invention also provides a computer device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the above keystroke-characteristic abnormal user identification method based on a support vector machine.
An embodiment of the present invention also provides a computer-readable storage medium storing computer instructions that cause a computer to perform the above keystroke-characteristic abnormal user identification method based on a support vector machine.
The technical solution of the present invention has the following advantages:
1. The method and system for identifying abnormal users from keystroke characteristics based on a support vector machine provided by the embodiments of the present invention obtain first sample behavioural feature data of sample users typing a preset sample, including the number of keystroke character errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability. Behavioural features that better reflect individual differences in typing, and hence user identity, thus form the behavioural feature library, which significantly improves the recognition rate for the identity of the user to be identified.
2. In the method and system provided by the embodiments of the present invention, behavioural feature data missing from the collected data are completed before the data are used as the preset behavioural feature sample library, which makes the sample users' behavioural feature data more complete. The behavioural feature data of a user to be identified typing the preset sample are classified according to the preset classification model and the preset behavioural feature sample library, and the classification result determines the user's identity and hence whether the user is abnormal, so that the recognition rate is greatly improved over the prior art.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a specific example of the keystroke-characteristic abnormal user identification method based on a support vector machine provided in an embodiment of the present invention;
Fig. 2 is a flowchart of a specific example of establishing the preset behavioural feature sample library provided in an embodiment of the present invention;
Fig. 3 is a flowchart of an example of performing data completion on missing behavioural feature data provided in an embodiment of the present invention;
Fig. 4 is a flowchart of an example of performing dimensionality reduction on the completed data in an embodiment of the present invention;
Fig. 5 is a flowchart of a specific example of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result in an embodiment of the present invention;
Fig. 6 is a flowchart of another specific example of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the keystroke-characteristic abnormal user identification system based on a support vector machine provided in an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the computer device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict.
Embodiment 1
An embodiment of the present invention provides a keystroke-characteristic abnormal user identification method based on a support vector machine. As shown in Fig. 1, the method comprises the following steps:
Step S1: obtain the behavioural feature data of a user to be identified typing a preset sample.
In this embodiment, the behavioural feature data may be the behavioural features of a user typing the preset sample through an input device such as a keyboard, and may include: the number of keystroke character errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability. Based on an analysis of the behavioural feature data, keystroke character errors are divided into the following types:
(1) Bad case: a letter-case error, for example ". The" typed as ". the";
(2) Bad ordering: when typing a string, a character is entered too early, for example "house" typed as "houes";
(3) Doublet: when typing a string, the same letter is struck twice, for example "home" typed as "homee";
(4) Other: other kinds of typing error;
(5) RED: an obvious error made while typing that is not corrected.
The above classification of keystroke character errors is only an example; in practice, the error types can be adjusted as required, and the present invention is not limited in this respect.
In this embodiment, keystroke speed may be the number of correct characters the user types per minute; average keystroke speed may be the number of correct characters the user types within a preset period; instantaneous keystroke speed is the user's keystroke speed at the current moment; keystroke accuracy is the ratio of the number of correctly typed characters to the total number of typed characters; and keystroke stability may be expressed as the variance of the user's keystroke speed and the variance of the user's keystroke accuracy.
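The feature definitions above can be sketched in code. The following is a minimal illustration, not the patented implementation: the function name `keystroke_features` and the input layout (one tuple of seconds, typed characters, and correct characters per observation window) are assumptions made here, as is the choice to treat the last window's speed as the instantaneous speed.

```python
from statistics import pvariance


def keystroke_features(windows):
    """Summarize typing behaviour over several observation windows.

    `windows` is a hypothetical list of (seconds, typed_chars, correct_chars)
    tuples; this layout is an assumption, not taken from the patent.
    """
    total_seconds = sum(s for s, _, _ in windows)
    total_typed = sum(t for _, t, _ in windows)
    total_correct = sum(c for _, _, c in windows)
    # Per-window speed in correct characters per minute.
    speeds = [60.0 * c / s for s, _, c in windows]
    # Per-window accuracy: correct characters over all typed characters.
    accuracies = [c / t for _, t, c in windows]
    return {
        "average_speed": 60.0 * total_correct / total_seconds,
        "instantaneous_speed": speeds[-1],  # speed in the latest window
        "accuracy": total_correct / total_typed,
        "speed_variance": pvariance(speeds),  # stability of speed
        "accuracy_variance": pvariance(accuracies),  # stability of accuracy
    }


features = keystroke_features([(30, 100, 90), (30, 120, 114), (30, 80, 72)])
```

Population variance is used for the two stability measures; a sample variance would serve equally well if the windows are regarded as a sample.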
Step S2: identify the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result.
The preset classification model in this embodiment may be a support vector machine classification model. To improve the robustness of this model, a slack variable is introduced for each sample to control its influence on the solution, that is, sample points far from the centre of the sphere are penalized. The constraint is given by formula (1):
That is, the Euclidean distance from the feature vector x_i of a sample user to the centre of the sphere is less than or equal to the radius plus the slack variable.
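The constraint itself is not reproduced in this text. A form consistent with the description, assuming the usual sphere-based (support vector data description) formulation, would be the following, where the centre a, the radius R, and the slack variable ξ_i are symbols assumed here:

```latex
\|x_i - a\|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0 \tag{1}
```

The squared form is chosen because the decision function in formula (3) compares a squared distance with R².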
In the linearly inseparable case, the support vector machine first performs its computation in the low-dimensional space, then maps the input space to a high-dimensional feature space through a kernel function, and finally constructs the optimal separating hyperplane in that feature space, so that nonlinear data that cannot be separated in the original space become separable. In this embodiment, an optimized Gaussian kernel function is used for the nonlinear classification task, as shown in formula (2):
where K(x, y) is the similarity of behavioural feature vectors x and y, ‖x − y‖² is the squared Euclidean distance, and σ is the variance of feature vectors x and y.
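Formula (2) is not reproduced in this text. A minimal sketch of the commonly used Gaussian kernel, K(x, y) = exp(−‖x − y‖² / (2σ²)), is given below; the "optimized" kernel of this embodiment may differ from it, and the function name `gaussian_kernel` and the default σ are assumptions made here.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Common Gaussian (RBF) kernel; assumed form, not taken from the patent."""
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
apart = gaussian_kernel([0.0, 0.0], [1.0, 1.0])
```

Identical vectors give a similarity of 1, and the similarity decays toward 0 as the vectors move apart, which is the property that K(z, z) = 1 in formula (3) relies on.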
The decision function of the support vector machine is shown in formula (3):
\[
\begin{aligned}
f_{SVD}(z;\alpha,R) &= I\left(\|\phi(z)-\phi(\alpha)\|^{2} \le R^{2}\right) \\
&= I\Big(K(z,z) - 2\sum_{i}\alpha_{i}K(z,x_{i}) + \sum_{i,j}\alpha_{i}\alpha_{j}K(x_{i},x_{j}) \le R^{2}\Big)
\end{aligned}
\tag{3}
\]
where K(z, z) is the similarity of z with itself, i.e. 1; K(z, x_i) is the similarity between z and the i-th sample; α_i is the weight; and R² is the squared radius.
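The kernel expansion in formula (3) can be evaluated directly once the weights and the radius are known. The sketch below assumes a Gaussian kernel and uses hand-picked weights and radius purely for illustration; the names `rbf` and `inside_sphere` are hypothetical, and no training of the weights is shown.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian kernel, assumed here as the similarity K."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def inside_sphere(z, samples, alphas, radius_sq, kernel=rbf):
    """Return (decision, squared distance to the sphere centre in feature space)."""
    # Cross term: sum_i alpha_i * K(z, x_i)
    cross = sum(a * kernel(z, x) for a, x in zip(alphas, samples))
    # Centre term: sum_{i,j} alpha_i * alpha_j * K(x_i, x_j)
    centre = sum(
        ai * aj * kernel(xi, xj)
        for ai, xi in zip(alphas, samples)
        for aj, xj in zip(alphas, samples)
    )
    dist_sq = kernel(z, z) - 2.0 * cross + centre
    return dist_sq <= radius_sq, dist_sq


# Illustrative values only: two support samples with equal weights.
samples = [[0.0, 0.0], [2.0, 0.0]]
alphas = [0.5, 0.5]
near = inside_sphere([1.0, 0.0], samples, alphas, radius_sq=0.5)
far = inside_sphere([10.0, 0.0], samples, alphas, radius_sq=0.5)
```

A point between the two samples falls inside the sphere, while a distant point does not, which is how the indicator function I(·) separates accepted from rejected inputs.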
In a preferred embodiment, as shown in Fig. 2, the preset behavioural feature sample library is established by the following steps:
Step S3: obtain first sample behavioural feature data of sample users typing the preset sample.
In this embodiment, the first sample behavioural feature data may be the behavioural features of users typing the preset sample through an input device such as a keyboard, and include: the number of keystroke character errors of each type described above, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, keystroke stability, and the corresponding feature values. The data are not limited to these and may include other behavioural features in other embodiments.
Step S4: perform data completion on the behavioural feature data missing from the first sample behavioural feature data to form second sample behavioural feature data, and use the second sample behavioural feature data as the preset behavioural feature sample library.
In a preferred embodiment, as shown in Fig. 3, the step of performing data completion on the behavioural feature data missing from the first sample behavioural feature data specifically comprises:
Step S5: normalize the first sample behavioural feature data.
In this embodiment, the feature value of each behavioural feature is normalized so that it lies in the range 0 to 1, which simplifies subsequent processing.
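The text does not state which normalization is used. One common choice that maps every feature into the range 0 to 1 is min-max scaling, sketched below under that assumption; the function name `min_max_normalize` and the sample values are illustrative only.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1] (min-max scaling, assumed here)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column carries no information and is mapped to 0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Each row is one sample user: [keystroke speed, keystroke accuracy].
scaled = min_max_normalize([[180.0, 0.90], [228.0, 0.95], [144.0, 0.90]])
```

Scaling per column keeps features with large raw values, such as speed, from dominating features with small raw values, such as accuracy.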
Step S6: judge, from the normalized first sample behavioural feature data, whether any behavioural feature data of a sample user are missing. In practice, some data may be lost. For example, when typing a string, a user first types "home" as "homme", deletes it, and then types "hoem". Under the error classification above, the user's Bad ordering count is incremented once but the Doublet count is not, so data are lost. In addition, collecting statistics in real time makes the loss of some data very likely.
Step S7: take the sample users with no missing behavioural feature data, together with their first sample behavioural feature data, as the training set, and the sample users with missing behavioural feature data, together with their first sample behavioural feature data, as the test set; then determine, from the training set, the test set, and a lasso regression model, the weights between the behavioural features with no missing data and the behavioural features with missing data.
In this embodiment, the missing data are completed using lasso regression. The behavioural feature data of sample users with no missing data form the training set, and the behavioural feature data of sample users with missing data form the test set. The feature value of the missing behavioural feature in the test set is the quantity to be predicted; the behavioural feature whose data are missing in the test set is denoted x_k; the values of x_k in the training set are denoted y; and the feature values of the non-missing behavioural features other than x_k are denoted X. The weights between the non-missing behavioural features and the missing behavioural feature are calculated by formula (4):
where J(w) is the loss function, X is the feature values of the non-missing behavioural features in the training set, N is the number of sample users, y is the feature values, in the training set, of the behavioural feature that is missing in the test set, w is the weights between the non-missing behavioural features and the missing feature, and α is a hyperparameter. The weights w are those that minimize the loss function J(w). The feature value of the behavioural feature with missing data is then calculated by formula (5):
where the left-hand side is the predicted feature value of the behavioural feature missing from the test set, w is the weights between the non-missing behavioural features and the missing feature, and b is the bias.
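Formulas (4) and (5) are not reproduced in this text. The standard lasso objective and linear prediction that match the symbol definitions above would be as follows; the 1/(2N) scaling, the vector x of non-missing feature values of a test-set user, and the symbol ŷ for the predicted value are assumptions made here:

```latex
\begin{align}
J(w) &= \frac{1}{2N}\,\lVert Xw - y \rVert_2^2 + \alpha\,\lVert w \rVert_1 \tag{4} \\
\hat{y} &= w^{\top}x + b \tag{5}
\end{align}
```

The L1 penalty drives the weights of uninformative features to exactly zero, so only the non-missing features that actually help predict x_k contribute to the completed value.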
Step S8: complete the missing behavioural feature data according to the weights and the training set to form the second sample behavioural feature data.
In this embodiment, the feature values of the behavioural features with missing data are computed according to the weights and the training set, and the missing behavioural feature data are completed with those feature values.
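The completion step can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the usual lasso objective (mean squared error over 2N plus α times the L1 norm of the weights), fits it by coordinate descent, and then predicts the missing value linearly. The names `soft_threshold`, `lasso_fit`, and `impute`, and the toy data, are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Fit weights w and bias b by coordinate descent (assumed objective)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Unpenalized bias: mean residual under the current weights.
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(d)) for i in range(n)) / n
        for j in range(d):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction without the contribution of feature j.
                partial = b + sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm > 0 else 0.0
    return w, b


def impute(w, b, x):
    """Predict the missing feature value from the non-missing features x."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


# Training set: users with complete data (X holds the non-missing features).
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]
w, b = lasso_fit(X_train, y_train, alpha=0.05)
filled = impute(w, b, [4.0])  # completed value for a test-set user
```

On this toy data the underlying relation is y = 2x + 1; the penalty shrinks the fitted weight slightly below 2, and a sufficiently large α shrinks it to zero.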
In a preferred embodiment, as shown in Fig. 4, after step S8 completes the missing behavioural feature data, the keystroke-characteristic abnormal user identification method further comprises:
Step S9: perform correlation analysis on the second sample behavioural feature data to obtain an analysis result.
Step S10: screen the second sample behavioural features according to the analysis result.
In this embodiment, the test sample users are 11 people who provide 88 samples in total as training data (8 passages of text typed by each person). With so little data and relatively many features, over-fitting is likely in practice. To alleviate over-fitting, this embodiment performs feature screening using correlation heat-map analysis and uses cross-validation to check whether the model over-fits the data.
To address over-fitting, this embodiment analyses the five error features above. Testing with the sample users shows that some features are strongly associated. For example, when a user mistypes "home" as "hoeem", the error is both a Bad ordering and a Doublet, and the two often occur together. Correlation analysis is therefore carried out among the five error features. In this embodiment the threshold is set to 0.5: when the correlation α between two behavioural features satisfies |α| ≥ 0.5, the two features are considered strongly associated. When α > 0.5, the two features are highly positively correlated and can replace each other; when α < −0.5, the two features are considered mutually inhibiting and can likewise replace each other. The behavioural features are screened in this way. Note that the threshold is not limited to this value and may take other values in other embodiments.
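The screening rule above can be sketched with a Pearson correlation coefficient and the 0.5 threshold. The text does not say which coefficient is used or which feature of a strongly associated pair is retained; the sketch assumes Pearson correlation and keeps whichever feature appears first. The names `pearson` and `screen_features` and the counts are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def screen_features(columns, threshold=0.5):
    """Keep a feature only if |correlation| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical error counts for five samples.
columns = {
    "bad_ordering": [1, 2, 3, 4, 5],
    "doublet": [2, 4, 6, 8, 11],
    "other": [3, 1, 4, 1, 5],
}
kept = screen_features(columns)
```

Here the Doublet counts track the Bad ordering counts closely, so only one of the two is retained, while the weakly correlated third feature is kept.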
Step S11: perform dimensionality reduction on the screened data by principal component analysis.
In this embodiment, principal component analysis is applied to the features screened in step S10 to reduce their dimensionality and improve running efficiency. Principal component analysis is an unsupervised statistical method that, generally by means of an orthogonal transformation, converts a vector whose components are correlated into a vector whose components are uncorrelated. Geometrically, it transforms the original coordinate system into a new orthogonal coordinate system whose axes point in the directions along which the sample points are most spread out, and then reduces the dimensionality of the multidimensional variable. Specifically, the screened data may be reduced in dimension by formula (6):
where x^(i) is the feature vector in the current dimension, x^(i)_approx is the feature vector after dimensionality reduction, α is a preset threshold, and m is the number of sample users. The threshold in this embodiment is set to 0.01, but it is not limited to this value and may take other values in other embodiments.
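Formula (6) is not reproduced in this text. A criterion consistent with the symbols defined above is the widely used ratio of average squared projection error to total variation, which is assumed here rather than taken from the source:

```latex
\frac{\dfrac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - x^{(i)}_{approx}\right\rVert^{2}}
     {\dfrac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)}\right\rVert^{2}} \le \alpha \tag{6}
```

Under this reading, the smallest number of principal components satisfying the inequality is chosen, and a threshold of 0.01 corresponds to retaining 99% of the variance.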
Step S12: using the data after dimension-reduction treatment as user behavior characteristics sample database.
In the embodiment of the present invention, the second sample behavioural characteristic data after dimension-reduction treatment are used as the preset behavioural characteristic sample database.
In a preferred embodiment, as shown in Fig. 5, step S2 above, in which the behavioural characteristic data of the user to be identified are identified according to the preset classification model and the preset behavioural characteristic sample database to generate a recognition result, specifically includes:
Step S211: obtain the sample user set of the preset behavioural characteristic sample database.
There are 11 sample users in the embodiment of the present invention, so the user set is U = {u1, u2, ……, un} (n = 11).
Step S212: take each sample user in the sample user set in turn as the positive set, with the other sample users as the negative set.
Step S213: obtain recognition results from the dimension-reduced behavioural characteristic data of the user to be identified and the support vector machine classification model. In the embodiment of the present invention, let x = {t1, t2, ……, tm} be a user to be identified, where each t is one behavioural characteristic; there are 6 behavioural characteristics after dimension-reduction treatment, so m = 6.
Step S214: sort the recognition results, and determine the identity of the sample user serving as the positive set for the maximum recognition result to be the identity of the user to be identified.
In the embodiment of the present invention, during classification the behavioural characteristics of the sample user of one class are in turn treated as one class and the behavioural characteristics of all remaining sample users as the other class, so that samples of k classes yield k support vector machine classifiers. In the embodiment of the present invention there are 11 classes to be separated (that is, 11 labels), U = {u1, u2, ……, un} (n = 11), and the training sets are extracted as follows:
1) the vectors corresponding to u1 as the positive set, and the vectors corresponding to {u2, ……, un} (n = 11) as the negative set;
2) the vectors corresponding to u2 as the positive set, and the vectors corresponding to {u1, u3, ……, un} (n = 11) as the negative set;
3) the vectors corresponding to u3 as the positive set, and the vectors corresponding to {u1, u2, u4, ……, un} (n = 11) as the negative set;
4) the vectors corresponding to u4 as the positive set, and the vectors corresponding to {u1, u2, u3, u5, ……, un} (n = 11) as the negative set, and so on.
5) Training is carried out separately on these 11 training sets, and the resulting classifiers give 11 classification results f1(x), f2(x), f3(x), ……, f11(x). The largest of these 11 values is taken as the classification result, and the identity of the corresponding sample user serving as the positive set is determined to be the identity of the user to be identified.
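The one-against-rest construction in items 1) to 5) can be sketched with scikit-learn. This is a minimal illustration, not the implementation of the embodiment: the source does not state the kernel or other SVM settings, so a linear kernel is assumed, and the toy data use 4 users instead of 11 to keep the example small. The helper names `train_one_vs_rest` and `identify` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, labels):
    """Train one binary SVM per user: that user is the positive set,
    all other users form the negative set."""
    classifiers = {}
    for user in np.unique(labels):
        y = np.where(labels == user, 1, -1)
        clf = SVC(kernel="linear")     # kernel choice is an assumption
        clf.fit(X, y)
        classifiers[int(user)] = clf
    return classifiers

def identify(classifiers, x):
    """Score x with every classifier and return the user whose
    classifier gives the largest decision value f_i(x)."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    scores = {u: float(clf.decision_function(x)[0])
              for u, clf in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy data: 4 users, 6 features (m = 6), 20 samples per user.
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(6)[:4]          # well-separated cluster centres
X = np.vstack([c + 0.3 * rng.normal(size=(20, 6)) for c in centers])
labels = np.repeat(np.arange(4), 20)

classifiers = train_one_vs_rest(X, labels)
best, scores = identify(classifiers, centers[2])
```

Here `scores` plays the role of f1(x), ……, fk(x), and `best` is the identity assigned by taking the maximum.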
Step S215: judge whether the identity of user to be identified belongs to sample of users set.
Step S216: when the identity of user to be identified is not belonging to sample of users set, determine that user to be identified is Abnormal user.
In the embodiment of the present invention, suppose the user to be identified is supposed to be one of the above sample users, for example sample user u1. If the identity obtained after classification by the classifier is a different sample user, for example u2, this indicates that the keystroke behaviour of the user to be identified is abnormal, and the user is determined to be an abnormal user.
In another embodiment, as shown in Fig. 6, step S2 above, in which the behavioural characteristic data of the user to be identified are identified according to the preset classification model and the preset behavioural characteristic sample database to generate a recognition result, specifically includes:
Step S221: obtain the sample user set of the user behavioural characteristic sample database.
In the embodiment of the present invention there are 11 sample users, so the user set is U = {u1, u2, ……, un} (n = 11).
Step S222: take each user sample in the user set in turn as the positive set, with the other user samples as the negative set.
Step S223: input the dimension-reduced behavioural characteristic data of the user to be identified into the support vector machine classification model for identification to obtain recognition results. In the embodiment of the present invention, let x = {t1, t2, ……, tm} be a user to be identified, where each t is one behavioural characteristic; there are 6 behavioural characteristics after dimension-reduction treatment, so m = 6.
Step S224: sort the recognition results, and judge whether the maximum value among the recognition results is greater than a preset value.
In the embodiment of the present invention, during classification the behavioural characteristics of the sample user of one class are in turn treated as one class and the behavioural characteristics of all remaining sample users as the other class, so that samples of k classes yield k support vector machine classifiers. In the embodiment of the present invention there are 11 classes to be separated (that is, 11 labels), U = {u1, u2, ……, un} (n = 11), and the training sets are extracted as follows:
1) the vectors corresponding to u1 as the positive set, and the vectors corresponding to {u2, ……, un} (n = 11) as the negative set;
2) the vectors corresponding to u2 as the positive set, and the vectors corresponding to {u1, u3, ……, un} (n = 11) as the negative set;
3) the vectors corresponding to u3 as the positive set, and the vectors corresponding to {u1, u2, u4, ……, un} (n = 11) as the negative set;
4) the vectors corresponding to u4 as the positive set, and the vectors corresponding to {u1, u2, u3, u5, ……, un} (n = 11) as the negative set, and so on.
5) Training is carried out separately on these 11 training sets, and the resulting classifiers give 11 classification results f1(x), f2(x), f3(x), ……, f11(x). The 11 recognition results are sorted, and it is judged whether the maximum value among them is greater than a preset value. The preset value in the embodiment of the present invention is 0.8; it is not limited to this, and in other embodiments a corresponding value may be set according to the application scenario.
Step S225: when the maximum value among the recognition results is greater than the preset value, determine the identity of the sample user serving as the corresponding positive set to be the identity of the user to be identified.
Step S226: when the maximum value among the recognition results is less than the preset value, determine that the user to be identified is an abnormal user.
In the embodiment of the present invention, after the behavioural characteristics of the user to be identified have been classified by the classifier against the behavioural characteristic sample database, a preset value is applied to the classification result. Only when the result is greater than the preset value can it be determined which sample user the identity belongs to; when the result is less than the preset value, the user is an abnormal user.
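The decision rule of steps S224 to S226 is small enough to sketch directly. The sketch assumes the recognition results are already on a scale where a preset value of 0.8 is meaningful; the source does not say how the classifier outputs are scaled, and it does not say what happens when the maximum equals the preset value, so equality is treated as abnormal here. The function name `decide` and the score dictionaries are hypothetical.

```python
def decide(scores, preset=0.8):
    """Return (identity, is_abnormal) from per-user recognition results.

    scores maps each sample user to its recognition result f_i(x).
    Assumption: a maximum exactly equal to the preset counts as abnormal.
    """
    best_user = max(scores, key=scores.get)   # user with the largest result
    if scores[best_user] > preset:
        return best_user, False               # accepted as this sample user
    return None, True                         # rejected: abnormal user

accepted = decide({"u1": 0.20, "u2": 0.93, "u3": 0.41})
rejected = decide({"u1": 0.40, "u2": 0.60, "u3": 0.10})
```

Unlike the preferred embodiment, which always assigns the identity with the largest result, this variant can reject a user whose best match is still weak.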
The keystroke characteristic abnormal user recognition method based on support vector machines provided by the embodiment of the present invention obtains the first sample behavioural characteristic data of sample users inputting a preset sample, including the number of each type of keystroke character error, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability, and uses the behavioural characteristics that better reflect individual keystroke differences and characterize user identity as the behavioural characteristic library. In addition, data completion is performed on the behavioural characteristic data that have been lost before the data are used as the preset behavioural characteristic sample database, so that the behavioural characteristic data of the sample users are more complete and the recognition rate is greatly improved compared with the prior art.
Embodiment 2
The embodiment of the present invention provides a keystroke characteristic abnormal user identifying system based on support vector machines. As shown in Fig. 7, the system includes:
A to-be-identified user behavioural characteristic extraction module 1, for obtaining the behavioural characteristic data of the user to be identified inputting a preset sample. This module executes the method of step S1 in Embodiment 1, which is not described again here.
A to-be-identified user classification module 2, for classifying the behavioural characteristic data of the user to be identified according to the preset classification model and the preset behavioural characteristic sample database, and generating a classification result. This module executes the method of step S2 in Embodiment 1, which is not described again here.
In the embodiment of the present invention, for the method of establishing the above preset behavioural characteristic library, refer to steps S3 to S12 recorded in Embodiment 1, which are not described again here.
The keystroke characteristic abnormal user identifying system based on support vector machines provided by the embodiment of the present invention obtains the first sample behavioural characteristic data of sample users inputting a preset sample, including the number of each type of keystroke character error, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability, and uses the behavioural characteristics that better reflect individual keystroke differences and characterize user identity as the behavioural characteristic library. In addition, data completion is performed on the behavioural characteristic data that have been lost before the data are used as the preset behavioural characteristic sample database, so that the behavioural characteristic data of the sample users are more complete and the recognition rate is greatly improved compared with the prior art.
Embodiment 3
The embodiment of the present invention provides a computer device, as shown in Fig. 8, comprising: at least one processor 401, such as a CPU (Central Processing Unit), at least one communication interface 403, a memory 404, and at least one communication bus 402. The communication bus 402 is used to realize connection and communication between these components. The communication interface 403 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 403 may also include a standard wired interface and a wireless interface. The memory 404 may be a high-speed RAM (Random Access Memory, a volatile random access memory) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 404 may also be at least one storage device located remotely from the aforementioned processor 401. The processor 401 may be combined with the keystroke characteristic abnormal user identifying system based on support vector machines described with reference to Fig. 3. A set of program code is stored in the memory 404, and the processor 401 calls the program code stored in the memory 404 to execute the keystroke characteristic abnormal user recognition method based on support vector machines, that is, the method of the embodiments of Figs. 1 to 6.
The communication bus 402 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 8, but this does not mean that there is only one bus or only one type of bus.
Wherein, memory 404 may include volatile memory (English: volatile memory), such as arbitrary access Memory (English: random-access memory, abbreviation: RAM);Memory also may include nonvolatile memory (English Text: non-volatile memory), for example, flash memory (English: flash memory), hard disk (English: hard disk Drive, abbreviation: HDD) or solid state hard disk (English: solid-state drive, abbreviation: SSD);Memory 404 can also wrap Include the combination of the memory of mentioned kind.
Wherein, processor 401 can be central processing unit (English: central processing unit, abbreviation: CPU), the combination of network processing unit (English: network processor, abbreviation: NP) or CPU and NP.
Wherein, processor 401 can further include hardware chip.Above-mentioned hardware chip can be specific integrated circuit (English: application-specific integrated circuit, abbreviation: ASIC), programmable logic device (English: Programmable logic device, abbreviation: PLD) or combinations thereof.Above-mentioned PLD can be Complex Programmable Logic Devices (English: complex programmable logic device, abbreviation: CPLD), field programmable gate array (English: Field-programmable gate array, abbreviation: FPGA), Universal Array Logic (English: generic array Logic, abbreviation: GAL) or any combination thereof.
Optionally, the memory 404 is also used to store program instructions. The processor 401 can call the program instructions to implement the keystroke characteristic abnormal user recognition method based on support vector machines in the embodiments of Figs. 1 to 6 of the present application.
The embodiment of the present invention also provides a kind of computer readable storage medium, and meter is stored on computer readable storage medium Calculation machine executable instruction, the computer executable instructions can be performed in above-mentioned any means embodiment based on support vector machines Keystroke characteristic abnormal user recognition methods.Wherein, the storage medium can be magnetic disk, CD, read-only memory (Read- Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (Flash Memory), hard disk (Hard Disk Drive, abbreviation: HDD) or solid state hard disk (Solid-State Drive, SSD) etc.;Institute State the combination that storage medium can also include the memory of mentioned kind.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious changes or variations derived from the above description remain within the protection scope of the present invention.

Claims (13)

1. a kind of keystroke characteristic abnormal user recognition methods based on support vector machines, which comprises the steps of:
Obtain the behavioural characteristic data that user to be identified inputs default sample;
According to preset disaggregated model and preset behavioural characteristic sample database to the behavioural characteristic data of the user to be identified It is identified, generates recognition result;
The preset behavioural characteristic sample database is established by following steps:
Obtain the first sample behavioural characteristic data that sample of users inputs default sample;
Supplementing Data processing is carried out to the behavioural characteristic data lost in the first sample behavioural characteristic data, forming the second sample behavioural characteristic data, and the second sample behavioural characteristic data are used as the preset behavioural characteristic sample database.
2. the keystroke characteristic abnormal user recognition methods according to claim 1 based on support vector machines, which is characterized in that The first sample behavioural characteristic data include at least one of the following contents: the number of Key stroke character difference type of error The stability of amount, keystroke speed, keystroke average speed, keystroke instantaneous velocity, keystroke accuracy and keystroke.
3. the keystroke characteristic abnormal user recognition methods according to claim 1 based on support vector machines, which is characterized in that Described the step of Supplementing Data processing is carried out to the behavioural characteristic data lost in the first sample behavioural characteristic data, specifically Include:
The first sample behavioural characteristic data are normalized;
Whether lost according to the behavioural characteristic data of the first sample behavioural characteristic data judgement sample user after normalized;
The sample of users and its first sample behavioural characteristic data that behavioural characteristic data are not lost are as training set, behavioural characteristic The sample of users and its first sample behavioural characteristic data of loss of data as test set, according to the training set, test set and Lasso regression model determines the weight between the behavioural characteristic for not losing data and the behavioural characteristic for losing data;
According to the behavioural characteristic data that the weight and training set completion are lost, the second sample behavioural characteristic data are formed.
4. the keystroke characteristic abnormal user recognition methods according to claim 3 based on support vector machines, which is characterized in that It the step of behavioural characteristic data lost according to the weight and training set completion, specifically includes:
The characteristic value of the behavioural characteristic of data is lost according to the weight and training set completion;
The behavioural characteristic data lost according to the characteristic value completion for the behavioural characteristic for losing data.
5. the keystroke characteristic abnormal user recognition methods according to claim 4 based on support vector machines, which is characterized in that The weight being calculated by the following formula between the behavioural characteristic for not losing data and the behavioural characteristic for losing data:
Wherein, J (W) is loss function, and X is the characteristic value of the behavioural characteristic for not losing data in the training set, and N is sample The quantity of user, y are the characteristic value of the behavioural characteristic of the loss data in the training set, and w is the behavior spy for not losing data Weight between sign and the behavioural characteristic for losing data, α is hyper parameter.
6. the keystroke characteristic abnormal user recognition methods according to claim 5 based on support vector machines, which is characterized in that It is calculated by the following formula the characteristic value for losing the behavioural characteristic of data:
Wherein,For in the test set lose data behavioural characteristic characteristic value, w be do not lose data behavioural characteristic and The weight between the behavioural characteristic of data is lost, b is bias.
7. the keystroke characteristic abnormal user recognition methods according to claim 6 based on support vector machines, which is characterized in that In the behavioural characteristic data lost according to the weight and training set completion, the step of the second sample behavioural characteristic data is formed After rapid, further includes:
Correlation analysis is carried out to the second sample behavioural characteristic data, obtains analysis result;
According to the analysis as a result, being screened to the second sample behavioural characteristic;
Dimension-reduction treatment is carried out to the data after the screening by principal component analysis;
Using the data after dimension-reduction treatment as user behavior characteristics sample database.
8. the keystroke characteristic abnormal user recognition methods according to claim 7 based on support vector machines, which is characterized in that Dimension-reduction treatment is carried out to the data after the screening by following formula:
Wherein, x(i)For the feature vector of current dimension, x(i) approxIt is the feature vector after dimension-reduction treatment, α is preset threshold value, M represents the quantity of the sample of users.
9. the keystroke characteristic abnormal user recognition methods according to claim 8 based on support vector machines, which is characterized in that It is described according to preset disaggregated model and preset behavioural characteristic sample database to the behavioural characteristic data of the user to be identified The step of being identified, generating recognition result, specifically includes:
Obtain the sample of users set of the preset behavioural characteristic sample database;
Successively using one of sample of users in the sample of users set as positive collection, other sample of users are as negative collection;
It is obtained according to behavioural characteristic data after dimension-reduction treatment of the user to be identified and support vector cassification model To recognition result;
The recognition results are sorted, and the identity of the sample user serving as the positive set for the maximum recognition result is determined to be the identity of the user to be identified;
Judge whether the identity of the user to be identified belongs to the sample of users set;
When the identity of the user to be identified does not belong to the sample user set, the user to be identified is determined to be an abnormal user.
10. the keystroke characteristic abnormal user recognition methods according to claim 9 based on support vector machines, feature exist In, it is described according to preset disaggregated model and preset behavioural characteristic sample database to the behavioural characteristic number of the user to be identified According to the step of being identified, generating recognition result, specifically include:
Obtain the sample of users set of the user behavior characteristics sample database;
Successively using one of user's sample in user set as positive collection, other users sample is as negative collection;
According in behavioural characteristic data after dimension-reduction treatment of the user to be identified and support vector cassification model It is identified, obtains recognition result;
It is ranked up according to the recognition result, judges whether the maximum value in the recognition result is greater than a preset value;
When the maximum value in the recognition result is greater than the preset value, the identity of the sample user serving as the corresponding positive set is determined to be the identity of the user to be identified;
When the maximum value in the recognition result is less than the preset value, determine that the user to be identified is abnormal user.
11. a kind of keystroke characteristic abnormal user identifying system based on support vector machines characterized by comprising
User behavior characteristics extraction module to be identified inputs the behavioural characteristic number of default sample for obtaining user to be identified According to;
User's categorization module to be identified, for according to preset disaggregated model and preset behavioural characteristic sample database to it is described to The behavioural characteristic data of the user of identification are classified, and classification results are generated;
The preset behavioural characteristic sample database is established by following steps:
Obtain the first sample behavioural characteristic data that sample of users inputs default sample;
Supplementing Data processing is carried out to the behavioural characteristic data lost in the first sample behavioural characteristic data, forming the second sample behavioural characteristic data, and the second sample behavioural characteristic data are used as the preset behavioural characteristic sample database.
12. a kind of computer equipment characterized by comprising at least one processor, and at least one described processor The memory of communication connection, wherein the memory is stored with the instruction that can be executed by least one described processor, the finger It enables and being executed by least one described processor, so that at least one described processor executes any institute in the claims 1-10 The keystroke characteristic abnormal user recognition methods based on support vector machines stated.
13. a kind of computer readable storage medium, which is characterized in that the computer-readable recording medium storage has computer to refer to It enables, it is any described based on supporting vector in the claims 1-10 that the computer instruction is used to making the computer to execute The keystroke characteristic abnormal user recognition methods of machine.
CN201810763718.4A 2018-07-12 2018-07-12 A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines Pending CN109145554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810763718.4A CN109145554A (en) 2018-07-12 2018-07-12 A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810763718.4A CN109145554A (en) 2018-07-12 2018-07-12 A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines

Publications (1)

Publication Number Publication Date
CN109145554A true CN109145554A (en) 2019-01-04

Family

ID=64800424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810763718.4A Pending CN109145554A (en) 2018-07-12 2018-07-12 A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines

Country Status (1)

Country Link
CN (1) CN109145554A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109873813A (en) * 2019-01-28 2019-06-11 平安科技(深圳)有限公司 Text input abnormality monitoring method, device, computer equipment and storage medium
CN110502883A (en) * 2019-08-23 2019-11-26 四川长虹电器股份有限公司 A kind of keystroke abnormal behavior detection method based on PCA

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006111963A2 (en) * 2005-04-17 2006-10-26 Rafael - Armament Development Authority Ltd. Generic classification system
CN105450412A (en) * 2014-08-19 2016-03-30 阿里巴巴集团控股有限公司 Identity authentication method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
单鹏飞 等: "基于支持向量机的击键特征异常用户识别", 《电脑知识与技术》 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210402

Address after: 325006 Wenzhou Higher Education Park, Zhejiang Province (Chashan Town, Ouhai District)

Applicant after: Wenzhou University

Address before: 325000 Room 203, 2nd floor, area D, building 14, Haixi e-commerce Science Park, Lingxi Town, Cangnan County, Wenzhou City, Zhejiang Province

Applicant before: WENZHOU UNIVERSITY CANGNAN Research Institute

RJ01 Rejection of invention patent application after publication

Application publication date: 20190104