CN109145554A - Method and system for identifying users with abnormal keystroke characteristics based on support vector machines - Google Patents
- Publication number
- CN109145554A (application number CN201810763718.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- behavioral feature
- data
- user
- keystroke
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
Abstract
The invention discloses a method and system, based on support vector machines, for identifying users with abnormal keystroke characteristics. First-sample behavioral feature data are obtained from sample users typing a preset sample text. These data include the number of keystroke errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy and keystroke stability, which better reflect individual differences in users' keystroke habits and therefore characterize user identity; they are used as the behavioral feature library. In addition, missing behavioral feature data are completed before the data are used as the preset behavioral feature sample library, which makes the sample users' behavioral feature data more complete, so that the recognition rate is greatly improved compared with the prior art.
Description
Technical field
The present invention relates to the field of pattern recognition, and in particular to a method and system, based on support vector machines, for identifying users with abnormal keystroke characteristics.
Background art
Currently, passwords, passcodes and username-based authentication are the main means of user authentication on the Internet. The biggest problem with this mechanism, however, is that it easily leaks personal privacy. The development of machine learning, deep learning and biometric technologies has provided new approaches to Internet authentication.
Biometrics relies on each person's unique physiological information and distinctive behavioral information. Because this biological information is highly discriminative and unique, biometrics can effectively screen out abnormal users.
Because people differ in their keyboard typing habits and personal character, each person develops a unique keystroke pattern when entering a password or typing a passage of text. A keystroke pattern reflects how hard a person strikes the keys, their typing speed, their pausing habits and so on, and these traits are difficult to imitate. Further studies by a foundation in Washington, D.C. confirmed the uniqueness of people's keystroke characteristics, so a keystroke pattern can represent user identity. In practice, however, because statistics are collected in real time, some data are very likely to be lost, which easily lowers the recognition rate.
Summary of the invention
Therefore, the present invention provides a method and system, based on support vector machines, for identifying users with abnormal keystroke characteristics, which solves the problem in the prior art that the recognition rate of user identity based on keystroke characteristics is low.
An embodiment of the present invention provides a method, based on support vector machines, for identifying users with abnormal keystroke characteristics, comprising the following steps: obtaining the behavioral feature data produced when a user to be identified types a preset sample; and identifying the behavioral feature data of the user to be identified according to a preset classification model and a preset behavioral feature sample library to generate a recognition result. The preset behavioral feature sample library is established by the following steps: obtaining first-sample behavioral feature data produced when sample users type the preset sample; and performing data completion on the behavioral feature data missing from the first-sample behavioral feature data to form second-sample behavioral feature data, which are used as the preset behavioral feature sample library.
Preferably, the first-sample behavioral feature data include at least one of the following: the number of keystroke errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy and keystroke stability.
Preferably, the step of performing data completion on the behavioral feature data missing from the first-sample behavioral feature data specifically comprises: normalizing the first-sample behavioral feature data; determining, from the normalized first-sample behavioral feature data, whether any sample user's behavioral feature data are missing; taking the sample users with no missing behavioral feature data, together with their first-sample behavioral feature data, as a training set, and the sample users with missing behavioral feature data, together with their first-sample behavioral feature data, as a test set; determining, from the training set, the test set and a lasso regression model, the weights between the behavioral features that are not missing and the behavioral features that are missing; and completing the missing behavioral feature data according to the weights and the training set to form the second-sample behavioral feature data.
Preferably, the step of completing the missing behavioral feature data according to the weights and the training set specifically comprises: completing the feature values of the missing behavioral features according to the weights and the training set; and completing the missing behavioral feature data from those feature values.
Preferably, the weights between the non-missing behavioral features and the missing feature are calculated by the following formula:
$$J(w)=\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-w^{\top}X_i-b\right)^2+\alpha\lVert w\rVert_1$$
where $J(w)$ is the loss function, $X$ is the feature values of the non-missing behavioral features in the training set ($X_i$ being the row of the $i$-th sample user), $N$ is the number of sample users, $y$ is the training-set feature values of the behavioral feature that is missing, $w$ is the weights between the non-missing behavioral features and the missing feature, $b$ is a bias, and $\alpha$ is a hyperparameter.
Preferably, in the step of completing the missing behavioral feature data according to the weights and the training set to form the second-sample behavioral feature data, the feature value of the missing behavioral feature is calculated by the following formula:
$$\hat{y}=w^{\top}x+b$$
where $\hat{y}$ is the feature value of the behavioral feature missing in the test set, $x$ is the non-missing feature values of the test sample, $w$ is the weights between the non-missing behavioral features and the missing feature, and $b$ is the bias.
Preferably, after the step of completing the missing behavioral feature data according to the weights and the training set to form the second-sample behavioral feature data, the method further comprises:
performing correlation analysis on the second-sample behavioral feature data to obtain an analysis result;
screening the second-sample behavioral features according to the analysis result; performing dimensionality reduction on the screened data by principal component analysis; and using the dimension-reduced data as the user behavioral feature sample library.
Preferably, dimensionality reduction is performed on the screened data according to the following formula:
$$\frac{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}-x^{(i)}_{\mathrm{approx}}\rVert^2}{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}\rVert^2}\le\alpha$$
where $x^{(i)}$ is the feature vector in the current dimension, $x^{(i)}_{\mathrm{approx}}$ is the feature vector after dimensionality reduction, $\alpha$ is a preset threshold, and $m$ is the number of sample users.
Preferably, the step of identifying the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result specifically comprises:
obtaining the set of sample users in the preset behavioral feature sample library; taking each sample user in the set in turn as the positive set and the other sample users as the negative set; obtaining recognition results from the dimension-reduced behavioral feature data of the user to be identified and the support vector machine classification model; sorting the recognition results and taking the identity of the positive-set sample user corresponding to the maximum recognition result as the identity of the user to be identified;
determining whether the identity of the user to be identified belongs to the sample user set; and, when it does not, determining that the user to be identified is an abnormal user.
Preferably, the step of identifying the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result may alternatively comprise:
obtaining the set of sample users in the user behavioral feature sample library; taking each user sample in the set in turn as the positive set and the other user samples as the negative set; performing identification with the dimension-reduced behavioral feature data of the user to be identified and the support vector machine classification model to obtain recognition results; sorting the recognition results and determining whether the maximum recognition result is greater than a preset value; when it is greater than the preset value, taking the identity of the positive-set sample user as the identity of the user to be identified; and when it is less than the preset value, determining that the user to be identified is an abnormal user.
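The one-vs-rest decision with a preset value can be sketched in Python as follows. This is a minimal illustration, assuming each trained binary SVM has already produced a real-valued score $f_u(x)$ for the user to be identified. The function names `ovr_labels` and `identify_user` and the example scores are hypothetical. Treating a maximum exactly equal to the preset value as a match is also an assumption, because the text does not cover that case.

```python
from typing import Dict, List, Optional


def ovr_labels(sample_owners: List[str], positive_user: str) -> List[int]:
    """Label each training sample +1 if it belongs to positive_user, else -1.

    One such labelling is built per sample user (one-vs-rest), so n users
    yield n binary training sets.
    """
    return [1 if owner == positive_user else -1 for owner in sample_owners]


def identify_user(scores: Dict[str, float], preset_value: float) -> Optional[str]:
    """Return the best-scoring sample user, or None for an abnormal user.

    scores maps each sample user u to the output f_u(x) of the classifier
    trained with u as the positive set (hypothetical inputs).
    """
    best_user = max(scores, key=scores.get)
    if scores[best_user] < preset_value:
        return None  # no enrolled user matches convincingly: abnormal user
    return best_user


if __name__ == "__main__":
    print(ovr_labels(["u1", "u2", "u1"], "u1"))  # [1, -1, 1]
    print(identify_user({"u1": 0.2, "u2": 1.3, "u3": -0.4}, preset_value=0.5))  # u2
    print(identify_user({"u1": 0.2, "u2": 0.3}, preset_value=0.5))  # None
```

Returning `None` rather than raising keeps "abnormal user" an ordinary outcome that the caller can log or route to a further check.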
An embodiment of the present invention further provides a system, based on support vector machines, for identifying users with abnormal keystroke characteristics, comprising:
a behavioral feature extraction module for the user to be identified, configured to obtain the behavioral feature data produced when the user to be identified types the preset sample;
and a classification module for the user to be identified, configured to classify the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a classification result. The preset behavioral feature sample library is established by the following steps:
obtaining first-sample behavioral feature data produced when sample users type the preset sample; and performing data completion on the behavioral feature data missing from the first-sample behavioral feature data to form second-sample behavioral feature data, which are used as the preset behavioral feature sample library.
An embodiment of the present invention further provides a computer device, comprising at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and when executed, the instructions cause the at least one processor to perform the above method, based on support vector machines, for identifying users with abnormal keystroke characteristics.
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions that cause a computer to perform the above method, based on support vector machines, for identifying users with abnormal keystroke characteristics.
The technical solution of the present invention has the following advantages:
1. The method and system provided by the embodiments of the present invention obtain first-sample behavioral feature data produced when sample users type a preset sample. These data include the number of keystroke errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy and keystroke stability. Using these features, which better reflect individual differences in users' keystrokes and thus characterize user identity, as the behavioral feature library significantly improves the recognition rate for users to be identified.
2. In the method and system provided by the embodiments of the present invention, missing behavioral feature data are completed before the data are used as the preset behavioral feature sample library, which makes the sample users' behavioral feature data more complete. The behavioral feature data produced when a user to be identified types the preset sample are classified according to the preset classification model and the preset behavioral feature sample library. The classification result determines the user's identity and hence whether the user is abnormal, so that the recognition rate is greatly improved compared with the prior art.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a specific example of the method, based on support vector machines, for identifying users with abnormal keystroke characteristics provided in an embodiment of the present invention;
Fig. 2 is a flowchart of a specific example of establishing the preset behavioral feature sample library provided in an embodiment of the present invention;
Fig. 3 is a flowchart of an example of performing data completion on missing behavioral feature data provided in an embodiment of the present invention;
Fig. 4 is a flowchart of an example of performing dimensionality reduction on the completed data in an embodiment of the present invention;
Fig. 5 is a flowchart of a specific example of identifying the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result in an embodiment of the present invention;
Fig. 6 is a flowchart of another specific example of identifying the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the system, based on support vector machines, for identifying users with abnormal keystroke characteristics provided in an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the computer device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict.
Embodiment 1
An embodiment of the present invention provides a method, based on support vector machines, for identifying users with abnormal keystroke characteristics. As shown in Fig. 1, the method comprises the following steps:
Step S1: obtain the behavioral feature data produced when the user to be identified types the preset sample.
In the embodiment of the present invention, the behavioral feature data may be the behavioral features produced when the user types the preset sample on an input device such as a keyboard. They may include the number of keystroke errors of each type, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy and keystroke stability. Based on an analysis of the behavioral feature data, keystroke errors are divided into the following categories:
(1) Bad case: a letter-case error, e.g. ". The" typed as ". the";
(2) Bad ordering: a character in a string is typed too early, e.g. "house" typed as "houes";
(3) Doublet: a letter in a string is typed twice, e.g. "home" typed as "homee";
(4) Other: other kinds of typing errors;
(5) RED: an obvious error made while typing that is not corrected.
The above categories of keystroke errors are merely examples. In practical applications they can be adjusted according to actual conditions, and the present invention is not limited thereto.
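The word-level categories can be illustrated with a small classifier that compares an intended word with the word that was finally typed. This is a hypothetical sketch, and `classify_error` is not part of the source. It only recognizes a single error within one word, so compound slips such as "hoeem" for "home", which the text counts as both Bad ordering and Doublet, fall through to "Other". RED is not detected because it depends on the keystroke log showing whether an error was corrected.

```python
from typing import Optional


def classify_error(intended: str, typed: str) -> Optional[str]:
    """Classify a single-word typing error into the categories above.

    Returns None when the word was typed correctly. RED is not handled,
    because whether an error was corrected is visible only in the raw
    keystroke log, not in the final text.
    """
    if typed == intended:
        return None
    # Bad case: same letters, wrong capitalisation.
    if typed.lower() == intended.lower():
        return "Bad case"
    # Bad ordering: two adjacent characters swapped (one typed too early).
    if len(typed) == len(intended):
        for i in range(len(intended) - 1):
            if intended[i] == intended[i + 1]:
                continue  # swapping identical letters changes nothing
            swapped = intended[:i] + intended[i + 1] + intended[i] + intended[i + 2:]
            if swapped == typed:
                return "Bad ordering"
    # Doublet: one letter struck twice in a row.
    if len(typed) == len(intended) + 1:
        for i in range(1, len(typed)):
            if typed[i] == typed[i - 1] and typed[:i] + typed[i + 1:] == intended:
                return "Doublet"
    return "Other"


if __name__ == "__main__":
    pairs = [("The", "the"), ("house", "houes"), ("home", "homee"), ("home", "hxme")]
    for intended, typed in pairs:
        print(intended, "->", typed, ":", classify_error(intended, typed))
```

Counting the labels returned over a typing session gives the per-category error counts used as features.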
In the embodiment of the present invention, the keystroke speed may be the number of correct characters the user types per minute. The average keystroke speed may be expressed as the number of correct characters the user types within a preset time period. The instantaneous keystroke speed is the user's keystroke speed at the current moment. Keystroke accuracy is the ratio of the number of correctly typed characters to the total number of typed characters. Keystroke stability may be expressed as the variance of the user's keystroke speed and the variance of the user's keystroke accuracy.
Step S2: identify the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result.
The preset classification model in the embodiment of the present invention may be a support vector machine classification model. To improve the robustness of this model, a slack variable $\xi_i$ is introduced for each sample to control its influence on the solution, i.e., sample points far from the center of the sphere are penalized. The constraint is shown in formula (1):
$$\lVert\phi(x_i)-a\rVert^2\le R^2+\xi_i,\qquad \xi_i\ge 0 \qquad (1)$$
That is, the squared Euclidean distance from the (kernel-mapped) feature vector $x_i$ of a sample user to the sphere center $a$ is at most the squared radius $R^2$ plus the slack variable.
When the data are not linearly separable, the support vector machine carries out its computations in the low-dimensional space, maps the input space into a high-dimensional feature space through a kernel function, and finally constructs the optimal separating hyperplane in that high-dimensional space. This separates nonlinear data that cannot be separated in the original plane. In the embodiment of the present invention, an optimized Gaussian kernel function handles the nonlinear classification task, as shown in formula (2):
$$K(x,y)=\exp\!\left(-\frac{\lVert x-y\rVert^2}{2\sigma^2}\right) \qquad (2)$$
where $K(x,y)$ is the similarity between behavioral feature vectors $x$ and $y$, $\lVert x-y\rVert^2$ is their squared Euclidean distance, and $\sigma$ is the kernel width, described here as the variance of $x$ and $y$.
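The decision function of the support vector machine is shown in formula (3):
$$
\begin{aligned}
f_{\mathrm{SVD}}(z;\alpha,R) &= I\left(\lVert\phi(z)-a\rVert^2\le R^2\right)\\
&= I\Big(K(z,z)-2\sum_i\alpha_iK(z,x_i)+\sum_{i,j}\alpha_i\alpha_jK(x_i,x_j)\le R^2\Big)
\end{aligned}
\qquad (3)
$$
where $K(z,z)$ is the similarity of $z$ with itself, which equals 1 for the Gaussian kernel; $K(z,x_i)$ is the similarity between $z$ and the $i$-th sample; $\alpha_i$ is a weight; $a=\sum_i\alpha_i\phi(x_i)$ is the sphere center; $R^2$ is the squared radius; and $I(\cdot)$ is the indicator function.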
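Formulas (2) and (3) can be evaluated directly once the weights $\alpha_i$, the support vectors and $R^2$ are known. The sphere-with-slack formulation matches Support Vector Data Description (SVDD). The sketch below assumes training (solving the dual problem for $\alpha_i$ and $R$) has already been done elsewhere, and the function names and example numbers are hypothetical. It expands $\lVert\phi(z)-a\rVert^2$ exactly as in formula (3).

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Formula (2): exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svdd_inside(z: Sequence[float], support_vectors: List[Sequence[float]],
                alphas: List[float], r_squared: float, sigma: float) -> bool:
    """Formula (3): True when phi(z) lies inside the sphere (indicator = 1)."""
    k_zz = gaussian_kernel(z, z, sigma)  # always 1.0 for the Gaussian kernel
    cross = sum(a * gaussian_kernel(z, x, sigma)
                for a, x in zip(alphas, support_vectors))
    centre_norm = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
                      for ai, xi in zip(alphas, support_vectors)
                      for aj, xj in zip(alphas, support_vectors))
    return k_zz - 2.0 * cross + centre_norm <= r_squared


if __name__ == "__main__":
    sv = [[0.0, 0.0], [1.0, 0.0]]  # hypothetical support vectors
    weights = [0.5, 0.5]           # hypothetical alpha_i (summing to 1)
    print(svdd_inside([0.5, 0.0], sv, weights, r_squared=0.5, sigma=1.0))  # True
    print(svdd_inside([5.0, 5.0], sv, weights, r_squared=0.5, sigma=1.0))  # False
```

The double sum does not depend on $z$, so a real implementation would precompute it once after training.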
In a preferred embodiment, as shown in Fig. 2, the preset behavioral feature sample library is established by the following steps:
Step S3: obtain the first-sample behavioral feature data produced when sample users type the preset sample.
In the embodiment of the present invention, the first-sample behavioral feature data may be the behavioral features produced when the user types the preset sample on an input device such as a keyboard. They include the number of keystroke errors of each type described above, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy and keystroke stability, together with the corresponding feature values. The data are not limited to these, however, and may include other behavioral features in other embodiments.
Step S4: perform data completion on the behavioral feature data missing from the first-sample behavioral feature data to form second-sample behavioral feature data, and use the second-sample behavioral feature data as the preset behavioral feature sample library.
In a preferred embodiment, as shown in Fig. 3, the step of performing data completion on the behavioral feature data missing from the first-sample behavioral feature data specifically comprises:
Step S5: normalize the first-sample behavioral feature data.
In the embodiment of the present invention, the feature value of each behavioral feature is normalized to the range 0 to 1 to facilitate subsequent processing.
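The text only says that each feature is scaled into the range 0 to 1. Min-max scaling per feature column is the usual way to do that and is assumed in this sketch. Normalization happens before the missing-data check (step S6), so missing values, represented here as `None`, are left untouched. The function name is hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def min_max_normalize(rows: List[Row]) -> List[Row]:
    """Scale every feature column to [0, 1], leaving missing values (None) as None.

    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    n_cols = len(rows[0])
    out = [list(r) for r in rows]
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        if not present:
            continue  # the whole column is missing; nothing to scale
        lo, hi = min(present), max(present)
        for r in out:
            if r[j] is not None:
                r[j] = (r[j] - lo) / (hi - lo) if hi > lo else 0.0
    return out


if __name__ == "__main__":
    print(min_max_normalize([[1, 10], [3, None], [5, 30]]))
    # [[0.0, 0.0], [0.5, None], [1.0, 1.0]]
```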
Step S6: determine, from the normalized first-sample behavioral feature data, whether any sample user's behavioral feature data are missing. In practice, part of the data may be lost. For example, suppose a user first types "home" as "homme", then deletes it and types "hoem". Under the error categories above, the user's Bad ordering count is incremented once but the Doublet count is not, so data are lost. In addition, because statistics are collected in real time, some data are very likely to be lost.
Step S7: take the sample users with no missing behavioral feature data and their first-sample behavioral feature data as the training set, and the sample users with missing behavioral feature data and their first-sample behavioral feature data as the test set. From the training set, the test set and a lasso regression model, determine the weights between the non-missing behavioral features and the missing behavioral features.
In the embodiment of the present invention, Lasso regression is used to complete the missing data. The behavioral feature data of sample users with no missing data form the training set, and those of sample users with missing data form the test set. The feature value of the missing behavioral feature in the test set is denoted $\hat{y}$, and the missing behavioral feature is denoted $x_k$. The values of $x_k$ in the training set are denoted $y$, and the feature values of the non-missing behavioral features other than $x_k$ are denoted $X$. The weights between the non-missing behavioral features and the missing behavioral feature are calculated by formula (4):
$$J(w)=\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-w^{\top}X_i-b\right)^2+\alpha\lVert w\rVert_1 \qquad (4)$$
where $J(w)$ is the loss function, $X$ is the feature values of the non-missing behavioral features in the training set ($X_i$ being the row of the $i$-th sample user), $N$ is the number of sample users, $y$ is the training-set feature values of the missing behavioral feature, $w$ is the weights between the non-missing behavioral features and the missing feature, $b$ is a bias, and $\alpha$ is a hyperparameter. The weights $w$ (together with the bias $b$) are those that minimize the loss function $J(w)$. The feature value of the missing behavioral feature is then calculated by formula (5):
$$\hat{y}=w^{\top}x+b \qquad (5)$$
where $\hat{y}$ is the feature value of the behavioral feature missing in the test set, $x$ is the non-missing feature values of the test sample, $w$ is the weights between the non-missing behavioral features and the missing feature, and $b$ is the bias.
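Formulas (4) and (5) can be sketched in dependency-free Python. The sketch minimizes formula (4) by cyclic coordinate descent with soft-thresholding. The unpenalized bias $b$ is obtained by centering $X$ and $y$ first. Formula (5) then fills each missing value. All function names are hypothetical, and the sketch assumes that each test row is missing only feature $k$. In practice a library solver such as scikit-learn's `Lasso` (whose `alpha` parameter weights the same L1 term) could replace `fit_lasso`.

```python
from typing import List, Optional, Tuple


def soft_threshold(rho: float, alpha: float) -> float:
    """Proximal operator of alpha * |w|: shrink rho towards zero by alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso(X: List[List[float]], y: List[float], alpha: float,
              n_iter: int = 1000, tol: float = 1e-10) -> Tuple[List[float], float]:
    """Minimise formula (4) by cyclic coordinate descent; returns (w, b)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # centred y minus Xc @ w, with w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta  # keep residual in sync with w
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b = y_mean - sum(w[j] * x_mean[j] for j in range(p))  # undo the centring
    return w, b


def impute_feature(rows: List[List[Optional[float]]], k: int,
                   alpha: float = 0.01) -> List[List[Optional[float]]]:
    """Fill missing values (None) of feature k with formula (5): w.x + b."""
    others = [j for j in range(len(rows[0])) if j != k]
    train = [r for r in rows if all(v is not None for v in r)]  # complete rows only
    X = [[r[j] for j in others] for r in train]
    y = [r[k] for r in train]
    w, b = fit_lasso(X, y, alpha)
    completed = []
    for r in rows:
        r = list(r)
        if r[k] is None:
            r[k] = sum(wj * r[j] for wj, j in zip(w, others)) + b
        completed.append(r)
    return completed


if __name__ == "__main__":
    rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0], [2.5, None]]
    print(impute_feature(rows, k=1, alpha=0.001)[-1])  # second value close to 6.0
```

A larger `alpha` drives more weights exactly to zero, so only the non-missing features most predictive of the missing one are used for completion.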
Step S8: complete the missing behavioral feature data according to the weights and the training set to form the second-sample behavioral feature data.
In the embodiment of the present invention, the feature values of the missing behavioral features are completed according to the above weights and the training set. The missing behavioral feature data are then completed from those feature values.
In a preferred embodiment, as shown in Fig. 4, after step S8 of completing the missing behavioral feature data, the method for identifying users with abnormal keystroke characteristics further comprises:
Step S9: perform correlation analysis on the second-sample behavioral feature data to obtain an analysis result.
Step S10: screen the second-sample behavioral features according to the analysis result.
In the embodiment of the present invention, the test sample users are 11 people providing 88 samples in total (each person typing 88 passages of text) as training data. In practice, a small amount of data combined with many features leads to overfitting. To alleviate this, validation data are constructed from the samples. In the embodiment of the present invention, features are selected by analyzing a correlation heat map, and cross-validation is used to check whether the model overfits.
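The text does not say how the cross-validation folds are built. A plain k-fold split is one common choice and is assumed in this hypothetical sketch. Sample `i` is placed in validation fold `i % k`, so every sample is validated exactly once, and the model is trained on the remaining folds. For example, 88 samples with an assumed `k = 8` give validation folds of 11 samples each. A large gap between training and validation accuracy would indicate overfitting.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, validation) pairs.

    Sample i goes to validation fold i % k, so every sample is validated
    exactly once and never appears in its own fold's training part.
    """
    folds = []
    for f in range(k):
        val = [i for i in range(n_samples) if i % k == f]
        train = [i for i in range(n_samples) if i % k != f]
        folds.append((train, val))
    return folds


if __name__ == "__main__":
    for train, val in k_fold_indices(88, 8)[:2]:
        print(len(train), len(val), val[:3])
```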
To address overfitting, the embodiment of the present invention analyzes the five error features described above. Tests on the sample users showed strong associations between certain features. For example, when a user mistypes "home" as "hoeem", the error counts both as Bad ordering and as Doublet, and the two often occur together. Correlation analysis is therefore performed among the five error features. In the embodiment of the present invention, the threshold is set to 0.5: when the correlation $\alpha$ between two behavioral features satisfies $|\alpha|\ge 0.5$, the two features are considered strongly associated. A positive $\alpha$ means the two features are highly correlated and can replace each other. A negative $\alpha$ means the two features suppress each other, and they can also replace each other. The behavioral features are screened in this way. The threshold is not limited to this value and may take other values in other embodiments.
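The screening rule can be sketched with Pearson correlation and the 0.5 threshold. The greedy "keep the first feature, drop any later feature that correlates too strongly with a kept one" policy is an assumption. The text only says that strongly correlated features can replace each other, not which one survives. Because the test uses $|\alpha|$, negatively correlated pairs are screened out as well. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def screen_features(columns: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Greedily keep features; drop any whose |correlation| with a kept one >= threshold."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(columns[i], col)) < threshold for i in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Column 1 duplicates column 0 (r = 1); column 2 is uncorrelated with both.
    cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, -1, 0, -1, 1]]
    print(screen_features(cols))  # [0, 2]
```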
Step S11: perform dimensionality reduction on the screened data by principal component analysis.
In the embodiment of the present invention, principal component analysis is applied to the features screened in step S10 to reduce their dimensionality and improve computational efficiency. Principal component analysis is an unsupervised statistical method that uses an orthogonal transformation to convert vectors with correlated components into vectors with uncorrelated components. Geometrically, it transforms the original coordinate system into an orthogonal coordinate system along whose axes the sample points are spread out, and then reduces the dimensionality of the multivariate data. Specifically, dimensionality reduction may be performed on the screened data according to formula (6):
$$\frac{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}-x^{(i)}_{\mathrm{approx}}\rVert^2}{\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}\rVert^2}\le\alpha \qquad (6)$$
where $x^{(i)}$ is the feature vector in the current dimension, $x^{(i)}_{\mathrm{approx}}$ is the feature vector after dimensionality reduction, $\alpha$ is a preset threshold, and $m$ is the number of sample users. The threshold in the embodiment of the present invention is set to 0.01, but it is not limited to this and may take other values in other embodiments.
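For mean-centered data, the left-hand side of formula (6) equals the sum of the discarded eigenvalues of the covariance matrix divided by the sum of all eigenvalues. The threshold 0.01 therefore means keeping the fewest principal components that retain at least 99% of the variance. The sketch below applies that criterion to a given list of eigenvalues. Computing them (for example with `numpy.linalg.eigh` on the covariance of the centered, screened features) is assumed to happen beforehand. The function name and the example eigenvalues are hypothetical.

```python
from typing import Sequence


def choose_n_components(eigenvalues: Sequence[float], alpha: float = 0.01) -> int:
    """Smallest k whose discarded-variance fraction satisfies formula (6).

    Assumes mean-centred data, for which the left-hand side of formula (6)
    equals (sum of discarded eigenvalues) / (sum of all eigenvalues).
    """
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    if total <= 0.0:
        return 0  # no variance at all: nothing worth keeping
    for k in range(1, len(lams) + 1):
        if sum(lams[k:]) / total <= alpha:
            return k
    return len(lams)


if __name__ == "__main__":
    eig = [6.0, 3.0, 0.95, 0.05]  # hypothetical covariance eigenvalues
    print(choose_n_components(eig, alpha=0.01))  # 3 (99.5% variance kept)
    print(choose_n_components(eig, alpha=0.2))   # 2 (90% variance kept)
```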
Step S12: use the dimension-reduced data as the user behavioral feature sample library.
In the embodiment of the present invention, the dimension-reduced second-sample behavioral feature data are used as the preset behavioral feature sample library.
In a preferred embodiment, as shown in Fig. 5, step S2 of identifying the behavioral feature data of the user to be identified according to the preset classification model and the preset behavioral feature sample library to generate a recognition result specifically comprises:
Step S211: obtain the set of sample users in the preset behavioral feature sample library.
The embodiment of the present invention has 11 sample users, so the user set is $U=\{u_1,u_2,\ldots,u_n\}$ with $n=11$.
Step S212: take each sample user in the set in turn as the positive set and the other sample users as the negative set.
Step S213: according to the behavioural characteristic data and support vector machines point after dimension-reduction treatment of user to be identified
Class model obtains recognition result.In the embodiment of the present invention, if x={ t1,t2,……,tmIt is a user to be identified, and each t
A respectively behavioural characteristic, in the embodiment of the present invention, the behavioural characteristic after dimension-reduction treatment is 6, therefore m=6.
Step S214: recognition result is ranked up, and the maximum value of recognition result is corresponding as the sample just collected use
The identity at family is determined as the identity of user to be identified.
In this embodiment, each class's sample users are grouped in turn as one class, and all remaining sample users are grouped as the other class. Samples from k classes therefore yield k support vector machine classifiers. In this embodiment there are 11 classes in total (i.e., 11 labels), U = {u1, u2, ..., un} (n = 11). The training sets are extracted as follows:
1) the vectors of u1 form the positive set, and the vectors of {u2, ..., un} (n = 11) form the negative set;
2) the vectors of u2 form the positive set, and the vectors of {u1, u3, ..., un} (n = 11) form the negative set;
3) the vectors of u3 form the positive set, and the vectors of {u1, u2, u4, ..., un} (n = 11) form the negative set;
4) the vectors of u4 form the positive set, and the vectors of {u1, u2, u3, u5, ..., un} (n = 11) form the negative set; and so on.
5) A classifier is trained on each of these 11 training sets, giving 11 classification results f1(x), f2(x), f3(x), ..., f11(x). The largest of the 11 values is taken as the classification result. The sample user used as the positive set for that classifier gives the identity of the user to be identified.
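Steps S211 to S214 can be illustrated with a pure-Python sketch. It trains one linear SVM per sample user (one-vs-rest), then picks the user whose classifier gives the largest decision value. The patent does not specify the kernel, solver, or hyperparameters. The linear kernel, the full-batch subgradient solver, and the names `train_linear_svm`, `train_one_vs_rest` and `identify` are all assumptions made for this example; a production system would more likely rely on an established SVM library.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.05, iters=2000):
    """Minimise lam/2*||w||^2 + mean hinge loss by full-batch subgradient descent.

    y holds +1 for the positive set and -1 for the negative set.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # margin violated: hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(samples):
    """samples: {user_id: [feature vectors]}; one classifier per user (step S212)."""
    models = {}
    for uid in samples:
        X, y = [], []
        for other, vecs in samples.items():
            for v in vecs:
                X.append(v)
                y.append(1 if other == uid else -1)
        models[uid] = train_linear_svm(X, y)
    return models


def identify(models, x):
    """Steps S213 and S214: score x with every f_k and return the arg-max user."""
    scores = {uid: dot(w, x) + b for uid, (w, b) in models.items()}
    return max(scores, key=scores.get), scores
```

Usage: with three enrolled users and two features, `identify(train_one_vs_rest(samples), x)` returns the user ID whose classifier scores x highest, together with all scores.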
Step S215: Determine whether the identity of the user to be identified belongs to the sample user set.
Step S216: If the identity of the user to be identified does not belong to the sample user set, determine that the user to be identified is an abnormal user.
In this embodiment, suppose the user to be identified claims to be one of the above sample users, for example sample user u1. If the classifiers instead identify the user as sample user u2, the keystroke behaviour of the user to be identified is abnormal, and the user is determined to be an abnormal user.
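Steps S215 and S216, together with the u1/u2 example, can be read as the check below. `is_abnormal` is a hypothetical helper. It assumes that the classifier scores are available as a dict keyed by user ID and that the user to be identified claims an enrolled identity.

```python
def is_abnormal(claimed_id, scores, enrolled_ids):
    """Return True if the keystroke behaviour should be flagged as abnormal.

    scores: dict mapping user IDs to one-vs-rest scores; the predicted
    identity is the user with the largest score (step S214).
    """
    predicted = max(scores, key=scores.get)
    # Steps S215/S216: an identity outside the enrolled set is abnormal.
    if predicted not in enrolled_ids:
        return True
    # Embodiment example: claims to be u1 but is classified as u2.
    return predicted != claimed_id
```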
In another embodiment, as shown in Fig. 6, step S2 above specifically includes the following steps. Step S2 identifies the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library, and generates a recognition result.
Step S221: Obtain the sample user set of the user behavioural feature sample library.
In this embodiment there are 11 sample users, so the user set is U = {u1, u2, ..., un} (n = 11).
Step S222: Take each user sample in the user set in turn as the positive set, and take the other user samples as the negative set.
Step S223: Input the dimension-reduced behavioural feature data of the user to be identified into the support vector machine classification model to obtain the recognition results. In this embodiment, x = {t1, t2, ..., tm} denotes a user to be identified, and each t is one behavioural feature. After dimensionality reduction there are 6 behavioural features, so m = 6.
Step S224: Sort the recognition results and determine whether the maximum recognition result is greater than a preset value.
In this embodiment, each class's sample users are grouped in turn as one class, and all remaining sample users are grouped as the other class. Samples from k classes therefore yield k support vector machine classifiers. In this embodiment there are 11 classes in total (i.e., 11 labels), U = {u1, u2, ..., un} (n = 11). The training sets are extracted as follows:
1) the vectors of u1 form the positive set, and the vectors of {u2, ..., un} (n = 11) form the negative set;
2) the vectors of u2 form the positive set, and the vectors of {u1, u3, ..., un} (n = 11) form the negative set;
3) the vectors of u3 form the positive set, and the vectors of {u1, u2, u4, ..., un} (n = 11) form the negative set;
4) the vectors of u4 form the positive set, and the vectors of {u1, u2, u3, u5, ..., un} (n = 11) form the negative set; and so on.
5) A classifier is trained on each of these 11 training sets, giving 11 classification results f1(x), f2(x), f3(x), ..., f11(x). The 11 recognition results are sorted, and the maximum is compared with a preset value. In this embodiment the preset value is 0.8. The invention is not limited to this value, and in other embodiments a suitable value may be set according to the application scenario.
Step S225: If the maximum recognition result is greater than the preset value, the sample user used as the positive set for that result gives the identity of the user to be identified.
Step S226: If the maximum recognition result is less than the preset value, determine that the user to be identified is an abnormal user.
In this embodiment, a preset value is applied to the classification result after the behavioural features of the user to be identified have been classified using the classifiers and the behavioural feature sample library. The sample user to whom the identity belongs can be determined only when the result is greater than the preset value. A result below the preset value indicates an abnormal user.
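Step S224 compares the largest result with a preset value of 0.8. Raw SVM decision values are not confined to [0, 1], so a threshold of this size presumably applies to scores that have been mapped to a probability-like range. The text does not say how this mapping is done. The sketch below assumes a logistic mapping with placeholder parameters; Platt scaling fits a curve of this form on held-out data. `to_confidence` and `identify_with_threshold` are hypothetical names. The input is a dict of decision values such as f1(x) to f11(x) above.

```python
import math


def to_confidence(decision_value, a=1.0, b=0.0):
    """Map an SVM decision value into (0, 1) with a logistic curve.

    a and b are hypothetical placeholders; Platt scaling would fit them on
    held-out data instead of using these defaults.
    """
    return 1.0 / (1.0 + math.exp(-(a * decision_value + b)))


def identify_with_threshold(scores, preset_value=0.8):
    """Steps S224 to S226: accept the top user only if its confidence exceeds preset_value.

    Returns (user_id, confidence); user_id is None when the user is judged abnormal.
    """
    conf = {uid: to_confidence(v) for uid, v in scores.items()}
    ranked = sorted(conf.items(), key=lambda kv: kv[1], reverse=True)  # step S224: sort
    best_uid, best = ranked[0]
    if best > preset_value:  # step S225
        return best_uid, best
    # Step S226; a tie with preset_value is also treated as abnormal in this sketch.
    return None, best
```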
This embodiment of the present invention provides a keystroke-feature abnormal user recognition method based on support vector machines. The method obtains first-sample behavioural feature data from sample users typing a preset sample. These data include:
- the number of errors of each type in the typed characters;
- keystroke speed, average keystroke speed, and instantaneous keystroke speed;
- keystroke accuracy and keystroke stability.
These features better reflect individual differences in keystroke behaviour and thus characterize user identity, so they are used as the behavioural feature library. In addition, missing behavioural feature data are completed before the data are used as the preset behavioural feature sample library. This makes the sample users' behavioural feature data more complete and substantially improves the recognition rate compared with the prior art.
Embodiment 2
This embodiment of the present invention provides a keystroke-feature abnormal user recognition system based on support vector machines. As shown in Fig. 7, the system includes:
A behavioural feature extraction module 1 for the user to be identified, configured to obtain the behavioural feature data entered by the user to be identified when typing the preset sample. This module specifically executes the method of step S1 in Embodiment 1, which is not repeated here.
A classification module 2 for the user to be identified, configured to classify the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library, and to generate a classification result. This module specifically executes the method of step S2 in Embodiment 1, which is not repeated here.
In this embodiment, the preset behavioural feature library is established as described in steps S3 to S12 of Embodiment 1, which are not repeated here.
This embodiment of the present invention provides a keystroke-feature abnormal user recognition system based on support vector machines. The system likewise obtains first-sample behavioural feature data from sample users typing a preset sample. These data include:
- the number of errors of each type in the typed characters;
- keystroke speed, average keystroke speed, and instantaneous keystroke speed;
- keystroke accuracy and keystroke stability.
These features better reflect individual differences in keystroke behaviour and thus characterize user identity, so they are used as the behavioural feature library. Missing behavioural feature data are completed before the data are used as the preset behavioural feature sample library. This makes the sample users' behavioural feature data more complete and substantially improves the recognition rate compared with the prior art.
Embodiment 3
This embodiment of the present invention provides a computer device. As shown in Fig. 8, it includes at least one processor 401 (such as a CPU, Central Processing Unit), at least one communication interface 403, a memory 404, and at least one communication bus 402.
The communication bus 402 connects these components and carries communication between them.
The communication interface 403 may include a display and a keyboard. Optionally, it may also include a standard wired interface and a wireless interface.
The memory 404 may be a high-speed RAM (Random Access Memory, volatile) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 404 may also be at least one storage device located remotely from the processor 401.
The processor 401 may be combined with the keystroke-feature abnormal user recognition system based on support vector machines described in Fig. 3. A set of program code is stored in the memory 404. The processor 401 calls this program code to execute the keystroke-feature abnormal user recognition method based on support vector machines of the embodiments shown in Figs. 1 to 6.
The communication bus 402 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. It may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 8. This does not mean that there is only one bus or only one type of bus.
The memory 404 may include volatile memory, such as random-access memory (RAM). It may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 404 may also include a combination of the above types of memory.
The processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 401 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, the memory 404 also stores program instructions. The processor 401 can call these instructions to implement the keystroke-feature abnormal user recognition method based on support vector machines of the embodiments shown in Figs. 1 to 6 of this application.
An embodiment of the present invention also provides a computer-readable storage medium. The medium stores computer-executable instructions that can execute the keystroke-feature abnormal user recognition method based on support vector machines in any of the above method embodiments. The storage medium may be any of the following, or a combination of them:
- a magnetic disk or an optical disc;
- a read-only memory (ROM) or a random access memory (RAM);
- a flash memory;
- a hard disk drive (HDD) or a solid-state drive (SSD).
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. It may also take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include but are not limited to magnetic disk storage, CD-ROM, and optical storage.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. Each flow and/or block in the flowcharts and/or block diagrams, and each combination of flows and/or blocks in them, can be implemented by computer program instructions.
These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner. The instructions stored in that memory then produce an article of manufacture that includes an instruction apparatus. This apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device. A series of operational steps is then performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived from them remain within the protection scope of the present invention.
Claims (13)
1. A keystroke-feature abnormal user recognition method based on support vector machines, characterized by comprising the following steps:
obtaining behavioural feature data entered by a user to be identified when typing a preset sample;
identifying the behavioural feature data of the user to be identified according to a preset classification model and a preset behavioural feature sample library to generate a recognition result;
wherein the preset behavioural feature sample library is established by the following steps:
obtaining first-sample behavioural feature data entered by sample users when typing the preset sample;
performing data completion on the missing behavioural feature data in the first-sample behavioural feature data to form second-sample behavioural feature data, and using the second-sample behavioural feature data as the preset behavioural feature sample library.
2. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 1, characterized in that the first-sample behavioural feature data include at least one of the following: the number of errors of each type in the typed characters, keystroke speed, average keystroke speed, instantaneous keystroke speed, keystroke accuracy, and keystroke stability.
3. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 1, characterized in that the step of performing data completion on the missing behavioural feature data in the first-sample behavioural feature data specifically comprises:
normalizing the first-sample behavioural feature data;
determining, from the normalized first-sample behavioural feature data, whether the behavioural feature data of each sample user are missing;
taking the sample users whose behavioural feature data are not missing, together with their first-sample behavioural feature data, as a training set;
taking the sample users whose behavioural feature data are missing, together with their first-sample behavioural feature data, as a test set;
determining, from the training set, the test set, and a Lasso regression model, the weights between the behavioural features without missing data and the behavioural features with missing data;
completing the missing behavioural feature data according to the weights and the training set to form the second-sample behavioural feature data.
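Claim 3 does not say which normalization is used. The sketch below assumes min-max scaling and represents missing values as None. It covers the first two steps: normalizing, then splitting sample users into a training set without gaps and a test set with gaps. The function names are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1], leaving missing values (None) untouched.

    Assumes every column has at least one observed value.
    """
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        lo, hi = min(observed), max(observed)
        span = hi - lo
        for r in out:
            if r[j] is not None:
                r[j] = 0.0 if span == 0 else (r[j] - lo) / span
    return out


def split_by_missing(rows):
    """Return (training indices, test indices): complete rows vs. rows with gaps."""
    train = [i for i, r in enumerate(rows) if None not in r]
    test = [i for i, r in enumerate(rows) if None in r]
    return train, test
```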
4. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 3, characterized in that the step of completing the missing behavioural feature data according to the weights and the training set specifically comprises:
computing, from the weights and the training set, the feature values of the behavioural features with missing data;
completing the missing behavioural feature data with the computed feature values.
5. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 4, characterized in that the weights between the behavioural features without missing data and the behavioural features with missing data are calculated by the following formula:
where J(W) is the loss function, X is the feature values of the behavioural features without missing data in the training set, N is the number of sample users, y is the feature values of the behavioural features with missing data in the training set, w is the weight between the behavioural features without missing data and the behavioural features with missing data, and α is a hyperparameter.
6. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 5, characterized in that the feature values of the behavioural features with missing data are calculated by the following formula:
where the result is the feature value of the behavioural feature with missing data in the test set, w is the weight between the behavioural features without missing data and the behavioural features with missing data, and b is the bias.
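The formulas of claims 5 and 6 are not reproduced in this text. Their symbols (J(W), X, y, w, N, α, b) match the standard Lasso objective J(w) = (1/(2N))·||y - Xw - b||² + α·||w||₁ with the prediction ŷ = w·x + b. scikit-learn's Lasso uses the same scaling. The sketch below assumes that form and solves it by coordinate descent with soft-thresholding. The function names are hypothetical, and for simplicity only the target column may contain gaps.

```python
def soft_threshold(rho, alpha):
    """Proximal step for the L1 penalty: shrink rho toward 0 by alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_fit(X, y, alpha=0.01, iters=1000, tol=1e-10):
    """Coordinate descent for (1/(2N))*||y - Xw - b||^2 + alpha*||w||_1."""
    n, d = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(d)]
    y_mean = sum(y) / n
    # Centring lets the unpenalised bias b be recovered at the end.
    xc = [[r[j] - x_mean[j] for j in range(d)] for r in X]
    resid = [v - y_mean for v in y]  # residual for w = 0
    z = [sum(r[j] ** 2 for r in xc) / n for j in range(d)]
    w = [0.0] * d
    for _ in range(iters):
        max_change = 0.0
        for j in range(d):
            if z[j] == 0.0:
                continue  # a constant column carries no information
            rho = sum(xc[i][j] * (resid[i] + xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def impute_column(rows, target, alpha=0.01):
    """Fill None entries of column `target` from the other columns (claims 4 to 6).

    Assumes only the target column has gaps.
    """
    predictors = [j for j in range(len(rows[0])) if j != target]
    train = [r for r in rows if r[target] is not None]
    w, b = lasso_fit([[r[j] for j in predictors] for r in train],
                     [r[target] for r in train], alpha)
    filled = [list(r) for r in rows]
    for r in filled:
        if r[target] is None:
            # Claim 6 (assumed form): predicted value = w . x + b
            r[target] = sum(wj * r[j] for wj, j in zip(w, predictors)) + b
    return filled
```

A small α keeps the fit close to ordinary least squares. A large α drives every weight to zero, and the filled value then falls back to the training mean.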
7. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 6, characterized in that the method further comprises the following steps. These steps follow the step of completing the missing behavioural feature data according to the weights and the training set to form the second-sample behavioural feature data:
performing correlation analysis on the second-sample behavioural feature data to obtain an analysis result;
screening the second-sample behavioural features according to the analysis result;
reducing the dimensionality of the screened data by principal component analysis;
using the dimension-reduced data as the user behavioural feature sample library.
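Claim 7 does not state the screening rule applied after correlation analysis. This sketch assumes one common choice: compute pairwise Pearson correlations, and drop a feature whose absolute correlation with an already kept feature exceeds a threshold. The 0.9 default and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def screen_features(rows, threshold=0.9):
    """Greedily keep feature columns not strongly correlated with earlier kept ones."""
    d = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(d)]
    kept = []
    for j in range(d):
        if all(abs(pearson(cols[j], cols[k])) <= threshold for k in kept):
            kept.append(j)
    return kept  # indices of the retained behavioural features
```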
8. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 7, characterized in that the screened data are dimension-reduced by the following formula:
where x(i) is the feature vector in the current dimension, x(i)approx is the feature vector after dimensionality reduction, α is a preset threshold, and m is the number of sample users.
9. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 8, characterized in that the step of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result specifically comprises:
obtaining the sample user set of the preset behavioural feature sample library;
taking each sample user in the sample user set in turn as the positive set, and taking the other sample users as the negative set;
obtaining recognition results from the dimension-reduced behavioural feature data of the user to be identified and a support vector machine classification model;
sorting the recognition results, and taking the sample user used as the positive set for the largest recognition result as the identity of the user to be identified;
determining whether the identity of the user to be identified belongs to the sample user set;
when the identity of the user to be identified does not belong to the sample user set, determining that the user to be identified is an abnormal user.
10. The keystroke-feature abnormal user recognition method based on support vector machines according to claim 9, characterized in that the step of identifying the behavioural feature data of the user to be identified according to the preset classification model and the preset behavioural feature sample library to generate a recognition result specifically comprises:
obtaining the sample user set of the user behavioural feature sample library;
taking each user sample in the user set in turn as the positive set, and taking the other user samples as the negative set;
inputting the dimension-reduced behavioural feature data of the user to be identified into a support vector machine classification model to obtain recognition results;
sorting the recognition results, and determining whether the maximum recognition result is greater than a preset value;
when the maximum recognition result is greater than the preset value, taking the sample user used as the positive set as the identity of the user to be identified;
when the maximum recognition result is less than the preset value, determining that the user to be identified is an abnormal user.
11. A keystroke-feature abnormal user recognition system based on support vector machines, characterized by comprising:
a behavioural feature extraction module for the user to be identified, configured to obtain the behavioural feature data entered by a user to be identified when typing a preset sample;
a classification module for the user to be identified, configured to classify the behavioural feature data of the user to be identified according to a preset classification model and a preset behavioural feature sample library, and to generate a classification result;
wherein the preset behavioural feature sample library is established by the following steps:
obtaining first-sample behavioural feature data entered by sample users when typing the preset sample;
performing data completion on the missing behavioural feature data in the first-sample behavioural feature data to form second-sample behavioural feature data, and using the second-sample behavioural feature data as the preset behavioural feature sample library.
12. A computer device, characterized by comprising at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor. When executed, the instructions cause the at least one processor to perform the keystroke-feature abnormal user recognition method based on support vector machines according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that it stores computer instructions for causing a computer to perform the keystroke-feature abnormal user recognition method based on support vector machines according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810763718.4A CN109145554A (en) | 2018-07-12 | 2018-07-12 | A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145554A true CN109145554A (en) | 2019-01-04 |
Family
ID=64800424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810763718.4A Pending CN109145554A (en) | 2018-07-12 | 2018-07-12 | A kind of recognition methods of keystroke characteristic abnormal user and system based on support vector machines |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145554A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109873813A (en) * | 2019-01-28 | 2019-06-11 | 平安科技(深圳)有限公司 | Text input abnormality monitoring method, device, computer equipment and storage medium |
CN110502883A (en) * | 2019-08-23 | 2019-11-26 | 四川长虹电器股份有限公司 | A kind of keystroke abnormal behavior detection method based on PCA |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006111963A2 (en) * | 2005-04-17 | 2006-10-26 | Rafael - Armament Development Authority Ltd. | Generic classification system |
CN105450412A (en) * | 2014-08-19 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Identity authentication method and device |
Non-Patent Citations (1)
Title |
---|
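单鹏飞 (Shan Pengfei) et al.: "基于支持向量机的击键特征异常用户识别" ("Identification of users with abnormal keystroke features based on support vector machines"), 《电脑知识与技术》 (Computer Knowledge and Technology) * |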
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210402. Address after: 325006 Wenzhou Higher Education Park, Zhejiang Province (Chashan Town, Ouhai District). Applicant after: Wenzhou University. Address before: 325000 Room 203, 2nd floor, area D, building 14, Haixi e-commerce Science Park, Lingxi Town, Cangnan County, Wenzhou City, Zhejiang Province. Applicant before: WENZHOU UNIVERSITY CANGNAN Research Institute |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |