CN109063431B - User identity recognition method for weighting keystroke characteristic curve difference degree - Google Patents

Publication number
CN109063431B
CN109063431B CN201810644782.0A CN201810644782A CN109063431B CN 109063431 B CN109063431 B CN 109063431B CN 201810644782 A CN201810644782 A CN 201810644782A CN 109063431 B CN109063431 B CN 109063431B
Authority
CN
China
Prior art keywords
keystroke
data set
characteristic curve
time
interval time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810644782.0A
Other languages
Chinese (zh)
Other versions
CN109063431A (en)
Inventor
王林
贺冰清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN201810644782.0A priority Critical patent/CN109063431B/en
Publication of CN109063431A publication Critical patent/CN109063431A/en
Application granted granted Critical
Publication of CN109063431B publication Critical patent/CN109063431B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/316User authentication by observing the pattern of computer usage, e.g. typical user behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Collating Specific Patterns (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a user identity recognition method based on the weighted keystroke characteristic curve difference degree. The method first extracts a keystroke interval time data set and a half-time feature data set; it then calculates the mean and standard deviation of each data set, the upper/lower boundaries of the keystroke interval time characteristic curve and of the half-time characteristic curve, and the weighted characteristic curve difference degree of the keystroke interval time together with the characteristic curve difference degree of the half-time features; finally, it recognizes the user identity using the weighted curve difference degree and the characteristic curve difference degree. Compared with traditional keystroke authentication algorithms that use only keystroke duration and keystroke time interval, the method performs better: it reduces the false rejection rate, the false acceptance rate and the equal error rate, and improves recognition accuracy.

Description

User identity recognition method for weighting keystroke characteristic curve difference degree
Technical Field
The invention belongs to the technical field of biometric authentication methods, and relates to a user identity identification method adopting a weighted keystroke characteristic curve difference degree.
Background
In recent years, people have come to use a large number of online web applications, including social media platforms (e.g., Facebook, Twitter, Weibo), cloud storage services (e.g., Dropbox, Google Drive) and online web games. However, cybercrime targeting these web applications has quietly spread around the world. In serious cases, criminals break into a victim's account over the Internet and steal sensitive information, including passwords and financial assets. To address such theft, additional biometric authentication mechanisms are introduced into online programs or devices to improve the security of user accounts. Among current computer security measures, one is traditional password-based authentication, but passwords are easily leaked; another replaces simple passwords with physical tokens (smart cards, etc.), but this requires the system to be equipped with corresponding hardware, which increases cost, and physical tokens can be lost, stolen or duplicated. Because human biometric traits are hard to reproduce and hard to change, biometric identification has become a research hotspot. Common biometric techniques include fingerprint recognition, face recognition and iris recognition. However, these techniques all require costly hardware, which makes them inconvenient to apply and difficult to popularize.
The keystroke dynamic identity authentication is a biometric authentication technology for identity recognition based on keystroke characteristics (such as keystroke delay, keystroke force and the like), and the method carries out the identification of the identity of a user by monitoring the keyboard input of the user, collecting keystroke data and carrying out classification modeling on the keystroke behavior characteristics of the user. Compared with other biological identification technologies, the keystroke dynamic identity authentication has the advantages of low cost, high flexibility and the like, and does not need extra expensive hardware equipment.
Disclosure of Invention
The invention aims to provide a user identity recognition method adopting the difference degree of a weighted keystroke characteristic curve, which solves the problem that the prior authentication method only adopts the size of each keystroke characteristic contained in a keystroke characteristic vector to carry out identity recognition, does not utilize the change rate between two adjacent characteristic values, and thus has low accuracy.
The technical scheme adopted by the invention is that the user identity identification method for weighting the difference degree of the keystroke characteristic curves is implemented according to the following steps:
step 1, collecting data, and establishing a half-time characteristic data set and a keystroke interval time data set;
step 2, respectively calculating the mean value and standard deviation of the keystroke interval time data set and the mean value and standard deviation of the half-time characteristic data set;
step 3, calculating the upper/lower boundary of the keystroke interval time characteristic curve according to the mean value and the standard deviation of the keystroke interval time data set, and calculating the upper/lower boundary of the half-time characteristic curve according to the mean value and the standard deviation of the half-time characteristic data set;
step 4, calculating the difference degree of the keystroke interval time weighting characteristic curve according to the upper/lower boundary of the keystroke interval time characteristic curve, and calculating the difference degree of the half-time characteristic curve according to the upper/lower boundary of the half-time characteristic curve;
step 5, identifying the user identity by using the difference degree of the weighting curve and the difference degree of the characteristic curve.
The present invention is also characterized in that,
the step 1 comprises the following concrete implementation steps:
1.1, screening k representative specific double-key character sequences from original keystroke information of a free text to form a specific character sequence set SK;
1.2, calculating the frequency of use $\lambda_j$ ($j = 1, 2, \ldots, k$) of each double key, and constructing the user's keystroke interval time data set $S^{pp}$ and half-time feature data set $S^{st}$, which are expressed as follows:
$$S^{pp} = \{ V_i^{pp} = [t_{i,1}^{pp}, t_{i,2}^{pp}, \ldots, t_{i,k}^{pp}] \mid i = 1, 2, \ldots, m \} \quad (1)$$
$$S^{st} = \{ V_i^{st} = [WPM_i, P_{i,N\_UD}, P_{i,error}, P_{i,CapsLock}, P_{i,Shift}] \mid i = 1, 2, \ldots, n \} \quad (2)$$
wherein: k is the number of selected specific double-key character sequences; $V_i^{pp} \in R^k$ is the i-th keystroke interval time vector sample; $t_{i,k}^{pp}$ is the keystroke interval time of the last specific double-key character sequence in the i-th sample; $t_{i,j}^{pp}$ is the keystroke interval time of the j-th specific double-key character sequence in the i-th sample ($j = 1, \ldots, k$); m is the number of collected keystroke interval time vector samples; $V_i^{st} \in R^5$ is the i-th half-time feature vector sample; $WPM_i$, $P_{i,N\_UD}$, $P_{i,error}$, $P_{i,CapsLock}$ and $P_{i,Shift}$ are, respectively, the average keystroke speed, the occurrence frequency of negative interval times (RP), the input error rate, the usage frequency of the CapsLock key and the usage frequency of the Shift key of the i-th sample; $P_{N\_UD}$, $P_{error}$, $P_{Shift}$ and $P_{CapsLock}$ vary in the range [0, 1], while the average keystroke speed WPM varies in the range [0, +∞) and is typically on the order of $10^2$, a magnitude markedly different from that of the other half-time features; n is the number of collected half-time feature vector samples;
1.3, normalizing the average keystroke speed WPM in the half-time feature data set $S^{st}$ according to the following formula:
$$WPM_i' = \frac{WPM_i}{\max\{WPM_i \mid i = 1, \ldots, n\}} \quad (3)$$
in the formula: $\max\{WPM_i \mid i = 1, \ldots, n\}$ is the maximum average keystroke speed among the samples, denoted $WPM_{max}$; after normalization, the half-time feature data set $S^{st}$ is written briefly as
$$S^{st} = \{ V_i^{st} = [v_{i,1}, v_{i,2}, v_{i,3}, v_{i,4}, v_{i,5}] \mid i = 1, 2, \ldots, n \} \quad (4)$$
in the formula: $v_{i,1} = WPM_i / WPM_{max}$, $v_{i,2} = P_{i,N\_UD}$, $v_{i,3} = P_{i,error}$, $v_{i,4} = P_{i,CapsLock}$, $v_{i,5} = P_{i,Shift}$;
the method for calculating the mean and standard deviation of the keystroke interval time data set and the mean and standard deviation of the half-time characteristic data set in the step 2 comprises the following steps:
set data set SppThe mean value of all elements in the formula is
Figure BDA0001703248220000042
Data set SstThe mean value of all elements in the formula is
Figure BDA0001703248220000043
Then
Figure BDA0001703248220000044
Figure BDA0001703248220000045
Set data set SppThe standard deviation of all elements in the composition is
Figure BDA0001703248220000046
Data set SstThe standard deviation of the elements contained in (A) is
Figure BDA0001703248220000047
Then
Figure BDA0001703248220000048
Figure BDA0001703248220000049
The method for calculating the upper/lower boundary of the keystroke interval time characteristic curve and the upper/lower boundary of the half-time characteristic curve in the step 3 comprises the following steps:
set data set SppThe upper and lower boundary vectors of the elements contained in (1) are respectively
Figure BDA00017032482200000410
Data set SstThe upper and lower boundary vectors of the elements contained in (1) are respectively
Figure BDA00017032482200000411
The upper boundary of the inter-keystroke time characteristic curve
Figure BDA00017032482200000412
Lower boundary
Figure BDA00017032482200000413
are calculated by equation (9) below, and the upper boundary $v_{u,l}$ and lower boundary $v_{d,l}$ of the half-time characteristic curve are calculated by equation (10) below:
Figure BDA00017032482200000414
Figure BDA00017032482200000415
in the formula:
Figure BDA00017032482200000416
and
Figure BDA00017032482200000417
is an adjustable threshold.
The method for calculating the difference degree of the keystroke interval time weighting characteristic curve and the difference degree of the half-time characteristic curve in the step 4 comprises the following steps:
sample time vector for setting any one keystroke interval
Figure BDA0001703248220000051
The sample is in the data set SppWeighted feature curve difference degree in (1)
Figure BDA0001703248220000052
The calculation formula of (2) is as follows:
Figure BDA0001703248220000053
in the formula:
Figure BDA0001703248220000054
Figure BDA0001703248220000055
Figure BDA0001703248220000056
Figure BDA0001703248220000057
Figure BDA0001703248220000058
wherein: $\lambda_j$ is the frequency of use of each specific double-key character sequence, $j = 1, 2, \ldots, k$;
let any half-time eigenvector sample
Figure BDA0001703248220000059
In a data set SstDegree of difference of medium characteristic curve
Figure BDA00017032482200000510
Is composed of
Figure BDA0001703248220000061
In the formula:
Figure BDA0001703248220000062
Figure BDA0001703248220000063
Figure BDA0001703248220000064
Figure BDA0001703248220000065
Figure BDA0001703248220000066
the weighted characteristic curve difference degree of each element of the keystroke interval time data set $S^{pp}$ is calculated from the frequency of use of each double key in the set SK and equation (11), forming the keystroke interval time characteristic curve difference degree set $Q^{pp}$; the characteristic curve difference degree of each element of the half-time feature data set $S^{st}$ is calculated from equation (12), forming the half-time characteristic curve difference degree set $Q^{st}$; the above sets are defined as
Figure BDA0001703248220000067
Figure BDA0001703248220000068
In the formula:
Figure BDA0001703248220000069
representing a data set SppMiddle element Vi pp∈RkThe degree of difference of the weighted characteristic curves of (1),
Figure BDA00017032482200000610
representing a data set SstMiddle element Vi st∈R5The degree of difference in characteristic curves of (a).
The method for identifying the user identity by using the difference degree of the weighting curve and the difference degree of the characteristic curve in the step 5 comprises the following steps:
the test sample is judged according to the following inequality
Figure BDA0001703248220000071
Figure BDA0001703248220000072
In the formula:
Figure BDA0001703248220000073
and
Figure BDA0001703248220000074
is an adjustable threshold;
if inequalities (15) and (16) both hold, the test sample is determined to belong to the user; otherwise, the test sample is determined not to belong to the user.
Threshold value in step 4
Figure BDA0001703248220000075
And
Figure BDA0001703248220000076
the value ranges of (1) are all 0-3.
Threshold value in step 5
Figure BDA0001703248220000077
And
Figure BDA0001703248220000078
the value range of (A) is not less than 0.
Compared with traditional keystroke authentication algorithms that use only keystroke duration and keystroke time interval, the user identity authentication and recognition algorithm based on the characteristic curve difference degree performs better: it reduces the false rejection rate (FRR), the false acceptance rate (FAR) and the equal error rate (EER), and improves recognition accuracy.
Drawings
FIG. 1 is a key stroke duration characteristic curve of a user identification method using weighted key stroke characteristic curve disparity according to the present invention;
FIG. 2 is a half-time characteristic curve of a user identification method using weighted keystroke characteristic curve diversity in accordance with the present invention;
FIG. 3 is a data set S of a user identification method using weighted keystroke profile differences according to the present inventionppThe upper and lower boundary graphs of the keystroke characteristic;
FIG. 4 is a data set S of a user identification method using weighted keystroke profile differences according to the present inventionstThe upper and lower boundary graphs of the keystroke characteristic;
FIG. 5 is a graph showing how the performance index EER of the free-text keystroke characteristic authentication algorithm varies with TP in the user identification method using weighted keystroke characteristic curve difference degree according to the present invention;
FIG. 6 is a schematic diagram of the division of keystroke data set regions in the user identification method using weighted keystroke profile differences according to the present invention;
FIG. 7 is a schematic diagram of an internal sample of the method for identifying a user identity using a weighted keystroke profile difference according to the present invention;
FIG. 8 is a schematic diagram of an external sample of the method for identifying a user identity using a weighted keystroke profile difference according to the present invention;
FIG. 9 is a diagram of the difference of the weighted key characteristic curves of the user identification method using the difference of the weighted key characteristic curves according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a user identity recognition method for weighting the difference degree of a keystroke characteristic curve, which is implemented according to the following steps:
step 1, collecting data, and establishing a half-time characteristic data set and a keystroke interval time data set, wherein the specific implementation steps are as follows:
1.1, screening k representative specific double-key character sequences from original keystroke information of a free text to form a specific character sequence set SK;
1.2, calculating the frequency of use $\lambda_j$ ($j = 1, 2, \ldots, k$) of each double key, and constructing the user's keystroke interval time data set $S^{pp}$ and half-time feature data set $S^{st}$, which are expressed as follows:
$$S^{pp} = \{ V_i^{pp} = [t_{i,1}^{pp}, t_{i,2}^{pp}, \ldots, t_{i,k}^{pp}] \mid i = 1, 2, \ldots, m \} \quad (1)$$
$$S^{st} = \{ V_i^{st} = [WPM_i, P_{i,N\_UD}, P_{i,error}, P_{i,CapsLock}, P_{i,Shift}] \mid i = 1, 2, \ldots, n \} \quad (2)$$
wherein: k is the number of selected specific double-key character sequences; $V_i^{pp} \in R^k$ is the i-th keystroke interval time vector sample; $t_{i,k}^{pp}$ is the keystroke interval time of the last specific double-key character sequence in the i-th sample; $t_{i,j}^{pp}$ is the keystroke interval time of the j-th specific double-key character sequence in the i-th sample ($j = 1, \ldots, k$); m is the number of collected keystroke interval time vector samples; $V_i^{st} \in R^5$ is the i-th half-time feature vector sample; $WPM_i$, $P_{i,N\_UD}$, $P_{i,error}$, $P_{i,CapsLock}$ and $P_{i,Shift}$ are, respectively, the average keystroke speed, the occurrence frequency of negative interval times (RP), the input error rate, the usage frequency of the CapsLock key and the usage frequency of the Shift key of the i-th sample; $P_{N\_UD}$, $P_{error}$, $P_{Shift}$ and $P_{CapsLock}$ vary in the range [0, 1], while the average keystroke speed WPM varies in the range [0, +∞) and is typically on the order of $10^2$, a magnitude markedly different from that of the other half-time features; n is the number of collected half-time feature vector samples;
1.3, normalizing the average keystroke speed WPM in the half-time feature data set $S^{st}$ according to the following formula:
$$WPM_i' = \frac{WPM_i}{\max\{WPM_i \mid i = 1, \ldots, n\}} \quad (3)$$
in the formula: $\max\{WPM_i \mid i = 1, \ldots, n\}$ is the maximum average keystroke speed among the samples, denoted $WPM_{max}$; after normalization, the half-time feature data set $S^{st}$ is written briefly as
$$S^{st} = \{ V_i^{st} = [v_{i,1}, v_{i,2}, v_{i,3}, v_{i,4}, v_{i,5}] \mid i = 1, 2, \ldots, n \} \quad (4)$$
in the formula: $v_{i,1} = WPM_i / WPM_{max}$, $v_{i,2} = P_{i,N\_UD}$, $v_{i,3} = P_{i,error}$, $v_{i,4} = P_{i,CapsLock}$, $v_{i,5} = P_{i,Shift}$;
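As a minimal sketch of the normalization in step 1.3, the Python below divides the WPM entry of every sample by the largest WPM in the data set, which is how the text describes $WPM_{max}$. The function name `normalize_half_time_samples` and the sample values are illustrative assumptions and do not come from the patent.

```python
from typing import List


def normalize_half_time_samples(samples: List[List[float]]) -> List[List[float]]:
    """Scale the WPM entry (index 0) of every sample by the maximum WPM.

    Each sample is [WPM, P_N_UD, P_error, P_CapsLock, P_Shift]. The four
    ratio features already lie in [0, 1] and are left unchanged.
    """
    if not samples:
        return []
    wpm_max = max(sample[0] for sample in samples)
    if wpm_max <= 0:
        raise ValueError("maximum WPM must be positive to normalize")
    return [[sample[0] / wpm_max] + list(sample[1:]) for sample in samples]


# Hypothetical raw samples, for illustration only.
raw = [
    [120.0, 0.10, 0.02, 0.00, 0.05],
    [240.0, 0.20, 0.04, 0.01, 0.10],
]
normalized = normalize_half_time_samples(raw)
```

After this step all five features share the range [0, 1], so the WPM term no longer dominates the half-time characteristic curve.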
step 2, respectively calculating the mean value and standard deviation of the keystroke interval time data set and the mean value and standard deviation of the half-time characteristic data set, wherein the specific calculation method comprises the following steps:
Any element $V_i^{pp}$ of the data set $S^{pp}$ can be represented by a curve whose abscissa is j and whose ordinate is the keystroke interval time $t_{i,j}^{pp}$ of the j-th double-key character sequence, where $j = 1, \ldots, k$; in the same way, any element $V_i^{st}$ of the data set $S^{st}$ can be represented by a curve whose abscissa is l and whose ordinate is $v_{i,l}$, where $l = 1, \ldots, 5$. For convenience, the curve of any element $V_i^{pp}$ of $S^{pp}$ is called a keystroke interval time characteristic curve, and the curve of any element $V_i^{st}$ of $S^{st}$ is called a half-time characteristic curve; the two are collectively called keystroke characteristic curves.
Set data set SppThe mean value of all elements in the formula is
Figure BDA0001703248220000101
Data set SstThe mean value of all elements in the formula is
Figure BDA0001703248220000102
Then
Figure BDA0001703248220000103
Figure BDA0001703248220000104
Set data set SppThe standard deviation of all elements in the composition is
Figure BDA0001703248220000105
Data set SstThe standard deviation of the elements contained in (A) is
Figure BDA0001703248220000106
Then
Figure BDA0001703248220000107
Figure BDA0001703248220000108
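The per-position mean and standard deviation of step 2 can be sketched as below. Because equations (5) to (8) are not legible in this text, the choice of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, as are the function name `column_mean_std` and the example values.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def column_mean_std(samples: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return the mean and standard deviation of each feature position.

    `samples` holds one feature vector per row (for example the k interval
    times of one sample); statistics are taken over rows, position by position.
    """
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation (divide by the sample count).
    stds = [pstdev(col) for col in columns]
    return means, stds


# Hypothetical interval times in milliseconds for k = 2 double keys.
intervals = [[100.0, 200.0], [120.0, 220.0], [140.0, 240.0]]
means, stds = column_mean_std(intervals)
```

The same helper applies unchanged to the five-element half-time feature vectors.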
Step 3, calculating the upper/lower boundary of the keystroke interval time characteristic curve according to the mean value and the standard deviation of the keystroke interval time data set, and calculating the upper/lower boundary of the half-time characteristic curve according to the mean value and the standard deviation of the half-time characteristic data set, wherein the specific calculation method comprises the following steps:
set data set SppThe upper and lower boundary vectors of the elements contained in (1) are respectively
Figure BDA0001703248220000109
Data set SstThe upper and lower boundary vectors of the elements contained in (1) are respectively
Figure BDA00017032482200001010
The upper boundary of the inter-keystroke time characteristic curve
Figure BDA00017032482200001011
Lower boundary
Figure BDA00017032482200001012
are calculated by equation (9) below, and the upper boundary $v_{u,l}$ and lower boundary $v_{d,l}$ of the half-time characteristic curve are calculated by equation (10) below:
Figure BDA00017032482200001013
Figure BDA00017032482200001014
in the formulas, the two thresholds (one for each data set) are adjustable, and both take values in the range 0 to 3. This range is determined according to the central limit theorem, i.e. the collected keystroke time features are assumed to follow a normal distribution. The larger the thresholds, the wider the band between the upper and lower boundaries and the higher the probability that a sample falls within the boundaries, so the FRR decreases and the FAR increases; the smaller the thresholds, the narrower the band and the lower the probability that a sample falls within the boundaries, so the FRR increases and the FAR decreases. The thresholds should be chosen so that the EER is as small as possible; within the range 0 to 3, a value of 2 can generally be selected.
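One plausible reading of step 3 is that each boundary is the mean shifted by a threshold multiple of the standard deviation. Equations (9) and (10) are not legible here, so the form `mean ± theta * std`, the name `curve_bounds` and the parameter `theta` are assumptions made for this sketch.

```python
from typing import List, Tuple


def curve_bounds(means: List[float], stds: List[float],
                 theta: float = 2.0) -> Tuple[List[float], List[float]]:
    """Return (upper, lower) boundary vectors as mean +/- theta * std.

    Assumption: this is the form of equations (9) and (10); the text only
    states that the threshold is adjustable within 0 to 3.
    """
    if not 0.0 <= theta <= 3.0:
        raise ValueError("theta is expected to lie in [0, 3]")
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    upper = [m + theta * s for m, s in zip(means, stds)]
    lower = [m - theta * s for m, s in zip(means, stds)]
    return upper, lower


# Hypothetical statistics for two feature positions.
upper, lower = curve_bounds([120.0, 220.0], [10.0, 20.0], theta=2.0)
```

Under the normality assumption stated in the text, a threshold of 2 places roughly 95.4% of the values of a single feature between its two boundaries, which is consistent with 2 being suggested as a general choice.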
Step 4, calculating the difference degree of the keystroke interval time weighting characteristic curve according to the upper/lower boundary of the keystroke interval time characteristic curve, and calculating the difference degree of the half-time characteristic curve according to the upper/lower boundary of the half-time characteristic curve, wherein the specific calculation method comprises the following steps:
The upper/lower boundary curves of the data sets $S^{pp}$ and $S^{st}$ divide the entire two-dimensional plane into an inner region and an outer region, as shown in Fig. 6. For any keystroke interval time vector sample, if every element lies between the lower and upper boundaries of the keystroke interval time characteristic curve at its position j ($j = 1, \ldots, k$), the sample lies entirely within the inner region of $S^{pp}$ and is called an internal sample of the data set $S^{pp}$, as shown in Fig. 7; otherwise it is called an external sample of the data set $S^{pp}$, as shown in Fig. 8. Similarly, for any half-time feature vector sample $V_s^{st} = [v_{s,1}, \ldots, v_{s,5}]$, if $v_{d,l} \le v_{s,l} \le v_{u,l}$ holds for all $l = 1, \ldots, 5$, the sample is called an internal sample of the data set $S^{st}$; otherwise it is called an external sample of the data set $S^{st}$.
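The internal/external distinction above amounts to an element-wise range check, sketched below. The function name `is_internal_sample` and the numeric values are hypothetical.

```python
from typing import Sequence


def is_internal_sample(sample: Sequence[float],
                       lower: Sequence[float],
                       upper: Sequence[float]) -> bool:
    """True if every element lies within [lower, upper] at its position."""
    if not (len(sample) == len(lower) == len(upper)):
        raise ValueError("sample and boundary vectors must have equal length")
    return all(lo <= x <= hi for x, lo, hi in zip(sample, lower, upper))


# Hypothetical boundaries for three feature positions.
lower_bound = [100.0, 180.0, 100.0]
upper_bound = [140.0, 260.0, 160.0]

inside = is_internal_sample([120.0, 200.0, 150.0], lower_bound, upper_bound)
outside = is_internal_sample([150.0, 200.0, 90.0], lower_bound, upper_bound)
```

A single element outside its band is enough to make the whole sample external.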
According to the above definition, if a sample is an external sample of a data set, its characteristic curve must enclose several closed regions with the upper or lower boundary curve of the data set in the corresponding outer region, as shown by the shaded regions in Fig. 8. The larger the total area of all closed regions, the greater the difference between the sample and the data set, and the more likely it is that the sample does not belong to the data set. Taking into account the characteristics of free-text keystroke information, the keystroke characteristic curve difference degree used for fixed text is suitably modified here into a weighted keystroke characteristic curve difference degree, which is tied to the frequency of use of each specific double-key character sequence.
In fixed-text keystroke studies, the keystroke characteristic curve difference degree of a sample is, physically, the total area of all closed regions formed in the outer region by the sample's keystroke characteristic curve and the upper or lower boundary curve of the corresponding data set. The area of each closed region depends mainly on the distance by which each element of the feature vector exceeds the upper or lower boundary, such as $d_2$, $d_4$ and $d_7$ in Fig. 9. Because the specific double-key character sequences screened from free text differ in frequency of use, the distance by which each element exceeds the upper or lower boundary is multiplied by a corresponding weight coefficient, giving for example $\lambda_2 d_2$, $\lambda_4 d_4$ and $\lambda_7 d_7$ in Fig. 8, so that the allowable fluctuation range of the interval time of a specific double key is inversely proportional to its frequency of use. The weight coefficient applied to each element is proportional to the frequency of use of the corresponding double-key character sequence, and in general the frequency of use itself can be taken as the weight coefficient.
Compared with the difference of the keystroke characteristic curve in the fixed text, the difference of the weighted keystroke characteristic curve obtained by the design method is characterized in that when the absolute values of the distances between any two elements in the characteristic vector exceeding the upper boundary or the lower boundary are equal, the variation of the difference of the characteristic curve caused by the element with a large weight coefficient is larger than that caused by the element with a small weight coefficient. Considering that the interval time of double key strokes with high frequency of use has better stability and smaller fluctuation amplitude than the double keys with low frequency of use, the interval time of double key strokes with high frequency of use should be differentiated so that the allowable fluctuation range of the interval time of double key strokes with high frequency of use is smaller than that of the double keys with low frequency of use. Therefore, it is more appropriate to use the weighted keystroke characteristic curve difference degree in the free text keystroke characteristic authentication process.
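The weighting idea can be illustrated with a deliberately simplified proxy: sum, over all positions, the distance by which an element exceeds its boundary multiplied by the weight $\lambda_j$. This is not equation (11), which measures the areas of closed regions and is not legible in this text; the function `weighted_excess` and all values below are assumptions made only to show how a frequently used double key is penalized more for the same overshoot.

```python
from typing import Sequence


def weighted_excess(sample: Sequence[float],
                    lower: Sequence[float],
                    upper: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Sum of weight * distance beyond the boundary, over all positions.

    Simplified stand-in for the weighted difference degree: it uses the
    overshoot distances d_j directly instead of closed-region areas.
    """
    total = 0.0
    for x, lo, hi, w in zip(sample, lower, upper, weights):
        if x > hi:
            distance = x - hi      # above the upper boundary
        elif x < lo:
            distance = lo - x      # below the lower boundary
        else:
            distance = 0.0         # inside the band contributes nothing
        total += w * distance
    return total


# Hypothetical boundaries and usage-frequency weights.
lower_b = [100.0, 180.0, 100.0]
upper_b = [140.0, 260.0, 160.0]
lam = [0.5, 0.3, 0.2]

score = weighted_excess([150.0, 200.0, 90.0], lower_b, upper_b, lam)
```

In this example the first and third elements both overshoot by 10, but the first contributes 5.0 and the third only 2.0 because its double key is used less often.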
Sample time vector for setting any one keystroke interval
Figure BDA0001703248220000121
The sample is in the data set SppWeighted feature curve difference degree in (1)
Figure BDA0001703248220000122
The calculation formula of (2) is as follows:
Figure BDA0001703248220000131
in the formula:
Figure BDA0001703248220000132
Figure BDA0001703248220000133
Figure BDA0001703248220000134
Figure BDA0001703248220000135
Figure BDA0001703248220000136
wherein: $\lambda_j$ is the frequency of use of each specific double-key character sequence, $j = 1, 2, \ldots, k$;
let any half-time eigenvector sample
Figure BDA0001703248220000137
In a data set SstDegree of difference of medium characteristic curve
Figure BDA0001703248220000138
Is composed of
Figure BDA0001703248220000141
In the formula:
Figure BDA0001703248220000142
Figure BDA0001703248220000143
Figure BDA0001703248220000144
Figure BDA0001703248220000145
Figure BDA0001703248220000146
the weighted characteristic curve difference degree of each element of the keystroke interval time data set $S^{pp}$ is calculated from the frequency of use of each double key in the set SK and equation (11), forming the keystroke interval time characteristic curve difference degree set $Q^{pp}$; the characteristic curve difference degree of each element of the half-time feature data set $S^{st}$ is calculated from equation (12), forming the half-time characteristic curve difference degree set $Q^{st}$; the above sets are defined as
Figure BDA0001703248220000147
Figure BDA0001703248220000148
In the formula:
Figure BDA0001703248220000149
representing a data set SppMiddle element
Figure BDA00017032482200001410
The degree of difference of the weighted characteristic curves of (1),
Figure BDA00017032482200001411
representing a data set SstMiddle element Vi st∈R5The degree of difference in characteristic curves of (a).
Step 5, the specific method for identifying the user identity by using the difference degree of the weighting curve and the difference degree of the characteristic curve comprises the following steps:
If the samples $V^{pp}$ and $V^{st}$ are internal samples of the sets $S^{pp}$ and $S^{st}$ respectively, their characteristic curve difference degree is defined as zero; otherwise, the characteristic curve difference degree of a sample equals the total area of all closed regions formed in the outer region by the sample's characteristic curve and the upper/lower boundary curves of the corresponding data set.
The test sample is judged according to the following inequality
Figure BDA0001703248220000152
Figure BDA0001703248220000153
In the formula:
Figure BDA0001703248220000154
and
Figure BDA0001703248220000155
are adjustable thresholds, both of which take values not less than 0;
if inequalities (15) and (16) both hold, the test sample is determined to belong to the user; otherwise, the test sample is determined not to belong to the user.
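The decision in step 5 requires both conditions to hold at once. Since inequalities (15) and (16) are not legible here, the sketch below assumes the simplest form, in which each difference degree must not exceed its non-negative threshold; the function name and arguments are hypothetical.

```python
def belongs_to_user(diff_pp: float, diff_st: float,
                    threshold_pp: float, threshold_st: float) -> bool:
    """Accept only if both difference degrees are within their thresholds.

    Assumption: inequalities (15) and (16) compare each difference degree
    against an adjustable threshold that is not less than 0.
    """
    if threshold_pp < 0 or threshold_st < 0:
        raise ValueError("thresholds must not be negative")
    return diff_pp <= threshold_pp and diff_st <= threshold_st


accepted = belongs_to_user(0.0, 0.1, threshold_pp=0.5, threshold_st=0.2)
rejected = belongs_to_user(0.9, 0.1, threshold_pp=0.5, threshold_st=0.2)
```

Raising either threshold makes acceptance easier (lower FRR, higher FAR), mirroring the trade-off described for the boundary thresholds in step 3.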
Example 1
A specific example of user identification is introduced.
Step 1: collecting data, establishing a half-time feature data set and a keystroke interval time data set
The experimental data were collected mainly on PCs running Windows, with a conventional mechanical keyboard as the keystroke acquisition device. A keystroke information collection program was written in the VC++ 6.0 development environment; it saves the keystroke information produced when a user types freely to a designated file. Before data collection began, the program was installed on the computer of each user participating in the experiment. During the collection period, the user was required to run the program after each start-up of the computer; the program display interface is shown in Fig. 9. After the user clicks the [start] button, the program collects the user's free keystroke information in the background and stores it in a key_record file. The program does not interfere with the user's normal use of the computer during collection. Before shutting down the computer each time, the user clicks the [end] button to exit the program.
After the raw data of all participants had been collected, the double-key character sequences used by each participant and their numbers of uses (frequencies) were extracted; the statistics are shown in table 1.
TABLE 1
Figure BDA0001703248220000161
Table 1 lists the 15 most frequently used double-key character sequences (ordered from highest to lowest frequency of use) and the number of times each participant used them during the experiment. Analysis shows that the double-key character sequences "in", "an", "ng", "zh", "wo", "en", "sh" and "ji" are used by all participants with high frequency, which indicates that they have a certain universality. These 8 double-key character sequences are therefore selected in this experiment to form the specific character sequence set SK, i.e., SK = {in, an, ng, zh, wo, en, sh, ji}.
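The selection of sequences shared by all participants can be sketched as below. This is only an illustration: it assumes each participant's raw keystrokes are available as a plain character string, and the helper names `digraph_counts` and `common_top_digraphs` are hypothetical.

```python
from collections import Counter


def digraph_counts(keys: str) -> Counter:
    """Count adjacent letter pairs (double-key sequences) in a keystroke string."""
    keys = keys.lower()
    return Counter(a + b for a, b in zip(keys, keys[1:]) if a.isalpha() and b.isalpha())


def common_top_digraphs(keys_by_user: dict, top_n: int = 15) -> set:
    """Double-key sequences that rank in the top_n for every participant."""
    top_sets = [
        {dg for dg, _ in digraph_counts(keys).most_common(top_n)}
        for keys in keys_by_user.values()
    ]
    return set.intersection(*top_sets) if top_sets else set()
```

For example, `common_top_digraphs({"u1": "in in in an", "u2": "an an in wo"})` keeps "in" and "an" and drops "wo", which only the second user typed.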
After the specific character sequence set SK is selected, the frequency of use of each double key in SK is calculated from the raw collected data and denoted $\lambda_j$, the frequency of use of the j-th double key in SK, j = 1, 2, …, 8. During free input, one double-key keystroke interval time vector sample and one half-time feature vector sample are collected from each participant in each time period; combined with the data in table 1, each participant has at least 200 double-key keystroke interval time vector samples and 200 half-time feature vector samples.
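A small sketch of the use frequencies $\lambda_j$ follows. The text does not state how the frequencies are scaled, so the sketch assumes relative frequencies within SK that sum to 1; the function name `usage_frequencies` is hypothetical.

```python
def usage_frequencies(counts: dict, sk: list) -> list:
    """Relative use frequency of each double key in sk (assumed normalization)."""
    total = sum(counts.get(dg, 0) for dg in sk)
    if total == 0:
        raise ValueError("none of the sequences in SK were used")
    return [counts.get(dg, 0) / total for dg in sk]
```

With counts of 30 for "in" and 10 for "an", the frequencies for SK = ["in", "an"] are 0.75 and 0.25; sequences outside SK are ignored.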
Step 2: calculating the mean and standard deviation of the keystroke interval time data set and the mean and standard deviation of the half-time feature data set
The experimental scheme for identity authentication with free-text keystroke features is essentially the same as that for fixed text; only the keystroke feature information and the authentication algorithm used in the experiment differ.
The first 20%, 40%, 60% and 80% of the samples in each participant's keystroke data set are used in turn as training samples to build that participant's keystroke feature model. To simplify analysis of the experimental results, the variable TP denotes the percentage of these training samples in the participant's total number of samples.
Then, the last 80%, 60%, 40% and 20% of the samples in each participant's keystroke data set are used, respectively, as test samples to calculate that participant's false rejection rate (FRR).
Next, all samples of the other 9 participants are used as test samples to attack the participant, and the participant's false acceptance rate (FAR) is calculated.
This process is repeated until the FRR and FAR of all 10 users have been calculated. Finally, the averages of the FRRs and FARs over all participants are taken as the performance indices of the identity authentication algorithm.
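The evaluation protocol above can be sketched for a single participant as follows. The model-building and acceptance functions are passed in as parameters because they stand for the authentication algorithm under test; all names (`split_by_tp`, `frr_far`, `build_model`, `accepts`) are hypothetical.

```python
def split_by_tp(samples, tp):
    """Split samples into the first tp fraction (training) and the rest (testing)."""
    cut = round(len(samples) * tp)
    return samples[:cut], samples[cut:]


def frr_far(user_samples, impostor_samples, tp, build_model, accepts):
    """Return (FRR, FAR) for one participant at training fraction tp."""
    train, genuine_tests = split_by_tp(user_samples, tp)
    model = build_model(train)
    # Genuine test samples that the model rejects count toward FRR.
    false_rejects = sum(1 for s in genuine_tests if not accepts(model, s))
    # Samples from the other participants that the model accepts count toward FAR.
    false_accepts = sum(1 for s in impostor_samples if accepts(model, s))
    return false_rejects / len(genuine_tests), false_accepts / len(impostor_samples)
```

Averaging the returned pairs over all participants gives the overall performance indices described above.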
Step 4: user identity recognition
The experiments yield the false rejection rate (FRR), false acceptance rate (FAR) and equal error rate (EER) of each algorithm when the percentage TP of user samples in the total number of samples is 20%, 40%, 60% and 80%; the results are shown in table 2. Table 2 shows that, for these TP values, the equal error rates (EER) of the authentication algorithm based on the weighted keystroke characteristic curve difference degree are 20.11%, 16.28%, 13.48% and 10.32% respectively, which are significantly better than those of the other 2 comparison algorithms and indicate higher authentication accuracy for user keystroke features. In calculating the weighted keystroke characteristic curve difference degree, the proposed authentication algorithm uses not only the traditional keystroke interval time but also the rate of change of the interval time, the frequency of use of the double-key character sequences and other information. The proposed algorithm can therefore describe a user's keystroke characteristics more accurately and thus improve the accuracy of identity authentication.
The performance indices of the free-text keystroke feature authentication algorithms are shown in table 2, and the variation of the performance index EER with TP is shown in fig. 5.
TABLE 2 Performance indicators for free text keystroke feature authentication algorithms
Figure BDA0001703248220000181
Figure BDA0001703248220000191
The above experimental results show that the user identity recognition method using the weighted keystroke characteristic curve difference degree performs better than conventional keystroke authentication algorithms that use only the keystroke duration and the keystroke time interval: it reduces the false rejection rate (FRR), the false acceptance rate (FAR) and the equal error rate (EER), and improves recognition accuracy.

Claims (1)

1. The user identity recognition method for weighting the difference degree of the keystroke characteristic curve is characterized by comprising the following steps of:
step 1, collecting data, and establishing a half-time characteristic data set and a keystroke interval time data set;
the specific implementation steps are as follows:
1.1, screening k representative specific double-key character sequences from original keystroke information of a free text to form a specific character sequence set SK;
1.2 calculating the frequency of use $\lambda_j$ of each double key, j = 1, 2, …, k, and constructing the user's keystroke interval time data set $S_{pp}$ and half-time feature data set $S_{st}$; $S_{pp}$ and $S_{st}$ are expressed as follows:
Figure FDA0003189719830000011
$S_{st} = \{ V_i^{st} = [WPM_i, P_{i,N\_UD}, P_{i,error}, P_{i,CapsLock}, P_{i,Shift}] \mid i = 1, 2, \dots, n \}$ (2)
where k is the number of selected specific double-key character sequences, $V_i^{pp} \in R^k$ is the i-th keystroke interval time vector sample,
Figure FDA0003189719830000012
is the keystroke interval time of the last specific double-key character sequence in the i-th sample,
Figure FDA0003189719830000013
is the keystroke interval time of the j-th specific double-key character sequence in the i-th sample (j = 1, …, k), and m is the number of collected keystroke interval time vector samples; $V_i^{st} \in R^5$ is the i-th half-time feature vector sample; $WPM_i$, $P_{i,N\_UD}$, $P_{i,error}$, $P_{i,CapsLock}$ and $P_{i,Shift}$ are, respectively, the average keystroke speed, the occurrence frequency of negative interval time RP, the input error rate, the CapsLock key use frequency and the Shift key use frequency of the i-th sample; $P_{N\_UD}$, $P_{error}$, $P_{Shift}$ and $P_{CapsLock}$ vary in the range [0, 1]; the average keystroke speed WPM varies in the range [0, +∞) and is typically on the order of $10^2$, which differs markedly in magnitude from the other half-time features; n is the number of collected half-time feature vector samples;
1.3 normalizing the average keystroke speed WPM in the half-time feature data set $S_{st}$; the normalization formula is as follows:
Figure FDA0003189719830000021
in the formula: $\max\{WPM_i \mid i = 1, \dots, n\}$ is the maximum average keystroke speed among the samples, denoted $WPM_{max}$; after normalization, the half-time feature data set $S_{st}$ is abbreviated as
$S_{st} = \{ V_i^{st} = [v_{i,1}, v_{i,2}, v_{i,3}, v_{i,4}, v_{i,5}] \mid i = 1, 2, \dots, n \}$ (4)
In the formula:
Figure FDA0003189719830000022
$v_{i,2} = P_{i,N\_UD}$, $v_{i,3} = P_{i,error}$, $v_{i,4} = P_{i,CapsLock}$, $v_{i,5} = P_{i,Shift}$
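The normalization in step 1.3 can be illustrated with a minimal sketch. Formula (3) appears only as a figure, so the sketch assumes it divides each $WPM_i$ by $WPM_{max}$, bringing the first component into [0, 1] like the other four; the function name `normalize_wpm` is hypothetical.

```python
def normalize_wpm(samples):
    """Scale the first component (WPM) of each half-time sample by the maximum WPM.

    samples: list of [WPM, P_N_UD, P_error, P_CapsLock, P_Shift] rows.
    Assumes formula (3) is WPM_i / WPM_max.
    """
    wpm_max = max(row[0] for row in samples)
    if wpm_max <= 0:
        raise ValueError("maximum WPM must be positive")
    return [[row[0] / wpm_max] + list(row[1:]) for row in samples]
```

For two samples with WPM 80 and 40, the normalized first components are 1.0 and 0.5, and the remaining components are unchanged.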
step 2, respectively calculating the mean value and standard deviation of the keystroke interval time data set and the mean value and standard deviation of the half-time characteristic data set;
the specific calculation method comprises the following steps:
let the mean of all elements in data set $S_{pp}$ be
Figure FDA0003189719830000023
and the mean of all elements in data set $S_{st}$ be
Figure FDA0003189719830000024
Then
Figure FDA0003189719830000025
Figure FDA0003189719830000026
Let the standard deviation of all elements in data set $S_{pp}$ be
Figure FDA0003189719830000027
and the standard deviation of all elements in data set $S_{st}$ be
Figure FDA0003189719830000028
Then
Figure FDA0003189719830000029
Figure FDA00031897198300000210
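Step 2 can be sketched as a per-component mean and standard deviation over the samples of a data set. The formulas themselves appear only as figures, so the use of the population standard deviation here is an assumption; the function name `column_stats` is hypothetical.

```python
from statistics import fmean, pstdev


def column_stats(samples):
    """Per-component mean and population standard deviation of a data set.

    samples: list of equal-length feature vectors (rows).
    """
    columns = list(zip(*samples))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population form is an assumption
    return means, stds
```

The same function applies to both the keystroke interval time data set (k components) and the half-time feature data set (5 components).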
Step 3, calculating the upper/lower boundary of the keystroke interval time characteristic curve according to the mean value and the standard deviation of the keystroke interval time data set, and calculating the upper/lower boundary of the half-time characteristic curve according to the mean value and the standard deviation of the half-time characteristic data set;
the specific calculation method comprises the following steps:
let the upper and lower boundary vectors of the elements contained in data set $S_{pp}$ be, respectively,
Figure FDA00031897198300000211
and the upper and lower boundary vectors of the elements contained in data set $S_{st}$ be, respectively,
Figure FDA0003189719830000031
The upper boundary of the keystroke interval time characteristic curve,
Figure FDA0003189719830000032
and the lower boundary,
Figure FDA0003189719830000033
are calculated by equation (9) below; the upper boundary $v_{u,l}$ and lower boundary $v_{d,l}$ of the half-time characteristic curve are calculated by equation (10) below:
Figure FDA0003189719830000034
Figure FDA0003189719830000035
in the formula:
Figure FDA0003189719830000036
and
Figure FDA0003189719830000037
are adjustable thresholds; the thresholds
Figure FDA0003189719830000038
and
Figure FDA0003189719830000039
both take values in the range 0 to 3;
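Step 3 can be illustrated with a short sketch. Equations (9) and (10) appear only as figures, so the sketch assumes the common form in which each boundary is the mean shifted by the threshold times the standard deviation; the function name `boundary_curves` and the parameter `theta` are hypothetical.

```python
def boundary_curves(means, stds, theta):
    """Lower and upper boundary vectors, assumed to be mean -/+ theta * std."""
    if not 0 <= theta <= 3:
        raise ValueError("the threshold is restricted to the range 0 to 3")
    lower = [m - theta * s for m, s in zip(means, stds)]
    upper = [m + theta * s for m, s in zip(means, stds)]
    return lower, upper
```

A larger threshold widens the band between the two boundary curves, so more samples count as internal samples.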
step 4, calculating the difference degree of the keystroke interval time weighting characteristic curve according to the upper/lower boundary of the keystroke interval time characteristic curve, and calculating the difference degree of the half-time characteristic curve according to the upper/lower boundary of the half-time characteristic curve;
the specific calculation method comprises the following steps:
let any keystroke interval time vector sample be
Figure FDA00031897198300000310
the weighted characteristic curve difference degree of this sample in data set $S_{pp}$,
Figure FDA00031897198300000311
is calculated as follows:
Figure FDA00031897198300000312
in the formula:
Figure FDA00031897198300000313
Figure FDA0003189719830000041
Figure FDA0003189719830000042
Figure FDA0003189719830000043
Figure FDA0003189719830000044
where $\lambda_j$ is the frequency of use of each specific double-key character sequence, j = 1, 2, …, k;
let any half-time feature vector sample be
Figure FDA0003189719830000045
its characteristic curve difference degree in data set $S_{st}$,
Figure FDA0003189719830000046
is
Figure FDA0003189719830000047
In the formula:
Figure FDA0003189719830000048
Figure FDA0003189719830000049
Figure FDA00031897198300000410
Figure FDA00031897198300000411
Figure FDA0003189719830000051
the weighted characteristic curve difference degree of each element in the keystroke interval time data set $S_{pp}$ is calculated from the frequency of use of each double key in set SK and equation (11), forming the keystroke interval time characteristic curve difference degree set $Q_{pp}$; the characteristic curve difference degree of each element in the half-time feature data set $S_{st}$ is calculated from equation (12), forming the half-time characteristic curve difference degree set $Q_{st}$; these sets are defined as
Figure FDA0003189719830000052
Figure FDA0003189719830000053
In the formula:
Figure FDA0003189719830000054
represents the weighted characteristic curve difference degree of element $V_i^{pp} \in R^k$ in data set $S_{pp}$, and
Figure FDA0003189719830000055
represents the characteristic curve difference degree of element $V_i^{st} \in R^5$ in data set $S_{st}$;
Step 5, identifying the user identity by utilizing the difference degree of the keystroke interval time weighting characteristic curve and the difference degree of the half-time characteristic curve;
the specific method for identifying the identity comprises the following steps:
the test sample is judged according to the following inequalities:
Figure FDA0003189719830000056
Figure FDA0003189719830000057
In the formula:
Figure FDA0003189719830000058
and
Figure FDA0003189719830000059
are adjustable thresholds; the thresholds
Figure FDA00031897198300000510
and
Figure FDA00031897198300000511
both take values not less than 0;
if inequalities (15) and (16) both hold, the test sample is determined to belong to the user; otherwise, the test sample is determined not to belong to the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810644782.0A CN109063431B (en) 2018-06-21 2018-06-21 User identity recognition method for weighting keystroke characteristic curve difference degree

Publications (2)

Publication Number Publication Date
CN109063431A CN109063431A (en) 2018-12-21
CN109063431B true CN109063431B (en) 2021-10-22

Family

ID=64821322

Country Status (1)

Country Link
CN (1) CN109063431B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111988294B (en) * 2020-08-10 2022-04-12 中国平安人寿保险股份有限公司 User identity recognition method, device, terminal and medium based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245017B2 (en) * 2009-04-06 2016-01-26 Caption Colorado L.L.C. Metatagging of captions

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7649478B1 (en) * 2005-11-03 2010-01-19 Hyoungsoo Yoon Data entry using sequential keystrokes
CN101478401A (en) * 2009-01-21 2009-07-08 东北大学 Authentication method and system based on key stroke characteristic recognition
CN103703433A (en) * 2011-05-16 2014-04-02 触摸式有限公司 User input prediction
CN104809377A (en) * 2015-04-29 2015-07-29 西安交通大学 Method for monitoring network user identity based on webpage input behavior characteristics
CN105429937A (en) * 2015-10-22 2016-03-23 同济大学 Identity authentication method and system based on keystroke behaviors

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A New Distance Measure for Free Text Keystroke Authentication;H. Davoudi 等;《2009 14th International CSI Computer Conference》;20091021;第570-575页 *
Non-conventional keystroke dynamics for user authentication;Arwa Alsultan 等;《Pattern Recognition Letters》;20170401;第89卷;第53-59页 *
基于加权相对距离的自由文本击键特征认证识别方法;宋梦玲 等;《现代计算机》;20160205;第7-11页 *
采用击键特征曲线差异度的用户身份认证方法;王林 等;《计算机工程与应用》;20180313;第54卷(第22期);第160-166,196页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211022