CN108647511B

CN108647511B - Password strength evaluation method based on weak password derivation

Info

Publication number: CN108647511B
Application number: CN201810324327.2A
Authority: CN
Inventors: 何道敬; 周贝贝; 吴宇
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2018-04-12
Filing date: 2018-04-12
Publication date: 2022-04-05
Anticipated expiration: 2038-04-12
Also published as: CN108647511A

Abstract

The invention discloses a password strength evaluation method based on weak password derivation, which comprises the following steps: 1) weak password set generation: selecting the top-ranked passwords from the password samples, in descending order of occurrence frequency, as a weak password set; 2) grammar training: parsing the passwords in the training set based on the weak password set to generate a probabilistic context-free grammar table with weak password labels; 3) password strength evaluation: inputting a password and calculating its probability according to the grammar table generated by grammar training, wherein the higher the probability value is, the lower the strength of the password is; 4) grammar table updating: dynamically adjusting the probability distribution of the probabilistic context-free grammar with weak password labels according to the input password. The method uses the existing probabilistic context-free grammar to derive passwords similar to those in the weak password set. It inherits the efficiency and robustness of the traditional password strength evaluation method while also screening out potential weak passwords, strengthening resistance to password guessing attacks and improving user security.

Description

Password strength evaluation method based on weak password derivation

Technical Field

The invention belongs to the technical field of information security, and particularly relates to a password strength evaluation method based on weak password derivation.

Background

The rapid development of internet technology has profoundly changed the way of learning, working and living of people, and in recent years, information technology represented by mobile internet and electronic commerce greatly facilitates the life of people. The information security issues closely related to the internet are also receiving more and more attention. Identity authentication is an important way to protect the security of user information, and is widely applied to various service sites in the internet.

Identity authentication is a main means of protecting the security of user information. Password authentication is the most widely used identity authentication method on the internet because it is convenient to deploy and flexible to use. Password-based authentication systems, however, suffer from a number of security and usability problems. Such a system requires the user to create a printable string (i.e., a password) and uses this string to verify the user's identity. Because human memory is limited, it is difficult for people to remember complex, secure passwords, and they tend to use simple ones. The use of simple passwords can leave the password authentication system vulnerable.

A good password strength evaluator should be able to characterize the similarity between weak passwords. For example, the password 123456 is a recognized weak password, and the passwords 123.456 and 123456 are highly similar, so there is good reason to believe that a user is likely to construct the new password 123.456 from the password 123456.

However, the password strength evaluation method that currently leads in academia, which is based on the PCFG algorithm, cannot make this judgment. The PCFG model classifies the characters of a user password into three categories, letters (L), digits (D), and special characters (S), and assumes that users generate passwords by concatenation.

Therefore, combining the traditional probabilistic context-free grammar with the weak passwords commonly used on each website is of great significance for evaluating weak passwords that existing password strength evaluators judge to be 'robust' but that are actually unsafe.

Disclosure of Invention

The invention aims to remedy the shortcomings of existing password strength evaluation methods. It combines the traditional probabilistic context-free grammar with a weak password set and provides a password strength evaluation method based on derivation from the weak password set. While inheriting the efficiency and robustness of the traditional password strength evaluation method, it can identify more weak passwords that are misjudged as 'robust', thereby strengthening resistance to password guessing attacks and improving password security.

The specific technical scheme for realizing the purpose of the invention is as follows:

a password strength evaluation method based on weak password derivation comprises the following specific steps:

step 1: weak password set generation

Selecting passwords with the top rank from the password samples in a descending order of occurrence frequency as a weak password set;

step 2: grammar training

Analyzing the password in the training set based on the weak password set to generate a probability context-free grammar table with a weak password label;

step 3: password strength evaluation

Inputting a password, calculating the probability of the password according to a grammar table generated by grammar training, wherein the higher the probability value is, the lower the strength of the password is;

step 4: grammar table update

Dynamically adjusting the probability distribution of the probabilistic context-free grammar with the weak password label according to the input password.
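Step 1 above can be illustrated with a minimal sketch. It assumes the password samples are available as a plain list of strings; the function name `build_weak_password_set` and the cutoff parameter `top_k` are hypothetical, since the text does not state how many top-ranked passwords are kept.

```python
from collections import Counter


def build_weak_password_set(samples, top_k):
    """Return the top_k most frequent passwords, most frequent first.

    top_k is an assumed parameter; the source only says "top rank".
    """
    counts = Counter(samples)
    # most_common sorts by descending frequency.
    return [pw for pw, _ in counts.most_common(top_k)]


samples = ["123456", "password", "123456", "qwerty", "123456", "password"]
print(build_weak_password_set(samples, top_k=2))  # ['123456', 'password']
```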

The step 2 of the invention specifically comprises the following steps:

step A1: weak password matching

Carrying out similarity matching on the passwords or substrings thereof in the training set and the passwords in the weak password set for next password structure analysis;

if the substrings of the password in the training set are successfully matched with the password in the weak password set, continuing to execute the matching process on the rest unmatched parts in the password until all the substrings of the password are matched once, and finally returning an optimal value sequence;

step A2: password structure resolution

Firstly, marking the optimal value sequence returned in the step A1 by using a weak password label; the remaining part which can not be matched is matched by using the original probability context-free grammar label until the analysis of the whole password is finally completed;

step A3: grammar table generation

When all the passwords in the training set have been parsed, generating a probabilistic context-free grammar table with weak password labels;

wherein: the algorithm used for the similarity matching includes, but is not limited to, a bk-tree.
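As a rough illustration of how a BK-tree can serve the similarity matching named above, the sketch below indexes a weak password set by Levenshtein edit distance and retrieves every weak password within a given distance of a query string. The class and function names are illustrative assumptions, not part of the described method.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is (word, children), with children keyed by edit distance."""

    def __init__(self, words):
        self.root = None
        for word in words:
            self.add(word)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only subtrees whose key lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for key, child in children.items():
                if d - max_dist <= key <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(["123456", "password", "qwerty", "available"])
print(tree.search("123.456", max_dist=1))  # [(1, '123456')]
```

The pruning step is what makes the tree worthwhile: subtrees whose distance key falls outside the allowed band are never visited, so a query usually compares against only part of the weak password set.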

The step A1 specifically comprises the following steps:

step A11: setting an editing distance threshold and a similarity threshold;

step A12: obtaining all string pairs, each consisting of a substring of the password to be parsed and a corresponding weak password, whose edit distance is less than or equal to the edit distance threshold and whose similarity is greater than or equal to the similarity threshold;

step A13: obtaining all character string pairs with the minimum editing distance on the basis of A12;

step A14: obtaining all character string pairs with the maximum similarity on the basis of A13;

step A15: obtaining all character string pairs with the maximum weak password length on the basis of A14;

step A16: if the set formed by all the character string pairs obtained by A15 is empty, the matching failure of the password to be analyzed and the password in the weak password set is represented; if not, the matching between the password to be analyzed and the password in the weak password set is successful, and one character string pair is randomly selected from the set formed by the character string pairs to be used as the optimal solution to return.
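The filtering cascade of steps A11 to A16 might be sketched as follows. This is a brute-force illustration under stated assumptions: the text does not define the similarity measure, so `similarity` here is an assumed normalized edit similarity, and the names `best_match`, `dt` and `st` are hypothetical stand-ins for the matching routine and its two thresholds.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    # Assumption: similarity = 1 - distance / longer length, in [0, 1].
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def substrings(pw):
    """All non-empty contiguous substrings of pw."""
    return {pw[i:j] for i in range(len(pw)) for j in range(i + 1, len(pw) + 1)}


def best_match(pw, weak_set, dt, st, rng=random):
    """Return one optimal (substring, weak password) pair, or None on failure."""
    # A12: keep pairs within both thresholds (dt and st are set in A11).
    pairs = [(s, w) for s in substrings(pw) for w in weak_set
             if edit_distance(s, w) <= dt and similarity(s, w) >= st]
    if not pairs:
        return None  # A16: empty set means matching failed
    # A13: keep pairs with the minimum edit distance.
    min_d = min(edit_distance(s, w) for s, w in pairs)
    pairs = [p for p in pairs if edit_distance(*p) == min_d]
    # A14: of those, keep pairs with the maximum similarity.
    max_s = max(similarity(*p) for p in pairs)
    pairs = [p for p in pairs if similarity(*p) == max_s]
    # A15: of those, keep pairs with the longest weak password.
    max_len = max(len(w) for _, w in pairs)
    pairs = [p for p in pairs if len(p[1]) == max_len]
    # A16: pick one remaining pair at random as the optimal solution.
    return rng.choice(sorted(pairs))


print(best_match("123.456abc", ["123456", "password"], dt=1, st=0.8))
# ('123.456', '123456')
```

Enumerating every substring against every weak password is quadratic in the password length and linear in the size of the weak set; an index such as a BK-tree would replace the inner scan in practice.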

The original probabilistic context-free grammar label is divided into: numbers, letters, special characters.
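To make the original labels concrete, the sketch below derives the base structure of a password by grouping consecutive characters of the same class. The helper names are illustrative; the single label L for all letters follows the three-category description in the Background, and every character that is neither a letter nor a digit is treated as a special character.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to its base label: L (letter), D (digit), S (special)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def base_structure(pw):
    """Concatenate label and run length for each run of same-class characters."""
    return "".join(f"{label}{len(list(run))}"
                   for label, run in groupby(pw, key=char_class))


print(base_structure("123.456"))  # D3S1D3
print(base_structure("abc123!"))  # L3D3S1
```

Under these labels alone, 123.456 and 123456 receive unrelated structures (D3S1D3 and D6), which is the limitation the weak password label is meant to address.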

The probabilistic context-free grammar with the weak password label in step 2 of the present invention includes, but is not limited to, a non-terminal set, a starting variable and a rule set.

The elements of the non-terminal set of the present invention include, but are not limited to: alphabetic characters, numeric characters, special characters, keyboard-consecutive characters, insert operations, delete operations, replace operations, and weak password strings.

Step 4 of the present invention specifically includes:

step B1: determining a structure of adding 1 to the frequency according to the input password;

step B2: adding 1 to the total number of structures in the grammar table;

step B3: updating the probability of the structure in step B1;

step B4: and updating the probabilities of other structures in the grammar table in sequence.

The method is based on an existing weak password set combined with the probabilistic context-free grammar method. More similar passwords are derived from the passwords in the weak password set, so that the probability of passwords similar to those in the weak password set is calculated effectively. The method inherits the efficiency and robustness of the traditional password strength evaluation method, strengthens resistance to password guessing attacks, and improves the precision of password strength evaluation.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flowchart illustrating the matching of a password in a training set with a password in a weak password set according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited.

Examples

The technical terms in this example represent the following meanings:

PCFG: Probabilistic Context-Free Grammar

W: weak password set

w: an element of W

Wn: a password of length n in the weak password set (Lmin ≤ n ≤ Lmax)

Lmax: maximum length of a password string that the target system accepts

Lmin: minimum length of a password string that the target system accepts

T: training set

OLCS (Optimal Longest Common Subsequence): optimal longest common subsequence algorithm

pw: password to be parsed

SUB: set of all substrings of pw

SUB × W: Cartesian product of SUB and W

sub: an element of SUB

DT: edit distance threshold

ST: similarity threshold

V = {Start, A, L, U, D, S, K, insert, delete, replace, no, W1, W2, ..., Wn}: the set of non-terminals, whose elements are called non-terminals

Σ = {95 printable ASCII characters}: the set of terminals, disjoint from V, whose elements are called terminals

Start: a subset of V, called the set of start variables

P: the set of rules, whose elements are called rules and have the form α → β, where α is a non-terminal and β is composed of non-terminals and terminals

A: alphabetic character; An represents n consecutive alphabetic characters

L, U: letter case masks, where L represents lowercase letters and U represents uppercase letters

D: numeric character; Dn represents n consecutive numeric characters

S: special character; Sn represents n consecutive special characters

K: keyboard-consecutive character; Kn represents n keyboard-consecutive characters (n ≥ 4)

insert: insertion operation on a password in the weak password set

delete: deletion operation on a password in the weak password set

replace: general replacement operation on a password in the weak password set

no: no operation on a password in the weak password set

Referring to fig. 1, the present embodiment includes the following steps:

step 1: weak password set generation

step 2: grammar training

Analyzing the password in the training set based on the weak password set to generate a probability context-free grammar table with a weak password label; the method specifically comprises the following steps:

step A1: weak password matching

Similarity matching is carried out between the password pw in the training set, or its substrings, and the passwords in the weak password set W, in preparation for the subsequent password structure parsing.

If a substring sub of the password in the training set is successfully matched with a password in the weak password set, the matching process continues on the remaining unmatched part (pw - sub) of the password until all substrings of the password have been matched once, and finally an optimal value sequence (opt1, opt2, ..., optn) is returned.

Step A2: password structure resolution

Firstly, each substring of pw is labeled using the optimal values returned in step A1: if a substring sub of pw matches a weak password w of length n in W, sub is labeled Wn. The remaining substrings of pw that match no password in W are labeled using the L, D, S labels of the original probabilistic context-free grammar, until the parsing of the whole password is completed.

Step A3: grammar table generation

When all the passwords in the training set are analyzed, a probability context-free grammar table with weak password labels is generated.
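The labeling in step A2 can be sketched as below, assuming that step A1 has already produced the matched spans. The input format, a list of `(start, end, weak_password)` tuples, and the function names are assumptions made for illustration. Matched spans receive the label Wn, where n is the length of the matched weak password, and the unmatched remainder falls back to the L, D, S labels.

```python
from itertools import groupby


def char_class(ch):
    """Base label of a character: L (letter), D (digit), S (special)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def lds(segment):
    """Base structure of an unmatched segment, e.g. 'abc' -> 'L3'."""
    return "".join(f"{label}{len(list(run))}"
                   for label, run in groupby(segment, key=char_class))


def label_password(pw, matches):
    """Label pw given non-overlapping matches (start, end, weak_password).

    pw[start:end] is the substring that matched weak_password.
    """
    parts, pos = [], 0
    for start, end, weak in sorted(matches):
        parts.append(lds(pw[pos:start]))  # unmatched gap before the match
        parts.append(f"W{len(weak)}")     # weak password label Wn
        pos = end
    parts.append(lds(pw[pos:]))           # unmatched tail
    return "".join(parts)


print(label_password("123.456abc", [(0, 7, "123456")]))  # W6L3
print(label_password("123.456abc", []))                  # D3S1D3L3
```

Counting the resulting structures over the whole training set then yields the frequencies from which the grammar table in step A3 is built.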

The similarity matching algorithm used in step A1 includes, but is not limited to, a bk-tree.

Referring to fig. 2, a distance function is used to determine the edit distance between two strings, a similarity function is used to determine the similarity between two strings, and a len function is used to determine the length of a string. The specific process of step A1 is as follows:

the process of matching the similarity between the password pw and the password in the weak password set W is as follows:

step A11: setting an editing distance threshold DT and a similarity threshold ST;

step A12: obtaining all string pairs (sub, w) ∈ SUB × W, each consisting of a substring of the password to be parsed and a corresponding weak password, whose edit distance is less than or equal to DT and whose similarity is greater than or equal to ST;

step A13: obtaining all character string pairs (sub, w) with the minimum editing distance on the basis of A12;

step A14: obtaining all character string pairs (sub, w) with the maximum similarity on the basis of A13;

step A15: obtaining all character string pairs (sub, w) with the maximum weak password length on the basis of A14;

step A16: if the set of string pairs obtained in A15 is empty, matching the password pw against the passwords in the weak password set W has failed; otherwise, the matching has succeeded, and one string pair (sub, w) is randomly selected from the set and returned as the optimal solution opt, where opt = (sub, w) and opt ∈ SUB × W.

The original probability context-free grammar label is divided into: numbers, letters, special characters.

The probabilistic context-free grammar G with weak password labels includes, but is not limited to, a non-terminal set, a starting variable, and a rule set.

Elements of the non-terminal set include, but are not limited to: alphabetic characters, numeric characters, special characters, keyboard-consecutive characters, insert operations, delete operations, replace operations, weak password strings.

For example, suppose the password avai^lable123 ∈ T and available ∈ W. The password structure obtained by parsing directly with the PCFG matching method is L4S1L5D3. The substring avai^lable is most similar to the weak password available (shortest edit distance, greatest similarity, and longest matching length), so (avai^lable, available) is taken as the optimal value; because the optimal value sequence contains only this one value, (avai^lable, available) is returned directly.

Step 3: password strength evaluation

If the input password is 123.456 and 123456 ∈ W, and the probability of Start → W6 is 0.28, of W6 → 123456 is 0.4, of W6 → insert is 0.3, of insert → S1 is 0.11, and of S1 → . is 0.52, then 123.456 is identified as 123456 (structure W6) with one special character "." inserted. The probability of the password 123.456 is therefore:

P(123.456) = P(Start → W6) × P(W6 → 123456) × P(W6 → insert) × P(insert → S1) × P(S1 → .)

= 0.28 × 0.4 × 0.3 × 0.11 × 0.52

≈ 0.00192.
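The arithmetic of this example can be checked with a few lines of code. The dictionary below holds only the five rule probabilities quoted in the example; representing a rule as a `(left, right)` tuple and the function name `derivation_probability` are illustrative choices, not the grammar table format of the method.

```python
import math

# Rule probabilities taken from the worked example above.
rule_prob = {
    ("Start", "W6"): 0.28,
    ("W6", "123456"): 0.4,
    ("W6", "insert"): 0.3,
    ("insert", "S1"): 0.11,
    ("S1", "."): 0.52,
}


def derivation_probability(derivation, table):
    """Multiply the probabilities of every rule used in the derivation."""
    return math.prod(table[rule] for rule in derivation)


p = derivation_probability(list(rule_prob), rule_prob)
print(round(p, 5))  # 0.00192
```

Because a higher probability means a weaker password, a registration system would compare this value against a threshold of its own choosing; the text does not specify one.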

Step 4: grammar table update

Dynamically adjusting the probability distribution of the probabilistic context-free grammar with the weak password label according to the input password; the method specifically comprises the following steps:

step B2: adding 1 to the total number of structures in the grammar table;

step B3: updating the probability of the structure in step B1;

Let N be the total number of occurrences of all structures in the grammar table. The probability of the i-th structure is P_i = f_i/N, where f_i is the frequency of occurrence of the i-th structure. When a password is newly registered, the frequency of the structure i corresponding to that password is increased by 1, and the total number N is also increased by 1.

The probability of the i-th structure is updated to

P_i' = (f_i + 1)/(N + 1) (1)

The probabilities of the other structures are updated to

P_j' = f_j/(N + 1), j ≠ i (2)
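Equations (1) and (2) amount to incrementing one counter and renormalizing. A minimal sketch follows, assuming the grammar table stores raw frequencies per structure; the function name `register_structure` and the sample counts are hypothetical. Exact fractions are used so that the result can be compared directly with the formulas.

```python
from fractions import Fraction


def register_structure(freq, structure):
    """Add 1 to the frequency of `structure` and return all updated probabilities.

    freq maps each structure to its frequency f; N is the sum of all f.
    """
    freq[structure] = freq.get(structure, 0) + 1   # f_i + 1
    total = sum(freq.values())                     # N + 1
    # Equation (1) for the registered structure, equation (2) for the rest.
    return {s: Fraction(f, total) for s, f in freq.items()}


freq = {"W6L3": 2, "L8": 5, "D6": 3}               # hypothetical counts, N = 10
probs = register_structure(freq, "W6L3")
print(probs["W6L3"], probs["L8"], probs["D6"])     # 3/11 5/11 3/11
```

Storing frequencies rather than probabilities keeps the update to a single increment, and the updated probabilities always sum to 1.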

For example, when a user registers the password 123.456abc, the portion 123.456 is determined to be similar to the weak password 123456 and is labeled W6. The remaining portion abc has no matching weak password and is labeled L3 according to the PCFG segmentation method, so the password 123.456abc is identified and labeled as W6L3.

The probabilities of the structures W6L3, L3 and W6 associated with the password, of the terminal string 123.456abc, and of the rules W6 → insert, insert → . and W6 → 123456 are updated according to equation (1), and the probabilities of the other structures are adjusted accordingly according to equation (2).

The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.

Claims

1. A password strength evaluation method based on weak password derivation is characterized by comprising the following specific steps:

step 1: weak password set generation

step 2: grammar training

step 3: password strength evaluation

step 4: grammar table update

Dynamically adjusting the probability distribution of the probabilistic context-free grammar with the weak password label according to the input password; wherein:

the step 2 specifically comprises:

step A1: weak password matching

step A2: password structure resolution

step A3: grammar table generation

wherein: the algorithms used for similarity matching include, but are not limited to, bk-tree;

the step A1 specifically comprises the following steps:

step A11: setting an editing distance threshold and a similarity threshold;

2. The weak password derivation-based password strength evaluation method of claim 1, wherein the original probabilistic context-free grammar tag is divided into: numbers, letters, special characters.

3. The password strength evaluation method based on weak password derivation as claimed in claim 1, wherein the probabilistic context-free grammar with the weak password label in step 2 includes but is not limited to a non-terminal set, a starting variable and a rule set.

4. The password strength evaluation method based on weak password derivation as claimed in claim 3, wherein elements of the non-terminal set include but are not limited to: alphabetic characters, numeric characters, special characters, keyboard-consecutive characters, insert operations, delete operations, replace operations, and weak password strings.

5. The password strength evaluation method based on weak password derivation as claimed in claim 1, wherein said step 4 specifically comprises:

step B2: adding 1 to the total number of structures in the grammar table;

step B3: updating the probability of the structure in step B1;