CN107579821B - Method for generating password dictionary and computer-readable storage medium - Google Patents

Publication number: CN107579821B (application CN201710851440.1A, China; other version CN107579821A)
Legal status: Active (granted)
Inventors: 张光斌, 高志鹏, 黄仁裕, 姚灿荣, 尤俊生, 庄进发
Assignee: Xiamen Meiya Pico Information Co Ltd
Abstract

The invention discloses a password dictionary generation method and a computer-readable storage medium. The method comprises: collecting a password set; generating a test set; training the current password set through a recurrent neural network model to obtain a dictionary model; generating a dictionary from the dictionary model; obtaining the hit rate of the current password set against the test set; randomly modifying the current password set to obtain a new password set; training the new password set through the recurrent neural network model to obtain a new dictionary model; generating a new dictionary from the new dictionary model; obtaining the hit rate of the new password set against the test set; if the hit rate of the new password set exceeds that of the current password set, incrementing the update count by one and taking the new password set as the current password set; and, when the update count reaches a preset first number, generating the password dictionary from the dictionary model corresponding to the current password set. The password dictionary finally generated by the invention improves the success rate of password recovery.

Description

Method for generating password dictionary and computer-readable storage medium
Technical Field
The present invention relates to the field of password technologies, and in particular to a method for generating a password dictionary and a computer-readable storage medium.
Background
Brute-force traversal and dictionary traversal are currently the two most common traversal methods in password recovery. Brute-force traversal enumerates all passwords within a rule set defined by the user, while dictionary traversal enumerates the passwords in a dictionary file. The brute-force approach usually needs many rules to cover as large a password space as possible and thereby improve the success rate of password recovery, but an overly large password space multiplies the traversal time — sometimes to hundreds of years — making the recovery meaningless. The success rate of dictionary traversal is limited by the number of passwords in the dictionary: because that number cannot be too large, the traversal finishes in a short time, but generally only common passwords can be recovered, and the method is ineffective against complex passwords.
How to recover passwords that are as complex as possible within an acceptable time is currently the main research direction. For the brute-force approach, research mainly focuses on improving traversal speed through hardware acceleration, algorithm optimization, and distributed computing in order to shorten traversal time. However, improving traversal speed only reduces traversal time linearly, whereas increasing the password length or the password character set grows the traversal time geometrically. For dictionary traversal, research mainly focuses on collecting dictionaries, and the number of collected passwords is limited.
Researchers have applied social-engineering analysis to large numbers of passwords, gathering statistics on how passwords are set and summarizing traversal rules to improve the success rate of password recovery — for example, the common patterns of name pinyin plus birthday, or English word plus digits. However, the number of password samples is enormous and the password distribution is chaotic, so some password-setting rules are difficult to discover, and rules summarized by manual statistics have significant limitations.
Chinese patent publication No. CN104717058A discloses a password traversal method and apparatus comprising: acquiring a preset character set for password traversal; acquiring a probability factor set and an association factor set corresponding to the preset character set, wherein the probability factor set comprises, for each character in the preset character set, a probability factor for each position of the password, and the association factor set comprises association factors between any two characters in the preset character set; and determining a traversal password according to the probability factor set and the association factor set. In that scheme, a password traversal algorithm designed around the distribution and association probabilities of password characters can reduce traversal time, construct high-probability passwords first, and improve the success rate; however, the algorithm does not analyze the rules by which passwords are set, cannot expand the password dictionary according to those rules, and therefore has certain limitations.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a password dictionary generation method and a computer-readable storage medium that can improve the success rate of password recovery.
In order to solve the above technical problem, the invention adopts the following technical scheme. A method of password dictionary generation comprises:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
The invention also relates to a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
The invention has the beneficial effects that: ten-million-scale password sets are used as learning samples; a neural network suited to learning password rules generates corresponding dictionary models; dictionaries are generated from those models; the passwords in the password set are repeatedly modified and the password set whose dictionary best matches the test set is retained; and through continuous cycles of training, modification, and hit-rate comparison, the neural network keeps learning the setting rules of the sample passwords, finally generating a new password dictionary from the resulting dictionary model. The invention uses a neural network to learn the password rules in password samples and uses the learned rules to generate new passwords, which can subsequently be used for password recovery.
Drawings
Fig. 1 is a flowchart of a method for generating a password dictionary according to a first embodiment of the present invention;
FIG. 2 is a flowchart of the method of step S3 according to the second embodiment of the present invention;
fig. 3 is a flowchart of the method of step S4 in the third embodiment of the present invention.
Detailed Description
In order to explain technical contents, objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The core concept of the invention is as follows: use a neural network to learn the password rules in password samples, and use the learned rules to generate new passwords for password recovery.
Referring to fig. 1, a method for generating a password dictionary includes:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
From the above description, the beneficial effects of the present invention are: a neural network learns the password rules in the password samples, and the learned rules are used to generate new passwords that can subsequently be used for password recovery; because the setting rules of the passwords are learned effectively, the success rate of password recovery can be greatly improved.
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
counting the number of passwords in the dictionary that are identical to plaintext passwords in the test set;
and dividing that number by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from the current password set, wherein each group comprises a preset first number of passwords randomly drawn from the password set;
importing the first number of passwords into the recurrent neural network model, and converting each password from a character string into a numerical vector at the input layer;
processing the converted first number of passwords through the three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to the output layer to obtain a first number of numerical sequences;
calculating the deviation between the predicted and actual results of the numerical sequences according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set have completed one round of training;
returning to the step of reading a group of passwords from the current password set until the password set has completed a preset second number of training rounds;
and obtaining the weight parameters of the neurons of the three GRU layers, thereby obtaining the dictionary model.
According to the description, the neural network suitable for learning the password rule is designed, and the corresponding password dictionary model is generated through the neural network, so that the password setting rule in the password sample can be effectively learned, and the success rate of subsequent password recovery is improved.
Further, before the step of importing the first number of passwords into the recurrent neural network model and converting each password from a character string into a numerical vector at the input layer, the method further includes:
padding the first number of passwords to a preset length with 0.
As can be seen from the above description, this guarantees that the neural network processes data of a uniform size.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
From the above description, it can be seen that not only is overfitting effectively prevented, but also the performance of the neural network is improved.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking that character as the current character, and calculating the probability of each character through the cost function according to the current character and the dictionary model;
acquiring the preset second number of characters with the highest probabilities, and randomly selecting one of them as the next character of the password;
judging whether the next character is a preset end character;
if not, taking the next character as the current character and returning to the step of calculating the probability of each character according to the current character and the dictionary model;
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
As can be seen from the above description, it can be ensured that the passwords in the dictionary are not repeated.
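The generation-and-deduplication loop above can be sketched in Python. This is a minimal illustration only: `next_char_probs` is a hypothetical stand-in for the trained dictionary model, and the names `END`, `CHARSET`, and `TOP_K` ("preset second number") are assumptions for the sketch, not part of the patent.

```python
import random

END = "$"                      # hypothetical end-of-password marker
CHARSET = "abc123" + END       # toy character set for illustration
TOP_K = 3                      # "preset second number" of top candidates

def next_char_probs(current, model_seed=0):
    # Stand-in for the trained dictionary model: returns a probability
    # for every character given the current character (toy distribution).
    rng = random.Random(hash((current, model_seed)))
    weights = [rng.random() for _ in CHARSET]
    total = sum(weights)
    return {c: w / total for c, w in zip(CHARSET, weights)}

def generate_password(max_len=16):
    # Pick a random first character, then repeatedly pick the next
    # character at random from the TOP_K most probable candidates,
    # stopping when the end character is produced.
    pwd = random.choice(CHARSET.replace(END, ""))
    while len(pwd) < max_len:
        probs = next_char_probs(pwd[-1])
        top = sorted(probs, key=probs.get, reverse=True)[:TOP_K]
        nxt = random.choice(top)
        if nxt == END:
            break
        pwd += nxt
    return pwd

def generate_dictionary(target_size):
    # Collecting into a set deduplicates each new password against the
    # existing ones, until the preset third number of passwords is reached.
    dictionary = set()
    while len(dictionary) < target_size:
        dictionary.add(generate_password())
    return dictionary
```

Using a set makes the deduplication step implicit: a password identical to one already in the dictionary simply does not increase its size.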
The invention also proposes a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
counting the number of passwords in the dictionary that are identical to plaintext passwords in the test set;
and dividing that number by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from the current password set, wherein each group comprises a preset first number of passwords randomly drawn from the password set;
importing the first number of passwords into the recurrent neural network model, and converting each password from a character string into a numerical vector at the input layer;
processing the converted first number of passwords through the three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to the output layer to obtain a first number of numerical sequences;
calculating the deviation between the predicted and actual results of the numerical sequences according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set have completed one round of training;
returning to the step of reading a group of passwords from the current password set until the password set has completed a preset second number of training rounds;
and obtaining the weight parameters of the neurons of the three GRU layers, thereby obtaining the dictionary model.
Further, before the step of importing the password of the first number into a recurrent neural network model and converting the password from a character string to a numerical vector at an input layer, the method further includes:
and filling the password of the first number to a preset length by using 0.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking that character as the current character, and calculating the probability of each character through the cost function according to the current character and the dictionary model;
acquiring the preset second number of characters with the highest probabilities, and randomly selecting one of them as the next character of the password;
judging whether the next character is a preset end character;
if not, taking the next character as the current character and returning to the step of calculating the probability of each character according to the current character and the dictionary model;
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
Example one
Referring to fig. 1, a first embodiment of the present invention is a password dictionary generation method based on a recurrent neural network; the resulting password dictionary can be used for password recovery. The method comprises the following steps:
S1: collecting a password set, wherein the password set comprises real passwords and virtual passwords. To prevent overfitting, the password set consists of two parts: the first part comprises real passwords from real websites or information-management-system databases, and the second part comprises virtual passwords consisting of a common password keyword (such as admin) plus a randomly generated suffix. Further, the first part accounts for 70% of the total capacity and the second part for 30%, and the total capacity is not less than 10 million passwords.
S2: generating a test set, wherein the test set comprises plaintext passwords. The total capacity of the test set is not less than 1 million. Further, the passwords in the test set differ from those in the password set; that is, none of the 1 million test-set passwords appears among the 10 million passwords of the password set.
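The disjointness requirement between the test set and the password set can be sketched in Python. The function name and split scheme below are illustrative assumptions; the patent only requires that no test password appear in the password set.

```python
import random

def split_disjoint(passwords, test_size, seed=42):
    """Split a password corpus into a training password set and a test set
    with no overlap, so that no test password appears in the password set."""
    unique = list(dict.fromkeys(passwords))   # deduplicate, keep order
    rng = random.Random(seed)
    rng.shuffle(unique)
    test_set = set(unique[:test_size])        # e.g. 1 million in the patent
    password_set = unique[test_size:]          # e.g. 10 million in the patent
    return password_set, test_set
```

Deduplicating before the split guarantees the two parts cannot share a password even if the raw corpus contains repeats.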
Because the characters of a single password are only weakly associated with their context, the passwords need not be segmented into words; each password is used directly as one sequence unit.
S3: training a current password set through a recurrent neural network model to obtain a dictionary model;
s4: generating a dictionary according to the dictionary model;
s5: testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
s6: randomly modifying the virtual password in the current password set to obtain a new password set; specifically, the virtual password may be changed at intervals in a dynamically inserted manner.
S7: training the new password set through a recurrent neural network model to obtain a new dictionary model;
s8: generating a new dictionary according to the new dictionary model;
s9: testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
S10: judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set; if yes, executing step S11, and if not, executing step S6.
S11: incrementing the update count by one, and taking the new password set as the current password set; the initial value of the update count is 0.
S12: judging whether the update count has reached a preset first number; if yes, executing step S13, and if not, executing step S6. Preferably, the preset first number is 100; that is, the termination condition is that the original password set has been successfully updated 100 times.
S13: generating a password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
Specifically, the hit rate in step S5 is calculated as follows: count the number of passwords in the dictionary that are identical to plaintext passwords in the test set, then divide that count by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set. That is, each dictionary password that matches one of the 1 million test-set passwords counts as one hit, and if N passwords are hit the hit rate is N/1000000. The hit rate in step S9 is calculated analogously.
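The hit-rate formula above reduces to a set intersection; a minimal Python sketch (the function name is an assumption):

```python
def hit_rate(dictionary, test_set):
    """Hit rate = (number of dictionary passwords also present in the
    test set) / (total number of plaintext passwords in the test set)."""
    hits = len(set(dictionary) & set(test_set))
    return hits / len(set(test_set))
```

For example, a dictionary containing 2 of the 4 test passwords yields a hit rate of 0.5; with a 1-million-password test set and N hits, the rate is N/1000000.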
Further, the password dictionary obtained in step S13 can be used for password recovery.
In this embodiment, ten-million-scale password sets serve as learning samples; a neural network suited to learning password rules generates corresponding dictionary models, from which dictionaries are generated; the passwords in the password set are repeatedly modified and the password set whose dictionary best matches the test set is retained; and through continuous training, modification, and hit-rate comparison, the neural network keeps learning the setting rules of the sample passwords. A new password dictionary is generated from the final dictionary model; because the new passwords follow the setting rules of the sample passwords, they are closer to real-world practice, making the method more effective at improving the success rate of password recovery.
Example two
Referring to fig. 2, the present embodiment is a further refinement of steps S3 and S7 in the first embodiment. Taking step S3 as an example, step S3 includes the following steps:
s301: constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers.
Specifically, a recurrent neural network is adopted as the base structure of the model. To retain the context of long information sequences, a long short-term memory (LSTM) layer would serve as the basic unit; to improve efficiency and reduce computation, a gated recurrent unit (GRU) — a variant of that network which merges the forget gate and input gate of the original LSTM into a single update gate — is adopted instead, improving dictionary generation speed while preserving accuracy.
Because password lengths essentially never exceed 128 characters, the maximum password sequence length is set to 128 characters; each GRU weight matrix is initialized with a Gaussian standard deviation of 0.05; the hidden layer width is set to 128 units; and the biases are initialized to the constant 0. To facilitate control of the information flow, the gates are controlled by the nonlinear sigmoid function, which has the advantages of being nonlinear, monotonically increasing, and differentiable over its domain.
To prevent overfitting (where the model learns the training samples well but achieves a low hit rate on passwords outside the training samples), a random discarding (dropout) algorithm is added to each GRU layer with the dropout ratio set to 0.2; this not only effectively prevents overfitting but also improves the performance of the neural network.
The neural network in this embodiment uses softmax_cross_entropy_with_logits as the cost function, which measures the deviation between the predicted and actual results. A gradient descent algorithm with a decay coefficient (the RMSProp algorithm) serves as the optimizer, the decay coefficient providing adaptive weighting. The initial learning rate is set to 0.0001 and decays gradually at a rate of 20% per round of training.
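The learning-rate schedule just described — 20% decay per training round from an initial 0.0001 — is a simple exponential decay; a one-function sketch (the function name is an assumption):

```python
def learning_rate(initial_lr, epoch, decay=0.2):
    """Learning rate after `epoch` full training rounds, decaying by 20%
    per round as described; initial_lr is 0.0001 in the embodiment."""
    return initial_lr * (1.0 - decay) ** epoch
```

So after one round the rate is 0.8 times the initial value, after two rounds 0.64 times, and so on.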
S302: reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set; the first number can be set as required, and preferably is 1024. The current password set is used as a password sample, 1024 passwords are read randomly each time, and the passwords are led into the recurrent neural network model in batches for training, so that the processing speed can be improved.
S303: filling the passwords of the first number to a preset length by using 0, namely filling 1024 passwords to the same length by using 0; further the preset length is 128 characters.
S304: importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer; in this embodiment, the magnitude of the numerical vector is 1024 × 128. Furthermore, the association relationship between characters and numbers can be preset, and generally one character corresponds to one number; and then converting the password from the character string into a numerical value vector according to the incidence relation.
S305: and according to a random discarding algorithm (dropout algorithm), selecting neurons of a preset proportion of each GRU layer at random, and enabling the neurons to fail. In this embodiment, the predetermined ratio is 20%, that is, the passwords of the 20% neurons passing through a GRU layer do not participate in the calculation of the GRU layer.
S306: calculating the converted password of the first number through three GRU layers to obtain a second-order matrix; the data of the input layer is subjected to calculation of a GRU layer and combined with a random discarding algorithm, the influence of 20% of the randomly selected GRU layers is discarded, a 1024 x 128 second-order matrix is obtained, the second-order matrix is subjected to calculation of the GRU layer again, namely the steps are repeated, and the final 1024 x 128 second-order matrix is obtained after the calculation of 3 GRU layers in total.
S307: inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number; the calculated structure of the GRU layer is input to the output layer, and a 1024 numerical sequence is obtained.
S308: calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
s309: adjusting weight parameters of neurons of the three GRU layers through an optimization algorithm (RMSProp algorithm) according to the deviation and a preset learning rate; further, the offset amount may also be adjusted simultaneously.
S310: and judging whether all the passwords in the current password set complete one-time training, namely whether all the passwords participate in one-time training, if so, indicating that the current password set successfully completes one-time training, and executing step S311, otherwise, executing step S302.
S311: and judging whether the current password set completes the training for the preset second time, if so, executing step S312, and if not, executing step S302. The second number represents the effect of the model before and after the training for a certain period of time, and is not a fixed value. In this embodiment, the second frequency is 10. All passwords in the password set participate in the training once, i.e. a complete training round is completed, and after 10 training rounds, step S312 is executed.
S312: and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model. Through the steps, the weight parameters are continuously adjusted, and finally adjusted weight parameters are obtained, namely the dictionary model.
In this embodiment, a neural network suited to learning password rules is designed and the corresponding password dictionary model is generated through it, so that the password-setting rules in the password samples can be effectively learned and the success rate of subsequent password recovery improved.
Example three
Referring to fig. 3, the present embodiment further develops steps S4 and S8 of the above embodiment. Taking step S4 as an example, step S4 includes the following steps:
s401: and acquiring a character set which comprises a-z, 0-9 and each special character.
S402: randomly selecting a character from the character set as a first character of a password;
s403: and taking the characters as current characters, and calculating the probability of each character according to the current characters and the dictionary model through a cost function, namely the probability of each character after the current character.
S404: acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password; in this embodiment, the second number is 5, that is, one character is selected from the 5 characters with the highest probability at any time.
S405: and judging whether the next character is a preset end character, if so, executing the step S406, otherwise, taking the next character as the current character, calculating the probability of each character after the character through a cost function by combining a dictionary model, and returning to execute the step S403. In this embodiment, the terminator may be "eos".
S406: adding the password into a dictionary;
s407: the password and the existing password in the dictionary are subjected to duplication elimination; namely, whether the password identical to the password exists in the existing passwords of the dictionary is judged, if yes, the password is deleted, and the passwords in the dictionary are not repeated.
S408: and judging whether the number of the passwords in the dictionary reaches a preset third number, if so, executing the step S409, otherwise, randomly selecting one character from the character set as a first character of a new password, and returning to execute the step S402. In this embodiment, the third number is 1 hundred million.
S409: and obtaining a dictionary.
In this embodiment, the dictionary is generated from the dictionary model learned by the neural network, and the hit rate of the dictionary against the test set is subsequently calculated, so the degree of learning of the neural network can be effectively reflected.
Example four
The present embodiment is a computer-readable storage medium corresponding to the above-mentioned embodiments, on which a computer program is stored, which when executed by a processor implements the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the updating times reach a preset first time, generating the password dictionary according to the dictionary model obtained by training the recurrent neural network model in the current password set.
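The overall flow restated above amounts to a random-search hill climb over password sets; a minimal sketch follows, with hypothetical toy stand-ins (`toy_hit_rate`, `toy_mutate`) for the train/generate/test pipeline.

```python
import random

def optimize_password_set(initial_set, hit_rate, mutate, max_updates, seed=0):
    """Sketch of the overall flow: randomly modify the current set; if the
    dictionary trained from the candidate scores a higher hit rate on the
    test set, accept it and count one update; stop once the update count
    reaches the preset first number."""
    rng = random.Random(seed)
    current = set(initial_set)
    best = hit_rate(current)
    updates = 0
    while updates < max_updates:
        candidate = mutate(current, rng)   # randomly modify the virtual passwords
        score = hit_rate(candidate)        # stands for train + generate + test
        if score > best:                   # keep only strict improvements
            current, best = candidate, score
            updates += 1                   # update count plus one
    return current

# Hypothetical toy stand-ins: the "hit rate" rewards total password length,
# and a mutation appends a random digit to one password.
def toy_hit_rate(pwset):
    return sum(len(p) for p in pwset)

def toy_mutate(pwset, rng):
    out = set(pwset)
    victim = rng.choice(sorted(out))
    out.discard(victim)
    out.add(victim + rng.choice("0123456789"))
    return out

final = optimize_password_set({"abc", "123"}, toy_hit_rate, toy_mutate, max_updates=3)
```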
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
calculating the number of the passwords in the dictionary, which is the same as the plaintext passwords in the test set;
and calculating the quotient of the number of the passwords and the total number of the plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
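A minimal sketch of this hit-rate calculation, with made-up example passwords:

```python
def hit_rate(dictionary, test_set):
    # Passwords in the dictionary that equal a plaintext password of the
    # test set, divided by the total number of test-set passwords.
    hits = len(set(dictionary) & set(test_set))
    return hits / len(test_set)

rate = hit_rate({"123456", "password", "qwerty"},
                {"123456", "letmein", "password", "dragon"})
```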
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
Further, before the step of importing the password of the first number into a recurrent neural network model and converting the password from a character string to a numerical vector at an input layer, the method further includes:
and filling the password of the first number to a preset length by using 0.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
In summary, the method for generating a password dictionary and the computer-readable storage medium provided by the present invention use password sets at the ten-million scale as learning samples. A corresponding dictionary model is generated through a neural network suited to learning password rules, a dictionary is generated through the dictionary model, the passwords in the password set are continuously modified, and the password sets whose dictionaries achieve higher hit rates on the test set are selected for further training, modification and hit-rate comparison, so that the neural network continuously learns the rules by which the passwords in the samples were set. A new password dictionary is then generated from the final dictionary model; since the new passwords all satisfy the password-setting rules of the samples and are closer to practical application, the method is more effective at improving the success rate of password recovery. The invention learns the password rules in the password samples with a neural network, generates new passwords from the learned rules, and can subsequently be used for password recovery.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method for generating a cryptographic dictionary, comprising:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
when the updating times reach a preset first time, generating a password dictionary according to a dictionary model obtained by training a recurrent neural network model in a current password set;
the method for training the current password set through the recurrent neural network model to obtain the dictionary model specifically comprises the following steps:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, the hidden layer comprises three GRU layers, and the GRU is a gating cycle unit;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
2. The method for generating a cryptographic dictionary according to claim 1, wherein the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current cryptographic set specifically comprises:
calculating the number of the passwords in the dictionary, which is the same as the plaintext passwords in the test set;
and calculating the quotient of the number of the passwords and the total number of the plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
3. The method for generating a cryptographic dictionary of claim 1, wherein before importing the first number of passwords into a recurrent neural network model and converting the passwords from character strings to numerical vectors at an input layer, the method further comprises:
and filling the password of the first number to a preset length by using 0.
4. The method of generating a cryptographic dictionary of claim 1, wherein before the step of calculating the transformed first number of ciphers through three GRU layers to obtain the second order matrix, the method further comprises:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
5. The method for generating a cryptographic dictionary according to claim 1, wherein the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
6. The method for generating a password dictionary according to claim 5, wherein after "adding the password to the dictionary", the method further comprises:
and carrying out duplication elimination on the password and the existing password in the dictionary.
7. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, performs the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
when the updating times reach a preset first time, generating a password dictionary according to a dictionary model obtained by training a recurrent neural network model in a current password set;
the method for training the current password set through the recurrent neural network model to obtain the dictionary model specifically comprises the following steps:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, the hidden layer comprises three GRU layers, and the GRU is a gating cycle unit;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
8. The computer-readable storage medium according to claim 7, wherein the "generating a dictionary according to the dictionary model" is specifically:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
CN201710851440.1A 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium Active CN107579821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710851440.1A CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710851440.1A CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN107579821A CN107579821A (en) 2018-01-12
CN107579821B true CN107579821B (en) 2020-04-28

Family

ID=61036277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710851440.1A Active CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107579821B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446761B (en) * 2018-03-23 2021-07-20 中国科学院计算技术研究所 Neural network accelerator and data processing method
CN109492385A (en) * 2018-11-05 2019-03-19 桂林电子科技大学 A kind of method for generating cipher code based on deep learning
CN109558723A (en) * 2018-12-06 2019-04-02 南京中孚信息技术有限公司 Password dictionary generation method, device and computer equipment
CN112257648A (en) * 2020-11-03 2021-01-22 泰山学院 Signal classification and identification method based on improved recurrent neural network
CN114338015B (en) * 2022-01-05 2024-03-29 北京华云安信息技术有限公司 Method, device, equipment and storage medium for generating password dictionary
CN115276983A (en) * 2022-07-29 2022-11-01 四川启睿克科技有限公司 Password dictionary management method for penetration test

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8295480B1 (en) * 2007-07-10 2012-10-23 Avaya Inc. Uncertainty-based key agreement protocol
CN104573142A (en) * 2013-10-10 2015-04-29 无锡市思库瑞科技信息有限公司 Password attribute analysis method based on neural network
CN104717058A (en) * 2015-02-10 2015-06-17 厦门市美亚柏科信息股份有限公司 Cipher traversal method and device
CN106250931A (en) * 2016-08-03 2016-12-21 武汉大学 A kind of high-definition picture scene classification method based on random convolutional neural networks
CN107122479A (en) * 2017-05-03 2017-09-01 西安交通大学 A kind of user cipher conjecture system based on deep learning

Also Published As

Publication number Publication date
CN107579821A (en) 2018-01-12

Similar Documents

Publication Publication Date Title
CN107579821B (en) Method for generating password dictionary and computer-readable storage medium
CN110414219B (en) Injection attack detection method based on gated cycle unit and attention mechanism
Xie et al. Sql injection detection for web applications based on elastic-pooling cnn
Xu et al. Hierarchical bidirectional RNN for safety-enhanced B5G heterogeneous networks
CN107122479A (en) A kind of user cipher conjecture system based on deep learning
CN113691542B (en) Web attack detection method and related equipment based on HTTP request text
EP4133394A1 (en) Unstructured text classification
CN111966998A (en) Password generation method, system, medium, and apparatus based on variational automatic encoder
CN112131578A (en) Method and device for training attack information prediction model, electronic equipment and storage medium
CN113569001A (en) Text processing method and device, computer equipment and computer readable storage medium
CN110826056A (en) Recommendation system attack detection method based on attention convolution self-encoder
CN106803092B (en) Method and device for determining standard problem data
Uwagbole et al. Numerical encoding to tame SQL injection attacks
CN115269861A (en) Reinforced learning knowledge graph reasoning method based on generative confrontation and imitation learning
CN113312609B (en) Password cracking method and system of generative confrontation network based on strategy gradient
Mahara et al. Fake news detection: A RNN-LSTM, Bi-LSTM based deep learning approach
Remmide et al. Detection of phishing URLs using temporal convolutional network
Yan et al. Cross-site scripting attack detection based on a modified convolution neural network
Fang et al. Password guessing based on semantic analysis and neural networks
Zhang et al. Deep learning for password guessing and password strength evaluation, A survey
CN109308295A (en) A kind of privacy exposure method of real-time of data-oriented publication
CN116245146A (en) Ranking learning method, system and application for generating countermeasure network based on evolution condition
CN106293114B (en) Predict the method and device of user's word to be entered
Lopardo et al. Faithful and Robust Local Interpretability for Textual Predictions
Wang et al. Modeling password guessability via variational auto-encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant