CN107579821B - Method for generating password dictionary and computer-readable storage medium - Google Patents

Publication number: CN107579821B (application CN201710851440.1A, China; other version CN107579821A)
Legal status: Active (granted)
Inventors: 张光斌, 高志鹏, 黄仁裕, 姚灿荣, 尤俊生, 庄进发
Assignee: Xiamen Meiya Pico Information Co Ltd
Abstract

The invention discloses a password dictionary generation method and a computer-readable storage medium. The method comprises: collecting a password set; generating a test set; training the current password set through a recurrent neural network model to obtain a dictionary model; generating a dictionary from the dictionary model; obtaining the hit rate of the current password set against the test set; randomly modifying the current password set to obtain a new password set; training the new password set through the recurrent neural network model to obtain a new dictionary model; generating a new dictionary from the new dictionary model; obtaining the hit rate of the new password set against the test set; if the hit rate of the new password set exceeds that of the current password set, incrementing the update count by one and taking the new password set as the current password set; and, when the update count reaches a preset first number, generating the password dictionary from the dictionary model corresponding to the current password set. The password dictionary finally generated by the invention improves the success rate of password recovery.

Description

Method for generating password dictionary and computer-readable storage medium
Technical Field
The present invention relates to the field of password technologies, and in particular to a method for generating a password dictionary and a computer-readable storage medium.
Background
Brute-force traversal and dictionary traversal are currently the two most common traversal methods in password recovery. Brute-force traversal enumerates all passwords within a rule set defined by the user, while dictionary traversal enumerates the passwords in a dictionary file. The brute-force approach usually needs many rules to cover as large a password space as possible and thereby improve the success rate of password recovery, but an overly large password space multiplies the traversal time — sometimes to hundreds of years — making the recovery meaningless. The success rate of dictionary traversal is limited by the number of passwords in the dictionary: because that number cannot be too large, the traversal finishes in a short time, but generally only common passwords can be recovered, and the method is ineffective against complex passwords.
How to recover passwords that are as complex as possible within an acceptable time is currently the main research direction. For the brute-force approach, research mainly focuses on improving traversal speed through hardware acceleration, algorithm optimization, and distributed computing in order to shorten traversal time. However, improving traversal speed only reduces traversal time linearly, whereas increasing the password length or the password character set grows the traversal time geometrically. For dictionary traversal, research mainly focuses on collecting dictionaries, and the number of collected passwords is limited.
Researchers have applied social-engineering analysis to large numbers of passwords, gathering statistics on how passwords are set and summarizing traversal rules to improve the success rate of password recovery — for example, the common patterns of name pinyin plus birthday, or English word plus digits. However, the number of password samples is enormous and the password distribution is chaotic, so some password-setting rules are difficult to discover, and rules summarized by manual statistics have significant limitations.
Chinese patent publication No. CN104717058A discloses a password traversal method and apparatus comprising: acquiring a preset character set for password traversal; acquiring a probability factor set and an association factor set corresponding to the preset character set, wherein the probability factor set comprises, for each character in the preset character set, a probability factor for each position of the password, and the association factor set comprises association factors between any two characters in the preset character set; and determining a traversal password according to the probability factor set and the association factor set. In that scheme, a password traversal algorithm designed around the distribution and association probabilities of password characters can reduce traversal time, construct high-probability passwords first, and improve the success rate; however, the algorithm does not analyze the rules by which passwords are set, cannot expand the password dictionary according to those rules, and therefore has certain limitations.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a password dictionary generation method and a computer-readable storage medium that can improve the success rate of password recovery.
In order to solve the above technical problem, the invention adopts the following technical scheme. A method of password dictionary generation comprises:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
The invention also relates to a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
The invention has the beneficial effects that: ten-million-scale password sets are used as learning samples; a neural network suited to learning password rules generates corresponding dictionary models; dictionaries are generated from those models; the passwords in the password set are repeatedly modified and the password set whose dictionary best matches the test set is retained; and through continuous cycles of training, modification, and hit-rate comparison, the neural network keeps learning the setting rules of the sample passwords, finally generating a new password dictionary from the resulting dictionary model. The invention uses a neural network to learn the password rules in password samples and uses the learned rules to generate new passwords, which can subsequently be used for password recovery.
Drawings
Fig. 1 is a flowchart of a method for generating a password dictionary according to a first embodiment of the present invention;
FIG. 2 is a flowchart of the method of step S3 according to the second embodiment of the present invention;
fig. 3 is a flowchart of the method of step S4 in the third embodiment of the present invention.
Detailed Description
In order to explain technical contents, objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The core concept of the invention is as follows: use a neural network to learn the password rules in password samples, and use the learned rules to generate new passwords for password recovery.
Referring to fig. 1, a method for generating a password dictionary includes:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
From the above description, the beneficial effects of the present invention are: a neural network learns the password rules in the password samples, and the learned rules are used to generate new passwords that can subsequently be used for password recovery; because the setting rules of the passwords are learned effectively, the success rate of password recovery can be greatly improved.
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
counting the number of passwords in the dictionary that are identical to plaintext passwords in the test set;
and dividing that number by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from the current password set, wherein each group comprises a preset first number of passwords randomly drawn from the password set;
importing the first number of passwords into the recurrent neural network model, and converting each password from a character string into a numerical vector at the input layer;
processing the converted first number of passwords through the three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to the output layer to obtain a first number of numerical sequences;
calculating the deviation between the predicted and actual results of the numerical sequences according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set have completed one round of training;
returning to the step of reading a group of passwords from the current password set until the password set has completed a preset second number of training rounds;
and obtaining the weight parameters of the neurons of the three GRU layers, thereby obtaining the dictionary model.
According to the description, the neural network suitable for learning the password rule is designed, and the corresponding password dictionary model is generated through the neural network, so that the password setting rule in the password sample can be effectively learned, and the success rate of subsequent password recovery is improved.
Further, before the step of importing the first number of passwords into the recurrent neural network model and converting each password from a character string into a numerical vector at the input layer, the method further includes:
padding the first number of passwords to a preset length with 0.
As can be seen from the above description, this guarantees that the neural network processes data of a uniform size.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
From the above description, it can be seen that not only is overfitting effectively prevented, but also the performance of the neural network is improved.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking that character as the current character, and calculating the probability of each character through the cost function according to the current character and the dictionary model;
acquiring the preset second number of characters with the highest probabilities, and randomly selecting one of them as the next character of the password;
judging whether the next character is a preset end character;
if not, taking the next character as the current character and returning to the step of calculating the probability of each character according to the current character and the dictionary model;
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
As can be seen from the above description, it can be ensured that the passwords in the dictionary are not repeated.
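The generation-and-deduplication loop above can be sketched in Python. This is a minimal illustration only: `next_char_probs` is a hypothetical stand-in for the trained dictionary model, and the names `END`, `CHARSET`, and `TOP_K` ("preset second number") are assumptions for the sketch, not part of the patent.

```python
import random

END = "$"                      # hypothetical end-of-password marker
CHARSET = "abc123" + END       # toy character set for illustration
TOP_K = 3                      # "preset second number" of top candidates

def next_char_probs(current, model_seed=0):
    # Stand-in for the trained dictionary model: returns a probability
    # for every character given the current character (toy distribution).
    rng = random.Random(hash((current, model_seed)))
    weights = [rng.random() for _ in CHARSET]
    total = sum(weights)
    return {c: w / total for c, w in zip(CHARSET, weights)}

def generate_password(max_len=16):
    # Pick a random first character, then repeatedly pick the next
    # character at random from the TOP_K most probable candidates,
    # stopping when the end character is produced.
    pwd = random.choice(CHARSET.replace(END, ""))
    while len(pwd) < max_len:
        probs = next_char_probs(pwd[-1])
        top = sorted(probs, key=probs.get, reverse=True)[:TOP_K]
        nxt = random.choice(top)
        if nxt == END:
            break
        pwd += nxt
    return pwd

def generate_dictionary(target_size):
    # Collecting into a set deduplicates each new password against the
    # existing ones, until the preset third number of passwords is reached.
    dictionary = set()
    while len(dictionary) < target_size:
        dictionary.add(generate_password())
    return dictionary
```

Using a set makes the deduplication step implicit: a password identical to one already in the dictionary simply does not increase its size.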
The invention also proposes a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
if yes, incrementing the update count by one and taking the new password set as the current password set;
when the update count has not reached a preset first number, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the update count reaches the preset first number, generating the password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
counting the number of passwords in the dictionary that are identical to plaintext passwords in the test set;
and dividing that number by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from the current password set, wherein each group comprises a preset first number of passwords randomly drawn from the password set;
importing the first number of passwords into the recurrent neural network model, and converting each password from a character string into a numerical vector at the input layer;
processing the converted first number of passwords through the three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to the output layer to obtain a first number of numerical sequences;
calculating the deviation between the predicted and actual results of the numerical sequences according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set have completed one round of training;
returning to the step of reading a group of passwords from the current password set until the password set has completed a preset second number of training rounds;
and obtaining the weight parameters of the neurons of the three GRU layers, thereby obtaining the dictionary model.
Further, before the step of importing the password of the first number into a recurrent neural network model and converting the password from a character string to a numerical vector at an input layer, the method further includes:
and filling the password of the first number to a preset length by using 0.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking that character as the current character, and calculating the probability of each character through the cost function according to the current character and the dictionary model;
acquiring the preset second number of characters with the highest probabilities, and randomly selecting one of them as the next character of the password;
judging whether the next character is a preset end character;
if not, taking the next character as the current character and returning to the step of calculating the probability of each character according to the current character and the dictionary model;
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
Example one
Referring to fig. 1, a first embodiment of the present invention is a password dictionary generation method based on a recurrent neural network; the resulting password dictionary can be used for password recovery. The method comprises the following steps:
S1: collecting a password set, wherein the password set comprises real passwords and virtual passwords. To prevent overfitting, the password set consists of two parts: the first part comprises real passwords from real websites or information-management-system databases, and the second part comprises virtual passwords consisting of a common password keyword (such as admin) plus a randomly generated suffix. Further, the first part accounts for 70% of the total capacity and the second part for 30%, and the total capacity is not less than 10 million passwords.
S2: generating a test set, wherein the test set comprises plaintext passwords. The total capacity of the test set is not less than 1 million. Further, the passwords in the test set differ from those in the password set; that is, none of the 1 million test-set passwords appears among the 10 million passwords of the password set.
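The disjointness requirement between the test set and the password set can be sketched in Python. The function name and split scheme below are illustrative assumptions; the patent only requires that no test password appear in the password set.

```python
import random

def split_disjoint(passwords, test_size, seed=42):
    """Split a password corpus into a training password set and a test set
    with no overlap, so that no test password appears in the password set."""
    unique = list(dict.fromkeys(passwords))   # deduplicate, keep order
    rng = random.Random(seed)
    rng.shuffle(unique)
    test_set = set(unique[:test_size])        # e.g. 1 million in the patent
    password_set = unique[test_size:]          # e.g. 10 million in the patent
    return password_set, test_set
```

Deduplicating before the split guarantees the two parts cannot share a password even if the raw corpus contains repeats.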
Because the characters of a single password are only weakly associated with their context, the passwords need not be segmented into words; each password is used directly as one sequence unit.
S3: training a current password set through a recurrent neural network model to obtain a dictionary model;
s4: generating a dictionary according to the dictionary model;
s5: testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
s6: randomly modifying the virtual password in the current password set to obtain a new password set; specifically, the virtual password may be changed at intervals in a dynamically inserted manner.
S7: training the new password set through a recurrent neural network model to obtain a new dictionary model;
s8: generating a new dictionary according to the new dictionary model;
s9: testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
S10: judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set; if yes, executing step S11, and if not, executing step S6.
S11: incrementing the update count by one, and taking the new password set as the current password set; the initial value of the update count is 0.
S12: judging whether the update count has reached a preset first number; if yes, executing step S13, and if not, executing step S6. Preferably, the preset first number is 100; that is, the termination condition is that the original password set has been successfully updated 100 times.
S13: generating a password dictionary according to the dictionary model obtained by training the current password set through the recurrent neural network model.
Specifically, the hit rate in step S5 is calculated as follows: count the number of passwords in the dictionary that are identical to plaintext passwords in the test set, then divide that count by the total number of plaintext passwords in the test set to obtain the hit rate corresponding to the current password set. That is, each dictionary password that matches one of the 1 million test-set passwords counts as one hit, and if N passwords are hit the hit rate is N/1000000. The hit rate in step S9 is calculated analogously.
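The hit-rate formula above reduces to a set intersection; a minimal Python sketch (the function name is an assumption):

```python
def hit_rate(dictionary, test_set):
    """Hit rate = (number of dictionary passwords also present in the
    test set) / (total number of plaintext passwords in the test set)."""
    hits = len(set(dictionary) & set(test_set))
    return hits / len(set(test_set))
```

For example, a dictionary containing 2 of the 4 test passwords yields a hit rate of 0.5; with a 1-million-password test set and N hits, the rate is N/1000000.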
Further, the password dictionary obtained in step S13 can be used for password recovery.
In this embodiment, ten-million-scale password sets serve as learning samples; a neural network suited to learning password rules generates corresponding dictionary models, from which dictionaries are generated; the passwords in the password set are repeatedly modified and the password set whose dictionary best matches the test set is retained; and through continuous training, modification, and hit-rate comparison, the neural network keeps learning the setting rules of the sample passwords. A new password dictionary is generated from the final dictionary model; because the new passwords follow the setting rules of the sample passwords, they are closer to real-world practice, making the method more effective at improving the success rate of password recovery.
Example two
Referring to fig. 2, the present embodiment is a further refinement of steps S3 and S7 in the first embodiment. Taking step S3 as an example, step S3 includes the following steps:
s301: constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers.
Specifically, a recurrent neural network is adopted as the base structure of the model. To retain the context of long information sequences, a long short-term memory (LSTM) layer would serve as the basic unit; to improve efficiency and reduce computation, a gated recurrent unit (GRU) — a variant of that network which merges the forget gate and input gate of the original LSTM into a single update gate — is adopted instead, improving dictionary generation speed while preserving accuracy.
Because password lengths essentially never exceed 128 characters, the maximum password sequence length is set to 128 characters; each GRU weight matrix is initialized with a Gaussian standard deviation of 0.05; the hidden layer width is set to 128 units; and the biases are initialized to the constant 0. To facilitate control of the information flow, the gates are controlled by the nonlinear sigmoid function, which has the advantages of being nonlinear, monotonically increasing, and differentiable over its domain.
To prevent overfitting (where the model learns the training samples well but achieves a low hit rate on passwords outside the training samples), a random discarding (dropout) algorithm is added to each GRU layer with the dropout ratio set to 0.2; this not only effectively prevents overfitting but also improves the performance of the neural network.
The neural network in this embodiment uses softmax_cross_entropy_with_logits as the cost function, which measures the deviation between the predicted and actual results. A gradient descent algorithm with a decay coefficient (the RMSProp algorithm) serves as the optimizer, the decay coefficient providing adaptive weighting. The initial learning rate is set to 0.0001 and decays gradually at a rate of 20% per round of training.
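The learning-rate schedule just described — 20% decay per training round from an initial 0.0001 — is a simple exponential decay; a one-function sketch (the function name is an assumption):

```python
def learning_rate(initial_lr, epoch, decay=0.2):
    """Learning rate after `epoch` full training rounds, decaying by 20%
    per round as described; initial_lr is 0.0001 in the embodiment."""
    return initial_lr * (1.0 - decay) ** epoch
```

So after one round the rate is 0.8 times the initial value, after two rounds 0.64 times, and so on.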
S302: reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set; the first number can be set as required, and preferably is 1024. The current password set is used as a password sample, 1024 passwords are read randomly each time, and the passwords are led into the recurrent neural network model in batches for training, so that the processing speed can be improved.
S303: filling the passwords of the first number to a preset length by using 0, namely filling 1024 passwords to the same length by using 0; further the preset length is 128 characters.
S304: importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer; in this embodiment, the magnitude of the numerical vector is 1024 × 128. Furthermore, the association relationship between characters and numbers can be preset, and generally one character corresponds to one number; and then converting the password from the character string into a numerical value vector according to the incidence relation.
S305: and according to a random discarding algorithm (dropout algorithm), selecting neurons of a preset proportion of each GRU layer at random, and enabling the neurons to fail. In this embodiment, the predetermined ratio is 20%, that is, the passwords of the 20% neurons passing through a GRU layer do not participate in the calculation of the GRU layer.
S306: calculating the converted password of the first number through three GRU layers to obtain a second-order matrix; the data of the input layer is subjected to calculation of a GRU layer and combined with a random discarding algorithm, the influence of 20% of the randomly selected GRU layers is discarded, a 1024 x 128 second-order matrix is obtained, the second-order matrix is subjected to calculation of the GRU layer again, namely the steps are repeated, and the final 1024 x 128 second-order matrix is obtained after the calculation of 3 GRU layers in total.
S307: inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number; the calculated structure of the GRU layer is input to the output layer, and a 1024 numerical sequence is obtained.
S308: calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
s309: adjusting weight parameters of neurons of the three GRU layers through an optimization algorithm (RMSProp algorithm) according to the deviation and a preset learning rate; further, the offset amount may also be adjusted simultaneously.
S310: and judging whether all the passwords in the current password set complete one-time training, namely whether all the passwords participate in one-time training, if so, indicating that the current password set successfully completes one-time training, and executing step S311, otherwise, executing step S302.
S311: and judging whether the current password set completes the training for the preset second time, if so, executing step S312, and if not, executing step S302. The second number represents the effect of the model before and after the training for a certain period of time, and is not a fixed value. In this embodiment, the second frequency is 10. All passwords in the password set participate in the training once, i.e. a complete training round is completed, and after 10 training rounds, step S312 is executed.
S312: and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model. Through the steps, the weight parameters are continuously adjusted, and finally adjusted weight parameters are obtained, namely the dictionary model.
In this embodiment, a neural network suited to learning password rules is designed and the corresponding password dictionary model is generated through it, so that the password-setting rules in the password samples can be effectively learned and the success rate of subsequent password recovery improved.
Example three
Referring to fig. 3, the present embodiment further develops steps S4 and S8 of the above embodiment. Taking step S4 as an example, step S4 includes the following steps:
s401: and acquiring a character set which comprises a-z, 0-9 and each special character.
S402: randomly selecting a character from the character set as a first character of a password;
s403: and taking the characters as current characters, and calculating the probability of each character according to the current characters and the dictionary model through a cost function, namely the probability of each character after the current character.
S404: acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password; in this embodiment, the second number is 5, that is, one character is selected from the 5 characters with the highest probability at any time.
S405: and judging whether the next character is a preset end character, if so, executing the step S406, otherwise, taking the next character as the current character, calculating the probability of each character after the character through a cost function by combining a dictionary model, and returning to execute the step S403. In this embodiment, the terminator may be "eos".
S406: adding the password into a dictionary;
s407: the password and the existing password in the dictionary are subjected to duplication elimination; namely, whether the password identical to the password exists in the existing passwords of the dictionary is judged, if yes, the password is deleted, and the passwords in the dictionary are not repeated.
S408: and judging whether the number of the passwords in the dictionary reaches a preset third number, if so, executing the step S409, otherwise, randomly selecting one character from the character set as a first character of a new password, and returning to execute the step S402. In this embodiment, the third number is 1 hundred million.
S409: and obtaining a dictionary.
In this embodiment, the dictionary is generated from the dictionary model learned by the neural network, and the hit rate of the dictionary against the test set is subsequently calculated, so the degree of learning of the neural network can be effectively reflected.
Example four
The present embodiment is a computer-readable storage medium corresponding to the above-mentioned embodiments, on which a computer program is stored, which when executed by a processor implements the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
and when the updating times reach a preset first time, generating the password dictionary according to the dictionary model obtained by training the recurrent neural network model in the current password set.
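The overall flow restated above amounts to a random-search hill climb over password sets; a minimal sketch follows, with hypothetical toy stand-ins (`toy_hit_rate`, `toy_mutate`) for the train/generate/test pipeline.

```python
import random

def optimize_password_set(initial_set, hit_rate, mutate, max_updates, seed=0):
    """Sketch of the overall flow: randomly modify the current set; if the
    dictionary trained from the candidate scores a higher hit rate on the
    test set, accept it and count one update; stop once the update count
    reaches the preset first number."""
    rng = random.Random(seed)
    current = set(initial_set)
    best = hit_rate(current)
    updates = 0
    while updates < max_updates:
        candidate = mutate(current, rng)   # randomly modify the virtual passwords
        score = hit_rate(candidate)        # stands for train + generate + test
        if score > best:                   # keep only strict improvements
            current, best = candidate, score
            updates += 1                   # update count plus one
    return current

# Hypothetical toy stand-ins: the "hit rate" rewards total password length,
# and a mutation appends a random digit to one password.
def toy_hit_rate(pwset):
    return sum(len(p) for p in pwset)

def toy_mutate(pwset, rng):
    out = set(pwset)
    victim = rng.choice(sorted(out))
    out.discard(victim)
    out.add(victim + rng.choice("0123456789"))
    return out

final = optimize_password_set({"abc", "123"}, toy_hit_rate, toy_mutate, max_updates=3)
```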
Further, the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current password set specifically includes:
calculating the number of the passwords in the dictionary, which is the same as the plaintext passwords in the test set;
and calculating the quotient of the number of the passwords and the total number of the plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
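A minimal sketch of this hit-rate calculation, with made-up example passwords:

```python
def hit_rate(dictionary, test_set):
    # Passwords in the dictionary that equal a plaintext password of the
    # test set, divided by the total number of test-set passwords.
    hits = len(set(dictionary) & set(test_set))
    return hits / len(test_set)

rate = hit_rate({"123456", "password", "qwerty"},
                {"123456", "letmein", "password", "dragon"})
```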
Further, the "training the current password set through the recurrent neural network model to obtain the dictionary model" specifically includes:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, and the hidden layer comprises three GRU layers;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
Further, before the step of importing the password of the first number into a recurrent neural network model and converting the password from a character string to a numerical vector at an input layer, the method further includes:
and filling the password of the first number to a preset length by using 0.
Further, before the step of calculating the converted first number of passwords through three GRU layers to obtain a second-order matrix, the method further includes:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
Further, the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
Further, after the "adding the password to the dictionary", the method further includes:
and carrying out duplication elimination on the password and the existing password in the dictionary.
In summary, the method for generating a password dictionary and the computer-readable storage medium provided by the present invention use password sets at the ten-million scale as learning samples. A corresponding dictionary model is generated through a neural network suited to learning password rules, a dictionary is generated through the dictionary model, the passwords in the password set are continuously modified, and the password sets whose dictionaries achieve higher hit rates on the test set are selected for further training, modification and hit-rate comparison, so that the neural network continuously learns the rules by which the passwords in the samples were set. A new password dictionary is then generated from the final dictionary model; since the new passwords all satisfy the password-setting rules of the samples and are closer to practical application, the method is more effective at improving the success rate of password recovery. The invention learns the password rules in the password samples with a neural network, generates new passwords from the learned rules, and can subsequently be used for password recovery.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method for generating a cryptographic dictionary, comprising:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
when the updating times reach a preset first time, generating a password dictionary according to a dictionary model obtained by training a recurrent neural network model in a current password set;
the method for training the current password set through the recurrent neural network model to obtain the dictionary model specifically comprises the following steps:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, the hidden layer comprises three GRU layers, and the GRU is a gating cycle unit;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
2. The method for generating a cryptographic dictionary according to claim 1, wherein the step of testing the dictionary according to the test set to obtain the hit rate corresponding to the current cryptographic set specifically comprises:
calculating the number of the passwords in the dictionary, which is the same as the plaintext passwords in the test set;
and calculating the quotient of the number of the passwords and the total number of the plaintext passwords in the test set to obtain the hit rate corresponding to the current password set.
3. The method for generating a cryptographic dictionary of claim 1, wherein before importing the first number of passwords into a recurrent neural network model and converting the passwords from character strings to numerical vectors at an input layer, the method further comprises:
and filling the password of the first number to a preset length by using 0.
4. The method of generating a cryptographic dictionary of claim 1, wherein before the step of calculating the transformed first number of ciphers through three GRU layers to obtain the second order matrix, the method further comprises:
and according to a random discarding algorithm, selecting the neurons of the preset proportion of each GRU layer at random, and enabling the neurons to fail.
5. The method for generating a cryptographic dictionary according to claim 1, wherein the "generating a dictionary according to the dictionary model" specifically includes:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
6. The method for generating a password dictionary according to claim 5, wherein after "adding the password to the dictionary", the method further comprises:
and carrying out duplication elimination on the password and the existing password in the dictionary.
7. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, performs the steps of:
collecting a password set, wherein the password set comprises a real password and a virtual password;
generating a test set, wherein the test set comprises a plaintext password;
training a current password set through a recurrent neural network model to obtain a dictionary model;
generating a dictionary according to the dictionary model;
testing the dictionary according to the test set to obtain a hit rate corresponding to the current password set;
randomly modifying the virtual password in the current password set to obtain a new password set;
training the new password set through a recurrent neural network model to obtain a new dictionary model;
generating a new dictionary according to the new dictionary model;
testing the new dictionary according to the test set to obtain a hit rate corresponding to the new password set;
judging whether the hit rate corresponding to the new password set is greater than the hit rate corresponding to the current password set;
if not, returning to the step of "randomly modifying the virtual password in the current password set to obtain a new password set";
if yes, adding one to the updating times, and taking the new password set as the current password set;
when the updating times do not reach the preset first times, returning to the step of randomly modifying the virtual password in the current password set to obtain a new password set;
when the updating times reach a preset first time, generating a password dictionary according to a dictionary model obtained by training a recurrent neural network model in a current password set;
the method for training the current password set through the recurrent neural network model to obtain the dictionary model specifically comprises the following steps:
constructing a recurrent neural network model, wherein the recurrent neural network model comprises an input layer, a hidden layer and an output layer, the hidden layer comprises three GRU layers, and the GRU is a gating cycle unit;
reading a group of passwords from a current password set, wherein each group of passwords comprises passwords of a preset first number which are randomly taken out from the password set;
importing the password of the first number into a recurrent neural network model, and converting the password from a character string into a numerical value vector at an input layer;
calculating the converted password of the first number through three GRU layers to obtain a second-order matrix;
inputting the second-order matrix to an output layer to obtain a numerical sequence of a first number;
calculating to obtain the deviation of the prediction result and the actual result of the numerical sequence according to the cost function;
adjusting the weight parameters of the neurons of the three GRU layers through an optimization algorithm according to the deviation and a preset learning rate;
returning to the step of reading a group of passwords from the current password set until all passwords in the password set complete one-time training;
returning to the step of reading the group of passwords from the current password set until the password set finishes training for a preset second time;
and obtaining weight parameters of the neurons of the three GRU layers to obtain a dictionary model.
8. The computer-readable storage medium according to claim 7, wherein the "generating a dictionary according to the dictionary model" is specifically:
acquiring a character set;
randomly selecting a character from the character set as a first character of a password;
taking the characters as current characters, and calculating the probability of each character through a cost function according to the current characters and the dictionary model;
acquiring a preset second number of characters with the maximum probability, and randomly selecting one character from the second number of characters as a next character of the password;
judging whether the next character is a preset end character or not;
if not, taking the next character as the current character, and returning to the step of "calculating the probability of each character through a cost function according to the current character and the dictionary model";
if yes, adding the password into a dictionary;
and returning to the step of randomly selecting one character from the character set as the first character of the password until the number of the passwords in the dictionary reaches a preset third number.
CN201710851440.1A 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium Active CN107579821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710851440.1A CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710851440.1A CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN107579821A CN107579821A (en) 2018-01-12
CN107579821B true CN107579821B (en) 2020-04-28

Family

ID=61036277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710851440.1A Active CN107579821B (en) 2017-09-19 2017-09-19 Method for generating password dictionary and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107579821B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446761B (en) * 2018-03-23 2021-07-20 中国科学院计算技术研究所 Neural network accelerator and data processing method
CN109492385A (en) * 2018-11-05 2019-03-19 桂林电子科技大学 A kind of method for generating cipher code based on deep learning
CN109558723A (en) * 2018-12-06 2019-04-02 南京中孚信息技术有限公司 Password dictionary generation method, device and computer equipment
CN112257648A (en) * 2020-11-03 2021-01-22 泰山学院 Signal classification and identification method based on improved recurrent neural network
CN114338015B (en) * 2022-01-05 2024-03-29 北京华云安信息技术有限公司 Method, device, equipment and storage medium for generating password dictionary
CN115276983A (en) * 2022-07-29 2022-11-01 四川启睿克科技有限公司 Password dictionary management method for penetration test

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8295480B1 (en) * 2007-07-10 2012-10-23 Avaya Inc. Uncertainty-based key agreement protocol
CN104573142A (en) * 2013-10-10 2015-04-29 无锡市思库瑞科技信息有限公司 Password attribute analysis method based on neural network
CN104717058A (en) * 2015-02-10 2015-06-17 厦门市美亚柏科信息股份有限公司 Cipher traversal method and device
CN106250931A (en) * 2016-08-03 2016-12-21 武汉大学 A kind of high-definition picture scene classification method based on random convolutional neural networks
CN107122479A (en) * 2017-05-03 2017-09-01 西安交通大学 A kind of user cipher conjecture system based on deep learning

Also Published As

Publication number Publication date
CN107579821A (en) 2018-01-12

Similar Documents

Publication Publication Date Title
CN107579821B (en) Method for generating password dictionary and computer-readable storage medium
CN110414219B (en) Injection attack detection method based on gated cycle unit and attention mechanism
Xie et al. Sql injection detection for web applications based on elastic-pooling cnn
Xu et al. Hierarchical bidirectional RNN for safety-enhanced B5G heterogeneous networks
CN107122479A (en) A kind of user cipher conjecture system based on deep learning
CN113691542B (en) Web attack detection method and related equipment based on HTTP request text
EP4133394A1 (en) Unstructured text classification
CN111966998A (en) Password generation method, system, medium, and apparatus based on variational automatic encoder
CN112131578A (en) Method and device for training attack information prediction model, electronic equipment and storage medium
CN113569001A (en) Text processing method and device, computer equipment and computer readable storage medium
CN110826056A (en) Recommendation system attack detection method based on attention convolution self-encoder
CN106803092B (en) Method and device for determining standard problem data
Uwagbole et al. Numerical encoding to tame SQL injection attacks
CN115269861A (en) Reinforced learning knowledge graph reasoning method based on generative confrontation and imitation learning
CN113312609B (en) Password cracking method and system of generative confrontation network based on strategy gradient
Mahara et al. Fake news detection: A RNN-LSTM, Bi-LSTM based deep learning approach
Remmide et al. Detection of phishing URLs using temporal convolutional network
Yan et al. Cross-site scripting attack detection based on a modified convolution neural network
Fang et al. Password guessing based on semantic analysis and neural networks
Zhang et al. Deep learning for password guessing and password strength evaluation, A survey
CN109308295A (en) A kind of privacy exposure method of real-time of data-oriented publication
CN116245146A (en) Ranking learning method, system and application for generating countermeasure network based on evolution condition
CN106293114B (en) Predict the method and device of user's word to be entered
Lopardo et al. Faithful and Robust Local Interpretability for Textual Predictions
Wang et al. Modeling password guessability via variational auto-encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant