CN114330306A - Deep learning-based password dictionary generation technology - Google Patents

Deep learning-based password dictionary generation technology

Publication number
CN114330306A
Authority
CN
China
Prior art keywords
password
characters
dictionary
character
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111652277.9A
Other languages
Chinese (zh)
Inventor
刘慧敏
肖晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202111652277.9A
Publication of CN114330306A
Legal status: Pending

Abstract

The invention discloses a deep learning-based password dictionary generation technology. Passwords that meet the experimental requirements are screened out of a data set, word2vec is used to perform word embedding on the passwords to obtain a character vector for each character of a password, the character vectors are preprocessed and input into an LSTM neural network, and a password dictionary is generated from the resulting model. Compared with PCFG (probabilistic context-free grammar), Markov models, GAN and the like, LSTM performs better in natural language processing, so the invention treats the characters of a password as words in natural language. Applying the LSTM method yields a password dictionary that matches human password-setting habits and has a higher hit rate, which can improve the effectiveness of weak-password dictionaries and brute-force password cracking to some extent and can be applied in various security scenarios.

Description

Deep learning-based password dictionary generation technology
Technical Field
The invention relates to the technical field of password data, in particular to a password dictionary generation technology based on deep learning.
Background
As the internet has developed, security incidents have followed and many password data sets have been leaked; the leaked passwords also reveal how users choose their passwords. These data sets have prompted much research on password attacks, password generation, user password habits and so on. Password generation has become an emerging problem for improving the efficiency of identity verification in social engineering, and it plays an important role in checking for security vulnerabilities.
A growing number of researchers have shown that neural network methods are more accurate and practical for password guessing than traditional methods. By learning from passwords created by humans, a model that better matches human password habits can be trained, and the human-like password dictionary generated from that model performs better in password guessing.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a deep learning-based password dictionary generation technology that generates a password dictionary closer to human password habits, so that deep learning-based password guessing is more effective than traditional methods such as traversing weak-password dictionaries and brute-force cracking.
To achieve this purpose, the invention is realized by the following technical solution: a deep learning-based password dictionary generation technology comprising the following steps:
S1, cleaning the password data set and screening out the passwords that meet the experimental requirements, namely passwords composed of letters, digits and symbols with a length of 8 to 16 characters, and dividing the data set into a training set, a validation set and a test set;
S2, using word2vec to perform word embedding on the passwords to obtain a character vector for each character of a password, in preparation for input to the LSTM neural network in the next step;
S3, forming an input list from the character vectors generated in S2, with passwords shorter than 16 characters padded with spaces, as the input of the whole model; taking the ASCII codes of the characters as labels, with the first character of each password removed, and padding the labels to 16 positions with a fixed value M;
S4, inputting the vectors processed in S3 into the LSTM neural network model, setting the size of the output list, using cross entropy to compute the model loss, and obtaining through repeated training model parameters with low loss and good performance, which are used to generate the password dictionary;
S5, generating the password dictionary using the model obtained in S4.
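Step S4 names cross entropy as the loss. The following is a minimal sketch of that quantity for next-character prediction, not the patent's training code: the function names and the representation of a prediction as a mapping from characters to probabilities are assumptions made for illustration.

```python
import math


def cross_entropy(probs, target):
    """Loss at one position: negative log of the probability given to the true next character.

    `probs` maps candidate characters to predicted probabilities (assumed format).
    """
    # Clamp to avoid log(0) when the model assigns zero probability to the target.
    return -math.log(max(probs.get(target, 0.0), 1e-12))


def mean_cross_entropy(prob_rows, targets):
    """Average the per-position losses over one password."""
    losses = [cross_entropy(p, t) for p, t in zip(prob_rows, targets)]
    return sum(losses) / len(losses)
```

A lower value means the model puts more probability on the characters that actually follow, which is the sense in which S4 keeps the parameters "with low loss".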
Preferably, in S1, the screening method is to keep passwords that are composed of letters, digits and symbols and have a length of 8 to 16 characters, and to divide the data set into a training set, a validation set and a test set.
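The screening and splitting in S1 could look like the sketch below. It assumes that "letters, digits and symbols" means printable ASCII letters, digits and punctuation, and it uses an 80/10/10 split with a fixed shuffle seed; the patent does not state the split ratios, so those values and the function names are illustrative only.

```python
import random
import string

# Assumption: the allowed alphabet is ASCII letters, digits and punctuation.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation)


def is_valid(password, min_len=8, max_len=16):
    """Keep passwords of 8 to 16 characters drawn only from the allowed alphabet."""
    return min_len <= len(password) <= max_len and all(c in ALLOWED for c in password)


def split_dataset(passwords, ratios=(0.8, 0.1, 0.1), seed=0):
    """Filter the raw list, shuffle it, and cut it into train / validation / test."""
    kept = [p for p in passwords if is_valid(p)]
    random.Random(seed).shuffle(kept)
    n_train = int(len(kept) * ratios[0])
    n_val = int(len(kept) * ratios[1])
    return kept[:n_train], kept[n_train:n_train + n_val], kept[n_train + n_val:]
```

The test portion takes whatever remains after the first two cuts, so no password is dropped by rounding.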
Preferably, in S3, the value to output is finally determined. The output is based on the cell state but is a filtered version of it: a sigmoid layer is first run to decide which part of the cell state to output; the cell state is then passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected part is output.
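The filtering described here is the standard LSTM output gate, h = sigmoid(z) * tanh(C), applied element-wise. The sketch below shows only that final step; the gate pre-activation z (normally a learned linear function of the previous hidden state and the current input) is passed in directly, and the function names are illustrative rather than taken from the patent.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_output(cell_state, gate_preactivation):
    """Element-wise output gate: squash the cell state with tanh, then scale by the sigmoid gate."""
    return [sigmoid(z) * math.tanh(c) for c, z in zip(cell_state, gate_preactivation)]
```

A gate value near 0 suppresses the corresponding component of the cell state, and a value near 1 lets the tanh-squashed component through almost unchanged.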
Preferably, in S5, generating the password dictionary comprises the following steps:
S1, obtaining the probability of the first character over all password data for later use, and initializing an empty set S;
S2, setting the size of the password dictionary to N, and generating passwords in a loop while the number of passwords in S, count(S), is less than N;
S3, initializing an empty sequence password, randomly selecting one of the λ most probable first characters and adding it to password;
S4, generating password characters in a loop while the number of characters in password is less than or equal to the maximum password length, passing the contents of password to the prediction function of the model to obtain the probability of the next character;
S5, randomly selecting one of the λ most probable characters and adding it to password, until the selected character is M or the number of characters in password equals the maximum password length;
S6, when the loop has produced N passwords, obtaining the password dictionary of the expected size N.
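The generation loop in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `predict_next` is a hypothetical stand-in for the trained model's prediction function, `END` stands in for the fixed value M, `lam` is λ, and `max_attempts` is an added guard so the loop cannot run forever if fewer than N distinct passwords are reachable.

```python
import random

END = "\0"  # stand-in for the fixed padding/terminator value M (assumption)


def top_lambda_choice(probs, lam, rng):
    """Pick uniformly at random among the lam most probable characters."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return rng.choice(ranked[:lam])


def generate_dictionary(first_char_probs, predict_next, n, lam=3,
                        max_len=16, seed=0, max_attempts=100000):
    """Build a set of n distinct passwords by repeated top-lambda sampling."""
    rng = random.Random(seed)
    result = set()  # the set S; duplicates collapse automatically
    attempts = 0
    while len(result) < n and attempts < max_attempts:
        attempts += 1
        password = [top_lambda_choice(first_char_probs, lam, rng)]
        while len(password) < max_len:
            ch = top_lambda_choice(predict_next("".join(password)), lam, rng)
            if ch == END:  # the model signalled the end of the password
                break
            password.append(ch)
        result.add("".join(password))
    return result


def toy_predict(prefix):
    """Hypothetical predictor used only to exercise the loop; not a trained model."""
    if len(prefix) >= 8:
        return {END: 0.6, "1": 0.3, "a": 0.1}
    return {"a": 0.4, "1": 0.3, "!": 0.2, END: 0.1}
```

Choosing at random among the λ most probable characters, instead of always taking the single most probable one, is what lets repeated runs of the loop produce different passwords.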
Advantageous effects
The invention aims to improve on existing non-intelligent password cracking methods such as traversing weak-password dictionaries and brute-force cracking. Compared with PCFG (probabilistic context-free grammar), Markov models, GAN and the like, LSTM performs better in natural language processing, so the invention treats the characters of a password as words in natural language. Applying the LSTM method yields a password dictionary that matches human password-setting habits and has a higher hit rate, which can improve the effectiveness of weak-password dictionaries and brute-force password cracking to some extent and can be applied in various security scenarios.
Drawings
FIG. 1 is a flowchart of the deep learning-based password dictionary generation technology of the present invention;
FIG. 2 is a schematic diagram of the LSTM neural network of the present invention;
FIG. 3 is a flowchart of password dictionary generation in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a deep learning-based password dictionary generation technology comprising the following steps:
S1, cleaning the password data set and screening out the passwords that meet the experimental requirements, namely passwords composed of letters, digits and symbols with a length of 8 to 16 characters, and dividing the data set into a training set, a validation set and a test set;
S2, using word2vec to perform word embedding on the passwords to obtain a character vector for each character of a password, in preparation for input to the LSTM neural network in the next step;
S3, forming an input list from the character vectors generated in S2, with passwords shorter than 16 characters padded with spaces, as the input of the whole model; taking the ASCII codes of the characters as labels, with the first character of each password removed, and padding the labels to 16 positions with a fixed value M;
S4, inputting the vectors processed in S3 into the LSTM neural network model, setting the size of the output list, using cross entropy to compute the model loss, and obtaining through repeated training model parameters with low loss and good performance, which are used to generate the password dictionary;
S5, generating the password dictionary using the model obtained in S4. Contents not described in detail in this specification are prior art known to those skilled in the art.
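The input and label preparation in S3 can be illustrated with the sketch below. It works on raw characters before the word2vec lookup, assumes M = 0, and reads "removing the first character" as shifting the labels by one position so that each input character is paired with the character that follows it; that reading, the value of M and the function name are assumptions, not statements from the patent.

```python
MAX_LEN = 16
PAD_LABEL = 0  # hypothetical choice for the fixed value M


def make_example(password, max_len=MAX_LEN, pad_label=PAD_LABEL):
    """Return (input characters padded with spaces, next-character ASCII labels padded with M)."""
    chars = list(password.ljust(max_len))        # pad the input to max_len with spaces
    labels = [ord(c) for c in password[1:]]      # ASCII codes with the first character dropped
    labels += [pad_label] * (max_len - len(labels))
    return chars, labels
```

For the 9-character password "Passw0rd!", the input becomes 16 characters ending in seven spaces, and the labels are the ASCII codes of "assw0rd!" followed by eight copies of M.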
In the present invention, in S1, the screening method is to keep passwords that are composed of letters, digits and symbols and have a length of 8 to 16 characters, and to divide the data set into a training set, a validation set and a test set.
In the present invention, in S3, the value to output is finally determined. The output is based on the cell state but is a filtered version of it: a sigmoid layer is first run to decide which part of the cell state to output; the cell state is then passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected part is output.
In the present invention, in S5, generating the password dictionary comprises the following steps:
S1, obtaining the probability of the first character over all password data for later use, and initializing an empty set S;
S2, setting the size of the password dictionary to N, and generating passwords in a loop while the number of passwords in S, count(S), is less than N;
S3, initializing an empty sequence password, randomly selecting one of the λ most probable first characters and adding it to password;
S4, generating password characters in a loop while the number of characters in password is less than or equal to the maximum password length, passing the contents of password to the prediction function of the model to obtain the probability of the next character;
S5, randomly selecting one of the λ most probable characters and adding it to password, until the selected character is M or the number of characters in password equals the maximum password length;
S6, when the loop has produced N passwords, obtaining the password dictionary of the expected size N.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A deep learning-based password dictionary generation technology, characterized by comprising the following steps:
S1, cleaning the password data set and screening out the passwords that meet the experimental requirements, namely passwords composed of letters, digits and symbols with a length of 8 to 16 characters, and dividing the data set into a training set, a validation set and a test set;
S2, using word2vec to perform word embedding on the passwords to obtain a character vector for each character of a password, in preparation for input to the LSTM neural network in the next step;
S3, forming an input list from the character vectors generated in S2, with passwords shorter than 16 characters padded with spaces, as the input of the whole model; taking the ASCII codes of the characters as labels, with the first character of each password removed, and padding the labels to 16 positions with a fixed value M;
S4, inputting the vectors processed in S3 into the LSTM neural network model, setting the size of the output list, using cross entropy to compute the model loss, and obtaining through repeated training model parameters with low loss and good performance, which are used to generate the password dictionary;
S5, generating the password dictionary using the model obtained in S4.
2. The deep learning-based password dictionary generation technology according to claim 1, wherein: in S1, the screening method is to keep passwords that are composed of letters, digits and symbols and have a length of 8 to 16 characters, and to divide the data set into a training set, a validation set and a test set.
3. The deep learning-based password dictionary generation technology according to claim 4, wherein: in S3, the value to output is finally determined; the output is based on the cell state but is a filtered version of it: a sigmoid layer is first run to decide which part of the cell state to output; the cell state is then passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected part is output.
4. The deep learning-based password dictionary generation technology according to claim 1, wherein: in S5, generating the password dictionary comprises the following steps:
S1, obtaining the probability of the first character over all password data for later use, and initializing an empty set S;
S2, setting the size of the password dictionary to N, and generating passwords in a loop while the number of passwords in S, count(S), is less than N;
S3, initializing an empty sequence password, randomly selecting one of the λ most probable first characters and adding it to password;
S4, generating password characters in a loop while the number of characters in password is less than or equal to the maximum password length, passing the contents of password to the prediction function of the model to obtain the probability of the next character;
S5, randomly selecting one of the λ most probable characters and adding it to password, until the selected character is M or the number of characters in password equals the maximum password length;
S6, when the loop has produced N passwords, obtaining the password dictionary of the expected size N.
CN202111652277.9A 2021-12-30 2021-12-30 Deep learning-based password dictionary generation technology Pending CN114330306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111652277.9A CN114330306A (en) 2021-12-30 2021-12-30 Deep learning-based password dictionary generation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111652277.9A CN114330306A (en) 2021-12-30 2021-12-30 Deep learning-based password dictionary generation technology

Publications (1)

Publication Number Publication Date
CN114330306A (en) 2022-04-12

Family

ID=81018564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111652277.9A Pending CN114330306A (en) 2021-12-30 2021-12-30 Deep learning-based password dictionary generation technology

Country Status (1)

Country Link
CN (1) CN114330306A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115276983A (en) * 2022-07-29 2022-11-01 四川启睿克科技有限公司 Password dictionary management method for penetration test

Similar Documents

Publication Publication Date Title
Weir et al. Password cracking using probabilistic context-free grammars
Melicher et al. Fast, lean, and accurate: Modeling password guessability using neural networks
Ur et al. Measuring Real-World Accuracies and Biases in Modeling Password Guessability
CN109145582B (en) Password guess set generation method based on byte pair encoding, password cracking method and device
CN107579816B (en) Method for generating password dictionary based on recurrent neural network
CN110334488B (en) User authentication password security evaluation method and device based on random forest model
Pasquini et al. Reducing bias in modeling real-world password strength via deep learning and dynamic dictionaries
CN111866004B (en) Security assessment method, apparatus, computer system, and medium
CN109670303A (en) The cryptographic attack appraisal procedure encoded certainly based on condition variation
JP2019153098A (en) Vector generation device, sentence pair leaning device, vector generation method, sentence pair learning method, and program
CN112131888A (en) Method, device and equipment for analyzing semantic emotion and storage medium
Aggarwal et al. New technologies in password cracking techniques
CN114330306A (en) Deep learning-based password dictionary generation technology
CN113312609B (en) Password cracking method and system of generative confrontation network based on strategy gradient
CN110674370A (en) Domain name identification method and device, storage medium and electronic equipment
Deng et al. Efficient password guessing based on a password segmentation approach
CN116018589A (en) Method and system for product quantization based matrix compression
CN116684144A (en) Malicious domain name detection method and device
CN113111329B (en) Password dictionary generation method and system based on multi-sequence long-term and short-term memory network
CN107239693B (en) Analysis method and system based on password coding rule
Biesner et al. Generative deep learning techniques for password generation
CN113726730A (en) DGA domain name detection method and system based on deep learning algorithm
Sun et al. Graph neural networks for contextual asr with the tree-constrained pointer generator
Li et al. PG-Pass: Targeted Online Password Guessing Model based on Pointer Generator Network
Luo et al. Recurrent neural network based password generation for group attribute context-ware applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination