CN114330306A - Deep learning-based password dictionary generation technology

Info

Publication number: CN114330306A
Application number: CN202111652277.9A
Authority: CN
Inventors: 刘慧敏; 肖晟
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-12

Abstract

The invention discloses a deep learning-based password dictionary generation technique. Passwords that meet the experimental requirements are screened out of a data set; word2vec performs word embedding on each password to obtain a character vector for every character in it; the character vectors are preprocessed and fed into an LSTM neural network; and a password dictionary is generated from the resulting model. Compared with PCFG (probabilistic context-free grammar), Markov models, GANs and the like, LSTM performs better on natural language processing tasks, so the invention treats the characters that make up a password as words in a natural language. Applying the LSTM method yields a password dictionary that matches the way humans set passwords and has a higher hit rate, improves on weak-password dictionaries and brute-force cracking to a certain extent, and can be applied in various security scenarios.

Description

Deep learning-based password dictionary generation technology

Technical Field

The invention relates to the technical field of password data, in particular to a password dictionary generation technology based on deep learning.

Background

As the internet has developed, security incidents have followed, and many password data sets have been leaked. The leaked user passwords also reveal how users set their passwords. These data sets have supported much research on password attacks, password generation, user password habits and so on. Password generation has become an emerging problem for improving the efficiency of identity verification in social engineering and plays an important role in checking for security vulnerabilities.

A growing number of researchers have shown that neural network methods are more accurate and practical for password guessing than traditional methods. Moreover, by learning from passwords created by humans, a model can be trained that better matches how humans generate passwords, and the human-like password dictionary generated from that model performs better in password guessing.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a deep learning-based password dictionary generation technique that generates a password dictionary more consistent with how humans create passwords, so that deep learning-based password guessing is more effective than traditional methods such as traversing a weak-password dictionary or brute-force cracking.

To achieve this purpose, the invention is realized by the following technical scheme: a deep learning-based password dictionary generation technique that specifically comprises the following steps:

S1, cleaning the password data set and screening out the passwords that meet the experimental requirements, namely passwords composed of letters, numbers and symbols with a length of 8-16 characters; and dividing the data set into a training set, a validation set and a test set;

S2, using word2vec to perform word embedding on the passwords to obtain the character vector corresponding to each character of a password, in preparation for input into the LSTM neural network in the next step;

S3, forming an input list from the character vectors generated in S2, with passwords shorter than 16 characters padded with spaces, as the input of the whole model; taking the ASCII codes of the characters as labels, with the first character of each password removed, and padding the labels to a length of 16 with a fixed value M;

S4, inputting the vectors processed in S3 into the LSTM neural network model, setting the size of the output list, using cross entropy to calculate the model loss, and training several times to obtain model parameters with low loss and good performance, which are used for generating the password dictionary;

S5, generating a password dictionary using the model obtained in S4.
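The input and label construction in S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the value `M = 0` and the function name `make_example` are hypothetical (the source only says "a fixed value M"), and the characters are left as plain characters where the real pipeline would substitute their word2vec vectors from S2.

```python
MAX_LEN = 16
M = 0  # hypothetical choice; the source only specifies "a fixed value M"


def make_example(password: str):
    """Return (inputs, labels), each of length MAX_LEN, for one password."""
    # Input: the password's characters, padded with spaces to 16 positions.
    # In the full pipeline each character would be replaced by its embedding.
    inputs = list(password.ljust(MAX_LEN))
    # Labels: ASCII codes of the characters with the first one removed,
    # so position i is labelled with the character that follows it.
    labels = [ord(c) for c in password[1:]]
    # Pad the labels to 16 positions with the fixed value M.
    labels += [M] * (MAX_LEN - len(labels))
    return inputs, labels
```

Because the labels are shifted by one position, the model is trained to predict the next character at every step, and the first M after the last real character acts as an end marker, which matches the stopping condition used later during generation.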

Preferably, in S1, the screening method is to screen out passwords composed of letters, numbers and symbols with a length of 8-16 characters, and the data set is divided into a training set, a validation set and a test set.
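A minimal sketch of this screening and splitting step is given below. It reads the stated length of 8-16 as a character count and assumes that "letters, numbers and symbols" means printable ASCII letters, digits and punctuation. The 80/10/10 split ratio, the seed and the function names are assumptions for illustration; the source does not state them.

```python
import random
import string

MIN_LEN, MAX_LEN = 8, 16
# Assumed character set: ASCII letters, digits and punctuation (no whitespace).
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation)


def is_valid(password: str) -> bool:
    """Keep passwords of 8-16 characters drawn only from the allowed set."""
    return MIN_LEN <= len(password) <= MAX_LEN and all(c in ALLOWED for c in password)


def split_dataset(passwords, ratios=(0.8, 0.1, 0.1), seed=0):
    """Filter, shuffle and split into training, validation and test sets.

    The ratios are hypothetical; the source only names the three sets.
    """
    items = [p for p in passwords if is_valid(p)]
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

If the intended rule is instead that every password must contain all three character classes, `is_valid` would need an additional check per class.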

Preferably, in S3, the value to output is finally determined. The output is based on the cell state, but is a filtered version of it: first a sigmoid layer is run to decide which parts of the cell state to output; then the cell state is passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.
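The output step described above can be sketched in plain Python. The function and argument names are illustrative only, and the gate pre-activation (the weighted sum that is fed to the sigmoid layer) is taken as given rather than computed from learned weights.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_output(gate_preact, cell_state):
    """Element-wise LSTM output: sigmoid(gate) * tanh(cell state).

    gate_preact: pre-activation values of the output gate (assumed given).
    cell_state:  current cell state values, same length as gate_preact.
    """
    return [sigmoid(g) * math.tanh(c) for g, c in zip(gate_preact, cell_state)]
```

A gate value near 1 lets the squashed cell state through almost unchanged, while a gate value near 0 suppresses it, which is the "filtering" the text refers to.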

Preferably, in S5, the generating of the password dictionary specifically includes the following steps:

S1, obtaining the probability of each first character over all password data for later use, and initializing an empty set S;

S2, setting the size of the password dictionary to N, and generating password sequences in a loop for as long as the number of passwords in S, count(S), is less than N;

S3, initializing an empty password, randomly selecting one of the lambda characters with the highest probability, and adding it to password;

S4, generating password characters in a loop for as long as the number of characters in password does not exceed the maximum password length; passing the contents of password to the prediction function of the model to obtain the probability of the next character;

S5, randomly selecting one of the lambda characters with the highest probability and adding it to password, until the selected character is M or the number of characters in password equals the maximum password length;

S6, when the number of generated passwords equals the size of the password dictionary, a password dictionary of the expected size N is obtained.
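The generation loop in these steps can be sketched as follows. This is an illustration under assumptions, not the patented implementation: `predict_next` is a hypothetical stand-in for the model's prediction function, the end marker `M` is represented by a NUL character, S is treated as a de-duplicating set, and `max_attempts` is an added safeguard that the source does not mention.

```python
import random

M = "\0"  # hypothetical stand-in for the fixed end value M


def generate_dictionary(first_char_probs, predict_next, n, lam=3,
                        max_len=16, rng=None, max_attempts=None):
    """Generate up to n distinct passwords by sampling among the top-lam characters.

    first_char_probs: dict mapping a character to its probability of starting
                      a password (step S1).
    predict_next:     hypothetical model call; takes the current prefix and
                      returns a dict mapping each candidate to a probability.
    """
    rng = rng or random.Random()
    max_attempts = max_attempts if max_attempts is not None else 100 * n

    def pick_top(probs):
        # Keep the lam most probable candidates, then choose one at random.
        top = sorted(probs, key=probs.get, reverse=True)[:lam]
        return rng.choice(top)

    result = set()  # the set S
    attempts = 0
    while len(result) < n and attempts < max_attempts:
        attempts += 1
        password = [pick_top(first_char_probs)]           # step S3
        while len(password) < max_len:                    # step S4
            nxt = pick_top(predict_next("".join(password)))
            if nxt == M:                                  # step S5 stop condition
                break
            password.append(nxt)
        result.add("".join(password))
    return result
```

Choosing at random among the top lambda candidates, rather than always taking the single most probable character, is what lets the loop produce many different passwords from the same model; a larger lambda gives more variety at the cost of less likely guesses.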

Advantageous effects

The invention aims to improve on existing non-intelligent password cracking methods such as traversing weak-password dictionaries and brute-force cracking. Compared with PCFG (probabilistic context-free grammar), Markov models, GANs and the like, LSTM performs better on natural language processing tasks, so the characters that make up a password are treated as words in a natural language. Applying the LSTM method yields a password dictionary that matches the way humans set passwords and has a higher hit rate, improves on weak-password dictionaries and brute-force cracking to a certain extent, and can be applied in various security scenarios.

Drawings

FIG. 1 is a flow diagram of the deep learning-based password dictionary generation technique of the present invention;

FIG. 2 is a schematic diagram of the LSTM neural network according to the present invention;

FIG. 3 is a flowchart of password dictionary generation in accordance with the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIGS. 1-3, the present invention provides a technical solution: a deep learning-based password dictionary generation technique that specifically comprises the following steps:

S5, generating a password dictionary using the model obtained in S4. Contents not described in detail in this specification are prior art known to those skilled in the art.

In the present invention, in S1, the screening method is to screen out passwords composed of letters, numbers and symbols with a length of 8-16 characters, and to divide the data set into a training set, a validation set and a test set.

In the present invention, in S3, the value to output is finally determined. The output is based on the cell state, but is a filtered version of it: first a sigmoid layer is run to decide which parts of the cell state to output; then the cell state is passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.
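In the usual LSTM notation, this output step can be written as the two equations below. The symbols are assumptions taken from the standard formulation and do not appear in the source: $x_t$ is the current input, $h_{t-1}$ the previous hidden state, $C_t$ the cell state, $W_o$ and $b_o$ the output gate's weights and bias, $\sigma$ the sigmoid function and $\odot$ element-wise multiplication.

```latex
o_t = \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right)
\qquad
h_t = o_t \odot \tanh(C_t)
```

Since $o_t$ lies in $(0, 1)$ and $\tanh(C_t)$ lies in $(-1, 1)$, every component of $h_t$ also lies in $(-1, 1)$.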

In the present invention, in S5, the password dictionary generation specifically includes the following steps:

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A deep learning-based password dictionary generation technology, characterized in that it specifically comprises the following steps:

S5, generating a password dictionary using the model obtained in S4.

2. The deep learning-based password dictionary generation technology according to claim 1, wherein: in S1, the screening method is to screen out passwords composed of letters, numbers and symbols with a length of 8-16 characters, and to divide the data set into a training set, a validation set and a test set.

3. The deep learning-based password dictionary generation technology of claim 4, wherein: in S3, the value to output is finally determined; the output is based on the cell state, but is a filtered version of it: first a sigmoid layer is run to decide which parts of the cell state to output, then the cell state is passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.

4. The deep learning-based password dictionary generation technology according to claim 1, wherein: in S5, the password dictionary generation specifically includes the following steps: