CN111241534A - Password guess set generation system and method - Google Patents


Info

Publication number
CN111241534A
Authority
CN
China
Prior art keywords
probability
password
neural network
long
term memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010033647.XA
Other languages
Chinese (zh)
Inventor
杨超
张静
尤伟
郑昱
闫志成
朱泉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010033647.XA
Publication of CN111241534A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45 Structures or tools for the administration of authentication
    • G06F21/46 Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention belongs to the technical field of information security and discloses a password guess set generation system and method. The method generates a probabilistic context-free grammar (PCFG) based on personal information and a password database; divides the character strings in the grammar, according to a classification rule, into strings that are suitable and strings that are not suitable for training a long short-term memory (LSTM) neural network; trains an LSTM neural network model to convergence; generates password segments and their corresponding probabilities using the converged model; maps the probability of each password segment to a new probability and sorts the password segments in descending order of the new probability; and generates passwords sorted in descending order of probability. The invention compensates for the inability of the LSTM neural network to identify the composition structure and semantic information of passwords and for its poor interpretability; it overcomes the poor generalization ability of the probabilistic context-free grammar; and it solves the problem that the probabilities of password segments generated by the LSTM neural network are low.

Description

Password guess set generation system and method
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a password guess set generation system and method.
Background
Currently, the closest prior art is as follows. With the continuous development of networks and portable electronic devices, daily life has become closely tied to the network, and identity authentication has gradually become a basic means of protecting user information. Because passwords are simple and easy to deploy, they are widely used for identity authentication and have become one of the most important means of protecting user information on the Internet. The security and usability of passwords have therefore long been a significant concern.
Password guessing algorithms are one of the important methods for studying password security. Their basic idea is to generate a set of password guesses for guessing a user's password. The most primitive method enumerates all passwords that meet the conditions, but because the password search space is extremely large, this method is impractical and extremely inefficient. Later, people tried selecting frequently used words or numbers from a dictionary and combining or slightly modifying them to form a password guess set. These methods are relatively simple, fail to accurately describe how people construct passwords, and do not assign probabilities to the passwords, so guessing is inefficient. Probabilistic models, such as the probabilistic context-free grammar (PCFG) and the Markov model, were then used to generate password guess sets. However, the Markov model can only calculate the probability of a character given the preceding characters and cannot recognize the composition structure and semantic information of a password, while the probabilistic context-free grammar can recognize the structure and part of the semantic information of a password but cannot generalize and can only combine password segments that appear in the training set. Recently, methods have appeared that generate password guess sets with a long short-term memory (LSTM) neural network or a generative adversarial network (GAN). These methods exploit the ability of neural networks to fit high-dimensional spaces, achieve better guessing results, and generalize better, but, like the Markov model, they cannot identify the composition structure and semantic information of a password.
In summary, the problem with the prior art is that existing password guess set generation techniques cannot simultaneously account for the composition structure of passwords, their semantic information, and generalization ability, so the generated password guess sets do not accurately describe how people construct passwords.
The difficulty of solving the technical problems is as follows:
Because existing password guess set generation techniques use different probability models, the probabilities of the generated passwords follow different scales. For the same password, the probabilities given by different techniques differ, sometimes by several orders of magnitude. The passwords generated by different models therefore cannot simply be merged and sorted in descending order of the probabilities assigned by their respective models.
The significance of solving the technical problems is as follows:
Solving these problems helps to further understand how people construct passwords and thereby reduce weak passwords; to further understand the capability of password guessing attackers so that countermeasures can be devised; to deepen the understanding of password security and usability and help system administrators prevent the use of weak passwords; and to speed up password guessing, providing better technical support for digital forensics.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a password guess set generation system and a method.
The invention is realized in such a way that a password guess set generating method comprises the following steps:
first, generating a probabilistic context-free grammar based on personal information and a password database, and dividing the character strings in the grammar, according to a classification rule, into strings that are suitable and strings that are not suitable for training a long short-term memory (LSTM) neural network;
second, training an LSTM neural network model to convergence, and generating password segments and their corresponding probabilities using the converged model;
third, mapping the probability of each password segment to a new probability, sorting the password segments in descending order of the new probability, and generating passwords sorted in descending order of probability.
Further, in the first step, according to the personal information patterns, the passwords in the personal information and password database are matched, divided, and marked with personal information patterns using a longest prefix matching algorithm; the parts that do not match any personal information are marked L, D, or S according to whether they are letters, digits, or special symbols, with a subscript indicating the length of the part; the probability of each character string in each pattern is then calculated; and finally the basic terminals and the character strings of the same pattern are sorted in descending order of probability to generate the probabilistic context-free grammar based on personal information and the password database.
Further, the personal information mode includes;
matching, dividing, and marking the patterns means that the longest prefix matching algorithm is used for matching, the password is then divided into segments, and each segment is marked according to the matched pattern;
the probability of a personal information pattern is 1, because the personal information is fixed;
for character strings whose pattern is L, D, or S, the probability of each character string in each pattern is calculated as:
$$p = \frac{n}{m}$$
where n is the number of occurrences of the character string and m is the number of occurrences of the pattern.
Further, the first step classifies the character strings in the generated probabilistic context-free grammar according to the following classification rule: if the pattern of a character string is a personal information pattern, the string is not suitable for training the LSTM neural network; if the pattern is L, D, or S and the length is less than or equal to 4, the string is not suitable for training the LSTM neural network; if the pattern is L, D, or S and the length is greater than 4, the string is suitable for training the LSTM neural network.
Further, in the second step, for each character string suitable for training the LSTM neural network, the characters of the string at successive time steps are fed in order into the input layer of the LSTM neural network, and the output of the network is a prediction of the character at the next time step in the string;
the loss function used in training the LSTM neural network is the cross-entropy loss function;
the optimization algorithm used in training the LSTM neural network is Adam or another adaptive learning rate method;
convergence means that the parameters of the LSTM neural network cause the value of the loss function to converge, in the mathematical sense, to a certain value.
Further, in the third step, the probabilities of the generated password segments whose probability is greater than a certain threshold are mapped to new probabilities according to a mapping function, and the password segments are sorted in descending order of the new probability, wherein the mapping function is:
$$p_{\text{new}} = \frac{p_{\text{old}}}{\sum p_i}$$
where $p_{\text{old}}$ is the probability of a given generated password segment whose probability is greater than $p_{\text{threshold}}$, $\sum p_i$ is the sum of the probabilities of all generated password segments whose probability is greater than $p_{\text{threshold}}$, and $p_{\text{threshold}}$ may be chosen freely according to the desired size of the password guess set.
Further, passwords and their probabilities are generated from the basic terminals and their probabilities in the generated probabilistic context-free grammar, the character strings classified as not suitable for training the LSTM neural network and their probabilities, and the mapped password segments and their new probabilities, and a next function is used to produce the passwords in descending order of probability. A password is generated by replacing, one by one, each pattern in a basic terminal: a personal information pattern is replaced by the guessing target's personal information for that pattern; an L, D, or S pattern whose subscript is less than or equal to 4 is replaced by a corresponding character string in the probabilistic context-free grammar; and an L, D, or S pattern whose subscript is greater than 4 is replaced by a password segment mapped by the probability mapping module. The probability of a password is calculated by multiplying the probability of the basic terminal by the probabilities of the corresponding character strings that are not suitable for training the LSTM neural network and the new probabilities of the password segments mapped by the probability mapping module; the final password probability is calculated by
Figure BDA0002365244830000042
It is another object of the present invention to provide a password guess set generating system implementing the password guess set generating method, the password guess set generating system comprising:
the grammar generation module is used for generating a probability context-free grammar based on personal information and a password database;
the character string classification module is used for classifying, according to a classification rule, the character strings in the probabilistic context-free grammar generated by the grammar generation module into strings that are suitable and strings that are not suitable for training a long short-term memory (LSTM) neural network;
the model training module is used for training the LSTM neural network with the character strings that the character string classification module classified as suitable for training, until a converged LSTM neural network model is obtained;
the password segment generation module is used for generating password segments and their corresponding probabilities using the converged LSTM neural network model trained by the model training module;
the probability mapping module is used for mapping the probability of each password segment to a new probability according to a mapping function and sorting the password segments in descending order of the new probability;
and the password generation module is used for generating passwords sorted in descending order of probability, using the basic terminals and their probabilities in the probabilistic context-free grammar generated by the grammar generation module, the character strings that the character string classification module classified as not suitable for training the LSTM neural network and their probabilities, and the password segments and new probabilities mapped by the probability mapping module.
In summary, the advantages and positive effects of the invention are as follows. The invention uses a probabilistic context-free grammar to identify the composition structure and semantic information of passwords, and uses a long short-term memory (LSTM) neural network to model some of the password segments so as to generate password segments that do not appear in the training set. Compared with the prior art, the probabilistic context-free grammar compensates for the inability of the LSTM neural network to identify the composition structure and semantic information of passwords and for its poor interpretability; the LSTM neural network overcomes the poor generalization ability of the probabilistic context-free grammar; and probability mapping solves the problem that the probabilities of password segments generated by the LSTM neural network are low, improving the quality of the password guess set.
Drawings
FIG. 1 is a schematic structural diagram of a password guess set generating system according to an embodiment of the present invention;
In the figure: 1, grammar generation module; 2, character string classification module; 3, model training module; 4, password segment generation module; 5, probability mapping module; 6, password generation module.
Fig. 2 is a flowchart of a password guess set generating method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a grammar generation module provided in an embodiment of the present invention.
Fig. 4 is a schematic diagram of a password generation module according to an embodiment of the present invention.
FIG. 5 is a diagram comparing the guessing results of the prior art and of the present invention, provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a system and a method for generating a guessing set of password, and the present invention will be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the input of the password guess set generation system provided in the embodiment of the present invention is the personal information of users and the corresponding passwords, where the personal information includes a user name, an email address, a name, a birth date, an identification number, a mobile phone number, and the like. The system specifically comprises:
the grammar generation module 1 is used for generating a probability context-free grammar based on personal information and a password database;
the character string classification module 2 is used for classifying, according to a classification rule, the character strings in the probabilistic context-free grammar generated by the grammar generation module 1 into strings that are suitable and strings that are not suitable for training a long short-term memory (LSTM) neural network;
the model training module 3 is used for training the LSTM neural network with the character strings that the character string classification module 2 classified as suitable for training, until a converged LSTM neural network model is obtained;
the password segment generation module 4 is used for generating password segments and their corresponding probabilities using the converged LSTM neural network model trained by the model training module 3;
the probability mapping module 5 is used for mapping the probability of each password segment to a new probability according to a mapping function and sorting the password segments in descending order of the new probability;
and the password generation module 6 is used for generating passwords sorted in descending order of probability, using the basic terminals and their probabilities in the probabilistic context-free grammar generated by the grammar generation module 1, the character strings that the character string classification module 2 classified as not suitable for training the LSTM neural network and their probabilities, and the password segments and new probabilities mapped by the probability mapping module 5.
As shown in fig. 2, the password guess set generating method provided by the embodiment of the present invention includes the following steps:
S201: the grammar generation module generates a probabilistic context-free grammar based on personal information and a password database; the character string classification module classifies the character strings in the grammar, according to a classification rule, into strings that are suitable and strings that are not suitable for training a long short-term memory (LSTM) neural network.
S202: the model training module trains an LSTM neural network model to convergence; the password segment generation module generates password segments and their corresponding probabilities using the converged model.
S203: the probability mapping module maps the probability of each password segment to a new probability and sorts the password segments in descending order of the new probability; the password generation module generates passwords sorted in descending order of probability.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
The password guess set generation method provided by the embodiment of the invention is based on the probability context-free grammar and the long-short term memory neural network, and comprises the following steps:
(1) As shown in FIG. 3, according to the personal information patterns, the grammar generation module 1 uses a longest prefix matching algorithm to match, divide, and mark the passwords in the personal information and password database with personal information patterns; the parts that do not match any personal information are marked L, D, or S according to whether they are letters, digits, or special symbols, with a subscript indicating the length of the part; the probability of each character string in each pattern is then calculated; and finally the basic terminals and the character strings of the same pattern are sorted in descending order of probability to generate the probabilistic context-free grammar based on personal information and the password database.
The personal information patterns include all patterns shown in Table 1.
Matching, dividing, and marking the patterns means that the longest prefix matching algorithm is used for matching, the password is then divided into segments, and each segment is marked according to the matched pattern. For example, the 2nd password "ls19850102" in FIG. 3 can be matched to "A₁B₄" and can also be matched to "N₁₅B₁", but since "ls1985" is longer than "ls", "ls1985" is matched to "A₁" rather than "ls" being matched to "N₁₅".
The probability of a personal information pattern is 1, because the personal information in the pattern is fixed for a given person, so no other character string is possible.
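The matching, dividing, and marking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `personal_info` dictionary, and the assignment of the tags A1, B4, N15, and B1 to concrete strings are assumptions made for the example, since Table 1 is not reproduced here.

```python
def char_class(c):
    """Return L for letters, D for digits, S for any other symbol."""
    if c.isalpha():
        return "L"
    if c.isdigit():
        return "D"
    return "S"


def longest_match(password, i, personal_info):
    """Longest personal-information string that starts at position i, or None."""
    best = None
    for tag, value in personal_info.items():
        if value and password.startswith(value, i):
            if best is None or len(value) > len(best[1]):
                best = (tag, value)
    return best


def segment_password(password, personal_info):
    """Split a password into (pattern, string) segments.

    personal_info maps a hypothetical pattern tag (e.g. "A1") to the
    user's string for that pattern. Unmatched characters are grouped into
    runs of one character class and tagged with the class and run length.
    """
    segments = []
    i = 0
    while i < len(password):
        match = longest_match(password, i, personal_info)
        if match:
            segments.append(match)
            i += len(match[1])
            continue
        cls = char_class(password[i])
        j = i + 1
        # Extend the run until the class changes or personal information begins.
        while (j < len(password) and char_class(password[j]) == cls
               and not longest_match(password, j, personal_info)):
            j += 1
        segments.append((f"{cls}{j - i}", password[i:j]))
        i = j
    return segments
```

With the assumed mapping `{"A1": "ls1985", "B4": "0102", "N15": "ls", "B1": "19850102"}`, the password "ls19850102" is segmented as A1 followed by B4, because "ls1985" is the longest string matching at the first position.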
For character strings whose pattern is L, D, or S, the probability of each character string in each pattern is calculated as:
$$p = \frac{n}{m}$$
where n is the number of occurrences of the character string and m is the number of occurrences of the pattern.
Descending sort may use any sort algorithm.
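As a sketch of the probability calculation and descending sort, the following counts how often each string occurs within its pattern and divides by the number of occurrences of the pattern (n divided by m in the notation above). The function name and the input format, a list of (pattern, string) pairs, are assumptions for illustration.

```python
from collections import Counter, defaultdict


def string_probabilities(tagged_segments):
    """Map each pattern to its strings with probability n / m, sorted descending.

    tagged_segments: iterable of (pattern, string) pairs such as ("D4", "1234").
    """
    counts = defaultdict(Counter)
    for pattern, string in tagged_segments:
        counts[pattern][string] += 1

    grammar = {}
    for pattern, counter in counts.items():
        m = sum(counter.values())  # occurrences of the pattern
        # Sort by count descending; ties are broken alphabetically.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        grammar[pattern] = [(s, n / m) for s, n in ranked]
    return grammar
```

For example, if the pattern D4 occurs three times, twice as "1234" and once as "2020", the strings receive probabilities 2/3 and 1/3.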
(2) The character string classification module 2 classifies the character strings in the probabilistic context-free grammar generated in step (1) according to the following classification rule: if the pattern of a character string is a personal information pattern, the string is not suitable for training the LSTM neural network; if the pattern is L, D, or S and the length is less than or equal to 4, the string is not suitable for training the LSTM neural network; if the pattern is L, D, or S and the length is greater than 4, the string is suitable for training the LSTM neural network.
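The classification rule can be expressed as a short predicate. The sketch below assumes that patterns are written as a tag string such as "L5" or "A1" and that personal information tags never begin with L, D, or S; both are assumptions of this example, not statements about Table 1.

```python
import re

# An L, D, or S pattern followed by its length, e.g. "L5" or "D12".
LDS_PATTERN = re.compile(r"^[LDS](\d+)$")


def suitable_for_lstm(pattern):
    """Return True if strings of this pattern are used to train the LSTM."""
    match = LDS_PATTERN.match(pattern)
    if match is None:
        return False  # treated as a personal information pattern
    return int(match.group(1)) > 4
```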
(3) The model training module 3 trains a long short-term memory (LSTM) neural network model: the character strings classified in step (2) as suitable for training are used to train the LSTM neural network until a converged model is obtained.
For each character string suitable for training the LSTM neural network, the characters of the string at successive time steps are fed in order into the input layer of the LSTM neural network, and the output of the network is a prediction of the character at the next time step in the string.
The loss function used in training the LSTM neural network is the cross-entropy loss function.
The optimization algorithm used in training the LSTM neural network is Adam or another adaptive learning rate method.
Convergence means that the parameters of the LSTM neural network cause the value of the loss function to converge, in the mathematical sense, to a certain value.
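To make the training objective concrete, the following sketch shows how a string can be turned into next-character prediction examples and how the cross-entropy loss is computed for one prediction. It does not train an LSTM; a real implementation would use a deep learning framework. The end-of-string marker, the empty starting prefix, and the function names are assumptions of this example.

```python
import math

END = "\n"  # assumed end-of-string marker; the text does not specify one


def next_char_pairs(string):
    """Return (prefix, next character) training examples for one string."""
    padded = string + END
    return [(padded[:t], padded[t]) for t in range(len(padded))]


def cross_entropy(predicted, target):
    """Cross-entropy loss for a single prediction.

    predicted: dict mapping each candidate character to its probability.
    target: the character that actually comes next.
    """
    return -math.log(predicted[target])


def mean_loss(predictions, targets):
    """Average cross-entropy over a sequence of predictions."""
    losses = [cross_entropy(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)
```

The loss is zero when the model assigns probability 1 to the correct next character and grows as that probability shrinks, which is why minimizing it pushes the model toward the character statistics of the training strings.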
(4) The password segment generation module 4 generates password segments and their corresponding probabilities using the converged LSTM neural network model obtained in step (3).
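The text does not specify how segments are enumerated from the model. One possible approach, sketched here under that assumption, multiplies next-character probabilities along each prefix and prunes any prefix whose probability has already fallen to the threshold or below. The callable `next_char_probs` is a hypothetical stand-in for the converged LSTM, and the end marker and `max_len` limit are also assumptions.

```python
END = "\n"  # assumed end-of-string marker


def enumerate_segments(next_char_probs, threshold, max_len=16):
    """Return [(segment, probability)] for segments with probability > threshold.

    next_char_probs(prefix) must return a dict mapping each possible next
    character (or END) to its conditional probability given the prefix.
    """
    results = []
    stack = [("", 1.0)]
    while stack:
        prefix, p = stack.pop()
        for ch, q in next_char_probs(prefix).items():
            joint = p * q
            if joint <= threshold:
                continue  # any extension can only be less probable
            if ch == END:
                results.append((prefix, joint))
            elif len(prefix) < max_len:
                stack.append((prefix + ch, joint))
    return results
```

Because each conditional probability is at most 1, the probability of a prefix is an upper bound on the probability of every segment that extends it, so pruning at the threshold discards no qualifying segment.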
(5) The probability mapping module 5 maps the probabilities of the password segments generated in step (4) whose probability is greater than a certain threshold to new probabilities according to a mapping function, and sorts the password segments in descending order of the new probability, wherein the mapping function is:
$$p_{\text{new}} = \frac{p_{\text{old}}}{\sum p_i}$$
where $p_{\text{old}}$ is the probability of a given password segment generated in step (4) whose probability is greater than $p_{\text{threshold}}$, $\sum p_i$ is the sum of the probabilities of all password segments generated in step (4) whose probability is greater than $p_{\text{threshold}}$, and $p_{\text{threshold}}$ may be chosen freely according to the desired size of the password guess set.
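A sketch of the probability mapping follows. It assumes the mapping function divides each retained probability by the sum of all retained probabilities, which is how the definitions of the old probability and the summed probabilities above are read here; the original formula image is not available, so this normalization is an assumption.

```python
def map_probabilities(segments, threshold):
    """Keep segments above the threshold and rescale their probabilities.

    segments: list of (segment, probability) pairs from the segment generator.
    Returns (segment, new probability) pairs sorted by descending probability.
    """
    kept = [(s, p) for s, p in segments if p > threshold]
    total = sum(p for _, p in kept)
    # Assumed mapping: new probability = old probability / sum of kept probabilities.
    mapped = [(s, p / total) for s, p in kept]
    mapped.sort(key=lambda sp: sp[1], reverse=True)
    return mapped
```

Under this reading the new probabilities of the retained segments sum to 1, which puts them on the same scale as the string probabilities in the grammar and is consistent with the stated aim of raising the low probabilities produced by the network.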
(6) As shown in FIG. 4, the password generation module 6 generates passwords and their probabilities (31 in FIG. 4) by using the basic terminals and their probabilities (31 in FIG. 4) in the probabilistic context-free grammar generated in step (1), the character strings classified in step (2) as not suitable for training the LSTM neural network and their probabilities (33 in FIG. 4), and the password segments and new probabilities mapped in step (5), and uses a next function to produce the passwords in descending order of probability. A password is generated by replacing, one by one, each pattern in a basic terminal: a personal information pattern is replaced by the guessing target's personal information for that pattern (34 in FIG. 4); an L, D, or S pattern whose subscript is less than or equal to 4 is replaced by a corresponding character string in the probabilistic context-free grammar; and an L, D, or S pattern whose subscript is greater than 4 is replaced by a password segment mapped by the probability mapping module. The probability of a password is calculated by multiplying the probability of the basic terminal by the probabilities of the corresponding character strings that are not suitable for training the LSTM neural network and the new probabilities of the password segments mapped by the probability mapping module. For example, the probability of the last password in unit 35 of FIG. 4 is calculated as
Figure BDA0002365244830000092
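The text names a next function but does not define it. The sketch below uses a priority queue to emit passwords in descending order of probability, which is one common way to realize such a function and is an assumption here. The data layout is also assumed: each basic terminal is a tuple of pattern tags with a probability, and each tag maps to candidate strings sorted by descending probability, with personal information tags holding the target's own string at probability 1.

```python
import heapq
from math import prod


def generate_guesses(structures, candidates, limit):
    """Return up to `limit` (password, probability) pairs, most probable first.

    structures: list of (tuple of pattern tags, probability).
    candidates: dict tag -> list of (string, probability), sorted descending.
    """
    def score(s, choice):
        tags, p_struct = structures[s]
        return p_struct * prod(candidates[t][i][1] for t, i in zip(tags, choice))

    heap, seen = [], set()
    for s, (tags, _) in enumerate(structures):
        if all(candidates.get(t) for t in tags):
            start = (0,) * len(tags)  # most probable string for every tag
            heapq.heappush(heap, (-score(s, start), s, start))
            seen.add((s, start))

    out = []
    while heap and len(out) < limit:
        neg_p, s, choice = heapq.heappop(heap)
        tags, _ = structures[s]
        password = "".join(candidates[t][i][0] for t, i in zip(tags, choice))
        out.append((password, -neg_p))
        # Successors: move one slot to its next most probable string.
        for k, tag in enumerate(tags):
            nxt = choice[:k] + (choice[k] + 1,) + choice[k + 1:]
            if nxt[k] < len(candidates[tag]) and (s, nxt) not in seen:
                seen.add((s, nxt))
                heapq.heappush(heap, (-score(s, nxt), s, nxt))
    return out
```

Because every candidate list is sorted in descending order, a successor is never more probable than the entry it was derived from, so the queue yields passwords in descending order without enumerating the whole search space first.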
TABLE 1
Figure BDA0002365244830000093
Figure BDA0002365244830000101
Figure BDA0002365244830000111
The invention is further described below in conjunction with specific experiments/data.
Taking the personal information and password data set leaked from 12306 as an example, the data set contains 131653 records. 65825 records are randomly selected as the training set, and the remaining 65828 records are used as the test set. Existing password guess set generation techniques and the system and method provided by the invention are each trained on the personal information and passwords in the training set to obtain corresponding password guess sets. The number of test-set passwords covered by each password guess set is shown in FIG. 5.
As can be seen from FIG. 5, for the same guess set size, the password guess set generated by the system and method of the invention (Hybrid in FIG. 5) covers more passwords in the test set. This indicates that the generated guess set better matches how people construct passwords, which helps people reduce the use of weak passwords; helps them better understand the capability of password guessing attackers so that countermeasures can be devised; helps system administrators better understand password security and usability and prevent the use of weak passwords; and speeds up password guessing, providing better technical support for digital forensics.
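The coverage metric used in this comparison can be sketched as follows. The function name and checkpoint interface are assumptions, as is the choice to count every test-set record whose password appears among the first k guesses, so that a password shared by several records counts once per record.

```python
from collections import Counter


def coverage_counts(guesses, test_passwords, checkpoints):
    """Number of test-set passwords covered by the first k guesses, for each k.

    guesses: passwords in the order they would be tried.
    test_passwords: one password per test-set record.
    checkpoints: guess set sizes k at which coverage is reported.
    """
    targets = Counter(test_passwords)
    seen, covered, cumulative = set(), 0, [0]
    for guess in guesses:
        if guess not in seen:  # a repeated guess covers nothing new
            seen.add(guess)
            covered += targets.get(guess, 0)
        cumulative.append(covered)
    last = len(cumulative) - 1
    return {k: cumulative[min(k, last)] for k in checkpoints}
```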
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A password guess set generating method, characterized in that the password guess set generating method comprises the steps of:
firstly, generating a probabilistic context-free grammar based on a personal information and password database; dividing the character strings in the probabilistic context-free grammar, according to a classification rule, into character strings suitable for training a long-short term memory neural network and character strings not suitable for training the long-short term memory neural network;
secondly, training the long-short term memory neural network until a converged model is obtained; generating password segments and the probabilities corresponding to the password segments by using the converged long-short term memory neural network model;
thirdly, mapping the probability corresponding to each password segment to a new probability, and sorting the password segments in descending order of the new probability; generating passwords sorted in descending order of probability.
2. The password guess set generating method as claimed in claim 1, wherein in the first step the passwords in the personal information and password database are matched against the personal information patterns, divided and marked by using a longest prefix matching algorithm; the parts that do not match personal information are marked L, D or S according to whether they are letters, digits or special symbols, with the length of each part represented by a subscript; the probability of each character string in each pattern is then calculated; and finally the basic terminals, and the character strings of the same pattern, are sorted in descending order of probability to generate the probabilistic context-free grammar based on the personal information and password database.
3. The password guess set generating method as in claim 2, wherein said personal information pattern includes;
matching, dividing and marking the patterns means that a longest prefix matching algorithm is used for matching, the password is then divided into segments, and each segment is marked according to the pattern it matches;
the probability of a personal information pattern is 1, the personal information being fixed;
for character strings whose pattern is L, D or S, the probability of each character string in each pattern is calculated by the formula:
$$p = \frac{n}{m}$$
where n is the number of occurrences of the character string and m is the number of occurrences of the pattern.
4. The password guess set generating method of claim 1, wherein the first step classifies the character strings in the generated probabilistic context-free grammar according to the following classification rules: if the pattern of a character string is a personal information pattern, the character string is not suitable for training the long-short term memory neural network; if the pattern of a character string is L, D or S and its length is less than or equal to 4, the character string is not suitable for training the long-short term memory neural network; if the pattern of a character string is L, D or S and its length is greater than 4, the character string is suitable for training the long-short term memory neural network.
5. The password guess set generating method as recited in claim 1, wherein in the second step the characters of each character string suitable for training the long-short term memory neural network are fed sequentially, one per time step, into the input layer of the long-short term memory neural network, and the output of the long-short term memory neural network is a prediction of the character at the next time step in the character string;
the loss function used in training the long-short term memory neural network is the cross-entropy loss function;
the optimization algorithm used in training the long-short term memory neural network is Adam or another adaptive learning rate method;
convergence means that the parameters of the long-short term memory neural network are such that the value of the loss function mathematically converges to a certain value.
6. The password guess set generating method of claim 1, wherein said third step maps the probabilities of password segments having a generated probability greater than a certain threshold to new probabilities by a mapping function and sorts the password segments in descending order of the new probabilities, the mapping function being:
Figure FDA0002365244820000021
wherein $p_{\text{old}}$ denotes the probability of a generated password segment whose probability is greater than $p_{\text{threshold}}$, $\sum p_i$ denotes the sum of the probabilities of all generated password segments whose probability is greater than $p_{\text{threshold}}$, and $p_{\text{threshold}}$ may be chosen freely according to the desired size of the password guess set.
7. The password guess set generating method of claim 1, wherein in the third step the passwords and their probabilities are generated from the basic terminals and their probabilities in the generated probabilistic context-free grammar, the character strings classified as not suitable for training the long-short term memory neural network and their probabilities, and the mapped password segments and their new probabilities, and a next function is used to generate the passwords sorted in descending order of probability; each password is generated by replacing the patterns in a basic terminal one by one, wherein a personal information pattern is replaced with the corresponding personal information of the guessing target, a pattern L, D or S whose subscript is less than or equal to 4 is replaced with a corresponding character string in the probabilistic context-free grammar, and a pattern L, D or S whose subscript is greater than 4 is replaced with a password segment mapped by the probability mapping module; the probability of a password is calculated by multiplying the probability of the basic terminal by the probability of each corresponding character string not suitable for training the long-short term memory neural network or by the new probability of each password segment mapped by the probability mapping module; the last password probability is calculated as $p(\text{456\$ainiya}) = p(D_3 S_1 L_6)\cdot p(D_3 \rightarrow \text{"456"})\cdot p(S_1 \rightarrow \text{"\$"})\cdot p(L_6 \rightarrow \text{"ainiya"}) = 0.17 \times 0.33 \times 0.33 \times 0.08 = 0.00148104$.
8. A password guess set generating system for implementing the password guess set generating method as recited in any one of claims 1 to 7, said password guess set generating system comprising:
a grammar generation module, used for generating a probabilistic context-free grammar based on a personal information and password database;
a character string classification module, used for classifying, according to classification rules, the character strings in the probabilistic context-free grammar generated by the grammar generation module into character strings suitable for training the long-short term memory neural network and character strings not suitable for training the long-short term memory neural network;
a model training module, used for training the long-short term memory neural network with the character strings that the character string classification module has classified as suitable for training, until a converged long-short term memory neural network model is obtained;
a password segment generation module, used for generating password segments and the probabilities corresponding to the password segments by using the converged long-short term memory neural network model trained by the model training module;
a probability mapping module, used for mapping the probability corresponding to each password segment to a new probability according to a mapping function and sorting the password segments in descending order of the new probability;
and a password generation module, used for generating passwords sorted in descending order of probability by using the basic terminals and their probabilities in the probabilistic context-free grammar generated by the grammar generation module, the character strings and probabilities classified by the character string classification module as not suitable for training the long-short term memory neural network, and the password segments and new probabilities mapped by the probability mapping module.
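The segmentation, probability and classification rules of claims 2 to 4, together with the threshold mapping of claim 6, can be sketched in Python as below. Several points are assumptions rather than statements of the claimed method: the personal-information tags (such as `NAME`) are hypothetical, the segmentation is a simplified greedy reading of "longest prefix matching", and, because the mapping function survives only as an image reference in this text, `map_probabilities` assumes it renormalizes the probabilities above the threshold by their sum.

```python
import re
from collections import Counter, defaultdict

LDS = re.compile(r"([LDS])(\d+)")  # e.g. "D3" means a run of three digits


def char_class(ch):
    """Return L for letters, D for digits, S for any other (special) character."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def segment(password, personal_info):
    """Split a password into (pattern, text) pairs.

    personal_info maps hypothetical tags such as "NAME" to the target's
    strings; tags must not be the single letters L, D or S.
    """
    runs = []  # each entry is [kind, text]
    i = 0
    while i < len(password):
        # Longest personal-information string that is a prefix of the remainder.
        hits = [(tag, val) for tag, val in personal_info.items()
                if val and password.startswith(val, i)]
        if hits:
            tag, val = max(hits, key=lambda h: len(h[1]))
            runs.append([tag, val])
            i += len(val)
            continue
        kind = char_class(password[i])
        if runs and runs[-1][0] == kind:
            runs[-1][1] += password[i]  # extend the current L/D/S run
        else:
            runs.append([kind, password[i]])
        i += 1
    return [(k + str(len(t)) if k in ("L", "D", "S") else k, t) for k, t in runs]


def string_probabilities(samples):
    """Estimate p = n / m for every L/D/S string from (password, personal_info) samples."""
    counts = defaultdict(Counter)
    for password, info in samples:
        for pattern, text in segment(password, info):
            if LDS.fullmatch(pattern):
                counts[pattern][text] += 1
    return {pattern: {text: n / sum(c.values()) for text, n in c.most_common()}
            for pattern, c in counts.items()}


def suitable_for_lstm(pattern):
    """Classification rule: only L/D/S patterns with length greater than 4 qualify."""
    m = LDS.fullmatch(pattern)
    return m is not None and int(m.group(2)) > 4


def map_probabilities(segment_probs, threshold):
    """ASSUMED mapping: keep segments above the threshold and divide by their sum."""
    kept = {s: p for s, p in segment_probs.items() if p > threshold}
    total = sum(kept.values())
    return sorted(((s, p / total) for s, p in kept.items()), key=lambda sp: -sp[1])
```

For instance, `segment("zhangsan456$ainiya", {"NAME": "zhangsan"})` yields the patterns NAME, D3, S1 and L6, of which only L6 passes `suitable_for_lstm`. Basic terminal probabilities and the network training itself are left out of this sketch.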
CN202010033647.XA 2020-01-13 2020-01-13 Password guess set generation system and method Pending CN111241534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010033647.XA CN111241534A (en) 2020-01-13 2020-01-13 Password guess set generation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010033647.XA CN111241534A (en) 2020-01-13 2020-01-13 Password guess set generation system and method

Publications (1)

Publication Number Publication Date
CN111241534A true CN111241534A (en) 2020-06-05

Family

ID=70877689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010033647.XA Pending CN111241534A (en) 2020-01-13 2020-01-13 Password guess set generation system and method

Country Status (1)

Country Link
CN (1) CN111241534A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257433A (en) * 2020-12-23 2021-01-22 四川大学 Password dictionary generation method and system based on Markov chain and neural network
CN112613325A (en) * 2021-01-04 2021-04-06 上海交通大学 Password semantic structuralization realization method based on deep learning
CN112861528A (en) * 2021-01-19 2021-05-28 复旦大学 Markov password recovery method based on password internal semantic driving
CN113886784A (en) * 2021-12-06 2022-01-04 华南理工大学 Password guessing method for improving guessing efficiency of small training set based on corpus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075348A1 (en) * 2016-09-09 2018-03-15 Cylance Inc. Machine learning model for analysis of instruction sequences
CN107947921A (en) * 2017-11-22 2018-04-20 上海交通大学 Based on recurrent neural network and the password of probability context-free grammar generation system
CN108763920A (en) * 2018-05-23 2018-11-06 四川大学 A kind of password strength assessment model based on integrated study
CN110334488A (en) * 2019-06-14 2019-10-15 北京大学 User authentication password security appraisal procedure and device based on Random Forest model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
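MATT WEIR ET AL.: "Password Cracking Using Probabilistic Context-Free Grammars", 《IEEE》 *
宋创创等 (Song Chuangchuang et al.): "基于集成学习的口令强度评估模型" (Password strength evaluation model based on ensemble learning), 《计算机应用》 (Journal of Computer Applications) *
王星星 (Wang Xingxing): "基于个人信息的口令猜测技术研究与系统实现" (Research and system implementation of password guessing technology based on personal information), 《硕士电子期刊》 (Master's Theses Electronic Journal) *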

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257433A (en) * 2020-12-23 2021-01-22 四川大学 Password dictionary generation method and system based on Markov chain and neural network
CN112257433B (en) * 2020-12-23 2021-05-14 四川大学 Password dictionary generation method and system based on Markov chain and neural network
CN112613325A (en) * 2021-01-04 2021-04-06 上海交通大学 Password semantic structuralization realization method based on deep learning
CN112861528A (en) * 2021-01-19 2021-05-28 复旦大学 Markov password recovery method based on password internal semantic driving
CN113886784A (en) * 2021-12-06 2022-01-04 华南理工大学 Password guessing method for improving guessing efficiency of small training set based on corpus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20230707