CN111553155B - Password word segmentation system and method based on semantic structure - Google Patents

Password word segmentation system and method based on semantic structure

Info

Publication number
CN111553155B
Authority
CN
China
Prior art keywords
semantic
password
unit
name
nlp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010356699.0A
Other languages
Chinese (zh)
Other versions
CN111553155A (en)
Inventor
邱卫东
贾兴磊
田昊
郭捷
唐鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010356699.0A priority Critical patent/CN111553155B/en
Publication of CN111553155A publication Critical patent/CN111553155A/en
Application granted granted Critical
Publication of CN111553155B publication Critical patent/CN111553155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A password word segmentation system and method based on semantic structure. The system comprises a preprocessing module, an NLP semantic extraction module and a non-NLP semantic annotation module, wherein: the preprocessing module receives a password to be segmented, extracts the special semantic factors that cannot be identified in the subsequent steps, pre-segments the remaining parts by character type, outputs the letter parts to the NLP semantic extraction module, and outputs the non-letter parts to the non-NLP semantic annotation module; the NLP semantic extraction module segments the letter parts of the password with an NLP tool to obtain the various semantic factors; and the non-NLP semantic annotation module semantically annotates the parts of the password that cannot be segmented by the NLP tool. The invention segments a password according to the semantic information it contains, with reference to a corpus, identifies the semantic structure of the password, and can accurately segment passwords set by both Chinese and English users.

Description

Password word segmentation system and method based on semantic structure
Technical Field
The invention relates to a technology in the field of computer security, in particular to a password word segmentation system and method based on a semantic structure.
Background
Text passwords are still widely used for user authentication in computer systems and online services because of their good security and usability. Since most passwords are chosen by the users themselves, users tend to pick character strings that carry specific semantics or follow specific patterns so that they are easy to remember. Research on the semantic structure of passwords is therefore important for improving users' password security.
Unlike natural language, a password has no fixed grammatical structure: when setting a password, a user can combine semantic factors arbitrarily within the rules of the website. Word segmentation methods designed for natural language are therefore not suitable for password segmentation.
Most previous research on password semantic structure has targeted the passwords of English-speaking users. Because these differ from the passwords of Chinese users, segmentation methods proposed for English passwords often perform poorly on leaked Chinese password datasets. In recent years several researchers have begun to study the passwords of Chinese users, and their work shows that adding extra semantic information to a segmentation system is effective; however, which information to add and how to add it remain matters of subjective judgment, and no systematic method exists.
Disclosure of Invention
To address the above deficiencies of the prior art, the invention provides a password word segmentation system and method based on semantic structure, which segments a password according to the semantic information it contains, with reference to a corpus, identifies the semantic structure of the password, and accurately segments passwords set by both Chinese and English users.
The invention is realized by the following technical scheme:
The invention relates to a password word segmentation system based on semantic structure, which comprises: a preprocessing module, a natural language processing (NLP) semantic extraction module and a non-NLP semantic annotation module, wherein: the preprocessing module receives a password to be segmented, extracts the special semantic factors that cannot be identified in the subsequent steps, pre-segments the remaining parts by character type, outputs the letter parts to the NLP semantic extraction module, and outputs the non-letter parts to the non-NLP semantic annotation module; the NLP semantic extraction module segments the letter parts of the password with an NLP tool to obtain the various semantic factors; and the non-NLP semantic annotation module semantically annotates the parts of the password that cannot be segmented by the NLP tool.
The special semantic factors are: keyboard structure, website and email.
The parts that cannot be segmented with the NLP tool are: numbers and special characters.
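The pre-segmentation by character type can be illustrated with a short Python sketch. The function name `pre_segment` and the use of a regular expression are assumptions made for illustration only; the tags follow the [word], [number] and [special] labels used by the method, and the sketch assumes that any keyboard structure, website or email has already been removed from the input.

```python
import re

# One alternative per character type; the group name doubles as the tag.
# Hypothetical helper, not taken from the patent text.
_TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z]+)|(?P<number>[0-9]+)|(?P<special>[^A-Za-z0-9]+)"
)


def pre_segment(text):
    """Split text into maximal runs of letters, digits and other characters."""
    return [(m.group(0), m.lastgroup) for m in _TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    print(pre_segment("iloveyou123@"))
```

Letters are passed on for NLP-based segmentation, while number and special fragments go to the non-NLP annotation path.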
The preprocessing module comprises: a keyboard structure extraction unit, an email extraction unit, a website extraction unit and a character segmentation unit, wherein: the keyboard structure extraction unit extracts the part of the password that follows the layout of the keyboard keys, i.e. the keyboard structure in the password; the email extraction unit extracts the email address contained in the password; the website extraction unit extracts the website contained in the password; and the character segmentation unit segments the password according to character type.
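As a rough illustration of the website and email extraction units, the sketch below uses regular expressions. The suffix list `SUFFIXES`, the choice of '.' as the separator between domain labels, and the function name `extract_special` are all assumptions for this example; the source only states that a list of common domain name suffixes is used.

```python
import re

# Hypothetical, deliberately short suffix list; the real list is not given.
SUFFIXES = ("com", "net", "org", "cn")
_SUFFIX = "|".join(SUFFIXES)

# Prefix 'www' or 'http://', dot-separated labels, then a known suffix.
WEBSITE_RE = re.compile(rf"(?:http://|www\.)(?:[a-z0-9-]+\.)+(?:{_SUFFIX})", re.I)
# '@' followed by a domain name; the user name before '@' is left untouched.
EMAIL_RE = re.compile(rf"@(?:[a-z0-9-]+\.)+(?:{_SUFFIX})", re.I)


def extract_special(password):
    """Return (fragment, tag) pairs; tag is None for parts left for later steps."""
    spans = []
    for regex, tag in ((WEBSITE_RE, "Website"), (EMAIL_RE, "email")):
        spans.extend((m.start(), m.end(), tag) for m in regex.finditer(password))
    spans.sort()

    out, pos = [], 0
    for start, end, tag in spans:
        if start < pos:          # skip a match that overlaps an earlier one
            continue
        if start > pos:
            out.append((password[pos:start], None))
        out.append((password[start:end], tag))
        pos = end
    if pos < len(password):
        out.append((password[pos:], None))
    return out


if __name__ == "__main__":
    print(extract_special("zhang123@qq.com"))
    print(extract_special("www.baidu.com123"))
```

Fragments tagged `None` would then go through character-type pre-segmentation.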
The NLP semantic extraction module comprises: a word segmentation unit, a part-of-speech tagging (POS) unit and a semantic classification unit, wherein: the word segmentation unit segments the letter parts received from the preprocessing module using the Natural Language Toolkit (NLTK) and outputs the result to the POS unit; the POS unit tags the input factors using the POS module of NLTK and outputs the semantic factors that need further classification to the semantic classification unit; the semantic classification unit further classifies the named-entity factors by string matching, labeling them as place name, month, male name, female name or Chinese name abbreviation; it matches the unidentified factors against a pinyin list and labels the matched ones as pinyin; it labels an unidentified factor as an abbreviation when it is longer than 3 characters and consists only of consonant letters; otherwise the factor remains labeled as unidentified.
The semantic factors to be further classified include: named entity, unidentified fragment.
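The classification rules for named entities and unidentified fragments can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `classify_fragment`, the tag strings passed in, the tiny word lists and the treatment of 'y' as a consonant are all hypothetical and stand in for the real entity and pinyin lists, which the source does not reproduce.

```python
VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def classify_fragment(fragment, tag, entity_lists, pinyin_list):
    """Refine a named-entity ('NP') or unidentified ('NN') tag by string matching."""
    word = fragment.lower()
    if tag == "NP":
        # Named entities: look the fragment up in each entity list in turn.
        for label, names in entity_lists.items():
            if word in names:
                return label
        return tag
    if tag == "NN":
        # Unidentified: pinyin first, then the consonant-only abbreviation rule.
        if word in pinyin_list:
            return "PY"
        if len(word) > 3 and word.isalpha() and not (set(word) & VOWELS):
            return "abbr"
    return tag


if __name__ == "__main__":
    entities = {"location": {"london"}, "month": {"may"}}  # toy lists
    pinyin = {"wang", "li", "zhang"}                        # toy list
    for frag, tag in [("london", "NP"), ("wang", "NN"), ("bjtx", "NN"), ("xyz", "NN")]:
        print(frag, classify_fragment(frag, tag, entities, pinyin))
```

Fragments that match neither rule keep their unidentified tag, which is what the success criterion later relies on.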
The non-NLP semantic annotation module comprises a number labeling unit and a special character labeling unit, wherein: the number labeling unit gives number fragments that carry specific semantics the corresponding label, and labels number fragments of unknown semantics according to their length; the special character labeling unit labels special character fragments according to their length.
The specific semantics include: date, year, phone number.
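A possible shape for the number labeling unit is sketched below. The year range and the date formats follow the rules given later in the description; the mobile phone pattern `1[3-9]` followed by nine digits is an assumption about the "mobile phone number format", which the source does not spell out, and `label_number` is a hypothetical name.

```python
import re
from datetime import datetime


def _is_date(digits, fmt):
    """True when digits parse as a calendar date in the given strptime format."""
    try:
        datetime.strptime(digits, fmt)
        return True
    except ValueError:
        return False


def label_number(digits):
    """Label a run of digits as year, date, mobile number or numN by length."""
    n = len(digits)
    if n == 4 and 1900 <= int(digits) <= 2020:
        return "year"
    if n == 6 and _is_date(digits, "%y%m%d"):
        return "YYMMDD"
    if n == 8 and _is_date(digits, "%Y%m%d"):
        return "YYYYMMDD"
    # Assumed pattern for an 11-digit mobile number.
    if n == 11 and re.fullmatch(r"1[3-9]\d{9}", digits):
        return "mobile"
    return f"num{n}"


if __name__ == "__main__":
    for s in ["1987", "900101", "19900101", "13812345678", "123"]:
        print(s, label_number(s))
```

Using `strptime` rejects impossible dates such as month 13, so those fragments fall through to the length-based label.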
Technical effects
The invention solves the problem of segmenting passwords from different languages and different leaked datasets.
Compared with the prior art, the invention extracts semantic factors that span several character types, such as keyboard structures, email addresses and websites, before the password is formally segmented. This avoids the semantic loss caused by segmenting purely by character type, allows the keyboard structures contained in a password to be extracted effectively, and improves segmentation accuracy. The invention also adds semantic factors such as place names, Chinese name abbreviations, pinyin, abbreviations, mobile phone numbers, keyboard structures, websites and email addresses to the segmentation system, which improves segmentation accuracy and enables segmentation of passwords from Chinese websites.
Drawings
FIG. 1 is a schematic diagram of a system according to the present invention.
Detailed Description
As shown in FIG. 1, this embodiment relates to a password word segmentation system based on semantic structure, which comprises a preprocessing module, an NLP semantic extraction module and a non-NLP semantic annotation module, wherein: the preprocessing module is connected to the NLP semantic extraction module and passes it the letter parts obtained by pre-segmentation during preprocessing; the preprocessing module is also connected to the non-NLP semantic annotation module and passes it the number and special character parts obtained by pre-segmentation.
Three special semantic factors (keyboard structure, website and email) are predefined in the preprocessing module. In the keyboard structure extraction unit, for a substring of the password, when its successive characters are adjacent on the keyboard and their <shift> key states are the same, the substring is determined to be a keyboard structure ([KB]), and the specific tag is determined by its length ([KB4], [KB5], ...). The website extraction unit detects whether the password contains a website through the prefixes 'www' and 'http://': when 'www' or 'http://' is detected, a substring from a list of common domain name suffixes is also matched, and one or more delimiter-separated character strings lie between the two substrings, the string from the prefix to the domain name suffix is determined to be a website ([Website]). In the email extraction unit, the format '@' + domain name is used as the email format; the user name before the '@' is kept as an ordinary string and segmented in the later steps. When a string in the format '@' + domain name is matched in the password, it is determined to be an email address ([email]).
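The keyboard structure rule can be made concrete with a small Python sketch. Several details here are assumptions rather than part of the source: a US QWERTY layout, the staggered-row adjacency rule in `adjacent`, the exclusion of repeated keys, and a minimum run length of 4 (inferred only from the tags starting at [KB4]).

```python
# Assumed US QWERTY layout: unshifted and shifted characters row by row.
ROWS = ["1234567890-=", "qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,./"]
SHIFTED = ["!@#$%^&*()_+", "QWERTYUIOP{}", 'ASDFGHJKL:"', "ZXCVBNM<>?"]

KEY = {}  # character -> (row, column, shift state)
for shift, rows in ((False, ROWS), (True, SHIFTED)):
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            KEY[ch] = (r, c, shift)


def adjacent(a, b):
    """True when a and b are neighbouring keys typed with the same shift state."""
    if a not in KEY or b not in KEY:
        return False
    (r1, c1, s1), (r2, c2, s2) = KEY[a], KEY[b]
    if s1 != s2:
        return False
    if r1 == r2:
        return abs(c1 - c2) == 1
    if abs(r1 - r2) == 1:
        # Rows are staggered: a key touches the key above it and the next one.
        upper_c, lower_c = (c1, c2) if r1 < r2 else (c2, c1)
        return upper_c - lower_c in (0, 1)
    return False


def find_keyboard_structures(password, min_len=4):
    """Return (substring, tag) for each maximal run of adjacent keys."""
    found, start = [], 0
    for i in range(1, len(password) + 1):
        if i == len(password) or not adjacent(password[i - 1], password[i]):
            if i - start >= min_len:
                found.append((password[start:i], f"KB{i - start}"))
            start = i
    return found


if __name__ == "__main__":
    print(find_keyboard_structures("1qaziloveyou123@"))
```

On the password `1qaziloveyou123@` the only run of four or more adjacent keys is `1qaz`; the run `123` is too short under the assumed threshold and is left for number labeling.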
The NLP semantic extraction module comprises a word segmentation unit, a part-of-speech tagging (POS) unit and a semantic classification unit, and named entities are recognized as follows. The word segmentation unit uses a two-pass segmentation algorithm: in the first pass, named-entity lists covering four semantic factors ([location], [month], [male_name], [female_name]) are added to the NLTK tool; when unidentified fragments remain after the first pass, named-entity lists covering five semantic factors ([location], [month], [male_name], [female_name], [cn_name_abbr]) are added to the NLTK tool for a second pass. The semantic classification unit labels each [NP] fragment as one of the named entities ([location], [month], [male_name], [female_name], [cn_name_abbr]) by string matching against the named-entity lists. Each [NN] fragment is first checked by string matching: when it is in the pinyin list it is labeled [PY]; otherwise, when it is longer than 3 characters and consists only of consonants, it is judged to be a probable English abbreviation ([abbr]); when neither case applies, the [NN] label is kept unchanged.
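The two-pass idea can be illustrated without NLTK. The sketch below swaps in a simple dictionary-based dynamic-programming segmenter purely for illustration; it is not the NLTK-based procedure of the source, and the names `segment` and `two_pass_segment`, the cost values and the toy vocabularies are all assumptions.

```python
def segment(text, vocab):
    """Split text into the fewest vocabulary words; uncovered letters are unknown.

    Returns a list of (piece, known) pairs.
    """
    n = len(text)
    best = [(0, [])] + [None] * n  # best[i] = (cost, pieces) for text[:i]
    for i in range(1, n + 1):
        # Fallback: treat text[i-1] as one unknown character (high cost).
        cost, pieces = best[i - 1]
        cand = (cost + 10, pieces + [(text[i - 1], False)])
        for j in range(i):
            word = text[j:i]
            if word in vocab and best[j][0] + 1 < cand[0]:
                cand = (best[j][0] + 1, best[j][1] + [(word, True)])
        best[i] = cand

    # Merge neighbouring unknown characters into a single unknown fragment.
    merged = []
    for piece, known in best[n][1]:
        if not known and merged and not merged[-1][1]:
            merged[-1] = (merged[-1][0] + piece, False)
        else:
            merged.append((piece, known))
    return merged


def two_pass_segment(text, english_vocab, cn_name_abbr):
    """First pass without Chinese name abbreviations; second pass only if needed."""
    first = segment(text, english_vocab)
    if all(known for _, known in first):
        return first
    return segment(text, english_vocab | cn_name_abbr)


if __name__ == "__main__":
    english = {"i", "love", "you", "john"}  # toy vocabulary plus entity lists
    abbr = {"zxl"}                          # toy Chinese name abbreviation list
    print(two_pass_segment("iloveyou", english, abbr))
    print(two_pass_segment("johnzxl", english, abbr))
```

Deferring the abbreviation list to the second pass keeps short abbreviations from splitting strings that the English vocabulary already covers.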
The password word segmentation method based on the semantic structure of the system comprises the following specific steps:
S1) The preprocessing module reads a password P to be segmented.
S2) In the keyboard structure extraction unit, for a substring of the password, when its successive characters are adjacent on the keyboard and their <shift> key states are the same, the substring is determined to be a keyboard structure ([KB]), and the specific tag is determined by its length ([KB4], [KB5], ...).
S3) The website extraction unit detects whether the password contains a website through the prefixes 'www' and 'http://': when 'www' or 'http://' is detected, a substring from a list of common domain name suffixes is also matched, and one or more delimiter-separated character strings lie between the two substrings, the string from the prefix to the domain name suffix is determined to be a website ([Website]).
S4) The email extraction unit uses the format '@' + domain name as the email format; the user name before the '@' is kept as an ordinary string and segmented in the later steps. When a string in the format '@' + domain name is matched in the password, it is determined to be an email address ([email]).
S5) The unlabeled parts of the password are output to the character segmentation unit and pre-segmented by character type (numbers, letters and special characters); the resulting fragments are labeled as number ([number]), letter ([word]) and special character ([special]) respectively.
After the preprocessing module, the password 1qaziloveyou123@ becomes (1qaz, KB4), (iloveyou, word), (123, number), (@, special).
S6) The fragments labeled [word] are output to NLTK. The corpora used for segmentation are the Brown corpus and the Web Text corpus, to which several named-entity lists are added, representing 5 semantic factors: four English semantic factors ([location], [month], [male_name], [female_name]) and the Chinese name abbreviation ([cn_name_abbr]). The Chinese name abbreviations are not added at first: only the four English named-entity lists are used for the first segmentation pass, and when the result contains unidentified fragments, the Chinese name abbreviations are added for a second pass.
S7) After segmentation, the result is output to the POS unit for semantic annotation, which covers the following semantic factors with specific semantics: pronouns ([PRON]), nouns ([NOUN]), determiners ([DET]), adjectives ([ADJ]), verbs ([VERB]), prepositions ([ADP]), adverbs ([ADV]), particles ([PRT]), conjunctions ([CONJ]), English words representing numbers ([NUM]), and suffixes ([X]). POS tagging uses a sequential backoff tagger: the Brown trigram tagger is tried first, then the bigram tagger, and finally the unigram tagger. Fragments that appear in the named-entity lists are labeled [NP], and unidentified fragments are labeled [NN].
S8) After POS tagging, the [NP] and [NN] fragments are classified further. Each [NP] fragment is labeled as one of the named entities ([location], [month], [male_name], [female_name], [cn_name_abbr]) by string matching against the named-entity lists.
S9) Each [NN] fragment is first checked by string matching: when it is in the pinyin list it is labeled [PY]; otherwise, when it is longer than 3 characters and consists only of consonants, it is judged to be a probable English abbreviation ([abbr]); when neither case applies, the [NN] label is kept unchanged.
S10) The fragments labeled [number] in S5) are output to the number semantic classification unit. A 4-digit fragment is regarded as a year and labeled [year] when it lies between 1900 and 2020; a 6-digit fragment that satisfies the YYMMDD date format is determined to be a date and labeled [YYMMDD]; an 8-digit fragment that satisfies the YYYYMMDD date format is determined to be a date and labeled [YYYYMMDD]; an 11-digit fragment that satisfies the mobile phone number format is determined to be a mobile phone number and labeled [mobile]; the remaining number fragments are labeled [num1], [num2], ... according to their length.
S11) The [special] fragments from S5) are input to the special character labeling unit and labeled [spec1], [spec2], ... according to their length.
S12) The labels of all the fragments are combined in order to form the semantic structure of the password.
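Steps S11) and S12) can be sketched in a few lines of Python. The helper names, the bracketed string format of the combined structure, and the POS tags assumed for "i", "love" and "you" are illustrative choices, not taken from the source.

```python
def label_special(fragment):
    """Label a run of special characters by its length, e.g. '@' -> 'spec1'."""
    return f"spec{len(fragment)}"


def semantic_structure(labelled_fragments):
    """Join the tags of all fragments, in order, into one structure string."""
    return "".join(f"[{tag}]" for _, tag in labelled_fragments)


if __name__ == "__main__":
    # Fragments of the password 1qaziloveyou123@ after all labeling steps;
    # the tags for the three words are assumed for illustration.
    fragments = [
        ("1qaz", "KB4"),
        ("i", "PRON"),
        ("love", "VERB"),
        ("you", "PRON"),
        ("123", "num3"),
        ("@", label_special("@")),
    ]
    print(semantic_structure(fragments))
```

Passwords that share the same tag sequence then share one semantic structure, which is the unit the method reports.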
In this embodiment, 13 leaked password datasets were selected to test the segmentation performance of the method: 6 Chinese datasets (CSDN, Tianya, you, 17173, Aipai, Dodonew) and 7 English datasets (LinkedIn, Zoosk, Myspace, Rockyou, MyHeritage, Gmail, webhost). The specific test results are shown in Table 1.
TABLE 1 (provided as an image in the original publication; not reproduced here)
In the test, a password is counted as successfully segmented when its segmentation result contains no [NN] factor. This embodiment achieves a high segmentation success rate on both the Chinese and the English leaked datasets; in particular, the success rate on the Chinese datasets exceeds 90%, which demonstrates the effectiveness of this embodiment.
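The success criterion used in the test is simple enough to state as code. In this sketch each password is represented by the list of tags in its segmentation result; the function name `success_rate` and the sample data are hypothetical.

```python
def success_rate(tag_sequences):
    """Fraction of passwords whose segmentation result contains no 'NN' tag."""
    if not tag_sequences:
        return 0.0
    ok = sum(1 for tags in tag_sequences if "NN" not in tags)
    return ok / len(tag_sequences)


if __name__ == "__main__":
    results = [
        ["KB4", "PRON", "VERB", "PRON", "num3", "spec1"],  # fully identified
        ["NN", "num2"],                                     # one unidentified fragment
    ]
    print(f"{success_rate(results):.2%}")
```

A single unidentified fragment is enough to count the whole password as a failure, so the metric is strict with respect to the word lists in use.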
For comparison with the prior art, four leaked datasets were selected for testing: two Chinese datasets (17173 and Aipai) and two English datasets (LinkedIn and Gmail). The segmentation success rates of this embodiment on the four datasets are 92.22%, 91.24%, 79.37% and 84.19%, which are clearly higher than the 65.17%, 60.88%, 62.26% and 67.14% of the prior art.
The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims (8)

1. A password segmentation system based on semantic structure, comprising: a preprocessing module, a natural language processing (NLP) semantic extraction module and a non-NLP semantic annotation module, wherein: the preprocessing module receives a password to be segmented, extracts the special semantic factors that cannot be identified in the subsequent steps, pre-segments the remaining parts by character type, outputs the letter parts to the NLP semantic extraction module, and outputs the non-letter parts to the non-NLP semantic annotation module; the NLP semantic extraction module segments the letter parts of the password with an NLP tool to obtain the various semantic factors; the non-NLP semantic annotation module semantically annotates the parts of the password that cannot be segmented by the NLP tool;
the preprocessing module comprises: a keyboard structure extraction unit, an email extraction unit, a website extraction unit and a character segmentation unit, wherein: the keyboard structure extraction unit extracts the part of the password that follows the layout of the keyboard keys, i.e. the keyboard structure in the password; the email extraction unit extracts the email address contained in the password; the website extraction unit extracts the website contained in the password; and the character segmentation unit segments the password according to character type.
2. The system of claim 1, wherein the NLP semantic extraction module comprises: a word segmentation unit, a part-of-speech tagging (POS) unit and a semantic classification unit, wherein: the word segmentation unit segments the letter parts received from the preprocessing module using the Natural Language Toolkit (NLTK) and outputs the result to the POS unit; the POS unit tags the input factors using the POS module of NLTK and outputs the semantic factors that need further classification to the semantic classification unit; the semantic classification unit further classifies the named-entity factors by string matching, labeling them as place name, month, male name, female name or Chinese name abbreviation, matches the unidentified factors against a pinyin list and labels the matched ones as pinyin, labels an unidentified factor as an abbreviation when it is longer than 3 characters and consists only of consonant letters, and otherwise leaves the factor labeled as unidentified.
3. The system of claim 1, wherein the non-NLP semantic annotation module comprises: a number labeling unit and a special character labeling unit, wherein: the number labeling unit gives number fragments that carry specific semantics the corresponding label, and labels number fragments of unknown semantics according to their length; the special character labeling unit labels special character fragments according to their length.
4. The system according to claim 2, wherein the recognition of named entities by the word segmentation unit is specifically: a two-pass segmentation algorithm is adopted, in which named-entity lists covering four semantic factors [location], [month], [male_name] and [female_name] are added to the NLTK tool for the first segmentation pass, and, when unidentified fragments remain after the first pass, named-entity lists covering five semantic factors [location], [month], [male_name], [female_name] and [cn_name_abbr] are added to the NLTK tool for a second segmentation pass.
5. The system of claim 2, wherein the semantic classification unit labels each [NP] fragment as one of the named entities [location], [month], [male_name], [female_name], [cn_name_abbr] by string matching against the named-entity lists; each [NN] fragment is first checked by string matching: when it is in the pinyin list it is labeled [PY]; otherwise, when it is longer than 3 characters and consists only of consonants, it is judged to be a probable English abbreviation [abbr]; and when neither case applies, the [NN] label is kept unchanged.
6. A method of password segmentation based on semantic structures based on the system of any one of the preceding claims, comprising the steps of:
S1, the preprocessing module reads a password P to be segmented;
S2, in the keyboard structure extraction unit, for a substring of the password, when its successive characters are adjacent on the keyboard and their <shift> key states are the same, the substring is determined to be a keyboard structure [KB], and its label is determined by its length;
S3, detecting whether the password contains a website through the prefixes 'www' and 'http://': when 'www' or 'http://' is detected, a substring from a list of common domain name suffixes is also matched, and one or more delimiter-separated character strings lie between the two substrings, the string from the prefix to the domain name suffix is determined to be a website [Website];
S4, the email extraction unit uses the format '@' + domain name as the email format and keeps the user name before the '@' as an ordinary string; when a string in the format '@' + domain name is matched in the password, it is determined to be an email address [email];
S5, outputting the unlabeled parts of the password to the character segmentation unit and pre-segmenting them by character type (numbers, letters and special characters), the resulting fragments being labeled as number [number], letter [word] and special character [special] respectively;
S6, outputting the fragments labeled [word] to NLTK, wherein the corpora used for segmentation are the Brown corpus and the Web Text corpus, to which several named-entity lists are added;
S7, outputting the segmentation result to the POS unit for semantic annotation, the annotations being: pronouns [PRON], nouns [NOUN], determiners [DET], adjectives [ADJ], verbs [VERB], prepositions [ADP], adverbs [ADV], particles [PRT], conjunctions [CONJ], English words representing numbers [NUM] and suffixes [X];
S8, after POS tagging, further classifying the [NP] and [NN] fragments; each [NP] fragment is labeled as one of the named entities [location], [month], [male_name], [female_name], [cn_name_abbr] by string matching against the named-entity lists;
S9, each [NN] fragment is first checked by string matching: when it is in the pinyin list it is labeled [PY]; otherwise, when it is longer than 3 characters and consists only of consonants, it is judged to be an English abbreviation [abbr]; and when neither case applies, the [NN] label is kept unchanged;
S10, outputting the fragments labeled [number] in S5 to the number semantic classification unit, wherein a 4-digit fragment is regarded as a year and labeled [year] when it lies between 1900 and 2020; a 6-digit fragment that satisfies the YYMMDD date format is determined to be a date and labeled [YYMMDD]; an 8-digit fragment that satisfies the YYYYMMDD date format is determined to be a date and labeled [YYYYMMDD]; an 11-digit fragment that satisfies the mobile phone number format is determined to be a mobile phone number and labeled [mobile]; and the remaining number fragments are labeled according to their length;
S11, inputting the [special] fragments from S5 to the special character labeling unit and labeling them according to their length;
S12, combining the labels of all the fragments in order to form the semantic structure of the password.
7. The password segmentation method according to claim 6, wherein the named-entity lists in step S6 cover: four English semantic factors [location], [month], [male_name], [female_name], and the Chinese name abbreviation [cn_name_abbr], wherein the Chinese name abbreviations are not added at first, only the four English named-entity lists are used for the first segmentation pass, and when the segmentation result contains unidentified fragments, the Chinese name abbreviations are added for a second segmentation pass.
8. The method of claim 6, wherein in step S7 the POS tagging uses a sequential backoff tagger, in which the Brown trigram tagger is used first, then the bigram tagger, and finally the unigram tagger; fragments that appear in the named-entity lists are labeled [NP], and unidentified fragments are labeled [NN].
CN202010356699.0A 2020-04-29 2020-04-29 Password word segmentation system and method based on semantic structure Active CN111553155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010356699.0A CN111553155B (en) 2020-04-29 2020-04-29 Password word segmentation system and method based on semantic structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010356699.0A CN111553155B (en) 2020-04-29 2020-04-29 Password word segmentation system and method based on semantic structure

Publications (2)

Publication Number Publication Date
CN111553155A CN111553155A (en) 2020-08-18
CN111553155B true CN111553155B (en) 2023-05-09

Family

ID=71999272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010356699.0A Active CN111553155B (en) 2020-04-29 2020-04-29 Password word segmentation system and method based on semantic structure

Country Status (1)

Country Link
CN (1) CN111553155B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784227A (en) * 2021-01-04 2021-05-11 上海交通大学 Dictionary generating system and method based on password semantic structure
CN113657118B (en) * 2021-08-16 2024-05-14 好心情健康产业集团有限公司 Semantic analysis method, device and system based on call text

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460552A (en) * 2018-10-29 2019-03-12 朱丽莉 Rule-based and corpus Chinese faulty wording automatic testing method and equipment

Also Published As

Publication number Publication date
CN111553155A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
US8447588B2 (en) Region-matching transducers for natural language processing
US8266169B2 (en) Complex queries for corpus indexing and search
Lita et al. Truecasing
US8510097B2 (en) Region-matching transducers for text-characterization
Tang et al. Email data cleaning
Ekbal et al. Bengali part of speech tagging using conditional random field
JP5113750B2 (en) Definition extraction
KR100858545B1 (en) Apparatus and method for handwriting recognition
Gupta et al. A survey of common stemming techniques and existing stemmers for indian languages
Abbasi et al. Applying authorship analysis to Arabic web content
Zhang et al. Automatic detecting/correcting errors in Chinese text by an approximate word-matching algorithm
Cing et al. Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language
CN111553155B (en) Password word segmentation system and method based on semantic structure
Patil et al. Issues and challenges in marathi named entity recognition
Khan et al. Urdu word segmentation using machine learning approaches
US20100094615A1 (en) Document translation apparatus and method
Venčkauskas et al. Problems of authorship identification of the national language electronic discourse
Gupta et al. Designing and development of stemmer of Dogri using unsupervised learning
Tufiş et al. DIAC+: A professional diacritics recovering system
Charoenpornsawat et al. Automatic sentence break disambiguation for Thai
Jain et al. Detection and correction of non word spelling errors in Hindi language
Rychlý et al. Annotated amharic corpora
Wang et al. Chinese-braille translation based on braille corpus
Li et al. Contextual post-processing based on the confusion matrix in offline handwritten Chinese script recognition
CN111767733A (en) Document security classification discrimination method based on statistical word segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant