CN105653912B - Method and device for identifying batch registration behavior - Google Patents

Info

Publication number
CN105653912B
Authority
CN
China
Prior art keywords
character string
extension information
information
vector
user name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410639883.0A
Other languages
Chinese (zh)
Other versions
CN105653912A (en)
Inventor
顾思源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201410639883.0A
Publication of CN105653912A
Application granted
Publication of CN105653912B
Legal status: Active

Abstract

The embodiments of the present application provide a method and device for identifying batch registration behavior. The method includes: selecting a preset quantity of registration information to be identified; obtaining the username string of the email address in the registration information, first extended information corresponding to the surname, and second extended information corresponding to the given name; searching for the positions at which the registered user's first extended information and second extended information first occur in the username string of the email address; dividing the username string according to the first extended information position and the second extended information position; describing the features of the resulting parts with a vector and classifying the username string accordingly; and counting the proportion of each vector among all vectors. When the proportion of some vector among all vectors is greater than or equal to a first threshold, it is determined that the registration information classified under that vector includes batch-registered registration information.

Description

Method and device for identifying batch registration behavior
Technical field
This application relates to the field of communication technology, and more particularly to a method and device for identifying batch registration behavior.
Background
With the rapid development of communication technology and computer technology, the Internet is used ever more widely. By registering an account on a website, people can publish copyrighted content and various important information over the Internet, conduct online transactions, communicate, and so on. Most websites require the registrant's surname, given name, and email address at registration. In practice, batch registration often occurs, and such batch registration is usually accompanied by attacks on computer systems. How to identify batch registration behavior, and thereby better address computer system security, is therefore a problem in urgent need of a solution.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and device for identifying batch registration behavior, so as to reduce the harm caused by malicious batch registration.
To achieve the above purpose, the embodiments of the present application provide a method for identifying batch registration behavior, the method including:
Selecting a preset quantity of registration information to be identified, the registration information including the registered user's surname, given name, and registered email address;
Obtaining the username string of the email address in the registration information, and obtaining, according to a predetermined rule, first extended information corresponding to the surname and second extended information corresponding to the given name in the registration information;
Searching for the positions at which the registered user's first extended information and second extended information first occur in the username string of the email address, to obtain a first extended information position and a second extended information position;
Dividing the username string of the email address according to the position of the first extended information and the position of the second extended information in the username string, describing the features of the resulting parts with a vector, and classifying the username string of the email address by that vector;
Counting the proportion of each vector among all vectors, and, when the proportion of some vector among all vectors is greater than or equal to a first threshold, determining that the registration information classified under that vector includes batch-registered registration information.
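The final counting step can be illustrated with a minimal Python sketch. The function name `flag_suspicious_vectors`, the string encoding of the vectors, and the threshold value 0.5 are assumptions made for illustration; the source does not fix any of them.

```python
from collections import Counter


def flag_suspicious_vectors(vectors, first_threshold):
    """Return {vector: share} for every vector whose share of all
    vectors is greater than or equal to first_threshold."""
    total = len(vectors)
    if total == 0:
        return {}
    counts = Counter(vectors)
    return {
        vec: count / total
        for vec, count in counts.items()
        if count / total >= first_threshold
    }


# Hypothetical batch of ten classified usernames (vectors shown as strings).
sample = ["11000100000"] * 6 + ["00010001000"] * 3 + ["10100010100"]
flagged = flag_suspicious_vectors(sample, first_threshold=0.5)
```

Here one vector accounts for 6 of the 10 samples, so only that vector is flagged; the registrations classified under it would then be candidates for closer inspection.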
In another aspect, the present application further provides a device for identifying batch registration behavior, the device including:
A preset unit, configured to obtain, from registration information containing the registered user's surname, given name, and registered email address, the username string of the email address in the registration information and the registered user's first extended information and second extended information;
An information obtaining unit, configured to obtain the username string of the email address in the registration information, and to obtain, according to a predetermined rule, the first extended information corresponding to the surname and the second extended information corresponding to the given name in the registration information;
A searching unit, configured to search for the positions at which the registered user's first extended information and second extended information first occur in the username string of the email address, to obtain the first extended information position and the second extended information position;
A classification unit, configured to divide the username string of the email address according to the position of the first extended information and the position of the second extended information in the username string, to describe the features of the resulting parts with a vector, and to classify the username string of the email address by that vector;
A statistics unit, configured to count the proportion of each vector among all vectors, and, when the proportion of some vector among all vectors is greater than or equal to a first threshold, to determine that the registration information classified under that vector includes batch-registered registration information.
As can be seen from the technical solutions provided above, the embodiments of the present application divide the username string of the email address by the registered user's surname and given name, describe the features of the resulting parts with a vector, classify the username string of the email address by that vector, and count the proportion of each vector among all vectors. When the proportion of some vector among all vectors is greater than or equal to the first threshold, it can be determined that the registration information classified under that vector includes batch-registered registration information, which provides a basis for further accurate identification of batch registration.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some of the embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for identifying batch registration behavior provided by the embodiments of the present application;
Fig. 2 is a schematic diagram of a device for identifying batch registration behavior provided by the embodiments of the present application.
Detailed description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
The specific implementation of the embodiments of the present application is described in detail below with specific examples.
The embodiments of the present application provide a method for identifying batch registration behavior. As shown in Fig. 1, the method includes:
Step S101: Select a preset quantity of registration information to be identified, the registration information including the registered user's surname, given name, and registered email address.
In practice, some websites require the registrant to fill in a surname, a given name, and an email address at registration. The surname and given name may be written in Chinese characters or in English.
Step S102: Obtain the username string of the email address in the registration information, and obtain, according to a predetermined rule, the first extended information corresponding to the surname and the second extended information corresponding to the given name in the registration information.
For Chinese names, the predetermined rule includes obtaining, from the registered user's surname and given name, the full pinyin spelling and the pinyin initial letters of the corresponding Chinese characters. When the surname field or the given-name field contains more than one Chinese character, the predetermined rule yields the full pinyin spellings of the characters in order and the pinyin initial letters of the characters in order.
For English names, the predetermined rule includes obtaining, from the registered user's surname and given name, the full spelling and the initial letters of the corresponding English words. When the surname field or the given-name field contains more than one English word, the predetermined rule yields the full spellings of the words in order and the initial letters of the words in order.
That is, the first extended information includes the full spelling of the surname and the initials of the surname, and the second extended information includes the full spelling of the given name and the initials of the given name.
For example, if the surname is Ouyang, the full spelling of the surname is ouyang and the initials of the surname are oy. The second extended information is the full spelling and the initials of the given name; when the given name consists of two or more Chinese characters or English words, it is the full spelling of the whole given name or the initials of the whole given name. For example, if the given name is Xiangyang, the full spelling of the given name is xiangyang and the initials of the given name are xy.
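The rule for deriving extended information can be sketched in Python as follows. This is only an illustration: the helper name `extension_info` is hypothetical, and the sketch assumes the romanized form (pinyin syllables or English words) of each character or word is already available, since converting Chinese characters to pinyin is not described in the text.

```python
def extension_info(syllables):
    """Return (full_spelling, initials) for a surname or a given name.

    `syllables` holds the romanized form of each character or word,
    in order, for example ["ou", "yang"]. Producing pinyin from Chinese
    characters is assumed to happen upstream.
    """
    parts = [s.strip().lower() for s in syllables if s.strip()]
    full_spelling = "".join(parts)
    initials = "".join(p[0] for p in parts)
    return full_spelling, initials


first_ext = extension_info(["ou", "yang"])      # surname Ouyang
second_ext = extension_info(["xiang", "yang"])  # given name Xiangyang
```

With these inputs the sketch reproduces the pairs in the example above: ouyang with oy, and xiangyang with xy.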
Step S103: Search for the positions at which the registered user's first extended information and second extended information first occur in the username string of the above email address, to obtain the first extended information position and the second extended information position.
In practice, when searching for the first extended information, the full spelling of the surname is searched for first; if the full spelling of the surname cannot be found, the initials of the surname are searched for. When searching for the second extended information, the full spelling of the given name is searched for first; if the full spelling of the given name cannot be found, the initials of the given name are searched for.
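The fallback order described above (full spelling first, then initials) can be sketched as below. The function name `find_extension`, the 0-based positions, the `(-1, "")` "not found" value, and the case-insensitive comparison are assumptions of this sketch, not details given in the text.

```python
def find_extension(username, full_spelling, initials):
    """Return (position, matched_text) for the first occurrence of the
    full spelling, falling back to the initials; (-1, "") if neither
    occurs. Positions are 0-based offsets into `username`."""
    haystack = username.lower()  # assumption: compare case-insensitively
    for candidate in (full_spelling.lower(), initials.lower()):
        if not candidate:
            continue
        pos = haystack.find(candidate)
        if pos != -1:
            return pos, candidate
    return -1, ""


hit_full = find_extension("23likimsi#p", "li", "l")   # full spelling found
hit_init = find_extension("23lkimsi#p", "li", "l")    # only the initial found
miss = find_extension("2345", "li", "l")              # nothing found
```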
In practice, searching for the positions at which the registered user's first extended information and second extended information first occur in the username string of the email address includes: determining whether an inclusion relation exists between the first extended information and the second extended information of the registered user in the registration information, specifically whether an inclusion relation exists between the full spelling of the surname and the full spelling of the given name.
When no inclusion relation exists between the full spelling of the surname and the full spelling of the given name, the position of the first extended information in the username string of the email address is the position at which the first extended information is first found when the username string is searched from left to right, and the position of the second extended information in the username string of the email address is the position at which the second extended information is first found when the username string is searched from left to right.
For example, the username string of the email address is "23likimsi#p", the user's surname is Li, and the given name is Si, so the first extended information includes "li" or "l" and the second extended information includes "si" or "s". Searching the string "23likimsi#p" from left to right, the first string "li" found is taken as the first extended information; that is, the string "li" between the character "3" and the character "k" is the first extended information, and its position is the position of the first extended information in the username string of the email address. Searching the string "23likimsi#p" from left to right, the first string "si" found is the second extended information; that is, the string "si" between the character "m" and the character "#" is the second extended information, and its position is the position of the second extended information in the username string of the email address.
When an inclusion relation exists between the full spelling of the surname and the full spelling of the given name and the two spellings are identical, priority search information is determined. The priority search information is either the first extended information or the second extended information: when the first extended information is the priority search information, the second extended information is the secondary search information; when the second extended information is the priority search information, the first extended information is the secondary search information. The username string of the email address is searched from left to right for the first occurrence of the priority search information, and the first occurrence of the secondary search information is then searched for onward from the position at which the priority search information was first found.
For example, the username string of the email address is "23likimli#p", the user's surname is Li, and the given name is Li, so the first extended information and the second extended information are identical and both include "li" or "l".
If the first extended information is the priority search information and the second extended information is the secondary search information, then, searching the string "23likimli#p" from left to right, the first string "li" found is taken as the first extended information; that is, the string "li" between the character "3" and the character "k" is the first extended information, and its position is the position of the first extended information in the username string of the email address. Continuing the search onward from that first extended information string "li", the next string "li" found is taken as the second extended information; that is, the string "li" between the character "m" and the character "#" is the second extended information, and its position is the position of the second extended information in the username string of the email address.
If the given name is the priority search information, then, searching the string "23likimli#p" from left to right, the first string "li" found is taken as the second extended information; that is, the string "li" between the character "3" and the character "k" is the second extended information, and its position is the position of the second extended information in the username string of the email address. Continuing the search onward from that second extended information string "li", the next string "li" found is taken as the first extended information; that is, the string "li" between the character "m" and the character "#" is the first extended information, and its position is the position of the first extended information in the username string of the email address.
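The identical-spelling case can be sketched as follows. The helper name `locate_identical`, the dictionary keys, and the choice to resume the second search immediately after the end of the first match are assumptions; the text only says the second search continues onward from the first match.

```python
def locate_identical(username, spelling, priority="surname"):
    """Locate surname and given name when both have the same spelling.

    The priority item takes the first occurrence from the left; the
    other item takes the next occurrence after that match. Positions
    are 0-based, and -1 means "not found".
    """
    other = "given" if priority == "surname" else "surname"
    first = username.find(spelling)
    if first == -1:
        return {priority: -1, other: -1}
    # Assumption: resume searching right after the end of the first match.
    second = username.find(spelling, first + len(spelling))
    return {priority: first, other: second}


surname_first = locate_identical("23likimli#p", "li", priority="surname")
given_first = locate_identical("23likimli#p", "li", priority="given")
```

In both calls the "li" between "3" and "k" sits at offset 2 and the "li" between "m" and "#" at offset 7; only the assignment to surname or given name changes with the priority.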
When an inclusion relation exists between the full spelling of the surname and the full spelling of the given name, and the full spelling of the surname is longer than the full spelling of the given name: the position of the first extended information in the username string of the email address is the position at which the first extended information is first found when the username string is searched from left to right; the position of the second extended information in the username string of the email address is the position at which the second extended information is first found when the username string, with the first extended information removed, is searched from left to right.
For example, the username string of the email address is "23likimlin#p", the user's surname is Lin, and the given name is Li, so the first extended information includes "lin" or "l" and the second extended information includes "li" or "l". That is, an inclusion relation exists between the full spelling of the surname and the full spelling of the given name, and the full spelling of the surname is longer than the full spelling of the given name. The first extended information position is therefore searched for first, and then the second extended information position. Searching the string "23likimlin#p" from left to right, the first string "lin" found is taken as the first extended information; that is, the string "lin" between the character "m" and the character "#" is the first extended information, and its position is the position of the first extended information in the username string of the email address. After the string "lin" is removed from "23likimlin#p", the first string "li" found is taken as the second extended information; that is, the string "li" between the character "3" and the character "k" is the second extended information, and its position is the position of the second extended information in the username string of the email address.
When an inclusion relation exists between the full spelling of the surname and the full spelling of the given name, and the full spelling of the surname is shorter than the full spelling of the given name: when the username string of the email address contains both the first extended information and the second extended information, the position of the second extended information in the username string of the email address is the position at which the second extended information is first found when the username string is searched from left to right; the position of the first extended information in the username string of the email address is the position at which the first extended information is first found when the username string, with the second extended information removed, is searched from left to right.
For example, the username string of the email address is "23likimlin#p", the user's surname is Li, and the given name is Lin, so the first extended information includes "li" or "l" and the second extended information includes "lin" or "l". That is, an inclusion relation exists between the full spelling of the surname and the full spelling of the given name, and the full spelling of the surname is shorter than the full spelling of the given name. The second extended information position is therefore searched for first, and then the first extended information position. Searching the string "23likimlin#p" from left to right, the first string "lin" found is taken as the second extended information; that is, the string "lin" between the character "m" and the character "#" is the second extended information, and its position is the position of the second extended information in the username string of the email address. After the string "lin" is removed from "23likimlin#p", the first string "li" found is taken as the first extended information; that is, the string "li" between the character "3" and the character "k" is the first extended information, and its position is the position of the first extended information in the username string of the email address.
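The two unequal-length cases follow the same pattern: locate the longer spelling first, take it out of consideration, then locate the shorter one. A minimal sketch is given below. The name `locate_nested` is hypothetical, and replacing the matched span with NUL characters (rather than deleting it) is an assumption of this sketch, chosen so that the reported offsets still refer to the original username string. The fallback to initials is not covered here.

```python
def locate_nested(username, surname_spelling, given_spelling):
    """Locate both spellings when one contains the other and their
    lengths differ. The longer spelling is searched for first; its
    match is then masked so the shorter spelling cannot land inside it.
    Positions are 0-based, and -1 means "not found"."""
    pairs = sorted(
        [("surname", surname_spelling), ("given", given_spelling)],
        key=lambda item: len(item[1]),
        reverse=True,
    )
    (long_key, long_text), (short_key, short_text) = pairs

    long_pos = username.find(long_text)
    masked = username
    if long_pos != -1:
        # Assumption: mask instead of delete, to keep offsets stable.
        end = long_pos + len(long_text)
        masked = username[:long_pos] + "\0" * len(long_text) + username[end:]

    short_pos = masked.find(short_text)
    return {long_key: long_pos, short_key: short_pos}


lin_li = locate_nested("23likimlin#p", "lin", "li")  # surname Lin, given name Li
li_lin = locate_nested("23likimlin#p", "li", "lin")  # surname Li, given name Lin
```

In both examples "lin" is found at offset 7 (between "m" and "#") and "li" at offset 2 (between "3" and "k"), matching the positions described above.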
Step S104: Divide the username string of the email address according to the position of the first extended information and the position of the second extended information in the username string of the email address, describe the features of the resulting parts with an 11-bit vector, and classify the username string of the email address by the 11-bit vector.
In practice, the 11-bit vector records the features of the first extended information, the features of the second extended information, and the features of the strings delimited by the first extended information position and the second extended information position.
The 11-bit vector is divided into a first vector part, a second vector part, a third vector part, a first extended information vector part, and a second extended information vector part. The first, second, and third vector parts each contain 3 flag bits; the first extended information vector part and the second extended information vector part each contain 1 flag bit. The first vector part is bits 1 to 3 of the 11-bit vector, the second vector part is bits 5 to 7, the third vector part is bits 9 to 11, the first extended information vector part is bit 4, and the second extended information vector part is bit 8. The 11-bit vector can be described by Table 1:
Table 1
Bits 1 to 3: first vector part
Bit 4: first extended information vector part
Bits 5 to 7: second vector part
Bit 8: second extended information vector part
Bits 9 to 11: third vector part
In addition, it should be clear to those skilled in the art that the ordering of the flag bits of the 11-bit vector is not limited to the arrangement above; other arrangements are possible in practice, and the embodiments of the present application are not limited in this respect.
In the present application, letters include "a-z" and "A-Z", digits include "0-9", and special characters are the characters allowed in the username string of an email address other than digits and letters. For example, common email servers currently allow the "_" character in the username string of an email address in addition to letters and digits; the "_" character here is one kind of special character. It will be apparent to those skilled in the art that, as email servers change which characters they allow in the username string, the set of special characters changes accordingly; in other embodiments, for example, the special characters may also include characters such as "#", "*", "!", and ",". The embodiments of the present application are not limited in this respect.
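The three character classes defined above can be expressed as a small classifier. This is a sketch only: the name `char_flags` is hypothetical, and treating every character that is neither an ASCII letter nor an ASCII digit as "special" is a simplification that does not check what a particular email server actually allows.

```python
import string

LETTERS = set(string.ascii_letters)  # "a-z" and "A-Z"
DIGITS = set(string.digits)          # "0-9"


def char_flags(text):
    """Return (has_letter, has_digit, has_special) as 0/1 flags.

    Anything that is neither a letter nor a digit counts as special,
    which is a simplification of "other characters allowed in the
    username string"."""
    has_letter = any(c in LETTERS for c in text)
    has_digit = any(c in DIGITS for c in text)
    has_special = any(c not in LETTERS and c not in DIGITS for c in text)
    return int(has_letter), int(has_digit), int(has_special)


digits_only = char_flags("23")
letters_only = char_flags("kim")
mixed = char_flags("#p")
```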
When the username string of the email address contains both the first extended information and the second extended information:
The first vector part records the features of the string between the first extended information and one end of the username string of the email address, namely the end for which the span up to the first extended information contains neither the first extended information nor the second extended information. That is, when the first extended information precedes the second extended information, the first vector part records the features of the string between the start of the username string of the email address and the start of the first extended information; when the first extended information follows the second extended information, the first vector part records the features of the string between the end of the first extended information and the end of the username string of the email address.
For example, the username string of the email address is "23likimsi#p", the user's surname is Li, and the given name is Si, so the first extended information includes "li" or "l" and the second extended information includes "si" or "s". Here the first extended information precedes the second extended information in the username string, and the first vector part records the features of the string "23" between the start of "23likimsi#p" and the start of the first extended information "li". When the username string of the email address is "23sikimli#p", with the same surname and given name, the first extended information follows the second extended information, and the first vector part records the features of the string "#p" between the end of the first extended information string "li" and the end of "23sikimli#p".
The second vector part records the features of the string between the first extended information and the second extended information, excluding the first extended information and the second extended information themselves. That is, when the first extended information precedes the second extended information, the second vector part records the features of the string between the end of the first extended information and the start of the second extended information; when the first extended information follows the second extended information, the second vector part records the features of the string between the end of the second extended information and the start of the first extended information.
For example, the username string of the email address is "23likimsi#p", the user's surname is Li, and the given name is Si, so the first extended information includes "li" or "l" and the second extended information includes "si" or "s". Here the first extended information precedes the second extended information, and the second vector part records the features of the string "kim" between the end of the first extended information string "li" and the start of the second extended information string "si". When the username string of the email address is "23sikimli#p", with the same surname and given name, the first extended information follows the second extended information, and the second vector part records the features of the string "kim" between the end of the second extended information string "si" and the start of the first extended information string "li".
The third vector part records the features of the string between the second extended information and the other end of the username string of the email address, namely the end for which the span up to the second extended information contains neither the first extended information nor the second extended information. That is, when the first extended information precedes the second extended information, the third vector part records the features of the string between the end of the second extended information and the end of the username string of the email address; when the first extended information follows the second extended information, the third vector part records the features of the string between the start of the username string of the email address and the start of the second extended information.
For example, the username string of the email address is "23likimsi#p", the user's surname is Li, and the given name is Si, so the first extended information includes "li" or "l" and the second extended information includes "si" or "s". Here the first extended information precedes the second extended information, and the third vector part records the features of the string "#p" between the end of the second extended information "si" and the end of "23likimsi#p". When the username string of the email address is "23sikimli#p", with the same surname and given name, the first extended information follows the second extended information, and the third vector part records the features of the string "23" between the start of "23sikimli#p" and the start of the second extended information string "si".
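The division into three parts, in both orderings, can be sketched as below. The function name `split_username` and its position/length parameters are hypothetical; the sketch assumes both pieces of extended information have already been located (0-based offsets) and do not overlap.

```python
def split_username(username, first_pos, first_len, second_pos, second_len):
    """Return (first_part, second_part, third_part) of the username.

    first_pos/first_len locate the first extended information (surname),
    second_pos/second_len locate the second (given name)."""
    first_end = first_pos + first_len
    second_end = second_pos + second_len
    if first_pos < second_pos:
        # Surname before given name: prefix, middle, suffix.
        return (
            username[:first_pos],
            username[first_end:second_pos],
            username[second_end:],
        )
    # Surname after given name: the roles of prefix and suffix swap.
    return (
        username[first_end:],
        username[second_end:first_pos],
        username[:second_pos],
    )


surname_first = split_username("23likimsi#p", 2, 2, 7, 2)  # "li" then "si"
given_first = split_username("23sikimli#p", 7, 2, 2, 2)    # "si" then "li"
```

The two results mirror the examples above: the first part is "23" or "#p" depending on the ordering, the second part is "kim" in both, and the third part is "#p" or "23".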
The first extension information vector part records the features of the first extension information character string.
The second extension information vector part records the features of the second extension information character string.
The character string features recorded by the first vector part, the second vector part, the third vector part, the first extension information vector part, and the second extension information vector part are checked, specifically as follows:
Check whether the character string recorded by the first vector part contains a letter: if it does, mark the 1st position of the 11-bit vector as 1; otherwise mark the 1st position as 0. Check whether it contains a digit: if it does, mark the 2nd position as 1; otherwise mark the 2nd position as 0. Check whether it contains a special character: if it does, mark the 3rd position as 1; otherwise mark the 3rd position as 0.
Check whether the character string recorded by the second vector part contains a letter: if it does, mark the 5th position of the 11-bit vector as 1; otherwise mark the 5th position as 0. Check whether it contains a digit: if it does, mark the 6th position as 1; otherwise mark the 6th position as 0. Check whether it contains a special character: if it does, mark the 7th position as 1; otherwise mark the 7th position as 0.
Check whether the character string recorded by the third vector part contains a letter: if it does, mark the 9th position of the 11-bit vector as 1; otherwise mark the 9th position as 0. Check whether it contains a digit: if it does, mark the 10th position as 1; otherwise mark the 10th position as 0. Check whether it contains a special character: if it does, mark the 11th position as 1; otherwise mark the 11th position as 0.
Check whether the first extension information character string recorded by the first extension information vector part is the full spelling of the surname; if it is, mark the 4th position of the 11-bit vector as 2. Check whether it is the initial of the surname; if it is, mark the 4th position of the 11-bit vector as 1.
Check whether the second extension information character string recorded by the second extension information vector part is the full spelling of the given name; if it is, mark the 8th position of the 11-bit vector as 2. Check whether it is the initial of the given name; if it is, mark the 8th position of the 11-bit vector as 1.
It should be clear to those skilled in the art that the way of marking the meaning of each position of the 11-bit vector is not limited to the numeric marking described above; other forms may be used in practice, and the embodiment of the present application is not limited in this respect.
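The marking rules above can be illustrated with a minimal Python sketch. The helper names `char_flags` and `ext_flag` are hypothetical, and treating every character that is neither a letter nor a digit as a special character is an assumption, since this passage does not define the special character set.

```python
def char_flags(part: str) -> str:
    """Return three marks: contains a letter, contains a digit, contains a special character."""
    has_letter = any(c.isalpha() for c in part)
    has_digit = any(c.isdigit() for c in part)
    # Assumption: for ASCII user names, "special" means neither a letter nor a digit.
    has_special = any(not c.isalnum() for c in part)
    return f"{int(has_letter)}{int(has_digit)}{int(has_special)}"


def ext_flag(matched: str, spelling: str) -> str:
    """Return "2" for the full spelling, "1" for the initial, "0" when nothing matched."""
    if not matched:
        return "0"
    if matched == spelling:
        return "2"
    if matched == spelling[:1]:
        return "1"
    return "0"
```

An empty character string yields "000" from `char_flags` and "0" from `ext_flag`, which corresponds to the empty-string markings used when an extension information item is absent.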
The method of setting the 11-bit vector is described in detail below with several specific examples:
When there is no inclusion relation between the spelling of the surname and the spelling of the given name, for example, the user name character string of the e-mail address is "23likimsi#p", the user's surname is Li, and the given name is Si. The first extension information then includes "li" or "l", and the second extension information includes "si" or "s"; that is, the first extension information precedes the second extension information in the user name character string of the e-mail address. In this case, the first vector part records the features of the character string "23", so the first vector part is "010"; the first extension information vector part records the features of the character string "li", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "si", so the second extension information vector part is "2"; the third vector part records the features of the character string "#p", so the third vector part is "101". The 11-bit vector is therefore expressed as (01021002101), and the specific meaning of the 11-bit vector is shown in Table 2.
Table 2
Vector value Meaning
0 The character string recorded by the first vector part contains no letter
1 The character string recorded by the first vector part contains a digit
0 The character string recorded by the first vector part contains no special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
1 The character string recorded by the third vector part contains a letter
0 The character string recorded by the third vector part contains no digit
1 The character string recorded by the third vector part contains a special character
When the user name character string of the e-mail address is "23sikimli#p", the user's surname is Li, and the given name is Si, the first extension information includes "li" or "l" and the second extension information includes "si" or "s"; that is, the first extension information follows the second extension information in the user name character string of the e-mail address. In this case, the first vector part records the features of the character string "#p", so the first vector part is "101"; the first extension information vector part records the features of the character string "li", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "si", so the second extension information vector part is "2"; the third vector part records the features of the character string "23", so the third vector part is "010". The 11-bit vector is therefore expressed as (10121002010), and the specific meaning of the 11-bit vector is shown in Table 3.
Table 3
Vector value Meaning
1 The character string recorded by the first vector part contains a letter
0 The character string recorded by the first vector part contains no digit
1 The character string recorded by the first vector part contains a special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
0 The character string recorded by the third vector part contains no letter
1 The character string recorded by the third vector part contains a digit
0 The character string recorded by the third vector part contains no special character
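The two examples above (surname before the given name, and surname after it) can be reproduced with a short Python sketch. The function names are hypothetical. The sketch assumes that both extension strings are full spellings (so both markers are "2"), that the two spellings have no inclusion relation, and that a special character is any character that is neither a letter nor a digit.

```python
def char_flags(part: str) -> str:
    """Letter / digit / special-character marks for one part of the user name."""
    has_letter = any(c.isalpha() for c in part)
    has_digit = any(c.isdigit() for c in part)
    has_special = any(not c.isalnum() for c in part)  # assumption: neither letter nor digit
    return f"{int(has_letter)}{int(has_digit)}{int(has_special)}"


def encode_both(username: str, surname_key: str, given_key: str) -> str:
    """Build the 11-position vector when both extension strings occur in the user name."""
    s = username.find(surname_key)  # first extension information position
    g = username.find(given_key)    # second extension information position
    if s < 0 or g < 0:
        raise ValueError("both extension strings must occur in the user name")
    if s < g:
        # Surname first: parts are read from the start of the user name.
        first = username[:s]
        second = username[s + len(surname_key):g]
        third = username[g + len(given_key):]
    else:
        # Surname last: the first part is the tail after the surname,
        # and the third part is the head before the given name.
        first = username[s + len(surname_key):]
        second = username[g + len(given_key):s]
        third = username[:g]
    return char_flags(first) + "2" + char_flags(second) + "2" + char_flags(third)
```

Calling `encode_both("23likimsi#p", "li", "si")` gives the vector of Table 2, and `encode_both("23sikimli#p", "li", "si")` gives the vector of Table 3.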
When there is an inclusion relation between the spelling of the surname and the spelling of the given name and the two spellings are identical, for example, the user name character string of the e-mail address is "23likimli#p", the user's surname is Li, and the given name is also spelled Li. The first extension information then includes "li" or "l", and the second extension information includes "li" or "l". If the first extension information is searched first and the second extension information is searched second, then the first vector part records the features of the character string "23", so the first vector part is "010"; the first extension information vector part records the features of the character string "li" between the character "3" and the character "k", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "li" between the character "m" and the character "#", so the second extension information vector part is "2"; the third vector part records the features of the character string "#p", so the third vector part is "101". The 11-bit vector is therefore expressed as (01021002101), and the specific meaning of the 11-bit vector is shown in Table 4.
Table 4
Vector value Meaning
0 The character string recorded by the first vector part contains no letter
1 The character string recorded by the first vector part contains a digit
0 The character string recorded by the first vector part contains no special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
1 The character string recorded by the third vector part contains a letter
0 The character string recorded by the third vector part contains no digit
1 The character string recorded by the third vector part contains a special character
If, for the same user name character string "23likimli#p", the second extension information is searched first and the first extension information is searched second, then the first vector part records the features of the character string "#p", so the first vector part is "101"; the first extension information vector part records the features of the character string "li" between the character "m" and the character "#", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "li" between the character "3" and the character "k", so the second extension information vector part is "2"; the third vector part records the features of the character string "23", so the third vector part is "010". The 11-bit vector is therefore expressed as (10121002010), and the specific meaning of the 11-bit vector is shown in Table 5.
Table 5
Vector value Meaning
1 The character string recorded by the first vector part contains a letter
0 The character string recorded by the first vector part contains no digit
1 The character string recorded by the first vector part contains a special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
0 The character string recorded by the third vector part contains no letter
1 The character string recorded by the third vector part contains a digit
0 The character string recorded by the third vector part contains no special character
When there is an inclusion relation between the spelling of the surname and the spelling of the given name and the spelling of the surname is longer than the spelling of the given name, for example, the user name character string of the e-mail address is "23likimlin#p", the user's surname is Lin, and the given name is Li. The first extension information then includes "lin" or "l", and the second extension information includes "li" or "l". In this case, the first vector part records the features of the character string "#p", so the first vector part is "101"; the first extension information vector part records the features of the character string "lin" between the character "m" and the character "#", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "li" between the character "3" and the character "k", so the second extension information vector part is "2"; the third vector part records the features of the character string "23", so the third vector part is "010". The 11-bit vector is therefore expressed as (10121002010), and the specific meaning of the 11-bit vector is shown in Table 6.
Table 6
Vector value Meaning
1 The character string recorded by the first vector part contains a letter
0 The character string recorded by the first vector part contains no digit
1 The character string recorded by the first vector part contains a special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
0 The character string recorded by the third vector part contains no letter
1 The character string recorded by the third vector part contains a digit
0 The character string recorded by the third vector part contains no special character
When there is an inclusion relation between the spelling of the surname and the spelling of the given name and the spelling of the surname is shorter than the spelling of the given name, for example, the user name character string of the e-mail address is "23likimlin#p", the user's surname is Li, and the given name is Lin. The first extension information then includes "li" or "l", and the second extension information includes "lin" or "l". In this case, the first vector part records the features of the character string "23", so the first vector part is "010"; the first extension information vector part records the features of the character string "li" between the character "3" and the character "k", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim", so the second vector part is "100"; the second extension information vector part records the features of the character string "lin" between the character "m" and the character "#", so the second extension information vector part is "2"; the third vector part records the features of the character string "#p", so the third vector part is "101". The 11-bit vector is therefore expressed as (01021002101), and the specific meaning of the 11-bit vector is shown in Table 7.
Table 7
Vector value Meaning
0 The character string recorded by the first vector part contains no letter
1 The character string recorded by the first vector part contains a digit
0 The character string recorded by the first vector part contains no special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
1 The character string recorded by the third vector part contains a letter
0 The character string recorded by the third vector part contains no digit
1 The character string recorded by the third vector part contains a special character
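The positions used in the inclusion-relation examples above can be reproduced with the following Python sketch. It is only one possible reading: the function names are hypothetical, and the rule "take the first occurrence of the string searched first, then the first occurrence of the other string that does not overlap it" is an assumption chosen because it matches the positions given in Tables 4 to 7.

```python
from typing import Optional, Tuple


def find_outside(text: str, key: str, taken: Optional[Tuple[int, int]] = None) -> int:
    """Return the first index of key in text that does not overlap the taken span, or -1."""
    start = text.find(key)
    while start >= 0:
        end = start + len(key)
        if taken is None or end <= taken[0] or start >= taken[1]:
            return start
        start = text.find(key, start + 1)
    return -1


def locate(username: str, surname_key: str, given_key: str,
           surname_first: bool = True) -> Tuple[int, int]:
    """Return (surname position, given-name position); -1 means not found."""
    if surname_first:
        s = find_outside(username, surname_key)
        taken = (s, s + len(surname_key)) if s >= 0 else None
        g = find_outside(username, given_key, taken)
    else:
        g = find_outside(username, given_key)
        taken = (g, g + len(given_key)) if g >= 0 else None
        s = find_outside(username, surname_key, taken)
    return s, g
```

With identical spellings, the search order decides which occurrence is attributed to the surname, which is why the same user name character string produces the vector of Table 4 under one order and the vector of Table 5 under the other.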
When the user name character string of the e-mail address contains only the first extension information, the first extension information divides the user name character string of the e-mail address into two character strings.
The first vector part records the features of the character string between the start of the user name character string of the e-mail address and the start of the first extension information at the first extension information position; the second vector part records the features of the character string between the end of the first extension information at the first extension information position and the end of the user name character string of the e-mail address; the third vector part records the features of an empty character string; the first extension information vector part records the features of the first extension information character string; the second extension information vector part records the features of an empty character string.
The character string features recorded by the first vector part, the second vector part, the third vector part, the first extension information vector part, and the second extension information vector part are checked as follows:
Check whether the character string recorded by the first vector part contains a letter: if it does, mark the 1st position of the 11-bit vector as 1; otherwise mark the 1st position as 0. Check whether it contains a digit: if it does, mark the 2nd position as 1; otherwise mark the 2nd position as 0. Check whether it contains a special character: if it does, mark the 3rd position as 1; otherwise mark the 3rd position as 0.
Check whether the character string recorded by the second vector part contains a letter: if it does, mark the 5th position of the 11-bit vector as 1; otherwise mark the 5th position as 0. Check whether it contains a digit: if it does, mark the 6th position as 1; otherwise mark the 6th position as 0. Check whether it contains a special character: if it does, mark the 7th position as 1; otherwise mark the 7th position as 0.
Check whether the first extension information character string recorded by the first extension information vector part is the full spelling of the surname; if it is, mark the 4th position of the 11-bit vector as 2. Check whether it is the initial of the surname; if it is, mark the 4th position of the 11-bit vector as 1.
The third vector part is an empty character string, that is, it contains no letter, digit, or special character; the second extension information vector part is an empty character string, that is, it contains neither the full spelling nor the initial of the given name. Accordingly, the 8th to 11th positions of the 11-bit vector are marked as 0.
Table 8
Vector value Meaning
0 The character string recorded by the first vector part contains no letter
1 The character string recorded by the first vector part contains a digit
0 The character string recorded by the first vector part contains no special character
2 The character string recorded by the first extension information vector part is the full spelling of the surname
1 The character string recorded by the second vector part contains a letter
0 The character string recorded by the second vector part contains no digit
1 The character string recorded by the second vector part contains a special character
0 The character string recorded by the second extension information vector part contains neither the full spelling nor the initial of the given name
0 The character string recorded by the third vector part contains no letter
0 The character string recorded by the third vector part contains no digit
0 The character string recorded by the third vector part contains no special character
For example: the user name character string of the e-mail address is "23likim#p", the user's surname is Li, and the given name is Si. The first extension information then includes "li" or "l", and the second extension information includes "si" or "s"; that is, the user name character string of the e-mail address contains only the first extension information. The first vector part therefore records the features of the character string "23", so the first vector part is "010"; the first extension information vector part records the features of the character string "li", so the first extension information vector part is "2"; the second vector part records the features of the character string "kim#p", so the second vector part is "101"; the second extension information vector part records the features of an empty character string, so the second extension information vector part is "0"; the third vector part records the features of an empty character string, so the third vector part is "000". The 11-bit vector is therefore expressed as (01021010000), and the specific meaning of the 11-bit vector is shown in Table 8.
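The surname-only case can be sketched in Python as follows. The function names are hypothetical; the sketch assumes the matched string is the full spelling of the surname (marker "2") and that a special character is any character that is neither a letter nor a digit.

```python
def char_flags(part: str) -> str:
    """Letter / digit / special-character marks for one part of the user name."""
    has_letter = any(c.isalpha() for c in part)
    has_digit = any(c.isdigit() for c in part)
    has_special = any(not c.isalnum() for c in part)  # assumption: neither letter nor digit
    return f"{int(has_letter)}{int(has_digit)}{int(has_special)}"


def encode_surname_only(username: str, surname_key: str) -> str:
    """Build the 11-position vector when only the first extension information occurs."""
    s = username.find(surname_key)
    if s < 0:
        raise ValueError("the surname string must occur in the user name")
    before = username[:s]                    # first vector part
    after = username[s + len(surname_key):]  # second vector part
    # Positions 8 to 11 are all 0: empty second extension part and empty third part.
    return char_flags(before) + "2" + char_flags(after) + "0" + "000"
```

For the user name character string "23likim#p" and the surname string "li", this yields the vector shown in Table 8.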
When the user name character string of the e-mail address contains only the second extension information, the second extension information divides the user name character string of the e-mail address into two character strings.
The first vector part records the features of an empty character string; the second vector part records the features of the character string between the start of the user name character string of the e-mail address and the start of the second extension information at the second extension information position; the third vector part records the features of the character string between the end of the second extension information at the second extension information position and the end of the user name character string of the e-mail address; the first extension information vector part records the features of an empty character string; the second extension information vector part records the features of the second extension information character string.
The character string features recorded by the first vector part, the second vector part, the third vector part, the first extension information vector part, and the second extension information vector part are checked as follows:
Check whether the character string recorded by the second vector part contains a letter: if it does, mark the 5th position of the 11-bit vector as 1; otherwise mark the 5th position as 0. Check whether it contains a digit: if it does, mark the 6th position as 1; otherwise mark the 6th position as 0. Check whether it contains a special character: if it does, mark the 7th position as 1; otherwise mark the 7th position as 0.
Check whether the character string recorded by the third vector part contains a letter: if it does, mark the 9th position of the 11-bit vector as 1; otherwise mark the 9th position as 0. Check whether it contains a digit: if it does, mark the 10th position as 1; otherwise mark the 10th position as 0. Check whether it contains a special character: if it does, mark the 11th position as 1; otherwise mark the 11th position as 0.
Check whether the second extension information character string recorded by the second extension information vector part is the full spelling of the given name; if it is, mark the 8th position of the 11-bit vector as 2. Check whether it is the initial of the given name; if it is, mark the 8th position of the 11-bit vector as 1.
The first vector part is an empty character string, that is, it contains no letter, digit, or special character; the first extension information vector part is an empty character string, that is, it contains neither the full spelling nor the initial of the surname. Accordingly, the 1st to 4th positions of the 11-bit vector are marked as 0.
For example: the user name character string of the e-mail address is "23kimsi#p", the user's surname is Li, and the given name is Si. The first extension information then includes "li" or "l", and the second extension information includes "si" or "s"; that is, the user name character string of the e-mail address contains only the second extension information. The first vector part therefore records the features of an empty character string, so the first vector part is "000"; the first extension information vector part records the features of an empty character string, so the first extension information vector part is "0"; the second vector part records the features of the character string "23kim", so the second vector part is "110"; the second extension information vector part records the features of the character string "si", so the second extension information vector part is "2"; the third vector part records the features of the character string "#p", so the third vector part is "101". The 11-bit vector is therefore expressed as (00001102101), and the specific meaning of the 11-bit vector is shown in Table 9.
Table 9
Vector value Meaning
0 The character string recorded by the first vector part contains no letter
0 The character string recorded by the first vector part contains no digit
0 The character string recorded by the first vector part contains no special character
0 The character string recorded by the first extension information vector part contains neither the full spelling nor the initial of the surname
1 The character string recorded by the second vector part contains a letter
1 The character string recorded by the second vector part contains a digit
0 The character string recorded by the second vector part contains no special character
2 The character string recorded by the second extension information vector part is the full spelling of the given name
1 The character string recorded by the third vector part contains a letter
0 The character string recorded by the third vector part contains no digit
1 The character string recorded by the third vector part contains a special character
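The given-name-only case mirrors the surname-only case, with the first four positions fixed at 0. A minimal Python sketch, with hypothetical function names and under the assumptions that the matched string is the full spelling of the given name and that a special character is any character that is neither a letter nor a digit:

```python
def char_flags(part: str) -> str:
    """Letter / digit / special-character marks for one part of the user name."""
    has_letter = any(c.isalpha() for c in part)
    has_digit = any(c.isdigit() for c in part)
    has_special = any(not c.isalnum() for c in part)  # assumption: neither letter nor digit
    return f"{int(has_letter)}{int(has_digit)}{int(has_special)}"


def encode_given_only(username: str, given_key: str) -> str:
    """Build the 11-position vector when only the second extension information occurs."""
    g = username.find(given_key)
    if g < 0:
        raise ValueError("the given-name string must occur in the user name")
    before = username[:g]                  # second vector part
    after = username[g + len(given_key):]  # third vector part
    # Positions 1 to 4 are all 0: empty first part and empty first extension part.
    return "000" + "0" + char_flags(before) + "2" + char_flags(after)
```

For the user name character string "23kimsi#p" and the given-name string "si", this yields the vector shown in Table 9.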
When the user name character string of the e-mail address contains neither the first extension information nor the second extension information:
The first vector part, the second vector part, and the third vector part all record the features of the entire user name character string of the e-mail address; the first extension information vector part records the features of an empty character string; the second extension information vector part records the features of an empty character string.
The character string features recorded by the first vector part, the second vector part, the third vector part, the first extension information vector part, and the second extension information vector part are checked as follows:
Check whether the user name character string of the e-mail address contains a letter: if it does, mark the 1st, 5th, and 9th positions of the 11-bit vector as 1; otherwise mark the 1st, 5th, and 9th positions as 0. Check whether it contains a digit: if it does, mark the 2nd, 6th, and 10th positions as 1; otherwise mark the 2nd, 6th, and 10th positions as 0. Check whether it contains a special character: if it does, mark the 3rd, 7th, and 11th positions as 1; otherwise mark the 3rd, 7th, and 11th positions as 0.
The first extension information vector part is an empty character string, that is, it contains neither the full spelling nor the initial of the surname.
The second extension information vector part is an empty character string, that is, it contains neither the full spelling nor the initial of the given name. Accordingly, the 4th and 8th positions of the 11-bit vector are marked as 0.
For example: the user name character string of the e-mail address is "23kim#p", the user's surname is Li, and the given name is Si. The first extension information then includes "li" or "l", and the second extension information includes "si" or "s"; that is, the user name character string of the e-mail address contains neither the first extension information nor the second extension information. The first vector part, the second vector part, and the third vector part therefore all record the features of the character string "23kim#p", so each of them is "111"; the first extension information vector part and the second extension information vector part record the features of an empty character string, so each of them is "0". The 11-bit vector is therefore expressed as (11101110111), and the specific meaning of the 11-bit vector is shown in Table 10.
Table 10
Vector value Meaning
1 The character string recorded by the first vector part contains a letter
1 The character string recorded by the first vector part contains a digit
1 The character string recorded by the first vector part contains a special character
0 The character string recorded by the first extension information vector part contains neither the full spelling nor the initial of the surname
1 The character string recorded by the second vector part contains a letter
1 The character string recorded by the second vector part contains a digit
1 The character string recorded by the second vector part contains a special character
0 The character string recorded by the second extension information vector part contains neither the full spelling nor the initial of the given name
1 The character string recorded by the third vector part contains a letter
1 The character string recorded by the third vector part contains a digit
1 The character string recorded by the third vector part contains a special character
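When neither extension information item occurs, the same three marks are repeated for all three vector parts. A minimal Python sketch, with hypothetical function names and the assumption that a special character is any character that is neither a letter nor a digit:

```python
def char_flags(part: str) -> str:
    """Letter / digit / special-character marks for one part of the user name."""
    has_letter = any(c.isalpha() for c in part)
    has_digit = any(c.isdigit() for c in part)
    has_special = any(not c.isalnum() for c in part)  # assumption: neither letter nor digit
    return f"{int(has_letter)}{int(has_digit)}{int(has_special)}"


def encode_neither(username: str) -> str:
    """Build the 11-position vector when no extension information occurs in the user name."""
    flags = char_flags(username)  # all three vector parts describe the whole user name
    return flags + "0" + flags + "0" + flags
```

For the user name character string "23kim#p", this yields the vector shown in Table 10.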
Step S105: Count the proportion of each vector among all vectors. When the proportion of a vector among all vectors is greater than or equal to the first threshold, the e-mail address samples corresponding to all user name character strings under that vector are samples that contain batch-registered e-mail addresses.
In practice, the first threshold can be obtained as follows: select registration information that contains no batch registration; divide the user name character strings of the e-mail addresses according to the surnames and given names of the registered users of that registration information; classify the user name character strings of the e-mail addresses using vectors, so that each user name character string yields one 11-bit vector; count the number of user name character strings of e-mail addresses under each vector; calculate the ratio of the number of user name character strings under each vector to the number of user name character strings under all vectors; and take the largest of these ratios as the first threshold.
Then, arbitrarily selected registration information is classified using 11-bit vectors, so that each user name character string of an e-mail address yields one 11-bit vector. The number of user name character strings under each vector is counted, and the ratio of the number of user name character strings under each vector to the number of user name character strings under all vectors is calculated. When this ratio for a vector is greater than or equal to the first threshold, it can be determined that the registration information under that vector includes batch-registered registration information.
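The threshold calibration and the comparison in step S105 can be sketched in Python as follows. The function names are hypothetical, and the input is assumed to be a list with one 11-position vector string per user name character string.

```python
from collections import Counter
from typing import Dict, List


def vector_ratios(vectors: List[str]) -> Dict[str, float]:
    """Return the share of user names that fall under each distinct vector."""
    total = len(vectors)
    if total == 0:
        return {}
    return {vec: count / total for vec, count in Counter(vectors).items()}


def first_threshold(clean_vectors: List[str]) -> float:
    """Largest per-vector share in a sample known to contain no batch registration."""
    ratios = vector_ratios(clean_vectors)
    if not ratios:
        raise ValueError("the calibration sample must not be empty")
    return max(ratios.values())


def suspicious_vectors(vectors: List[str], threshold: float) -> List[str]:
    """Vectors whose share is greater than or equal to the first threshold."""
    return [vec for vec, ratio in vector_ratios(vectors).items() if ratio >= threshold]
```

A usage example: if a clean sample gives a largest share of 0.5, then a sample in which one vector accounts for three of four user names would be flagged for that vector.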
It can be seen that, in the embodiments of the present application, the username string of the email address is divided using the surname and given name of the registered user, the features of the divided parts are described with a vector, the username strings are classified by vector, and the proportion of all vectors that each vector accounts for is counted. When the proportion for a given vector is greater than or equal to the first threshold, it can be judged that the registration information in that vector includes batch-registered registration information, which provides a basis for identifying batch registration more accurately.
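As an illustration of how the divided parts might be described by an 11-bit vector, the sketch below assumes 3 flag bits (letter, digit, special character) for each of the three divided character strings plus 1 bit for each piece of extension information. The bit order, the treatment of any non-alphanumeric character as "special", and the names `segment_flags` and `build_vector` are assumptions made for this example.

```python
def segment_flags(segment):
    """Three flags for one divided character string:
    contains a letter, contains a digit, contains a special character."""
    has_letter = any(ch.isalpha() for ch in segment)
    has_digit = any(ch.isdigit() for ch in segment)
    # Assumption: anything that is neither a letter nor a digit is "special".
    has_special = any(not ch.isalnum() for ch in segment)
    return [int(has_letter), int(has_digit), int(has_special)]


def build_vector(first_part, second_part, third_part,
                 has_surname_info, has_given_info):
    """Assemble an 11-character bit string. Assumed bit order: the two
    extension-information bits first, then 3 bits per divided part."""
    bits = [int(has_surname_info), int(has_given_info)]
    for segment in (first_part, second_part, third_part):
        bits.extend(segment_flags(segment))
    return "".join(str(b) for b in bits)


# Hypothetical username "zhang_wei88" with surname "zhang" and given name "wei":
# nothing precedes the surname, "_" lies between the names, "88" follows.
vector = build_vector("", "_", "88", True, True)
```

Usernames generated from the same template, such as surname, underscore, given name, two digits, map to the same vector, which is what makes the share of each vector informative.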
Correspondingly, an embodiment of the present application provides a device 200 for identifying batch registration behavior. As shown in Fig. 2, the device 200 includes: a preset unit 201, an information obtaining unit 202, a searching unit 203, a classification unit 204, and a statistics unit 205.
The preset unit 201 is configured to obtain, from registration information that includes the surname and given name of a registered user and a registered email address, the username string of the email address in the registration information and the first extension information and second extension information of the registered user;
The information obtaining unit 202 is configured to obtain the username string of the email address in the registration information, and to obtain, according to a predetermined rule, the first extension information corresponding to the surname and the second extension information corresponding to the given name in the registration information;
The searching unit 203 is configured to find the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address, to obtain a first extension information position and a second extension information position;
The classification unit 204 is configured to divide the username string of the email address according to the positions of the first extension information and the second extension information in the username string, describe the features of the divided parts with a vector, and classify the username strings of the email addresses by vector;
The statistics unit 205 is configured to count the proportion of all vectors that each vector accounts for; when the proportion for a given vector is greater than or equal to the first threshold, it can be judged that the registration information in that vector includes batch-registered registration information.
It can be seen that the device for identifying batch registration behavior provided by the embodiments of the present application divides the username string of the email address using the surname and given name of the registered user, describes the features of the divided parts with a vector, classifies the username strings by vector, and counts the proportion of all vectors that each vector accounts for. When the proportion for a given vector is greater than or equal to the first threshold, it can be judged that the registration information in that vector includes batch-registered registration information, which provides a basis for identifying batch registration more accurately.
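The lookup performed by the searching unit 203 can be sketched as below, using the fallback order stated in claim 2: search for the full spelling first, and for the initial only if the spelling is not found. The function name `find_extension`, the case-insensitive comparison, and the return format are assumptions for illustration; the special handling for overlapping surname and given-name spellings is not covered here.

```python
def find_extension(username, spelling):
    """Locate extension information in a username string.

    Tries the full spelling first and falls back to its initial.
    Returns (position, matched_text) for the first occurrence,
    or None if neither is present.
    """
    # Assumption: matching ignores letter case.
    username = username.lower()
    spelling = spelling.lower()
    if not spelling:
        return None
    for candidate in (spelling, spelling[0]):
        pos = username.find(candidate)
        if pos != -1:
            return pos, candidate
    return None


# Hypothetical examples.
full_hit = find_extension("zhangwei88", "zhang")   # full spelling found
initial_hit = find_extension("zw2014", "wei")      # only the initial found
no_hit = find_extension("abc123", "li")            # neither found
```

The position and the length of the matched text are what the classification unit would need in order to cut the username string around the match.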
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also appreciate that a hardware circuit implementing the logical method flow can readily be obtained merely by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory.
Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, it is entirely possible to logic-program the method steps so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the devices for implementing various functions can even be regarded both as software modules that implement the method and as structures within the hardware component.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function.
For convenience of description, the above device is described with its functions divided into various units. Of course, when the present application is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present application, or the part that contributes over the prior art, can be embodied in the form of a software product. In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The computer software product can include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments. The computer software product can be stored in memory. Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments in this specification are described in a progressive manner. Identical or similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, see the description of the method embodiment.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that the application admits many variations and changes without departing from its spirit, and it is intended that the appended claims cover such variations and changes.

Claims (11)

  1. A method for identifying batch registration behavior, characterized in that the method comprises:
    selecting a preset quantity of registration information to be identified, the registration information including the surname and given name of a registered user and a registered email address;
    obtaining the username string of the email address in the registration information, and obtaining, according to a predetermined rule, first extension information corresponding to the surname and second extension information corresponding to the given name in the registration information;
    finding the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address, to obtain a first extension information position and a second extension information position;
    dividing the username string of the email address into five parts according to the position of the first extension information and the position of the second extension information in the username string, the five parts being: the first extension information part; the second extension information part; the part between the first extension information and the second extension information; the part from the start of the username string to whichever of the first extension information and the second extension information occurs earlier; and the part from whichever of the first extension information and the second extension information occurs later to the end of the username string; describing the features of the divided parts with a vector, classifying the username strings by their corresponding vectors, and treating username strings with the same vector as username strings of the same type;
    counting the proportion of all username strings accounted for by the username strings of each type, and, when the proportion for a certain type is greater than or equal to a first threshold, judging that the registration information corresponding to the username strings of that type includes batch-registered registration information.
  2. The method according to claim 1, characterized in that the first extension information includes the spelling of the surname and the initial of the surname, and the second extension information includes the spelling of the given name and the initial of the given name;
    when searching for the first extension information, the spelling of the surname is searched for first, and the initial of the surname is searched for only when the spelling of the surname cannot be found;
    when searching for the second extension information, the spelling of the given name is searched for first, and the initial of the given name is searched for only when the spelling of the given name cannot be found.
  3. The method according to claim 2, characterized in that finding the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address comprises:
    judging whether an inclusion relationship exists between the spelling of the surname and the spelling of the given name of the registered user in the registration information;
    when it is judged that no inclusion relationship exists between the spelling of the surname and the spelling of the given name, searching the username string of the email address for the position at which the first extension information first occurs, as the position of the first extension information in the username string of the email address; and searching the username string of the email address for the position at which the second extension information first occurs, as the position of the second extension information in the username string of the email address.
  4. The method according to claim 3, characterized in that finding the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address further comprises:
    when it is judged that an inclusion relationship exists between the spelling of the surname and the spelling of the given name, and the spelling of the surname is identical to the spelling of the given name:
    determining priority lookup information, the priority lookup information being the first extension information or the second extension information; when the first extension information is the priority lookup information, the second extension information is the second lookup information; when the second extension information is the priority lookup information, the first extension information is the second lookup information;
    searching the username string of the email address for the position at which the priority lookup information first occurs, as the position of the priority lookup information in the username string of the email address;
    searching the username string of the email address, starting after the position at which the priority lookup information was first found, for the position at which the second lookup information first occurs, as the position of the second lookup information in the username string of the email address.
  5. The method according to claim 3, characterized in that finding the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address further comprises:
    when it is judged that an inclusion relationship exists between the spelling of the surname and the spelling of the given name, and the string length of the spelling of the surname is greater than the string length of the spelling of the given name:
    searching the username string of the email address for the position at which the first extension information first occurs, as the position of the first extension information in the username string of the email address; and, after removing the first extension information from the username string of the email address, searching for the position at which the second extension information first occurs, as the position of the second extension information in the username string of the email address;
    when an inclusion relationship exists between the spelling of the surname and the spelling of the given name, and the string length of the spelling of the surname is less than the string length of the spelling of the given name:
    searching the username string of the email address for the position at which the second extension information first occurs, as the position of the second extension information in the username string of the email address; and, after removing the second extension information from the username string of the email address, searching for the position at which the first extension information first occurs, as the position of the first extension information in the username string of the email address.
  6. The method according to claim 2, characterized in that describing the features of the divided parts with a vector and classifying the username strings of the email addresses by the vector comprises:
    using the vector to record, respectively, the feature of the first extension information, the feature of the second extension information, and the features of the character strings obtained by dividing at the first extension information position and the second extension information position;
    classifying the username strings of the email addresses according to the feature of the first extension information, the feature of the second extension information, and the features of the character strings obtained by the division.
  7. The method according to claim 6, characterized in that the vector includes a first vector part, a second vector part, a third vector part, a first extension information vector part, and a second extension information vector part;
    when the username string of the email address includes both the first extension information and the second extension information:
    the first vector part records the feature of the character string between the first extension information at the first extension information position and one end of the username string of the email address, this character string excluding the first extension information at the first extension information position and the second extension information at the second extension information position;
    the second vector part records the feature of the character string between the first extension information at the first extension information position and the second extension information at the second extension information position, this character string excluding the first extension information at the first extension information position and the second extension information at the second extension information position;
    the third vector part records the feature of the character string between the second extension information at the second extension information position and the other end of the username string of the email address, this character string excluding the first extension information at the first extension information position and the second extension information at the second extension information position;
    the first extension information vector part records the character string feature of the first extension information;
    the second extension information vector part records the character string feature of the second extension information.
  8. The method according to claim 6, characterized in that the vector includes a first vector part, a second vector part, a third vector part, a first extension information vector part, and a second extension information vector part;
    when the username string of the email address includes only the first extension information, the username string of the email address is divided into two character strings by the first extension information;
    the first vector part records the feature of the character string from the start of the username string of the email address to the start of the first extension information at the first extension information position; the second vector part records the feature of the character string from the end of the first extension information at the first extension information position to the end of the username string of the email address; the third vector part records an empty character string feature; the first extension information vector part records the character string feature of the first extension information; the second extension information vector part records an empty character string feature;
    when the username string of the email address includes only the second extension information, the username string of the email address is divided into two character strings by the second extension information;
    the first vector part records an empty character string feature; the second vector part records the feature of the character string from the start of the username string of the email address to the start of the second extension information at the second extension information position; the third vector part records the feature of the character string from the end of the second extension information at the second extension information position to the end of the username string of the email address; the first extension information vector part records an empty character string feature; the second extension information vector part records the character string feature of the second extension information.
  9. The method according to claim 6, characterized in that the vector includes a first vector part, a second vector part, a third vector part, a first extension information vector part, and a second extension information vector part;
    when the username string of the email address includes neither the first extension information nor the second extension information:
    the first vector part, the second vector part, and the third vector part all record the feature of the entire username string of the email address;
    the first extension information vector part records an empty character string feature;
    the second extension information vector part records an empty character string feature.
  10. The method according to any one of claims 7 to 9, characterized in that the first vector part includes 3 flag bits, which respectively indicate whether the character string recorded by the first vector part contains a letter, whether it contains a digit, and whether it contains a special character;
    the second vector part includes 3 flag bits, which respectively indicate whether the character string recorded by the second vector part contains a letter, whether it contains a digit, and whether it contains a special character;
    the third vector part includes 3 flag bits, which respectively indicate whether the character string recorded by the third vector part contains a letter, whether it contains a digit, and whether it contains a special character;
    the first extension information vector part includes 1 flag bit, which indicates whether the first extension information character string recorded by the first extension information vector part includes the spelling of the surname or the initial of the surname;
    the second extension information vector part includes 1 flag bit, which indicates whether the second extension information character string recorded by the second extension information vector part includes the spelling of the given name or the initial of the given name.
  11. A device for identifying batch registration behavior, characterized in that the device comprises:
    a preset unit, configured to obtain, from registration information that includes the surname and given name of a registered user and a registered email address, the username string of the email address in the registration information and the first extension information and second extension information of the registered user;
    an information obtaining unit, configured to obtain the username string of the email address in the registration information, and to obtain, according to a predetermined rule, the first extension information corresponding to the surname and the second extension information corresponding to the given name in the registration information;
    a searching unit, configured to find the positions at which the first extension information and the second extension information of the registered user first occur in the username string of the email address, to obtain a first extension information position and a second extension information position;
    a classification unit, configured to divide the username string of the email address into five parts according to the position of the first extension information and the position of the second extension information in the username string, the five parts being: the first extension information part; the second extension information part; the part between the first extension information and the second extension information; the part from the start of the username string to whichever of the first extension information and the second extension information occurs earlier; and the part from whichever of the first extension information and the second extension information occurs later to the end of the username string; to describe the features of the divided parts with a vector; to classify the username strings by their corresponding vectors; and to treat username strings with the same vector as username strings of the same type;
    a statistics unit, configured to count the proportion of all username strings accounted for by the username strings of each type, and, when the proportion for a certain type is greater than or equal to a first threshold, to judge that the registration information corresponding to the username strings of that type includes batch-registered registration information.
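The division cases in claims 7 to 9 can be illustrated with the sketch below, which returns the three character strings whose features go into the first, second, and third vector parts. It is an interpretation rather than the claimed implementation: the function name `split_username` and the `(position, length)` hit format are hypothetical, the two hits are assumed not to overlap, and "one end" and "the other end" in claim 7 are read here as the outer side of the surname match and the outer side of the given-name match respectively.

```python
def split_username(username, surname_hit=None, given_hit=None):
    """Return (first_part, second_part, third_part) for a username string.

    surname_hit and given_hit are (position, length) tuples for the first
    and second extension information, or None when not found.
    """
    if surname_hit and given_hit:
        (s_pos, s_len), (g_pos, g_len) = surname_hit, given_hit
        if s_pos <= g_pos:
            # Surname first: before surname, between, after given name.
            return (username[:s_pos],
                    username[s_pos + s_len:g_pos],
                    username[g_pos + g_len:])
        # Given name first: outer side of surname is the tail (assumed reading).
        return (username[s_pos + s_len:],
                username[g_pos + g_len:s_pos],
                username[:g_pos])
    if surname_hit:
        # Only the first extension information: third part is empty.
        s_pos, s_len = surname_hit
        return username[:s_pos], username[s_pos + s_len:], ""
    if given_hit:
        # Only the second extension information: first part is empty.
        g_pos, g_len = given_hit
        return "", username[:g_pos], username[g_pos + g_len:]
    # Neither found: all three parts describe the whole username string.
    return username, username, username


both = split_username("x_zhang.wei88", surname_hit=(2, 5), given_hit=(8, 3))
only_surname = split_username("zhang2014", surname_hit=(0, 5))
only_given = split_username("2014wei", given_hit=(4, 3))
neither = split_username("abc")
```

Each returned part would then be reduced to its three flag bits (letter, digit, special character) as described in claim 10.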
CN201410639883.0A 2014-11-13 2014-11-13 A kind of method and device for identifying batch registration behavior Active CN105653912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410639883.0A CN105653912B (en) 2014-11-13 2014-11-13 A kind of method and device for identifying batch registration behavior

Publications (2)

Publication Number Publication Date
CN105653912A CN105653912A (en) 2016-06-08
CN105653912B true CN105653912B (en) 2018-06-01

Family

ID=56478701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410639883.0A Active CN105653912B (en) 2014-11-13 2014-11-13 A kind of method and device for identifying batch registration behavior

Country Status (1)

Country Link
CN (1) CN105653912B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339615B (en) * 2016-08-29 2020-06-16 北京红马传媒文化发展有限公司 Method, system and equipment for identifying abnormal registration behavior

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102185788A (en) * 2011-01-31 2011-09-14 北京开心人信息技术有限公司 Method and system for searching vice accounts on basis of temporary mailbox
CN103118043A (en) * 2011-11-16 2013-05-22 阿里巴巴集团控股有限公司 Identification method and equipment of user account
WO2014091337A1 (en) * 2012-12-13 2014-06-19 Abb Research Ltd A system and a method for registration of devices in a plant

WO2016000511A1 (en) Method and apparatus for mining rare resource of internet

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right