Detailed Description of Embodiments
Embodiments of this specification provide a method, an apparatus, and a device for identifying batch-generated character strings.
To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings in the embodiments of this specification. Evidently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative efforts shall fall within the protection scope of this application.
Fig. 1 is a schematic flowchart of a method for identifying batch-generated character strings according to an embodiment of this specification. The method includes the following steps:
Step 105: Receive a batch-generated character string to be identified.
In the embodiments of this specification, accounts on major online platforms are taken as an example; such accounts are character strings formed by concatenating characters. An account automatically generated by a machine is very likely to be a random concatenation of characters, such as "iehfdjksyneyg", whereas most accounts registered by ordinary users are character strings with some meaning, such as "ilovekobe". The degree of randomness of a machine-generated account string is therefore much greater than that of an account registered by an ordinary user.
As shown in step 220 in Fig. 2 (input of the batch-generated character string to be identified), in this embodiment the character string input in step 220 is received; for example, the character string "ak, ti odoe dgza" is received.
Step 110: Split the character string to be identified to obtain at least one substring of the character string to be identified.
Preferably, the character string to be identified received in step 105, "ak, ti odoe dgza", is first preprocessed to remove characters that cannot be used in accounts, such as spaces and punctuation marks, which yields the preprocessed string "aktiodoedgza". The preprocessed string is then split to obtain at least one substring, as shown in step 225 in Fig. 2.
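The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the function name `preprocess` is hypothetical, and it assumes that ASCII letters and digits are the only valid account characters (the text names spaces and punctuation marks only as examples of characters to remove).

```python
import re


def preprocess(raw: str) -> str:
    """Remove every character that is not an ASCII letter or digit.

    Assumption: letters and digits are the only characters usable in an
    account name; the text lists spaces and punctuation only as examples.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw)


print(preprocess("ak, ti odoe dgza"))  # aktiodoedgza
```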
It should be noted that, in this embodiment, the preprocessed character string is split according to a preset character length, for example once every two characters and/or once every three characters, to obtain at least one substring.
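Splitting by a preset character length can be sketched as below; `split_ngrams` is a hypothetical name. The worked example in this embodiment cuts the string into consecutive, non-overlapping pieces rather than the overlapping sliding windows usual in n-gram language models, so the sketch does the same. The text does not say how a leftover piece shorter than the preset length is handled; keeping it is an assumption.

```python
def split_ngrams(text: str, n: int) -> list[str]:
    """Cut text into consecutive, non-overlapping pieces of n characters.

    Assumption: a final piece shorter than n is kept as-is.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(0, len(text), n)]


print(split_ngrams("aktiodoedgza", 2))  # ['ak', 'ti', 'od', 'oe', 'dg', 'za']
print(split_ngrams("aktiodoedgza", 3))  # ['akt', 'iod', 'oed', 'gza']
```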
In this embodiment, if N = 2 is used for the N-gram model, splitting the preprocessed string "aktiodoedgza" yields the substrings "ak", "ti", "od", "oe", "dg", and "za"; if N = 3 is used, splitting yields the substrings "akt", "iod", "oed", and "gza".
Step 115: Determine the probability of occurrence of the at least one substring of the character string to be identified, and determine the degree of randomness of the character string to be identified according to the probability of occurrence of the substrings.
In this embodiment, a probability dictionary is first used to look up the probabilities of occurrence of the substrings "ak", "ti", "od", "oe", "dg", and "za" of the character string to be identified "ak, ti odoe dgza". The probability P that the character string to be identified occurs is then calculated from these substring probabilities, and the degree of randomness R of the character string is determined from P, as shown in step 230 in Fig. 2. The probability dictionary contains correspondences between sample substrings and their probabilities. Specifically, P may be obtained in any of the following ways.
When the probabilities that the substrings "ak", "ti", "od", "oe", "dg", and "za" each occur individually are obtained, for example 0.79, 0.59, 0.63, 0.71, 0.56, and 0.68 respectively, the geometric mean of these values, 0.66, is taken as the probability P that the character string "ak, ti odoe dgza" occurs. The degree of randomness is then R = 1 - P = 0.34.
Alternatively, when the probabilities that at least two adjacent substrings among "ak", "ti", "od", "oe", "dg", and "za" occur together are obtained, the geometric mean of those co-occurrence probabilities is taken as P. For example, if the probabilities that the adjacent pairs "ak" and "ti", "ti" and "od", "od" and "oe", "oe" and "dg", and "dg" and "za" occur together are 0.69, 0.69, 0.63, 0.71, and 0.66 respectively, their geometric mean, 0.68, is taken as P, and R = 1 - P = 0.32.
Alternatively, when both the probabilities that the substrings of "ak, ti odoe dgza" occur individually and the probabilities that two adjacent substrings occur together are obtained, the arithmetic mean of the two geometric means above is taken as P, here 0.67. From this probability of 0.67, the degree of randomness R of the character string to be identified is determined to be 0.33.
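The three ways of computing P, together with R = 1 - P, can be sketched in Python as follows. The function names are hypothetical, and the probability values are the example values from the text rather than entries from a real dictionary; running the sketch reproduces the figures above.

```python
from statistics import geometric_mean
from typing import Optional, Sequence


def string_probability(single: Optional[Sequence[float]] = None,
                       pairs: Optional[Sequence[float]] = None) -> float:
    """Probability P that the string to be identified occurs.

    single: probabilities that each substring occurs on its own.
    pairs:  probabilities that each group of adjacent substrings occurs together.
    With one list, P is that list's geometric mean; with both lists, P is
    the arithmetic mean of the two geometric means.
    """
    means = [geometric_mean(p) for p in (single, pairs) if p]
    if not means:
        raise ValueError("at least one non-empty probability list is required")
    return sum(means) / len(means)


def randomness(p: float) -> float:
    """Degree of randomness R = 1 - P."""
    return 1.0 - p


# Example values from the text for "ak", "ti", "od", "oe", "dg", "za".
single = [0.79, 0.59, 0.63, 0.71, 0.56, 0.68]
# Adjacent pairs ("ak","ti"), ("ti","od"), ("od","oe"), ("oe","dg"), ("dg","za").
pairs = [0.69, 0.69, 0.63, 0.71, 0.66]

for s, q in ((single, None), (None, pairs), (single, pairs)):
    P = string_probability(s, q)
    print(f"P = {P:.2f}, R = {randomness(P):.2f}")
# P = 0.66, R = 0.34
# P = 0.68, R = 0.32
# P = 0.67, R = 0.33
```

The geometric mean is used because the text specifies it; one side effect is that a single very unlikely substring pulls P down more strongly than it would with an arithmetic mean.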
It should be noted that the probability dictionary is obtained before it is used to look up the probabilities of occurrence of the substrings "ak", "ti", "od", "oe", "dg", and "za" of the character string to be identified "ak, ti odoe dgza". In this embodiment, the sample string data is of the same type as the batch-generated character strings to be identified. Taking English as an example, English magazines, English web pages, or other normally obtainable English articles are obtained as sample string data, as shown in step 205 in Fig. 2. The sample string data is then split to obtain a number of sample substrings. As shown in step 210 in Fig. 2, the number of times each sample substring occurs individually and/or the number of times at least two adjacent sample substrings occur together are counted. The probabilities that the sample substrings occur individually and/or that at least two adjacent sample substrings occur together are then calculated to obtain the probability dictionary, as shown in step 215 in Fig. 2. The probability dictionary thus contains the sample substrings and the probabilities that they occur individually, and/or the combinations of at least two adjacent sample substrings and the probabilities that they occur together.
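Building the probability dictionary from sample text can be sketched as follows. Several details are assumptions not stated in the text and are marked as such in the code: samples are lowercased and reduced to letters, pieces are non-overlapping as in step 110, adjacent pairs are counted only within one sample, and a probability is the plain relative frequency (count divided by the total). The relative-frequency choice is only a placeholder; raw relative frequencies over a real English corpus would be far smaller than example values such as 0.79, so the embodiment evidently normalizes counts in some other, unspecified way.

```python
import re
from collections import Counter


def split_ngrams(text: str, n: int) -> list[str]:
    # Non-overlapping pieces of n characters, as in the step 110 example.
    return [text[i:i + n] for i in range(0, len(text), n)]


def build_probability_dictionary(samples: list[str], n: int = 2
                                 ) -> tuple[dict[str, float], dict[tuple[str, str], float]]:
    """Return (single_probs, pair_probs) estimated from sample texts.

    Assumption: probability = relative frequency (count / total count).
    """
    single_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sample in samples:
        # Hypothetical preprocessing: lowercase and keep letters only.
        text = re.sub(r"[^a-z]", "", sample.lower())
        pieces = split_ngrams(text, n)
        single_counts.update(pieces)
        # Adjacent pairs: each piece with the piece that follows it,
        # counted within one sample only (assumption).
        pair_counts.update(zip(pieces, pieces[1:]))
    total_single = sum(single_counts.values())
    total_pairs = sum(pair_counts.values())
    single_probs = {k: c / total_single for k, c in single_counts.items()}
    pair_probs = {k: c / total_pairs for k, c in pair_counts.items()}
    return single_probs, pair_probs


singles, pair_probs = build_probability_dictionary(["abab", "abcd"])
print(singles)     # {'ab': 0.75, 'cd': 0.25}
print(pair_probs)  # {('ab', 'ab'): 0.5, ('ab', 'cd'): 0.5}
```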
Step 120: Determine, according to the degree of randomness of the character string to be identified, whether the character string to be identified is a randomly generated character string.
In this embodiment, as shown in step 235 in Fig. 2, the degree of randomness R is compared with a preset random threshold. As shown in step 240 in Fig. 2, if the degree of randomness R of the character string to be identified "ak, ti odoe dgza" is greater than the preset random threshold, the character string "ak, ti odoe dgza" is determined to be a randomly generated character string. The preset random threshold = 1 - preset probability threshold, where the preset probability threshold is the median of the probabilities that the sample substrings in the probability dictionary occur individually; or the median of the probabilities that at least two adjacent sample substrings in the probability dictionary occur together; or the arithmetic mean of these two medians. For example, with a preset probability threshold of 0.7, the preset random threshold is 0.3. The degree of randomness R of "ak, ti odoe dgza" obtained in step 115 is greater than the preset random threshold of 0.3, so "ak, ti odoe dgza" is a randomly generated character string. As shown in step 245 in Fig. 2, if the degree of randomness R of a character string does not exceed the preset random threshold, the character string is an ordinary character string.
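The threshold rule and the decision of steps 240 and 245 can be sketched as follows. The function names are hypothetical, and the dictionary probabilities in the usage lines are made-up values chosen so that their median is 0.7, matching the example threshold in the text.

```python
from statistics import median
from typing import Optional, Sequence


def random_threshold(single_probs: Optional[Sequence[float]] = None,
                     pair_probs: Optional[Sequence[float]] = None) -> float:
    """Preset random threshold = 1 - preset probability threshold.

    The probability threshold is the median of the dictionary's individual
    probabilities, the median of its adjacent-pair probabilities, or the
    arithmetic mean of both medians when both lists are supplied.
    """
    medians = [median(p) for p in (single_probs, pair_probs) if p]
    if not medians:
        raise ValueError("at least one non-empty probability list is required")
    return 1.0 - sum(medians) / len(medians)


def is_randomly_generated(r: float, threshold: float) -> bool:
    """Step 240: R above the threshold means randomly generated;
    step 245: otherwise the string is treated as ordinary."""
    return r > threshold


# Hypothetical dictionary probabilities whose median is 0.7, as in the text.
threshold = random_threshold([0.6, 0.7, 0.8])
print(is_randomly_generated(0.34, threshold))  # True
print(is_randomly_generated(0.25, threshold))  # False
```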
Further, in this embodiment, focused prevention and control is applied to the randomly generated character string "ak, ti odoe dgza". Specifically, the permissions of the character string "ak, ti odoe dgza" are restricted, verification of it is strengthened, or it is prohibited from logging in to the online platform.
Compared with the prior art, the above technical solution adopted in the embodiments of this specification achieves the following beneficial effects: the degree of randomness of a character string is determined from the probabilities of occurrence of its substrings, and is then used to judge whether the character string is randomly generated. The whole process requires no manual labeling of large amounts of training data, which saves labor costs; sample string data can be selected specifically for the type of character string to be identified; and the identification of character strings with a short overall length is improved.
Fig. 3 is a schematic structural diagram of an apparatus for identifying batch-generated character strings according to an embodiment of this specification. The apparatus includes a receiving module 305, a splitting module 310, a determining module 315, and a judging module 320.
The receiving module 305 is configured to receive a batch-generated character string to be identified.
The splitting module 310 is configured to split the character string to be identified to obtain at least one substring of the character string to be identified.
The determining module 315 is configured to determine the probability of occurrence of the at least one substring of the character string to be identified, and to determine the degree of randomness of the character string to be identified according to the probability of occurrence of the substrings.
The judging module 320 is configured to determine, according to the degree of randomness of the character string to be identified, whether the character string to be identified is a randomly generated character string.
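As a rough illustration of how the four modules could fit together, the sketch below maps modules 305 to 320 onto methods of one hypothetical class, `BatchStringIdentifier`. It uses only individual-occurrence probabilities, places the preprocessing of step 110 in the splitting module, and assumes a fallback probability `unseen_prob` for substrings absent from the dictionary, a case the text does not cover.

```python
import re
from statistics import geometric_mean


class BatchStringIdentifier:
    """Hypothetical mapping of modules 305 to 320 onto methods."""

    def __init__(self, single_probs: dict[str, float], threshold: float,
                 n: int = 2, unseen_prob: float = 0.01):
        self.single_probs = single_probs  # probability dictionary
        self.threshold = threshold        # preset random threshold
        self.n = n                        # preset substring length
        self.unseen_prob = unseen_prob    # assumption: used for unseen substrings

    def receive(self, raw: str) -> str:               # receiving module 305
        return raw

    def split(self, raw: str) -> list[str]:           # splitting module 310
        text = re.sub(r"[^A-Za-z0-9]", "", raw)       # drop spaces, punctuation
        return [text[i:i + self.n] for i in range(0, len(text), self.n)]

    def determine(self, subs: list[str]) -> float:    # determining module 315
        if not subs:
            raise ValueError("no substrings to score")
        probs = [self.single_probs.get(s, self.unseen_prob) for s in subs]
        return 1.0 - geometric_mean(probs)            # R = 1 - P

    def judge(self, r: float) -> bool:                # judging module 320
        return r > self.threshold

    def identify(self, raw: str) -> bool:
        return self.judge(self.determine(self.split(self.receive(raw))))


probs = {"ak": 0.79, "ti": 0.59, "od": 0.63, "oe": 0.71, "dg": 0.56, "za": 0.68}
identifier = BatchStringIdentifier(probs, threshold=0.3)
print(identifier.identify("ak, ti odoe dgza"))  # True (R is about 0.34)
```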
Preferably, the determining module 315 is specifically configured to: look up, using a probability dictionary, the probabilities of occurrence of the substrings of the character string to be identified, the probability dictionary containing correspondences between sample substrings and their probabilities; and determine the degree of randomness of the character string to be identified according to the probabilities of occurrence of the substrings.
Preferably, the apparatus further includes a probability dictionary obtaining module, configured to: split sample string data to obtain a number of sample substrings; count the number of times the sample substrings occur individually and/or the number of times at least two adjacent sample substrings occur together; and calculate the probabilities that the sample substrings occur individually and/or that the at least two adjacent sample substrings occur together, to obtain the probability dictionary. The probability dictionary contains the sample substrings and the probabilities that they occur individually, and/or the combinations of at least two adjacent sample substrings and the probabilities that they occur together.
Preferably, the sample string data is of the same type as the batch-generated character strings to be identified.
Preferably, the determining module 315 is further specifically configured to: determine, according to the probabilities of occurrence of the substrings, the probability that the character string to be identified occurs; and determine the degree of randomness of the character string to be identified according to that probability.
More preferably, the determining module 315 is further specifically configured to: when the probabilities that the substrings of the character string to be identified occur individually are obtained, take the geometric mean of those probabilities as the probability P that the character string to be identified occurs; or, when the probabilities that at least two adjacent substrings of the character string to be identified occur together are obtained, take the geometric mean of those co-occurrence probabilities as P; or, when both the individual-occurrence probabilities of the substrings and the co-occurrence probabilities of at least two adjacent substrings are obtained, take the arithmetic mean of the two geometric means as P.
Further, the determining module 315 is further specifically configured to determine the degree of randomness of the character string to be identified as R = 1 - P, where P is the probability that the character string to be identified occurs.
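The three options for P and the definition of R can be written compactly as below. The symbols are introduced here for illustration only: p_i is the individual-occurrence probability of the i-th of n substrings, and q_j is the co-occurrence probability of the j-th of m groups of adjacent substrings. With the six example values from step 115, P_single = (0.79 · 0.59 · 0.63 · 0.71 · 0.56 · 0.68)^{1/6} ≈ 0.66.

```latex
P_{\text{single}} = \Bigl(\prod_{i=1}^{n} p_i\Bigr)^{1/n}, \qquad
P_{\text{adj}} = \Bigl(\prod_{j=1}^{m} q_j\Bigr)^{1/m}, \qquad
P_{\text{both}} = \tfrac{1}{2}\bigl(P_{\text{single}} + P_{\text{adj}}\bigr), \qquad
R = 1 - P
```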
Preferably, the judging module 320 is specifically configured to determine that the character string to be identified is a randomly generated character string when the degree of randomness R of the character string to be identified is greater than a preset random threshold.
Preferably, the preset random threshold = 1 - preset probability threshold, where the preset probability threshold is the median of the probabilities that the sample substrings in the probability dictionary occur individually; or the median of the probabilities that at least two adjacent sample substrings in the probability dictionary occur together; or the arithmetic mean of these two medians.
Preferably, the apparatus further includes a focused prevention and control module, configured to apply focused prevention and control to the randomly generated character string when the character string to be identified is determined to be randomly generated, where the focused prevention and control includes at least one of restricting permissions, strengthening verification, and prohibiting login.
An embodiment of this specification further provides a device for identifying batch-generated character strings, including a memory and a processor. The memory stores a program configured to be executed by the processor to: receive a batch-generated character string to be identified; split the character string to be identified to obtain at least one substring of the character string to be identified; determine the probability of occurrence of the at least one substring of the character string to be identified, and determine the degree of randomness of the character string to be identified according to the probability of occurrence of the substrings; and determine, according to the degree of randomness of the character string to be identified, whether the character string to be identified is a randomly generated character string.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include a non-persistent memory, a random access memory (RAM), and/or a non-volatile memory in computer-readable media, such as a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, commodity, or device. Without further limitation, an element preceded by "includes a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
The foregoing descriptions are merely embodiments of this specification and are not intended to limit this specification. For those skilled in the art, this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.