Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Fig. 1 shows a risk-identification-based information processing procedure provided by an embodiment of the present application. The procedure specifically includes the following steps:
S101: Divide the characters contained in the information to be identified into different character sets.
In the scenario of the embodiments of the present application, after a user registers account information (for example, a network account), the user's own user information is bound to the account information so that it can be used for identification and authentication when the corresponding operations are performed. The information to be identified in the embodiments of the present application is therefore user information that is bound to account information and is used for authentication. Such information includes, but is not limited to, the user's mobile phone number, passport number, and the like.
Generally, the characters contained in such information carry certain meanings. Take a mobile phone number as an example. In the 11-digit number 13812348888, the first three digits "138" indicate the attribute type of the number; from these three digits, the telecom operator to which the number belongs and the corresponding service type can be determined. The fourth to seventh digits "1234" are the Home Location Register (HLR) identification code; from these four digits, user information corresponding to the number (for example, its home location and call priority) can be determined. The last four digits "8888" are the subscriber number, from which a specific user can be identified. Thus, each group of digits in a mobile phone number has a corresponding meaning.
Therefore, in step S101, the characters in the information to be identified that carry certain meanings can be divided into different character sets.
It should be noted that in step S101, characters may specifically be divided into a character set by taking the characters at specified positions in the information to be identified as one character set. Doing this for different groups of specified positions yields multiple different character sets. The union of all the character sets contains every character in the information to be identified, and at least two of the character sets intersect.
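As a minimal illustration of step S101, the Python sketch below takes three position groups for an 11-digit mobile phone number (the first three digits, the first seven digits, and the last four digits, all of which appear as examples in this description) and checks the two conditions stated above: the groups together cover every position, and at least two of them overlap. The names `POSITION_GROUPS`, `divide_into_character_sets` and `check_cover_and_overlap` are hypothetical, and the grouping itself is an assumption rather than a prescribed division.

```python
from typing import Dict, List, Sequence

# Hypothetical position groups for an 11-digit mobile number (0-based indices).
# The grouping is an assumption built from examples mentioned in the text.
POSITION_GROUPS: Dict[str, Sequence[int]] = {
    "prefix": range(0, 3),        # attribute type, e.g. "138"
    "prefix_hlr": range(0, 7),    # attribute type + HLR code, e.g. "1381234"
    "subscriber": range(7, 11),   # subscriber number, e.g. "8888"
}


def divide_into_character_sets(info: str,
                               groups: Dict[str, Sequence[int]]) -> Dict[str, List[str]]:
    """Collect the characters at each group's specified positions (step S101)."""
    return {name: [info[i] for i in positions] for name, positions in groups.items()}


def check_cover_and_overlap(info: str, groups: Dict[str, Sequence[int]]) -> bool:
    """Union of the groups must cover every position; at least two groups must overlap."""
    sets = [set(p) for p in groups.values()]
    covers_all = set().union(*sets) == set(range(len(info)))
    overlaps = any(a & b for i, a in enumerate(sets) for b in sets[i + 1:])
    return covers_all and overlaps


if __name__ == "__main__":
    number = "13812348888"
    print(divide_into_character_sets(number, POSITION_GROUPS))
    print(check_cover_and_overlap(number, POSITION_GROUPS))
```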
S102: Determine the component risk value corresponding to each character set.
After the meaningful characters have been divided into different character sets, the component risk value of each character set is determined in turn. A component risk value is a quantized value of the risk associated with a particular character set. Because the characters in different character sets carry different meanings, the embodiments of the present application determine the component risk values of different character sets in different ways, for example, based on the probability with which the characters in a character set occur, on their proportion under given conditions, or on character weights.
It should be noted that a component risk value in the embodiments of the present application reflects the value (worth) of the characters in a character set, and risk is in turn reflected through that value.
Taking the mobile phone number 13812348888 again as an example, suppose the last four digits "8888" are divided into one character set. Among all four-digit combinations, the probability that all four digits are identical is very small; that is, the character set containing these four digits has a high value. In a real application scenario, information to be identified that contains this character set is therefore more likely to be stolen; in other words, the risk that this character set is stolen is higher.
S103: Determine an integrated risk value of the information to be identified according to the component risk values of the character sets.
Because the character sets together contain all the characters in the information to be identified, the risks corresponding to the individual character sets reflect the overall risk of that information. That is, the overall integrated risk value of the information to be identified can be determined from the component risk values of the character sets. In the embodiments of the present application, the component risk values may be combined in various ways, such as summation or averaging, to obtain the integrated risk value; this does not limit the present application.
S104: Process the information to be identified according to the integrated risk value.
In the embodiments of the present application, the integrated risk value reflects the risk of the information to be identified: the larger the integrated risk value, the higher the risk and the greater the security threat to the information, for example, the greater the likelihood that it is stolen. Information to be identified whose integrated risk value is too high therefore needs to be handled in conjunction with a corresponding risk control system, for example by raising its security monitoring level or adding security precautions. In practice, a risk threshold can be set in advance, and risk control processing is applied to the information to be identified when its integrated risk value exceeds the threshold.
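The following Python sketch illustrates steps S103 and S104 under simple assumptions: component risk values are combined either by summation or by averaging (the two options named above), and the information is flagged for risk control when the integrated value is strictly higher than a preset threshold. The function names and any threshold used with them are hypothetical.

```python
from statistics import mean
from typing import Iterable


def integrated_risk(components: Iterable[float], method: str = "sum") -> float:
    """Combine component risk values (step S103) by summation or averaging."""
    values = list(components)
    if not values:
        raise ValueError("at least one component risk value is required")
    if method == "sum":
        return sum(values)
    if method == "mean":
        return mean(values)
    raise ValueError(f"unknown aggregation method: {method!r}")


def needs_risk_control(score: float, threshold: float) -> bool:
    """Step S104: flag the information when its score exceeds the preset threshold."""
    return score > threshold
```

Which aggregation to use is left open by the text; summation lets every character set contribute its full value, while averaging keeps scores comparable when different kinds of information yield different numbers of character sets.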
Through the above steps, the characters with specific meanings in the information to be identified are divided into different character sets and the component risk value of each character set is determined, so that the integrated risk value of the information to be identified can be determined accurately without relying on subjective judgment. Moreover, because the component risk values are determined on the basis of previously stored, already identified information, the actual value of the information to be identified can be reflected more accurately.
In the embodiments of the present application, because the characters in different character sets have different meanings, the component risk values of different character sets are also determined in different ways, specifically as follows:
Method one:
As shown in Fig. 2, the process of determining the component risk value of each character set in method one is specifically as follows:
S201: Arrange the characters in the character set in the order in which they appear in the information to be identified, to obtain the character sequence corresponding to the character set.
When the characters at specified positions in the information to be identified are divided into a character set, they are not necessarily placed in the set in their original order; they may well be placed in random order, and a change in order may strip the characters of their meaning. For example, the first three digits of a mobile phone number are 138. If the first, second and third positions are the specified positions, dividing these characters into one character set may yield an order such as 381 or 813, in which case the three digits in the set no longer represent the attribute type of the number, and the component risk value of the character set cannot be determined accurately.
Therefore, in the embodiments of the present application, after the characters in the information to be identified are divided into different character sets, the characters in each character set are arranged according to their order in the information to be identified. This arrangement yields the character sequence corresponding to the character set without changing the meaning of the characters.
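A minimal sketch of step S201, assuming each character in a set is stored together with its 0-based position in the information to be identified (this representation is an assumption for illustration): sorting by position restores the original order. `to_character_sequence` is a hypothetical name.

```python
from typing import Iterable, Tuple


def to_character_sequence(char_set: Iterable[Tuple[int, str]]) -> str:
    """Rebuild the sequence for a character set (step S201).

    Each element is a (position, character) pair, where position is the
    0-based index of the character in the information to be identified.
    Sorting by position restores the original order even if the set was
    populated in arbitrary order.
    """
    return "".join(ch for _, ch in sorted(char_set))


# The first three digits of "13812348888", stored in scrambled order.
scrambled = [(2, "8"), (0, "1"), (1, "3")]
print(to_character_sequence(scrambled))
```

Keeping the position alongside each character is what makes the reordering possible; a set of bare characters such as {"1", "3", "8"} would lose the information needed to tell 138 from 381.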
S202: Among the previously stored pieces of information that have been identified as normal, determine the proportion of information containing the same character sequence, and take it as the first proportion.
In a real application scenario, account information and the information bound to it are stored on a corresponding device (for example, a server). Account information may be misused, for example in account theft or other violations, so the device judges whether the information bound to an account is normal or abnormal by monitoring whether violations occur on that account. Of course, in practice, whether each piece of identified information is normal may also be determined by means such as prior-art network-behavior monitoring and analysis; this does not limit the present application.
Therefore, in the embodiments of the present application, the previously stored pieces of information identified as normal may be stored on the device in advance after having been recognized as normal. For example, on a certain website, the mobile phone numbers bound to different accounts that have been recognized as normal after the corresponding identification processing constitute the previously stored normal information.
Information containing the above character sequence may appear among the normal information as well as among the abnormal information. The proportion of the identified normal information that contains this character sequence is therefore counted as the first proportion.
S203: Among the previously stored pieces of information that have been identified as abnormal, determine the proportion of information containing the same character sequence, and take it as the second proportion.
Similarly to the first proportion, the previously stored abnormal information may be stored on the device in advance after having been recognized as abnormal, for example, blacklisted mobile phone numbers obtained after the corresponding identification processing. The second proportion is determined from this information.
S204: Determine the ratio of the first proportion to the second proportion.
The ratio of the first proportion to the second proportion indicates how likely it is that information containing the character sequence is normal rather than abnormal. Specifically, if the ratio is much greater than 1, that is, the first proportion is much larger than the second, then information containing the character sequence makes up a much larger share of the identified normal information than of the identified abnormal information, and it can be concluded that such information is more likely to be normal.
S205: Determine the first component risk value corresponding to the character set according to the ratio.
It should be noted that in real application scenarios the amount of previously stored identified information is huge, so the ratio of the first proportion to the second proportion may be large, which increases the computation required in subsequent processing. To simplify computation, the embodiments of the present application may compress the ratio by taking its logarithm. That is, step S205 specifically comprises: determining the logarithm of the ratio, and determining the first component risk value of the character set according to that logarithm. If the logarithm of the ratio were used directly as the first component risk value, it could be negative (the logarithm of a number less than 1 is less than zero), which could introduce an error into the integrated risk value of the information to be identified when that value is determined from the first component risk values.
Therefore, more specifically, the step of determining the first component risk value of the character set according to the logarithm is: taking the sum of the logarithm and a preset adjustment constant as the first component risk value of the character set. The preset adjustment constant offsets the error caused when the logarithm is less than zero.
In the embodiments of the present application, the preset adjustment constant should be greater than the absolute value of the smallest of the log ratios corresponding to the character sets. As a result, for every character set, the sum of the logarithm of the ratio of the first proportion to the second proportion and the preset adjustment constant is greater than zero, and negative values cannot occur.
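The constraint on the adjustment constant can be checked with a short Python sketch. It assumes natural logarithms (the text does not specify the base) and uses made-up ratios; `min_adjustment_constant` is a hypothetical helper that returns the absolute value of the smallest log ratio, which C must exceed.

```python
import math
from typing import Sequence


def min_adjustment_constant(ratios: Sequence[float]) -> float:
    """Lower bound for the preset adjustment constant C.

    C must exceed the absolute value of the smallest log ratio so that
    log(ratio) + C > 0 for every character set. Natural logarithms are
    assumed here; the text does not fix the base.
    """
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive to take a logarithm")
    return abs(min(math.log(r) for r in ratios))


ratios = [25.0, 0.5, 0.01]          # hypothetical p1/p2 values
bound = min_adjustment_constant(ratios)
C = math.floor(bound) + 1           # any value strictly above the bound works
assert all(math.log(r) + C > 0 for r in ratios)
```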
In one scenario provided by the embodiments of the present application, the information to be identified is a mobile phone number to be identified, and each character set is a digit set consisting of some of the digits in that number. In this case, when the first three digits of the mobile phone number to be identified are divided into a first character set, the digits in the first character set are arranged according to their order in the number to obtain the first digit sequence corresponding to the first character set. Combined with method one, the first component risk value of the first character set can then be determined by the following formula:
$$S_1 = \log\left(\frac{p_1}{p_2}\right) + C$$
Here, S_1 is the first component risk value corresponding to the first character set;
p_1 is the proportion of mobile phone numbers containing the first digit sequence among the previously stored identified normal mobile phone numbers;
p_2 is the proportion of mobile phone numbers containing the first digit sequence among the previously stored identified abnormal mobile phone numbers;
C is the preset adjustment constant.
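A sketch of the formula for S_1 in Python, under two labelled assumptions: a number "contains" the first digit sequence when its leading digits equal that sequence (consistent with the sequence being taken from the first three positions), and the logarithm is natural. The reference lists, the helper names `prefix_share` and `first_component_risk`, and the refusal to handle a zero proportion are illustrative choices, not part of the described method.

```python
import math
from typing import Sequence


def prefix_share(numbers: Sequence[str], prefix: str) -> float:
    """Proportion of `numbers` whose leading digits equal `prefix`."""
    if not numbers:
        raise ValueError("reference list must not be empty")
    return sum(n.startswith(prefix) for n in numbers) / len(numbers)


def first_component_risk(prefix: str, normal: Sequence[str],
                         abnormal: Sequence[str], c: float) -> float:
    """S1 = log(p1 / p2) + C, using a natural logarithm (an assumption)."""
    p1 = prefix_share(normal, prefix)    # share among identified normal numbers
    p2 = prefix_share(abnormal, prefix)  # share among identified abnormal numbers
    if p1 == 0 or p2 == 0:
        # The text does not say how zero proportions are handled; refusing
        # is a conservative choice for this sketch.
        raise ValueError("sequence absent from one reference list")
    return math.log(p1 / p2) + c


normal = ["13800000000", "13900000000", "13811111111", "15000000000"]
abnormal = ["13800000001", "18600000000"]
print(first_component_risk("138", normal, abnormal, 8))
```

With these toy lists both proportions for "138" are 0.5, the log ratio is 0, and S_1 equals C.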
Method one is described in detail below with an application example:
Suppose the mobile phone number is still 13812348888 and it is bound to account A. After the server receives the registration of account A, it identifies the mobile phone number 13812348888 bound to account A. The server divides the first three digits of the number into the first character set and arranges the digits in the set according to their order in the number, obtaining the first character sequence "138".
Suppose the server has stored 10,000 mobile phone numbers identified as normal (in practice the server stores a far larger number of accounts; 10,000 is used here only for ease of description), of which 5,000 contain the first character sequence "138". The first proportion of numbers containing "138" among the normal numbers is therefore p_1 = 5000/10000 = 0.5.
Suppose the server has stored 100 mobile phone numbers identified as abnormal, of which 2 contain the first character sequence "138". The second proportion of numbers containing "138" among the stored abnormal numbers is therefore p_2 = 2/100 = 0.02.
Having obtained p_1 and p_2, the ratio of the first proportion to the second proportion is p_1/p_2 = 0.5/0.02 = 25. This ratio is much greater than 1, indicating that a mobile phone number containing the sequence "138" is quite likely to be a normal number.
Further, assuming the adjustment constant C is 8, the formula above gives the first component risk value of the first character set as S_1 = log(p_1/p_2) + C = log 25 + 8.
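Because the base of the logarithm is not stated, the short Python computation below evaluates the example with both a natural and a base-10 logarithm, using p_1 = 0.5, p_2 = 0.02 and C = 8 as given above. The natural-log result is about 11.22 and the base-10 result about 9.40.

```python
import math

p1 = 0.5   # share of stored normal numbers containing "138" (from the example)
p2 = 0.02  # share of stored abnormal numbers containing "138"
C = 8      # preset adjustment constant from the example

ratio = p1 / p2
s1_natural = math.log(ratio) + C    # natural logarithm
s1_base10 = math.log10(ratio) + C   # base-10 logarithm
print(round(ratio, 6), round(s1_natural, 2), round(s1_base10, 2))
```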
As the example shows, determining the first component risk value of a character set from the first and second proportions accurately quantifies the likelihood that the information to be identified is normal or abnormal: the larger the first component risk value, the more likely the information is normal, and the more likely it is to be a target of theft; conversely, the smaller the value, the more likely the information is abnormal, and the less likely it is to be at risk of theft.
Method two:
As shown in Fig. 3, the process of determining the component risk value of each character set in method two is specifically as follows:
S301: Arrange the characters in the character set in the order in which they appear in the information to be identified, to obtain the character sequence corresponding to the character set.
As in method one, when the characters at specified positions in the information to be identified are divided into a character set, they are not necessarily placed in their original order; the characters in the set are therefore arranged to obtain the character sequence corresponding to the character set.
S302: Among the previously stored identified information, determine each piece of account information corresponding to the identified information containing the character sequence.
In the embodiments of the present application, each piece of account information is bound to its corresponding information, so for any piece of identified information, the account information bound to it can be uniquely determined. Note that in method two, "account information" below always refers to the account information corresponding to the identified information that contains the character sequence.
S303: Determine the service grade of each piece of account information.
In real application scenarios, users can obtain various business services through their own account information. The more business services an account obtains, the more frequently the user uses it, and the more likely it is to be normal account information. To quantify how much a user uses an account, corresponding service grades can be preset for different business services, and the service grade of an account can then be determined from the business services it uses.
For example, suppose the grade of the business service associated with bank cards is preset to 5. If an account has bound a bank card and activated the business associated with it, the service grade of that account is 5.
If an account uses multiple business services, its service grade is the sum of the grades of those services. For example, if an account has activated two business services whose grades are 3 and 4, its service grade is 7.
It should be noted that in practice the service grade of an account is not limited to being determined in the above way; it may also be determined from, for example, the account's activity level or the frequency with which it uses business services. This does not limit the present application.
S304: According to the service grade of each piece of account information, count the number of accounts at each service grade.
Generally, the number of business services is limited, so many accounts use the same business services and therefore have the same service grade. In the embodiments of the present application, the number of accounts sharing each service grade needs to be determined; therefore, after the service grade of each account is determined, the number of accounts at each service grade is counted.
S305: Determine, for each service grade, the proportion of accounts at that grade among all the accounts.
Once the number of accounts at each service grade is known, the proportion that each grade's accounts make up among all the accounts corresponding to the identified information containing the character sequence can be determined. These proportions directly reflect the extent to which the accounts use business services.
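Steps S304 and S305 amount to a frequency count followed by normalisation. The Python sketch below uses `collections.Counter`; `grade_proportions` is a hypothetical name, and the input is assumed to be one service grade per account.

```python
from collections import Counter
from typing import Dict, Sequence


def grade_proportions(account_grades: Sequence[int]) -> Dict[int, float]:
    """Steps S304-S305: count accounts per service grade, then convert to shares."""
    if not account_grades:
        raise ValueError("no accounts to summarise")
    counts = Counter(account_grades)          # grade -> number of accounts (S304)
    total = len(account_grades)
    return {grade: n / total for grade, n in counts.items()}  # S305


print(grade_proportions([5, 5, 5, 4]))
```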
S306: Determine the second component risk value corresponding to the character set according to the service grade of each piece of account information and the proportions of accounts at the different service grades.
After the service grade of each account and the proportions of accounts at the different service grades are determined, the distribution of service grades over all identified information containing the character sequence is obtained.
In one scenario provided by the embodiments of the present application, the information to be identified is a mobile phone number to be identified, and each character set is a digit set consisting of some of the digits in that number. In this case, when the first seven digits of the mobile phone number to be identified are divided into a second character set, the digits in the second character set are arranged according to their order in the number to obtain the second digit sequence corresponding to the second character set. Combined with method two, the second component risk value of the second character set can then be determined by the following formula:
$$S_2 = \sum_{i} w(i) \cdot \mathrm{Prob}(i)$$
Here, S_2 is the second component risk value corresponding to the second character set;
w(i) is the i-th of the service grades determined;
Prob(i) is the proportion of accounts at the i-th service grade among all the accounts determined.
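A sketch of the formula for S_2 in Python, taking the service grade of every account as input (an assumed representation; `second_component_risk` is a hypothetical name).

```python
from collections import Counter
from typing import Sequence


def second_component_risk(account_grades: Sequence[int]) -> float:
    """S2 = sum over distinct grades w(i) of w(i) * Prob(i)."""
    if not account_grades:
        raise ValueError("no accounts to summarise")
    total = len(account_grades)
    # Counter maps each distinct grade w(i) to its number of accounts;
    # dividing by the total gives Prob(i).
    return sum(grade * (n / total) for grade, n in Counter(account_grades).items())


print(second_component_risk([2, 4]))
```

Because each grade is weighted by the share of accounts that have it, S_2 is simply the arithmetic mean of the account grades.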
It should be noted that in the embodiments of the present application, the first seven digits of the 11-digit mobile phone number are divided into the second character set because the first three digits together with the fourth to seventh digits identify numbers that share the same characteristics, for example, numbers of the same attribute type (such as the same operator) with the same call priority, or numbers of the same attribute type with the same home location. That is, the first seven digits identify mobile phone numbers with the same characteristics.
Method two is described in detail below with an application example:
Suppose the mobile phone number is still 13812348888 and it is bound to account A. After the server receives the registration of account A, it identifies the mobile phone number 13812348888 bound to account A. The server divides the first seven digits of the number into the second character set and arranges the digits in the set according to their order in the number, obtaining the second character sequence "1381234".
From the previously stored identified mobile phone numbers, the server determines all numbers containing the second character sequence "1381234". Suppose there are 1,000 such numbers. The server then determines the account bound to each of these 1,000 numbers, obtaining 1,000 accounts.
The server then determines the service grades of these 1,000 accounts according to preset service-grade criteria, for example, from the business services each account uses. The server may determine the service grade of an account in various ways, such as a preset grade standard for each business service, and the criteria can be adjusted to the needs of the actual application; this does not limit the present application.
Suppose two service grades occur among the 1,000 accounts: 900 accounts have the first service grade w(1) = 5, and the other 100 accounts have the second service grade w(2) = 4. The proportion of grade-5 accounts among the 1,000 accounts is then Prob(1) = 900/1000 = 0.9, and the proportion of grade-4 accounts is Prob(2) = 100/1000 = 0.1.
According to the formula above, the server determines the second component risk value of the second character set containing "1381234" as S_2 = 0.9*5 + 0.1*4 = 4.9. This value is close to w(1), meaning that the service grade of the accounts corresponding to mobile phone numbers containing "1381234" stays largely at the level of w(1).
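The arithmetic of the example can be checked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding in 0.9*5 + 0.1*4. The counts (900 accounts at grade 5, 100 at grade 4) are those assumed in the example above.

```python
from fractions import Fraction

# Service grade -> number of accounts, as assumed in the example.
grades = {5: 900, 4: 100}
total = sum(grades.values())

# S2 = sum of w(i) * Prob(i), with Prob(i) = count / total, computed exactly.
s2 = sum(Fraction(w * n, total) for w, n in grades.items())
print(float(s2))
```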
As the example shows, determining the accounts corresponding to the identified information containing the character sequence, together with their service grades, reflects the extent to which those accounts use business services; combined with the counts of accounts at the different service grades, the service grade of these accounts can be quantified as a whole. The larger the second component risk value, the more likely the information to be identified is normal, and the more likely it is to be at risk of theft; conversely, the smaller the value, the more likely it is abnormal, and the less likely it is to be at risk of theft.
Method three:
As shown in Fig. 4, the process of determining the component risk value of each character set in method three is specifically as follows:
S401: Arrange the characters in the character set in the order in which they appear in the information to be identified, to obtain the character sequence corresponding to the character set.
As in methods one and two, after the corresponding characters are divided into a character set, the characters in the character set are arranged in order.
S402: Identify the characteristic characters in the character sequence.
In the embodiments of the present application, characteristic characters include repeated characters and/or sequential characters. Repeated characters are at least two consecutive identical characters, such as aaa, bb, and cccc. Sequential characters are at least three consecutive characters that follow a character order, such as abcd, 789, 321, and 1234.
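One simple way to implement step S402 is sketched below in Python. It assumes that following "a character order" means adjacent characters whose code points differ by exactly +1 (ascending, e.g. 789, abcd) or -1 (descending, e.g. 321), and it reports maximal runs. The text allows any existing character recognition algorithm, so this is only one possible choice; `repeated_runs` and `sequential_runs` are hypothetical names.

```python
from typing import List


def repeated_runs(s: str, min_len: int = 2) -> List[str]:
    """Maximal runs of at least `min_len` identical consecutive characters."""
    runs, start = [], 0
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or when the character changes.
        if i == len(s) or s[i] != s[start]:
            if i - start >= min_len:
                runs.append(s[start:i])
            start = i
    return runs


def sequential_runs(s: str, min_len: int = 3) -> List[str]:
    """Maximal runs of at least `min_len` characters stepping by +1 or -1."""
    runs, i = [], 0
    while i < len(s) - 1:
        step = ord(s[i + 1]) - ord(s[i])
        if step not in (1, -1):
            i += 1
            continue
        j = i + 1
        while j + 1 < len(s) and ord(s[j + 1]) - ord(s[j]) == step:
            j += 1
        if j - i + 1 >= min_len:
            runs.append(s[i:j + 1])
        i = j  # the last character may start a run in the opposite direction
    return runs


print(repeated_runs("13812348888"), sequential_runs("13812348888"))
```

For 13812348888 the sketch reports the repeated characters "8888" and the sequential characters "1234".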
In addition, the characteristic characters may be identified using prior-art character recognition algorithms; this does not limit the present application.
S403: When a characteristic character is identified, determine the weight value and the feature value of the characteristic character.
The characters in a character sequence can be arranged in a large number of ways. Most arrangements are random and unordered, and only in limited cases do they form characteristic characters, so each characteristic character occurs with a certain probability. Moreover, the number of characters in a characteristic character is inversely related to the probability that it occurs: the more characters a characteristic character contains, the lower the probability that it occurs, and the fewer characters it contains, the higher that probability. For example, the repeated characters "8888" are very unlikely to occur in an 11-digit mobile phone number, whereas "88" is comparatively likely.
Therefore, in the embodiments of the present application, the weight value of a characteristic character is quantified according to the probability that it occurs, and its feature value is quantified according to the number of characters it contains. That is, step S403 specifically comprises: determining the probability that the characteristic character occurs in the character sequence; determining the weight value of the characteristic character according to that probability; segmenting the characteristic character to obtain character units; and determining the feature value of the characteristic character according to the number of character units obtained.
Note that the characteristic character can be segmented according to an N-gram language model, which divides every N consecutive characters of a character string into one character unit, N being the number of characters in each unit. In the embodiments of the present application, when the N-gram language model is used to segment a characteristic character, the characteristic character is first divided into the smallest character units (N = 1), and the number of characters per unit is then increased step by step until the whole characteristic character forms a single character unit (N equals the number of characters in the characteristic character).
For example, segmenting the characteristic character 8888 with the N-gram language model gives: under 1-gram segmentation, 4 character units 8, 8, 8, 8; under 2-gram segmentation, 3 character units 88, 88, 88; under 3-gram segmentation, 2 character units 888, 888; and under 4-gram segmentation, 1 character unit 8888.
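The segmentation in this example amounts to a sliding window of width N, applied for every N from 1 up to the length of the characteristic character. A minimal sketch (the helper names `ngram_units` and `all_segmentations` are hypothetical):

```python
def ngram_units(s, n):
    """Split s into its overlapping n-character units (a sliding window)."""
    return [s[k:k + n] for k in range(len(s) - n + 1)]


def all_segmentations(s):
    """Segment s with N = 1, 2, ..., len(s), from the smallest units up to
    a single unit holding the whole characteristic character."""
    return {n: ngram_units(s, n) for n in range(1, len(s) + 1)}


for n, units in all_segmentations("8888").items():
    print(f"{n}-gram: {len(units)} unit(s) {units}")
```

For "8888" this lists 4, 3, 2 and 1 units for N = 1 to 4, matching the counts above.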
S404: Determine the third component risk value corresponding to the character set according to the weight value and feature value of the characteristic character.
For method three above, in one scenario provided by the embodiments of the present application, the information to be identified is a mobile phone number to be identified, and the last eight digits of that number are divided into the third character set. The digits in the third character set are arranged in the order in which they appear in the phone number, giving the third digit sequence corresponding to the third character set. If the third digit sequence contains repeated digits and/or sequential digits, their feature values can then be determined.
When repeated digits are identified, they are segmented into digit units, and the feature value of the repeated digits is determined by the formula
$$S_c(n) = \sum_{j=1}^{n} j\,(tf_j - 1)$$
where $S_c(n)$ is the feature value of the repeated digits, and the independent variable n is the number of digits in the repeated digits;
$tf_j$ is the number of character units obtained by segmenting the repeated digits with the j-th segmentation method;
j denotes the j-th segmentation method, under which each digit unit contains j characters. When the N-gram language model is used for segmentation, j is simply the value of N.
For example, based on the N-gram segmentation of the characteristic character "8888" in the example above, the formula gives the feature value of the repeated digits "8888" as:
$S_c(4) = 1\times(4-1) + 2\times(3-1) + 3\times(2-1) + 4\times(1-1) = 10$.
Here the term 2×(3−1) comes from 2-gram segmentation, which divides "8888" into 3 character units 88, 88, 88: the factor 2 is the number of characters in each unit and 3 is the number of units. The other terms of the formula are obtained in the same way.
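The formula can be checked with a short sketch that obtains each $tf_j$ by actually performing the j-gram segmentation rather than by a closed form; the function name `repeat_feature_value` is hypothetical.

```python
def repeat_feature_value(run):
    """S_c(n) = sum over j = 1..n of j * (tf_j - 1), where tf_j is the number
    of units produced by j-gram segmentation of the n-character run."""
    n = len(run)
    total = 0
    for j in range(1, n + 1):
        units = [run[k:k + j] for k in range(n - j + 1)]  # j-gram segmentation
        tf_j = len(units)
        total += j * (tf_j - 1)
    return total


print(repeat_feature_value("8888"))  # 1*3 + 2*2 + 3*1 + 4*0 = 10
```

The shortest repeated run, "88", scores 1, and "888" scores 4, so the feature value grows quickly with run length.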
In practical scenarios, sequential digits contain at least three characters, so segmentation of sequential digits starts from a run of three characters, whereas segmentation of repeated digits starts from a run of two characters. Accordingly, when the feature value is determined, a sequential run is counted as having one character fewer than a repeated run of the same length.
Therefore, when sequential digits are identified, the number of characters they contain is determined, and their feature value is determined by the formula
$$S_s(n') = S_c(n' - 1)$$
where $S_s$ is the feature value of the sequential digits, and the independent variable n' is the number of characters in the sequential digits.
For example, the feature value of the five sequential digits "12345" equals that of the repeated digits "8888", and the formula gives:
$S_s(5) = S_c(4) = 1\times(4-1) + 2\times(3-1) + 3\times(2-1) + 4\times(1-1) = 10$.
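Because j-gram segmentation of an n-character run always yields $tf_j = n - j + 1$ units, each term $j\,(tf_j - 1)$ reduces to $j\,(n - j)$ (equivalently $S_c(n) = (n^3 - n)/6$). A compact sketch of both feature values under that observation, with hypothetical function names:

```python
def s_c(n):
    """Feature value of an n-character repeated run.

    j-gram segmentation of n characters yields tf_j = n - j + 1 units, so each
    term j * (tf_j - 1) equals j * (n - j).
    """
    return sum(j * (n - j) for j in range(1, n + 1))


def s_s(n_prime):
    """Feature value of an n'-character sequential run: S_s(n') = S_c(n' - 1)."""
    return s_c(n_prime - 1)


print(s_s(5), s_c(4))  # 10 10: "12345" scores the same as "8888"
print(s_s(4))          # 4: the sequential run "1234"
```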
After the feature values of the repeated digits and/or sequential digits are determined, the third component risk value corresponding to the third character set is determined by the formula
$$S_3 = w\,(S_c + S_s + 1)$$
where $S_3$ is the third component risk value corresponding to the third character set, and w is the reciprocal of the probability that the identified repeated digits and sequential digits occur in the third digit sequence.
Note that if the third digit sequence contains only repeated digits, or only sequential digits, only the probability that those repeated digits (or sequential digits) occur in the third digit sequence is determined, and its reciprocal is used as the weight value w of the characteristic character. If the third digit sequence contains both repeated and sequential digits, the probability that both occur together in the third digit sequence is determined, and its reciprocal is used as the weight value of the characteristic character.
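The combination rule can be sketched as a single function that takes the relevant probability (single or joint, as just described) and whichever feature values were found. The source does not say what happens to the missing term when only one kind of characteristic character occurs; the sketch assumes it contributes 0. The name `third_component` is hypothetical.

```python
def third_component(p_joint, s_c=0, s_s=0):
    """S3 = w * (S_c + S_s + 1), with the weight w = 1 / p_joint.

    p_joint: probability that the identified characteristic characters occur
    (together, when both kinds were found) in the third digit sequence.
    Assumption: a kind of characteristic character that was not identified
    contributes a feature value of 0.
    """
    if not 0 < p_joint <= 1:
        raise ValueError("p_joint must lie in (0, 1]")
    w = 1 / p_joint
    return w * (s_c + s_s + 1)
```

For instance, a sequence with only a four-digit repeated run of probability 0.01 would give `third_component(0.01, s_c=10)`, i.e. 100 × 11.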
Method three is described in detail below with an application example:
Assume the phone number is still 13812348888 and is bound to account A. After the server receives the registration of account A, it identifies the phone number 13812348888 bound to account A. The server divides the last eight digits of the phone number into the third character set and arranges them in the order in which they appear in the phone number, obtaining the third character sequence "12348888".
The third character sequence "12348888" clearly contains characteristic characters: it contains both the sequential digits "1234" and the repeated digits "8888". To determine the weight value w of these characteristic characters, the probability that this sequential run and this repeated run occur together in an eight-digit sequence (the same length as the third character sequence) must be determined.
Specifically, each position of the third character sequence can take any of the 10 digits 0 to 9, so the eight positions admit $10^8$ combinations in total. Among them, only two contain both the sequential digits "1234" and the repeated digits "8888": "12348888" and "88881234". The probability that both occur together in the third character sequence is therefore $2/10^8$, and by the formula above $w = 10^8/2 = 5\times10^7$. This value is large and inconvenient for subsequent calculation, so in practice w can be reduced by taking a root or a logarithm. In this application example, taking the 7th root gives the reduced value $w' = (5\times10^7)^{1/7} \approx 12.6$.
Next, the server determines the feature values of the repeated digits "8888" and the sequential digits "1234" respectively: for "8888", $S_c(4) = 10$; for "1234", $S_s(4) = S_c(3) = 4$.
Therefore, by the formula above, the third component risk value of the third character sequence is $S_3 = 12.6 \times (10 + 4 + 1) \approx 189$.
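The arithmetic of this application example can be reproduced in a few lines. Since "1234" and "8888" are each four digits long and share no digit, they cannot overlap, so in an eight-digit string they must sit side by side; that is why exactly two strings qualify. Computing the 7th root numerically gives about 12.6 and hence $S_3$ of about 189.

```python
seq_run, rep_run = "1234", "8888"

# Both runs are four digits long and share no digit, so they cannot overlap;
# in an eight-digit string they must fill all positions, side by side.
matches = {seq_run + rep_run, rep_run + seq_run}   # {"12348888", "88881234"}
total = 10 ** 8                                     # 10 choices per position
p_joint = len(matches) / total                      # 2 / 10**8
w = 1 / p_joint                                     # 5 * 10**7
w_reduced = w ** (1 / 7)                            # 7th root, roughly 12.6

s_c = 10   # S_c(4) for the repeated run "8888"
s_s = 4    # S_s(4) = S_c(3) for the sequential run "1234"
s3 = w_reduced * (s_c + s_s + 1)                    # roughly 189

print(f"p = {p_joint:.1e}, w = {w:.3g}, w' = {w_reduced:.1f}, S3 = {s3:.0f}")
```

The side-by-side argument only holds because the two runs together fill all eight positions; with free positions left over, the count would have to be obtained differently.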
As the example shows, when method three determines the third component risk value of the third character set, the more digits the characteristic characters in that set contain, the larger their weight value and feature value. This indicates that the information to be identified is of higher value. The larger the third component risk value, the more likely the information is normal information and the more likely it is to be at risk of theft; conversely, the more likely it is abnormal information and the less likely it is to be at risk of theft.
At this point, the three methods above have determined three component risk values of the information to be identified, from which the overall integrated risk value of the information can be determined. In the embodiments of the present application, the integrated risk value of the information to be identified is determined by taking the geometric mean of the component risk values corresponding to the character sets.
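A minimal sketch of the geometric mean step follows. It assumes every component risk value is positive, which a geometric mean requires, and forms the product in log space so that large component values do not overflow; `integrated_risk` is a hypothetical name.

```python
import math


def integrated_risk(component_values):
    """Geometric mean of the component risk values of the character sets.

    Assumes every component value is positive, which a geometric mean
    requires; the product is formed in log space so that large component
    values do not overflow.
    """
    values = list(component_values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one component value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

With three components, this equals the cube root of $S_1 S_2 S_3$; for example, illustrative values 1, 10 and 100 give 10.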
For example, with methods one to three above, the integrated risk value of the phone number "13812348888" is the geometric mean of its three component risk values, $S = \sqrt[3]{S_1 S_2 S_3}$.
The larger the integrated risk value of the information to be identified, the higher its value and the greater its risk of being stolen. Therefore, in practice, when the determined integrated risk value of the information exceeds a preset risk value, the monitoring level of the information and of the account information bound to it can be raised to prevent theft.
In addition, suppose the method above has determined the integrated risk value of information to be identified that is bound to some account information, and at a later moment the account information is bound to new information to be identified whose integrated risk value is far below that of the original information. In that case the account information has very likely been stolen, so the monitoring level of the account information can be raised.
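The two monitoring rules above can be sketched as one decision function. The source gives neither the preset risk value nor what "far below" means numerically, so `threshold` and `drop_ratio` (and its default of 0.1) are hypothetical parameters, as is the function name.

```python
def should_raise_monitoring(risk, threshold, previous_risk=None, drop_ratio=0.1):
    """Decide whether to raise the monitoring level of an account.

    Hypothetical parameters: `threshold` is the preset risk value, and
    `drop_ratio` encodes "far below": a newly bound item whose integrated risk
    value is under drop_ratio * previous_risk is treated as suspicious.
    """
    if risk > threshold:
        return True   # high-value information: protect it and its account
    if previous_risk is not None and risk < drop_ratio * previous_risk:
        return True   # sharp drop after rebinding: account possibly stolen
    return False
```

Keeping both rules in one place makes it easy to tune the two parameters against observed theft cases.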
The description above uses a mobile phone number as the information to be identified. The risk-identification-based information processing method provided by the embodiments of the present application can also be used to identify the risk of other information to be identified and to process that information according to its risk; for example, the information to be identified may also be an e-mail address, an ID document number, and so on.
The above is the risk-identification-based information processing method provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a risk-identification-based information processing apparatus, as shown in Fig. 5.
The risk-identification-based information processing apparatus in Fig. 5 includes: a character division module 501, a component risk value module 502, an integrated risk value module 503, and a processing module 504, wherein:
the character division module 501 is configured to divide the characters contained in the information to be identified into different character sets;
the component risk value module 502 is configured to determine the component risk value corresponding to each character set;
the integrated risk value module 503 is configured to determine the integrated risk value of the information to be identified from the component risk values corresponding to the character sets; and
the processing module 504 is configured to process the information to be identified according to the integrated risk value.
The character division module 501 is specifically configured to divide the characters at specified positions of the information to be identified into a character set, wherein the union of the character sets contains all characters of the information to be identified, and at least two of the character sets overlap.
In the embodiments of the present application, characters in different character sets carry different meanings, so the component risk values of different character sets are determined in different ways, specifically:
As shown in Fig. 6, for determining the first component risk value, the component risk value module specifically includes:
a character arrangement submodule 601, configured to arrange the characters in the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
a first proportion submodule 602, configured to determine, among the pre-stored pieces of identified normal information, the proportion of information having the same character sequence, as a first proportion;
a second proportion submodule 603, configured to determine, among the pre-stored pieces of identified abnormal information, the proportion of information having the same character sequence, as a second proportion;
a ratio submodule 604, configured to determine the ratio of the first proportion to the second proportion; and
a first component risk value submodule 605, configured to determine, from the ratio, the first component risk value corresponding to the character set.
When the first component risk value would be too large, to simplify subsequent calculation, the first component risk value submodule 605 is specifically configured to determine the logarithm of the ratio and to determine the first component risk value corresponding to the character set from that logarithm.
In another implementation of the embodiments of the present application, the first component risk value submodule 605 is specifically configured to use the sum of the logarithm and a preset adjustment constant as the first component risk value corresponding to the character set.
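The first component computation of Fig. 6 can be sketched as follows, assuming the natural logarithm (the source does not fix the base) and assuming both proportions are non-zero, since the ratio is otherwise undefined. The function name and the record pools are hypothetical.

```python
import math


def first_component(candidate, normal, abnormal, positions, c=0.0):
    """S1 = log(p1 / p2) + C for the character set at `positions`.

    p1 (p2) is the share of stored normal (abnormal) records whose characters
    at `positions` equal those of `candidate`. Assumptions: the natural
    logarithm is used, and both shares must be non-zero for the ratio to exist.
    """
    positions = tuple(positions)
    key = tuple(candidate[i] for i in positions)

    def share(records):
        if not records:
            raise ValueError("record pool is empty")
        hits = sum(1 for r in records if tuple(r[i] for i in positions) == key)
        return hits / len(records)

    p1, p2 = share(normal), share(abnormal)
    if p1 == 0 or p2 == 0:
        raise ValueError("ratio undefined: no matching record in one pool")
    return math.log(p1 / p2) + c
```

For the first character set of a mobile number, `positions` would be `range(3)`; a sequence twice as common among normal numbers as among abnormal ones then yields $\ln 2 + C$.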
As shown in Fig. 7, for determining the second component risk value, the component risk value module specifically includes:
a character arrangement submodule 701, configured to arrange the characters in the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
an account information submodule 702, configured to determine, among the pre-stored pieces of identified information, the account information corresponding to each piece of identified information containing this character sequence;
a service level submodule 703, configured to determine the service level of each piece of account information and, from these service levels, count the pieces of account information at each service level;
a proportion submodule 704, configured to determine the proportion of account information at each service level among all the account information; and
a second component risk value submodule 705, configured to determine the second component risk value corresponding to the character set from the service levels of the account information and the proportions of account information at the different service levels.
As shown in Fig. 8, for determining the third component risk value, the component risk value module specifically includes:
a character arrangement submodule 801, configured to arrange the characters in the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
an identification submodule 802, configured to identify the characteristic characters in the character sequence;
a characteristic character submodule 803, configured to determine, when a characteristic character is identified, its weight value and feature value; and
a third component risk value submodule 804, configured to determine the third component risk value corresponding to the character set from the weight value and feature value of the characteristic character.
The characteristic characters include repeated characters and/or sequential characters.
The characteristic character submodule 803 is specifically configured to determine the probability that the characteristic character occurs in the character sequence, determine the weight value of the characteristic character from this probability, segment the characteristic character into character units, and determine the feature value of the characteristic character from the number of character units obtained.
In one scenario of the embodiments of the present application, the information to be identified is specifically a mobile phone number to be identified, and each character set is a digit set made up of some of the digits in that phone number. The character division module 501 is specifically configured to divide the first three digits of the phone number to be identified into a first character set, the first seven digits into a second character set, and the last eight digits into a third character set.
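A minimal sketch of this division, with a check that the three sets together cover every digit position and that the sets overlap, as the character division module requires. The layout table `PHONE_SETS` and the function name are hypothetical; positions are 0-based and end-exclusive.

```python
# Hypothetical layout table: 0-based, end-exclusive positions of each set.
PHONE_SETS = {
    "first": (0, 3),    # first three digits
    "second": (0, 7),   # first seven digits
    "third": (3, 11),   # last eight digits
}


def split_phone(number):
    """Divide an 11-digit mobile number into the three overlapping digit sets."""
    if len(number) != 11 or not number.isdigit():
        raise ValueError("expected an 11-digit mobile phone number")
    # Slicing keeps the digits in the order they appear in the number.
    return {name: number[a:b] for name, (a, b) in PHONE_SETS.items()}


# The union of the sets covers every position, and the sets overlap.
spans = [set(range(a, b)) for a, b in PHONE_SETS.values()]
assert set().union(*spans) == set(range(11))
assert spans[0] & spans[1] and spans[1] & spans[2]

print(split_phone("13812348888"))
# {'first': '138', 'second': '1381234', 'third': '12348888'}
```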
In this scenario, for determining the first component risk value, the component risk value module is specifically configured to:
for the first character set, arrange its digits in the order in which they appear in the phone number to be identified, obtaining the first digit sequence corresponding to the first character set; and
determine the first component risk value corresponding to the first character set by the formula $S_1 = \log(p_1 / p_2) + C$;
where $S_1$ is the first component risk value corresponding to the first character set;
$p_1$ is the proportion of phone numbers containing the first digit sequence among the pre-stored identified normal phone numbers;
$p_2$ is the proportion of phone numbers containing the first digit sequence among the pre-stored identified abnormal phone numbers; and
C is a preset adjustment constant.
For determining the second component risk value, the component risk value module is specifically configured to:
for the second character set, arrange its digits in the order in which they appear in the phone number to be identified, obtaining the second digit sequence corresponding to the second character set;
determine, among the pre-stored pieces of identified information, the account information corresponding to each identified phone number containing the second digit sequence;
determine the service level of each piece of account information; and
determine the second component risk value corresponding to the second character set by the formula $S_2 = \sum_i w(i)\,\mathrm{Prob}(i)$;
where $S_2$ is the second component risk value corresponding to the second character set;
w(i) is the i-th service level among the service levels determined; and
Prob(i) is the proportion of account information at the i-th service level among all the account information determined.
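A minimal sketch of the $S_2$ formula, assuming service levels are numeric so that they can be multiplied by their proportions; the function name is hypothetical. Note that the weighted sum is simply the average service level of the matching accounts.

```python
from collections import Counter


def second_component(service_levels):
    """S2 = sum over i of w(i) * Prob(i).

    service_levels: the (numeric, by assumption) service level of each account
    bound to a previously identified number containing the same second digit
    sequence. w(i) is a distinct level and Prob(i) its share of those accounts.
    """
    if not service_levels:
        raise ValueError("no matching accounts")
    counts = Counter(service_levels)
    total = len(service_levels)
    return sum(level * count / total for level, count in counts.items())
```

For accounts at levels 1, 1, 2 and 4, this gives 1 × 0.5 + 2 × 0.25 + 4 × 0.25 = 2.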
For determining the third component risk value, the component risk value module is specifically configured to:
for the third character set, arrange its digits in the order in which they appear in the phone number to be identified, obtaining the third digit sequence corresponding to the third character set;
identify the repeated digits and/or sequential digits in the third digit sequence;
when repeated digits are identified, segment them into digit units and determine their feature value by the formula $S_c(n) = \sum_{j=1}^{n} j\,(tf_j - 1)$;
where $S_c$ is the feature value of the repeated digits and n is the number of digits in the repeated digits;
$tf_j$ is the number of character units obtained by segmenting the repeated digits with the j-th segmentation method;
j denotes the j-th segmentation method, under which each digit unit contains j characters;
when sequential digits are identified, determine the number of characters they contain and determine their feature value by the formula $S_s(n') = S_c(n' - 1)$;
where $S_s$ is the feature value of the sequential digits;
n' is the number of characters in the sequential digits;
determine the third component risk value corresponding to the third character set by the formula $S_3 = w\,(S_c + S_s + 1)$;
where $S_3$ is the third component risk value corresponding to the third character set; and
w is the reciprocal of the probability that the identified repeated digits and sequential digits occur in the third digit sequence.
After the first to third component risk values are determined, the integrated risk value module is specifically configured to take the geometric mean of the component risk values corresponding to the character sets, obtaining the integrated risk value of the information to be identified.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
Memory may include non-permanent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tapes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The foregoing are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.