CN105718767A - Information processing method and device based on risk identification - Google Patents


Info

Publication number
CN105718767A
Authority
CN
China
Prior art keywords
character
risk
information
identified
character set
Prior art date
Legal status
Granted
Application number
CN201410734967.2A
Other languages
Chinese (zh)
Other versions
CN105718767B (en)
Inventor
郑丹丹
林述民
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201410734967.2A (patent CN105718767B)
Priority to CN202010118726.0A (patent CN111371761B)
Publication of CN105718767A
Application granted
Publication of CN105718767B
Legal status: Active
Anticipated expiration


Abstract

The application discloses an information processing method and device based on risk identification. The method comprises: dividing the characters contained in information to be identified into different character sets; determining the component risk value corresponding to each character set; determining a comprehensive risk value of the information to be identified according to the component risk values of the character sets; and processing the information to be identified according to the comprehensive risk value. In the application, characters with corresponding meanings in the information to be identified are divided into different character sets; once the component risk value of each character set is determined, the comprehensive risk value of the information can be determined accurately without subjective judgment; and the component risk values are determined on the basis of pre-saved identified information. The actual value of the information to be identified can thus be reflected more accurately.

Description

Information processing method and device based on risk identification
Technical field
The present application relates to the field of computer technology, and in particular to an information processing method and device based on risk identification.
Background
With the development of information technology, the mobile directory number (MDN, i.e., the mobile phone number) of a user's communication device has become an important piece of user identification information. Users can not only register and log in with this number, but also bind it to a network account in order to perform critical network operations such as verification.
At present, the mobile phone numbers used by users are at risk of being stolen. A stolen number poses a serious threat to the user's network operations and can easily cause the user losses.
In the prior art, for mobile phone numbers registered or bound on a website, a server can perform risk identification on the user's number to determine how likely it is to be stolen, and then take corresponding risk-prevention measures. There are generally two methods of risk identification for mobile phone numbers: one is value identification of the number; the other is risk-factor identification of the number.
Value identification usually infers the value of a mobile phone number from the order and meaning of the digits it contains. In general, a number that contains longer runs of consecutive digits or repeated identical digits has a higher value. For example, a number containing consecutive digits, such as 13912345678, or repeated digits, such as 13888886666, is usually worth more than an ordinary number. High-value numbers are easily targeted for theft, so corresponding risk-control operations, such as raising the security-monitoring level, are applied to them.
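The prior-art value heuristic described above can be sketched in Python as follows. This is a hypothetical illustration only: the run-length rule and the `min_run` cutoff of 4 are assumptions, not criteria stated in the patent (which criticizes this approach precisely for lacking a standard criterion).

```python
def longest_run(digits: str, step: int) -> int:
    """Length of the longest run in which each digit exceeds the previous one by `step`.

    step=1 detects ascending runs such as "12345678";
    step=0 detects repeated digits such as "88888".
    """
    best = current = 1
    for prev, cur in zip(digits, digits[1:]):
        if int(cur) - int(prev) == step:
            current += 1
        else:
            current = 1
        best = max(best, current)
    return best


def looks_high_value(number: str, min_run: int = 4) -> bool:
    # Hypothetical rule: treat a number as "high value" if it contains an
    # ascending run or a repeated-digit run of at least `min_run` digits.
    return longest_run(number, 1) >= min_run or longest_run(number, 0) >= min_run
```

With this rule, 13912345678 and 13888886666 are flagged, while a number without such patterns, such as 13907152846, is not.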
Risk-factor identification usually monitors whether the account bound to a given mobile phone number has performed violations (such as stealing another person's account or other malicious network behavior). If so, the number is marked as a high-risk number and corresponding risk-control operations are applied to it, such as recording it on a blacklist and preventing it from being bound or registered.
However, the above methods of identifying mobile phone numbers still have defects. Specifically:
Value identification often relies on subjective judgment. It judges the value of a number by the meaning of its digits without a standardized criterion, and therefore cannot fully and accurately reflect the real value of the number.
With risk-factor identification, a number marked as high-risk may be abandoned by its user and, after a certain time, recycled by the telecom operator and reassigned to another user. Because the number has been recorded on a blacklist by the network operator, the new user cannot register or bind it on the corresponding website. This misjudgment seriously affects the new user's network operations.
Summary of the invention
The embodiments of the present application provide an information processing method and device based on risk identification, to solve the problem that the risk identification of information is inaccurate.
An information processing method based on risk identification provided by the embodiments of the present application includes: dividing the characters contained in information to be identified into different character sets;
determining the component risk value corresponding to each character set;
determining the comprehensive risk value of the information to be identified according to the component risk values of the character sets; and
processing the information to be identified according to the comprehensive risk value.
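The four steps can be wired together as a small pipeline. The Python sketch below is illustrative only: it assumes that each character set is represented by the positions it covers, that every set comes with its own scoring function, and that the combination rule and threshold are supplied by the caller. None of these names come from the patent.

```python
from typing import Callable, Dict, FrozenSet, List

Positions = FrozenSet[int]                  # a character set, identified by its positions
Scorer = Callable[[str, Positions], float]  # hypothetical per-set scoring function


def process(info: str,
            partition: List[Positions],
            scorers: Dict[Positions, Scorer],
            combine: Callable[[List[float]], float],
            threshold: float) -> str:
    # S101: the character sets, given here as groups of character positions.
    # S102: one component risk value per character set.
    components = [scorers[s](info, s) for s in partition]
    # S103: combine the component values into the comprehensive risk value.
    comprehensive = combine(components)
    # S104: apply risk control when the comprehensive value exceeds the threshold.
    return "risk_control" if comprehensive > threshold else "pass"
```

For example, with three sets scored 1.0, 0.5 and 3.0 and `combine=sum`, a threshold of 4.0 triggers risk control, while a threshold of 5.0 does not.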
An information processing device based on risk identification provided by the embodiments of the present application includes: a character division module, configured to divide the characters contained in information to be identified into different character sets;
a component risk value module, configured to determine the component risk value corresponding to each character set;
a comprehensive risk value module, configured to determine the comprehensive risk value of the information to be identified according to the component risk values of the character sets; and
a processing module, configured to process the information to be identified according to the comprehensive risk value.
In the information processing method and device based on risk identification provided by the embodiments of the present application, characters with corresponding meanings in the information to be identified are divided into different character sets. Once the component risk value of each character set is determined, the comprehensive risk value of the information can be determined accurately without relying on subjective judgment. Because the component risk values are determined on the basis of pre-saved identified information, the real value of the information to be identified can be reflected more accurately.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the information processing procedure based on risk identification provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the procedure of Method one for determining the component risk value of each character set, provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of the procedure of Method two for determining the component risk value of each character set, provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the procedure of Method three for determining the component risk value of each character set, provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of the information processing device based on risk identification provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of the component risk value module that determines the first component risk value, provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the component risk value module that determines the second component risk value, provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of the component risk value module that determines the third component risk value, provided by an embodiment of the present application.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Fig. 1 shows the information processing procedure based on risk identification provided by an embodiment of the present application. The procedure specifically includes the following steps:
S101: Divide the characters contained in the information to be identified into different character sets.
In the scenario of the embodiments, after a user registers account information (for example, a network account), the user binds his or her own user information to the account information so that it can be used for identification and authentication during corresponding operations. The information to be identified in the embodiments is therefore user information that is bound to account information and used for authentication and identification. It includes, but is not limited to, the user's mobile phone number, passport number, and the like.
Generally, the characters in such information carry certain meanings. Take a mobile phone number as an example: in the 11-digit number 13812348888, the first three digits "138" indicate the attribute type of the number, from which the telecom operator of the number and the corresponding service type can be determined. The fourth to seventh digits "1234" are the home location register (HLR) identification code, from which user information associated with the number (such as its home location and call-priority information) can be determined. The last four digits "8888" are the subscriber number, which identifies a specific user. Thus, the digits of a mobile phone number each carry a corresponding meaning.
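The field layout in the example can be expressed as a simple slicing function. This is a minimal Python sketch that assumes an 11-digit mainland-China mobile number laid out exactly as described above; the function and field names are hypothetical.

```python
from typing import NamedTuple


class MobileNumberFields(NamedTuple):
    network_prefix: str  # digits 1-3, e.g. "138": operator and service type
    hlr_code: str        # digits 4-7, e.g. "1234": home location register code
    subscriber: str      # digits 8-11, e.g. "8888": subscriber number


def split_mobile_number(number: str) -> MobileNumberFields:
    # Assumes the 11-digit layout used in the example above.
    if len(number) != 11 or not number.isdigit():
        raise ValueError("expected an 11-digit mobile number")
    return MobileNumberFields(number[:3], number[3:7], number[7:])
```

`split_mobile_number("13812348888")` yields the three fields "138", "1234" and "8888".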
Therefore, in step S101, the characters that carry certain meanings in the information to be identified can be divided into different character sets.
It should be noted that in step S101, characters may specifically be divided into a character set by taking the characters at specified positions in the information to be identified. Doing this for the characters at different specified positions yields multiple different character sets. The union of all character sets contains every character of the information to be identified, and at least two of the character sets intersect.
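The two constraints on the division (the sets jointly cover every character, and at least two sets overlap) can be checked directly. The Python sketch below assumes 0-based positions and hypothetical names; the particular position sets in the usage note are illustrative, not taken from the patent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def divide_by_positions(info: str,
                        position_sets: List[FrozenSet[int]]) -> Dict[FrozenSet[int], Dict[int, str]]:
    """Divide the characters of `info` into character sets by specified positions."""
    covered = frozenset().union(*position_sets)
    # The union of the character sets must contain every character of the information.
    if covered != frozenset(range(len(info))):
        raise ValueError("character sets must jointly cover every position")
    # At least two of the character sets must intersect.
    if not any(a & b for a, b in combinations(position_sets, 2)):
        raise ValueError("at least two character sets must overlap")
    # Map each set to the characters at its positions.
    return {s: {i: info[i] for i in s} for s in position_sets}
```

For 13812348888, the sets {0, 1, 2}, {2, 3, 4, 5, 6} and {7, 8, 9, 10} satisfy both constraints (they share position 2), whereas splitting into {0, 1, 2} and {3, ..., 10} is rejected because no two sets overlap.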
S102: Determine the component risk value corresponding to each character set.
After the characters with certain meanings are divided into different character sets, the component risk value of each character set is determined in turn. A component risk value is the quantified risk value corresponding to a character set. Because the characters in different character sets have different meanings, the embodiments determine the component risk values of different sets in different ways, for example based on the probability with which the characters in a set occur, their proportion under given conditions, or the weights of the characters.
It should be noted that the component value-at-risk in the embodiment of the present application, reflect the character in character set Action value, and reflect risk by action value.
Specifically, still as a example by above-mentioned phone number 13812348888, if by after in this phone number 4-digit number " 8888 " is divided in a character set, it is clear that in 4-digit number, 4-digit number occur The probability all repeated is minimum, say, that the action value that character set containing this 4-digit number is corresponding It is high, then, under actual application scenarios, the information to be identified containing this character set has bigger possibility It is stolen, that is, the risk that this character set is stolen is higher.
S103: Determine the comprehensive risk value of the information to be identified according to the component risk values of the character sets.
Because the character sets together contain all the characters of the information to be identified, the risks of the individual sets reflect the overall risk of the information; that is, the overall comprehensive risk value of the information can be determined from the component risk values of the character sets. In the embodiments, the comprehensive risk value may be determined from the component risk values in various ways, such as accumulation or averaging, which does not limit the present application.
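Accumulation and averaging, the two combination rules named above, can be sketched as follows. This is a minimal Python illustration with hypothetical names; the `mode` switch is an assumption, and the patent does not prescribe either rule.

```python
import math
from statistics import fmean
from typing import Sequence


def comprehensive_risk(components: Sequence[float], mode: str = "sum") -> float:
    """Combine component risk values into one comprehensive risk value."""
    # Accumulation and averaging are both mentioned in the description;
    # neither is claimed to be the prescribed rule.
    if mode == "sum":
        return math.fsum(components)
    if mode == "mean":
        return fmean(components)
    raise ValueError(f"unknown mode: {mode!r}")
```

Summing keeps every component's contribution visible (more sets give a larger value), while averaging keeps the result on the same scale as a single component regardless of how many sets there are.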
S104: Process the information to be identified according to the comprehensive risk value.
In the embodiments, the comprehensive risk value reflects the risk of the information to be identified: the larger the value, the higher the risk, and the greater the security threat to the information (for example, it is more likely to be stolen). Information with an excessively high comprehensive risk value therefore needs to be handled by a corresponding risk-control system, for example by raising the security-monitoring level or adding security-prevention measures. In practice, a risk threshold may be preset, and risk-control processing is applied to the information to be identified when its comprehensive risk value exceeds the threshold.
Through the above steps, the characters with corresponding meanings in the information to be identified are divided into different character sets. Once the component risk value of each set is determined, the comprehensive risk value of the information can be determined accurately without relying on subjective judgment. Because the component risk values are determined on the basis of pre-saved identified information, the real value of the information to be identified can be reflected more accurately.
In the embodiments, because the characters in different character sets have different meanings, the component risk values of different character sets are also determined in different ways. Specifically:
Method one:
As shown in Fig. 2, Method one determines the component risk value of each character set as follows:
S201: Arrange the characters in the character set according to the order of the characters in the information to be identified, to obtain the character sequence corresponding to the character set.
When the characters at specified positions in the information to be identified are divided into a character set, they are not necessarily placed in the set in their original order; they may be placed in arbitrary order, and a change in order may leave the characters in the set without their corresponding meaning. For example, the first three digits of a mobile phone number are 138. If the first, second and third positions are the specified positions, dividing these characters into one set may produce orders such as 381 or 813. The three digits in the set then no longer represent the attribute type of the number, and the component risk value of the set cannot be determined accurately.
Therefore, in the embodiments, after the characters in the information to be identified are divided into different character sets, the characters in each set are arranged so that they follow the original order of the characters in the information. The arranged result is the character sequence of the set, and the meaning of the characters is preserved.
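Restoring the original order amounts to sorting a set's positions before reading its characters. A minimal Python sketch, assuming 0-based positions and a hypothetical function name:

```python
from typing import Iterable


def character_sequence(info: str, positions: Iterable[int]) -> str:
    """Arrange a character set's characters in their original order in `info`."""
    # Sorting the positions restores the original order, so the digits at
    # positions {2, 0, 1} of 13812348888 read "138", not "813" or "381".
    return "".join(info[i] for i in sorted(set(positions)))
```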
S202: Among the pre-saved identified normal information, determine the proportion of information having the same character sequence, as the first proportion.
In a practical application scenario, account information and the information bound to it are stored in a relevant device (such as a server). A user may use account information to perform violations such as account theft, so the device monitors whether violations occur on the account information and thereby determines whether the information bound to it is normal or abnormal. Of course, in practice, whether each piece of identified information is normal can be determined with existing techniques such as network-behavior monitoring and analysis, which does not limit the present application.
Therefore, in the embodiments, the pre-saved identified normal information can be stored in the relevant device in advance and regarded as normal information. For example, on a website, the different mobile phone numbers bound to different accounts that have been recognized as normal after identification processing constitute the pre-saved identified normal information.
Information containing the above character sequence may appear in the identified normal information and may also appear in the abnormal information. The proportion of all information having this character sequence within the identified normal information is therefore counted as the first proportion.
S203: Among the pre-saved identified abnormal information, determine the proportion of information having the same character sequence, as the second proportion.
Similarly to the first proportion, the pre-saved identified abnormal information can be stored in the relevant device in advance and regarded as abnormal information, for example blacklisted mobile phone numbers obtained after identification processing. The second proportion is determined over this information.
S204: Determine the ratio of the first proportion to the second proportion.
The ratio of the first proportion to the second proportion indicates how likely information containing the character sequence is to be normal or abnormal. Specifically, if the ratio is much greater than 1, the first proportion is much larger than the second: information containing the sequence makes up a much larger share of the information identified as normal than of the information identified as abnormal, so information containing the sequence is more likely to be normal.
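Steps S202 to S204 can be sketched as follows. This is a minimal Python illustration under stated assumptions: identified normal and abnormal information are held as plain strings in two in-memory lists, positions are 0-based, and the second proportion is non-zero (the description does not say how a zero second proportion is handled). The function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def proportion_with_sequence(numbers: Sequence[str], positions: List[int], sequence: str) -> float:
    """Share of `numbers` whose characters at `positions` (in order) equal `sequence`."""
    hits = sum(1 for n in numbers if "".join(n[i] for i in positions) == sequence)
    return hits / len(numbers)


def proportion_ratio(normal: Sequence[str], abnormal: Sequence[str],
                     positions: Iterable[int], sequence: str) -> float:
    idx = sorted(positions)  # keep the original character order
    p1 = proportion_with_sequence(normal, idx, sequence)    # S202: first proportion
    p2 = proportion_with_sequence(abnormal, idx, sequence)  # S203: second proportion
    return p1 / p2                                          # S204: ratio (assumes p2 > 0)
```

A ratio well above 1 means the sequence is over-represented among normal information relative to abnormal information.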
S205: Determine the first component risk value of the character set according to the ratio.
It should be noted that in practical application scenarios, the amount of pre-stored identified information is huge, so the ratio of the first proportion to the second may be large, which increases the computation in subsequent processing. To simplify computation, the embodiments may compress the ratio with a logarithm. That is, step S205 specifically includes determining the logarithm of the ratio and determining the first component risk value of the character set from this logarithm. If the logarithm were used directly as the first component risk value, it could be negative (the logarithm of a number less than 1 is less than zero), which could introduce error into the comprehensive risk value determined from the first component risk value.
Therefore, more specifically, determining the first component risk value of the character set from the logarithm means taking the sum of the logarithm and a preset regulating constant as the first component risk value. The preset regulating constant thus offsets the error introduced when the logarithm is less than zero.
In the embodiments, the preset regulating constant should be larger than the absolute value of the smallest of the logarithms of the ratios of all character sets. Then, for every character set, the sum of the logarithm of the ratio of the first proportion to the second proportion and the preset regulating constant is greater than zero, and no negative value occurs.
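Choosing the regulating constant and computing the first component risk values can be sketched as follows. This Python illustration assumes a base-10 logarithm (consistent with the worked example later in this section) and introduces a hypothetical `margin` parameter so that the constant is strictly larger than the absolute value of the smallest logarithm.

```python
import math
from typing import Dict


def component_risks(ratios: Dict[str, float], margin: float = 1.0) -> Dict[str, float]:
    """First component risk value log10(ratio) + C for every character set.

    C = |smallest logarithm| + margin, so every result is positive.
    `margin` is a hypothetical parameter, not taken from the patent.
    """
    if margin <= 0:
        raise ValueError("margin must be positive so that C exceeds |min log|")
    logs = {name: math.log10(r) for name, r in ratios.items()}
    c = abs(min(logs.values())) + margin
    return {name: v + c for name, v in logs.items()}
```

For ratios 25 and 0.1 the logarithms are about 1.40 and -1.0, so C = 2.0 and both component risk values stay positive.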
In one scenario provided by the embodiments, the information to be identified is a mobile phone number to be identified, and each character set is a digit set made up of some of the digits contained in that number. In this case, when the first three digits of the number are divided into a first character set, the digits in the first character set are arranged according to the order of the digits in the number, giving the first digit sequence of the first character set. Combined with Method one above, the first component risk value of the first character set can then be determined by the formula
$$S_1 = \log\left(\frac{p_1}{p_2}\right) + C$$
where:
$S_1$ is the first component risk value of the first character set;
$p_1$ is the proportion of mobile phone numbers containing the first digit sequence among the pre-saved identified normal mobile phone numbers;
$p_2$ is the proportion of mobile phone numbers containing the first digit sequence among the pre-saved identified abnormal mobile phone numbers;
$C$ is the preset regulating constant.
Method one is illustrated below with an application example.
Suppose the mobile phone number is still 13812348888 and it is bound to account A. After the server receives the registration of account A, it identifies the number 13812348888 bound to account A. The server divides the first three digits of the number into the first character set and arranges them according to their order in the number, obtaining the first character sequence "138".
Suppose the server has pre-stored 10000 numbers identified as normal (in practice the server stores far more; 10000 is used here only for ease of description), of which 5000 contain the first character sequence "138". The first proportion of numbers containing "138" among the normal numbers is then $p_1 = 5000/10000 = 0.5$.
Suppose the server has pre-stored 100 numbers identified as abnormal, of which 2 contain the first character sequence "138". The second proportion of numbers containing "138" among the pre-stored abnormal numbers is then $p_2 = 2/100 = 0.02$.
With $p_1$ and $p_2$ obtained, their ratio is $p_1/p_2 = 0.5/0.02 = 25$. The ratio is much greater than 1, indicating that a number containing the sequence "138" is quite likely to be a normal number.
Further, assuming the regulating constant $C$ is 8, the formula above (with a base-10 logarithm) gives the first component risk value of the first character set: $S_1 = \log_{10}(0.5/0.02) + 8 = \log_{10} 25 + 8 \approx 9.40$.
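The arithmetic of the example can be checked in a few lines of Python. The figures are those given in the example; the base-10 logarithm is inferred from the stated result of about 9.40 (a natural logarithm would give about 11.22).

```python
import math

p1 = 0.5   # first proportion from the example
p2 = 0.02  # second proportion from the example
C = 8      # preset regulating constant from the example

ratio = p1 / p2
S1 = math.log10(ratio) + C
print(f"ratio = {ratio:.0f}, S1 = {S1:.2f}")  # ratio = 25, S1 = 9.40
```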
The example shows that determining the first component risk value of a character set from the first and second proportions accurately quantifies how likely the information to be identified is to be normal or abnormal: the larger the first component risk value, the more likely the information is normal information and the more likely it is to be at risk of theft; conversely, the more likely it is abnormal information, and the less likely it is to be at risk of theft.
Method two:
As shown in Fig. 3, Method two determines the component risk value of each character set as follows:
S301: Arrange the characters in the character set according to the order of the characters in the information to be identified, to obtain the character sequence corresponding to the character set.
As in Method one, the characters at specified positions in the information to be identified are not necessarily divided into a character set in their original order. The characters in the set are therefore arranged to obtain the character sequence corresponding to the set.
S302: Among the pre-saved identified information, determine the account information corresponding to each piece of identified information that contains the character sequence.
In the embodiments, each piece of account information is bound to corresponding information, so for any piece of identified information, the account information bound to it can be uniquely determined.
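Step S302 can be sketched as a lookup over the binding between identified information and accounts. This Python illustration assumes the binding is available as a dictionary from identified information (for example, a phone number) to the single account it is bound to; the names and sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def accounts_with_sequence(bindings: Dict[str, str],
                           positions: Iterable[int],
                           sequence: str) -> List[str]:
    """Accounts whose bound identified information contains `sequence` at `positions`.

    `bindings` maps each piece of identified information to the one account
    it is bound to, mirroring the one-to-one binding described above.
    """
    idx = sorted(positions)  # keep the original character order
    return [account for info, account in bindings.items()
            if "".join(info[i] for i in idx) == sequence]
```

For example, with numbers 13812348888, 13900001111 and 13855556666 bound to accounts A, B and C, the sequence "138" at the first three positions selects accounts A and C.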
It should be noted that "account information" in the remainder of Method two refers to the account information corresponding to identified information that contains the character sequence.
S303: Determine the grade of service of each piece of account information.
In practical application scenarios, a user can use his or her account information to obtain various business services. The more business services an account obtains, the more often the user uses that account, and the more likely the account is normal. To quantify how much a user uses an account, a grade of service can be preset for each business service, and the grade of service of an account is then determined from the business services the account uses.
Such as, preset: the grade of the business service being associated with bank card is 5, then, if a certain Accounts information has bound corresponding bank card, and has opened the business being associated with this bank card, the then account The grade of service corresponding to information is just 5.
Certainly, if a certain accounts information uses multiple business service, then the grade of service of account information is The grade of service sum of each business service, such as: a certain accounts information has opened two kinds of business service, and these are two years old The grade of service planting business service is respectively 3 and 4, then, the grade of service of account information is 7.
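The additive rule can be sketched with a hypothetical service catalogue. Only the levels 5, 3 and 4 come from the examples above. The service names and the function `account_service_level` are invented for illustration.

```python
# Hypothetical service catalogue; the names are made up, the levels 5, 3 and 4
# are the ones used in the examples in the text.
SERVICE_LEVELS = {"bank_card": 5, "service_a": 3, "service_b": 4}


def account_service_level(services: list[str]) -> int:
    """Service level of an account = sum of the levels of the services it uses."""
    return sum(SERVICE_LEVELS[name] for name in services)


print(account_service_level(["bank_card"]))               # 5
print(account_service_level(["service_a", "service_b"]))  # 7
```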
Note that in practice, the service level of an account is not limited to the manner described above. It can also be determined from other factors, such as how active the account is or how often it uses business services. This does not limit the present application.
S304: Based on the service level of each account, count the number of accounts at each service level.
The number of business services is usually limited, so many accounts use the same business services and therefore have the same service level. The embodiments need the number of accounts that share each service level. After the service level of every account is determined, the number of accounts at each service level is counted.
S305: Among all these accounts, determine the proportion of accounts at each service level.
Once the number of accounts at each service level is known, the share of each level can be determined. The share is taken among all the accounts that correspond to identified information containing the character sequence. These shares show directly how heavily the accounts use business services.
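Steps S304 and S305 amount to a frequency count followed by normalization. The following minimal sketch uses Python's `collections.Counter` and assumes that each account has already been reduced to an integer service level. The function name `level_distribution` is hypothetical.

```python
from collections import Counter


def level_distribution(levels: list[int]) -> dict[int, float]:
    """Return {service level: share of accounts at that level}."""
    if not levels:
        return {}
    counts = Counter(levels)  # S304: number of accounts at each level
    total = len(levels)
    return {level: n / total for level, n in counts.items()}  # S305: proportions


# 900 accounts at level 5 and 100 at level 4, as in the worked example.
print(level_distribution([5] * 900 + [4] * 100))  # {5: 0.9, 4: 0.1}
```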
S306: Determine the second component risk value corresponding to the character set. It is based on the service level of each account and the proportion of accounts at each service level.
Once these service levels and proportions are determined, the distribution of service levels across all identified information containing the character sequence is known.
In one scenario provided by the embodiments of the present application, the information to be identified is a mobile phone number to be identified. Each character set is a digit set made up of some of the digits in that phone number. In this scenario, the first seven digits of the phone number are divided into the second character set. These digits are arranged in the order in which they appear in the phone number, which gives the second digit sequence corresponding to the second character set. Combined with method two above, the second component risk value corresponding to the second character set is determined by the formula

$$S_2 = \sum_i w(i)\,\mathrm{Prob}(i)$$

where:
$S_2$ is the second component risk value corresponding to the second character set;
$w(i)$ is the i-th service level among the service levels determined;
$\mathrm{Prob}(i)$ is the proportion of accounts with the i-th service level among all the accounts determined.
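The formula is a weighted sum over the service levels present. The following minimal sketch assumes that the level distribution is given as a dictionary mapping each service level w(i) to its proportion Prob(i). This representation and the function name are illustrative.

```python
def second_component(distribution: dict[int, float]) -> float:
    """S2 = sum over service levels i of w(i) * Prob(i)."""
    return sum(level * share for level, share in distribution.items())


# w(1) = 5 with Prob(1) = 0.9, and w(2) = 4 with Prob(2) = 0.1.
s2 = second_component({5: 0.9, 4: 0.1})
print(round(s2, 6))  # 4.9
```

The proportions sum to 1, so $S_2$ is simply the mean service level of the accounts found. That is why the result here lies between the levels 4 and 5.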
Note why the first seven digits of the 11-digit phone number form the second character set. The first three digits, together with digits four to seven, identify phone numbers that share a characteristic within some attribute type. Examples are numbers from the same operator with the same calling priority, or numbers with the same home location. In other words, the first seven digits pick out phone numbers with the same characteristics.
Method two is illustrated below with an application example.
Assume the phone number is still 13812348888 and that it is bound to account A. After the server receives the registration of account A, it identifies the phone number 13812348888 bound to account A. The server divides the first seven digits of this number into the second character set. It then arranges these digits in their order in the phone number, which gives the second character sequence "1381234".
Among the pre-saved identified phone numbers, the server finds all numbers that contain the second character sequence "1381234". Suppose there are 1000 such numbers. The server then determines the account bound to each of them, giving 1000 accounts.
Next, the server determines the service levels of these 1000 accounts according to a preset service-level standard, for example from the business services each account uses. The server may use various ways of determining service levels, such as a preset level standard for each business service. In practice these can be adjusted to the needs of the application, and this does not limit the present application.
Suppose two service levels occur among the 1000 accounts:
900 accounts have the first service level, w(1) = 5.
The other 100 accounts have the second service level, w(2) = 4.
The proportion of accounts with service level 5 is then Prob(1) = 900/1000 = 0.9. The proportion with service level 4 is Prob(2) = 100/1000 = 0.1.
Using the formula above, the server determines the second component risk value of the second character set containing "1381234": $S_2 = 0.9 \times 5 + 0.1 \times 4 = 4.9$. This value is close to the service level w(1). In other words, the accounts bound to phone numbers containing "1381234" are essentially at the level of w(1).
As this example shows, the server finds the accounts corresponding to identified information that contains the character sequence and determines their service levels. This reflects how heavily these accounts use business services. Combined with the number of accounts counted at each service level, it gives an overall measure of the service levels of those accounts. A larger second component risk value means a higher probability that the information to be identified is normal information, and also a greater probability that it is at risk of being stolen. Conversely, a smaller value means a higher probability that it is abnormal information and a lower probability that it is at risk of being stolen.
Method three:
As shown in Fig. 4, method three determines the component risk value corresponding to each character set as follows:
S401: Arrange the characters in the character set in the order in which they appear in the information to be identified. This gives the character sequence corresponding to the character set.
As in methods one and two, the characters in a character set are arranged after the relevant characters have been divided into it.
S402: Identify the feature characters in the character sequence.
In the embodiments of the present application, feature characters include repeated characters and/or sequential characters.
Repeated characters are at least two consecutive identical characters, for example aaa, bb or cccc.
Sequential characters are at least three consecutive characters that follow a character order, for example abcd, 789, 321 or 1234.
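The source allows any existing recognition algorithm here. One possible choice, which is not taken from the source, is a single left-to-right scan. It finds maximal runs of identical characters of length at least 2. It also finds maximal runs whose code points change by a constant +1 or -1, of length at least 3, which covers both "1234" and "321". The function name and the output format are assumptions.

```python
def find_feature_runs(seq: str) -> list[tuple[str, str]]:
    """Return maximal repeated runs (len >= 2) and sequential runs (len >= 3)."""
    features = []
    # Repeated characters: maximal runs of one character.
    i = 0
    while i < len(seq):
        j = i
        while j + 1 < len(seq) and seq[j + 1] == seq[i]:
            j += 1
        if j - i + 1 >= 2:
            features.append(("repeated", seq[i:j + 1]))
        i = j + 1
    # Sequential characters: maximal runs with a constant step of +1 or -1.
    i = 0
    while i < len(seq) - 1:
        step = ord(seq[i + 1]) - ord(seq[i])
        if step not in (1, -1):
            i += 1
            continue
        j = i + 1
        while j + 1 < len(seq) and ord(seq[j + 1]) - ord(seq[j]) == step:
            j += 1
        if j - i + 1 >= 3:
            features.append(("sequential", seq[i:j + 1]))
        i = j  # the last character may start a run in the opposite direction
    return features


print(find_feature_runs("12348888"))  # [('repeated', '8888'), ('sequential', '1234')]
```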
Feature characters can be identified with any character recognition algorithm from the prior art. This does not limit the present application.
S403: When feature characters are identified, determine their weight value and feature value.
A character sequence admits a very large number of arrangements of different characters. Most arrangements are random and unordered, and only in a limited number of cases do they form feature characters. In other words, feature characters occur with a certain probability. The number of characters in a feature character is inversely related to the probability that it occurs. The more characters a feature character contains, the less likely it is to occur, and the fewer it contains, the more likely. For example, the repeated characters "8888" are very unlikely to appear in an 11-digit phone number, whereas "88" is comparatively likely.
Therefore, in the embodiments of the present application, the weight value of a feature character is quantified from the probability that it occurs. Its feature value is quantified from the number of characters it contains. Step S403 specifically includes:
determining the probability that the feature character appears in the character sequence;
determining the weight value of the feature character from this probability;
segmenting the feature character to obtain character units;
determining the feature value of the feature character from the number of character units obtained.
The feature character can be segmented with an N-gram language model. The model divides every N consecutive characters of a character string into one character unit, where N is the number of characters in each unit. In the embodiments, the feature character is first divided into the smallest character units (N = 1). The number of characters per unit is then increased step by step until the whole feature character forms a single unit (N equals the number of characters in the feature character).
For example, segmenting the feature character 8888 with the N-gram model gives the following units:
1-gram segmentation: 4 character units, 8, 8, 8, 8;
2-gram segmentation: 3 units, 88, 88, 88;
3-gram segmentation: 2 units, 888, 888;
4-gram segmentation: 1 unit, 8888.
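The segmentation in this example is overlapping. A j-gram window slides one character at a time, so a feature of n characters yields n - j + 1 units. The following minimal sketch reproduces the "8888" example. The function name is hypothetical.

```python
def ngram_units(feature: str, n: int) -> list[str]:
    """Split `feature` into overlapping units of n consecutive characters."""
    return [feature[k:k + n] for k in range(len(feature) - n + 1)]


for n in range(1, 5):
    print(n, ngram_units("8888", n))
# 1 ['8', '8', '8', '8']
# 2 ['88', '88', '88']
# 3 ['888', '888']
# 4 ['8888']
```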
S404: Determine the third component risk value corresponding to the character set from the weight value and feature value of the feature characters.
In one scenario of method three provided by the embodiments, the information to be identified is a phone number to be identified. The last eight digits of that number are divided into the third character set. These digits are arranged in the order in which they appear in the phone number, which gives the third digit sequence corresponding to the third character set. If the third digit sequence contains repeated digits and/or sequential digits, their feature values can be determined.
When repeated digits are identified, they are segmented into digit units. The feature value of the repeated digits is then determined by the formula

$$S_c(n) = \sum_{j=1}^{n} j\,(tf_j - 1)$$

where:
$S_c(n)$ is the feature value of the repeated digits, and the argument n is the number of digits they contain;
$tf_j$ is the number of character units obtained when the repeated digits are segmented;
j denotes the j-th segmentation method, under which each digit unit contains j characters; that is, j is the value of N in the N-gram segmentation.
For example, take the N-gram segmentation of the feature character "8888" above. The formula gives the feature value of the repeated digits "8888" as

$$S_c(4) = 1\times(4-1) + 2\times(3-1) + 3\times(2-1) + 4\times(1-1) = 10$$

Take the term $2\times(3-1)$ as an example. It comes from 2-gram segmentation, which divides "8888" into 3 character units: 88, 88, 88. The 2 is the number of characters in each unit and the 3 is the number of units. The other terms are obtained in the same way.
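The following is a direct transcription of the formula. It assumes overlapping j-gram segmentation as in the "8888" example, so that tf_j = n - j + 1. The function name is hypothetical.

```python
def repeated_feature_value(n: int) -> int:
    """S_c(n) = sum_{j=1}^{n} j * (tf_j - 1) for a run of n repeated characters."""
    total = 0
    for j in range(1, n + 1):
        tf_j = n - j + 1  # number of overlapping j-character units
        total += j * (tf_j - 1)
    return total


print(repeated_feature_value(4))  # 10, for "8888"
print(repeated_feature_value(3))  # 4
```

Since $tf_j - 1 = n - j$, the sum simplifies to $\sum_j j(n - j) = (n^3 - n)/6$. This closed form gives the same values: 0, 1, 4 and 10 for n = 1 to 4.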
In practice, sequential digits contain at least three characters, whereas repeated digits need only two. Segmentation of sequential digits therefore starts from a three-character run, and segmentation of repeated digits starts from a two-character run. When feature values are determined, a run of sequential digits is therefore treated as one character shorter than a run of repeated digits of the same length.
Therefore, when sequential digits are identified, the number of characters they contain is determined. Their feature value is given by the formula

$$S_s(n') = S_c(n' - 1)$$

where:
$S_s$ is the feature value of the sequential digits;
the argument n′ is the number of characters the sequential digits contain.
For example, the feature value of the five sequential digits "12345" equals that of the repeated digits "8888". By the formula above:

$$S_s(5) = S_c(4) = 1\times(4-1) + 2\times(3-1) + 3\times(2-1) + 4\times(1-1) = 10$$
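The sequential-digit rule just shifts the argument by one. This sketch keeps the same assumptions as before: overlapping j-grams and hypothetical function names.

```python
def repeated_feature_value(n: int) -> int:
    # S_c(n) = sum_j j * (tf_j - 1), where tf_j - 1 = n - j for overlapping j-grams.
    return sum(j * (n - j) for j in range(1, n + 1))


def sequential_feature_value(n_prime: int) -> int:
    """S_s(n') = S_c(n' - 1): a sequential run counts as one character shorter."""
    if n_prime < 3:
        raise ValueError("sequential characters need at least three characters")
    return repeated_feature_value(n_prime - 1)


print(sequential_feature_value(5))  # 10, for "12345"
print(sequential_feature_value(4))  # 4, for "1234"
```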
After the feature values of the repeated and/or sequential digits are determined, the third component risk value corresponding to the third character set is determined by the formula

$$S_3 = w\,(S_c + S_s + 1)$$

where:
$S_3$ is the third component risk value corresponding to the third character set;
w is the reciprocal of the probability that the identified repeated and sequential digits appear in the third digit sequence.
If only repeated digits appear in the third digit sequence, only their probability of appearing there is needed, and the reciprocal of that probability is used as the weight value w. The same applies if only sequential digits appear. If both repeated and sequential digits appear, the probability that they appear together in the third digit sequence is determined instead. The reciprocal of that probability is used as the weight value of the feature characters.
Method three is illustrated below with an application example.
Assume the phone number is still 13812348888 and that it is bound to account A. After the server receives the registration of account A, it identifies the phone number 13812348888 bound to account A. The server divides the last eight digits of the number into the third character set. It then arranges these digits in their order in the phone number, which gives the third character sequence "12348888".
The third character sequence "12348888" clearly contains feature characters: the sequential digits "1234" and the repeated digits "8888". The weight value w of these feature characters depends on one probability. It is the probability that this sequential run and this repeated run appear together in an eight-digit number, the same length as the third character sequence.
Each position of the third character sequence can take any of the 10 digits 0 to 9, so its eight positions admit $10^8$ arrangements in total. Only two of these contain both "1234" and "8888": "12348888" and "88881234". The probability that both appear in the third character sequence is therefore $2/10^8$, and by the formula above $w = 10^8/2$. This value of w is large and inconvenient for subsequent calculation. In practice it can be reduced, for example by taking a root or a logarithm. In this application example, w is assumed to be reduced by taking its 7th root, giving the reduced value w′ ≈ 22.4.
Next, the server determines the feature values of the repeated digits "8888" and the sequential digits "1234":
for "8888", $S_c(4) = 10$;
for "1234", $S_s(4) = S_c(3) = 4$.
By the formula above, the third component risk value of the third character sequence is therefore $S_3 = 22.4 \times (10 + 4 + 1) = 336$.
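Here is a minimal sketch of $S_3$. It assumes that a feature type that was not identified contributes 0 to the sum. The example's raw weight is $w = 10^8/2 = 5\times10^7$. The sketch takes the example's reduced weight of 22.4 as given rather than recomputing it. For reference, a literal seventh root of $5\times10^7$ is about 12.6, so 22.4 is best read as the example's assumed reduced value. The function name is hypothetical.

```python
def third_component(weight: float, sc: float = 0.0, ss: float = 0.0) -> float:
    """S3 = w * (Sc + Ss + 1); a feature type that was not found contributes 0."""
    return weight * (sc + ss + 1)


raw_w = 10**8 / 2  # reciprocal of the co-occurrence probability 2 / 10**8
reduced_w = 22.4   # reduced weight taken as given from the example
print(raw_w)                                              # 50000000.0
print(round(third_component(reduced_w, sc=10, ss=4), 6))  # 336.0
```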
As this example shows, method three determines the third component risk value of the third character set from its feature characters. The more digits those feature characters contain, the larger their weight value and feature value, which indicates that the information to be identified has a higher use value. A larger third component risk value means a higher probability that the information to be identified is normal information, and also a greater probability that it is at risk of being stolen. Conversely, a smaller value means a higher probability that it is abnormal information and a lower probability that it is at risk of being stolen.
The three methods above each determine a component risk value of the information to be identified, and the overall integrated risk value is determined from these values. In the embodiments of the present application, the component risk values of the character sets are combined geometrically, as the square root of the sum of their squares, to obtain the integrated risk value.
For example, for the phone number "13812348888", the three methods above give the integrated risk value

$$S = \left(S_1^2 + S_2^2 + S_3^2\right)^{1/2} = 336.17$$
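Here is a minimal sketch of this combination. $S_1$ comes from method one, which lies outside this section. The checks below therefore use simple illustrative inputs rather than the example's full set of values. The function name is hypothetical.

```python
import math


def integrated_risk(*components: float) -> float:
    """Square root of the sum of the squared component risk values."""
    return math.sqrt(sum(c * c for c in components))


print(integrated_risk(3.0, 4.0))  # 5.0
```

Squaring before summing lets the largest component dominate. This is why the phone-number example's S (336.17) is barely larger than its $S_3$ (336).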
A larger integrated risk value means that the information to be identified has a higher use value and a greater risk of being stolen. In practice, if the integrated risk value exceeds a preset risk threshold, the monitoring level can be raised to prevent theft. This applies both to the information to be identified and to the account bound to it.
Suppose the integrated risk value of the information bound to an account has already been determined with the above method. At some later moment the account binds a new piece of information to be identified. If the new information's integrated risk value is far below that of the original, the account may well have been stolen, and its monitoring level can be raised.
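The two monitoring triggers described above can be sketched as a single predicate. The source does not specify the threshold or the "far below" factor (`drop_ratio`). The values used below are purely hypothetical, as is the function name.

```python
from typing import Optional


def should_raise_monitoring(
    risk: float,
    threshold: float,
    previous_risk: Optional[float] = None,
    drop_ratio: float = 0.1,
) -> bool:
    """Decide whether to raise the monitoring level of an account.

    `threshold` and `drop_ratio` are hypothetical tuning parameters.
    """
    # High-value information is more attractive to steal.
    if risk > threshold:
        return True
    # Newly bound information worth far less than the previous one suggests theft.
    if previous_risk is not None and risk < previous_risk * drop_ratio:
        return True
    return False


print(should_raise_monitoring(336.17, threshold=100.0))                     # True
print(should_raise_monitoring(5.0, threshold=100.0))                        # False
print(should_raise_monitoring(5.0, threshold=100.0, previous_risk=336.17))  # True
```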
The description above uses a phone number as the information to be identified. The risk-identification-based information processing method of the embodiments can also identify the risk of other information to be identified and process that information based on the risk. For example, the information to be identified can also be an e-mail address or an identity document number.
The above describes the risk-identification-based information processing method provided by the embodiments of the present application. Based on the same idea, the embodiments also provide a risk-identification-based information processing device, as shown in Fig. 5.
The risk-identification-based information processing device in Fig. 5 includes a character division module 501, a component risk value module 502, an integrated risk value module 503 and a processing module 504, where:
the character division module 501 divides the characters contained in the information to be identified into different character sets;
the component risk value module 502 determines the component risk value corresponding to each character set;
the integrated risk value module 503 determines the integrated risk value of the information to be identified from the component risk values of the character sets;
the processing module 504 processes the information to be identified according to the integrated risk value.
Specifically, the character division module 501 divides the characters at specified positions of the information to be identified into a character set. The union of all character sets contains every character of the information to be identified, and at least two character sets intersect.
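One way to picture how modules 501 to 504 fit together is a small class with pluggable component scorers. This is an illustrative sketch under assumptions. The class, its method names and the stand-in scorers are hypothetical, and the decision in `process` is only a placeholder for whatever processing module 504 performs.

```python
import math
from typing import Callable, Sequence

Scorer = Callable[[str], float]  # maps a character sequence to a component risk value


class RiskInfoProcessor:
    """Hypothetical wiring of modules 501-504; names are illustrative only."""

    def __init__(self, position_sets: Sequence[Sequence[int]],
                 scorers: Sequence[Scorer], threshold: float) -> None:
        if len(position_sets) != len(scorers):
            raise ValueError("one scorer is needed per character set")
        self.position_sets = position_sets
        self.scorers = scorers
        self.threshold = threshold

    def divide(self, info: str) -> list[str]:
        """Module 501: one ordered character sequence per set of positions."""
        return ["".join(info[i] for i in sorted(ps)) for ps in self.position_sets]

    def component_risks(self, sequences: list[str]) -> list[float]:
        """Module 502: apply each set's scorer to its sequence."""
        return [score(seq) for score, seq in zip(self.scorers, sequences)]

    def integrated_risk(self, risks: list[float]) -> float:
        """Module 503: square root of the sum of squares of the components."""
        return math.sqrt(sum(r * r for r in risks))

    def process(self, info: str) -> str:
        """Module 504: placeholder decision based on the integrated risk value."""
        risk = self.integrated_risk(self.component_risks(self.divide(info)))
        return "raise_monitoring" if risk > self.threshold else "normal"


# Phone-number scenario: first 3 digits, first 7 digits, last 8 digits.
# The union covers all 11 digits and the sets overlap, as module 501 requires.
sets = [range(0, 3), range(0, 7), range(3, 11)]
dummy_scorers = [lambda s: 3.0, lambda s: 4.0, lambda s: 0.0]  # stand-in scorers
proc = RiskInfoProcessor(sets, dummy_scorers, threshold=4.5)
print(proc.divide("13812348888"))   # ['138', '1381234', '12348888']
print(proc.process("13812348888"))  # raise_monitoring
```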
In the embodiments of the present application, the characters in different character sets have different meanings. The component risk values of different character sets are therefore determined in different ways. Specifically:
As shown in Fig. 6, the component risk value module includes the following submodules for determining the first component risk value:
a character arrangement submodule 601, which arranges the characters in the character set in the order in which they appear in the information to be identified, giving the character sequence corresponding to the character set;
a first proportion submodule 602, which determines the proportion of pre-saved identified normal information that has the same character sequence, as the first proportion;
a second proportion submodule 603, which determines the proportion of pre-saved identified abnormal information that has the same character sequence, as the second proportion;
a ratio submodule 604, which determines the ratio of the first proportion to the second proportion;
a first component risk value submodule 605, which determines the first component risk value corresponding to the character set from the ratio.
The first component risk value can be very large. To simplify subsequent calculation, the first component risk value submodule 605 determines the logarithm of the ratio and determines the first component risk value of the character set from this logarithm.
In another implementation of the embodiments, the first component risk value submodule 605 uses the sum of this logarithm and a preset regulating constant as the first component risk value of the character set.
As shown in Fig. 7, the component risk value module includes the following submodules for determining the second component risk value:
a character arrangement submodule 701, which arranges the characters in the character set in the order in which they appear in the information to be identified, giving the character sequence corresponding to the character set;
an account information submodule 702, which determines, among the pre-saved identified information, the account information corresponding to each piece of identified information that contains this character sequence;
a service level submodule 703, which determines the service level of each account and then counts the number of accounts at each service level;
a proportion submodule 704, which determines, among all these accounts, the proportion of accounts at each service level;
a second component risk value submodule 705, which determines the second component risk value corresponding to the character set from the service level of each account and the proportion of accounts at each service level.
As shown in Fig. 8, the component risk value module includes the following submodules for determining the third component risk value:
a character arrangement submodule 801, which arranges the characters in the character set in the order in which they appear in the information to be identified, giving the character sequence corresponding to the character set;
an identification submodule 802, which identifies the feature characters in the character sequence;
a feature character submodule 803, which determines the weight value and feature value of the feature characters when they are identified;
a third component risk value submodule 804, which determines the third component risk value corresponding to the character set from the weight value and feature value of the feature characters.
The feature characters include repeated characters and/or sequential characters.
The feature character submodule 803 performs the following steps:
determine the probability that the feature character appears in the character sequence;
determine the weight value of the feature character from this probability;
segment the feature character into character units;
determine the feature value of the feature character from the number of character units obtained.
In one scenario of the embodiments of the present application, the information to be identified is a phone number to be identified. Each character set is a digit set made up of some of the digits in that phone number. The character division module 501 divides the digits as follows:
the first three digits of the phone number form the first character set;
the first seven digits form the second character set;
the last eight digits form the third character set.
In this scenario, the component risk value module determines the first component risk value as follows. It arranges the digits in the first character set in the order in which they appear in the phone number, which gives the first digit sequence corresponding to the first character set. It then determines the first component risk value corresponding to the first character set with the formula

$$S_1 = \log\left(\frac{p_1}{p_2}\right) + C$$

where:
$S_1$ is the first component risk value corresponding to the first character set;
$p_1$ is the proportion of phone numbers containing the first digit sequence among the pre-saved identified normal phone numbers;
$p_2$ is the proportion of phone numbers containing the first digit sequence among the pre-saved identified abnormal phone numbers;
C is a preset regulating constant.
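Here is a sketch of the first component risk value under several stated assumptions. The source does not give the base of the logarithm, so the natural logarithm is used. Both $p_1$ and $p_2$ must be positive, so a real system would need some smoothing for a sequence that never occurs among abnormal numbers. The sample numbers and function names are hypothetical.

```python
import math


def proportion_with_sequence(numbers: list[str], positions: range, sequence: str) -> float:
    """Share of `numbers` whose digits at `positions` form `sequence`."""
    if not numbers:
        return 0.0
    hits = sum(1 for num in numbers
               if "".join(num[i] for i in positions) == sequence)
    return hits / len(numbers)


def first_component(p1: float, p2: float, c: float) -> float:
    """S1 = log(p1 / p2) + C; natural log assumed, p1 and p2 must be > 0."""
    return math.log(p1 / p2) + c


# Hypothetical pre-saved samples.
normal = ["13800000000", "13811111111", "13900000000", "15000000000"]
abnormal = ["13822222222", "17000000000", "17011111111", "17022222222"]
p1 = proportion_with_sequence(normal, range(3), "138")    # 2/4 = 0.5
p2 = proportion_with_sequence(abnormal, range(3), "138")  # 1/4 = 0.25
print(round(first_component(p1, p2, c=0.0), 4))           # 0.6931 (ln 2)
```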
The component risk value module determines the second component risk value as follows:
arrange the digits in the second character set in the order in which they appear in the phone number, giving the second digit sequence corresponding to the second character set;
among the pre-saved identified information, determine the account information corresponding to each identified phone number that contains this second digit sequence;
determine the service level of each account;
determine the second component risk value corresponding to the second character set with the formula $S_2 = \sum_i w(i)\,\mathrm{Prob}(i)$, where:
$S_2$ is the second component risk value corresponding to the second character set;
$w(i)$ is the i-th service level among the service levels determined;
$\mathrm{Prob}(i)$ is the proportion of accounts with the i-th service level among all the accounts determined.
When determining the third component risk value, the component risk value module is specifically configured to: for the third character set, arrange the digits of the third character set in the order in which they appear in the mobile phone number to be identified, obtaining the third digit sequence corresponding to the third character set;
identify repeated digits and/or sequential digits in the third digit sequence;
when repeated digits are identified, segment the repeated digits to obtain different digit units, and use the formula [formula not reproduced in the source] to determine the feature value of the repeated digits;
where $S_c$ is the feature value of the repeated digits;
$tf_j$ is the number of character units obtained after segmenting the repeated digits;
$j$ denotes the $j$-th segmentation method, and each digit unit obtained with the $j$-th segmentation method contains $j$ characters;
$n$ is the number of digits in the repeated digits;
when sequential digits are identified, determine the number of characters in the sequential digits and use the formula $S_s(n') = S_c(n'-1)$ to determine the feature value of the sequential digits;
where $S_s$ is the feature value of the sequential digits;
$n'$ is the number of characters in the sequential digits;
use the formula $S_3 = w\,(S_c + S_s + 1)$ to determine the third component risk value corresponding to the third character set;
where $S_3$ is the third component risk value corresponding to the third character set;
$w$ is the reciprocal of the probability that the identified repeated digits and sequential digits occur in the third digit sequence.
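Here is a sketch of the third component risk value for the last eight digits. The text leaves several details open, so the following are assumptions:

- Repeated digits are the longest run of a single digit, at least 2 long.
- Sequential digits are the longest run stepping by +1 or -1, at least 3 long.
- A missing feature contributes 0.
- The occurrence probability behind $w$ is supplied by the caller.

The formula for $S_c$ is not reproduced, so `sc` is a caller-supplied function. The identity function in the usage is a placeholder, not the patent's formula.

```python
from typing import Callable


def longest_repeat_run(digits: str, min_len: int = 2) -> int:
    """Length of the longest run of one repeated digit, or 0 if shorter than min_len."""
    best = run = 0
    for i, ch in enumerate(digits):
        run = run + 1 if i > 0 and ch == digits[i - 1] else 1
        best = max(best, run)
    return best if best >= min_len else 0


def longest_sequential_run(digits: str, min_len: int = 3) -> int:
    """Length of the longest ascending (+1) or descending (-1) run, or 0 if too short."""
    best = 1 if digits else 0
    up = down = 1
    for prev, cur in zip(digits, digits[1:]):
        step = int(cur) - int(prev)
        up = up + 1 if step == 1 else 1
        down = down + 1 if step == -1 else 1
        best = max(best, up, down)
    return best if best >= min_len else 0


def third_component_risk(last_eight: str,
                         sc: Callable[[int], float],
                         occurrence_prob: float) -> float:
    """S3 = w * (Sc + Ss + 1), with Ss(n') = Sc(n' - 1) and w = 1 / occurrence_prob."""
    if not 0 < occurrence_prob <= 1:
        raise ValueError("occurrence_prob must be in (0, 1]")
    n = longest_repeat_run(last_eight)
    n_seq = longest_sequential_run(last_eight)
    s_c = sc(n) if n else 0.0              # assumption: no repeated digits -> 0
    s_s = sc(n_seq - 1) if n_seq else 0.0  # Ss(n') = Sc(n' - 1)
    w = 1.0 / occurrence_prob
    return w * (s_c + s_s + 1)


if __name__ == "__main__":
    def placeholder_sc(n: int) -> float:
        return float(n)  # placeholder only, NOT the patent's S_c formula

    # "8888" gives n = 4 and "1234" gives n' = 4, so with w = 2: 2 * (4 + 3 + 1)
    print(third_component_risk("88881234", placeholder_sc, occurrence_prob=0.5))  # prints 16.0
```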
After the first to third component risk values have been determined, the comprehensive risk value module is specifically configured to: take the geometric mean of the component risk values corresponding to the character sets, that is, $S = \sqrt[3]{S_1 S_2 S_3}$, to obtain the comprehensive risk value of the information to be identified.
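The combination step is a plain geometric mean. This sketch averages in log space and assumes every component is strictly positive. The text does not say how to handle a zero or negative component, such as a negative $S_1$ when $p_1 < p_2$ and $C$ is small.

```python
import math
from typing import Sequence


def comprehensive_risk(components: Sequence[float]) -> float:
    """Geometric mean (S1 * S2 * ... * Sk) ** (1 / k) of the component risk values."""
    if not components:
        raise ValueError("at least one component risk value is required")
    if any(value <= 0 for value in components):
        raise ValueError("this sketch only accepts strictly positive components")
    # Averaging logarithms avoids overflow/underflow of a long product.
    return math.exp(sum(math.log(value) for value in components) / len(components))


if __name__ == "__main__":
    print(comprehensive_risk([2.0, 8.0, 4.0]))  # cube root of 64, i.e. about 4
```

Python 3.8+ also ships `statistics.geometric_mean`. The explicit version above just makes the positivity requirement visible.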
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion. Thus a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes it.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM, and optical storage.
The foregoing are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (28)

1. An information processing method based on risk identification, characterized by comprising:
dividing the characters contained in information to be identified into different character sets;
determining the component risk value corresponding to each character set;
determining a comprehensive risk value of the information to be identified according to the component risk values corresponding to the character sets;
processing the information to be identified according to the comprehensive risk value.
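The four steps of claim 1 map onto a small pipeline. In this sketch, the splitter, the per-set scorers, and the combiner are passed in as functions. The threshold and the rule "flag when the comprehensive value reaches the threshold" are hypothetical. The claim only says the information is processed according to the comprehensive risk value.

```python
import math
from typing import Callable, Dict, List, Sequence


def process_information(info: str,
                        split: Callable[[str], List[str]],
                        scorers: Sequence[Callable[[str], float]],
                        combine: Callable[[List[float]], float],
                        threshold: float) -> Dict[str, object]:
    """Claim 1 as a pipeline; the final flagging rule is a hypothetical example."""
    char_sets = split(info)                                          # step 1
    if len(char_sets) != len(scorers):
        raise ValueError("exactly one scorer per character set is required")
    components = [score(s) for score, s in zip(scorers, char_sets)]  # step 2
    risk = combine(components)                                       # step 3
    return {"components": components,                                # step 4
            "risk": risk,
            "flagged": risk >= threshold}


def geometric_mean(values: List[float]) -> float:
    """Combiner in the style of claim 14 (assumes positive values)."""
    return math.prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    result = process_information(
        "13812345678",
        split=lambda s: [s[:3], s[:7], s[-8:]],
        scorers=[lambda s: float(len(s))] * 3,  # toy scorers, not the patent's S1..S3
        combine=geometric_mean,
        threshold=5.0,
    )
    print(result)
```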
2. The method of claim 1, characterized in that dividing the characters contained in the information to be identified into different character sets specifically comprises:
dividing the characters at specified positions in the information to be identified into one character set, wherein the union of all the character sets contains all characters in the information to be identified, and at least two character sets have a non-empty intersection.
3. The method of claim 1, characterized in that determining the component risk value corresponding to each character set specifically comprises:
arranging the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
among the pre-saved identified normal information, determining the proportion of information having the same character sequence, as a first proportion;
among the pre-saved identified abnormal information, determining the proportion of information having the same character sequence, as a second proportion;
determining the ratio of the first proportion to the second proportion;
determining, according to the ratio, the first component risk value corresponding to the character set.
4. The method of claim 3, characterized in that determining, according to the ratio, the first component risk value corresponding to the character set specifically comprises:
determining the logarithm of the ratio;
determining, according to the logarithm, the first component risk value corresponding to the character set.
5. The method of claim 4, characterized in that determining, according to the logarithm, the first component risk value corresponding to the character set specifically comprises:
taking the sum of the logarithm and a preset adjustment constant as the first component risk value corresponding to the character set.
6. The method of claim 1, characterized in that determining the component risk value corresponding to each character set specifically comprises:
arranging the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
among the pre-saved identified information, determining each piece of account information corresponding to identified information that contains the character sequence;
determining the service level of each piece of account information;
counting, according to the service level of each piece of account information, the number of pieces of account information at each service level;
determining, among all the account information, the proportion of account information at each service level;
determining the second component risk value corresponding to the character set according to the service level of each piece of account information and the proportion of account information at each service level.
7. The method of claim 1, characterized in that determining the component risk value corresponding to each character set specifically comprises:
arranging the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
identifying feature characters in the character sequence;
when feature characters are identified, determining the weight and the feature value of the feature characters;
determining the third component risk value corresponding to the character set according to the weight and the feature value of the feature characters;
wherein the feature characters include repeated characters and/or sequential characters.
8. The method of claim 7, characterized in that determining the weight and the feature value of the feature characters specifically comprises:
determining the probability that the feature characters occur in the character sequence;
determining the weight of the feature characters according to the probability;
segmenting the feature characters to obtain character units;
determining the feature value of the feature characters according to the number of character units obtained.
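Claim 8 derives the feature value from the number of character units produced by segmentation, and the weight from a probability. This minimal sketch assumes the $j$-th segmentation method cuts the feature characters into consecutive, non-overlapping units of $j$ characters, dropping any leftover characters. It also assumes the weight is the reciprocal of the probability, as in the mobile-number embodiment. The text does not say how the probability is estimated, so here it is an input.

```python
from typing import Dict, List


def segment(feature: str, j: int) -> List[str]:
    """j-th segmentation method (assumed): consecutive non-overlapping j-character units."""
    return [feature[k:k + j] for k in range(0, len(feature) - j + 1, j)]


def unit_counts(feature: str) -> Dict[int, int]:
    """tf_j = number of units produced by the j-th segmentation method, for j = 1..n."""
    return {j: len(segment(feature, j)) for j in range(1, len(feature) + 1)}


def feature_weight(probability: float) -> float:
    """Weight as the reciprocal of the occurrence probability (assumed mapping)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


if __name__ == "__main__":
    print(segment("8888", 2))    # ['88', '88']
    print(unit_counts("8888"))   # {1: 4, 2: 2, 3: 1, 4: 1}
    print(feature_weight(0.25))  # 4.0
```

A sliding-window reading, where every length-$j$ substring counts as a unit, is an equally plausible interpretation. Under that reading, `range` would step by 1 instead of `j`.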
9. The method of claim 1, characterized in that the information to be identified is specifically a mobile phone number to be identified;
the character set is specifically a digit set composed of some of the digits contained in the mobile phone number to be identified.
10. The method of claim 9, characterized in that dividing the characters contained in the information to be identified into different character sets specifically comprises:
dividing the first three digits of the mobile phone number to be identified into a first character set;
dividing the first seven digits of the mobile phone number to be identified into a second character set;
dividing the last eight digits of the mobile phone number to be identified into a third character set.
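The three character sets of claim 10 can be written as digit-position ranges. This sketch assumes an 11-digit mobile number, which is the length of mainland Chinese mobile numbers. Only at that length do the first three plus last eight digits cover every digit, as claim 2 requires. The helper names are hypothetical. The sketch also checks claim 2's two properties: the union covers every position, and at least two sets overlap.

```python
from itertools import combinations
from typing import List, Sequence, Set

# Digit positions (0-based) of the three character sets for an 11-digit number.
CHARACTER_SET_POSITIONS: List[Set[int]] = [
    set(range(0, 3)),   # first three digits
    set(range(0, 7)),   # first seven digits
    set(range(3, 11)),  # last eight digits
]


def partition_phone(phone: str) -> List[str]:
    """Return each character set's digits in their original order."""
    if len(phone) != 11 or not phone.isdigit():
        raise ValueError("expected an 11-digit mobile phone number")
    return ["".join(phone[i] for i in sorted(pos)) for pos in CHARACTER_SET_POSITIONS]


def satisfies_claim_2(position_sets: Sequence[Set[int]], length: int) -> bool:
    """True if the union covers all positions and at least two sets intersect."""
    covers = set().union(*position_sets) == set(range(length))
    overlaps = any(a & b for a, b in combinations(position_sets, 2))
    return covers and overlaps


if __name__ == "__main__":
    print(partition_phone("13812345678"))                  # ['138', '1381234', '12345678']
    print(satisfies_claim_2(CHARACTER_SET_POSITIONS, 11))  # True
```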
11. The method of claim 10, characterized in that determining the component risk value corresponding to each character set specifically comprises:
for the first character set, arranging the digits of the first character set in the order in which they appear in the mobile phone number to be identified, obtaining the first digit sequence corresponding to the first character set;
using the formula $S_1 = \log\left(\frac{p_1}{p_2}\right) + C$ to determine the first component risk value corresponding to the first character set;
where $S_1$ is the first component risk value corresponding to the first character set;
$p_1$ is the proportion, among the pre-saved identified normal mobile phone numbers, of numbers containing the first digit sequence;
$p_2$ is the proportion, among the pre-saved identified abnormal mobile phone numbers, of numbers containing the first digit sequence;
$C$ is a preset adjustment constant.
12. The method of claim 10, characterized in that determining the component risk value corresponding to each character set specifically comprises:
for the second character set, arranging the digits of the second character set in the order in which they appear in the mobile phone number to be identified, obtaining the second digit sequence corresponding to the second character set;
among the pre-saved identified information, determining each piece of account information corresponding to identified mobile phone numbers that contain the second digit sequence;
determining the service level of each piece of account information;
using the formula $S_2 = \sum_i w(i)\,\mathrm{Prob}(i)$ to determine the second component risk value corresponding to the second character set;
where $S_2$ is the second component risk value corresponding to the second character set;
$w(i)$ is the $i$-th service level among the service levels determined;
$\mathrm{Prob}(i)$ is the proportion of account information at the $i$-th service level among all the account information determined.
13. The method of claim 10, characterized in that determining the component risk value corresponding to each character set specifically comprises:
for the third character set, arranging the digits of the third character set in the order in which they appear in the mobile phone number to be identified, obtaining the third digit sequence corresponding to the third character set;
identifying repeated digits and/or sequential digits in the third digit sequence;
when repeated digits are identified, segmenting the repeated digits to obtain different digit units, and using the formula [formula not reproduced in the source] to determine the feature value of the repeated digits;
where $S_c$ is the feature value of the repeated digits;
$tf_j$ is the number of character units obtained after segmenting the repeated digits;
$j$ denotes the $j$-th segmentation method, and each digit unit obtained with the $j$-th segmentation method contains $j$ characters;
$n$ is the number of digits in the repeated digits;
when sequential digits are identified, determining the number of characters in the sequential digits and using the formula $S_s(n') = S_c(n'-1)$ to determine the feature value of the sequential digits;
where $S_s$ is the feature value of the sequential digits;
$n'$ is the number of characters in the sequential digits;
using the formula $S_3 = w\,(S_c + S_s + 1)$ to determine the third component risk value corresponding to the third character set;
where $S_3$ is the third component risk value corresponding to the third character set;
$w$ is the reciprocal of the probability that the identified repeated digits and sequential digits occur in the third digit sequence.
14. The method of any one of claims 1 to 13, characterized in that determining the comprehensive risk value of the information to be identified according to the component risk values corresponding to the character sets specifically comprises:
taking the geometric mean of the component risk values corresponding to the character sets to obtain the comprehensive risk value of the information to be identified.
15. An information processing device based on risk identification, characterized by comprising:
a character division module, configured to divide the characters contained in information to be identified into different character sets;
a component risk value module, configured to determine the component risk value corresponding to each character set;
a comprehensive risk value module, configured to determine the comprehensive risk value of the information to be identified according to the component risk values corresponding to the character sets;
a processing module, configured to process the information to be identified according to the comprehensive risk value.
16. The device of claim 15, characterized in that the character division module is specifically configured to: divide the characters at specified positions in the information to be identified into one character set, wherein the union of all the character sets contains all characters in the information to be identified, and at least two character sets have a non-empty intersection.
17. The device of claim 15, characterized in that the component risk value module specifically comprises:
a character arrangement submodule, configured to arrange the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
a first proportion submodule, configured to determine, among the pre-saved identified normal information, the proportion of information having the same character sequence, as a first proportion;
a second proportion submodule, configured to determine, among the pre-saved identified abnormal information, the proportion of information having the same character sequence, as a second proportion;
a ratio submodule, configured to determine the ratio of the first proportion to the second proportion;
a first component risk value submodule, configured to determine, according to the ratio, the first component risk value corresponding to the character set.
18. The device of claim 17, characterized in that the first component risk value submodule is specifically configured to: determine the logarithm of the ratio, and determine, according to the logarithm, the first component risk value corresponding to the character set.
19. The device of claim 18, characterized in that the first component risk value submodule is specifically configured to: take the sum of the logarithm and a preset adjustment constant as the first component risk value corresponding to the character set.
20. The device of claim 15, characterized in that the component risk value module specifically comprises:
a character arrangement submodule, configured to arrange the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
an account information submodule, configured to determine, among the pre-saved identified information, each piece of account information corresponding to identified information that contains the character sequence;
a service level submodule, configured to determine the service level of each piece of account information and to count, according to those service levels, the number of pieces of account information at each service level;
a proportion submodule, configured to determine, among all the account information, the proportion of account information at each service level;
a second component risk value submodule, configured to determine the second component risk value corresponding to the character set according to the service level of each piece of account information and the proportion of account information at each service level.
21. The device of claim 15, characterized in that the component risk value module specifically comprises:
a character arrangement submodule, configured to arrange the characters of the character set in the order in which they appear in the information to be identified, obtaining the character sequence corresponding to the character set;
an identification submodule, configured to identify feature characters in the character sequence;
a feature character submodule, configured to determine, when feature characters are identified, the weight and the feature value of the feature characters;
a third component risk value submodule, configured to determine the third component risk value corresponding to the character set according to the weight and the feature value of the feature characters;
wherein the feature characters include repeated characters and/or sequential characters.
22. The device of claim 21, characterized in that the feature character submodule is specifically configured to: determine the probability that the feature characters occur in the character sequence; determine the weight of the feature characters according to the probability; segment the feature characters to obtain character units; and determine the feature value of the feature characters according to the number of character units obtained.
23. The device of claim 15, characterized in that the information to be identified is specifically a mobile phone number to be identified;
the character set is specifically a digit set composed of some of the digits contained in the mobile phone number to be identified.
24. The device of claim 23, characterized in that the character division module is specifically configured to:
divide the first three digits of the mobile phone number to be identified into a first character set;
divide the first seven digits of the mobile phone number to be identified into a second character set;
divide the last eight digits of the mobile phone number to be identified into a third character set.
25. The device of claim 24, characterized in that the component risk value module is specifically configured to: for the first character set, arrange the digits of the first character set in the order in which they appear in the mobile phone number to be identified, obtaining the first digit sequence corresponding to the first character set;
use the formula $S_1 = \log\left(\frac{p_1}{p_2}\right) + C$ to determine the first component risk value corresponding to the first character set;
where $S_1$ is the first component risk value corresponding to the first character set;
$p_1$ is the proportion, among the pre-saved identified normal mobile phone numbers, of numbers containing the first digit sequence;
$p_2$ is the proportion, among the pre-saved identified abnormal mobile phone numbers, of numbers containing the first digit sequence;
$C$ is a preset adjustment constant.
26. The device of claim 24, characterized in that the component risk value module is specifically configured to: for the second character set, arrange the digits of the second character set in the order in which they appear in the mobile phone number to be identified, obtaining the second digit sequence corresponding to the second character set;
among the pre-saved identified information, determine each piece of account information corresponding to identified mobile phone numbers that contain the second digit sequence;
determine the service level of each piece of account information;
use the formula $S_2 = \sum_i w(i)\,\mathrm{Prob}(i)$ to determine the second component risk value corresponding to the second character set;
where $S_2$ is the second component risk value corresponding to the second character set;
$w(i)$ is the $i$-th service level among the service levels determined;
$\mathrm{Prob}(i)$ is the proportion of account information at the $i$-th service level among all the account information determined.
27. The device of claim 24, characterized in that the component risk value module is specifically configured to: for the third character set, arrange the digits of the third character set in the order in which they appear in the mobile phone number to be identified, obtaining the third digit sequence corresponding to the third character set;
identify repeated digits and/or sequential digits in the third digit sequence;
when repeated digits are identified, segment the repeated digits to obtain different digit units, and use the formula [formula not reproduced in the source] to determine the feature value of the repeated digits;
where $S_c$ is the feature value of the repeated digits;
$tf_j$ is the number of character units obtained after segmenting the repeated digits;
$j$ denotes the $j$-th segmentation method, and each digit unit obtained with the $j$-th segmentation method contains $j$ characters;
$n$ is the number of digits in the repeated digits;
when sequential digits are identified, determine the number of characters in the sequential digits and use the formula $S_s(n') = S_c(n'-1)$ to determine the feature value of the sequential digits;
where $S_s$ is the feature value of the sequential digits;
$n'$ is the number of characters in the sequential digits;
use the formula $S_3 = w\,(S_c + S_s + 1)$ to determine the third component risk value corresponding to the third character set;
where $S_3$ is the third component risk value corresponding to the third character set;
$w$ is the reciprocal of the probability that the identified repeated digits and sequential digits occur in the third digit sequence.
28. The device of any one of claims 15 to 27, characterized in that the comprehensive risk value module is specifically configured to: take the geometric mean of the component risk values corresponding to the character sets to obtain the comprehensive risk value of the information to be identified.
CN201410734967.2A 2014-12-04 2014-12-04 information processing method and device based on risk identification Active CN105718767B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410734967.2A CN105718767B (en) 2014-12-04 2014-12-04 information processing method and device based on risk identification
CN202010118726.0A CN111371761B (en) 2014-12-04 2014-12-04 Information processing method and device based on risk identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410734967.2A CN105718767B (en) 2014-12-04 2014-12-04 information processing method and device based on risk identification

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010118726.0A Division CN111371761B (en) 2014-12-04 2014-12-04 Information processing method and device based on risk identification

Publications (2)

Publication Number Publication Date
CN105718767A (en) 2016-06-29
CN105718767B CN105718767B (en) 2020-01-31

Family

ID=56143708

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010118726.0A Active CN111371761B (en) 2014-12-04 2014-12-04 Information processing method and device based on risk identification
CN201410734967.2A Active CN105718767B (en) 2014-12-04 2014-12-04 information processing method and device based on risk identification

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010118726.0A Active CN111371761B (en) 2014-12-04 2014-12-04 Information processing method and device based on risk identification

Country Status (1)

Country Link
CN (2) CN111371761B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763209A (en) * 2018-05-22 2018-11-06 阿里巴巴集团控股有限公司 A kind of method, apparatus and equipment of feature extraction and risk identification
CN110427739A (en) * 2019-08-09 2019-11-08 泰康保险集团股份有限公司 Information Authentication method and device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118043A (en) * 2011-11-16 2013-05-22 阿里巴巴集团控股有限公司 Identification method and equipment of user account
CN103905532A (en) * 2014-03-13 2014-07-02 微梦创科网络科技(中国)有限公司 Microblog marketing account recognition method and system
CN104092601A (en) * 2014-07-28 2014-10-08 北京微众文化传媒有限公司 Method and device for recognizing social-media account

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7877323B2 (en) * 2008-03-28 2011-01-25 American Express Travel Related Services Company, Inc. Consumer behaviors at lender level
CA2757290C (en) * 2008-04-01 2020-12-15 Leap Marketing Technologies Inc. Systems and methods for implementing and tracking identification tests
CN103580939B (en) * 2012-07-30 2018-03-20 腾讯科技(深圳)有限公司 A kind of unexpected message detection method and equipment based on account attribute

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MUILIAF: "Doc88 (道客巴巴, DOC88.COM)", 12 August 2013 *
Ni Ping et al.: "A social botnet monitoring method based on group features", Journal of University of Chinese Academy of Sciences *

Also Published As

Publication number Publication date
CN111371761A (en) 2020-07-03
CN105718767B (en) 2020-01-31
CN111371761B (en) 2022-10-18

Similar Documents

Publication Publication Date Title
RU2635275C1 (en) System and method of identifying user's suspicious activity in user's interaction with various banking services
CN107368259A (en) A kind of method and apparatus that business datum is write in the catenary system to block
CN106295349A (en) Risk Identification Method, identification device and the anti-Ore-controlling Role that account is stolen
CN110147967B (en) Risk prevention and control method and device
CN111507638A (en) Risk information output and risk information construction method and device
CN105337928B (en) Method for identifying ID, safety protection problem generation method and device
CN109784934A (en) A kind of transaction risk control method, apparatus and relevant device and medium
CN106033575A (en) Risk account identification method and apparatus
CN108229963A (en) The Risk Identification Method and device of user's operation behavior
US10373135B2 (en) System and method for performing secure online banking transactions
CN106027520A (en) Method and device for detecting and processing stealing of website accounts
CN110930218B (en) Method and device for identifying fraudulent clients and electronic equipment
CN106372977B (en) A kind of processing method and equipment of virtual account
CN107644098A (en) A kind of fraud recognition methods, device, equipment and storage medium
US20200234310A1 (en) Identity proofing for online accounts
CN112419041A (en) Risk monitoring method and device, storage medium and computer equipment
CN110110528A (en) Safety risk estimating method, device and the equipment of information system
CN111931047B (en) Artificial intelligence-based black product account detection method and related device
CN114186275A (en) Privacy protection method and device, computer equipment and storage medium
CN108009444A (en) Authority control method, device and the computer-readable recording medium of full-text search
WO2012123970A2 (en) A method of optimizing asset risk controls
CN105718767A (en) Information processing method and device based on risk identification
CN109918899A (en) Server, employee reveal the prediction technique and storage medium of company information
CN110363648B (en) Multi-dimensional attribute verification method and device based on same geographic type and electronic equipment
CN107623696A (en) A kind of user ID authentication method and device based on user behavior feature

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200918

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced New Technologies Co., Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advantageous New Technologies Co., Ltd.

Effective date of registration: 20200918

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advantageous New Technologies Co., Ltd.

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right