Detailed Description of the Embodiments
The embodiments of the present application provide a method and device for identifying user accounts. Identifying information is collected for each user account and contains preset critical fields and model fields. When the identifying information of any two user accounts contains at least one identical critical-field content, the two accounts are determined to have been registered by the same user. Otherwise, the content similarity of the model fields in the two accounts' identifying information is evaluated, and whether the two accounts were registered by the same user is determined according to the level of that similarity. This solves the problem that user accounts registered by the same user cannot be effectively identified among a large number of user accounts.
With the technical solution of the present application, the user accounts registered by the same user can be managed centrally, which effectively improves the efficiency of user account management. Furthermore, user behavior can be analyzed and tracked effectively, so that useful information can be pushed accurately to the user accounts the user has registered.
Embodiment one:
Figure 1 is a schematic flowchart of the user account identification method in Embodiment one of the present application. The method includes the following steps:
Step 101: the server collects the identifying information of the user accounts; the identifying information contains the preset critical fields and model fields.
A user account is a virtual account that a user registers on an Internet website in order to conduct business on that website. The user may be the individual, company, or organization that registered the account.
The identifying information of a user account includes at least one of the following:
(1) Registration information provided when the user account is registered.
The registration information may be the information the user fills in when registering the account. For example, for a personal user it may include name, ID card number, sex, e-mail address, and education level; for an enterprise user it may include enterprise name, organization code, type of business, and contact address.
(2) Information obtained from a third party.
The third party may be a website other than the one where the account is registered, or a governmental or non-governmental institution such as an industry and commerce administration, a civil affairs bureau, or a judicial authority. The information obtained from the third party may be other user-related information retrieved from the third party on the basis of the registration information. For example, after the enterprise name of an enterprise user is determined from the registration information, information such as that enterprise's registered capital and industry can be obtained from the industry and commerce administration.
(3) Information generated while the user account is in use, such as the IP address or MAC address used when the account logs in.
According to its nature, the content of the collected identifying information falls into two classes: content of critical fields and content of model fields.
A critical field is identifying information that can identify the registrant of the user account uniquely, or nearly uniquely. For a personal user, for example, the critical fields may be named ID card number, passport number, and so on; the content of the ID card number field may be the ID card number the user filled in when registering the account.
A model field is a non-critical field in the identifying information. For a personal user, for example, the model fields may be named sex, date of birth, nationality, e-mail address, and education level; the content of the date of birth field may be the date of birth the user filled in when registering the account.
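As a minimal sketch, assuming hypothetical snake_case field names that mirror the examples above, one account's identifying information could be held as two dictionaries, one for critical fields and one for model fields, with None standing for empty content:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical field names; the text gives these only as examples.
CRITICAL_FIELDS = ("id_card_number", "passport_number")
MODEL_FIELDS = ("name", "sex", "date_of_birth", "nationality", "email", "education_level")


@dataclass
class IdentifyingInfo:
    account_id: str
    # Critical fields: content that identifies the registrant (nearly) uniquely.
    critical: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in CRITICAL_FIELDS})
    # Model fields: all remaining, non-critical fields.
    model: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in MODEL_FIELDS})

    def filled_critical_fields(self) -> Dict[str, str]:
        """Return only the critical fields whose content is not empty."""
        return {name: value for name, value in self.critical.items() if value}
```

Every account record carries the full set of preset fields, so records can be compared field by field even when much of the content is missing.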
It should be noted that the collected identifying information does not necessarily contain every preset critical field or model field. That is, if the preset critical fields and model fields are arranged as a data table, then when each collected piece of identifying information is filled into the data table shown in Table 1, some critical fields or model fields may have content while others remain empty.
For example, Table 1 shows the identifying information of user account A in the preset data-table format, which contains the preset critical fields and model fields. If the identifying information collected for user account A contains the user's name, ID card number, sex, and nationality, then in Table 1 the critical field named ID card number has content while the critical field named passport number is empty; the model fields named name, sex, and nationality have content, while the model fields named date of birth and e-mail address are empty.
Table 1
It should be noted that, because the identifying information may include all three kinds of information above (registration information, information obtained from a third party, and information generated while the account is in use), and these three kinds are collected through different channels, the same-named field of the same user account may end up with different contents. Continuing the example of Table 1: if the registration information of user account A gives China as the content of the nationality model field, while the IP address used when user account A logs in indicates Japan, then the same-named field of the same account has different contents. The present embodiment handles this situation in, but is not limited to, the following two ways:
The first way: weights are assigned to the identifying information according to its source, that is, to the registration information, the information obtained from a third party, and the information generated while the account is in use. When the same-named field of the same account has different contents, the weight of the source of each content is determined and the content provided by the higher-weighted source is retained. For example, suppose the registration information has the highest weight, the information obtained from a third party the next highest, and the information generated while the account is in use the lowest. If the registration information of user account A gives China as the nationality while the IP address used at login indicates Japan, the nationality from the registration information is retained and the nationality inferred from the login IP address is discarded.
The second way: all the different contents that different sources provide for the same field are retained as contents of that field. For example, if the registration information of user account A gives China as the nationality while the IP address used at login indicates Japan, both China and Japan can be filled into the nationality model field of Table 1.
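Both ways can be sketched as follows. The source labels and the numeric weights are assumptions for illustration; the text fixes only their order (registration information above third-party information above information generated at runtime).

```python
from typing import List, Tuple

# Hypothetical source labels and weights; only their ordering comes from the text.
SOURCE_WEIGHTS = {"registration": 3, "third_party": 2, "runtime": 1}


def resolve_by_weight(candidates: List[Tuple[str, str]]) -> str:
    """First way: keep the content from the highest-weighted source.

    candidates holds (source, content) pairs collected for one field of one account.
    """
    _, content = max(candidates, key=lambda pair: SOURCE_WEIGHTS[pair[0]])
    return content


def resolve_keep_all(candidates: List[Tuple[str, str]]) -> List[str]:
    """Second way: keep every distinct content, in order of first appearance."""
    kept: List[str] = []
    for _, content in candidates:
        if content not in kept:
            kept.append(content)
    return kept
```

For the nationality example, resolve_by_weight keeps "China" whichever order the two values arrive in, while resolve_keep_all keeps both values.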
After the identifying information of each user account is collected, it can be stored in the server's database in the data-table format shown in Table 1. Preferably, an identifier can be allocated to each stored piece of identifying information. The identifier can be used not only to query the identifying information of each user account from the database but also to indicate whether the account was registered by a personal user or an enterprise user. For example, the identifier allocated to the identifying information of personal user account A in Table 1 may be A_001, where A indicates the identifying information of a personal user account and 001 is its sequence number among personal user accounts. Similarly, the identifier allocated to the identifying information of an enterprise user account may be B_001, where B indicates an enterprise user account and 001 is its sequence number among enterprise user accounts.
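The identifier scheme above can be sketched with one counter per prefix. The class and function names are hypothetical, and the three-digit zero padding simply follows the A_001 / B_001 examples.

```python
from collections import defaultdict


class IdAllocator:
    """Allocates identifiers such as A_001 (personal) or B_001 (enterprise)."""

    PREFIX = {"personal": "A", "enterprise": "B"}

    def __init__(self) -> None:
        self._counters = defaultdict(int)  # next sequence number per prefix

    def allocate(self, account_type: str) -> str:
        prefix = self.PREFIX[account_type]
        self._counters[prefix] += 1
        return f"{prefix}_{self._counters[prefix]:03d}"


def account_type_of(identifier: str) -> str:
    """Recover the account type from the identifier's prefix."""
    prefix = identifier.split("_", 1)[0]
    return {v: k for k, v in IdAllocator.PREFIX.items()}[prefix]
```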
In addition, so that the collected identifying information can be conveniently stored in the database and analyzed, each critical field and model field must have a consistent meaning across all collected information, whether it is registration information, information obtained from a third party, or information generated while the account is in use. For example, the content of the model field named nationality in Table 1 should represent the user's nationality regardless of where the identifying information was collected from.
Step 102: for the identifying information of any two user accounts that have not yet been identified, the server determines whether the content of at least one critical field is identical between them. If so, step 105 is performed; otherwise, step 103 is performed.
Preferably, if in step 101 the identifying information of the user accounts is stored in the server's database as a data table, and each stored piece of identifying information is allocated an identifier that distinguishes personal user accounts from enterprise user accounts, then step 102 can, according to the identifiers, be performed separately on pairs of personal user accounts and on pairs of enterprise user accounts.
Because the content of a critical field can identify the registrant of a user account uniquely or nearly uniquely, if at least one critical field has identical content in two pieces of identifying information, the two corresponding user accounts can be considered to have been registered by the same user.
Suppose the content of the ID card number critical field in the identifying information of user account A is identical to the content of the same critical field in the identifying information of user account B. Then, even if the contents of the other critical fields or model fields in the two pieces of identifying information are unrelated, user accounts A and B can be considered to have been registered by the same user.
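A minimal sketch of the step 102 check, assuming critical fields are passed as dictionaries of field name to content. Treating empty content as never matching is an assumption of this sketch, so that two accounts that both lack a passport number are not judged to be the same user on that basis.

```python
from typing import Dict, Optional


def share_critical_field(critical_a: Dict[str, Optional[str]],
                         critical_b: Dict[str, Optional[str]]) -> bool:
    """True if at least one critical field has identical, non-empty content."""
    for name, content in critical_a.items():
        if content and critical_b.get(name) == content:
            return True
    return False
```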
Step 102 is not limited to pairwise comparison of identifying information already stored in the database.
Step 103: the server determines whether the content similarity of the model fields in the identifying information of the two user accounts meets a preset condition. If so, step 105 is performed; otherwise, step 104 is performed.
Specifically, the server can determine the content similarity of the model fields in the identifying information of the two user accounts through the following steps:
Step one: determine the weight of each model field in the identifying information of the two user accounts.
The weight of a model field can be determined according to how important the field is for indicating the user's identity: the more important a model field is for indicating identity, the higher the weight assigned to it.
Preferably, in the present embodiment the weight of each model field can be determined by a modeling algorithm, specifically the Analytic Hierarchy Process (AHP): the weights are obtained by building a hierarchical model, constructing pairwise comparison matrices, calculating the weight vector, and performing a consistency check. Determining the weights with such an analytical model largely avoids the errors introduced by setting model-field weights subjectively, so the resulting weights are more accurate.
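The AHP weighting step could be sketched as below. This is a generic textbook AHP computation, not the application's own model: it approximates the principal eigenvector of a single pairwise comparison matrix by power iteration and computes Saaty's consistency ratio; building the hierarchy itself is omitted.

```python
from typing import List, Tuple

# Saaty's random consistency index RI for matrix orders 1 to 10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix: List[List[float]], iterations: int = 100) -> Tuple[List[float], float]:
    """Return (weights, consistency_ratio) for a reciprocal pairwise comparison matrix.

    matrix[i][j] states how much more important model field i is than field j
    for indicating identity, with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    if not 1 <= n <= len(RANDOM_INDEX):
        raise ValueError("this sketch supports 1 to 10 fields")

    def multiply(vector: List[float]) -> List[float]:
        return [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]

    # Power iteration: repeated multiplication converges to the principal eigenvector.
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = multiply(weights)
        total = sum(product)
        weights = [x / total for x in product]

    # Estimate the principal eigenvalue lambda_max from A*w = lambda*w.
    product = multiply(weights)
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    if n <= 2:
        return weights, 0.0  # matrices of order 1 or 2 are always consistent
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n - 1]
```

By the usual AHP convention, a consistency ratio below 0.1 is accepted; a larger value suggests the pairwise judgments should be revised before the weights are used.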
Step two: perform a similarity operation on the contents of each pair of same-named model fields in the identifying information of the two user accounts, and compute the weighted sum of the operation results using the weights of the corresponding model fields to obtain the content similarity of the model fields.
The similarity algorithm used in the present embodiment may be any currently available algorithm. For example, if the contents of two model fields are identical, their similarity is determined to be 1, and if not, 0; alternatively, a Hamming distance algorithm or the like can be used.
After the similarities between the contents of the several same-named model fields are determined, the operation results are weighted and summed with the weights of the corresponding model fields. For example, suppose the content similarity is evaluated on the four model fields nationality, e-mail address, sex, and date of birth, whose weights are a, b, c, and d respectively. If the similarity operations on the contents of these four model fields in the identifying information of the two user accounts yield X1, X2, X3, and X4, the weighted sum is aX1 + bX2 + cX3 + dX4, which is the overall content similarity of the model fields in the identifying information of the two user accounts.
The preset condition can be set according to the required precision of the similarity judgment: if high precision is required, the preset condition can be a larger threshold; if lower precision is acceptable, it can be a smaller threshold.
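Steps one and two, together with the threshold test, could be sketched as follows. The exact-match and Hamming similarities are the two examples named above; returning 0 for strings of unequal length (where Hamming distance is undefined) and for empty content are assumptions of this sketch, as are the function names.

```python
from typing import Dict, Optional


def exact_similarity(a: Optional[str], b: Optional[str]) -> float:
    """1 if both contents are non-empty and identical, otherwise 0."""
    return 1.0 if a and a == b else 0.0


def hamming_similarity(a: Optional[str], b: Optional[str]) -> float:
    """1 - Hamming distance / length for two non-empty strings of equal length.

    Hamming distance is undefined for strings of different length; this sketch
    returns 0 in that case.
    """
    if not a or not b or len(a) != len(b):
        return 0.0
    distance = sum(x != y for x, y in zip(a, b))
    return 1.0 - distance / len(a)


def model_similarity(model_a: Dict[str, Optional[str]],
                     model_b: Dict[str, Optional[str]],
                     field_weights: Dict[str, float],
                     similarity=exact_similarity) -> float:
    """Weighted sum a*X1 + b*X2 + ... over the same-named model fields."""
    return sum(weight * similarity(model_a.get(name), model_b.get(name))
               for name, weight in field_weights.items())


def meets_condition(score: float, threshold: float) -> bool:
    """Preset condition: the overall similarity reaches the chosen threshold."""
    return score >= threshold
```

With hypothetical weights of 0.4, 0.3, 0.1, and 0.2 for nationality, e-mail address, sex, and date of birth, two accounts that agree only on nationality and sex score 0.5; a threshold of 0.45 accepts them as the same user, while a stricter 0.6 does not.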
Preferably, when calculating the overall content similarity of the model fields in step 103, the reliability of the source of the identifying information can also be taken into account, and information from a more reliable source can be assigned a higher weight. The overall content similarity of the model fields is then calculated from the source weights as follows:
Perform a similarity operation on the contents of each pair of same-named model fields in the identifying information of the two user accounts, determine the weight of the information to which the compared contents belong, and compute the weighted sum of the operation results using those information weights to obtain the content similarity of the model fields.
For example, suppose the content similarity is evaluated on the four model fields nationality, e-mail address, sex, and date of birth, where nationality, sex, and date of birth are obtained from the registration information and the e-mail address is obtained from a third party. Let the weight of the registration information be A and the weight of information obtained from a third party be B. If the similarity operations on the contents of these four model fields in the identifying information of the two user accounts yield X1, X2, X3, and X4, the weighted sum is AX1 + BX2 + AX3 + AX4, which is the overall content similarity of the model fields in the identifying information of the two user accounts.
If both the model-field weights, which reflect each field's importance for indicating the user's identity, and the information-source weights are taken into account, then after the similarity operation on each pair of same-named model fields, each operation result is multiplied by both the weight of the corresponding model field and the weight of its information source, and the products are summed to obtain the content similarity of the model fields.
Again taking the four model fields nationality, e-mail address, sex, and date of birth as an example, let the model-field weights be a, b, c, and d. Nationality, sex, and date of birth are obtained from the registration information, whose weight is A, and the e-mail address is obtained from a third party, whose weight is B. If the similarity operations on the contents of these four model fields in the identifying information of the two user accounts yield X1, X2, X3, and X4, the weighted sum is AaX1 + BbX2 + AcX3 + AdX4, which is the overall content similarity of the model fields in the identifying information of the two user accounts.
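Both source-weighted forms can be sketched with a single function. As in the examples, each model field is assumed to come from one known source for both accounts (passed as a hypothetical field_source mapping); exact-match similarity stands in for whatever similarity operation is actually chosen.

```python
from typing import Dict, Optional


def source_weighted_similarity(model_a: Dict[str, Optional[str]],
                               model_b: Dict[str, Optional[str]],
                               field_source: Dict[str, str],
                               source_weights: Dict[str, float],
                               field_weights: Optional[Dict[str, float]] = None) -> float:
    """Sum of source_weight * field_weight * X_i over the compared model fields.

    field_source maps each model field to the kind of information it came from.
    With field_weights=None every field weight is 1, which gives the source-only
    form A*X1 + B*X2 + ...; otherwise it gives A*a*X1 + B*b*X2 + ...
    """
    total = 0.0
    for name, source in field_source.items():
        a, b = model_a.get(name), model_b.get(name)
        x = 1.0 if a and a == b else 0.0  # exact-match similarity X_i
        f = 1.0 if field_weights is None else field_weights[name]
        total += source_weights[source] * f * x
    return total
```

For instance, with hypothetical weights A = 2 and B = 1 and two accounts that differ only in sex, the source-only form gives 2 + 1 + 0 + 2 = 5.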
Step 104: determine that the two user accounts were registered by different users, and go to step 106.
Preferably, to ensure the reliability of the determination, the result of step 104 can additionally be re-checked by manual intervention. In the present embodiment, without limitation, if the result of the manual re-check differs from the determination of step 104, the manual result prevails.
Step 105: determine that the two user accounts were registered by the same user, and go to step 106.
As with step 104, the determination of step 105 can also be re-checked by manual intervention.
In the solution of this embodiment, if manual intervention is performed after step 104 or step 105, the process and result of the manual intervention can be recorded in a log.
Step 106: after the determination of whether the two user accounts were registered by the same user is obtained, determine whether any pair of user accounts remains unidentified. If so, go to step 102; otherwise, end the user account identification process.
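The loop over steps 102 to 106 could be sketched as below. The critical-field check and the model-field similarity are passed in as callables (for example, functions like the earlier sketches); the function name and the result format are hypothetical, and manual re-checking is left out.

```python
from itertools import combinations
from typing import Any, Callable, Dict, Tuple


def identify_pairs(accounts: Dict[str, Any],
                   critical_match: Callable[[Any, Any], bool],
                   model_similarity: Callable[[Any, Any], float],
                   threshold: float) -> Dict[Tuple[str, str], bool]:
    """Run steps 102 to 106 over every not-yet-identified pair of accounts.

    Returns {(id_a, id_b): True if judged to be the same user, else False}.
    """
    results: Dict[Tuple[str, str], bool] = {}
    for (id_a, info_a), (id_b, info_b) in combinations(accounts.items(), 2):
        if critical_match(info_a, info_b):
            same_user = True  # step 102 -> step 105
        else:
            # step 103 -> step 105 if the condition is met, otherwise step 104
            same_user = model_similarity(info_a, info_b) >= threshold
        results[(id_a, id_b)] = same_user
    # step 106: the loop ends once no unidentified pair remains
    return results
```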
With the solution of Embodiment one, it can be identified whether multiple user accounts on the same website or on different websites were registered by the same user. The user accounts registered by the same user can be bound together, and a notification mechanism and a result query mechanism can be provided to the website servers hosting those accounts, so that each website server can centrally manage the accounts registered by the same user, effectively improving the efficiency of user account management.
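Binding could be sketched with a union-find structure. Note the assumption this introduces: if A and B are judged to be the same user, and B and C are too, then A, B, and C end up bound together; the text does not state whether binding is meant to be transitive. The class name is hypothetical.

```python
from typing import Dict, List


class AccountBinder:
    """Union-find that binds accounts judged to belong to the same user."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, account: str) -> str:
        self._parent.setdefault(account, account)
        while self._parent[account] != account:
            # Path halving keeps the trees shallow.
            self._parent[account] = self._parent[self._parent[account]]
            account = self._parent[account]
        return account

    def bind(self, a: str, b: str) -> None:
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def bound_accounts(self, account: str) -> List[str]:
        """All accounts bound to the given one, including itself."""
        root = self._find(account)
        return sorted(x for x in list(self._parent) if self._find(x) == root)
```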
Preferably, by analyzing the multiple user accounts registered by the same user, the user's business conduct can be tracked effectively and the user's business development trend analyzed comprehensively, so that useful information can be pushed accurately to any of the user's accounts. This not only reduces the network resources and management cost the website server spends pushing information to all of the user's accounts, but also makes information pushing more purposeful and targeted, helping the website operator market precisely.
Preferably, when multiple user accounts are identified as registered by the same user, they can also be divided into primary and secondary user accounts, for example according to how often the user logs in to each account: a frequently used account is a primary user account, and a less frequently used one is a secondary user account. The offline duration of each secondary user account is timed, and when it reaches a set value, that secondary user account is deregistered. This frees some of the website server's storage space and thereby reduces, to a certain extent, the operating cost of the corresponding Internet website.
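A minimal sketch of this split and of the deregistration check. Choosing exactly one primary account (the one with the most logins) and measuring offline time from the last login are assumptions of the sketch; the set value is a hypothetical max_offline parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class BoundAccount:
    account_id: str
    login_count: int      # how often the user has logged in to this account
    last_login: datetime  # start of the current offline period


def split_primary_secondary(accounts: List[BoundAccount]) -> Tuple[BoundAccount, List[BoundAccount]]:
    """Treat the most frequently used account as primary and the rest as secondary."""
    ordered = sorted(accounts, key=lambda acc: acc.login_count, reverse=True)
    return ordered[0], ordered[1:]


def accounts_to_deregister(accounts: List[BoundAccount], now: datetime,
                           max_offline: timedelta) -> List[str]:
    """Secondary accounts whose offline time has reached the set value."""
    _, secondary = split_primary_secondary(accounts)
    return [acc.account_id for acc in secondary if now - acc.last_login >= max_offline]
```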
Preferably, when multiple user accounts are identified as registered by the same user and one of them is determined to carry a certain risk, all the other bound accounts registered by that user can also be flagged as risky. In addition, if multiple user accounts are identified as having been registered by the same user on the same website within a short period, and their number is large, this can be regarded as malicious registration on the website. The solution of this embodiment therefore also helps avoid user account security risks, reduces the network security vulnerabilities such accounts introduce, and improves the security of the Internet website.
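Both checks can be sketched briefly. The text leaves "short period" and "large number" unspecified, so window and max_count are hypothetical parameters; the groups are assumed to be disjoint sets of bound accounts.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set


def propagate_risk(risky: Set[str], groups: Iterable[Set[str]]) -> Set[str]:
    """Flag every account bound to a risky one; groups are disjoint sets of bound accounts."""
    flagged = set(risky)
    for group in groups:
        if group & risky:
            flagged |= group
    return flagged


def looks_like_bulk_registration(registered_at: List[datetime], window: timedelta,
                                 max_count: int) -> bool:
    """True if more than max_count of one user's registrations on a site fall within any window."""
    times = sorted(registered_at)
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the window spans at most `window`.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > max_count:
            return True
    return False
```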
Embodiment two:
Figure 2 is a schematic structural diagram of a server based on the same inventive concept as Embodiment one of the present application. The server includes a data acquisition module 11, an identification judgment module 12, and a first similarity calculation module 13.
The data acquisition module 11 is configured to collect the identifying information of user accounts; the identifying information contains the preset critical fields and model fields.
Specifically, the identifying information collected by the data acquisition module 11 includes at least one of the following: registration information provided when the user account is registered, information obtained from a third party, and information generated while the user account is in use. A critical field in the collected identifying information is identifying information that can identify the registrant of the user account uniquely or nearly uniquely, while a model field is a non-critical field in the identifying information.
It should be noted that, so that the identifying information collected by the data acquisition module 11 can be conveniently stored in a database and analyzed, each critical field and model field must have a consistent meaning across all collected information, whether it is registration information, information obtained from a third party, or information generated while the account is in use.
The identification judgment module 12 is configured to receive the identifying information of user accounts sent by the data acquisition module 11 and, for the identifying information of any two user accounts, to determine whether the content of at least one critical field is identical. If so, the two user accounts are determined to have been registered by the same user. Otherwise, the module determines whether the content similarity of the model fields in the two accounts' identifying information meets the preset condition: if it does, the two accounts are determined to have been registered by the same user; otherwise, they are determined to have been registered by different users.
The first similarity calculation module 13 is specifically configured to determine the weight of each model field, perform a similarity operation on the contents of each pair of same-named model fields in the identifying information of the two user accounts, compute the weighted sum of the operation results using the weights of the corresponding model fields to obtain the content similarity of the model fields, and send it to the identification judgment module 12.
Specifically, the server can further include a second similarity calculation module 14.
The second similarity calculation module 14 is configured to determine the weights of the registration information, the information obtained from a third party, and the information generated while the account is in use; to perform a similarity operation on the contents of each pair of same-named model fields in the identifying information of the two user accounts; to determine the weight of the information to which the compared contents belong; to compute the weighted sum of the operation results using those information weights to obtain the content similarity of the model fields; and to send it to the identification judgment module 12.
The identification judgment module 12 is further specifically configured to determine, according to the content similarity obtained by the first similarity calculation module 13 or the second similarity calculation module 14, whether the content similarity of the model fields in the identifying information of the two user accounts meets the preset condition.
Preferably, the server can further include a data storage module 15 and a manual identification module 16.
The data storage module 15 is configured to store the identifying information of each user account collected by the data acquisition module 11. The manual identification module 16 is configured to provide an operation interface for receiving manual intervention information and to use the received information to re-check the recognition result obtained by the identification judgment module 12; if the manual re-check result is inconsistent with the determination of the identification judgment module 12, the manual result prevails.
It should be noted that the identifying information collected by the data acquisition module 11 does not necessarily contain every preset critical field or model field. That is, if the preset critical fields and model fields are stored in the data storage module 15 as a data table or in another form, then for each collected piece of identifying information, some critical fields or model fields may have content while others are empty. It should also be noted that the number of preset critical fields and model fields stored in the data storage module 15 should change dynamically according to the identifying information collected each time; preferably, the number of preset critical fields or model fields can be increased according to the collected identifying information.
It should be noted that, because the identifying information collected by the data acquisition module 11 may include three kinds of information (registration information, information obtained from a third party, and information generated while the account is in use), and these three kinds are collected through different channels, the same-named field of the same user account may have different contents. In that case, the data storage module 15 can store the identifying information of the account in, but not limited to, the following two ways:
The first way: the data storage module 15 assigns weights to the identifying information according to its source. When the same-named field of the same user account has different contents, it determines the weight of the source of each content, retains the content provided by the higher-weighted source, and discards the contents provided by lower-weighted sources.
The second way: the data storage module 15 retains all the different contents that different sources provide for the same field as contents of that field.
Preferably, the data storage module 15 can also allocate a unique identifier to each stored piece of identifying information. The identifier can be used not only to query the identifying information of each user account from the database but also to indicate whether the account was registered by a personal user or an enterprise user.
Preferably, the data storage module 15 is further configured to bind the multiple user accounts registered by the same user and to store the binding information. It can also record, in a log, the operation process and corresponding result of the manual identification module 16 re-checking the locally stored identification results.
The embodiments of the present application provide a method and device for identifying user accounts. Identifying information of multiple user accounts is collected, containing preset critical fields and model fields. When the identifying information of any two user accounts contains at least one identical critical-field content, the two accounts are determined to have been registered by the same user; otherwise, the content similarity of the model fields of the two accounts is calculated, and whether they were registered by the same user is determined from the resulting similarity. This solves the problem that it cannot be effectively identified whether multiple user accounts were registered by the same user. Furthermore, the technical solution of the present application enables centralized management of the accounts registered by the same user, effectively improves the efficiency of user account management, and allows user behavior to be analyzed and tracked effectively so that useful information can be pushed accurately to the accounts the user has registered.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.