CN110197375A - Similar-user identification method and device, similar-user identification equipment, and medium - Google Patents


Info

Publication number
CN110197375A
Authority
CN
China
Prior art keywords
user
identified
attribute
multiple dimensions
similar users
Legal status
Pending
Application number
CN201811434297.7A
Other languages
Chinese (zh)
Inventor
杨洋
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811434297.7A
Publication of CN110197375A
Legal status: pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/12 Payment architectures specially adapted for electronic shopping systems
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application belongs to the field of computer technology and discloses a similar-user identification method and device, similar-user identification equipment, and a medium. The disclosed method works as follows. Users of a specified type are analyzed in advance, and a multidimensional attribute is specified for similarity identification. The feature parameter values of each attribute in the multidimensional attribute of each user to be identified are then used to determine each user's multi-dimensional composite correlation across multiple dimensions. A preset separation threshold model produces an optimal separation threshold from the resulting multi-dimensional composite correlations. Finally, each user to be identified whose multi-dimensional composite correlation is higher than the optimal separation threshold is determined to be a similar user. Identifying similar users from each user's correlation in each specified dimension in this way improves the efficiency and accuracy of similar-user identification.

Description

Similar-user identification method and device, similar-user identification equipment, and medium
Technical field
The present application relates to the field of computer technology, and in particular to a similar-user identification method and device, similar-user identification equipment, and a medium.
Background
In merchant operations, specific target users usually need to be identified and analyzed. Such target users may share certain consumption features, or they may be undesirable users. Corresponding operation management strategies can be formulated from the identification results.
For example, illegal users usually act as organized groups. They register large numbers of fake accounts with mobile phone numbers, mailboxes and the like, and use those accounts for non-compliant transactions (e.g., money laundering). To clean up the operating environment, such illegal user groups need to be identified so that each illegal user can be blocked or penalized.
Identifying specific target users efficiently and effectively has therefore become a practical need.
Summary of the invention
Embodiments of the present application provide a similar-user identification method and device, similar-user identification equipment, and a medium. During user identification, they identify users of a similar type from the correlation of each user across multiple specified dimensions, which improves the efficiency and accuracy of user identification.
In one aspect, a similar-user identification method is provided, including:
for each user to be identified, obtaining the feature parameter value of each attribute in a specified multidimensional attribute;
for every two users to be identified, determining the single-attribute correlation of the feature parameter values of each attribute, and determining the multi-dimensional composite correlation between the two users from the single-attribute correlations between them;
obtaining an optimal separation threshold from the multi-dimensional composite correlations between every two users to be identified using a preset separation threshold model, where the separation threshold model determines the separation threshold that optimally partitions the multi-dimensional composite correlations;
determining each user to be identified whose multi-dimensional composite correlation is higher than the optimal separation threshold to be a similar user.
In one aspect, a similar-user identification device is provided, including:
an acquiring unit, configured to obtain, for each user to be identified, the feature parameter value of each attribute in the specified multidimensional attribute;
a determination unit, configured to determine, for every two users to be identified, the single-attribute correlation of the feature parameter values of each attribute, and to determine the multi-dimensional composite correlation between the two users from the single-attribute correlations between them;
an obtaining unit, configured to obtain an optimal separation threshold from the multi-dimensional composite correlations between every two users to be identified using a preset separation threshold model, the separation threshold model being used to determine the separation threshold that optimally partitions the multi-dimensional composite correlations;
a judging unit, configured to determine each user to be identified whose multi-dimensional composite correlation is higher than the optimal separation threshold to be a similar user.
In one aspect, similar-user identification equipment is provided. It includes at least one processing unit and at least one storage unit. The storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of any of the similar-user identification methods above.
In one aspect, a computer-readable medium is provided. It stores a computer program executable by similar-user identification equipment, and when the program runs on that equipment, it causes the equipment to perform the steps of any of the similar-user identification methods above.
The embodiments of the present application provide a similar-user identification method and device, similar-user identification equipment, and a medium, which work as follows. A multidimensional attribute is formulated in advance from the attributes shared by users of a specified type. The feature parameter values of each attribute in the specified multidimensional attribute of each user to be identified are used to determine that user's multi-dimensional composite correlation across multiple dimensions. An optimal separation threshold is then determined from the multi-dimensional composite correlations between users. The users to be identified are partitioned by this threshold to obtain the highly correlated similar users. Identifying similar users from each user's correlation in each specified dimension in this way improves the efficiency and accuracy of user identification.
Other features and advantages of the application are set forth in the following description. Some will become apparent from the description, and others will be understood by implementing the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present application and constitute part of it. The illustrative embodiments of the application and their description serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1a is an application scenario diagram of similar-user identification in an embodiment of the application;
Fig. 1b is an example diagram of an application login interface in an embodiment of the application;
Fig. 2 is an implementation flow chart of a similar-user identification method in an embodiment of the application;
Fig. 3 is a schematic structural diagram of a similar-user identification device in an embodiment of the application;
Fig. 4 is a schematic structural diagram of a similar-user identification device in an embodiment of the application;
Fig. 5 is a schematic structural diagram of a similar-user identification device in an embodiment of the application.
Detailed description of embodiments
During user identification, users of a similar type can be identified from the correlation of each user across multiple specified dimensions, improving the efficiency and accuracy of user identification. For this purpose, the embodiments of the present application provide a similar-user identification method and device, similar-user identification equipment, and a medium.
First, some terms used in the embodiments of the present application are explained to help those skilled in the art understand them.
1. Multidimensional attribute: the set of attributes, across multiple different dimensions, that every user of a specified type has. The set is selected from the results of feature analysis performed on users of that type.
2. Single-attribute correlation: the correlation between two users in the dimension corresponding to one attribute. It is obtained from the dimension distance between the two users' feature parameter values in that dimension and is negatively correlated with that distance.
3. Multi-dimensional composite correlation: the overall correlation between two users across multiple dimensions.
In merchant operations, specific target users usually need to be identified and analyzed. Illegal users who carry out non-compliant transactions in particular need to be analyzed and identified so that the necessary preventive measures can be taken against them, reducing non-compliant transactions. Illegal users usually pass the verification of a specific application automatically when registering or logging in. An operating platform that screens or identifies users therefore generally uses one of the following methods:
The first method is to use complex verification codes that automated programs cannot recognize.
This reduces the number of illegal users who log in by passing the application's verification automatically. However, overly complex verification codes make input inconvenient for users and significantly harm the user experience.
The second method is to perform face recognition on illegal users through a camera device.
However, face recognition is costly and may infringe on user privacy.
The third method is to screen illegal users with a blacklist (e.g., of EIC device identification codes).
However, illegal users can easily circumvent this by methods such as reflashing their devices.
Illegal users usually operate in groups, registering large numbers of fake accounts with mobile phone numbers, mailboxes and the like and using those accounts for non-compliant transactions. As a result, members of an illegal user group usually share many attributes: for example, their geographic locations are close, their verification times are short, and their user names are similar. In view of this, the embodiments of the present application provide a technical solution for similar-user identification. It identifies illegal users by exploiting two facts: the illegal users in a group share multiple attributes, and they are highly correlated with one another in different dimensions. For example, a high correlation in the spatial dimension means that geographic locations are close, and a high correlation in the time dimension means that verification times are short. Considering the attribute correlations of multiple dimensions together greatly improves identification accuracy and efficiency and reduces the probability of misidentification.
Specifically, feature analysis is performed in advance on users of the specified type, and the corresponding multidimensional attribute is formulated from the analysis results. The feature parameter values of each attribute in the multidimensional attribute of the users to be identified are used to determine the dimension distance between users in each dimension. The reciprocal of each dimension distance amplifies the users' correlation in that dimension, giving the correlation between users in each dimension. An optimal separation threshold is then determined from these per-dimension correlations and a separation threshold model. Each user to be identified whose correlation across the multiple dimensions is higher than the optimal separation threshold is determined to be a similar user.
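The overall flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The per-attribute distance rules are passed in as functions, and each attribute's correlations are the normalized reciprocal distances described for step 202. The multi-dimensional composite correlation is assumed to be the plain sum of the single-attribute correlations, because this section does not fix how they are combined. The optimal separation threshold is taken as an input, because the separation threshold model is not specified here. A pair scoring above the threshold is assumed to flag both of its users. `identify_similar_users`, `single_attribute_correlations` and the example data are hypothetical names.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
DistanceRule = Callable[[Any, Any], float]


def single_attribute_correlations(distances: Dict[Pair, float]) -> Dict[Pair, float]:
    """Normalized reciprocal dimension distance for every pair, for one attribute."""
    nonzero = [d for d in distances.values() if d > 0]
    if len(nonzero) == len(distances):
        delta = 0.0                                      # no zero distance: no offset
    else:
        delta = 1e-4 * min(nonzero) if nonzero else 1.0  # 1.0 is an assumed fallback
    inverse = {p: 1.0 / (d + delta) for p, d in distances.items()}
    total = sum(inverse.values())
    return {p: v / total for p, v in inverse.items()}


def identify_similar_users(
    users: Dict[str, Dict[str, Any]],
    distance_rules: Dict[str, DistanceRule],
    threshold: float,
) -> Set[str]:
    """Flag both users of every pair whose composite correlation exceeds threshold."""
    pairs: List[Pair] = list(combinations(sorted(users), 2))
    composite = {p: 0.0 for p in pairs}
    for attr, rule in distance_rules.items():
        dists = {(a, b): float(rule(users[a][attr], users[b][attr])) for a, b in pairs}
        for pair, corr in single_attribute_correlations(dists).items():
            composite[pair] += corr  # assumption: composite = sum over attributes
    flagged: Set[str] = set()
    for (a, b), score in composite.items():
        if score > threshold:
            flagged.update((a, b))
    return flagged


# Hypothetical example data: A and B look alike, C does not.
example_users = {
    "A": {"height": 160, "os_version": "13.1"},
    "B": {"height": 161, "os_version": "13.1"},
    "C": {"height": 185, "os_version": "12.0"},
}
example_rules: Dict[str, DistanceRule] = {
    "height": lambda x, y: abs(x - y),                  # continuous variable
    "os_version": lambda x, y: 0.0 if x == y else 1.0,  # discrete variable
}
print(identify_similar_users(example_users, example_rules, threshold=1.0))
```

With `threshold=1.0`, the sketch flags A and B, which have a similar height and the same OS version, and leaves C out. In the method itself, the threshold would come from the separation threshold model rather than being hand-picked.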
The similar-user identification method in the embodiments of the present application can be applied to the application scenario shown in Fig. 1a, which is an application scenario diagram of similar-user identification. The scenario includes similar-user identification equipment 101, the user terminal 102 of each user, and a management terminal 103.
The similar-user identification equipment 101 is a server, a server cluster composed of several servers, or a cloud computing center. The user terminal 102 and the management terminal 103 are electronic devices with network communication capability, such as smartphones, tablet computers or portable personal computers. They are connected to the similar-user identification equipment 101 through a wired or wireless network.
When similar users are to be identified, each user terminal 102 first obtains the operation information of its user to be identified in a specified application (e.g., a shopping application) and sends it to the similar-user identification equipment 101.
The operation information includes the feature parameter value of each attribute. The attributes can include the registration mobile phone number, application user name, registration mailbox, mailbox user name, login time, login interval, Internet Protocol (IP) address, login device identity (ID), operating system (OS) version, browser, SMS or image verification turnaround time, and so on.
Fig. 1b is an example diagram of an application login interface. After a user logs in to the application through the user terminal, the application sends a set of information to the similar-user identification equipment 101. This set includes the user name used to log in (e.g., nickname, mobile phone number or mailbox), the SMS verification code return time, the form-filling speed, the login time, the interval since the last login, the login IP address, the login device ID, the OS version and the system browser. It also includes information the user actively filled in and information collected by other applications.
The similar-user identification equipment 101 then performs the following steps in sequence:
receive the operation information sent by each user terminal 102, and screen the attributes it contains according to the characteristics of the target group;
determine, from the feature parameter values of the screened attributes, the dimension distance between every two users to be identified for each attribute;
determine, from these dimension distances, the single-attribute correlation of every two users on each attribute;
determine the multi-dimensional composite correlation of every two users from their single-attribute correlations;
determine the optimal separation threshold from the multi-dimensional composite correlations and the separation threshold model;
determine each user to be identified whose multi-dimensional composite correlation is higher than the optimal separation threshold to be a similar user, and send the operator's management terminal 103 a notification that a potential user group is suspected to exist.
Finally, after the operator receives through the management terminal 103 the notification sent by the similar-user identification equipment 101, the operator analyzes the similar users listed in the notification in more detail. The operator then executes the corresponding operation management strategy based on the analysis results.
Fig. 2 is an implementation flow chart of the similar-user identification method provided by the present application. Based on the application scenario shown in Fig. 1a, the method is implemented as follows:
Step 200: For each user to be identified, the similar-user identification equipment obtains, through the user terminal, the feature parameter value of each attribute in the specified multidimensional attribute.
Specifically, the similar-user identification equipment first determines the multidimensional attribute from the characteristics of the target user group of the specified type. It then receives the operation information that each user to be identified's terminal sends through the application. From that operation information, it obtains the feature parameter value of each attribute in the specified multidimensional attribute.
The specified type is the type of target user, e.g., illegal groups, illegal teachers, illegal marketing teams or illegal foreign students. The multidimensional attribute is the set of attributes, across multiple dimensions, shared by target users, as determined by researchers who analyze multiple target users in advance.
The multidimensional attribute is assembled to identify target users of the specified type. It collects the common attribute features that target users may have across multiple dimensions. Optional dimensions include, for example, geographic area, personal information, registration information and time information. Example attributes include address, age, educational background, the place where a mobile phone number was registered, mailbox features and verification time. One dimension can contain one or more attributes; for example, the personal information dimension can include height, weight, age and gender. One attribute can have a single feature parameter value or several. For example, the feature parameter value of height is the height value, whereas a coordinate is represented by a longitude and a latitude together.
For example, an illegal transaction user group usually has the following characteristics. Its members are usually close to one another in the geographic area dimension, which can be judged with the address attribute. Their verification times in the time information dimension are usually short, and the composition of their user names in the personal information dimension may be similar. For an illegal transaction user group, the formulated multidimensional attribute therefore includes the following attributes: the place where the mobile phone number was applied for, user name features, mailbox features, mailbox login time and login interval, IP address at login, device identification information, OS version, browser, and verification time.
As another example, online shoppers as target users usually have the following characteristics. They log in to shopping applications frequently, they are usually aged 20 to 40 and mostly female, and they mainly buy cosmetics and clothing. The multidimensional attribute formulated for online shoppers then includes: gender, age, frequency of logging in to shopping applications, and shopping category.
As another example, telemarketers as target users usually have the following characteristics. They call many contacts every day, their calls are short, they are aged 20 to 40, and their geographic locations are relatively fixed. The multidimensional attribute formulated for telemarketers then includes: number of calls, number of contacts, call duration, age and geographic location.
In this way, the features of target users can be analyzed in advance, and a multidimensional attribute can be set from the attributes that target users share to judge the correlation between users to be identified. Similar users are then identified in subsequent steps from the feature parameter values of the attributes in this multidimensional attribute.
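The sketch below illustrates one way per-type multidimensional attributes might be configured and applied in step 200. It maps each target type from the examples above to its attribute list and keeps only those attributes from a user's operation information. The attribute lists follow the text. The snake_case keys, `MULTIDIMENSIONAL_ATTRIBUTES` and `extract_features` are hypothetical names chosen for this sketch.

```python
from typing import Any, Dict, List

# Attribute lists follow the examples in the text; the keys are hypothetical names.
MULTIDIMENSIONAL_ATTRIBUTES: Dict[str, List[str]] = {
    "illegal_transaction_group": [
        "phone_number_application_place", "user_name_features",
        "mailbox_features", "mailbox_login_time", "login_interval",
        "login_ip_address", "device_id", "os_version", "browser",
        "verification_time",
    ],
    "online_shopper": [
        "gender", "age", "shopping_app_login_frequency", "shopping_category",
    ],
    "telemarketer": [
        "call_count", "contact_count", "call_duration", "age", "location",
    ],
}


def extract_features(operation_info: Dict[str, Any], target_type: str) -> Dict[str, Any]:
    """Keep only the feature values of the attributes configured for target_type."""
    wanted = MULTIDIMENSIONAL_ATTRIBUTES[target_type]
    missing = [attr for attr in wanted if attr not in operation_info]
    if missing:
        raise KeyError(f"operation information lacks attributes: {missing}")
    return {attr: operation_info[attr] for attr in wanted}


# Hypothetical operation information reported by one user terminal.
info = {
    "gender": "F", "age": 28, "shopping_app_login_frequency": 12,
    "shopping_category": "cosmetics", "login_ip_address": "203.0.113.7",
}
print(extract_features(info, "online_shopper"))
```

The login IP address is dropped because it is not part of the online-shopper attribute list. A missing required attribute raises an error instead of silently producing an incomplete feature vector.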
Step 201: The similar-user identification equipment determines the dimension distance between every two users to be identified for each attribute.
Specifically, the dimension distance is the relative distance between two users in the dimension corresponding to one attribute, i.e., the degree of difference between the two users in that dimension.
The embodiments of the present application use the dimension distance between any two users to be identified for any one attribute as an example.
To determine the distance between two users to be identified in the dimension corresponding to an attribute, the similar-user identification equipment combines the two users' feature parameter values for that attribute with a preset distance rule.
The distance rule can take the following forms:
First form: if the feature parameter value is a continuous variable (e.g., length, height, duration or speed), the dimension distance is determined directly from the difference between the feature parameter values.
For example, if user A is 160 cm tall and user B is 170 cm tall, the similar-user identification equipment determines that the dimension distance between user A and user B for height is 10 cm.
As another example, suppose user A's verification time when logging in to a shopping application is 2 s and user B's verification time for the same application is 5 s. The equipment then determines that the dimension distance between user A and user B for verification time is 3 s.
As another example, if user A logs in to a study website 3 times a week and user B logs in once a week, the dimension distance between user A and user B for login frequency is 2 times per week.
In this way, an accurate dimension distance is obtained directly from the difference between different users' feature parameter values for the same attribute.
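Here is a minimal sketch of the first distance rule. It assumes the dimension distance is the absolute difference, so the result does not depend on which user comes first. The function name is hypothetical.

```python
def continuous_distance(x: float, y: float) -> float:
    """Dimension distance for a continuous feature value: the absolute difference."""
    return abs(x - y)


# The three examples from the text.
print(continuous_distance(160, 170))  # height in cm → 10
print(continuous_distance(2, 5))      # verification time in s → 3
print(continuous_distance(3, 1))      # logins per week → 2
```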
Second form: if the feature parameter value is a discrete variable (e.g., mobile phone number, device version number, operating system or browser), each discrete variable is classified. If the two users' discrete variables belong to the same class, their dimension distance is 0; otherwise it is 1.
For example, if the attribute of the users to be identified is the mobile phone number, the numbers are classified by carrier and home location. If the carrier and home location of the two users' numbers are the same, their dimension distance is 0; otherwise it is 1.
As another example, suppose the attribute is the OS version number. If the two users' OS version numbers are the same, their dimension distance is 0; otherwise it is 1.
As another example, if user A uses a domestically made mobile phone and user B uses a non-domestic one, the dimension distance between user A and user B for the terminal device's place of origin is 1.
In this way, the dimension distance can be determined from classes for feature parameter values that are discrete but can be divided into classes. However, this form cannot accurately capture how much two users differ on the attribute, so the dimension distance it yields is less precise.
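The second rule can be sketched as a 0/1 distance over classes. The optional `classify` function maps a raw value to its class. The three-digit prefix table used for phone numbers is a made-up placeholder, not real carrier or home-location data, and all names here are hypothetical.

```python
from typing import Any, Callable, Hashable, Optional

# Hypothetical prefix table; a real carrier/home-location lookup would need
# an up-to-date numbering database.
HYPOTHETICAL_CARRIER_BY_PREFIX = {"134": "carrier_a", "135": "carrier_a", "130": "carrier_b"}


def carrier_of(phone_number: str) -> str:
    """Classify a phone number by its first three digits (illustration only)."""
    return HYPOTHETICAL_CARRIER_BY_PREFIX.get(phone_number[:3], "unknown")


def categorical_distance(x: Any, y: Any,
                         classify: Optional[Callable[[Any], Hashable]] = None) -> int:
    """Return 0 if both values fall into the same class, otherwise 1."""
    if classify is None:
        return 0 if x == y else 1  # compare the raw values directly
    return 0 if classify(x) == classify(y) else 1


print(categorical_distance("13400000000", "13500000000", carrier_of))  # same class → 0
print(categorical_distance("13400000000", "13000000000", carrier_of))  # different → 1
print(categorical_distance("13.1", "12.0"))                            # OS versions → 1
```

Note that two numbers with unknown prefixes both map to `"unknown"` and therefore get distance 0. A production classifier would need a policy for unclassifiable values.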
Third form: if the feature parameter value is a string variable (e.g., user name or mailbox), the dimension distance between the two users is determined with a string distance algorithm.
The string distance algorithm can be, for example, the edit distance algorithm.
For example, if user A's user name is "Air city" and user B's is "Castles in the air", the edit distance algorithm determines the dimension distance between the two user names.
In this way, the dimension distance can be determined for character-type feature parameter values.
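The text names edit distance without fixing its variant. The sketch below assumes the common Levenshtein form, in which insertions, deletions and substitutions each cost 1, computed with the standard two-row dynamic program. The function name is hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions
    and substitutions that turn s into t (dynamic programming, two rows)."""
    prev = list(range(len(t) + 1))          # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                # delete cs
                curr[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),   # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # → 3
print(edit_distance("Air city", "Castles in the air"))
```

The comparison is case-sensitive and works character by character, so it applies equally to Chinese user names. Normalizing case or whitespace first would be a separate design choice.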
Fourth form: if the feature parameter value is a sentence, the dimension distance is determined from the similarity or meaning of the sentences.
For example, user A's query is "a good book" and user B's query is "fantasy novel". Both queries are segmented and interpreted. The probability that "a good book" refers to books is 0.6, and the probability that "fantasy novel" refers to books is 0.8. Both exceed the preset probability of 0.5, so both queries are determined to refer to books, and the dimension distance between them is 1.
Fifth form: a grade is preset for each feature parameter value, and the dimension distance is the difference between the grades of the two feature parameter values.
For example, grades are preset for educational background: primary school is 1, junior middle school is 2, senior middle school is 3 and university is 4. If user A's educational background is university and user B's is senior middle school, the dimension distance between user A and user B for educational background is 1.
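The fifth rule reduces to a lookup plus an absolute difference. The grade table below copies the educational-background example from the text. `graded_distance` is a hypothetical name.

```python
from typing import Mapping

# Grades from the educational-background example in the text.
EDUCATION_GRADE: Mapping[str, int] = {
    "primary school": 1,
    "junior middle school": 2,
    "senior middle school": 3,
    "university": 4,
}


def graded_distance(x: str, y: str, grades: Mapping[str, int] = EDUCATION_GRADE) -> int:
    """Dimension distance as the absolute difference of the preset grades."""
    return abs(grades[x] - grades[y])


print(graded_distance("university", "senior middle school"))  # → 1
```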
The embodiments of the present application use continuous variables, discrete variables, string variables, sentences and grades only as examples. In practice, distance rules can be formulated for other attribute features as well, and this is not restricted here. In this way, the distance between different users in one dimension can be determined.
The embodiments describe only the dimension distance between two users in the dimension of one attribute. By the same principle, the dimension distance between any other two users for any attribute can be obtained.
Step 202: For the dimension distance of each attribute between every two users to be identified, the similar-user identification equipment determines the single-attribute correlation corresponding to each attribute.
Specifically, the similar-user identification equipment takes the reciprocal of each dimension distance as the single-attribute correlation and normalizes the single-attribute correlations to obtain normalized single-attribute correlations.
Optionally, the normalized single-attribute correlation can be determined using the following formula:
$$M_{ij} = \frac{1/(a_{ij} + \Delta)}{\sum_{p=1}^{n-1} \sum_{q=p+1}^{n} 1/(a_{pq} + \Delta)}$$
where i and j are the serial numbers of users to be identified, M_ij is the single-attribute correlation between user i and user j, n is the total number of users to be identified, a_ij is the dimension distance between user i and user j, and Δ is a parameter.
To avoid the case in which the reciprocal of a_ij cannot be calculated because a_ij is 0: if any a_ij is 0, Δ is set to a nonzero value; if no a_ij is 0, Δ = 0. Optionally, Δ can be set to 0.0001 × min a_ij, with the minimum taken over the nonzero a_ij.
For example, the dimension distance on height is 10 between user A and user B, 5 between user B and user C, and 5 between user A and user C. The normalized single-attribute correlation between user A and user B is then (1/10) / (1/10 + 1/5 + 1/5) = 0.1 / (0.1 + 0.2 + 0.2) = 0.2.
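The normalization can be sketched as follows. The sketch assumes each reciprocal distance is divided by the sum of the reciprocals over all user pairs, as the height example implies; the function name and the fallback Δ used when every distance is zero are hypothetical. Passing `normalize=False` gives the optional unnormalized variant, in which a distance of 0.001 yields a correlation of 1000.

```python
from __future__ import annotations


def single_attribute_correlations(
    distances: dict[tuple[str, str], float], normalize: bool = True
) -> dict[tuple[str, str], float]:
    """Turn the dimension distances a_ij of one attribute into correlations M_ij.

    `distances` holds one entry per unordered user pair (i, j).
    """
    nonzero = [a for a in distances.values() if a != 0]
    if len(nonzero) == len(distances):
        delta = 0.0                    # no zero distance, so no offset is needed
    elif nonzero:
        delta = 0.0001 * min(nonzero)  # 0.0001 * smallest nonzero a_ij
    else:
        delta = 0.0001                 # assumption: every distance is zero
    inverse = {pair: 1.0 / (a + delta) for pair, a in distances.items()}
    if not normalize:
        return inverse                 # unnormalized variant (less computation)
    total = sum(inverse.values())
    return {pair: v / total for pair, v in inverse.items()}


height = {("A", "B"): 10, ("B", "C"): 5, ("A", "C"): 5}
print(round(single_attribute_correlations(height)[("A", "B")], 4))  # 0.2
verify = {("X", "Y"): 0.001}
print(round(single_attribute_correlations(verify, normalize=False)[("X", "Y")], 4))  # 1000.0
```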
Optionally, when step 202 is executed, the reciprocal distances may be left unnormalized. This reduces the amount of computation and improves data-processing efficiency.
In this manner, the correlation between two users to be identified on each dimension can be determined. Clearly, the smaller the dimension distance between two users to be identified, the smaller the difference between them and the larger their single-attribute correlation; conversely, the larger the distance, the smaller the single-attribute correlation.
For example, if the dimension distance between two users on verification time is 0.001, the (unnormalized) single-attribute correlation between them is 1/0.001 = 1000. The two users are therefore highly correlated on the verification-time dimension; that is, they submitted their verification requests at almost the same time.
Step 203: The similar-user identification equipment determines the multi-dimensional comprehensive correlation of every two users to be identified according to the single-attribute correlations between them.
Specifically, when determining the multi-dimensional comprehensive correlation between every two users to be identified, the similar-user identification equipment can use the following methods for each pair:
First method: the sum of the single-attribute correlations between the two users to be identified is taken as their multi-dimensional comprehensive correlation.
For example, if the single-attribute correlations between user A and user B are 0.5 for time, 0.1 for geographic location and 0.2 for personal information, the similar-user identification equipment determines that the multi-dimensional comprehensive correlation between user A and user B is 0.5 + 0.1 + 0.2 = 0.8.
In this manner, the single-attribute correlations are added linearly to obtain the multi-dimensional comprehensive correlation.
Second method: the single-attribute correlations between the two users to be identified are combined by weighted summation to obtain their multi-dimensional comprehensive correlation.
For example, if the single-attribute correlations between user A and user B are 0.2 for time, 0.1 for geographic location and 0.5 for personal information, and the weight of each attribute is 0.5, the similar-user identification equipment determines that the multi-dimensional comprehensive correlation between user A and user B is 0.5 × (0.2 + 0.1 + 0.5) = 0.4.
With the second method, a weight is set for each single-attribute correlation according to its importance, the products of the single-attribute correlations and their weights are summed, and the sum is taken as the multi-dimensional comprehensive correlation.
Optionally, a weighted mean can also be used to obtain the multi-dimensional comprehensive correlation.
For example, the single-attribute correlations between user A and user B are: age 0.1 with weight 0.5, education 0.5 with weight 0.1, gender 1 with weight 0.1, and home location 1 with weight 0.3. The similar-user identification equipment then determines that their multi-dimensional comprehensive correlation is (0.1 × 0.5 + 0.5 × 0.1 + 1 × 0.1 + 1 × 0.3) / 4 = 0.5 / 4 = 0.125.
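The three aggregation options can be sketched as below; the function names and attribute keys are hypothetical. Note that the weighted-mean example divides the weighted sum by the number of attributes (4) rather than by the sum of the weights, and the sketch follows the example.

```python
from __future__ import annotations


def comprehensive_sum(corr: dict[str, float]) -> float:
    """First method: plain sum of the single-attribute correlations."""
    return sum(corr.values())


def comprehensive_weighted_sum(corr: dict[str, float], weights: dict[str, float]) -> float:
    """Second method: sum of each correlation times its attribute weight."""
    return sum(corr[attr] * weights[attr] for attr in corr)


def comprehensive_weighted_mean(corr: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted-mean variant: weighted sum divided by the number of attributes."""
    return comprehensive_weighted_sum(corr, weights) / len(corr)


print(round(comprehensive_sum({"time": 0.5, "location": 0.1, "personal": 0.2}), 4))  # 0.8
print(round(comprehensive_weighted_sum({"time": 0.2, "location": 0.1, "personal": 0.5},
                                       {"time": 0.5, "location": 0.5, "personal": 0.5}), 4))  # 0.4
corr = {"age": 0.1, "education": 0.5, "gender": 1.0, "home_location": 1.0}
weights = {"age": 0.5, "education": 0.1, "gender": 0.1, "home_location": 0.3}
print(round(comprehensive_weighted_mean(corr, weights), 4))  # 0.125
```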
In this manner, the overall correlation between users to be identified across all dimensions can be determined.
Step 204: According to the multi-dimensional comprehensive correlations between every two users to be identified, the similar-user identification equipment applies a preset separation-threshold model to obtain the optimal separation threshold.
In the embodiments of the present application, the users to be identified are assumed to fall into two classes, and an optimal separation threshold is determined from the difference variance between the two classes so that the users to be identified can be divided reasonably.
The optimal separation threshold can be determined with the following steps:
First, the similar-user identification equipment generates a correlation matrix from the multi-dimensional comprehensive correlations and performs the following steps for each multi-dimensional comprehensive correlation in the matrix in turn:
S1: Using this multi-dimensional comprehensive correlation as the separation threshold, divide the users to be identified that correspond to the multi-dimensional comprehensive correlations into a specified-type set and a non-specified-type set.
That is, the users to be identified are split using the separation threshold as the dividing line: each user whose multi-dimensional comprehensive correlation is higher than the separation threshold is placed in the specified-type set, and the remaining users are placed in the non-specified-type set.
For example, assume the separation threshold is 0.5, the multi-dimensional comprehensive correlation between user A and user B is 0.8, and that between user A and user C is 0.3. Users A and B, whose correlation of 0.8 is higher than the separation threshold 0.5, are placed in the specified-type set, and the remaining user C is placed in the non-specified-type set.
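Step S1 can be sketched as follows. The text does not say exactly how pairwise correlations map to individual users, so this sketch assumes a user joins the specified-type set as soon as at least one of its pairwise correlations is higher than the threshold, which reproduces the example. `partition_users` is a hypothetical name.

```python
from __future__ import annotations


def partition_users(
    pair_corr: dict[tuple[str, str], float], threshold: float
) -> tuple[set[str], set[str]]:
    """Split users into (specified-type set, non-specified-type set).

    Assumption: a user joins the specified-type set if at least one of its
    pairwise comprehensive correlations is higher than the threshold.
    """
    users = {u for pair in pair_corr for u in pair}
    specified = {u for pair, c in pair_corr.items() if c > threshold for u in pair}
    return specified, users - specified


specified, rest = partition_users({("A", "B"): 0.8, ("A", "C"): 0.3}, 0.5)
print(sorted(specified), sorted(rest))  # ['A', 'B'] ['C']
```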
S2: Determine the first proportion, i.e. the number of users to be identified in the specified-type set divided by the total number of users to be identified, and the first mean, i.e. the mean of the multi-dimensional comprehensive correlations corresponding to the users in the specified-type set. Likewise, determine the second proportion, i.e. the number of users in the non-specified-type set divided by the total number of users to be identified, and the second mean, i.e. the mean of the multi-dimensional comprehensive correlations corresponding to the users in the non-specified-type set. Then determine the difference variance from the first proportion, the second proportion and the square of the difference between the first mean and the second mean.
Specifically, the difference variance can be determined using the following formula:
$$g = w_0\, w_1\, (u_0 - u_1)^2 \tag{1}$$
where g is the difference variance, w_0 is the first proportion, w_1 is the second proportion, u_0 is the first mean and u_1 is the second mean. Formula (1) is derived from the following formulas:
$$u = w_0 u_0 + w_1 u_1 \tag{2}$$
$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 \tag{3}$$
Substituting formula (2) into formula (3), and using w_0 + w_1 = 1, yields formula (1).
Here u is the mean multi-dimensional comprehensive correlation over all users to be identified; g, w_0, w_1, u_0 and u_1 are as defined for formula (1).
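The substitution step can be written out as follows; it uses only formulas (2) and (3) and the fact that the two proportions of a two-set partition sum to one, w_0 + w_1 = 1.

```latex
\begin{aligned}
u_0 - u &= u_0 - w_0 u_0 - w_1 u_1 = (1 - w_0)\,u_0 - w_1 u_1 = w_1 (u_0 - u_1) \\
u_1 - u &= u_1 - w_0 u_0 - w_1 u_1 = (1 - w_1)\,u_1 - w_0 u_0 = w_0 (u_1 - u_0) \\
g &= w_0\, w_1^2 (u_0 - u_1)^2 + w_1\, w_0^2 (u_0 - u_1)^2 \\
  &= w_0 w_1 (w_1 + w_0)(u_0 - u_1)^2 = w_0 w_1 (u_0 - u_1)^2
\end{aligned}
```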
Next, after obtaining the difference variance for each multi-dimensional comprehensive correlation in the correlation matrix, the similar-user identification equipment determines the maximum difference variance. If the separation threshold corresponding to the maximum difference variance is higher than a preset separation threshold, that separation threshold is determined to be the optimal separation threshold.
The preset separation threshold is set manually. In practice, all users to be identified may be normal users, in which case the specified-type set obtained directly from the optimal separation threshold may also consist of normal users. The preset separation threshold is therefore set according to actual needs, and after the optimal separation threshold is obtained, it is checked against the preset separation threshold. If it is higher, the optimal separation threshold is high and similar users are likely to exist; otherwise, similar users are very unlikely to exist. Classification is therefore performed only when the optimal separation threshold is higher than the preset separation threshold, which saves tedious subsequent manual checking.
For example, if the preset separation threshold is 0.8 and the optimal separation threshold determined by the similar-user identification equipment is 0.2, the optimal separation threshold is clearly lower than the preset value of 0.8, so all users to be identified are determined to be ordinary users and no similar users exist.
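The threshold search in Step 204 closely resembles Otsu's between-class variance method. A minimal sketch under stated assumptions: each user gets a single score equal to its highest comprehensive correlation with any other user (the text does not fix how per-user values are formed), every distinct correlation value in the matrix is tried as a candidate threshold (a point-by-point search), and a split with an empty class counts as no split. All function names are hypothetical.

```python
from __future__ import annotations


def user_scores(pair_corr: dict[tuple[str, str], float]) -> dict[str, float]:
    """Assumption: score each user by its highest correlation with any other user."""
    scores: dict[str, float] = {}
    for pair, c in pair_corr.items():
        for u in pair:
            scores[u] = max(scores.get(u, c), c)
    return scores


def between_class_variance(scores: dict[str, float], threshold: float) -> float:
    """Formula (1): g = w0 * w1 * (u0 - u1)^2 for one candidate threshold."""
    high = [s for s in scores.values() if s > threshold]   # specified-type set
    low = [s for s in scores.values() if s <= threshold]   # non-specified-type set
    if not high or not low:
        return 0.0                                          # one class is empty
    n = len(scores)
    w0, w1 = len(high) / n, len(low) / n
    u0, u1 = sum(high) / len(high), sum(low) / len(low)
    return w0 * w1 * (u0 - u1) ** 2


def optimal_separation_threshold(
    pair_corr: dict[tuple[str, str], float], preset_threshold: float
) -> float | None:
    """Try every correlation value as a threshold and keep the one maximizing g."""
    scores = user_scores(pair_corr)
    best_t, best_g = None, -1.0
    for t in sorted(set(pair_corr.values())):
        g = between_class_variance(scores, t)
        if g > best_g:
            best_t, best_g = t, g
    # Accept the split only if it is non-degenerate and above the preset threshold.
    if best_t is not None and best_g > 0 and best_t > preset_threshold:
        return best_t
    return None


pairs = {
    ("A", "B"): 0.9, ("A", "C"): 0.1, ("A", "D"): 0.05,
    ("B", "C"): 0.1, ("B", "D"): 0.15, ("C", "D"): 0.2,
}
print(optimal_separation_threshold(pairs, preset_threshold=0.8))  # None
print(optimal_separation_threshold(pairs, preset_threshold=0.1))  # 0.2
```

With a preset of 0.8 the best split (0.2) is rejected, mirroring the example above; with a preset of 0.1 it is accepted. Newton's iteration, also mentioned in the text, would replace the exhaustive loop but is not shown.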
Optionally, when the maximum difference variance is determined according to formula (1), a point-by-point method or Newton's iteration method can also be used.
Dividing users by the maximum difference variance in this way yields a reasonable division into the specified-type set and the non-specified-type set, with high coverage of each user dimension, which improves the accuracy of similar-user identification.
Step 205: The similar-user identification equipment determines each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
Specifically, the similar-user identification equipment forms an optimal specified-type set from the users to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold, and determines each user in this set as a similar user.
Optionally, when the similar-user identification equipment determines that the number of similar users in the optimal specified-type set is higher than a preset quantity threshold, it determines that a potential user group may exist and sends the management terminal a warning that a potential user group is suspected.
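Step 205 and the optional quantity check can be sketched as follows, assuming each user already has a single comprehensive-correlation score (for example, its highest correlation with any other user). The function names and the warning text are hypothetical, and delivery of the warning to a management terminal is not shown.

```python
from __future__ import annotations


def similar_users(scores: dict[str, float], optimal_threshold: float) -> set[str]:
    """Step 205: users whose comprehensive correlation is above the optimal threshold."""
    return {u for u, s in scores.items() if s > optimal_threshold}


def suspected_group_warning(similar: set, quantity_threshold: int) -> str | None:
    """Return a warning text when the number of similar users exceeds the preset quantity."""
    if len(similar) > quantity_threshold:
        return f"Suspected potential user group: {len(similar)} similar users"
    return None


found = similar_users({"A": 0.9, "B": 0.9, "C": 0.2, "D": 0.2}, optimal_threshold=0.2)
print(sorted(found))                       # ['A', 'B']
print(suspected_group_warning(found, 1))   # Suspected potential user group: 2 similar users
print(suspected_group_warning(found, 20))  # None
```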
Optionally, the similar-user identification equipment can also display each user whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold using highlighting or a different color.
In this manner, each user to be identified can be examined, and when the recognition result indicates a high probability that target users exist, a notification is sent to the user, who can then perform manual analysis and judgment based on the prompt. This improves the efficiency and accuracy of similar-user identification and is convenient for the user. In addition, a feature tag set can be assigned to each user according to the recognition result to build a user profile (portrait) for that user.
The above embodiment is further described below using a specific application scenario.
Assume the potential user group is a gang of users conducting illegal transactions. After researchers analyze the attributes such a gang has, the attributes included in the multi-dimensional attribute are set as: registered mobile number, user name used, registration e-mail address, e-mail user name, login time, login time interval, login IP address, login device ID, operating-system version, browser, and SMS or picture verification turnaround time.
Then, when identifying users according to the multi-dimensional attribute, the similar-user identification equipment obtains the feature parameter value of each of the above attributes for each user to be identified, determines the dimension distance of each attribute between every two users to be identified, and determines the corresponding single-attribute correlation from the dimension distance of each attribute.
Next, the similar-user identification equipment determines the multi-dimensional comprehensive correlation of every two users to be identified from the single-attribute correlations between them.
Finally, the similar-user identification equipment determines the optimal separation threshold from the multi-dimensional comprehensive correlations and divides the users to be identified according to it, obtaining a set of illegal-transaction users and a set of normal users. Further, the similar-user identification equipment sends the network operator a warning that an illegal-transaction gang is suspected, so that the network operator can perform further manual analysis based on the warning and formulate corresponding maintenance measures.
The above embodiment is further described below using another specific application scenario.
Assume the specified-type set is a merchant set. The attributes that merchants of the specified type have are analyzed, and based on the analysis the attributes included in the multi-dimensional attribute for merchants are set as: application usage duration, application update interval, age, and education.
When identifying each user, the similar-user identification equipment determines the dimension distance of each attribute between every two users, determines the corresponding single-attribute correlation from the dimension distance of each attribute, and determines the multi-dimensional comprehensive correlation of every two users from the single-attribute correlations between them.
Further, the optimal separation threshold is determined from the multi-dimensional comprehensive correlations, and the users are divided according to the optimal separation threshold to obtain the merchant set.
Finally, the similar-user identification equipment determines that the number of users in the merchant set, 100, is higher than the preset quantity threshold of 20, and sends the management terminal a notification that a target merchant set is suspected. Administrators receive the notification through the management terminal and, based on it, manually analyze each user in the determined merchant set further. After determining that the users in the merchant set belong to the target group, they send corresponding promotional messages to each user in the merchant set.
Based on the same inventive concept, an embodiment of the present application further provides a similar-user identification device. Because the principle by which the device solves the problem is similar to that of the similar-user identification method, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.
Fig. 3 is a schematic structural diagram of a similar-user identification device according to an embodiment of the present application. The device includes:
Acquiring unit 30, configured to obtain, for each user to be identified, the feature parameter value of each attribute in the specified multi-dimensional attribute;
Determination unit 31, configured to determine, for every two users to be identified, the single-attribute correlation of the feature parameter value of each attribute, and to determine the multi-dimensional comprehensive correlation between every two users to be identified according to the single-attribute correlations between them;
Obtaining unit 32, configured to obtain the optimal separation threshold according to the multi-dimensional comprehensive correlations between every two users to be identified, using a preset separation-threshold model, where the separation-threshold model is used to determine the separation threshold that optimally divides the multi-dimensional comprehensive correlations;
Judging unit 33, configured to determine each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
Preferably, determination unit 31 is configured to:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter value of any attribute, perform the following steps:
determine, according to the feature parameter values of that attribute for the two users to be identified, the dimension distance of that attribute between the two users, where the dimension distance is the difference between the two users on the dimension corresponding to the attribute;
determine, according to the dimension distance of that attribute, the single-attribute correlation of that attribute between the two users to be identified, where the single-attribute correlation is negatively correlated with the dimension distance.
Preferably, determination unit 31 is configured to:
determine the sum of the normalized single-attribute correlations between every two users to be identified as the multi-dimensional comprehensive correlation between those two users; or
normalize the single-attribute correlations between every two users to be identified, compute their weighted sum, and determine that weighted sum as the multi-dimensional comprehensive correlation between those two users.
Preferably, obtaining unit 32 is configured to:
perform the following steps for each multi-dimensional comprehensive correlation in turn: using the multi-dimensional comprehensive correlation as the separation threshold, divide the users to be identified that correspond to the multi-dimensional comprehensive correlations into a specified-type set and a non-specified-type set; determine the first proportion of the users in the specified-type set among all users to be identified and the first mean of the multi-dimensional comprehensive correlations of the users in the specified-type set, and determine the second proportion of the users in the non-specified-type set among all users to be identified and the second mean of the multi-dimensional comprehensive correlations of the users in the non-specified-type set; and determine the difference variance from the first proportion, the second proportion and the square of the difference between the first mean and the second mean;
determine the maximum of the obtained difference variances;
if the separation threshold corresponding to the maximum difference variance is higher than the preset separation threshold, determine that separation threshold as the optimal separation threshold.
Preferably, judging unit 33 is further configured to:
when the number of similar users is determined to be higher than the preset quantity threshold, send the user a warning that a potential user group is suspected.
In the similar-user identification method, device, similar-user identification equipment and medium provided by the embodiments of the present application, a multi-dimensional attribute can be formulated in advance from the attributes shared by users of a specified type. According to the feature parameter value of each attribute in the specified multi-dimensional attribute of each user to be identified, the multi-dimensional comprehensive correlation of the users to be identified across multiple dimensions is determined. The optimal separation threshold is then determined from the multi-dimensional comprehensive correlations between users, and the users to be identified are divided according to it to obtain highly correlated similar users. Identifying similar users from the correlation of the users to be identified on each specified dimension in this way improves the efficiency and accuracy of user identification.
Fig. 4 is a schematic structural diagram of a similar-user identification equipment. Based on the same technical concept, an embodiment of the present application further provides a similar-user identification equipment 400, which is configured to implement the methods described in the above method embodiments. The similar-user identification equipment 400 includes a processor 410, a memory 420, a power supply 430, a display unit 440 and an input unit 450.
The processor 410 is the control center of the similar-user identification equipment 400. It connects the various components through various interfaces and lines, and performs the various functions of the similar-user identification equipment 400 by running or executing the software programs and/or data stored in the memory 420, thereby performing overall monitoring of the server.
Optionally, the processor 410 may include one or more processing units. Preferably, the processor 410 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 410. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, various application programs and the like; the data storage area may store data created through use of the similar-user identification equipment 400. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The similar-user identification equipment 400 further includes the power supply 430 (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 410 through a power management system, so that functions such as charging, discharging and power-consumption management are implemented through the power management system.
The display unit 440 may be used to display information entered by the user or provided to the user, as well as the various menus of the similar-user identification equipment 400. In the embodiments of the present application, it is mainly used to display the display interface of each application program in the similar-user identification equipment 400 and entities such as text and pictures shown in the display interface. The display unit 440 may include a display panel 441, which may be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display or the like.
The input unit 450 may be used to receive information such as numbers or characters entered by the user. The input unit 450 may include a touch panel 451 and other input devices 452. The touch panel 451, also called a touch screen, can collect touch operations by the user on or near it (for example, operations by the user on or near the touch panel 451 using a finger, stylus or any other suitable object or accessory).
Specifically, the touch panel 451 can detect the user's touch operation and the signal it produces, convert the signal into contact coordinates, send them to the processor 410, and receive and execute commands sent by the processor 410. In addition, the touch panel 451 can be implemented as a resistive, capacitive, infrared or surface-acoustic-wave panel, among other types. The other input devices 452 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse and a joystick.
Of course, the touch panel 451 may cover the display panel 441. When the touch panel 451 detects a touch operation on or near it, it sends the operation to the processor 410 to determine the type of touch event, and the processor 410 then provides corresponding visual output on the display panel 441 according to the type of touch event. Although in Fig. 4 the touch panel 451 and the display panel 441 are two independent components implementing the input and output functions of the similar-user identification equipment 400, in some embodiments they may be integrated to implement these input and output functions.
The similar-user identification equipment 400 may further include one or more sensors, such as a pressure sensor, a gravity acceleration sensor and a proximity light sensor. Of course, depending on the needs of a specific application, the similar-user identification equipment 400 may also include other components such as a camera. Because these components are not the focus of the embodiments of the present application, they are not shown in Fig. 4 and are not described in detail.
Those skilled in the art will understand that Fig. 4 is merely an example of a similar-user identification equipment and does not limit it; the equipment may include more or fewer components than shown, combine certain components, or use different components.
Based on the same technical concept, an embodiment of the present application further provides a similar-user identification equipment. Referring to Fig. 5, the similar-user identification equipment 500 is configured to implement the methods described in the above method embodiments, for example the embodiment shown in Fig. 2. The similar-user identification equipment 500 may include a memory 501, a processor 502, an input unit 503 and a display panel 504.
The memory 501 is configured to store the computer program executed by the processor 502. The memory 501 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required by at least one function and the like, and the data storage area may store data created through use of the similar-user identification equipment 500. The processor 502 may be a central processing unit (CPU), a digital processing unit or the like. The input unit 503 may be used to obtain instructions entered by the user. The display panel 504 is configured to display information entered by the user or provided to the user; in the embodiments of the present application, it is mainly used to display the display interface of each application program in the similar-user identification equipment and the control entities shown in each display interface. Optionally, the display panel 504 may be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display or the like.
The embodiments of the present application do not limit the specific connection medium among the memory 501, the processor 502, the input unit 503 and the display panel 504. In Fig. 5, these components are connected by a bus 505, drawn as a thick line; the connections between other components are shown only schematically and are not limiting. The bus 505 may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is drawn in Fig. 5, but this does not mean there is only one bus or only one type of bus.
The memory 501 may be a volatile memory, such as a random-access memory (RAM); the memory 501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 501 may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 501 may also be a combination of the above memories.
The processor 502 is configured to call the computer program stored in the memory 501 to execute, for example, the embodiment shown in Fig. 2.
In some possible implementations, the aspects of the similar-user identification method provided by the present application may also be implemented in the form of a program product including program code. When the program product runs on a similar-user identification equipment, the program code causes the equipment to perform the steps of the similar-user identification method according to the various exemplary implementations of the present application described above in this specification. For example, the similar-user identification equipment may perform the following steps: S0, for each user to be identified, obtain through the user terminal the feature parameter value of each attribute in the specified multi-dimensional attribute; S1, determine the dimension distance of each attribute between every two users to be identified; S2, for the dimension distance of each attribute between every two users to be identified, determine the single-attribute correlation corresponding to each attribute; S3, determine the multi-dimensional comprehensive correlation of every two users to be identified from the single-attribute correlations between them; S4, according to the multi-dimensional comprehensive correlations between every two users to be identified, obtain the optimal separation threshold using a preset separation-threshold model; S5, determine each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable medium storing a computer program executable by a similar-user recognition device. When the program runs on the similar-user recognition device, it causes the device to execute the steps of the similar-user recognition method described above.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.

Claims (10)

1. A similar-user recognition method, characterized in that the method comprises:
for each user to be identified, obtaining the feature parameter value of each attribute in specified multi-dimensional attributes;
for each pair of users to be identified, determining the single-attribute correlation of the feature parameter values corresponding to each attribute, and determining, from the single-attribute correlations between each pair of users to be identified, the multi-dimensional comprehensive correlation between that pair;
obtaining an optimal separation threshold from the multi-dimensional comprehensive correlations between all pairs of users to be identified by using a preset separation threshold model, the separation threshold model being used to determine the separation threshold that optimally divides the multi-dimensional comprehensive correlations;
determining each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold to be a similar user.
2. The method according to claim 1, characterized in that determining, for each pair of users to be identified, the single-attribute correlation of the feature parameter values corresponding to each attribute comprises:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter values corresponding to any attribute, performing the following steps:
determining, from the feature parameter values of that attribute for the two users to be identified, the dimension distance between the two users for that attribute, the dimension distance being the difference between the two users in the dimension corresponding to the attribute;
determining, from the dimension distance for that attribute, the single-attribute correlation between the two users for that attribute, the single-attribute correlation being negatively correlated with the dimension distance.
3. The method according to claim 1, characterized in that determining, from the single-attribute correlations between each pair of users to be identified, the multi-dimensional comprehensive correlation between that pair comprises:
determining, as the multi-dimensional comprehensive correlation between each pair of users to be identified, the sum of the normalized single-attribute correlations between that pair; or
determining, as the multi-dimensional comprehensive correlation between each pair of users to be identified, the weighted sum of the normalized single-attribute correlations between that pair.
4. The method according to claim 1, characterized in that obtaining the optimal separation threshold from the multi-dimensional comprehensive correlations between all pairs of users to be identified by using the preset separation threshold model comprises:
performing the following steps for each multi-dimensional comprehensive correlation in turn: using that multi-dimensional comprehensive correlation as a separation threshold, dividing the users to be identified that correspond to the multi-dimensional comprehensive correlations into a specified-type set and a non-specified-type set; determining a first proportion, namely the share of all users to be identified that fall in the specified-type set, and a first average value of the multi-dimensional comprehensive correlations of the users to be identified in the specified-type set; determining a second proportion, namely the share of all users to be identified that fall in the non-specified-type set, and a second average value of the multi-dimensional comprehensive correlations of the users to be identified in the non-specified-type set; and determining a difference variance from the square of the difference between the first average value and the second average value, the first proportion, and the second proportion;
determining the maximum difference variance among all difference variances obtained;
if the separation threshold corresponding to the maximum difference variance is higher than a preset separation threshold, determining the separation threshold corresponding to the maximum difference variance to be the optimal separation threshold.
5. The method according to claim 4, characterized in that the method further comprises:
when the number of similar users is determined to be higher than a preset quantity threshold, issuing to the user warning information indicating that a potential user group is suspected to exist.
6. A similar-user recognition apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to obtain, for each user to be identified, the feature parameter value of each attribute in specified multi-dimensional attributes;
a determination unit, configured to determine, for each pair of users to be identified, the single-attribute correlation of the feature parameter values corresponding to each attribute, and to determine, from the single-attribute correlations between each pair of users to be identified, the multi-dimensional comprehensive correlation between that pair;
an obtaining unit, configured to obtain an optimal separation threshold from the multi-dimensional comprehensive correlations between all pairs of users to be identified by using a preset separation threshold model, the separation threshold model being used to determine the separation threshold that optimally divides the multi-dimensional comprehensive correlations;
a judging unit, configured to determine each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold to be a similar user.
7. The apparatus according to claim 6, characterized in that the determination unit is configured to:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter values corresponding to any attribute, perform the following steps:
determining, from the feature parameter values of that attribute for the two users to be identified, the dimension distance between the two users for that attribute, the dimension distance being the difference between the two users in the dimension corresponding to the attribute;
determining, from the dimension distance for that attribute, the single-attribute correlation between the two users for that attribute, the single-attribute correlation being negatively correlated with the dimension distance.
8. The apparatus according to claim 6, characterized in that the determination unit is configured to:
determine, as the multi-dimensional comprehensive correlation between each pair of users to be identified, the sum of the normalized single-attribute correlations between that pair; or
determine, as the multi-dimensional comprehensive correlation between each pair of users to be identified, the weighted sum of the normalized single-attribute correlations between that pair.
9. A similar-user recognition device, characterized by comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable medium, characterized by storing a computer program executable by a similar-user recognition device, wherein the program, when run on the similar-user recognition device, causes the device to perform the steps of the method according to any one of claims 1 to 5.
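The threshold search of claim 4 has the shape of Otsu's maximum between-class variance method, which is the subject of the non-patent citation listed below. The Python sketch that follows is illustrative only and rests on assumptions the claims leave open: each pairwise comprehensive correlation is treated as one sample (the claim speaks of proportions of users), the "specified-type set" holds correlations strictly above the candidate threshold, and the difference variance is taken as w1 * w2 * (mu1 - mu2)^2. The function names, the warning text, and returning `None` when no threshold beats the preset one are hypothetical. Step S5 is read as flagging every user who appears in at least one above-threshold pair, and the claim 5 warning is included.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]


def optimal_separation_threshold(
    correlations: Dict[Pair, float], preset_threshold: float
) -> Optional[float]:
    """Otsu-style search over the observed comprehensive correlations (claim 4)."""
    values = list(correlations.values())
    n = len(values)
    best_t, best_var = None, -1.0
    for t in values:
        # Split at candidate threshold t: "specified-type" set above t, the rest below.
        high = [x for x in values if x > t]
        low = [x for x in values if x <= t]
        if not high or not low:
            continue  # a one-sided split has no between-class variance
        w1, w2 = len(high) / n, len(low) / n          # first and second proportions
        mu1, mu2 = sum(high) / len(high), sum(low) / len(low)  # average values
        var = w1 * w2 * (mu1 - mu2) ** 2              # difference variance
        if var > best_var:
            best_t, best_var = t, var
    # Accept the maximizing threshold only if it exceeds the preset threshold.
    if best_t is not None and best_t > preset_threshold:
        return best_t
    return None


def similar_users(correlations: Dict[Pair, float], threshold: float) -> Set[str]:
    """S5 (assumed reading): users appearing in any pair above the threshold."""
    users: Set[str] = set()
    for (u, v), c in correlations.items():
        if c > threshold:
            users.update((u, v))
    return users


def group_warning(users: Set[str], quantity_threshold: int) -> Optional[str]:
    """Claim 5: warn when the number of similar users exceeds a preset quantity."""
    if len(users) > quantity_threshold:
        return f"warning: suspected potential user group of {len(users)} users"
    return None
```

With correlations `{("a", "b"): 1.9, ("a", "c"): 0.1, ("b", "c"): 0.2, ("c", "d"): 0.15}` and a preset threshold of 0.1, the split at 0.2 maximizes the difference variance, so 0.2 is returned and only users `"a"` and `"b"` are flagged as similar.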
CN201811434297.7A 2018-11-28 2018-11-28 A kind of similar users recognition methods, device, similar users identification equipment and medium Pending CN110197375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811434297.7A CN110197375A (en) 2018-11-28 2018-11-28 A kind of similar users recognition methods, device, similar users identification equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811434297.7A CN110197375A (en) 2018-11-28 2018-11-28 A kind of similar users recognition methods, device, similar users identification equipment and medium

Publications (1)

Publication Number Publication Date
CN110197375A true CN110197375A (en) 2019-09-03

Family

ID=67751413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811434297.7A Pending CN110197375A (en) 2018-11-28 2018-11-28 A kind of similar users recognition methods, device, similar users identification equipment and medium

Country Status (1)

Country Link
CN (1) CN110197375A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461186A (en) * 2020-03-20 2020-07-28 支付宝(杭州)信息技术有限公司 Data similarity processing method and device, storage medium and computer equipment
CN113672703A (en) * 2021-08-26 2021-11-19 国家电网有限公司大数据中心 User information updating method, device, equipment and storage medium
CN115130621A (en) * 2022-08-31 2022-09-30 支付宝(杭州)信息技术有限公司 Model training method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203916A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 A kind of user credit method for establishing model and device
CN107464132A (en) * 2017-07-04 2017-12-12 北京三快在线科技有限公司 A kind of similar users method for digging and device, electronic equipment
CN108174296A (en) * 2018-01-02 2018-06-15 武汉斗鱼网络科技有限公司 Malicious user recognition methods and device
CN108898505A (en) * 2018-05-28 2018-11-27 武汉斗鱼网络科技有限公司 Recognition methods, corresponding medium and the electronic equipment of cheating clique

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUSTIN_KO: "Finding a threshold with the maximum between-class variance method (OTSU)", CSDN *

Similar Documents

Publication Publication Date Title
EP3779841B1 (en) Method, apparatus and system for sending information, and computer-readable storage medium
CN106716382B (en) The method and system of aggregation multiple utility program behavioural analysis for mobile device behavior
US10713601B2 (en) Personalized contextual suggestion engine
CN104541273B (en) Infer the socially relevant property of the information about point of interest
CN107515915B (en) User identification association method based on user behavior data
CN108256568A (en) A kind of plant species identification method and device
CN106164945A (en) Sight modeling and visualization
CN106548364A (en) Method for sending information and device
CN108304758A (en) Facial features tracking method and device
CN109961296A (en) Merchant type recognition methods and device
US10534978B2 (en) Classifying and grouping electronic images
JP2015510636A (en) System and method for identifying and analyzing a user's personal context
CN106575395A (en) Entity resolution incorporating data from various data sources
CN102710770A (en) Identification method for network access equipment and implementation system for identification method
CN106663036A (en) Processing changes in multi-tenant system
CN113190757A (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
WO2021120875A1 (en) Search method and apparatus, terminal device and storage medium
CN110197375A (en) A kind of similar users recognition methods, device, similar users identification equipment and medium
CN109274639A (en) The recognition methods of open platform abnormal data access and device
JP2018077821A (en) Method, program, server device, and processor for generating predictive model of category of venue visited by user
CN102135983A (en) Group dividing method and device based on network user behavior
CN106874936A (en) Image propagates monitoring method and device
US20230104757A1 (en) Techniques for input classification and response using generative neural networks
CN111652087A (en) Car checking method and device, electronic equipment and storage medium
KR20200064148A (en) User situation detection in the messaging service environment and interaction with the messaging service based on the user situation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination