CN110197375A - Similar-user identification method and apparatus, similar-user identification device, and medium - Google Patents
- Publication number
- CN110197375A
- Authority
- CN
- China
- Prior art keywords
- user
- identified
- attribute
- various dimensions
- similar users
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/08—Payment architectures
- G06Q20/12—Payment architectures specially adapted for electronic shopping systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Computer Security & Cryptography (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application belongs to the field of computer technology. It discloses a similar-user identification method and apparatus, a similar-user identification device, and a medium.

The disclosed method includes the following:
- Performing feature analysis in advance on users of a specified type, and then specifying a multidimensional attribute set used for similarity identification.
- Determining, from the feature value of each attribute in the multidimensional attribute set of each user to be identified, the multi-dimensional comprehensive correlation of the users to be identified across multiple dimensions.
- Obtaining an optimal separation threshold from the obtained multi-dimensional comprehensive correlations by using a preset separation threshold model.
- Determining each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.

In this way, similar users can be identified from the correlation of each user to be identified in each specified dimension. This improves the efficiency and accuracy of similar-user identification.
Description
Technical field
This application relates to the field of computer technology. In particular, it relates to a similar-user identification method and apparatus, a similar-user identification device, and a medium.
Background
In merchant operations, it is usually necessary to identify and analyze specific target users. Such users may share certain consumption characteristics, or they may be undesirable users. Based on the identification result, a corresponding operation and management strategy can be formulated.
For example, illegal user gangs usually act as a group. They register a large number of fake accounts with phone numbers, mailboxes, and the like, and use these accounts to carry out non-compliant transactions (e.g., money laundering). To clean up the operating environment, such gangs need to be identified so that each illegal user can be blocked or penalized.
Therefore, there is a need to identify specific target users efficiently and effectively.
Summary
Embodiments of this application provide a similar-user identification method and apparatus, a similar-user identification device, and a medium. During user identification, they identify users of a similar type according to the correlation of each user across multiple specified dimensions. This improves the efficiency and accuracy of user identification.
In one aspect, a similar-user identification method is provided, including:
for each user to be identified, obtaining the feature value of each attribute in a specified multidimensional attribute set;
for every two users to be identified, determining the single-attribute correlation of the feature values corresponding to each attribute, and then determining the multi-dimensional comprehensive correlation between the two users from all single-attribute correlations between them;
obtaining an optimal separation threshold from the multi-dimensional comprehensive correlations between every two users to be identified by using a preset separation threshold model, where the separation threshold model determines the separation threshold that optimally divides the multi-dimensional comprehensive correlations; and
determining each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
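The four method steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation:
- All attributes are treated as numeric.
- Single-attribute correlation is the un-normalized reciprocal of the absolute difference plus a small `eps`.
- The multi-dimensional comprehensive correlation is assumed to be the plain sum of the single-attribute correlations.
- Otsu's between-class-variance criterion stands in for the preset separation threshold model, which is not specified here.

All function names are hypothetical.

```python
from itertools import combinations

def single_corr(a: float, b: float, eps: float = 1e-4) -> float:
    """Single-attribute correlation: reciprocal of the dimension distance |a - b|.
    eps (an assumption) keeps the reciprocal finite when the distance is 0."""
    return 1.0 / (abs(a - b) + eps)

def pairwise_scores(users: dict[str, dict[str, float]],
                    attrs: list[str]) -> dict[tuple[str, str], float]:
    """Comprehensive correlation for every unordered user pair,
    assumed here to be the SUM of the single-attribute correlations."""
    return {
        (u, v): sum(single_corr(users[u][k], users[v][k]) for k in attrs)
        for u, v in combinations(sorted(users), 2)
    }

def otsu_threshold(values: list[float]) -> float:
    """Stand-in separation threshold model: the cut between sorted values
    that maximises between-class variance (Otsu's criterion)."""
    xs = sorted(values)
    n = len(xs)
    best_t, best_var = xs[-1], -1.0
    for i in range(1, n):
        low, high = xs[:i], xs[i:]
        w0, w1 = i / n, (n - i) / n
        m0, m1 = sum(low) / i, sum(high) / (n - i)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, (xs[i - 1] + xs[i]) / 2
    return best_t

def similar_users(users: dict[str, dict[str, float]], attrs: list[str]) -> set[str]:
    """Users that appear in any pair whose comprehensive correlation exceeds the threshold."""
    scores = pairwise_scores(users, attrs)
    t = otsu_threshold(list(scores.values()))
    return {x for (u, v), s in scores.items() if s > t for x in (u, v)}

users = {
    "u1": {"verify_s": 2.0, "login_hour": 3.0},
    "u2": {"verify_s": 2.0, "login_hour": 3.0},
    "u3": {"verify_s": 9.0, "login_hour": 15.0},
    "u4": {"verify_s": 20.0, "login_hour": 8.0},
}
print(sorted(similar_users(users, ["verify_s", "login_hour"])))  # ['u1', 'u2']
```

Summing reciprocals lets a single near-identical attribute dominate the score. A real deployment would likely normalize or weight the attributes first.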
In one aspect, a similar-user identification apparatus is provided, including:
an acquiring unit, configured to obtain, for each user to be identified, the feature value of each attribute in a specified multidimensional attribute set;
a determining unit, configured to determine, for every two users to be identified, the single-attribute correlation of the feature values corresponding to each attribute, and to determine the multi-dimensional comprehensive correlation between the two users from all single-attribute correlations between them;
an obtaining unit, configured to obtain an optimal separation threshold from the multi-dimensional comprehensive correlations between every two users to be identified by using a preset separation threshold model, where the separation threshold model determines the separation threshold that optimally divides the multi-dimensional comprehensive correlations; and
a judging unit, configured to determine each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
In one aspect, a similar-user identification device is provided. It includes at least one processing unit and at least one storage unit. The storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of any one of the above similar-user identification methods.
In one aspect, a computer-readable medium is provided. It stores a computer program executable by a similar-user identification device. When the program runs on that device, it causes the device to perform the steps of any one of the above similar-user identification methods.
The embodiments of this application provide a similar-user identification method and apparatus, a similar-user identification device, and a medium. They work as follows:
- A multidimensional attribute set is formulated in advance from the attributes shared by users of a specified type.
- From the feature value of each attribute in each user's specified multidimensional attribute set, the multi-dimensional comprehensive correlation of the users to be identified across multiple dimensions is determined.
- An optimal separation threshold is then determined from the multi-dimensional comprehensive correlations between the users.
- The users to be identified are divided according to this threshold, yielding the highly correlated similar users.

In this way, similar users can be identified from the correlation of each user to be identified in each specified dimension. This improves the efficiency and accuracy of user identification.
Other features and advantages of this application are set forth in the following description. Some will become apparent from the description, or will be understood by implementing this application. The objectives and other advantages of this application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Description of the drawings
The drawings described herein provide a further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their description serve to explain this application and do not constitute an undue limitation on it. In the drawings:
Fig. 1a is an application scenario diagram of similar-user identification in an embodiment of this application;
Fig. 1b is an example of an application login interface in an embodiment of this application;
Fig. 2 is an implementation flowchart of a similar-user identification method in an embodiment of this application;
Fig. 3 is a schematic structural diagram of a similar-user identification apparatus in an embodiment of this application;
Fig. 4 is a schematic structural diagram of a similar-user identification apparatus in an embodiment of this application;
Fig. 5 is a schematic structural diagram of a similar-user identification apparatus in an embodiment of this application.
Detailed description of embodiments
Embodiments of this application provide a similar-user identification method and apparatus, a similar-user identification device, and a medium. Their purpose is to identify users of a similar type according to the correlation of each user across multiple specified dimensions, and thereby to improve the efficiency and accuracy of user identification.
First, some terms used in the embodiments of this application are explained to help those skilled in the art understand them.
1. Multidimensional attribute: the set of attributes, spanning multiple different dimensions, that the users of a specified type have in common. It is selected according to the results of feature analysis performed on those users.
2. Single-attribute correlation: the correlation of two users in the dimension corresponding to one attribute. It is obtained from the dimension distance between the two users' feature values in that dimension, and it is negatively correlated with that distance.
3. Multi-dimensional comprehensive correlation: the overall correlation of two users across multiple dimensions.
In merchant operations it is usually necessary to identify and analyze specific target users. Illegal users who carry out non-compliant transactions need to be analyzed and identified, so that preventive measures can be taken against them and such transactions reduced. Illegal users usually pass verification by automated means when registering with or logging in to a specific application. Operation platforms therefore typically screen or identify users in the following ways:
First way: the application sets complex verification codes that cannot be recognized automatically.
This reduces the number of illegal users who log in by passing the application's verification automatically. However, an overly complex verification code is inconvenient for legitimate users to enter, which seriously degrades the user experience.
Second way: face recognition is performed on illegal users through a camera.
However, face recognition is costly and may infringe on user privacy.
Third way: illegal users are screened by maintaining a blacklist (e.g., of device identification codes).
However, a blacklist is easily circumvented by illegal users, for example by reflashing their devices.
Illegal users usually act in teams. They register large numbers of fake accounts with phone numbers, mailboxes, and the like, and carry out non-compliant transactions through those accounts. Members of an illegal user gang therefore tend to share many attributes: their geographic locations are close, their verification times are short, their user names are similar, and so on.

Accordingly, the embodiments of this application provide a similar-user identification solution. It identifies illegal users by exploiting two properties of a gang: its members share multiple attributes, and they are highly correlated in each of several dimensions. For example, high correlation in the spatial dimension means their geographic locations are close, and high correlation in the time dimension means their verification times are short. Considering attribute correlations across multiple dimensions together improves identification accuracy and efficiency and reduces the probability of misidentification.
Specifically, the solution proceeds as follows:
- Feature analysis is performed in advance on users of a specified type, and a corresponding multidimensional attribute set is formulated from the analysis results.
- From the feature value of each attribute in the multidimensional attribute set of the users to be identified, the dimension distance between users in each dimension is determined.
- Taking the reciprocal of each dimension distance amplifies the correlation of the users in that dimension, which gives the per-dimension correlation between users to be identified.
- From these per-dimension correlations and a separation threshold model, an optimal separation threshold is determined.
- Each user to be identified whose correlation across the multiple dimensions is higher than the optimal separation threshold is determined as a similar user.
The similar-user identification method in the embodiments of this application can be applied to the scenario shown in Fig. 1a, which is an application scenario diagram of similar-user identification. The scenario includes a similar-user identification device 101, the user terminals 102 of the users, and a management terminal 103.
The similar-user identification device 101 is a server, a server cluster composed of several servers, or a cloud computing center. The user terminals 102 and the management terminal 103 are electronic devices with network communication capability, such as smartphones, tablet computers, or portable personal computers. They connect to the similar-user identification device 101 through a wired or wireless network.
To identify similar users, each user terminal 102 first obtains the user's operation information in a specified application (e.g., a shopping application) and sends it to the similar-user identification device 101.
The operation information contains the feature value of each attribute. Possible attributes include:
- the registered phone number, application user name, registered mailbox and mailbox user name;
- the login time and login time interval;
- the Internet Protocol (IP) address, the login device identity (ID), the operating system (OS) version and the browser;
- the turnaround time of SMS or picture verification.
Refer to Fig. 1b, which shows an example application login interface. After a user logs in to the application through the user terminal, the terminal sends the following information to the similar-user identification device 101:
- the user name used to log in (e.g., a nickname, phone number, or mailbox);
- the return time of the SMS verification code and the form fill-in speed;
- the login time and the interval since the last login;
- the login IP address, login device ID, OS version and system browser;
- information the user filled in voluntarily, and information collected by other applications.
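One way to hold the reported operation information is a small record type with a helper that extracts the attributes of interest. This is only a sketch: the class name, field names and units are illustrative assumptions, not a format defined in the patent.

```python
from dataclasses import dataclass, asdict

@dataclass
class OperationInfo:
    """Hypothetical shape of the operation information a user terminal reports.
    Field names and units are illustrative assumptions."""
    user_id: str
    user_name: str                # nickname, phone number or mailbox used to log in
    sms_code_return_s: float      # time taken to return the SMS verification code, seconds
    fill_in_speed: float          # form fill-in speed; unit not specified in the text
    login_time: str               # e.g. an ISO-8601 timestamp
    last_login_interval_s: float  # interval since the previous login, seconds
    login_ip: str
    device_id: str
    os_version: str
    browser: str

    def feature_values(self, attrs: list[str]) -> dict[str, object]:
        """Return the feature values of the requested attributes only, in request order."""
        data = asdict(self)
        return {a: data[a] for a in attrs}

info = OperationInfo("u1", "alice", 2.5, 1.2, "2018-11-28T10:00:00", 3600.0,
                     "192.0.2.10", "dev-001", "8.1", "ExampleBrowser")
print(info.feature_values(["login_ip", "os_version"]))
# {'login_ip': '192.0.2.10', 'os_version': '8.1'}
```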
The similar-user identification device 101 then performs the following steps in sequence:
1. It receives the operation information sent by each user terminal 102 and screens the attributes it contains according to the characteristics of the target group.
2. From the feature values of the retained attributes, it determines the dimension distance between every two users to be identified for each attribute.
3. From those dimension distances, it determines the single-attribute correlation of every two users on each attribute.
4. From all single-attribute correlations between every two users, it determines their multi-dimensional comprehensive correlation.
5. From all multi-dimensional comprehensive correlations and the separation threshold model, it determines the optimal separation threshold. Each user to be identified whose multi-dimensional comprehensive correlation is higher than this threshold is determined as a similar user.
6. It sends the operator's management terminal 103 a notification that a suspected group of potential users exists.
Finally, the operator receives the notification from the similar-user identification device 101 through the management terminal 103. The operator then analyzes the similar users included in the notification in further detail and executes a corresponding operation and management strategy based on the results.
Fig. 2 is an implementation flowchart of the similar-user identification method provided by this application. Based on the application scenario shown in Fig. 1a, the method is implemented as follows:
Step 200: For each user to be identified, the similar-user identification device obtains, through the user terminal, the feature value of each attribute in the specified multidimensional attribute set.
Specifically, the similar-user identification device first determines the multidimensional attribute set according to the characteristics of the target user group of the specified type. It then receives the operation information that each user's terminal sends through the application, and extracts from it the feature value of each attribute in the specified multidimensional attribute set.
The specified type is the type of target user, e.g., illegal groups, illegal teachers, illegal marketing teams, or illegal foreign students. The multidimensional attribute set is the set of attributes, across multiple dimensions, that researchers have determined to be shared by such target users after analyzing many of them in advance.
The multidimensional attribute set is built to identify target users of a specified type, from the common attribute features those users may share across multiple dimensions.
- Candidate dimensions include geographic area, personal information, registration information, and time information.
- Example attributes are address, age, education, the registration place of the phone number, mailbox features, and verification time.
- A dimension may contain one attribute or several. For instance, the personal-information dimension may contain height, weight, age, and gender.
- An attribute may have one feature value or be represented by several. The feature value of height is simply the height value, whereas a coordinate is represented by a longitude and a latitude together.
For example, an illegal transaction gang usually has the following characteristics:
- its members are close together in the geographic-area dimension, which can be judged by the address attribute;
- their verification times in the time-information dimension are short;
- the composition of their user names in the personal-information dimension may be similar.

Accordingly, the multidimensional attribute set formulated for illegal transaction gangs includes:
- the registration place of the phone number;
- user-name features, mailbox features, mailbox login time and login interval;
- the IP address at login, device identification information, OS version and browser;
- verification time.
As another example, if the target users are online shoppers, they usually log in to shopping applications frequently. They are usually aged 20 to 40, are mostly women, and mainly buy cosmetics and clothing. The multidimensional attribute set formulated for online shoppers then includes gender, age, shopping-application login frequency, and shopping category.
As another example, if the target users are telemarketers, they usually call many contacts per day and keep their calls short. They are aged 20 to 40, and their geographic location is relatively fixed. The multidimensional attribute set formulated for telemarketers then includes call count, number of contacts, call duration, age, and geographic location.
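The three example attribute sets can be kept as a configuration table keyed by target type, with a helper that keeps only the configured attributes of a user record. This is a minimal sketch. The profile keys and attribute names are hypothetical labels chosen for illustration.

```python
# Hypothetical attribute profiles per target-user type, mirroring the three examples above.
ATTRIBUTE_PROFILES: dict[str, tuple[str, ...]] = {
    "illegal_transaction_gang": (
        "phone_registration_place", "user_name", "mailbox", "mailbox_login_time",
        "login_interval", "login_ip", "device_id", "os_version", "browser",
        "verification_time",
    ),
    "online_shopper": ("gender", "age", "shopping_app_login_frequency", "shopping_category"),
    "telemarketer": ("call_count", "contact_count", "call_duration", "age", "geo_location"),
}

def select_features(record: dict[str, object], target_type: str) -> dict[str, object]:
    """Keep only the attributes in the target type's profile (KeyError if one is missing)."""
    return {attr: record[attr] for attr in ATTRIBUTE_PROFILES[target_type]}

record = {"gender": "F", "age": 28, "shopping_app_login_frequency": 12,
          "shopping_category": "cosmetics", "browser": "ExampleBrowser"}
print(select_features(record, "online_shopper"))
# {'gender': 'F', 'age': 28, 'shopping_app_login_frequency': 12, 'shopping_category': 'cosmetics'}
```

Raising `KeyError` on a missing attribute is a deliberate choice in this sketch. A silently incomplete feature vector would distort every pairwise correlation computed from it.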
In this way, the features of the target users are analyzed in advance. A multidimensional attribute set for judging the correlation between users to be identified is then set from the attributes those target users share. In subsequent steps, similar users are identified from the feature values of the attributes in that set.
Step 201: For each attribute, the similar-user identification device determines the dimension distance between every two users to be identified.
Specifically, the dimension distance is the relative distance between two users in the dimension corresponding to one attribute. In other words, it is the degree of difference between the two users in that dimension.
This embodiment uses, as an example, the determination of the dimension distance between any two users to be identified for any one attribute.
To determine the dimension distance between two users to be identified in the dimension corresponding to an attribute, the similar-user identification device applies a preset distance rule to the two users' feature values for that attribute.
The distance rule may take any of the following forms:
First way: if the feature value is a continuous variable (e.g., length, height, duration, or speed), the dimension distance is the absolute difference between the two feature values.
For example, if user A is 160 cm tall and user B is 170 cm tall, the similar-user identification device determines that the dimension distance between A and B for the height attribute is 10 cm.
As another example, user A takes 2 s to pass verification when logging in to a shopping application, and user B takes 5 s for the same application. The similar-user identification device then determines that the dimension distance between A and B for the verification-time attribute is 3 s.
As another example, user A logs in to a study website 3 times a week and user B logs in once a week. The dimension distance between A and B for the login-frequency attribute is then 2 times per week.
In this way, an accurate dimension distance is obtained directly from the difference between different users' feature values for the same attribute.
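The first rule reduces to an absolute difference. The sketch below reproduces the three examples; the function name is hypothetical, and units are implied by the attribute rather than tracked in code.

```python
def continuous_distance(a: float, b: float) -> float:
    """Dimension distance for a continuous attribute: absolute difference of the two values."""
    return abs(a - b)

print(continuous_distance(160, 170))  # height in cm -> 10
print(continuous_distance(2, 5))      # verification time in s -> 3
print(continuous_distance(3, 1))      # logins per week -> 2
```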
Second way: if the feature value is a discrete variable (e.g., phone number, device version number, operating system, or browser), each value is assigned to a class. If the two users' values fall into the same class, their dimension distance is 0; otherwise it is 1.
For example, if the attribute is the phone number, phone numbers are classified by the carrier they belong to. If the two users' phone numbers belong to the same carrier, the dimension distance is 0; otherwise it is 1.
As another example, if the attribute is the OS version number, the dimension distance is 0 when the two users' OS version numbers are identical and 1 otherwise.
As another example, if user A's terminal is a domestically made phone and user B's is not, the dimension distance between A and B for the terminal-origin attribute is 1.
In this way, for feature values that are discrete but can be classified, the dimension distance is determined from the classes of the values. However, this approach cannot precisely capture how much two users differ on the attribute, so the resulting dimension distance is less accurate.
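The second rule can be written as a 0/1 comparison with a pluggable classifier. This is a sketch under the following assumptions:
- The prefix-to-carrier table is a hypothetical stand-in; real carrier allocation is not modelled.
- The function names are illustrative.
- By default a value is its own class, which covers the OS-version example.

```python
from typing import Callable, Hashable

def categorical_distance(a: object, b: object,
                         classify: Callable[[object], Hashable] = lambda x: x) -> int:
    """Dimension distance for a classifiable discrete attribute:
    0 if both values fall into the same class, otherwise 1."""
    return 0 if classify(a) == classify(b) else 1

# Hypothetical prefix-to-carrier table, for illustration only.
CARRIER_BY_PREFIX = {"139": "carrier_A", "138": "carrier_A", "130": "carrier_B"}

def carrier_of(phone: str) -> str:
    # Unknown prefixes share one "unknown" class, so two unknowns get distance 0.
    return CARRIER_BY_PREFIX.get(phone[:3], "unknown")

print(categorical_distance("13900000001", "13800000002", carrier_of))  # 0 (same carrier)
print(categorical_distance("13900000001", "13000000003", carrier_of))  # 1
print(categorical_distance("10.1", "10.1"))                            # OS version -> 0
```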
Third way: if the feature value is a string (e.g., a user name or mailbox), the dimension distance between the two users is determined with a string distance algorithm.
The string distance algorithm may be, for example, the edit distance algorithm.
For example, if user A's user name is "Air city" and user B's is "castles in the air", the edit distance algorithm is used to determine the dimension distance between the two user names.
In this way, a dimension distance can be determined for string-type feature values.
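A common concrete choice for the edit distance is the Levenshtein distance. Below is a minimal two-row dynamic-programming sketch. The function name is hypothetical, and the text does not say which edit-distance variant is intended.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein edit distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))          # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                           # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute, free if equal
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

The two-row layout keeps memory at O(len(t)), which matters when many user-name pairs are compared.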
Fourth way: if the feature value is a sentence, the dimension distance is determined from the similarity or meaning of the sentences.
For example, user A enters the query "a good book" and user B enters the query "fantasy novel". Both queries are segmented into words and interpreted. The probability that "a good book" refers to books is determined to be 0.6, and the probability that "fantasy novel" does is 0.8. Both exceed the preset probability of 0.5, so both queries are determined to refer to books, and the dimension distance between them is 1.
Fifth way: a grade is assigned in advance to each feature value, and the dimension distance is the difference between the grades of the two feature values.
For example, education grades are preset as follows: primary school 1, junior high school 2, senior high school 3, university 4. If user A's education is university and user B's is senior high school, the dimension distance between A and B for education is 1.
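The fifth rule maps each value to a preset grade and takes the absolute grade difference. In this sketch the grade table encodes the education example, and the function name is hypothetical.

```python
# Illustrative grade table based on the education example.
EDUCATION_GRADE = {"primary school": 1, "junior high school": 2,
                   "senior high school": 3, "university": 4}

def grade_distance(a: str, b: str, grades: dict[str, int] = EDUCATION_GRADE) -> int:
    """Dimension distance for a graded attribute: absolute difference of the preset grades."""
    return abs(grades[a] - grades[b])

print(grade_distance("university", "senior high school"))  # 1
```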
The embodiments of this application use continuous variables, discrete variables, strings, and sentences only as examples. In practice, a distance rule can be formulated for the characteristics of each attribute, without restriction. In this way, the distance between different users in any one dimension can be determined.
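Since a distance rule is chosen per attribute, one natural structure is a registry mapping attribute names to rule functions. This is a sketch with hypothetical attribute names and only two rule kinds wired in. Unknown attributes raise an error rather than silently defaulting.

```python
from typing import Any, Callable

def abs_diff(a: float, b: float) -> float:
    return abs(a - b)

def same_class(a: Any, b: Any) -> int:
    return 0 if a == b else 1

# Hypothetical registry mapping attribute names to distance rules.
DISTANCE_RULES: dict[str, Callable[[Any, Any], float]] = {
    "height_cm": abs_diff,
    "verification_time_s": abs_diff,
    "os_version": same_class,
    "browser": same_class,
}

def dimension_distance(attr: str, a: Any, b: Any) -> float:
    """Look up the rule configured for `attr` and apply it to two feature values."""
    rule = DISTANCE_RULES.get(attr)
    if rule is None:
        raise ValueError(f"no distance rule configured for attribute {attr!r}")
    return rule(a, b)

print(dimension_distance("height_cm", 160, 170))       # 10
print(dimension_distance("os_version", "8.1", "9.0"))  # 1
```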
This embodiment describes only how the dimension distance between two users is determined in the dimension corresponding to one attribute. By the same principle, the dimension distance between any other two users for any attribute can be obtained.
Step 202: From the dimension distances of every two users to be identified for each attribute, the similar-user identification device determines the single-attribute correlation for each attribute.
Specifically, the similar-user identification device takes the reciprocal of each dimension distance as the single-attribute correlation. It then normalizes the result to obtain the normalized single-attribute correlation.
Optionally, the normalized single-attribute correlation can be computed with the following formula:
$$M_{ij} = \frac{1/(a_{ij} + \Delta)}{\sum_{1 \le k < l \le n} 1/(a_{kl} + \Delta)}$$
where:
- i and j are the serial numbers of the users to be identified;
- M_ij is the single-attribute correlation between user i and user j;
- n is the total number of users to be identified;
- a_ij is the dimension distance between user i and user j;
- Δ is a parameter.
To avoid the case where the reciprocal of a_ij cannot be computed because a_ij is 0, Δ is set to a nonzero value if any a_ij is 0, and Δ = 0 if no a_ij is 0. Optionally, Δ can be set to 0.0001 × min a_ij.
For example, in the dimension corresponding to height, the distance between user A and user B is 10, the distance between user B and user C is 5, and the distance between user A and user C is 5. The reciprocals are 0.1, 0.2, and 0.2, so the normalized single-attribute correlation between user A and user B is 0.1/(0.1 + 0.2 + 0.2) = 0.2.
Optionally, the reciprocal distances need not be normalized when step 202 is executed. This reduces the amount of computation and improves data-processing efficiency.
In this way, the correlation between two users to be identified in each dimension can be determined. Clearly, the smaller the dimension distance between two users to be identified, the smaller the difference between them and the larger their single-attribute correlation; conversely, the larger the distance, the smaller the single-attribute correlation.
For example, if the dimension distance between two users on verification time is 0.001, the single-attribute correlation between them is 1/0.001 = 1000. The two users are therefore highly correlated in the verification-time dimension; that is, they took almost the same amount of time to complete verification.
Step 203: The similar user identification equipment determines the multidimensional comprehensive correlation of every two users to be identified according to the single-attribute correlations between them.
Specifically, when determining the multidimensional comprehensive correlation between every two users to be identified, the similar user identification equipment can use one of the following ways for each pair:
First way: the sum of the single-attribute correlations between the two users to be identified is taken as the multidimensional comprehensive correlation between them.
For example, if the single-attribute correlations between user A and user B are a time correlation of 0.5, a geographic-location correlation of 0.1, and a personal-information correlation of 0.2, the similar user identification equipment determines that the multidimensional comprehensive correlation between user A and user B is 0.5 + 0.1 + 0.2 = 0.8.
In this way, the single-attribute correlations are added linearly to obtain the multidimensional comprehensive correlation.
Second way: the single-attribute correlations between the two users to be identified are summed with weights to obtain the multidimensional comprehensive correlation between them.
For example, if the single-attribute correlations between user A and user B are a time correlation of 0.2, a geographic-location correlation of 0.1, and a personal-information correlation of 0.5, and the weight of each attribute is 0.5, the similar user identification equipment determines that the multidimensional comprehensive correlation between user A and user B is 0.5 × (0.2 + 0.1 + 0.5) = 0.4.
With the second way, a weight is set for each single-attribute correlation according to its importance, the products of the single-attribute correlations and their weights are summed, and the sum is taken as the multidimensional comprehensive correlation.
Optionally, a weighted-mean method can also be used to obtain the multidimensional comprehensive correlation.
For example, suppose the single-attribute correlations between user A and user B are: age correlation 0.1 with weight 0.5, educational-background correlation 0.5 with weight 0.1, gender correlation 1 with weight 0.1, and home-location correlation 1 with weight 0.3. The similar user identification equipment then determines that their multidimensional comprehensive correlation is (0.1 × 0.5 + 0.5 × 0.1 + 1 × 0.1 + 1 × 0.3)/4 = 0.125, that is, the weighted sum divided by the number of attributes.
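The three aggregation options can be sketched as plain functions. The function names are hypothetical, and the weighted mean divides the weighted sum by the number of attributes, as the worked example does.

```python
from typing import Sequence


def sum_correlation(corrs: Sequence[float]) -> float:
    """First way: plain sum of the single-attribute correlations."""
    return sum(corrs)


def weighted_sum_correlation(corrs: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Second way: sum of each correlation multiplied by its weight."""
    if len(corrs) != len(weights):
        raise ValueError("one weight per attribute is required")
    return sum(c * w for c, w in zip(corrs, weights))


def weighted_mean_correlation(corrs: Sequence[float],
                              weights: Sequence[float]) -> float:
    """Weighted-mean variant from the example: the weighted sum divided by
    the number of attributes."""
    return weighted_sum_correlation(corrs, weights) / len(corrs)


if __name__ == "__main__":
    print(sum_correlation([0.5, 0.1, 0.2]))
    print(weighted_sum_correlation([0.2, 0.1, 0.5], [0.5, 0.5, 0.5]))
    print(weighted_mean_correlation([0.1, 0.5, 1, 1], [0.5, 0.1, 0.1, 0.3]))
```

Dividing by the attribute count follows the example; a conventional weighted mean would divide by the sum of the weights (1.0 here), which would give 0.5 instead of 0.125.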
In this way, the overall correlation between users to be identified across all dimensions can be determined.
Step 204: According to the multidimensional comprehensive correlation between every two users to be identified, the similar user identification equipment uses a preset separation threshold model to obtain the optimal separation threshold.
In the embodiments of the present application, it is assumed that the users to be identified are divided into two classes, and an optimal separation threshold is determined from the between-class variance of the two classes so that the users to be identified are divided reasonably.
The optimal separation threshold can be determined through the following steps:
First, the similar user identification equipment generates a correlation matrix from the multidimensional comprehensive correlations and performs the following steps for each multidimensional comprehensive correlation in the matrix in turn:
S1: Using that multidimensional comprehensive correlation as the separation threshold, divide the users to be identified that correspond to the multidimensional comprehensive correlations, obtaining a specified type set and a non-specified type set.
That is, the users to be identified are split with the separation threshold as the dividing line: users whose multidimensional comprehensive correlation exceeds the separation threshold are placed in the specified type set, and all remaining users are placed in the non-specified type set.
For example, assume the separation threshold is 0.5, the multidimensional comprehensive correlation between user A and user B is 0.8, and that between user A and user C is 0.3. Users A and B, whose correlation of 0.8 exceeds the threshold of 0.5, are placed in the specified type set, and the remaining user C is placed in the non-specified type set.
S2: Determine the first proportion, that is, the number of users to be identified in the specified type set divided by the total number of users to be identified, and the first average value, that is, the mean of the multidimensional comprehensive correlations corresponding to the users in the specified type set. Likewise, determine the second proportion, that is, the number of users to be identified in the non-specified type set divided by the total number of users to be identified, and the second average value, that is, the mean of the multidimensional comprehensive correlations corresponding to the users in the non-specified type set. Then determine the between-class variance from the square of the difference between the first and second average values, the first proportion, and the second proportion.
Specifically, the between-class variance can be determined with the following formula:
$$g = w_0 \cdot w_1 \cdot (u_0 - u_1)^2 \qquad (1)$$
where $g$ is the between-class variance, $w_0$ is the first proportion, $w_1$ is the second proportion, $u_0$ is the first average value, and $u_1$ is the second average value. Formula (1) is derived from the following formulas:
$$u = w_0 u_0 + w_1 u_1 \qquad (2)$$
$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 \qquad (3)$$
where $u$ is the average multidimensional comprehensive correlation of all users to be identified. Substituting formula (2) into formula (3) and using $w_0 + w_1 = 1$ yields formula (1).
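Written out, the substitution of (2) into (3) uses only $w_0 + w_1 = 1$, which holds because every user to be identified falls into exactly one of the two sets:

```latex
\begin{aligned}
u_0 - u &= u_0 - w_0 u_0 - w_1 u_1 = w_1 (u_0 - u_1) \\
u_1 - u &= u_1 - w_0 u_0 - w_1 u_1 = w_0 (u_1 - u_0) \\
g &= w_0 w_1^2 (u_0 - u_1)^2 + w_1 w_0^2 (u_0 - u_1)^2 \\
  &= w_0 w_1 (w_0 + w_1)(u_0 - u_1)^2 = w_0 w_1 (u_0 - u_1)^2
\end{aligned}
```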
Then, after obtaining the between-class variance for each multidimensional comprehensive correlation in the correlation matrix, the similar user identification equipment determines the maximum between-class variance. If the separation threshold corresponding to the maximum between-class variance is higher than the preset separation threshold, that separation threshold is determined to be the optimal separation threshold.
The preset separation threshold is set manually. This is because, in practical applications, all users to be identified may be normal users, in which case the specified type set obtained directly by dividing with the optimal separation threshold may also consist of normal users. Therefore, the preset separation threshold is set according to actual needs, and after the optimal separation threshold is obtained, it is compared with the preset separation threshold. If it is higher, the optimal separation threshold is relatively high and similar users are more likely to exist; otherwise, similar users are very unlikely to exist. To spare tedious subsequent manual review, classification is performed only when the optimal separation threshold is higher than the preset separation threshold.
For example, if the preset separation threshold is 0.8 and the optimal separation threshold determined by the similar user identification equipment is 0.2, the optimal separation threshold of 0.2 is clearly below the preset separation threshold of 0.8, so all users to be identified are determined to be ordinary users and no similar users exist.
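The search in Step 204 resembles Otsu's method: every candidate threshold is tried, and the one with the largest between-class variance is kept. The sketch below assumes each user's score is its highest multidimensional comprehensive correlation with any other user (the text does not say how a pairwise value is attributed to a single user), uses a strict > comparison for the specified type set, and keeps the lowest threshold on ties. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def user_scores(n: int, corr: Dict[Pair, float]) -> List[float]:
    """Per-user score: the highest multidimensional comprehensive correlation
    the user has with any other user (an assumption for this sketch)."""
    scores = [0.0] * n
    for (i, j), c in corr.items():
        scores[i] = max(scores[i], c)
        scores[j] = max(scores[j], c)
    return scores


def between_class_variance(scores: List[float], t: float) -> float:
    """g = w0 * w1 * (u0 - u1)^2 for the split 'score > t' versus the rest."""
    hi = [s for s in scores if s > t]   # specified type set
    lo = [s for s in scores if s <= t]  # non-specified type set
    if not hi or not lo:
        return 0.0  # one class is empty, so there is no separation
    w0, w1 = len(hi) / len(scores), len(lo) / len(scores)
    u0, u1 = sum(hi) / len(hi), sum(lo) / len(lo)
    return w0 * w1 * (u0 - u1) ** 2


def optimal_threshold(n: int, corr: Dict[Pair, float],
                      preset: float) -> Optional[float]:
    """Try every correlation value as the separation threshold, keep the one
    with the largest g, and accept it only if it exceeds the preset value."""
    scores = user_scores(n, corr)
    best_t, best_g = None, -1.0
    for t in sorted(set(corr.values())):
        g = between_class_variance(scores, t)
        if g > best_g:
            best_t, best_g = t, g
    if best_t is not None and best_t > preset:
        return best_t
    return None  # treat all users as ordinary users: no similar users


if __name__ == "__main__":
    # Users 0, 1, 2 stand for A, B, C; the B-C value is assumed.
    corr = {(0, 1): 0.8, (0, 2): 0.3, (1, 2): 0.2}
    print(optimal_threshold(3, corr, preset=0.25))
    print(optimal_threshold(3, corr, preset=0.5))
```

With the S1 values (A-B 0.8, A-C 0.3, plus an assumed B-C 0.2), the search picks 0.3, separating {A, B} from {C}; that threshold is accepted when the preset separation threshold is 0.25 and rejected when it is 0.5.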
Optionally, the maximum between-class variance given by formula (1) can also be found by a point-by-point search or by Newton's iteration method.
Dividing users by the maximum between-class variance in this way yields a reasonable split between the specified type set and the non-specified type set and gives high coverage of each user dimension, which improves the accuracy of similar user identification.
Step 205: The similar user identification equipment determines each user to be identified whose multidimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
Specifically, the similar user identification equipment forms an optimal specified type set from the users to be identified whose multidimensional comprehensive correlation is higher than the optimal separation threshold, and determines each user in the optimal specified type set as a similar user.
Optionally, when the similar user identification equipment determines that the number of similar users in the optimal specified type set is higher than a preset quantity threshold, it determines that a potential user group may exist and sends the management terminal a warning message indicating a suspected potential user group.
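Step 205 and the optional quantity check reduce to a filter and a count. This sketch assumes each user to be identified has already been reduced to a single score (for example, its highest multidimensional comprehensive correlation with any other user, which is an assumption); `similar_users` and `identify_group` are hypothetical names, and sending the warning to the management terminal is left out.

```python
from typing import List, Sequence, Tuple


def similar_users(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of users whose score is strictly above the optimal threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def identify_group(scores: Sequence[float], threshold: float,
                   count_threshold: int) -> Tuple[List[int], bool]:
    """Return the optimal specified type set and whether a warning about a
    suspected potential user group should be raised."""
    members = similar_users(scores, threshold)
    return members, len(members) > count_threshold


if __name__ == "__main__":
    members, alert = identify_group([0.8, 0.8, 0.3], threshold=0.3,
                                    count_threshold=1)
    print(members, alert)
```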
Optionally, the similar user identification equipment can also display to the user, highlighted or in different colors, the users whose multidimensional comprehensive correlation is higher than the optimal separation threshold.
In this way, each user to be identified can be examined, and when the recognition result indicates a high probability that target users exist, a notification is sent to the user, who can then perform manual analysis according to the prompt. This improves the efficiency and accuracy of similar user identification and is convenient for the user. Furthermore, a feature tag set can be set for each user according to the recognition result to determine the user's portrait (user profile).
The above embodiment is further described below using a specific application scenario:
Assume the potential user group is an illegal-transaction user gang. After researchers analyze the attributes of such gangs, the multidimensional attribute for the illegal-transaction gang is set to include the following attributes: registered mobile phone number, user name used, registered email address, email user name, login time, login time interval, login IP address, login device ID, operating system version, browser, and SMS or image verification turnaround time.
Then, when identifying users according to the multidimensional attribute, the similar user identification equipment obtains, for each user to be identified, the feature parameter value of each attribute in the multidimensional attribute, determines the dimension distance for each attribute between every two users to be identified, and determines the corresponding single-attribute correlation from each attribute's dimension distance.
Next, the similar user identification equipment determines the multidimensional comprehensive correlation of every two users to be identified from the single-attribute correlations between them.
Finally, the similar user identification equipment determines the optimal separation threshold from the multidimensional comprehensive correlations and uses it to divide the users to be identified into an illegal-transaction user set and a normal user set. Further, the similar user identification equipment sends the network operator a warning message indicating a suspected illegal-transaction gang, so that the network operator can perform further manual analysis based on the warning and formulate corresponding protective measures.
Another specific application scenario is used below to further describe the above embodiment:
Assume the specified type set is a merchant set. The attributes of users of the specified merchant type are analyzed, and based on the analysis results the multidimensional attribute for merchants is set to include: application usage duration, application update interval, age, and educational background.
When identifying each user, the similar user identification equipment determines the dimension distance for each attribute between every two users, determines the corresponding single-attribute correlation from each attribute's dimension distance, and determines the multidimensional comprehensive correlation of every two users from the single-attribute correlations between them.
Further, the optimal separation threshold is determined from the multidimensional comprehensive correlations, and the users are divided according to the optimal separation threshold to obtain the merchant set.
Finally, the similar user identification equipment determines that the number of users in the merchant set, 100, is higher than the preset quantity threshold of 20, and sends the management terminal a notification that a target merchant set is suspected. Administrators receive the notification through the management terminal, manually analyze the users in the identified merchant set further, and, after confirming that the users in the merchant set are the target group, send corresponding promotional messages to each user in the merchant set.
Based on the same inventive concept, the embodiments of the present application further provide a similar user identification device. Because the principle by which the device and the equipment solve the problem is similar to that of the similar user identification method, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.
FIG. 3 is a schematic structural diagram of a similar user identification device provided by the embodiments of the present application. The device includes:
an acquiring unit 30, configured to obtain, for each user to be identified, the feature parameter value of each attribute in the specified multidimensional attribute;
a determination unit 31, configured to determine, for every two users to be identified, the single-attribute correlation of the feature parameter values corresponding to each attribute, and to determine the multidimensional comprehensive correlation between every two users to be identified according to the single-attribute correlations between them;
an obtaining unit 32, configured to obtain the optimal separation threshold using a preset separation threshold model according to the multidimensional comprehensive correlation between every two users to be identified, where the separation threshold model is used to determine the separation threshold that optimally divides the multidimensional comprehensive correlations; and
a judging unit 33, configured to determine each user to be identified whose multidimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
Preferably, the determination unit 31 is configured to:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter values corresponding to any attribute, perform the following steps:
determine, according to the feature parameter values of the attribute for the two users to be identified, the dimension distance between the two users for that attribute, where the dimension distance is the difference between the two users in the dimension corresponding to the attribute; and
determine, according to the dimension distance corresponding to the attribute, the single-attribute correlation between the two users for that attribute, where the single-attribute correlation is negatively correlated with the dimension distance.
Preferably, the determination unit 31 is configured to:
take the sum of the normalized single-attribute correlations between every two users to be identified as the multidimensional comprehensive correlation between those two users; or
normalize and weight the single-attribute correlations between every two users to be identified, and take the resulting sum as the multidimensional comprehensive correlation between those two users.
Preferably, the obtaining unit 32 is configured to:
perform the following steps for each multidimensional comprehensive correlation in turn: use that multidimensional comprehensive correlation as the separation threshold to divide the users to be identified corresponding to the multidimensional comprehensive correlations into a specified type set and a non-specified type set; determine the first proportion of users in the specified type set among all users to be identified and the first average value of the multidimensional comprehensive correlations of the users in the specified type set, and determine the second proportion of users in the non-specified type set among all users to be identified and the second average value of the multidimensional comprehensive correlations of the users in the non-specified type set; and determine the between-class variance from the square of the difference between the first and second average values, the first proportion, and the second proportion;
determine the maximum among the obtained between-class variances; and
if the separation threshold corresponding to the maximum between-class variance is higher than the preset separation threshold, determine that separation threshold as the optimal separation threshold.
Preferably, the judging unit 33 is further configured to:
issue to the user a warning message indicating a suspected potential user group when determining that the number of similar users is higher than the preset quantity threshold.
In the similar user identification method, device, equipment, and medium provided by the embodiments of the present application, a multidimensional attribute can be formulated in advance from the attributes shared by users of a specified type. The multidimensional comprehensive correlation of the users to be identified across multiple dimensions is determined from the feature parameter value of each attribute in the specified multidimensional attribute of each user to be identified. The optimal separation threshold is then determined from the multidimensional comprehensive correlations between users, and the users to be identified are divided according to it to obtain the highly correlated similar users. In this way, similar users can be identified from the correlation of the users to be identified in each specified dimension, which improves the efficiency and accuracy of user identification.
FIG. 4 is a schematic structural diagram of a similar user identification equipment. Based on the same technical concept, the embodiments of the present application further provide a similar user identification equipment 400 for implementing the methods described in the above method embodiments. The similar user identification equipment 400 includes a processor 410, a memory 420, a power supply 430, a display unit 440, and an input unit 450.
The processor 410 is the control center of the similar user identification equipment 400. It connects the various components through various interfaces and lines, and performs the various functions of the similar user identification equipment 400 by running or executing the software programs and/or data stored in the memory 420, thereby monitoring the server as a whole.
Optionally, the processor 410 may include one or more processing units. Preferably, the processor 410 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 410. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, various applications, and the like; the data storage area may store data created through use of the similar user identification equipment 400, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The similar user identification equipment 400 further includes a power supply 430 (such as a battery) that powers the components. The power supply may be logically connected to the processor 410 through a power management system, which implements functions such as charging, discharging, and power consumption management.
The display unit 440 can be used to display information input by the user or provided to the user, as well as the various menus of the similar user identification equipment 400. In the embodiments of the present application, it is mainly used to display the display interfaces of the applications on the similar user identification equipment 400 and the text, pictures, and other entities shown in those interfaces. The display unit 440 may include a display panel 441, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The input unit 450 can be used to receive information such as numbers or characters input by the user. The input unit 450 may include a touch panel 451 and other input devices 452. The touch panel 451, also called a touch screen, collects touch operations by the user on or near it (for example, operations performed by the user on or near the touch panel 451 with a finger, a stylus, or any other suitable object or accessory).
Specifically, the touch panel 451 can detect the user's touch operation and the signal it produces, convert the signal into contact coordinates, send them to the processor 410, and receive and execute commands sent by the processor 410. In addition, the touch panel 451 can be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. The other input devices 452 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
The touch panel 451 may also cover the display panel 441. After detecting a touch operation on or near it, the touch panel 451 sends it to the processor 410 to determine the type of touch event, and the processor 410 then provides corresponding visual output on the display panel 441 according to that type. Although in FIG. 4 the touch panel 451 and the display panel 441 are two independent components implementing the input and output functions of the similar user identification equipment 400, in some embodiments the touch panel 451 and the display panel 441 may be integrated to implement those input and output functions.
The similar user identification equipment 400 may also include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, and a proximity light sensor. Depending on the needs of a specific application, the similar user identification equipment 400 may also include other components such as a camera; since these components are not the focus of the embodiments of the present application, they are not shown in FIG. 4 and are not described in detail.
Those skilled in the art will understand that FIG. 4 is merely an example of a similar user identification equipment and does not limit it; the equipment may include more or fewer components than shown, combine certain components, or use different components.
Based on the same technical concept, the embodiments of the present application further provide a similar user identification equipment. As shown in FIG. 5, the similar user identification equipment 500 is used to implement the methods described in the above method embodiments, for example the embodiment shown in FIG. 2. The similar user identification equipment 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.
The memory 501 is used to store the computer program executed by the processor 502. The memory 501 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, applications required by at least one function, and the like, and the data storage area may store data created through use of the similar user identification equipment 500, and the like. The processor 502 may be a central processing unit (CPU), a digital processing unit, or the like. The input unit 503 can be used to obtain instructions input by the user. The display panel 504 is used to display information input by the user or provided to the user; in the embodiments of the present application, it is mainly used to display the display interfaces of the applications on the similar user identification equipment and the control entities shown in each display interface. Optionally, the display panel 504 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The embodiments of the present application do not limit the specific connection medium among the memory 501, the processor 502, the input unit 503, and the display panel 504. In FIG. 5, the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected by a bus 505, indicated by a thick line; the connections between other components are only illustrative and not limiting. The bus 505 may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory 501 may be a volatile memory, such as random-access memory (RAM); a non-volatile memory, such as read-only memory, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto. The memory 501 may also be a combination of the above memories.
The processor 502 is configured to call the computer program stored in the memory 501 to execute, for example, the embodiment shown in FIG. 2.
In some possible implementations, aspects of the similar user identification method provided by the present application may also be implemented in the form of a program product including program code. When the program product runs on a similar user identification equipment, the program code causes the equipment to perform the steps of the similar user identification method according to the various exemplary implementations of the present application described above in this specification. For example, the similar user identification equipment may perform the following steps:
S0, it is directed to each user to be identified, the feature for obtaining each attribute in specified multidimensional attribute by user terminal respectively is joined
Numerical value;S1, the dimension distance that each attribute is directed between every two user to be identified is determined respectively;S2, it is directed to every two respectively
The dimension distance of each attribute of a user to be identified determines the corresponding single attributes correlation of each attribute respectively;S3, respectively root
According to each single attributes correlation between every two user to be identified, determine that the various dimensions of every two user to be identified are comprehensive related
Degree;S4, it is obtained according to the various dimensions synthesis pertinence between every two user to be identified using preset separation threshold model
Optimal separation threshold value;S5, corresponding various dimensions synthesis pertinence is higher than the optimal each user to be identified for separating threshold value, be determined as
Similar users.
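Steps S1 to S5 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method as claimed. S0 (collecting feature values from user terminals) is replaced by an in-memory dictionary of already-scaled numeric features. The dimension distance is assumed to be the absolute difference, the single-attribute correlation is the hypothetical choice 1/(1 + d), and the comprehensive correlation is the mean of the single-attribute correlations. The separation threshold model is assumed to be an Otsu-style maximum between-class variance search: each candidate threshold t is scored as w1 * w2 * (mu1 - mu2)^2, where w1, w2 are the proportions and mu1, mu2 the mean correlations of the values above t and at or below t. Because correlations belong to user pairs, the proportions are computed over pairs rather than users, and a user counts as similar when any of that user's pairs exceeds the threshold. All function names are hypothetical.

```python
from itertools import combinations


def pairwise_correlations(users):
    """S1-S3 for every pair of users.

    users maps a user id to a list of numeric feature parameter values, one per
    attribute, in the same attribute order for every user (assumed pre-scaled).
    """
    result = {}
    for a, b in combinations(sorted(users), 2):
        # S1: dimension distance per attribute (absolute difference, an assumption)
        distances = [abs(x - y) for x, y in zip(users[a], users[b])]
        # S2: single-attribute correlation, decreasing as the distance grows
        singles = [1.0 / (1.0 + d) for d in distances]
        # S3: comprehensive correlation as the normalized sum (i.e. the mean)
        result[(a, b)] = sum(singles) / len(singles)
    return result


def otsu_threshold(values, min_threshold=None):
    """S4: candidate threshold with the largest between-class variance.

    Each observed value is tried as a threshold; values above it form the
    "specified type" set and the rest form the other set. Returns None when no
    split is possible or the best threshold does not exceed min_threshold
    (a stand-in for the preset separation threshold).
    """
    n = len(values)
    best_t, best_var = None, -1.0
    for t in sorted(set(values)):
        high = [v for v in values if v > t]
        low = [v for v in values if v <= t]
        if not high or not low:
            continue
        w1, w2 = len(high) / n, len(low) / n
        mu1, mu2 = sum(high) / len(high), sum(low) / len(low)
        var = w1 * w2 * (mu1 - mu2) ** 2
        if var > best_var:
            best_t, best_var = t, var
    if best_t is not None and min_threshold is not None and best_t <= min_threshold:
        return None
    return best_t


def similar_users(users, min_threshold=None):
    """S5: users appearing in any pair whose correlation exceeds the threshold."""
    corr = pairwise_correlations(users)
    t = otsu_threshold(list(corr.values()), min_threshold)
    if t is None:
        return set()
    flagged = set()
    for (a, b), c in corr.items():
        if c > t:  # strictly higher than the optimal separation threshold
            flagged.update((a, b))
    return flagged
```

For example, with users = {"u1": [1.0, 2.0], "u2": [1.0, 2.1], "u3": [9.0, 7.0], "u4": [5.0, 0.0]}, similar_users(users) returns {"u1", "u2"}: the u1/u2 pair has a correlation of about 0.95, while every other pair stays at or below about 0.27, which the search selects as the threshold.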
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable medium storing a computer program executable by a similar-user identification device. When the program runs on the similar-user identification device, it causes the device to perform the steps of the similar-user identification method described above.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
Claims (10)
1. A similar-user identification method, characterized by comprising:
for each user to be identified, obtaining the feature parameter value of each attribute in specified multi-dimensional attributes;
for every two users to be identified, determining the single-attribute correlation of the feature parameter values corresponding to each attribute, and determining, according to each single-attribute correlation between the two users to be identified, the multi-dimensional comprehensive correlation between the two users to be identified;
obtaining an optimal separation threshold from the multi-dimensional comprehensive correlations between every two users to be identified by using a preset separation threshold model, the separation threshold model being used to determine the separation threshold that optimally divides the multi-dimensional comprehensive correlations;
determining each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
2. The method according to claim 1, characterized in that determining, for every two users to be identified, the single-attribute correlation of the feature parameter values corresponding to each attribute comprises:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter values corresponding to any attribute, performing the following steps:
determining, according to the feature parameter values of the attribute for the two users to be identified, the dimension distance between the two users for the attribute, the dimension distance being the difference between the two users in the dimension corresponding to the attribute;
determining, according to the dimension distance corresponding to the attribute, the single-attribute correlation between the two users for the attribute, the single-attribute correlation being negatively correlated with the dimension distance.
3. The method according to claim 1, characterized in that determining, according to each single-attribute correlation between every two users to be identified, the multi-dimensional comprehensive correlation between the two users comprises:
determining, for every two users to be identified, the sum of their normalized single-attribute correlations as their multi-dimensional comprehensive correlation; or
determining, for every two users to be identified, the sum of their normalized and weighted single-attribute correlations as their multi-dimensional comprehensive correlation.
4. The method according to claim 1, characterized in that obtaining the optimal separation threshold from the multi-dimensional comprehensive correlations between every two users to be identified by using the preset separation threshold model comprises:
performing the following steps in turn for each multi-dimensional comprehensive correlation: using that multi-dimensional comprehensive correlation as a separation threshold, dividing the users to be identified corresponding to the multi-dimensional comprehensive correlations into a specified-type set and a non-specified-type set; determining a first proportion, namely the proportion of all users to be identified that are contained in the specified-type set, and a first average value of the multi-dimensional comprehensive correlations of the users to be identified in the specified-type set; determining a second proportion, namely the proportion of all users to be identified that are contained in the non-specified-type set, and a second average value of the multi-dimensional comprehensive correlations of the users to be identified in the non-specified-type set; and determining a difference variance according to the square of the difference between the first average value and the second average value, the first proportion, and the second proportion;
determining the maximum difference variance among the obtained difference variances;
if the separation threshold corresponding to the maximum difference variance is higher than a preset separation threshold, determining the separation threshold corresponding to the maximum difference variance as the optimal separation threshold.
5. The method according to claim 4, characterized in that the method further comprises:
when it is determined that the number of similar users is higher than a preset quantity threshold, issuing to the user warning information indicating that a potential user group is suspected.
6. A similar-user identification apparatus, characterized by comprising:
an acquiring unit, configured to obtain, for each user to be identified, the feature parameter value of each attribute in specified multi-dimensional attributes;
a determining unit, configured to determine, for every two users to be identified, the single-attribute correlation of the feature parameter values corresponding to each attribute, and to determine, according to each single-attribute correlation between the two users to be identified, the multi-dimensional comprehensive correlation between them;
an obtaining unit, configured to obtain an optimal separation threshold from the multi-dimensional comprehensive correlations between every two users to be identified by using a preset separation threshold model, the separation threshold model being used to determine the separation threshold that optimally divides the multi-dimensional comprehensive correlations;
a judging unit, configured to determine each user to be identified whose multi-dimensional comprehensive correlation is higher than the optimal separation threshold as a similar user.
7. The apparatus according to claim 6, characterized in that the determining unit is configured to:
for any two users to be identified, when determining the single-attribute correlation of the feature parameter values corresponding to any attribute, perform the following steps:
determining, according to the feature parameter values of the attribute for the two users to be identified, the dimension distance between the two users for the attribute, the dimension distance being the difference between the two users in the dimension corresponding to the attribute;
determining, according to the dimension distance corresponding to the attribute, the single-attribute correlation between the two users for the attribute, the single-attribute correlation being negatively correlated with the dimension distance.
8. The apparatus according to claim 6, characterized in that the determining unit is configured to:
determine, for every two users to be identified, the sum of their normalized single-attribute correlations as their multi-dimensional comprehensive correlation; or
determine, for every two users to be identified, the sum of their normalized and weighted single-attribute correlations as their multi-dimensional comprehensive correlation.
9. A similar-user identification device, characterized by comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable medium, characterized by storing a computer program executable by a similar-user identification device, wherein, when the program runs on the similar-user identification device, it causes the similar-user identification device to perform the steps of the method according to any one of claims 1 to 5.
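The dependent claims leave the concrete distance, correlation, and combination functions open. The following Python sketch shows one hypothetical choice for each: a per-attribute dimension distance that handles numeric and non-numeric attributes (claim 2), an exponential mapping from distance to single-attribute correlation that is negatively correlated with the distance (claim 2), the normalized and weighted sums of claim 3, and the warning of claim 5. The function names, the range scaling, the exp(-d / scale) form, the 0/1 mismatch distance for non-numeric values, and the warning text are assumptions, not taken from the application.

```python
import math


def dimension_distance(value_a, value_b, value_range=None):
    """Claim 2 (illustrative): distance between two users for one attribute.

    Numeric values: absolute difference, divided by the attribute's value range
    when one is given so that different attributes are on a comparable scale.
    Other values (e.g. a device model or city): 0.0 if equal, otherwise 1.0.
    """
    numeric = (int, float)
    if isinstance(value_a, numeric) and isinstance(value_b, numeric):
        d = abs(value_a - value_b)
        return d / value_range if value_range else float(d)
    return 0.0 if value_a == value_b else 1.0


def single_attribute_correlation(distance, scale=1.0):
    """Claim 2 (illustrative): exp(-d / scale) lies in (0, 1] and strictly
    decreases as the distance grows, i.e. it is negatively correlated with it."""
    return math.exp(-distance / scale)


def comprehensive_correlation(single_correlations, weights=None):
    """Claim 3 (illustrative): combine one pair's single-attribute correlations.

    Without weights, each correlation is scaled by 1/n before summing (the mean).
    With weights, the weights are normalized to sum to 1 and a weighted sum is taken.
    """
    n = len(single_correlations)
    if n == 0:
        raise ValueError("at least one attribute is required")
    if weights is None:
        return sum(c / n for c in single_correlations)
    if len(weights) != n:
        raise ValueError("exactly one weight per attribute is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * c for w, c in zip(weights, single_correlations))


def potential_group_alert(similar_users, quantity_threshold):
    """Claim 5 (illustrative): warning text when too many similar users are found."""
    count = len(set(similar_users))
    if count > quantity_threshold:
        return f"Warning: {count} similar users found; a potential user group is suspected."
    return None
```

For example, two users aged 30 and 40 (assuming an age range of 100) in the same city get dimension distances of 0.1 and 0.0, single-attribute correlations of about 0.905 and 1.0, and an unweighted comprehensive correlation of about 0.952. An exponential is only one decreasing function; 1/(1 + d) or any other strictly decreasing mapping would equally satisfy the negative-correlation requirement.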
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811434297.7A CN110197375A (en) | 2018-11-28 | 2018-11-28 | A kind of similar users recognition methods, device, similar users identification equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811434297.7A CN110197375A (en) | 2018-11-28 | 2018-11-28 | A kind of similar users recognition methods, device, similar users identification equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110197375A (en) | 2019-09-03 |
Family ID: 67751413
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811434297.7A Pending CN110197375A (en) | 2018-11-28 | 2018-11-28 | A kind of similar users recognition methods, device, similar users identification equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110197375A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461186A (en) * | 2020-03-20 | 2020-07-28 | 支付宝(杭州)信息技术有限公司 | Data similarity processing method and device, storage medium and computer equipment |
CN113672703A (en) * | 2021-08-26 | 2021-11-19 | 国家电网有限公司大数据中心 | User information updating method, device, equipment and storage medium |
CN115130621A (en) * | 2022-08-31 | 2022-09-30 | 支付宝(杭州)信息技术有限公司 | Model training method and device, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203916A (en) * | 2016-03-17 | 2017-09-26 | 阿里巴巴集团控股有限公司 | A kind of user credit method for establishing model and device |
CN107464132A (en) * | 2017-07-04 | 2017-12-12 | 北京三快在线科技有限公司 | A kind of similar users method for digging and device, electronic equipment |
CN108174296A (en) * | 2018-01-02 | 2018-06-15 | 武汉斗鱼网络科技有限公司 | Malicious user recognition methods and device |
CN108898505A (en) * | 2018-05-28 | 2018-11-27 | 武汉斗鱼网络科技有限公司 | Recognition methods, corresponding medium and the electronic equipment of cheating clique |
Timeline: 2018-11-28, application CN201811434297.7A filed in CN, published as CN110197375A; status: active, pending.
Non-Patent Citations (1)
Title |
---|
JUSTIN_KO: "最大类间方差法(OTSU)求阈值" ("Computing a threshold with the maximum between-class variance (Otsu) method"), CSDN * |
Similar Documents
Publication | Title |
---|---|
EP3779841B1 (en) | Method, apparatus and system for sending information, and computer-readable storage medium |
CN106716382B (en) | The method and system of aggregation multiple utility program behavioural analysis for mobile device behavior |
US10713601B2 (en) | Personalized contextual suggestion engine |
CN104541273B (en) | Infer the socially relevant property of the information about point of interest |
CN107515915B (en) | User identification association method based on user behavior data |
CN108256568A (en) | A kind of plant species identification method and device |
CN106164945A (en) | Sight modeling and visualization |
CN106548364A (en) | Method for sending information and device |
CN108304758A (en) | Facial features tracking method and device |
CN109961296A (en) | Merchant type recognition methods and device |
US10534978B2 (en) | Classifying and grouping electronic images |
JP2015510636A (en) | System and method for identifying and analyzing a user's personal context |
CN106575395A (en) | Entity resolution incorporating data from various data sources |
CN102710770A (en) | Identification method for network access equipment and implementation system for identification method |
CN106663036A (en) | Processing changes in multi-tenant system |
CN113190757A (en) | Multimedia resource recommendation method and device, electronic equipment and storage medium |
WO2021120875A1 (en) | Search method and apparatus, terminal device and storage medium |
CN110197375A (en) | A kind of similar users recognition methods, device, similar users identification equipment and medium |
CN109274639A (en) | The recognition methods of open platform abnormal data access and device |
JP2018077821A (en) | Method, program, server device, and processor for generating predictive model of category of venue visited by user |
CN102135983A (en) | Group dividing method and device based on network user behavior |
CN106874936A (en) | Image propagates monitoring method and device |
US20230104757A1 (en) | Techniques for input classification and response using generative neural networks |
CN111652087A (en) | Car checking method and device, electronic equipment and storage medium |
KR20200064148A (en) | User situation detection in the messaging service environment and interaction with the messaging service based on the user situation |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |