CN107330020B - User entity analysis method based on structure and attribute similarity - Google Patents


Info

Publication number
CN107330020B
Authority
CN
China
Legal status
Active
Application number
CN201710470266.6A
Other languages
Chinese (zh)
Other versions
CN107330020A
Inventor
徐杰
刘震
卢思变
陈文龙
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
2017-06-20
Filing date
2017-06-20
Publication date
2020-03-24
Application filed by University of Electronic Science and Technology of China
Priority to CN201710470266.6A
Publication of CN107330020A
Application granted
Publication of CN107330020B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2462 Approximate or statistical queries
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a user entity analysis method based on structure and attribute similarity. By analyzing and modeling social networks, the method combines the friend relationships (structure information) and users' personal profiles (attribute information) to resolve user entities across social platforms. A dynamic threshold is introduced into the resolution process: different thresholds are used in different iteration periods to suit the data at each stage, which regulates the relative weight of attributes and structure and yields more accurate results.

Description

User entity analysis method based on structure and attribute similarity
Technical Field
The invention belongs to the technical field of entity analysis, and particularly relates to a user entity analysis method based on structure and attribute similarity.
Background
In a dataset, the real-world objects that the data describe are generally called entities. The same entity may have many different representations or descriptions in different data sets, or even within a single data set, and when data sets from different sources are combined for analysis, the descriptions of the same entity become mixed together, causing some degree of duplication. Entity Resolution is the process of identifying and linking the different descriptions in a data set and determining which of them map to the same real-world entity. It is an important data-preprocessing step, used mainly to address data-quality problems such as duplication and redundancy.
With the rapid development of social networks, entity resolution in social networks is receiving increasing attention. Most users do not use just one social network but several, according to their interests and needs. Because information on different social platforms is isolated, there is no direct way to identify the virtual identities of the same user across platforms. Cross-platform entity resolution for social networks matches and identifies the accounts on different social platforms that belong to the same user entity, a task also called user identification or account matching. Matching account identities enables personalized services for users and can also help address some security issues in social networks.
The concept of entity resolution was first proposed in 1959, in an article published in Science by Newcombe et al., which treated entity resolution as a statistical problem and explained it from a probabilistic perspective. In 1969, a decade later, Fellegi and Sunter first formalized the entity resolution problem: they treated it as a classification problem in machine learning, defined a series of symbols and definitions for entity resolution, and created the classical Fellegi-Sunter model. Subsequent researchers, chiefly Jaro, Winkler, Belin and Rubin, Ravikumar, Larsen, and Sadinle et al., improved and extended the Fellegi-Sunter model. Winkler in particular did a great deal of work, using a Bayesian statistical model to make a series of improvements to the model's parameter calculation and matching rules.
Research on entity resolution for social networks has developed mainly in recent years, and most of it exploits the attributes, structure, or social content of social networks. Attributes are a user's personal details, such as profile picture, user name, gender, birthday, educational background, and location; structure refers to the friendships and other relationships between accounts in the social network; and social content is the information users generate in social activities, such as the text and pictures in blogs and comments, and geographic locations.
Attribute-based algorithms mainly use the profile information of users in the social network: each item of description is treated as an attribute of the user, and the problem is converted into matching attribute fields. For example, Zafarani and Liu proposed a user entity resolution algorithm using the user name and the URL of the user's personal home page, and Goga et al. proposed algorithms suited to large-scale identification. Structure-based algorithms mainly use the friendship information of social networks: the network is abstracted into a graph, and user entity resolution is performed with graph-structure information. Narayanan and Shmatikov[13] and Bartunov et al. studied algorithms from this perspective. Social-content-based algorithms analyze text style together with information such as time and geographic location. For example, Almishari and Tsudik proposed identifying users across social platforms by analyzing authors' writing style, and Goga et al. proposed identifying users by jointly using the geographic location and timestamp of posted content and the writing style of that content.
For attribute-based algorithms, account profiles on social platforms are to some degree missing or inaccurate; such abnormal data affects algorithm performance, and because the influence comes from the data itself, it is very hard to remove. Structure-based algorithms avoid inaccurate information, but in a small group with few members, several accounts may be fully connected to one another, and distinguishing them becomes very difficult. Structure-based algorithms therefore perform poorly when friend relationships are very dense. Content-based methods, in turn, are inconvenient in practice because the relevant data is hard to acquire and process. The method of the invention organically combines attribute information and structure information, avoiding the shortcomings of each individual approach as far as possible.
Disclosure of Invention
The object of the invention is to overcome the shortcomings of the prior art and provide a user entity analysis method based on structure and attribute similarity, which solves the problem of cross-platform user entity resolution in social networks by combining attribute and structure information.
To achieve this object, the invention provides a user entity analysis method based on structure and attribute similarity, comprising the following steps:
(1) establishing an attribute similarity matrix and an adjacency matrix
according to the attribute similarity between every two accounts on social platform A and social platform B, establishing an attribute similarity matrix $S_{m\times n}$, where m and n are the total numbers of accounts on platforms A and B, respectively, and each element of $S_{m\times n}$ is the attribute similarity between the corresponding two accounts;
establishing adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ according to whether the accounts on social platform A and on social platform B are friends; each row and each column of an adjacency matrix represents an account on the platform, and each element indicates whether the two corresponding accounts are friends: the element is 1 if they are and 0 if they are not;
(2) establishing the association matrices
according to the adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ and the prior matching pairs, establishing the association matrices $A'_{(m-\tau)\times\tau}$ and $B'_{(n-\tau)\times\tau}$ between the unidentified accounts and the identified accounts on social platform A and social platform B, where τ is the number of prior matching pairs; each row of an association matrix represents an unidentified account and each column an identified account, and each element indicates whether the unidentified account and the identified account are friends: the element is 1 if they are and 0 if they are not;
(3) establishing a common friend matrix
according to the association matrices $A'_{(m-\tau)\times\tau}$ and $B'_{(n-\tau)\times\tau}$ and the prior matching pairs, establishing the common-friend matrix of the unidentified accounts on social platform A and social platform B:
$$F_{(m-\tau)\times(n-\tau)} = A'_{(m-\tau)\times\tau}\,\big(B'_{(n-\tau)\times\tau}\big)^{T}$$
where $(\cdot)^T$ denotes transposition; each row of the common-friend matrix represents an unidentified account $v^A_i$ on social platform A and each column an unidentified account $v^B_j$ on social platform B; because the k-th column of both association matrices refers to the k-th prior matching pair, the element $f_{ij}$ is the number of common friends of $v^A_i$ and $v^B_j$ among the prior matching pairs;
(4) selecting the account pairs formed by the two unidentified accounts corresponding to the largest non-zero element of the common-friend matrix and storing them in the account pair set Q, where $Q = \{(i,j) \mid f_{ij} = \max(F_{(m-\tau)\times(n-\tau)})\}$;
(5) from the attribute similarity matrix $S_{m\times n}$, taking the attribute similarities of all account pairs in the account pair set Q and storing them in the similarity set $S^*$, where $S^* = \{s_{ij} \mid s_{ij} \in S_{m\times n}, (i,j) \in Q\}$;
(6) according to a preset initial threshold, deleting from the similarity set $S^*$ the elements below the initial threshold, and deleting the corresponding account pairs from the account pair set Q;
(7) judging whether the account pair set Q is empty; if it is, setting the largest non-zero element of the common-friend matrix to 0 and returning to step (4); if not, proceeding to step (8);
(8) extracting the largest element $\max(S^*)$ of the similarity set, selecting the account pair (i, j) corresponding to $\max(S^*)$ from the account pair set Q, marking the corresponding accounts $v^A_i$ and $v^B_j$ as successfully matched, and adding the pair to the result set M of the current iteration;
(9) deleting from the account pair set Q the account pair (i, j) just added to the result set M and every account pair that shares an account with it, and deleting the corresponding elements from the similarity set $S^*$;
(10) judging whether the account pair set Q still contains elements; if it does, returning to step (8); if not, outputting the result set M;
(11) adding the account pairs in the result set M to the prior matching pairs and returning to step (2) to perform the next iteration of the current round; the current round ends when the result set M contains no new matching pair;
(12) modifying the initial threshold and returning to step (2) to perform the next round; the iteration ends when the result set M contains no new matching pair after the threshold has been modified, which completes the user entity analysis.
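Steps (2) through (12) can be sketched in Python as follows. This is a minimal, non-authoritative sketch under stated assumptions: accounts are numbered from 0, adjacency and similarity matrices are plain lists of rows, and the helper names `common_friend_matrix`, `match_round`, and `resolve` are hypothetical. Step (7) is read as zeroing every entry equal to the current maximum, a round is assumed to return an empty M once no non-zero common-friend count remains (a case the steps do not spell out), and the threshold update in step (12) assumes a linear decrease from the upper bound toward the lower bound, clamped at the lower bound.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]
Matrix = Sequence[Sequence[float]]


def common_friend_matrix(adj_a: Matrix, adj_b: Matrix, seeds: List[Pair],
                         un_a: List[int], un_b: List[int]) -> List[List[int]]:
    """Steps (2)-(3): F = A' * B'^T; column k of both association matrices
    refers to the k-th seed (prior matching) pair."""
    assoc_a = [[adj_a[u][p] for p, _ in seeds] for u in un_a]
    assoc_b = [[adj_b[v][q] for _, q in seeds] for v in un_b]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in assoc_b] for ra in assoc_a]


def match_round(adj_a: Matrix, adj_b: Matrix, sim: Matrix,
                seeds: List[Pair], threshold: float) -> List[Pair]:
    """Steps (2)-(10): one iteration producing the result set M."""
    seeds_a = {p for p, _ in seeds}
    seeds_b = {q for _, q in seeds}
    un_a = [u for u in range(len(adj_a)) if u not in seeds_a]
    un_b = [v for v in range(len(adj_b)) if v not in seeds_b]
    f = common_friend_matrix(adj_a, adj_b, seeds, un_a, un_b)
    while True:
        top = max((x for row in f for x in row), default=0)
        if top == 0:
            return []  # assumption: no structural evidence left, so M is empty
        # Step (4): Q holds every pair sharing the largest common-friend count.
        q = [(i, j) for i, row in enumerate(f) for j, x in enumerate(row) if x == top]
        # Steps (5)-(6): keep pairs whose attribute similarity reaches the threshold.
        cand = {(un_a[i], un_b[j]): sim[un_a[i]][un_b[j]] for i, j in q
                if sim[un_a[i]][un_b[j]] >= threshold}
        if cand:
            break
        # Step (7): Q is empty, so zero this maximum and try the next largest.
        for i, j in q:
            f[i][j] = 0
    result: List[Pair] = []
    # Steps (8)-(10): accept the most similar pair, then drop every remaining
    # pair that shares an account with it.
    while cand:
        a, b = max(cand, key=cand.get)
        result.append((a, b))
        cand = {k: s for k, s in cand.items() if k[0] != a and k[1] != b}
    return result


def resolve(adj_a: Matrix, adj_b: Matrix, sim: Matrix, seeds: List[Pair],
            th_u: float = 0.8, th_l: float = 0.2) -> List[Pair]:
    """Steps (11)-(12): repeat rounds, lowering the threshold between rounds."""
    seeds = list(seeds)
    tau = len(seeds)
    matched: List[Pair] = []
    threshold = th_u
    while True:
        found_new = False
        # Step (11): feed results back as seeds until an iteration adds nothing.
        while True:
            m = match_round(adj_a, adj_b, sim, seeds, threshold)
            if not m:
                break
            seeds += m
            matched += m
            found_new = True
        if not found_new:
            return matched  # step (12): a threshold change produced nothing new
        # Assumption: linear schedule from th_u toward th_l, clamped at th_l.
        span = min(len(adj_a), len(adj_b)) - tau
        threshold = max(th_l, th_u - (th_u - th_l) * len(matched) / span)


# Usage: seeds (0,0) and (1,1); accounts 2 and 3 are both friends of seed 0,
# so structure alone cannot separate them and attribute similarity decides.
adj = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
sim = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0.9, 0.3], [0, 0, 0.3, 0.6]]
print(resolve(adj, adj, sim, [(0, 0), (1, 1)]))  # [(2, 2), (3, 3)]
```

In this toy run, (2,2) is accepted at the initial threshold of 0.8, while (3,3), with similarity 0.6, is accepted only after the threshold drops, which illustrates how the lowered threshold admits true pairs with lower attribute similarity.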
The object of the invention is achieved as follows:
In the user entity analysis method based on structure and attribute similarity, the social network is analyzed and modeled so that its friend relationships and users' personal profiles, i.e. the structure information and the attribute information, are combined to achieve user entity analysis across social platforms. During entity analysis, a dynamic threshold is introduced: different thresholds are used in different iteration periods to suit the data at each stage, regulating the relative weight of attributes and structure to obtain more accurate results.
In addition, the user entity analysis method based on structure and attribute similarity has the following beneficial effects:
(1) The method combines attribute and structure information, avoiding the limitations of relying on a single type of information and their adverse effect on accuracy, such as the effects of missing attributes and of dense friend relationships.
(2) A dynamic threshold is introduced. During iteration, the attribute similarity threshold is not constant but changes gradually within a certain range as results accumulate, adapting to the characteristics of each iteration period. The threshold starts at its upper bound, which yields the most accurate results, and then decreases gradually to admit true matching pairs whose attribute similarity is not as high.
(3) The prior matching pairs serve as the starting point of the iteration, and the results of each iteration are fed back into the known conditions. No large body of known conditions or training data is needed to build a model; a few true matching pairs are enough to run the method, which addresses the problem of insufficient known conditions.
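The dynamic threshold in effect (2) can be sketched as a small function. The exact update rule appears in this text only as an image, so the sketch assumes a linear decrease from the upper bound th_u to the lower bound th_l as the number of matched pairs grows relative to min(N_A, N_B) − τ, the number of accounts that could still be matched; this form reproduces the 0.32 threshold reported in the worked example. The function name `next_threshold` and the clamp at th_l are hypothetical.

```python
def next_threshold(th_u: float, th_l: float, n_matched: int,
                   n_a: int, n_b: int, tau: int) -> float:
    """Hypothetical linear schedule: th_u when nothing is matched yet,
    falling to th_l once min(n_a, n_b) - tau pairs have been matched."""
    span = min(n_a, n_b) - tau  # accounts that could still be matched
    if span <= 0:
        return th_l
    # Assumption: clamp so the threshold never drops below the lower bound.
    return max(th_l, th_u - (th_u - th_l) * n_matched / span)


# Thresholds for hypothetical platforms with 100 and 120 accounts and
# 10 seed pairs, as matches accumulate.
for matched in (0, 45, 90):
    print(matched, round(next_threshold(0.8, 0.2, matched, 100, 120, 10), 2))
# 0 0.8
# 45 0.5
# 90 0.2
```

A linear schedule is the simplest choice consistent with the description: strict thresholds early, when few seeds exist and false matches would propagate, and looser ones later, when the growing set of matched pairs gives stronger structural evidence.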
Drawings
FIG. 1 is a flow chart of a user entity resolution method based on structure and attribute similarity in accordance with the present invention;
FIG. 2 is a diagram of the friendship structures of the two social platforms in an example.
Detailed Description
Embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the invention.
Examples
FIG. 1 is a flow chart of the user entity analysis method based on structure and attribute similarity according to the present invention.
In this embodiment, we first define some terms:
A social platform is modeled as an undirected graph in which accounts correspond to nodes and friend relationships between accounts correspond to edges, i.e. G = {V, E}, where G is the social platform, V is the set of accounts on the platform, and E is the set of friend relationships on the platform. Friend relationships on social platforms are of two types: one-way and two-way. In theory, one-way relationships should be abstracted into a directed graph. However, friend relationships are a very important basis for user entity analysis in this algorithm, and accounts that follow each other in only one direction are not close enough to reflect the real friendships of their owners. Therefore, on social platforms with one-way connections, the algorithm considers only accounts that follow each other, treats such mutual following as a two-way friend relationship, and still models the platform as an undirected graph.
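The mutual-follow rule above can be sketched in Python. This is a minimal illustration, assuming accounts are numbered 0 to n − 1 and follow relations arrive as (follower, followee) pairs; `mutual_friend_adjacency` is a hypothetical helper, not something defined in the patent. It produces the symmetric 0/1 adjacency matrix that step (1) builds for each platform.

```python
from typing import Iterable, List, Tuple


def mutual_friend_adjacency(n: int, follows: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Undirected 0/1 adjacency matrix keeping only reciprocal follows.

    follows: (u, v) means account u follows account v (a one-way edge).
    """
    directed = set(follows)
    adj = [[0] * n for _ in range(n)]
    for u, v in directed:
        # A friend edge exists only if the follow goes both ways.
        if u != v and (v, u) in directed:
            adj[u][v] = adj[v][u] = 1
    return adj


# Accounts 0 and 1 follow each other; 0 -> 2 is one-way and is dropped.
print(mutual_friend_adjacency(3, [(0, 1), (1, 0), (0, 2)]))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

On platforms whose friendships are already two-way, every edge appears in both directions and the function simply symmetrizes the edge list.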
The personal profile items of each account on a social platform are collectively called the attributes of the node, and each attribute describes one profile item of the account, such as user name, gender, or age. C denotes the set of attributes, $C = \{C_1, C_2, C_3, \dots\}$, where $C_i$ is the name of an attribute.
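The description does not specify how the attribute similarity $s_{ij}$ in $S_{m\times n}$ is computed. As a purely illustrative assumption, the sketch below scores two profiles by the fraction of attributes filled in on both whose normalized values are equal; the profile format (a dict from attribute name to string) and the function names are hypothetical, and practical systems would usually apply fuzzier string measures to fields such as user names.

```python
from typing import Dict, List, Sequence

Profile = Dict[str, str]


def attribute_similarity(p: Profile, q: Profile, attrs: Sequence[str]) -> float:
    """Hypothetical measure: share of attributes filled in on both profiles
    whose case- and whitespace-normalized values are equal."""
    shared = [c for c in attrs if p.get(c) and q.get(c)]
    if not shared:
        return 0.0  # no comparable attributes: treat as dissimilar
    same = sum(p[c].strip().lower() == q[c].strip().lower() for c in shared)
    return same / len(shared)


def similarity_matrix(profiles_a: List[Profile], profiles_b: List[Profile],
                      attrs: Sequence[str]) -> List[List[float]]:
    """Build S (m x n): row i is account i on A, column j is account j on B."""
    return [[attribute_similarity(p, q, attrs) for q in profiles_b] for p in profiles_a]


attrs = ["name", "gender", "age"]  # plays the role of the attribute set C
a = [{"name": "Alice", "gender": "F"}, {"name": "Bob"}]
b = [{"name": "alice ", "gender": "F", "age": "30"}, {"name": "Carol", "gender": "F"}]
print(similarity_matrix(a, b, attrs))  # [[1.0, 0.5], [0.0, 0.0]]
```

Comparing only attributes present on both profiles keeps a missing field from counting as a mismatch, one simple way of limiting the effect of incomplete profiles noted in the Background.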
A user entity is defined here as the real-world owner of an account on a social networking site, i.e. the person using the account. The set of user entities is denoted $U = \{u_1, u_2, u_3, \dots\}$.
Assume the two social platforms are A and B, and $v^A_i$ and $v^B_j$ are accounts belonging to A and B, respectively. If they point to the same entity, i.e. they are owned by the same person in the real world, $v^A_i$ and $v^B_j$ are said to match, written [formula image] or $M_{A,B}(i,j)$; otherwise they do not match, written [formula image] or $UM_{A,B}(i,j)$. If the real-world user entity corresponding to the matched accounts is $u_k$, this can be expressed as:
[formula image not reproduced]
Before account matching, some account pairs already known to be correct matches are usually required; these are called prior matching pairs or seed matching pairs. In actual user entity analysis, obtaining prior matching pairs is difficult. Apart from manually labeling some pairs, the main method is to find a unique identifier of the user entity, such as an e-mail address, a linked account, or an IP address, but such information is generally hard to obtain, so an alternative is needed.
When no information can directly determine the prior matching pairs, the attribute similarity of accounts can be used to obtain them. Because the algorithm needs only a few prior matching pairs, one can select the account pairs with the highest attribute similarity and, among them, the pairs whose accounts have the most friends, which ensures that the seeds are important in the network, and then run the algorithm. Pairs selected this way are not guaranteed to point to the same entity, but this approach alleviates, to some extent, the difficulty of obtaining prior matching pairs.
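The seed-selection heuristic can be sketched as follows. This is a hedged illustration: it assumes seeds are drawn one-to-one from pairs whose attribute similarity reaches a cutoff `min_sim` and are then ranked by the smaller of the two accounts' friend counts. The cutoff value, the ranking key, and the function name `pick_seed_pairs` are hypothetical choices, not values given in the text.

```python
from typing import List, Sequence, Tuple


def pick_seed_pairs(sim: Sequence[Sequence[float]],
                    adj_a: Sequence[Sequence[int]],
                    adj_b: Sequence[Sequence[int]],
                    k: int, min_sim: float = 0.9) -> List[Tuple[int, int]]:
    """Hypothetical heuristic: highly similar pairs, preferring well-connected accounts."""
    deg_a = [sum(row) for row in adj_a]  # friend counts on platform A
    deg_b = [sum(row) for row in adj_b]  # friend counts on platform B
    cands = sorted(((sim[i][j], i, j)
                    for i in range(len(adj_a)) for j in range(len(adj_b))
                    if sim[i][j] >= min_sim), reverse=True)
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in cands:  # greedy one-to-one assignment, most similar first
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    # Among the similar pairs, keep the k whose accounts have the most friends.
    pairs.sort(key=lambda p: min(deg_a[p[0]], deg_b[p[1]]), reverse=True)
    return pairs[:k]


sim = [[1.0, 0.2, 0.0], [0.1, 0.95, 0.0], [0.0, 0.0, 0.92]]
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(pick_seed_pairs(sim, adj, adj, k=2))  # [(0, 0), (1, 1)]
```

Ranking by the smaller degree favors pairs that are well connected on both platforms, so each seed contributes common-friend evidence to many unidentified accounts in the first iteration.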
Referring to FIG. 1, the user entity analysis method based on structure and attribute similarity is described in detail below. It comprises the following steps:
S1, establishing the attribute similarity matrix and adjacency matrices
First, the attribute similarity matrix $S_{m\times n}$ is constructed from the attribute similarity between every two accounts on social platform A and social platform B; in this example, m and n are both 7. The rows and columns belonging to the prior matching pairs are omitted, and the remaining part is given in tabular form in Table 1.
Table 1: Main part of the attribute similarity matrix (table image not reproduced)
Next, the adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ are established from the friendship structures of the two platforms shown in FIG. 2(a) and FIG. 2(b), respectively:
[adjacency matrix image not reproduced]
S2, establishing the association matrices
According to the adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ and the prior matching pairs, the association matrices $A'_{(m-\tau)\times\tau}$ and $B'_{(n-\tau)\times\tau}$ between the unidentified accounts and the identified accounts on social platform A and social platform B are established. In this embodiment, the prior matching pairs are (1,1) and (2,2), shown as solid nodes in FIG. 2, so τ = 2, and the association matrices are:
[association matrix image not reproduced]
S3, establishing the common-friend matrix
According to the association matrices and the prior matching pairs, the common-friend matrix of the unidentified accounts on social platform A and social platform B is established as $F_{(m-\tau)\times(n-\tau)} = A'_{(m-\tau)\times\tau}\,\big(B'_{(n-\tau)\times\tau}\big)^{T}$:
[common-friend matrix image not reproduced]
S4, selecting the account pairs formed by the two unidentified accounts corresponding to the largest non-zero element of the common-friend matrix and storing them in the account pair set Q, where $Q = \{(i,j) \mid f_{ij} = \max(F_{(m-\tau)\times(n-\tau)})\}$;
S5, from the attribute similarity matrix $S_{m\times n}$, taking the attribute similarities of all account pairs in the account pair set Q and storing them in the similarity set $S^*$, where $S^* = \{s_{ij} \mid s_{ij} \in S_{m\times n}, (i,j) \in Q\}$;
S6, setting the initial threshold. In this embodiment, the upper and lower bounds of the threshold are set to 0.8 and 0.2, respectively, so the initial threshold is 0.8. Elements of the similarity set $S^*$ below the initial threshold are deleted together with the corresponding account pairs in Q, leaving $Q = \{(3,3), (4,4)\}$ and $S^* = \{0.85, 1\}$;
S7, judging whether the account pair set Q is empty; if it is, setting the largest non-zero element of the common-friend matrix to 0 and returning to step S4; if not, proceeding to step S8;
S8, extracting the largest element of the similarity set, $\max(S^*) = 1$, and selecting the corresponding account pair (4,4) from the account pair set Q; the accounts $v^A_4$ and $v^B_4$ are marked as successfully matched, and the pair is added to the result set M of the current iteration;
S9, deleting from the account pair set Q the pair (4,4) just added to the result set M and every account pair that shares an account with it, and deleting the corresponding elements from the similarity set $S^*$, leaving $Q = \{(3,3)\}$ and $S^* = \{0.85\}$;
S10, judging whether the account pair set Q still contains elements; if it does, returning to step S8 and repeating; if not, the iteration ends and the result set M, which now contains (3,3) and (4,4), is output;
S11, adding the account pairs in the result set M to the prior matching pairs, returning to S2, rebuilding the association matrices and the common-friend matrix, and performing the next iteration of the current round. These steps are repeated until no new result is produced, which ends the current round; at this point the result set M contains the three account pairs (3,3), (4,4), and (5,5);
S12, modifying the initial threshold, returning to step S2, and performing the next round;
The threshold is modified according to:
$$th = th_u - (th_u - th_l)\,\frac{|M_c|}{\min(N_A, N_B) - \tau}$$
where th is the modified threshold, $th_u$ and $th_l$ are the upper and lower bounds of the threshold, $|M_c|$ is the number of matching pairs in the current result set M, $\min(N_A, N_B)$ is the smaller of the numbers of accounts on social platforms A and B, and τ is the number of prior matching pairs.
According to the formula, the threshold used in the next round is:
$$th = 0.8 - (0.8 - 0.2)\times\frac{3}{7 - 2} = 0.44$$
The three account pairs in M are added to the prior matching pairs, and the method returns to step S2 and performs the subsequent steps with the new threshold.
The second round adds the account pair (6,6) to the result set, and a further iteration produces no new result. After the threshold is changed to 0.32, another round is executed without any result, so the iteration ends, finally producing four account matches: (3,3), (4,4), (5,5), and (6,6).
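The text gives the second modified threshold, 0.32, without the arithmetic. Assuming the linear update rule $th = th_u - (th_u - th_l)\,|M_c| / (\min(N_A, N_B) - \tau)$, with τ kept at the two original prior matching pairs (an interpretation that the reported value supports) and $|M_c| = 4$ once (6,6) has been added, the value follows:

$$th = 0.8 - (0.8 - 0.2)\times\frac{4}{7 - 2} = 0.8 - 0.48 = 0.32$$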
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. Various changes that fall within the spirit and scope of the invention as defined by the appended claims will be apparent to those skilled in the art, and all inventions that use the inventive concept are protected.

Claims (2)

1. A user entity analysis method based on structural attribute similarity, characterized by comprising the following steps:
(1) establishing an attribute similarity matrix and an adjacency matrix
constructing an attribute similarity matrix $S_{m\times n}$ according to the attribute similarity between every two accounts on social platform A and social platform B, where m and n are the total numbers of accounts on platforms A and B, respectively, and each element of $S_{m\times n}$ is the attribute similarity between the corresponding two accounts;
establishing adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ according to whether the accounts on social platform A and on social platform B are friends; each row and each column of an adjacency matrix represents an account on the platform, and each element indicates whether the two corresponding accounts are friends: the element is 1 if they are and 0 if they are not;
(2) establishing the association matrices
according to the adjacency matrices $A_{m\times m}$ and $B_{n\times n}$ and the prior matching pairs, establishing the association matrices $A'_{(m-\tau)\times\tau}$ and $B'_{(n-\tau)\times\tau}$ between the unidentified accounts and the identified accounts on social platform A and social platform B, where τ is the number of prior matching pairs; each row of an association matrix represents an unidentified account and each column an identified account, and each element indicates whether the unidentified account and the identified account are friends: the element is 1 if they are and 0 if they are not;
(3) establishing a common friend matrix
according to the association matrices $A'_{(m-\tau)\times\tau}$ and $B'_{(n-\tau)\times\tau}$, establishing the common-friend matrix of the unidentified accounts on social platform A and social platform B:
$$F_{(m-\tau)\times(n-\tau)} = A'_{(m-\tau)\times\tau}\,\big(B'_{(n-\tau)\times\tau}\big)^{T}$$
where $(\cdot)^T$ denotes transposition; each row of the common-friend matrix represents an unidentified account $v^A_i$ on social platform A and each column an unidentified account $v^B_j$ on social platform B; the element $f_{ij}$ is the number of common friends of $v^A_i$ and $v^B_j$ among the prior matching pairs;
(4) selecting the account pairs formed by the two unidentified accounts corresponding to the largest non-zero element of the common-friend matrix and storing them in the account pair set Q, where $Q = \{(i,j) \mid f_{ij} = \max(F_{(m-\tau)\times(n-\tau)})\}$;
(5) from the attribute similarity matrix $S_{m\times n}$, taking the attribute similarities of all account pairs in the account pair set Q and storing them in the similarity set $S^*$, where $S^* = \{s_{ij} \mid s_{ij} \in S_{m\times n}, (i,j) \in Q\}$;
(6) according to a preset initial threshold, deleting from the similarity set $S^*$ the elements below the initial threshold, and deleting the corresponding account pairs from the account pair set Q;
(7) judging whether the account pair set Q is empty; if it is, setting the largest non-zero element of the common-friend matrix to 0 and returning to step (4); if not, proceeding to step (8);
(8) extracting the largest element $\max(S^*)$ of the similarity set, selecting the account pair (i, j) corresponding to $\max(S^*)$ from the account pair set Q, marking the corresponding accounts $v^A_i$ and $v^B_j$ as successfully matched, and adding the pair to the result set M of the current iteration;
(9) deleting from the account pair set Q the account pair (i, j) just added to the result set M and every account pair that shares an account with it, and deleting the corresponding elements from the similarity set $S^*$;
(10) judging whether the account pair set Q still contains elements; if it does, returning to step (8); if not, outputting the result set M;
(11) adding the account pairs in the result set M to the prior matching pairs and returning to step (2) to perform the next iteration of the current round; the current round ends when the result set M contains no new matching pair;
(12) modifying the initial threshold and returning to step (2) to start the next round of iterations; the process ends, completing the user entity analysis, when no new matching pair appears in M after the threshold has been modified.
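One iteration of steps (3) to (10) can be sketched in Python on toy data. This is a minimal illustration under assumptions that do not come from the patent: each platform's friendships are a dict from account ID to the set of friend IDs; prior matching pairs are (A account, B account) tuples; the common friend matrix $F_{(m-\tau)\times(n-\tau)}$ and the attribute similarity matrix $S_{m\times n}$ are stored sparsely as dicts keyed by account pair, with S taken as already computed; ties in similarity are broken by pair order; and the iteration returns an empty M once the common friend matrix has no non-zero element left. The name `match_iteration` is hypothetical.

```python
def match_iteration(friends_a, friends_b, prior_pairs, S, threshold):
    """Hypothetical sketch of one iteration (steps (3)-(10)); returns M as a list of pairs."""
    matched_a = {x for x, _ in prior_pairs}
    matched_b = {y for _, y in prior_pairs}

    # Step (3): f_ij = number of prior pairs (x, y) with x a friend of i and
    # y a friend of j, over unidentified accounts only, stored as {(i, j): f_ij}.
    F = {(i, j): sum(1 for x, y in prior_pairs
                     if x in friends_a[i] and y in friends_b[j])
         for i in friends_a if i not in matched_a
         for j in friends_b if j not in matched_b}

    while True:
        top = max(F.values(), default=0)
        if top == 0:
            return []  # assumption: no non-zero element left, so M is empty
        # Step (4): Q = all pairs whose common-friend count equals the maximum.
        Q = [p for p, c in F.items() if c == top]
        # Steps (5)-(6): look up s_ij for Q and drop pairs below the threshold.
        s_star = {p: S[p] for p in Q if S[p] >= threshold}
        if s_star:
            break
        # Step (7): Q became empty, so zero out this maximum and retry step (4).
        for p in Q:
            F[p] = 0

    # Steps (8)-(10): greedily take the most similar pair, then discard every
    # remaining pair that shares an account with it, until nothing is left.
    M = []
    while s_star:
        i, j = max(s_star, key=lambda p: (s_star[p], p))  # ties: larger pair wins (assumption)
        M.append((i, j))
        s_star = {p: s for p, s in s_star.items() if p[0] != i and p[1] != j}
    return M
```

Dicts keyed by account pair keep the sketch short; for platforms with many accounts, a dense or sparse matrix representation would be the more practical choice.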
2. The user entity analysis method based on structure and attribute similarity according to claim 1, wherein the initial threshold is modified as follows:
[formula image]
where th denotes the modified threshold; $th_u$ and $th_l$ denote the upper and lower bounds of the initial threshold; $|M_c|$ denotes the number of matching pairs in the current result set M; $\min(N_A, N_B)$ denotes the smaller of the numbers of accounts on social platforms A and B; and $\tau$ denotes the number of prior matching pairs.
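Steps (11) and (12) of claim 1 wrap each iteration in two loops. The threshold formula of claim 2 survives here only as a formula image, so this sketch does not implement it; it instead takes a hypothetical, precomputed list `thresholds` whose first entry plays the role of the initial threshold. `run_iteration` is likewise a hypothetical callable standing in for steps (2) to (10): given the current prior matching pairs and a threshold, it returns the result set M. The stopping rule reflects one reading of step (12): the first threshold never ends the process, while any later (modified) threshold that yields no new matching pair does.

```python
def resolve(prior_pairs, thresholds, run_iteration):
    """Hypothetical sketch of steps (11)-(12). Repeats iterations at one threshold
    until no new pair is found, then moves to the next (modified) threshold. Also
    stops when the hypothetical threshold schedule runs out."""
    matched = list(prior_pairs)  # prior matching pairs, extended as matches are found
    for k, th in enumerate(thresholds):
        added = 0
        while True:
            # Assumes run_iteration only returns pairs of still-unidentified
            # accounts, so each round terminates.
            M = run_iteration(matched, th)
            if not M:
                break                  # step (11): no new pair, this round ends
            matched.extend(M)          # step (11): M joins the prior matching pairs
            added += len(M)
        if k > 0 and added == 0:
            break                      # step (12): a modified threshold added nothing
    return matched
```

Passing the iteration in as a callable keeps the control flow of steps (11) and (12) separate from the per-iteration matching logic, so either part can be swapped out or tested on its own.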
CN201710470266.6A 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity Active CN107330020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710470266.6A CN107330020B (en) 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity

Publications (2)

Publication Number Publication Date
CN107330020A CN107330020A (en) 2017-11-07
CN107330020B true CN107330020B (en) 2020-03-24

Family

ID=60194269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710470266.6A Active CN107330020B (en) 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity

Country Status (1)

Country Link
CN (1) CN107330020B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977979B (en) * 2017-12-28 2021-12-07 China Mobile Group Guangdong Co., Ltd. Method and device for locating seed user, electronic equipment and storage medium
CN109150974B (en) * 2018-07-19 2020-06-12 University of Electronic Science and Technology of China User identity linking method based on neighbor iteration similarity
US10776269B2 (en) * 2018-07-24 2020-09-15 International Business Machines Corporation Two level compute memoing for large scale entity resolution
CN109978033B (en) * 2019-03-15 2020-08-04 4Paradigm (Beijing) Technology Co., Ltd. Method and device for constructing same-operator recognition model and method and device for identifying same-operator
CN110222790B (en) * 2019-06-17 2021-05-25 Nanjing Zhongfu Information Technology Co., Ltd. User identity identification method and device and server
CN111475738B (en) * 2020-05-22 2022-05-17 Harbin Engineering University Heterogeneous social network location anchor link identification method based on meta-path
CN113159976B (en) * 2021-05-13 2022-05-24 University of Electronic Science and Technology of China Identification method for important users of microblog network

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1984354A (en) * 2006-04-13 2007-06-20 Huawei Technologies Co., Ltd. Method and device for managing user account resource
CN105429999A (en) * 2015-12-17 2016-03-23 Beijing Ronglian Technology Co., Ltd. Unified identity authentication system based on cloud platform
CN105741175A (en) * 2016-01-27 2016-07-06 University of Electronic Science and Technology of China Method for linking accounts in OSNs (On-line Social Networks)
CN105933311A (en) * 2016-04-19 2016-09-07 Anhui Telecom Planning and Design Co., Ltd. Account auditing method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20150051988A1 (en) * 2013-08-15 2015-02-19 Hui-Min Chen Detecting marketing opportunities based on shared account characteristics systems and methods

Non-Patent Citations (1)

Title
Research on Key Issues of Web-Based Person Information Search; Xin Tao; China Master's Theses Full-text Database; 2015-09-15 (No. 09, 2015); p. I139-68 *

Also Published As

Publication number Publication date
CN107330020A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
CN107330020B (en) User entity analysis method based on structure and attribute similarity
CN106156127B (en) Method and device for selecting data content to push to terminal
WO2017211051A1 (en) Mining method and server for social network account of target subject, and storage medium
CN103024017B (en) Method for identifying important targets and community groups in social networks
CN107133277B (en) Tourist attraction recommendation method based on a dynamic topic model and matrix factorization
CN109241454A (en) Point-of-interest recommendation method fusing social networks and image content
CN107688605B (en) Cross-platform data matching method, device, computer equipment and storage medium
CN105868267B (en) Method for modeling user interests in mobile social networks
CN107767279A (en) LDA-based weighted-average personalized friend recommendation method
CN109584094B (en) Interpersonal path rapid positioning system, method and medium
CN101496003A (en) Compatibility scoring of users in a social network
CN105631749A (en) User portrait calculation method based on statistical data
CN103984771B (en) Method for extracting geographical interest points in English microblog and perceiving time trend of geographical interest points
CN103377237B (en) Neighbor search method for high-dimensional data and fast approximate image search method
CN104951544A (en) User data processing method and system and method and system for providing user data
WO2015021937A1 (en) Method and device for user recommendation
CN104778224A (en) Target object social relation identification method based on video semantics
CN111177559A (en) Cultural tourism service recommendation method and device, electronic equipment and storage medium
CN110069619A (en) Housing listing display method, device, equipment and computer-readable storage medium
KR101224312B1 (en) Friend recommendation method for SNS user, recording medium for the same, and SNS and server using the same
CN107506362A (en) Image classification based on customer group optimization imitates brain storage method
CN114881041A (en) Multi-dimensional intelligent extraction system for microblog big data hot topics
CN105354343B (en) User feature mining method based on remote dialogue
CN106844743B (en) Emotion classification method and device for Uygur language text
CN109299368B (en) Method and system for intelligent and personalized recommendation of environmental information resources AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant