CN107330020A - User entity resolution method based on structure and attribute similarity - Google Patents

User entity resolution method based on structure and attribute similarity

Info

Publication number
CN107330020A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710470266.6A
Other languages
Chinese (zh)
Other versions
CN107330020B (en)
Inventor
徐杰
刘震
卢思变
陈文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710470266.6A
Publication of CN107330020A
Application granted
Publication of CN107330020B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user entity resolution method based on structure and attribute similarity. By analyzing and modeling social networks, the method combines the friend relationships and the personal profile data of users, that is, structural information and attribute information, to resolve user entities across social platforms. During entity resolution a dynamic threshold is introduced: different thresholds are used at different stages of the iteration to adapt to the characteristics of the data at that stage and to adjust the relative weight of attributes and structure, so as to obtain more accurate results.

Description

User entity resolution method based on structure and attribute similarity
Technical field
The invention belongs to the technical field of entity resolution and, more specifically, relates to a user entity resolution method based on structure and attribute similarity.
Background
In a data set, the real-world object that the data refers to is usually called an entity. The same entity may appear in different data sets, or even within one data set, under several different representations or descriptions. When data sets from multiple sources are merged for analysis and processing, these descriptions of the same entity become mixed together and cause a certain degree of duplication. Entity resolution is the process of identifying and linking the different descriptions in the data and determining which of them map to the same real-world entity. It is an important step in data preprocessing and is mainly used to address data quality problems such as duplication and redundancy.
With the rapid development of social networks, the application of entity resolution to social networks has attracted growing attention. Most social network users use not just one social network but several at once, according to their interests and needs. Information is isolated between social platforms and is not shared, so there is no direct way to identify the virtual identities of the same user on different platforms. Cross-platform entity resolution for social networks is the task of matching and identifying the accounts on different social platforms that belong to the same user entity, also known as user identification or account matching. Matching account identities makes personalized services possible and also helps to solve some security problems of social networks.
The concept of entity resolution was first proposed in 1959. Newcombe et al. introduced it in an article published in Science, treating entity resolution as a statistical problem and describing it from a probabilistic perspective. Ten years later, in 1969, Fellegi and Sunter gave the first standardized formulation of the entity resolution problem. They regarded it as a classification problem in machine learning, specified a set of notation and definitions for entity resolution in their article, and established the classic Fellegi-Sunter model. In later research many authors improved and extended the Fellegi-Sunter model, notably Jaro, Winkler, Belin and Rubin, Ravikumar, Larsen, and Sadinle. Winkler in particular did a large amount of work, using Bayesian statistical models to make a series of improvements to the parameter estimation and matching rules of the Fellegi-Sunter model.
Research on entity resolution for social networks has mainly developed in recent years. Most researchers focus on three aspects of a social network: attributes, structure, and social content. Attributes are a user's personal information, such as avatar, user name, gender, birthday, education background, and location. Structure refers to the friend relationships between accounts in the social network. Social content refers to the text, pictures, and other information that users produce in their social activities, such as blog posts, comments, and geographic locations.
Attribute-based algorithms mainly use the personal profile information of users in a social network. Each item of profile information is treated as an attribute of the user, which turns the problem into one of matching attribute fields. For example, Zafarani and Liu proposed a user entity resolution algorithm that uses user names and the URLs of users' personal home pages, and Goga et al. proposed an algorithm suited to large-scale identification. Structure-based algorithms mainly use the friend relationships of a social network: the network is abstracted as a graph, and user entity resolution is carried out with graph-structure information. Narayanan and Shmatikov [13] and Bartunov et al. studied algorithms of this kind. Content-based algorithms analyze writing style together with information such as time and geographic location. For example, Almishari and Tsudik proposed a method that identifies users across social platforms by analyzing the author's writing style, and Goga et al. proposed combining the geolocation and timestamp of posted content with its writing style to identify users.
Algorithms that focus on attribute information suffer because the personal profiles on social platforms are to some degree missing or inaccurate. Such abnormal data degrades algorithm performance, and this effect, which comes from the data itself, is very hard to remove. Structure-based algorithms avoid inaccurate information, but within a small group the accounts may be almost fully connected to each other, and telling these accounts apart then becomes very difficult. Structure-based algorithms therefore struggle when friend relationships are very dense. For content-based methods, the relevant data is hard to obtain and hard to process, which makes them inconvenient to use. The method proposed by the present invention combines the two kinds of information, attributes and structure, and avoids the shortcomings of the individual approaches as far as possible.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a user entity resolution method based on structure and attribute similarity that combines attribute information and structural information to solve the cross-platform user entity resolution problem of social networks.
To achieve this object, the user entity resolution method based on structure and attribute similarity of the present invention is characterized by comprising the following steps:
(1) Build the attribute similarity matrix and the adjacency matrices
Build the attribute similarity matrix $S^{m \times n}$ from the attribute similarity of every pair of accounts across social platform A and social platform B, where m and n are the numbers of accounts on platforms A and B respectively, and each element of $S^{m \times n}$ is the attribute similarity between the corresponding two accounts;
For each of social platform A and social platform B, build an adjacency matrix according to whether each pair of accounts on that platform are friends. Every row and every column of an adjacency matrix represents one account on the platform, and each element indicates whether the corresponding two accounts on that platform are friends: the element is 1 if they are friends and 0 if they are not;
(2) Build the association matrices
From the adjacency matrices and the prior matched pairs, build for social platform A and social platform B the association matrices $R_A^{(m-\tau)\times\tau}$ and $R_B^{(n-\tau)\times\tau}$ between unrecognized accounts and recognized accounts, where $\tau$ is the number of prior matched pairs. Every row of an association matrix represents an unrecognized account and every column a recognized account. Each element indicates whether the unrecognized account and the recognized account are friends: the element is 1 if they are friends and 0 if they are not;
(3) Build the common-friend matrix
From the association matrices and the prior matched pairs, build the common-friend matrix of the unrecognized accounts on social platform A and social platform B:
$$F^{(m-\tau)\times(n-\tau)} = R_A^{(m-\tau)\times\tau} \times \left(R_B^{(n-\tau)\times\tau}\right)^T$$
where $(\cdot)^T$ denotes transposition, every row of the common-friend matrix represents one unrecognized account of social platform A, every column represents one unrecognized account of social platform B, and the element $f_{ij}$ is the number of common friends that these two accounts have among the prior matched pairs;
(4) From the common-friend matrix, select the account pairs formed by the two unrecognized accounts that correspond to the largest non-zero element, and store them in the account-pair set Q, $Q = \{(i,j) \mid f_{ij} = \max(F^{(m-\tau)\times(n-\tau)})\}$;
(5) From the attribute similarity matrix $S^{m \times n}$, take the attribute similarity of every account pair in the set Q and store it in the similarity set $S^*$, $S^* = \{s_{ij} \mid s_{ij} \in S^{m \times n}, (i,j) \in Q\}$;
(6) According to the preset initial threshold, delete from the similarity set $S^*$ the elements below the initial threshold, and delete the corresponding elements from the account-pair set Q;
(7) Check whether the account-pair set Q is empty. If it is, set the largest non-zero element of the common-friend matrix to 0 and return to step (4); if it is not, go to step (8);
(8) Take the largest element $\max(S^*)$ of the similarity set $S^*$, select the account pair (i, j) in the set Q that corresponds to $\max(S^*)$, mark the pair of accounts corresponding to (i, j) as successfully matched, and add it to the result set M of the current iteration;
(9) Delete from the set Q the account pair (i, j) that was added to the result set M, together with every account pair that shares an account with (i, j), and delete the corresponding elements from the similarity set $S^*$;
(10) Check whether the set Q still contains elements. If it does, return to step (8); if it does not, output the result set M;
(11) Add the account pairs in the result set M to the prior matched pairs and return to step (2) for the next iteration of the current round. The current round ends when no new matched pair is output in the result set M;
(12) Modify the threshold and return to step (2) for the next round of iteration. When, after the threshold has been modified, the result set M still contains no new matched pair, the iteration ends and user entity resolution is complete.
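The twelve steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: accounts are numbered from 0, the inputs are NumPy arrays, the function names `one_iteration` and `resolve` and the toy data are hypothetical, and ties in similarity are broken arbitrarily. The threshold update uses the formula of claim 2, with τ read as the number of initial prior pairs and |M_c| as the cumulative number of pairs found so far, which is the reading consistent with the threshold 0.32 quoted in the embodiment.

```python
import numpy as np

def one_iteration(S, adj_a, adj_b, matched, th):
    """Steps (2)-(10): return the pairs newly matched in one iteration."""
    known_a = [i for i, _ in matched]
    known_b = [j for _, j in matched]
    un_a = [i for i in range(adj_a.shape[0]) if i not in known_a]
    un_b = [j for j in range(adj_b.shape[0]) if j not in known_b]
    if not matched or not un_a or not un_b:
        return []
    # Step (2): association matrices, rows = unrecognized, columns = recognized.
    R_a = adj_a[np.ix_(un_a, known_a)]
    R_b = adj_b[np.ix_(un_b, known_b)]
    # Step (3): common-friend counts among the already matched pairs.
    F = R_a @ R_b.T
    while F.max() > 0:
        # Step (4): all positions holding the largest non-zero count.
        rows, cols = np.nonzero(F == F.max())
        # Steps (5)-(6): keep candidates whose attribute similarity reaches th.
        Q = [(un_a[r], un_b[c]) for r, c in zip(rows, cols)
             if S[un_a[r], un_b[c]] >= th]
        if not Q:
            F[rows, cols] = 0          # Step (7): discard this level and retry.
            continue
        # Steps (8)-(10): greedy choice by similarity, one pair per account.
        Q.sort(key=lambda p: S[p], reverse=True)
        M, used_a, used_b = [], set(), set()
        for i, j in Q:
            if i not in used_a and j not in used_b:
                M.append((i, j))
                used_a.add(i)
                used_b.add(j)
        return M
    return []

def resolve(S, adj_a, adj_b, seeds, th_u=0.8, th_l=0.2):
    """Steps (11)-(12): iterate within a round, then lower the threshold."""
    matched, found = list(seeds), []
    denom = min(adj_a.shape[0], adj_b.shape[0]) - len(seeds)
    th = th_u
    while True:
        found_before = len(found)
        while True:                                      # step (11)
            M = one_iteration(S, adj_a, adj_b, matched, th)
            if not M:
                break
            matched.extend(M)
            found.extend(M)
        if len(found) == found_before:                   # no new pair: stop
            return found
        th = th_u - len(found) / denom * (th_u - th_l)   # step (12)

# Toy data (hypothetical): both platforms share the same 5-account friend graph.
edges = [(0, 2), (1, 2), (0, 3), (2, 4), (3, 4)]
adj = np.zeros((5, 5), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1
S_demo = np.full((5, 5), 0.1)
S_demo[2, 2], S_demo[3, 3], S_demo[4, 4] = 0.9, 0.85, 0.5
result = resolve(S_demo, adj, adj, seeds=[(0, 0), (1, 1)])
print(result)
```

On this toy data the pairs (2, 2) and (3, 3) are accepted in the first round at threshold 0.8, while (4, 4), whose similarity is only 0.5, is accepted in the second round after the threshold has dropped to about 0.4.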
The object of the invention is achieved as follows:
The user entity resolution method based on structure and attribute similarity of the present invention analyzes and models social networks and combines the friend relationships and the personal profile data of users, that is, structural information and attribute information, to resolve user entities across social platforms. During entity resolution a dynamic threshold is introduced: different thresholds are used at different stages of the iteration to adapt to the characteristics of the data at that stage and to adjust the relative weight of attributes and structure, so as to obtain more accurate results.
In addition, the user entity resolution method based on structure and attribute similarity of the present invention has the following beneficial effects:
(1) It combines attribute information and structural information, which avoids the adverse effect that the weaknesses of a single kind of information have on the accuracy of the results, such as the effect of missing attributes and the effect of dense friend relationships.
(2) It introduces a dynamic threshold. During the iteration the attribute similarity threshold is not constant; it changes gradually within a certain range as more results are produced, adapting to the characteristics of the different stages of the iteration. The threshold starts at its upper bound, which yields the most accurate results, and is then lowered step by step to admit more true matched pairs whose attribute similarity is not as high.
(3) Prior matched pairs serve as the starting point of the iteration, and the results of every iteration are fed back into the known conditions. No large amount of known conditions or training data is needed to build a model; a small number of true matched pairs is enough for the method to run, which avoids the problem of insufficient known conditions.
Brief description of the drawings
Fig. 1 is a flow chart of the user entity resolution method based on structure and attribute similarity of the present invention;
Fig. 2 shows the friend relationship structure of the two social platforms in the example.
Detailed description of the embodiments
Specific embodiments of the invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Embodiment
Fig. 1 is a flow chart of the user entity resolution method based on structure and attribute similarity of the present invention.
In this embodiment, some terms are first defined:
A social platform is modeled as an undirected graph in which nodes correspond to accounts and edges correspond to friend relationships between accounts, i.e. G = {V, E}, where G denotes the social platform, V is the set of accounts on the platform, and E is the set of friend relationships on the platform. Friend relationships on social platforms are of two types, one-way and two-way. A one-way friend type should in theory be abstracted as a directed graph. In this algorithm, however, the friend relationship is a very important basis for user entity resolution, and accounts that follow each other in only one direction are not close enough to reflect the real friendships of the users behind them. On platforms with one-way connections, the algorithm therefore considers only accounts that follow each other, treats that relationship as equivalent to a two-way friend relationship, and still models the platform as an undirected graph.
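As a small illustration of the modeling choice described above, the following Python sketch keeps only mutual follows when building the adjacency matrix of a platform with one-way connections. The function name `mutual_adjacency`, the 0-based account numbering, and the sample follow list are assumptions made for this example only.

```python
import numpy as np

def mutual_adjacency(n, follows):
    """Build a symmetric 0/1 adjacency matrix that keeps only mutual follows."""
    directed = np.zeros((n, n), dtype=int)
    for src, dst in follows:
        if src != dst:                 # ignore self-follows
            directed[src, dst] = 1
    # Elementwise product with the transpose: 1 only if both directions exist.
    return directed * directed.T

# Hypothetical follow list: 0 and 1 follow each other, 1 and 2 follow each
# other, and 0 follows 2 without being followed back.
follows = [(0, 1), (1, 0), (0, 2), (2, 1), (1, 2)]
adj = mutual_adjacency(3, follows)
print(adj)
```

The one-way follow from account 0 to account 2 produces no edge, so the resulting graph is undirected by construction.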
The items of personal information that each account on a social platform possesses are collectively called the attributes of the node. Each attribute represents one item of account data, such as user name, gender, or age. C denotes the set of attributes, $C = (C_1, C_2, C_3, \ldots)$, where $C_i$ is the name of an attribute.
The real-world owner of an account on a social networking site, that is, the person who uses the account, is defined as a user entity. The set of user entities is denoted U, $U = (u_1, u_2, u_3, \ldots)$.
Let the two social platforms be A and B, and take two accounts, one from each platform. If they point to the same entity, that is, they are owned by the same person in the real world, the two accounts are said to match, written $M_{A,B}(i, j)$; otherwise they do not match, written $UM_{A,B}(i, j)$. If the matched accounts correspond to the user entity $u_k$ in the real world, this can be expressed as:
Before account matching can be carried out, a portion of correctly matched account pairs generally has to be known in advance. These pairs are usually called prior account matched pairs, or seed matched pairs. In practical user entity resolution, obtaining prior matched pairs is a relatively hard problem. Apart from manually labeling some prior pairs, the main approach is to find a unique identifier that determines the user entity, such as a bound e-mail account or an IP address. Such information is generally difficult to obtain, however, so an alternative method is needed.
When prior matched pairs cannot be determined directly, the attribute similarity of accounts can be used to obtain them. The algorithm described here does not require many prior matched pairs. One can therefore first select the account pairs with the highest attribute similarity and, among those, take the pairs with the most friends as prior matched pairs, which ensures that they occupy important positions in the network, and then run the algorithm. Pairs selected in this way cannot be guaranteed to point to the same entity, but they alleviate to some extent the difficulty of obtaining prior matched pairs.
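The seed-selection heuristic described above might be sketched as follows. The text does not say how many pairs to shortlist or how "most friends" is measured for a pair, so the parameters `top_k` and `n_seeds`, the use of the summed friend counts of the two accounts, and the function name `pick_seeds` are all assumptions of this sketch.

```python
import numpy as np

def pick_seeds(S, adj_a, adj_b, top_k=4, n_seeds=2):
    """Shortlist the top_k most similar pairs, then prefer well-connected ones."""
    # Flat indices of the top_k largest similarities, in descending order.
    order = np.argsort(S, axis=None)[::-1][:top_k]
    cands = [tuple(int(x) for x in np.unravel_index(f, S.shape)) for f in order]
    deg_a = adj_a.sum(axis=1)          # friend count of every account on A
    deg_b = adj_b.sum(axis=1)          # friend count of every account on B
    # Assumed ranking: summed friend counts of the two accounts, largest first.
    cands.sort(key=lambda p: deg_a[p[0]] + deg_b[p[1]], reverse=True)
    seeds, used_a, used_b = [], set(), set()
    for i, j in cands:
        if i not in used_a and j not in used_b:    # one seed per account
            seeds.append((i, j))
            used_a.add(i)
            used_b.add(j)
        if len(seeds) == n_seeds:
            break
    return seeds

# Hypothetical data: 4 accounts per platform with the same friend graph.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (0, 3), (2, 3)]:
    adj[u, v] = adj[v, u] = 1
S_demo = np.array([[0.90, 0.10, 0.20, 0.10],
                   [0.20, 0.95, 0.10, 0.30],
                   [0.10, 0.20, 0.80, 0.10],
                   [0.30, 0.10, 0.20, 0.70]])
seeds = pick_seeds(S_demo, adj, adj, top_k=3, n_seeds=2)
print(seeds)
```

Here the pair (1, 1) has the highest similarity but the fewest friends, so the two better-connected shortlisted pairs are chosen as seeds instead.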
The user entity resolution method based on structure and attribute similarity of the present invention is described in detail below with reference to Fig. 1. It comprises the following steps:
S1. Build the attribute similarity matrix and the adjacency matrices
First, the attribute similarity matrix $S^{m \times n}$ is built from the attribute similarity of every pair of accounts across social platform A and social platform B. In this example m and n are both 7. The part of the attribute similarity matrix that remains after the prior matched pairs are removed is given directly here in table form, as shown in Table 1.
Table 1 is the main part of the attribute similarity matrix.
Table 1
Next, the adjacency matrices of the two platforms are built from the structural relationships in Fig. 2, shown in Fig. 2(a) and Fig. 2(b) respectively.
S2. Build the association matrices
From the adjacency matrices and the prior matched pairs, build for social platform A and social platform B the association matrices $R_A^{(m-\tau)\times\tau}$ and $R_B^{(n-\tau)\times\tau}$ between unrecognized accounts and recognized accounts. In this embodiment the prior matched pairs are the two pairs (1,1) and (2,2), drawn as solid nodes in Fig. 2, and the association matrices are built from them.
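S3. Build the common-friend matrix
From the association matrices and the prior matched pairs, build the common-friend matrix of the unrecognized accounts on social platform A and social platform B, $F^{(m-\tau)\times(n-\tau)} = R_A^{(m-\tau)\times\tau} \times \left(R_B^{(n-\tau)\times\tau}\right)^T$;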
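To see why the matrix product in claim 1 counts common friends, it helps to write one element out. The expansion below is a reading aid and assumes that the columns of both association matrices are ordered by the same list of prior matched pairs, so that column k of each matrix refers to the two accounts of the k-th prior pair.

```latex
f_{ij} \;=\; \sum_{k=1}^{\tau} \left(R_A\right)_{ik}\,\left(R_B\right)_{jk}
```

Each term of the sum is 1 only when the i-th unrecognized account of platform A is a friend of the platform-A account of prior pair k and, at the same time, the j-th unrecognized account of platform B is a friend of the platform-B account of the same pair. The sum therefore counts the prior matched pairs that both accounts are friends with.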
S4. From the common-friend matrix, select the account pairs formed by the two unrecognized accounts that correspond to the largest non-zero element, and store them in the account-pair set Q, $Q = \{(i,j) \mid f_{ij} = \max(F^{(m-\tau)\times(n-\tau)})\}$;
S5. From the attribute similarity matrix $S^{m \times n}$, take the attribute similarity of every account pair in the set Q and store it in the similarity set $S^*$, $S^* = \{s_{ij} \mid s_{ij} \in S^{m \times n}, (i,j) \in Q\}$;
S6. Set the initial threshold. In this embodiment the upper and lower bounds of the threshold are set to 0.8 and 0.2 respectively, so the initial threshold is 0.8. Delete from the similarity set $S^*$ the elements below the initial threshold and delete the corresponding elements from the account-pair set Q; now Q = {(3,3), (4,4)} and $S^*$ = {0.85, 1};
S7. Check whether the account-pair set Q is empty. If it is, set the largest non-zero element of the common-friend matrix to 0 and return to step S4; if it is not, go to step S8;
S8. Take the largest element $\max(S^*)$ of the similarity set $S^*$, here 1, select the account pair (4,4) in the set Q that corresponds to it, mark the pair of accounts corresponding to (4,4) as successfully matched, and add it to the result set M of the current iteration;
S9. Delete from the set Q the account pair (4,4) that was added to the result set M, together with every account pair that shares an account with (4,4), and delete the corresponding elements from the similarity set $S^*$; now Q = {(3,3)} and $S^*$ = {0.85};
S10. Check whether the set Q still contains elements. If it does, return to step S8 and repeat. If it does not, the current iteration ends: (3,3) and (4,4) have been added to the result set M, and M is output;
S11. Add the account pairs in the result set M to the prior matched pairs, return to step S2, rebuild the association matrices and the common-friend matrix, and carry out the next iteration of the current round by repeating the steps above. The current round ends when no new result is produced; at that point the three account pairs (3,3), (4,4), and (5,5) have been added to the result set M;
S12. Modify the threshold and return to step S2 for the next round of iteration;
The threshold is modified by the formula:
$$th = th_u - \frac{|M_c|}{\min(N_A, N_B) - \tau} \times (th_u - th_l)$$
where th is the modified threshold, $th_u$ and $th_l$ are the upper and lower bounds of the initial threshold, $|M_c|$ is the number of matched pairs in the current result set M, $\min(N_A, N_B)$ is the smaller of the numbers of accounts on social platforms A and B, and $\tau$ is the number of prior matched pairs.
According to the formula, the threshold used in the next round of iteration is:
$$th = 0.8 - \frac{3}{7 - 2} \times (0.8 - 0.2) = 0.44$$
The three account pairs in M are added to the prior matched pairs, and the procedure returns to step S2 and executes the subsequent steps with the new threshold.
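The threshold update of claim 2 can be checked numerically with a few lines of Python. The function name `next_threshold` and its parameter names are chosen for this sketch; the inputs are the values of the embodiment (bounds 0.8 and 0.2, 7 accounts per platform, 2 prior pairs).

```python
def next_threshold(th_u, th_l, n_found, n_a, n_b, tau):
    """th = th_u - |M_c| / (min(N_A, N_B) - tau) * (th_u - th_l)."""
    return th_u - n_found / (min(n_a, n_b) - tau) * (th_u - th_l)

# After the first round of the embodiment, 3 pairs have been found.
th_round2 = next_threshold(0.8, 0.2, n_found=3, n_a=7, n_b=7, tau=2)
# After the second round, 4 pairs have been found in total.
th_round3 = next_threshold(0.8, 0.2, n_found=4, n_a=7, n_b=7, tau=2)
print(round(th_round2, 2), round(th_round3, 2))
```

With three pairs found the threshold drops from 0.8 to 0.44, and with four pairs found it drops to 0.32, the value quoted for the last round of the embodiment. When every remaining account has been matched, the formula reaches the lower bound 0.2.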
The second round of iteration adds the account pair (6,6) to the result set, and a further iteration produces no new result. The threshold is then modified to 0.32 and a new round is run, which still produces no result, so the iteration ends. The final account matching results are the four pairs (3,3), (4,4), (5,5), and (6,6).
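Steps S8 to S10 of the embodiment amount to a greedy selection: repeatedly take the most similar remaining pair and discard every pair that shares an account with it. The sketch below replays this on the values given in step S6, Q = {(3,3), (4,4)} and S* = {0.85, 1}. The function name `greedy_select` is hypothetical, and the second call adds an invented conflicting pair (3, 4) purely to show the deletion rule of step S9.

```python
def greedy_select(candidates):
    """candidates maps an account pair (i, j) to its attribute similarity."""
    remaining = dict(candidates)
    selected = []
    while remaining:                                  # S10: repeat while Q is non-empty
        i, j = max(remaining, key=remaining.get)      # S8: most similar pair
        selected.append((i, j))
        # S9: drop the chosen pair and every pair sharing an account with it.
        remaining = {p: s for p, s in remaining.items()
                     if p[0] != i and p[1] != j}
    return selected

# Values from the embodiment.
from_example = greedy_select({(3, 3): 0.85, (4, 4): 1.0})
# Hypothetical extra candidate (3, 4), which shares platform-B account 4 with (4, 4).
with_conflict = greedy_select({(3, 3): 0.85, (4, 4): 1.0, (3, 4): 0.9})
print(from_example, with_conflict)
```

In both calls (4, 4) is selected first and (3, 3) second; the invented pair (3, 4) is removed as soon as (4, 4) is accepted, even though its similarity is higher than that of (3, 3).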
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.

Claims (2)

1. A user entity resolution method based on structure and attribute similarity, characterized by comprising the following steps:
(1) Build the attribute similarity matrix and the adjacency matrices
Build the attribute similarity matrix $S^{m \times n}$ from the attribute similarity of every pair of accounts across social platform A and social platform B, where m and n are the numbers of accounts on platforms A and B respectively, and each element of $S^{m \times n}$ is the attribute similarity between the corresponding two accounts;
For each of social platform A and social platform B, build an adjacency matrix according to whether each pair of accounts on that platform are friends, where every row and every column of an adjacency matrix represents one account on the platform, and each element indicates whether the corresponding two accounts on that platform are friends: the element is 1 if they are friends and 0 if they are not;
(2) Build the association matrices
From the adjacency matrices and the prior matched pairs, build for social platform A and social platform B the association matrices $R_A^{(m-\tau)\times\tau}$ and $R_B^{(n-\tau)\times\tau}$ between unrecognized accounts and recognized accounts, where $\tau$ is the number of prior matched pairs, every row of an association matrix represents an unrecognized account, every column represents a recognized account, and each element indicates whether the unrecognized account and the recognized account are friends: the element is 1 if they are friends and 0 if they are not;
(3) Build the common-friend matrix
From the association matrices and the prior matched pairs, build the common-friend matrix of the unrecognized accounts on social platform A and social platform B:
$$F^{(m-\tau)\times(n-\tau)} = R_A^{(m-\tau)\times\tau} \times \left(R_B^{(n-\tau)\times\tau}\right)^T$$
where $(\cdot)^T$ denotes transposition, every row of the common-friend matrix represents one unrecognized account of social platform A, every column represents one unrecognized account of social platform B, and the element $f_{ij}$ is the number of common friends that these two accounts have among the prior matched pairs;
(4) From the common friend matrix, select the account pairs formed by the two unidentified accounts corresponding to the maximum nonzero element, and store them in the account pair set Q, $Q = \{(i, j) \mid f_{ij} = \max(F^{(m-\tau)\times(n-\tau)})\}$;
(5) From the attribute similarity matrix $S^{m\times n}$, take the attribute similarity of every account pair in the account pair set Q and store it in the similarity set $S^{*}$, $S^{*} = \{s_{ij} \mid s_{ij} \in S^{m\times n}, (i, j) \in Q\}$;
(6) According to a preset initial threshold, delete from the similarity set $S^{*}$ every element smaller than the initial threshold, and delete the corresponding elements from the account pair set Q;
(7) Judge whether the account pair set Q is empty; if it is empty, set the maximum nonzero element in the common friend matrix to 0 and return to step (4); if it is not empty, proceed to step (8);
(8) Take the maximum element $\max(S^{*})$ from the similarity set $S^{*}$, select from the account pair set Q the account pair (i, j) corresponding to $\max(S^{*})$, mark the two accounts corresponding to (i, j) as successfully matched, and add the pair to the result set M of the current round of iteration;
(9) Delete from the account pair set Q the account pair (i, j) that was added to the result set M, together with every account pair that shares an account with (i, j), and delete the corresponding elements from the similarity set $S^{*}$;
(10) Judge whether the account pair set Q still contains elements; if it does, return to step (8); if it does not, output the result set M;
(11) Add the account pairs in the result set M to the prior matched pairs, return to step (2), and perform the next iteration of the current round; when no new matched pair is output in the result set M, the current round of iteration ends;
(12) Modify the initial threshold, return to step (2), and perform the next round of iteration; when, after the initial threshold has been modified, the result set M still outputs no new matched pair, the iteration terminates and the user entity resolution is complete.
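The selection loop of steps (4) to (10) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `common_friend_matrix` and `match_one_pass`, the toy matrices, and the assumption that the attribute similarity matrix `S` has already been restricted to the unidentified accounts (so that it shares row and column indices with `F`) are all introduced here for illustration only.

```python
import numpy as np

def common_friend_matrix(R_A, R_B):
    """F = R_A x R_B^T, where R_A is (m - tau) x tau and R_B is (n - tau) x tau.
    Entry (i, k) of R_A or R_B is 1 if the unidentified account i is a friend
    of the k-th prior matched account on its own platform."""
    return R_A @ R_B.T

def match_one_pass(F, S, threshold):
    """One pass of steps (4)-(10); returns the result set M as (i, j) pairs.
    Assumption: S has the same shape and indexing as F."""
    F = F.copy()
    while True:
        if F.size == 0 or F.max() <= 0:
            return []  # no nonzero element left, nothing can be matched
        f_max = F.max()
        # Step (4): all pairs that attain the maximum common-friend count.
        Q = [(int(i), int(j)) for i, j in zip(*np.where(F == f_max))]
        # Steps (5)-(6): drop pairs whose similarity is below the threshold.
        Q = [(i, j) for (i, j) in Q if S[i, j] >= threshold]
        if Q:
            break
        # Step (7): Q is empty, so zero the current maximum and retry.
        F[F == f_max] = 0
    M = []
    while Q:  # Step (10): repeat while candidates remain.
        # Step (8): accept the candidate pair with the highest similarity.
        i, j = max(Q, key=lambda p: S[p])
        M.append((i, j))
        # Step (9): remove that pair and every pair sharing an account with it.
        Q = [(a, b) for (a, b) in Q if a != i and b != j]
    return M

# Toy data (hypothetical): 2 unidentified accounts in A, 3 in B, tau = 3.
R_A = np.array([[1, 1, 0], [0, 1, 1]])
R_B = np.array([[1, 1, 0], [0, 0, 1], [1, 0, 0]])
S = np.array([[0.9, 0.1, 0.3], [0.2, 0.8, 0.1]])

F = common_friend_matrix(R_A, R_B)   # [[2, 0, 1], [1, 1, 0]]
M = match_one_pass(F, S, threshold=0.5)
print(M)
```

In this sketch, steps (11) and (12) would wrap `match_one_pass` in two outer loops: one that moves the pairs in `M` into the prior matched pairs and rebuilds `R_A`, `R_B` and `F`, and one that changes the threshold once a round produces no new pair.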
2. The user entity resolution method based on structure and attribute similarity according to claim 1, characterized in that the initial threshold is modified as follows:
$$th = th_u - \frac{|M_c|}{\min(N_A, N_B) - \tau} \times (th_u - th_l)$$
where $th$ is the modified threshold, $th_u$ and $th_l$ are respectively the upper and lower bounds of the initial threshold, $|M_c|$ is the number of matched pairs in the current result set M, $\min(N_A, N_B)$ is the smaller of the account counts of social platforms A and B, and $\tau$ is the number of prior matched pairs.
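The threshold update of claim 2 can be written as a small Python helper. This is a sketch only: the function name `updated_threshold`, its parameter names, and the guard against a non-positive denominator are assumptions added here and are not part of the claim text.

```python
def updated_threshold(th_u, th_l, num_matched, n_a, n_b, tau):
    """th = th_u - |M_c| / (min(N_A, N_B) - tau) * (th_u - th_l).

    th_u, th_l   : upper and lower bounds of the initial threshold
    num_matched  : |M_c|, number of matched pairs in the current result set
    n_a, n_b     : account counts of social platforms A and B
    tau          : number of prior matched pairs
    """
    remaining = min(n_a, n_b) - tau
    if remaining <= 0:
        # Assumption of this sketch: nothing is left to match in this case.
        raise ValueError("min(n_a, n_b) must exceed tau")
    return th_u - (num_matched / remaining) * (th_u - th_l)

# With no matches yet the threshold stays at its upper bound; it falls
# linearly toward the lower bound as more pairs are matched.
print(updated_threshold(0.9, 0.5, 0, 100, 120, 20))
print(updated_threshold(0.9, 0.5, 40, 100, 120, 20))
```

Read this way, the formula starts strict (close to `th_u`) and relaxes toward `th_l` as the share of matched accounts grows.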
CN201710470266.6A 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity Active CN107330020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710470266.6A CN107330020B (en) 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity

Publications (2)

Publication Number Publication Date
CN107330020A true CN107330020A (en) 2017-11-07
CN107330020B CN107330020B (en) 2020-03-24

Family

ID=60194269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710470266.6A Active CN107330020B (en) 2017-06-20 2017-06-20 User entity analysis method based on structure and attribute similarity

Country Status (1)

Country Link
CN (1) CN107330020B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1984354A (en) * 2006-04-13 2007-06-20 华为技术有限公司 Method and device for managing user account resource
US20150051988A1 (en) * 2013-08-15 2015-02-19 Hui-Min Chen Detecting marketing opportunities based on shared account characteristics systems and methods
CN105429999A (en) * 2015-12-17 2016-03-23 北京荣之联科技股份有限公司 Unified identity authentication system based on cloud platform
CN105741175A (en) * 2016-01-27 2016-07-06 电子科技大学 Method for linking accounts in OSNs (On-line Social Networks)
CN105933311A (en) * 2016-04-19 2016-09-07 安徽电信规划设计有限责任公司 Account auditing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
辛涛: "基于Web的人物信息搜索关键问题研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977979A (en) * 2017-12-28 2019-07-05 中国移动通信集团广东有限公司 Position method, apparatus, electronic equipment and the storage medium of seed user
CN109977979B (en) * 2017-12-28 2021-12-07 中国移动通信集团广东有限公司 Method and device for locating seed user, electronic equipment and storage medium
CN109150974A (en) * 2018-07-19 2019-01-04 电子科技大学 A kind of user identity link method based on neighbours' iteration similarity
GB2588874A (en) * 2018-07-24 2021-05-12 Ibm Two level compute memoing for large scale entity resolution
WO2020021404A1 (en) * 2018-07-24 2020-01-30 International Business Machines Corporation Two level compute memoing for large scale entity resolution
US10776269B2 (en) 2018-07-24 2020-09-15 International Business Machines Corporation Two level compute memoing for large scale entity resolution
CN109978033A (en) * 2019-03-15 2019-07-05 第四范式(北京)技术有限公司 The method and apparatus of the building of biconditional operation people's identification model and biconditional operation people identification
CN109978033B (en) * 2019-03-15 2020-08-04 第四范式(北京)技术有限公司 Method and device for constructing same-operator recognition model and method and device for identifying same-operator
CN110222790A (en) * 2019-06-17 2019-09-10 南京中孚信息技术有限公司 Method for identifying ID, device and server
CN111475738A (en) * 2020-05-22 2020-07-31 哈尔滨工程大学 Heterogeneous social network location anchor link identification method based on meta-path
CN111475738B (en) * 2020-05-22 2022-05-17 哈尔滨工程大学 Heterogeneous social network location anchor link identification method based on meta-path
CN113159976A (en) * 2021-05-13 2021-07-23 电子科技大学 Identification method for important users of microblog network
CN113159976B (en) * 2021-05-13 2022-05-24 电子科技大学 Identification method for important users of microblog network

Also Published As

Publication number Publication date
CN107330020B (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN107330020A (en) A kind of user subject analytic method based on structure and attributes similarity
CN106250412B (en) Knowledge mapping construction method based on the fusion of multi-source entity
CN102902362B (en) Character input method and system
CN110019647B (en) Keyword searching method and device and search engine
CN107122411B (en) Collaborative filtering recommendation method based on discrete multi-view Hash
CN105631037B (en) A kind of image search method
CN105389329B (en) A kind of open source software recommended method based on community review
CN101496003A (en) Compatibility scoring of users in a social network
TW201317814A (en) Method and Apparatus of Ranking Search Results, and Search Method and Apparatus
CN107145545A (en) Top k zone users text data recommends method in a kind of location-based social networks
CN104978396A (en) Knowledge database based question and answer generating method and apparatus
CN104778210B (en) A kind of microblogging forwarding tree and forwarding forest construction method
CN107665217A (en) A kind of vocabulary processing method and system for searching service
CN105761154B (en) A kind of socialization recommended method and device
CN102163234A (en) Equipment and method for error correction of query sequence based on degree of error correction association
CN112667877A (en) Scenic spot recommendation method and equipment based on tourist knowledge map
CN107368540A (en) The film that multi-model based on user's self-similarity is combined recommends method
CN103377237B (en) The neighbor search method of high dimensional data and fast approximate image searching method
CN109992786A (en) A kind of semantic sensitive RDF knowledge mapping approximate enquiring method
CN106156155A (en) A kind of method and system that e-book resource is provided
CN111177559B (en) Text travel service recommendation method and device, electronic equipment and storage medium
CN105869058B (en) A kind of method that multilayer latent variable model user portrait extracts
CN109284411A (en) One kind being based on having supervision hypergraph discretized image binary-coding method
CN102081666B (en) Index construction method and device for distributed picture search
CN112784590A (en) Text processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant