CN107330020A - A user entity resolution method based on structural and attribute similarity - Google Patents
- Publication number: CN107330020A
- Authority: CN (China)
- Prior art keywords: account, matrix, similarity
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption, not a legal conclusion)
Classifications
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/2462: Approximate or statistical queries
- G06Q50/01: Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a user entity resolution method based on structural and attribute similarity. By analyzing and modeling social networks, the method combines the friend relations and the users' personal profile data in those networks, that is, structural information and attribute information, to resolve user entities across social platforms. During entity resolution the method introduces a dynamic threshold: different threshold values are used at different stages of the iteration to adapt to the characteristics of the data at that point and to adjust the relative weight of attributes and structure, so as to obtain more accurate results.
Description
Technical field
The invention belongs to the technical field of entity resolution and, more specifically, relates to a user entity resolution method based on structural and attribute similarity.
Background
In a data set, the real-world object to which the data refer is commonly called an entity. The same entity may have many different representations or descriptions, in different data sets or even within a single data set. When data sets from several sources are merged for analysis and processing, these descriptions of the same entity become mixed together, so that one entity appears under several descriptions. Entity resolution is the process of identifying and linking the different descriptions in a data set and determining which of them map to the same real-world entity. It is an important step in data preprocessing and is mainly used to address data-quality problems such as duplication and redundancy.
With the rapid development of social networks, the application of entity resolution to social networks has attracted growing attention. Most social network users do not use only one social network; according to their interests and needs, they use several at the same time. Because the information on different platforms is isolated and not shared, there is no direct way to recognize the virtual identities of the same individual on different platforms. The cross-platform entity resolution problem for social networks is to match and identify the accounts on different social platforms that belong to the same user entity, also known as user identification or account matching. Matching account identities makes personalized services possible and also helps to address some security problems of social networks.
The concept of entity resolution was first put forward in 1959, when Newcombe et al. proposed it in an article published in Science; they regarded entity resolution as a statistical problem and described it from a probabilistic point of view. Ten years later, in 1969, Fellegi and Sunter standardized and formalized the entity resolution problem for the first time: in their article they treated it as a classification problem in machine learning, gave a series of formal definitions for entity resolution, and established the classic Fellegi-Sunter model. Many later researchers improved and extended the Fellegi-Sunter model, notably Jaro, Winkler, Belin and Rubin, Ravikumar, Larsen, and Sadinle et al. Among them, Winkler did a great deal of work, using Bayesian statistical models to make a series of improvements to the parameter estimation and matching rules of the Fellegi-Sunter model.
Entity resolution research on social networks has mainly developed in recent years. Most researchers focus on three aspects of social networks: attributes, structure, and social content. Attributes are a user's personal information, such as avatar, username, gender, birthday, education, and location. Structure refers to the accounts in a social network and the friend relations between them. Social content refers to information that users produce in their social activities, such as text and pictures, for example blog posts, comments, and geographic locations.
Attribute-based algorithms mainly use users' personal profile information in social networks: each descriptive item is treated as an attribute of the user, and the problem becomes one of matching attribute fields. For example, Zafarani and Liu proposed a user entity resolution algorithm that uses usernames and the URLs of users' personal homepages, and Goga et al. proposed an algorithm suited to large-scale identification. Structure-based algorithms mainly use the friend-relation information of social networks: they abstract a social network into a graph and resolve user entities using information from the graph structure. Narayanan and Shmatikov[13] and Bartunov et al. studied related algorithms from this angle. Content-based algorithms analyze writing style together with information such as time and geographic location. For example, Almishari and Tsudik proposed a method that recognizes users on different social platforms by analyzing authors' writing styles, and Goga et al. proposed combining the geolocation and timestamps of users' posts with the writing style of the content to identify users.
Algorithms that rely on attribute information suffer because account profiles on social platforms are to some degree missing or inaccurate; such abnormal data degrade algorithm performance, and this influence, which comes from the data themselves, is very hard to remove. Structure-based algorithms avoid the problem of inaccurate information, but within a small group several accounts may be almost fully connected to one another, and distinguishing between these accounts then becomes very difficult. Structure-based algorithms therefore struggle when friend relations are very dense. Content-based methods rely on data that are hard to obtain and hard to process, which makes them inconvenient to use. The method proposed by the present invention organically combines attribute and structural information so as to avoid the defects of each individual approach as far as possible.
Summary of the Invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a user entity resolution method based on structural and attribute similarity that combines attribute and structural information to solve the cross-platform user entity resolution problem of social networks.
To achieve the above object, the user entity resolution method based on structural and attribute similarity of the present invention comprises the following steps:
(1) Build the attribute similarity matrix and the adjacency matrices
From the pairwise attribute similarities between all accounts on social platform A and all accounts on social platform B, build the attribute similarity matrix $S_{m\times n}$, where m and n are the numbers of accounts on platforms A and B respectively and each element of $S_{m\times n}$ is the attribute similarity between the two corresponding accounts;
From whether each pair of accounts on social platform A, and each pair of accounts on social platform B, are friends, build the adjacency matrix of platform A (of size $m\times m$) and the adjacency matrix of platform B (of size $n\times n$). Each row and each column of an adjacency matrix represents one account on that platform, and each element indicates whether the two corresponding accounts on that platform are friends: the element is 1 if they are friends and 0 otherwise;
(2) Build the incidence matrices
From the adjacency matrices and the prior matched pairs, build for social platforms A and B the incidence matrices between the unidentified accounts and the identified accounts, $R_A^{(m-\tau)\times\tau}$ and $R_B^{(n-\tau)\times\tau}$, where τ is the number of prior matched pairs. Each row of an incidence matrix represents an unidentified account and each column represents an identified account, with the columns of $R_A$ and $R_B$listed in the same order of prior matched pairs. Each element indicates whether the unidentified account and the identified account are friends: the element is 1 if they are friends and 0 otherwise;
(3) Build the common friend matrix
From the incidence matrices and the prior matched pairs, build the common friend matrix of the unidentified accounts on social platforms A and B:
$F^{(m-\tau)\times(n-\tau)} = R_A^{(m-\tau)\times\tau}\times\left(R_B^{(n-\tau)\times\tau}\right)^T$
where $(\cdot)^T$ denotes transposition. Each row of the common friend matrix represents an unidentified account $v_i^A$ on social platform A and each column represents an unidentified account $v_j^B$ on social platform B; the element $f_{ij}$ is the number of common friends of $v_i^A$ and $v_j^B$ among the prior matched pairs;
(4) From the common friend matrix, select the account pairs formed by the two unidentified accounts corresponding to the largest nonzero element, and store them in the account pair set Q, $Q=\{(i,j)\mid f_{ij}=\max(F^{(m-\tau)\times(n-\tau)})\}$;
(5) From the attribute similarity matrix $S_{m\times n}$, take the attribute similarities of all account pairs in Q and store them in the similarity set $S^*$, $S^*=\{s_{ij}\mid s_{ij}\in S_{m\times n},(i,j)\in Q\}$;
(6) Using a preset initial threshold, delete from $S^*$ the elements smaller than the initial threshold, and delete the corresponding account pairs from Q;
(7) Check whether Q is empty. If it is, set the largest nonzero element of the common friend matrix to 0 and return to step (4); otherwise, go to step (8);
(8) Take the largest element $\max(S^*)$ of the similarity set $S^*$, select the account pair (i, j) in Q that corresponds to $\max(S^*)$, mark the accounts $v_i^A$ and $v_j^B$ as successfully matched, and add the pair to the result set M of the current iteration;
(9) Delete from Q the account pair (i, j) just added to M and every account pair that shares an account with (i, j), and delete the corresponding elements from $S^*$;
(10) Check whether Q still contains elements. If it does, return to step (8); otherwise, output the result set M;
(11) Add the account pairs in M to the prior matched pairs and return to step (2) for the next iteration of the current round; the current round ends when M yields no new matched pairs;
(12) Change the initial threshold and return to step (2) for the next round; when M still yields no new matched pairs after the threshold has been changed, the iteration ends and user entity resolution is complete.
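Steps (2) to (12) can be sketched in Python as follows. This is a minimal, non-authoritative illustration, assuming accounts are indexed from 0, that the attribute similarity matrix and the 0/1 adjacency matrices are already available as lists of lists, and that ties are broken by index order. The function names `match_once` and `resolve_entities` are hypothetical. $|M_c|$ is read as all pairs matched so far and τ as the number of original prior pairs, which is the reading that reproduces the example thresholds of 0.44 and 0.32.

```python
def match_once(s, adj_a, adj_b, known, threshold):
    """One iteration of steps (2)-(10); returns the newly matched pairs (M)."""
    seeds_a = [i for i, _ in known]
    seeds_b = [j for _, j in known]
    unid_a = [i for i in range(len(adj_a)) if i not in seeds_a]
    unid_b = [j for j in range(len(adj_b)) if j not in seeds_b]
    # Step (2): incidence matrices; columns follow the order of `known`.
    r_a = [[adj_a[u][x] for x in seeds_a] for u in unid_a]
    r_b = [[adj_b[v][y] for y in seeds_b] for v in unid_b]
    # Step (3): common friend matrix F = R_A x R_B^T.
    f = [[sum(u * w for u, w in zip(ra, rb)) for rb in r_b] for ra in r_a]
    while True:
        # Step (4): pairs at the largest nonzero common-friend count.
        best = max((x for row in f for x in row), default=0)
        if best == 0:
            return []
        q = [(r, c) for r, row in enumerate(f)
             for c, x in enumerate(row) if x == best]
        # Steps (5)-(6): keep pairs whose attribute similarity meets the threshold.
        cand = {}
        for r, c in q:
            i, j = unid_a[r], unid_b[c]
            if s[i][j] >= threshold:
                cand[(i, j)] = s[i][j]
        if cand:
            break
        # Step (7): Q is empty, so zero this maximum and retry at the next level.
        for r, c in q:
            f[r][c] = 0
    # Steps (8)-(10): accept the most similar pair, drop pairs sharing an account.
    result = []
    while cand:
        i, j = max(cand, key=cand.get)
        result.append((i, j))
        cand = {(a, b): v for (a, b), v in cand.items() if a != i and b != j}
    return result


def resolve_entities(s, adj_a, adj_b, seeds, th_upper=0.8, th_lower=0.2):
    """Steps (11)-(12): iterate within a round, then lower the threshold."""
    tau = len(seeds)
    denom = min(len(adj_a), len(adj_b)) - tau
    known = list(seeds)
    found = []
    threshold = th_upper
    while denom > 0:
        round_found = []
        while True:  # step (11): repeat until an iteration adds nothing
            new = match_once(s, adj_a, adj_b, known, threshold)
            if not new:
                break
            known += new
            round_found += new
        if not round_found:  # step (12): the changed threshold found nothing
            break
        found += round_found
        # Claim 2 update, with |M_c| = all pairs matched so far.
        threshold = th_upper - len(found) / denom * (th_upper - th_lower)
    return found
```

Rebuilding F from scratch after every accepted batch mirrors the return to step (2); an incremental update would be cheaper but is not described in the text.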
The object of the invention is achieved as follows:
The user entity resolution method based on structural and attribute similarity of the present invention analyzes and models social networks and combines the friend relations and the users' personal profile data in those networks, that is, structural information and attribute information, to resolve user entities across social platforms. During entity resolution it introduces a dynamic threshold: different threshold values are used at different stages of the iteration to adapt to the characteristics of the data at that point and to adjust the relative weight of attributes and structure, so as to obtain more accurate results.
In addition, the user entity resolution method based on structural and attribute similarity of the present invention has the following beneficial effects:
(1) It combines attribute and structural information, avoiding the adverse effects that relying on a single kind of information has on the accuracy of the results, such as the effects of missing attributes and of dense friend relations.
(2) It introduces a dynamic threshold: during the iteration the attribute similarity threshold is not constant but changes gradually within a certain range as results are produced, adapting to the characteristics of different iteration stages. The threshold starts at its upper bound to obtain the most accurate results and is then gradually lowered to admit more true matched pairs whose attribute similarity is not high enough.
(3) It uses the prior matched pairs as the starting point of the iteration and feeds the results of each iteration back into the known conditions. It therefore does not need a large amount of known conditions or training data to build a model; a small number of true matched pairs is enough to run it, which avoids the problem of insufficient known conditions.
Brief Description of the Drawings
Fig. 1 is a flowchart of the user entity resolution method based on structural and attribute similarity of the present invention;
Fig. 2 shows the friend-relation structures of the two social platforms in the example.
Detailed Description of Embodiments
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Example
Fig. 1 is a flowchart of the user entity resolution method based on structural and attribute similarity of the present invention.
In this example, some terms are first defined:
A social platform is modeled as an undirected graph in which accounts correspond to nodes and friend relations between accounts correspond to edges between nodes, i.e., G = {V, E}, where G denotes the social platform, V is the set of accounts on the platform, and E is the set of friend relations on the platform. Friend relations on social platforms are of two types, one-way and two-way. In theory, one-way relations should be abstracted as a directed graph. In this algorithm, however, friend relations are a very important basis for user entity resolution, and accounts connected only by one-way following are not close enough to reflect the real friendships of the users behind them. Therefore, on one-way social platforms the algorithm considers only accounts that follow each other, treats such relations as equivalent to friend relations on two-way platforms, and still models the platform as an undirected graph.
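The rule of keeping only mutual follows can be sketched as follows. This minimal Python sketch assumes that follow relations arrive as (follower, followee) pairs of account IDs; the function name `mutual_friendships` is hypothetical.

```python
def mutual_friendships(follows):
    """Keep only reciprocal follow relations as undirected friend edges.

    follows: iterable of (follower, followee) pairs on a one-way platform.
    Returns a set of frozensets {u, v}, one per mutually following pair.
    """
    directed = set(follows)
    return {
        frozenset((u, v))
        for (u, v) in directed
        if u != v and (v, u) in directed  # both directions must exist
    }


follows = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3)]
print(sorted(tuple(sorted(e)) for e in mutual_friendships(follows)))
# [(1, 2), (3, 4)]
```

The one-way follow (1, 3) is dropped, so the resulting graph is undirected and contains only relations the text treats as friendships.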
The personal information items owned by each account on a social platform are collectively called the attributes of the node; each attribute describes one item of the account's data, such as username, gender, or age. C denotes the set of attributes, $C=(C_1, C_2, C_3, \dots)$, where $C_i$ is the name of one attribute.
The real-world owner of an account on a social networking site, i.e., the person who uses the account, is defined here as a user entity. The set of user entities is denoted U, $U=(u_1, u_2, u_3, \dots)$.
Suppose the two social platforms are A and B, and $v_i^A$ and $v_j^B$ are accounts belonging to A and B respectively. If they point to the same entity, that is, they are owned by the same person in the real world, $v_i^A$ and $v_j^B$ are said to match, denoted $M_{A,B}(i,j)$; otherwise they do not match, denoted $UM_{A,B}(i,j)$. If the matched accounts correspond to the real-world user entity $u_k$, then both $v_i^A$ and $v_j^B$ belong to $u_k$.
Before account matching, a set of correctly matched account pairs known in advance is usually required; these are commonly called prior account matched pairs, or seed pairs. In practical user entity resolution, prior matched pairs are relatively hard to obtain. Apart from labeling some prior matches manually, the main approach is to find a unique identifier that determines the user entity, such as a bound e-mail address or account, or an IP address; such information, however, is generally hard to obtain, so a substitute method needs to be considered.
When prior matched pairs cannot be determined directly, the attribute similarity of accounts can be used to obtain them. The algorithm described here does not require many prior matched pairs, so it is possible first to select by attribute similarity a group of account pairs with the highest similarity, and then to choose among them the pairs with the most friends as prior matched pairs, which ensures that the seeds occupy important positions in the networks, before running the algorithm. Although pairs selected in this way are not guaranteed to point to the same entity, this alleviates to some extent the difficulty of obtaining prior matched pairs.
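The seed-selection heuristic can be sketched as follows. This is a hypothetical Python sketch, not the patented procedure: the shortlist size `top_k`, the seed count `num_seeds`, reading "the most friends" as the sum of both accounts' friend counts, and enforcing a one-to-one mapping are all assumptions.

```python
def select_seed_pairs(similarity, degree_a, degree_b, top_k, num_seeds):
    """Hypothetical seed selection: shortlist by attribute similarity, then
    prefer the best-connected pairs, using each account at most once.

    similarity: m x n list of lists of attribute similarities.
    degree_a, degree_b: friend counts of each account on A and on B.
    """
    pairs = [
        (similarity[i][j], i, j)
        for i in range(len(similarity))
        for j in range(len(similarity[i]))
    ]
    # Shortlist the top_k pairs with the highest attribute similarity.
    shortlist = sorted(pairs, reverse=True)[:top_k]
    # Assumption: "most friends" = sum of both accounts' friend counts.
    shortlist.sort(key=lambda p: degree_a[p[1]] + degree_b[p[2]], reverse=True)
    seeds, used_a, used_b = [], set(), set()
    for _, i, j in shortlist:
        if i in used_a or j in used_b:
            continue  # keep the seed mapping one-to-one
        seeds.append((i, j))
        used_a.add(i)
        used_b.add(j)
        if len(seeds) == num_seeds:
            break
    return seeds


sim = [[0.9, 0.1, 0.2], [0.3, 0.95, 0.1], [0.2, 0.1, 0.7]]
print(select_seed_pairs(sim, [5, 1, 3], [4, 2, 3], top_k=3, num_seeds=2))
# [(0, 0), (2, 2)]
```

As the text cautions, seeds chosen this way are not guaranteed to be true matches.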
With reference to Fig. 1, the user entity resolution method based on structural and attribute similarity of the present invention is described in detail below. It specifically comprises the following steps:
S1. Build the attribute similarity matrix and the adjacency matrices
First, from the pairwise attribute similarities between all accounts on social platform A and all accounts on social platform B, build the attribute similarity matrix $S_{m\times n}$; in this example m = n = 7. Table 1 gives the part of the attribute similarity matrix that remains after the prior matched pairs are removed.
Table 1 shows the main part of the attribute similarity matrix.
Table 1
Next, from the structural relations of the two platforms shown in Fig. 2(a) and Fig. 2(b), build the adjacency matrices of platforms A and B, which are respectively:
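Step S1 can be sketched in Python as follows. This is a minimal sketch, assuming 0-based account indices (the example numbers accounts from 1) and friend lists given as pairs. The source does not specify how attribute similarity is computed, so `attribute_similarity` (the share of common attributes with equal values) is a hypothetical stand-in, as are all function names.

```python
def adjacency_matrix(num_accounts, friendships):
    """0/1 symmetric adjacency matrix of one platform (accounts 0..n-1)."""
    adj = [[0] * num_accounts for _ in range(num_accounts)]
    for u, v in friendships:
        adj[u][v] = 1
        adj[v][u] = 1  # friend relations are modeled as undirected
    return adj


def attribute_similarity(profile_a, profile_b):
    """Hypothetical measure: fraction of attributes present in both
    profiles whose values are equal (not specified by the source)."""
    shared = [k for k in profile_a if k in profile_b]
    if not shared:
        return 0.0
    return sum(profile_a[k] == profile_b[k] for k in shared) / len(shared)


def similarity_matrix(profiles_a, profiles_b):
    """S_{m x n}: row i is account i on A, column j is account j on B."""
    return [[attribute_similarity(pa, pb) for pb in profiles_b]
            for pa in profiles_a]


print(adjacency_matrix(3, [(0, 1), (1, 2)]))
# [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Any real deployment would substitute its own per-attribute comparison (for example, string similarity on usernames); the matrix shape is what the later steps rely on.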
S2. Build the incidence matrices
From the adjacency matrices and the prior matched pairs, build the incidence matrices between the unidentified accounts and the identified accounts on social platforms A and B. In this example the prior matched pairs are (1,1) and (2,2), shown as solid nodes in Fig. 2; the incidence matrices are respectively:
S3. Build the common friend matrix
From the incidence matrices and the prior matched pairs, build the common friend matrix of the unidentified accounts on social platforms A and B;
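Steps S2 and S3 amount to slicing each adjacency matrix and taking one matrix product, $F = R_A \times R_B^T$. The Python sketch below assumes 0-based indices and a hypothetical 4-account toy graph (the Fig. 2 edges are not reproduced in the text). The identified accounts are passed in prior-pair order, so column k of $R_A$ and of $R_B$ refer to the same matched pair. The function names are hypothetical.

```python
def incidence_matrix(adj, identified, unidentified):
    """Rows: unidentified accounts; columns: identified accounts, listed in
    the order of the prior matched pairs."""
    return [[adj[u][k] for k in identified] for u in unidentified]


def common_friend_matrix(r_a, r_b):
    """F = R_A x R_B^T: f[i][j] counts prior pairs befriended by both
    unidentified account i on A and unidentified account j on B."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in r_b]
            for row_a in r_a]


# Hypothetical toy platforms with prior pairs (0, 0) and (1, 1).
adj_a = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
adj_b = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
r_a = incidence_matrix(adj_a, [0, 1], [2, 3])
r_b = incidence_matrix(adj_b, [0, 1], [2, 3])
print(common_friend_matrix(r_a, r_b))
# [[2, 1], [1, 0]]
```

Here account 2 on A and account 2 on B are both friends with both seeds, which gives the largest entry, 2.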
S4. From the common friend matrix, select the account pairs formed by the two unidentified accounts corresponding to the largest nonzero element, and store them in the account pair set Q, $Q=\{(i,j)\mid f_{ij}=\max(F^{(m-\tau)\times(n-\tau)})\}$;
S5. From the attribute similarity matrix $S_{m\times n}$, take the attribute similarities of all account pairs in Q and store them in the similarity set $S^*$, $S^*=\{s_{ij}\mid s_{ij}\in S_{m\times n},(i,j)\in Q\}$;
S6. Set the initial threshold. In this example the upper and lower bounds of the threshold are set to 0.8 and 0.2 respectively, so the initial threshold is 0.8. Delete from $S^*$ the elements smaller than the initial threshold, and delete the corresponding account pairs from Q; now Q = {(3,3), (4,4)} and $S^*$ = {0.85, 1};
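Steps S4 to S7 (pick the pairs at the largest common-friend count, keep those that pass the threshold, and drop to the next count if none survive) can be sketched as below. This minimal Python sketch assumes that row r and column c of F map to account IDs through the lists `unid_a` and `unid_b`, and that S is indexed by account ID. The function name and the toy data are hypothetical, since Table 1 and Fig. 2 are not reproduced.

```python
def select_candidates(f, unid_a, unid_b, s, threshold):
    """Steps S4-S7: return {(account_a, account_b): similarity} for the
    surviving pairs at the largest nonzero common-friend count."""
    f = [row[:] for row in f]  # work on a copy; step S7 zeroes entries
    while True:
        best = max((v for row in f for v in row), default=0)
        if best == 0:
            return {}  # no structural evidence left
        q = [(r, c) for r, row in enumerate(f)
             for c, v in enumerate(row) if v == best]
        # S5-S6: keep pairs whose attribute similarity reaches the threshold.
        kept = {}
        for r, c in q:
            i, j = unid_a[r], unid_b[c]
            if s[i][j] >= threshold:
                kept[(i, j)] = s[i][j]
        if kept:
            return kept
        # S7: Q is empty, so zero this maximum and return to S4.
        for r, c in q:
            f[r][c] = 0


s = [[0.0] * 4 for _ in range(4)]
s[2][2], s[3][3], s[2][3] = 0.9, 0.5, 0.85
print(select_candidates([[2, 1], [1, 2]], [2, 3], [2, 3], s, 0.8))
# {(2, 2): 0.9}
```

With $s_{22}$ lowered below 0.8, no pair survives at count 2, so the sketch falls back to count 1 and returns the pair (2, 3).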
S7. Check whether Q is empty. If it is, set the largest nonzero element of the common friend matrix to 0 and return to step S4; otherwise, go to step S8;
S8. Take the largest element $\max(S^*)$ of $S^*$, namely 1. The account pair in Q corresponding to 1 is (4,4), so the accounts $v_4^A$ and $v_4^B$ are marked as successfully matched and added to the result set M of the current iteration;
S9. Delete from Q the pair (4,4) just added to M and every pair that shares an account with (4,4), and delete the corresponding elements from $S^*$; now Q = {(3,3)} and $S^*$ = {0.85};
S10. Check whether Q still contains elements; if it does, return to step S8 and repeat. Here Q still contains (3,3), so steps S8 and S9 are repeated and (3,3) is matched as well. Once Q is empty, the current iteration ends with (3,3) and (4,4) in the result set M, and M is output;
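Steps S8 to S10 form a greedy one-to-one assignment over the surviving candidates. The minimal Python sketch below reproduces the example above from its stated values ($S^*$ = {0.85, 1} for Q = {(3,3), (4,4)}); the function name `resolve_candidates` is hypothetical, and ties are broken by insertion order.

```python
def resolve_candidates(candidates):
    """Steps S8-S10: repeatedly accept the pair with the highest attribute
    similarity and discard every remaining pair sharing an account with it."""
    remaining = dict(candidates)
    matched = []
    while remaining:
        i, j = max(remaining, key=remaining.get)
        matched.append((i, j))
        remaining = {(a, b): v for (a, b), v in remaining.items()
                     if a != i and b != j}
    return matched


print(resolve_candidates({(3, 3): 0.85, (4, 4): 1.0}))
# [(4, 4), (3, 3)]
```

When two candidates share an account, for example (1, 2) and (1, 3), only the more similar of them can be accepted in the same iteration.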
S11. Add the account pairs in M to the prior matched pairs, return to step S2 to rebuild the incidence matrices and the common friend matrix, and carry out the next iteration of the current round by repeating the above steps. When no new results are produced, the current round ends; at this point the three account pairs (3,3), (4,4), and (5,5) have been added to M;
S12. Change the initial threshold and return to step S2 for the next round.
The threshold is modified by:
$th = th_u - \frac{|M_c|}{\min(N_A,N_B)-\tau}\times(th_u-th_l)$
where th is the modified threshold, $th_u$ and $th_l$ are the upper and lower bounds of the initial threshold, $|M_c|$ is the number of matched pairs in the current result set M, $\min(N_A,N_B)$ is the smaller of the numbers of accounts on social platforms A and B, and τ is the number of prior matched pairs.
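The threshold update of step S12 is a linear interpolation from $th_u$ down to $th_l$ as the matched fraction grows. The Python sketch below (the function name `next_threshold` is hypothetical) checks it against the example. Reading $|M_c|$ as all pairs matched so far and τ as the two original prior pairs gives 0.44 after the first round and 0.32 after the second, which is the value the example reports.

```python
def next_threshold(th_upper, th_lower, num_matched, num_a, num_b, num_seeds):
    """th = th_u - |M_c| / (min(N_A, N_B) - tau) * (th_u - th_l)."""
    remaining = min(num_a, num_b) - num_seeds  # accounts that could still match
    return th_upper - num_matched / remaining * (th_upper - th_lower)


# Example values: th_u = 0.8, th_l = 0.2, N_A = N_B = 7, tau = 2.
print(round(next_threshold(0.8, 0.2, 3, 7, 7, 2), 2))  # 0.44
print(round(next_threshold(0.8, 0.2, 4, 7, 7, 2), 2))  # 0.32
```

The threshold reaches the lower bound exactly when $|M_c| = \min(N_A,N_B)-\tau$, that is, when every account on the smaller platform has been matched.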
According to this formula, with $th_u = 0.8$, $th_l = 0.2$, $|M_c| = 3$, $N_A = N_B = 7$ and τ = 2 (the original prior matched pairs), the threshold used in the next round is:
$th = 0.8 - \frac{3}{7-2}\times(0.8-0.2) = 0.44$
The three account pairs in M are added to the prior matched pairs, and the method returns to step S2 and performs the subsequent steps with the new threshold.
The second round adds the account pair (6,6) to the result set, and a further iteration produces no new result. After the threshold is changed to 0.32, a new round still produces no result, so the iteration ends. The final account matching results are the four pairs (3,3), (4,4), (5,5), and (6,6).
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. For those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are within the scope of protection.
Claims (2)
1. A user entity resolution method based on structural and attribute similarity, characterized in that it comprises the following steps:
(1) building the attribute similarity matrix and the adjacency matrices:
according to the pairwise attribute similarities between all accounts on social platform A and all accounts on social platform B, building the attribute similarity matrix $S_{m\times n}$, where m and n are the numbers of accounts on platforms A and B respectively and each element of $S_{m\times n}$ is the attribute similarity between the two corresponding accounts;
according to whether each pair of accounts on social platform A, and each pair of accounts on social platform B, are friends, building the adjacency matrix of platform A (of size $m\times m$) and the adjacency matrix of platform B (of size $n\times n$), where each row and each column of an adjacency matrix represents one account on that platform and each element indicates whether the two corresponding accounts on that platform are friends: the element is 1 if they are friends and 0 otherwise;
(2) building the incidence matrices:
according to the adjacency matrices and the prior matched pairs, building the incidence matrices $R_A^{(m-\tau)\times\tau}$ and $R_B^{(n-\tau)\times\tau}$ between the unidentified accounts and the identified accounts on social platform A and social platform B, where τ is the number of prior matched pairs; each row of an incidence matrix represents an unidentified account, each column represents an identified account, and each element indicates whether the unidentified account and the identified account are friends: the element is 1 if they are friends and 0 otherwise;
(3) building the common friend matrix:
according to the incidence matrices and the prior matched pairs, building the common friend matrix of the unidentified accounts on social platform A and social platform B:
$F^{(m-\tau)\times(n-\tau)} = R_A^{(m-\tau)\times\tau}\times\left(R_B^{(n-\tau)\times\tau}\right)^T$
where $(\cdot)^T$ denotes transposition; each row of the common friend matrix represents an unidentified account $v_i^A$ on social platform A, each column represents an unidentified account $v_j^B$ on social platform B, and the element $f_{ij}$ is the number of common friends of $v_i^A$ and $v_j^B$ among the prior matched pairs;
(4) selecting from the common friend matrix the account pairs formed by the two unidentified accounts corresponding to the largest nonzero element, and storing them in the account pair set Q, $Q=\{(i,j)\mid f_{ij}=\max(F^{(m-\tau)\times(n-\tau)})\}$;
(5) taking from the attribute similarity matrix $S_{m\times n}$ the attribute similarities of all account pairs in Q and storing them in the similarity set $S^*$, $S^*=\{s_{ij}\mid s_{ij}\in S_{m\times n},(i,j)\in Q\}$;
(6) according to a preset initial threshold, deleting from $S^*$ the elements smaller than the initial threshold, and deleting the corresponding account pairs from Q;
(7) determining whether Q is empty: if it is, setting the largest nonzero element of the common friend matrix to 0 and returning to step (4); otherwise, proceeding to step (8);
(8) taking the largest element $\max(S^*)$ of $S^*$, selecting the account pair (i, j) in Q corresponding to $\max(S^*)$, marking the accounts $v_i^A$ and $v_j^B$ as successfully matched, and adding the pair to the result set M of the current iteration;
(9) deleting from Q the account pair (i, j) added to M and every account pair that shares an account with (i, j), and deleting the corresponding elements from $S^*$;
(10) If $Q$ still contains elements, return to step (8); otherwise, output the result set $M$;
(11) Add the account pairs in $M$ to the prior matched pairs and return to step (2) for the next iteration of the current round. The round ends when an iteration outputs no new matched pair in $M$;
(12) Change the initial threshold and return to step (2) to start the next round. Iteration terminates when, after the threshold has been changed, $M$ still outputs no new matched pair; this completes user entity resolution.
2. The user entity resolution method based on structure and attribute similarity according to claim 1, characterized in that the initial threshold is modified as follows:
$$th = th_u - \frac{|M_c|}{\min(N_A, N_B) - \tau} \times \left(th_u - th_l\right)$$
Here:

- $th$ is the modified threshold.
- $th_u$ and $th_l$ are the upper and lower bounds of the initial threshold.
- $|M_c|$ is the number of matched pairs in the current result set $M$.
- $\min(N_A, N_B)$ is the smaller of the numbers of accounts on social platforms A and B.
- $\tau$ is the number of prior matched pairs.
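The update rule in claim 2 interpolates linearly between the two bounds. The threshold is $th_u$ when $|M_c| = 0$ and falls to $th_l$ when $|M_c|$ equals $\min(N_A, N_B) - \tau$, the number of accounts on the smaller platform outside the prior pairs. A minimal Python sketch follows. The function and parameter names are hypothetical, and the check for a non-positive denominator is an addition that the claim does not mention.

```python
def modified_threshold(th_u: float, th_l: float, matched: int,
                       n_a: int, n_b: int, tau: int) -> float:
    """Claim 2: th = th_u - |M_c| / (min(N_A, N_B) - tau) * (th_u - th_l)."""
    # Accounts on the smaller platform that are not part of a prior pair.
    remaining = min(n_a, n_b) - tau
    if remaining <= 0:
        # Hypothetical guard: the formula is undefined when the denominator is not positive.
        raise ValueError("min(N_A, N_B) must be larger than tau")
    # Moves linearly from th_u (no matches) toward th_l (all remaining accounts matched).
    return th_u - matched / remaining * (th_u - th_l)
```

As an illustrative example, take $th_u = 0.9$, $th_l = 0.5$, $|M_c| = 20$, $N_A = 1000$, $N_B = 200$ and $\tau = 100$. The denominator is 100, so $th = 0.9 - 0.2 \times 0.4 = 0.82$.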
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710470266.6A CN107330020B (en) | 2017-06-20 | 2017-06-20 | User entity analysis method based on structure and attribute similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107330020A (en) | 2017-11-07 |
CN107330020B (en) | 2020-03-24 |
Family ID: 60194269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710470266.6A, CN107330020B (en), Active | User entity analysis method based on structure and attribute similarity | 2017-06-20 | 2017-06-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107330020B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1984354A (en) * | 2006-04-13 | 2007-06-20 | 华为技术有限公司 | Method and device for managing user account resource |
US20150051988A1 (en) * | 2013-08-15 | 2015-02-19 | Hui-Min Chen | Detecting marketing opportunities based on shared account characteristics systems and methods |
CN105429999A (en) * | 2015-12-17 | 2016-03-23 | 北京荣之联科技股份有限公司 | Unified identity authentication system based on cloud platform |
CN105741175A (en) * | 2016-01-27 | 2016-07-06 | 电子科技大学 | Method for linking accounts in OSNs (On-line Social Networks) |
CN105933311A (en) * | 2016-04-19 | 2016-09-07 | 安徽电信规划设计有限责任公司 | Account auditing method |
Non-Patent Citations (1)
Title |
---|
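Xin Tao (辛涛): "Research on key issues in Web-based person information search" (基于Web的人物信息搜索关键问题研究), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) * |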
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977979A (en) * | 2017-12-28 | 2019-07-05 | 中国移动通信集团广东有限公司 | Position method, apparatus, electronic equipment and the storage medium of seed user |
CN109977979B (en) * | 2017-12-28 | 2021-12-07 | 中国移动通信集团广东有限公司 | Method and device for locating seed user, electronic equipment and storage medium |
CN109150974A (en) * | 2018-07-19 | 2019-01-04 | 电子科技大学 | A kind of user identity link method based on neighbours' iteration similarity |
GB2588874A (en) * | 2018-07-24 | 2021-05-12 | Ibm | Two level compute memoing for large scale entity resolution |
WO2020021404A1 (en) * | 2018-07-24 | 2020-01-30 | International Business Machines Corporation | Two level compute memoing for large scale entity resolution |
US10776269B2 (en) | 2018-07-24 | 2020-09-15 | International Business Machines Corporation | Two level compute memoing for large scale entity resolution |
CN109978033A (en) * | 2019-03-15 | 2019-07-05 | 第四范式(北京)技术有限公司 | The method and apparatus of the building of biconditional operation people's identification model and biconditional operation people identification |
CN109978033B (en) * | 2019-03-15 | 2020-08-04 | 第四范式(北京)技术有限公司 | Method and device for constructing same-operator recognition model and method and device for identifying same-operator |
CN110222790A (en) * | 2019-06-17 | 2019-09-10 | 南京中孚信息技术有限公司 | Method for identifying ID, device and server |
CN111475738A (en) * | 2020-05-22 | 2020-07-31 | 哈尔滨工程大学 | Heterogeneous social network location anchor link identification method based on meta-path |
CN111475738B (en) * | 2020-05-22 | 2022-05-17 | 哈尔滨工程大学 | Heterogeneous social network location anchor link identification method based on meta-path |
CN113159976A (en) * | 2021-05-13 | 2021-07-23 | 电子科技大学 | Identification method for important users of microblog network |
CN113159976B (en) * | 2021-05-13 | 2022-05-24 | 电子科技大学 | Identification method for important users of microblog network |
Also Published As
Publication number | Publication date |
---|---|
CN107330020B (en) | 2020-03-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107330020A (en) | A kind of user subject analytic method based on structure and attributes similarity | |
CN106250412B (en) | Knowledge mapping construction method based on the fusion of multi-source entity | |
CN102902362B (en) | Character input method and system | |
CN110019647B (en) | Keyword searching method and device and search engine | |
CN107122411B (en) | Collaborative filtering recommendation method based on discrete multi-view Hash | |
CN105631037B (en) | A kind of image search method | |
CN105389329B (en) | A kind of open source software recommended method based on community review | |
CN101496003A (en) | Compatibility scoring of users in a social network | |
TW201317814A (en) | Method and Apparatus of Ranking Search Results, and Search Method and Apparatus | |
CN107145545A (en) | Top k zone users text data recommends method in a kind of location-based social networks | |
CN104978396A (en) | Knowledge database based question and answer generating method and apparatus | |
CN104778210B (en) | A kind of microblogging forwarding tree and forwarding forest construction method | |
CN107665217A (en) | A kind of vocabulary processing method and system for searching service | |
CN105761154B (en) | A kind of socialization recommended method and device | |
CN102163234A (en) | Equipment and method for error correction of query sequence based on degree of error correction association | |
CN112667877A (en) | Scenic spot recommendation method and equipment based on tourist knowledge map | |
CN107368540A (en) | The film that multi-model based on user's self-similarity is combined recommends method | |
CN103377237B (en) | The neighbor search method of high dimensional data and fast approximate image searching method | |
CN109992786A (en) | A kind of semantic sensitive RDF knowledge mapping approximate enquiring method | |
CN106156155A (en) | A kind of method and system that e-book resource is provided | |
CN111177559B (en) | Text travel service recommendation method and device, electronic equipment and storage medium | |
CN105869058B (en) | A kind of method that multilayer latent variable model user portrait extracts | |
CN109284411A (en) | One kind being based on having supervision hypergraph discretized image binary-coding method | |
CN102081666B (en) | Index construction method and device for distributed picture search | |
CN112784590A (en) | Text processing method and device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |