CN110598129A - Cross-social network user identity recognition method based on two-stage information entropy - Google Patents


Info

Publication number
CN110598129A
Authority
CN
China
Prior art keywords
user
social network
users
similarity
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910865901.XA
Other languages
Chinese (zh)
Other versions
CN110598129B (en
Inventor
邢玲
邓凯凯
高建平
吴红海
谢萍
张明川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN201910865901.XA
Publication of CN110598129A
Application granted
Publication of CN110598129B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-social-network user identity recognition method based on two-level information entropy. Profile information and behavior information of the users are crawled from two social networks. Common attributes are screened out of the profile attributes of the two networks, the corresponding data are extracted from each user's profile information, and the similarity of each common attribute is calculated between users of the two networks. Behavioral feature attributes are extracted from each user's behavior information, and their similarity is likewise calculated between users of the two networks. Weights are then assigned on the basis of two-level information entropy, the attribute similarities are weighted to obtain a matching score for each pair of users, and users are matched according to these scores to obtain the user identity recognition result. The weight assignment method based on two-level information entropy addresses the imbalance in weight assignment across a user's multiple attributes and improves user identity recognition performance.

Description

Cross-social network user identity recognition method based on two-stage information entropy
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a cross-social-network user identity recognition method based on two-level information entropy.
Background
Social networks provide people with a rich social service. According to statistics, 42% of users have multiple social network accounts at the same time. Because different social networks have respective unique social modes and bring different social services to users, rich social user information is generated. However, the individual social network accounts are isolated and have no direct connection, so that the social information generated by the user account is distributed over multiple social networks. The identification of the user identity across the social networks refers to identifying virtual accounts belonging to the same real user in different social networks. The technical solution can provide comprehensive user information for network recommendation, user modeling and user behavior analysis, and realize full mining of the multisource social network big data.
The core idea of the existing related research is to utilize user profile information, network topology information and user behavior information to calculate and analyze whether a user account matching pair is the same user. Cross-social network user identification consists essentially of three parts: user data extraction, data similarity calculation and account matching. The user data is extracted by mainly adopting a relatively efficient crawler technology to crawl, clean and store the data. Secondly, the similarity between the user data is calculated by using the extracted data and the similarity function, and the greater the similarity is, the greater the probability that different virtual accounts belong to the same user is. And finally, matching the account numbers by adopting a related matching strategy according to the calculated similarity.
Existing cross-social-network user identification methods fall into three groups. Methods based on user profile information are exposed to forged user data, and users now pay increasing attention to privacy protection, so the recognition performance of these methods is not ideal. Methods based on network topology can obtain a user's friend relationships easily, but the links formed by friend relationships are sparse. Methods based on user behavior data identify users from the content they publish and thereby avoid the limitations of the other two groups. Existing research has also combined user profile information with network structure, but this combination remains subject to the limitations above and does not achieve good recognition performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a cross-social-network user identity recognition method based on two-level information entropy. Its weight assignment method, based on two-level information entropy, addresses the imbalance in weight assignment across a user's multiple attributes and improves user identity recognition performance.
In order to achieve the purpose, the cross-social network user identity recognition method based on the two-level information entropy comprises the following steps:
S1: Crawl the profile information and behavior information of the users of social network A and social network B, and record the numbers of users in the two social networks as $N_A$ and $N_B$;
S2: Screen out the common attributes from the profile attributes of the two social networks, extract the data corresponding to the common attributes from each user's profile information, and calculate the similarity of each common attribute $m$ between each user $i$ in social network A and each user $j$ in social network B, where $i = 1, 2, \ldots, N_A$, $j = 1, 2, \ldots, N_B$, $m = 1, 2, \ldots, M$, and $M$ denotes the number of common attributes;
S3: Extract the data of $N$ preset feature attributes from each user's behavior information, and calculate the similarity of each feature attribute $n$ between each user $i$ in social network A and each user $j$ in social network B, where $n = 1, 2, \ldots, N$;
S4: Combine the data of the $M$ common attributes of all users extracted from the profile information and the data of the $N$ feature attributes of all users extracted from the behavior information into data of $H$ attributes, where $H = M + N$; determine the weights of the $H$ attributes by the entropy weight method and take them as the primary weights $z_h$, $h = 1, 2, \ldots, H$;
Calculate the normalized contribution probability $P_h$ of each attribute;
Construct the variant weight $R_h$ based on information entropy, using the entropy term
$E(P_h) = -P_h \log P_h$
Calculate the attribute weight $W_h$ based on two-level information entropy;
S5: Using the attribute weights $W_h$ obtained in step S4, calculate the weighted sum of the similarities of the $H$ attributes between each user $i$ in social network A and each user $j$ in social network B as their matching score $Score_{i,j}$;
S6: Match the users of the two social networks according to the matching scores $Score_{i,j}$ of each user $i$ in social network A and each user $j$ in social network B to obtain the user identity recognition result.
The invention discloses a cross-social-network user identity recognition method based on two-level information entropy. Profile information and behavior information of the users are crawled from two social networks. Common attributes are screened out of the profile attributes of the two networks, the corresponding data are extracted from each user's profile information, and the similarity of each common attribute is calculated between users of the two networks. Behavioral feature attributes are extracted from each user's behavior information, and their similarity is likewise calculated between users of the two networks. Weights are then assigned on the basis of two-level information entropy, the attribute similarities are weighted to obtain a matching score for each pair of users, and users are matched according to these scores to obtain the user identity recognition result.
The invention combines the two types of information most relevant to a user, namely user profile information and user behavior information, so that the calculated similarity is more accurate. Weights are assigned on the basis of two-level information entropy, which addresses the imbalance in weight assignment across a user's multiple attributes, improves the accuracy of the user matching scores, and thereby improves user identity recognition performance.
Drawings
FIG. 1 is a flowchart of an embodiment of a cross-social-network user identity recognition method based on two levels of information entropy;
FIG. 2 is a flowchart of a method for calculating the similarity of common attributes in the present embodiment;
FIG. 3 is a flowchart of a text information feature extraction calculation method based on frequent pattern mining in the present embodiment;
FIG. 4 is a graph comparing the precision of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 5 is a graph comparing the recall of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 6 is a graph comparing the F1 scores of the weight assignment method of the present invention with those of the comparison methods in this embodiment;
FIG. 7 is a graph comparing the AUC of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 8 is a chart comparing four evaluation metrics of the user identity recognition method of the present invention with those of two comparison methods.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they would obscure the subject matter of the invention.
Examples
FIG. 1 is a flowchart of a specific embodiment of a cross-social-network user identity recognition method based on two levels of information entropy. As shown in fig. 1, the method for identifying the user identity across the social network based on the two-level information entropy includes the following specific steps:
S101: Acquiring user data:
The profile information and behavior information of the users are crawled from social network A and social network B, and the numbers of users in the two social networks are recorded as $N_A$ and $N_B$. In general, one may set $N_A = N_B$.
S102: Calculating the similarity of the user profile information:
Common attributes are screened out from the profile attributes of the two social networks, the data corresponding to the common attributes are extracted from each user's profile information, and the similarity of each common attribute $m$ between each user $i$ in social network A and each user $j$ in social network B is calculated, where $i = 1, 2, \ldots, N_A$, $j = 1, 2, \ldots, N_B$, $m = 1, 2, \ldots, M$, and $M$ denotes the number of common attributes.
The user profile information includes a number of common attributes (17 in this embodiment), and the data format of each common attribute may differ, so the way the similarity of a common attribute is calculated must be chosen according to the actual situation. FIG. 2 is a flowchart of the method for calculating the similarity of common attributes in this embodiment. As shown in FIG. 2, the similarity of a common attribute is calculated in this embodiment as follows:
S201: First, judge whether the m-th common attribute is a preset key attribute. A key attribute is an attribute whose data must be consistent for two users to be considered similar; for example, the gender information of two users must both be "male" or both be "female" for the two users to be similar. If the attribute is a key attribute, proceed to step S202; otherwise, proceed to step S203.
S202: Determining similarity based on consistency:
Judge whether the m-th common attributes of the two users are consistent. If so, the similarity of the common attribute is 1; otherwise it is 0.
S203: Judge whether the data of the m-th common attribute can be vectorized. If so, proceed to step S204; otherwise, proceed to step S205.
S204: Determining similarity based on cosine similarity:
The data of the m-th common attribute of the two users are vectorized, the cosine similarity between the two vectors is calculated, and this cosine similarity is taken as the similarity of the m-th common attribute of the two users. The cosine similarity is calculated as follows:
$$\cos(A, B) = \frac{\sum_{q=1}^{Q} A_q B_q}{\sqrt{\sum_{q=1}^{Q} A_q^2}\,\sqrt{\sum_{q=1}^{Q} B_q^2}}$$
where $A$ and $B$ denote the vectors formed from the two data items, $A_q$ and $B_q$ denote the $q$-th components of vectors $A$ and $B$ respectively, $q = 1, 2, \ldots, Q$, and $Q$ denotes the vector dimension.
S205: Determining similarity based on the Dice coefficient:
The data of the m-th common attribute of the two users are treated as character strings, the Dice coefficient between the two strings is calculated, and this Dice coefficient is taken as the similarity of the m-th common attribute of the two users. The Dice coefficient is calculated as follows:
$$\mathrm{Dice}(a, b) = \frac{2 \times \mathrm{comm}(a, b)}{\mathrm{len}(a) + \mathrm{len}(b)}$$
where $a$ and $b$ denote the two character strings, $\mathrm{comm}(a, b)$ denotes the number of identical characters in $a$ and $b$, and $\mathrm{len}(\cdot)$ denotes the length of a character string.
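Steps S201 to S205 can be sketched in Python as a small dispatch over the three similarity rules. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `is_key` and `vectorize` parameters, the use of 1 and 0 for matching and non-matching key attributes, and the counting of "identical characters" as a multiset overlap are all choices made here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def dice_coefficient(a, b):
    """Dice coefficient of two strings: 2 * comm / (len(a) + len(b))."""
    if not a and not b:
        return 1.0  # assumption: two empty strings count as identical
    # assumption: identical characters are counted as a multiset overlap
    common = sum((Counter(a) & Counter(b)).values())
    return 2 * common / (len(a) + len(b))


def common_attribute_similarity(value_a, value_b, is_key=False, vectorize=None):
    """Pick the similarity rule for one common attribute (S201 to S205)."""
    if is_key:
        # S202: key attributes must be consistent (assumed 1 for match, 0 otherwise)
        return 1.0 if value_a == value_b else 0.0
    if vectorize is not None:
        # S204: attribute data that can be vectorized use cosine similarity
        return cosine_similarity(vectorize(value_a), vectorize(value_b))
    # S205: everything else is compared as character strings
    return dice_coefficient(str(value_a), str(value_b))
```

For example, `common_attribute_similarity("night", "nacht")` falls through to the Dice branch: the strings share the characters n, h and t, so the result is 2 × 3 / (5 + 5).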
S103: Calculating the similarity of user behaviors:
The data of $N$ preset feature attributes are extracted from each user's behavior information, and the similarity of each feature attribute $n$ between each user $i$ in social network A and each user $j$ in social network B is calculated, where $n = 1, 2, \ldots, N$.
As for the feature attributes, the category of the feature attributes may be determined according to actual needs, and in the embodiment of the present invention, three feature attributes are adopted: text information features, punctuation features, and state timestamp features. The similarity calculation methods for the three behavior feature attributes are described below.
Text information feature:
First, the text information features of each user are extracted by frequent pattern mining, yielding a number of frequent items and the support count of each frequent item. The text information feature similarity of the two users is then calculated from these support counts.
In the similarity formula, $F$ denotes a frequent item, the support counts of frequent item $F$ are taken for user $i$ in social network A and user $j$ in social network B respectively, and $C_F$ denotes the number of item sets of frequent item $F$. The "1" added in the formula is intended to avoid the influence of high-frequency items.
Fig. 3 is a flowchart of a text information feature extraction calculation method based on frequent pattern mining in this embodiment. As shown in fig. 3, the text information feature extraction method based on frequent pattern mining in this embodiment includes the specific steps of:
S301: Text word segmentation:
Each piece of text published by the user is segmented into words, each word obtained is taken as a transaction item, and the transaction set $T$ is built from all the text published by the user.
S302: Acquiring the frequent 1-item sets:
All items in the transaction set $T$ are traversed and their supports are calculated to form the 1-item sets $C_1$. Item sets that do not meet the preset minimum support for 1-item sets are filtered out to obtain the frequent 1-item sets $L_1$. In this embodiment, the minimum support for 1-item sets is set to 2. Let the item-count parameter $k = 1$.
S303: Generating the frequent (k+1)-item sets:
The frequent k-item sets $L_k$ are joined with themselves (the item sets within $L_k$ are merged with one another) to obtain the (k+1)-item sets $C_{k+1}$. Item sets that do not meet the preset minimum support for (k+1)-item sets are filtered out to obtain the frequent (k+1)-item sets $L_{k+1}$.
S304: Judge whether $L_{k+1}$ is empty. If it is empty, none of the current (k+1)-item sets in $C_{k+1}$ meets the minimum support, item set generation is finished, and the process proceeds to step S306; otherwise, it proceeds to step S305.
S305: Let $k = k + 1$ and return to step S303.
S306: Determining text information features:
The frequent items corresponding to the text published by the current user are obtained, together with the support count of each frequent item.
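The loop in steps S302 to S306 follows the familiar level-wise (Apriori-style) pattern, which can be sketched as below. This is an illustrative sketch only: it assumes the posts have already been segmented into word lists, uses a single `min_support` for every item-set size (the text allows a separate preset value per size), and the function name is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=2):
    """Return {frozenset(items): support_count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Support count = number of transactions that contain the item set
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # S302: build the 1-item sets and keep the frequent ones (L_1)
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})
    result = dict(current)
    k = 1
    # S303 to S305: join L_k with itself until L_{k+1} is empty
    while current:
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k + 1}
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    # S306: frequent items with their support counts
    return result
```

Called on `[["coffee", "morning", "run"], ["coffee", "morning"], ["coffee", "run"], ["tea"]]`, the sketch keeps the three frequent single words and the two frequent pairs, and drops "tea" and the triple because each appears only once.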
Punctuation features:
From the text published by user $i$ in social network A and by user $j$ in social network B, the proportion of each punctuation mark's usage count in the total number of punctuation marks is computed to form a punctuation vector for each user. The similarity between the two vectors is taken as the punctuation similarity.
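A punctuation vector of this kind can be sketched as follows. The text does not say which vector similarity is used, so cosine similarity is assumed here; the mark set `PUNCTUATION` and the function names are likewise hypothetical choices for illustration.

```python
import math
from collections import Counter

PUNCTUATION = list(".,!?;:")  # hypothetical mark set, chosen for illustration


def punctuation_vector(texts, marks=PUNCTUATION):
    """Share of each punctuation mark among all punctuation marks a user typed."""
    counts = Counter(ch for text in texts for ch in text if ch in marks)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(marks)
    return [counts[m] / total for m in marks]


def punctuation_similarity(texts_a, texts_b, marks=PUNCTUATION):
    """Assumed measure: cosine similarity between the two punctuation vectors."""
    u = punctuation_vector(texts_a, marks)
    v = punctuation_vector(texts_b, marks)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Because the vector holds proportions rather than raw counts, a prolific account and a quiet account with the same punctuation habits produce the same vector.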
State timestamp feature:
Each day is divided into $G$ time periods, and the average number of status updates of each user in each time period over a preset span of dates is counted. The state timestamp similarity of user $i$ in social network A and user $j$ in social network B is then calculated from these averages, where $V_i^A(g)$ and $V_j^B(g)$ denote the average numbers of status updates of user $i$ in social network A and user $j$ in social network B in the $g$-th time period, respectively, and $|\cdot|$ denotes the absolute value.
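The per-period averages that feed the timestamp similarity can be computed as in the sketch below. It covers only the averaging step; equal-length periods, the `num_days` parameter for the length of the preset date span, and the function name are assumptions made for illustration.

```python
from datetime import datetime


def average_activity(timestamps, periods, num_days):
    """Average number of status updates per day in each of `periods` equal slots."""
    counts = [0] * periods
    for ts in timestamps:
        minute_of_day = ts.hour * 60 + ts.minute
        # Map the minute of the day onto one of the equal-length periods
        counts[minute_of_day * periods // (24 * 60)] += 1
    return [c / num_days for c in counts]
```

With `periods=4` the day is split into six-hour slots, so posts at 01:00, 13:00 and 13:30 across two days give one update in the first slot and two in the third before averaging.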
S104: Weight assignment based on two-level information entropy:
To fuse all the similarities obtained above, a weight must be assigned to each attribute. To make the weights more reasonable, the invention provides a weight assignment method based on two-level information entropy, as follows:
The data of the $M$ common attributes of all users extracted from the profile information and the data of the $N$ feature attributes of all users extracted from the behavior information are combined into data of $H$ attributes, where $H = M + N$. The weights of the $H$ attributes are then determined by the entropy weight method and taken as the primary weights $z_h$, $h = 1, 2, \ldots, H$.
The basic idea of the entropy weight method is that the greater the degree of variation of an index, the larger its weight. The concept of information entropy can therefore be used to solve the weight assignment problem in user identification. The specific form of the entropy weight method may be chosen as required. In this embodiment, the information entropy $E_h$ of each attribute is calculated first, the posterior probability $p(y_x \mid x)$ of each attribute of the user is then obtained, and the primary weight of the attribute is calculated as $z_h = p(y_x \mid x) \times E_h$. In this way, the influence of each attribute on user identity recognition performance can be captured more accurately.
The output of Softmax characterizes the relative probabilities of different classes, so the invention uses the idea of Softmax to perform a secondary weight assignment on the attributes of the user. After the primary weights of the user attributes are obtained, the weights of all attributes are combined into an array $Z = (z_1, z_2, \ldots, z_H)$ as input, and the normalized contribution probability $P_h$ of each attribute is obtained using the idea of Softmax:
$$P_h = \frac{e^{z_h}}{\sum_{h'=1}^{H} e^{z_{h'}}}$$
where $P_h$ denotes the normalized contribution probability of the $h$-th attribute, its value lies in $[0, 1]$, $\sum_h P_h = 1$, and $e$ denotes the natural constant.
The concept of information entropy is then applied again to construct a variant weight $R_h$, using the entropy term
$$E(P_h) = -P_h \log P_h$$
Finally, the user attribute weights based on two-level information entropy, namely the attribute weights $W_h$, are obtained.
When weights are assigned to the attribute items of the user and the variance of the weights produced by different attribute weight assignment methods is calculated, the weights assigned by the method of the invention are clearly more discriminative.
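The Softmax normalization of the primary weights and the entropy term $E(P_h) = -P_h \log P_h$ can be sketched as below. The sketch stops at the entropy term and does not attempt the variant weight $R_h$ or the final weight $W_h$. The natural logarithm, the convention $E(0) = 0$, the max-subtraction for numerical stability, and the function names are assumptions made here.

```python
import math


def softmax(z):
    """Normalized contribution probabilities P_h from primary weights z_h."""
    m = max(z)  # subtracting the maximum leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_term(p):
    """E(P_h) = -P_h * log(P_h), assuming the natural log and E(0) = 0."""
    return -p * math.log(p) if p > 0 else 0.0
```

For instance, `softmax([1.0, 2.0, 3.0])` preserves the ordering of the primary weights while making them sum to 1, and `entropy_term` can then be applied to each resulting value.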
S105: Similarity fusion:
Using the attribute weights $W_h$ obtained in step S104, the weighted sum of the similarities of the $H$ attributes between each user $i$ in social network A and each user $j$ in social network B is calculated as their matching score $Score_{i,j}$. In this sum, $W_h$ denotes the weight of the $h$-th of the $H$ attributes and multiplies the similarity of the $h$-th attribute between user $i$ in social network A and user $j$ in social network B.
The matching score $Score_{i,j}$ is used to determine whether the two social accounts belong to the same physical user.
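The fusion step is a plain weighted sum, sketched below for one user pair and for the full score table. The data layout (`sims[i][j]` holding the list of H attribute similarities for the pair) and the function names are assumptions for illustration.

```python
def matching_score(weights, similarities):
    """Weighted sum of the H attribute similarities for one user pair (i, j)."""
    if len(weights) != len(similarities):
        raise ValueError("weights and similarities must have the same length")
    return sum(w * s for w, s in zip(weights, similarities))


def score_matrix(weights, sims):
    """sims[i][j] is the list of H attribute similarities for users i and j."""
    return [[matching_score(weights, s_ij) for s_ij in row] for row in sims]
```

With weights `[0.5, 0.3, 0.2]` and similarities `[1.0, 0.5, 0.0]`, the score is 0.5 × 1.0 + 0.3 × 0.5 + 0.2 × 0.0.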
S106: Matching users:
The users of the two social networks are matched according to the matching scores $Score_{i,j}$ of each user $i$ in social network A and each user $j$ in social network B to obtain the user identity recognition result.
In this embodiment, a bidirectional stable marriage matching algorithm is used for user matching, as follows. Users $i$ in social network A are selected in turn, and the set $\lambda_i$ of candidate users for user $i$ is initialized to the set of all users in social network B. The user $j$ with the highest matching score with user $i$ is selected from $\lambda_i$. If user $j$ has not been matched with another user in social network A, user $j$ is matched with user $i$. If user $j$ is already matched with another user $i'$ in social network A, the two scores are compared: if the matching score of users $i$ and $j$ is higher than that of users $i'$ and $j$, users $i$ and $j$ are matched and the match of user $i'$ is deleted; otherwise, user $j$ is deleted from $\lambda_i$ and the user with the highest matching score with user $i$ is selected again from the reduced set $\lambda_i$. This continues until the matching user of user $i$ in social network B is determined.
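The matching procedure can be sketched as a proposal loop in the style of stable marriage matching. The text does not say what happens to a displaced user i' after its match is deleted; this sketch assumes i' is put back in the queue and searches again among its remaining candidates. The function name, the index-based user identifiers, and the tie-break toward the lowest index are also assumptions.

```python
from collections import deque


def stable_match(score):
    """score[i][j] is the matching score of user i in A and user j in B.

    Returns a dict {i: j} of matched pairs.
    """
    n_a = len(score)
    n_b = len(score[0]) if score else 0
    candidates = [set(range(n_b)) for _ in range(n_a)]  # lambda_i for each i
    partner_of_b = {}  # j -> currently matched i
    queue = deque(range(n_a))
    while queue:
        i = queue.popleft()
        while candidates[i]:
            # Best remaining candidate for i (ties go to the lowest index)
            j = max(sorted(candidates[i]), key=lambda b: score[i][b])
            current = partner_of_b.get(j)
            if current is None:
                partner_of_b[j] = i
                break
            if score[i][j] > score[current][j]:
                # i displaces i'; assumption: i' is re-queued to look elsewhere
                partner_of_b[j] = i
                candidates[current].discard(j)
                queue.append(current)
                break
            candidates[i].discard(j)
    return {i: j for j, i in partner_of_b.items()}
```

On `[[0.6, 0.5], [0.9, 0.1]]`, user 0 first takes user 0 of network B, is then displaced by user 1 (0.9 beats 0.6), and falls back to its next-best candidate.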
To illustrate the technical effects of the invention, it was verified experimentally on a specific example. In the experiments, user data from two social networks, Facebook and Twitter, were used for cross-social-network user identity recognition, with precision, recall, F-measure (F1) and AUC (area under the curve) as the evaluation metrics.
AUC is the area under the ROC curve, which takes the false positive rate (FPR) as the X-axis and the true positive rate (TPR) as the Y-axis. Because the results of the invention fall into two classes, namely the same physical user and different physical users, AUC can also be used to evaluate the quality of the recognition results.
Here, TP denotes matching pairs predicted positive that are actually positive, TN denotes matching pairs predicted negative that are actually negative, FP denotes matching pairs predicted positive that are actually negative, and FN denotes matching pairs predicted negative that are actually positive.
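Given these counts, precision, recall and F1 follow from their standard definitions, sketched below. The function name and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from matching-pair counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.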
To illustrate the effectiveness of the weight assignment method based on two-level information entropy (TIW), it was compared with two other methods: an empirical-probability-based weight assignment method (EW) and a posterior-probability-based weight assignment method (PW). FIG. 4 compares the precision of the weight assignment method of the invention with that of the comparison methods in this embodiment, FIG. 5 compares recall, FIG. 6 compares F1 scores, and FIG. 7 compares AUC. As can be seen from FIGS. 4 to 7, the invention outperforms both comparison methods on every evaluation metric. As the number of users increases, the metrics of all three methods decrease to some extent, because with more user accounts there are more cases in which accounts are highly similar but do not belong to the same physical user, and such cases degrade the final matching result. The metrics of the invention decrease slowly, whereas those of the two comparison methods decrease relatively fast. The method therefore performs better than the two comparison methods in cross-social-network user identification.
Next, the user identity recognition method of the invention (TIW-UI), which combines weight assignment based on two-level information entropy with user matching based on the bidirectional stable marriage matching algorithm, was compared with a random forest confirmation algorithm based on stable marriage matching (RFCA-SMM) and a ranking-based cross matching method (RCM). FIG. 8 compares four evaluation metrics of the user identity recognition method of the invention with those of the two comparison methods. As shown in FIG. 8, the invention is superior to RFCA-SMM and RCM in precision, recall, F1 score and AUC, which also demonstrates its effectiveness.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Various changes that are apparent to those skilled in the art and fall within the spirit and scope of the invention as defined by the appended claims are permitted, and all inventions that make use of the inventive concept are protected.

Claims (4)

1. A cross-social network user identity recognition method based on two-stage information entropy, characterized by comprising the following steps:
S1: crawling the profile information and the behavior information of the users from social network A and social network B respectively, and recording the numbers of users $N_A$ and $N_B$ in the two social networks;
S2: screening out the common attributes from the profile information attributes of the two social networks, extracting the data corresponding to the common attributes from the profile information of each user, and then calculating the similarity of each common attribute between each user i in social network A and each user j in social network B, where i = 1, 2, …, $N_A$, j = 1, 2, …, $N_B$, m = 1, 2, …, M, and M denotes the number of common attributes;
S3: extracting the data of N preset characteristic attributes from the behavior information of each user, and then calculating the similarity of each characteristic attribute between each user i in social network A and each user j in social network B;
S4: integrating the data of the M common attributes of all users extracted from the profile information and the data of the N characteristic attributes of all users extracted from the behavior information into data of H attributes, where H = M + N; then determining the weights of the H attributes by the entropy weight method and taking them as the primary weights $z_h$, h = 1, 2, …, H;
calculating the normalized contribution probability $P_h$ of each attribute;
constructing the variant weight $R_h$ based on information entropy, where
$E(P_h) = -P_h \log P_h$;
calculating the attribute weight $W_h$ based on two-stage information entropy;
S5: using the attribute weights $W_h$ obtained in step S4, calculating the weighted sum of the similarities of the H attributes between each user i in social network A and each user j in social network B as the matching score $\mathrm{score}_{i,j}$ of user i and user j;
S6: matching the users in the two social networks according to the matching scores $\mathrm{score}_{i,j}$ of each user i in social network A and each user j in social network B to obtain the user identity recognition result.
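As an illustration of steps S4 and S5, the following Python sketch computes primary weights with the standard entropy weight method and then forms the weighted-sum matching scores. It is a minimal sketch under stated assumptions, not the patented procedure: the function names `entropy_weights` and `match_scores`, the layout of the input matrices, the use of the natural logarithm, and the handling of all-zero columns are all assumptions. The second-stage quantities $P_h$, $R_h$ and $W_h$ are not derived here; `match_scores` simply accepts whatever final weights are supplied.

```python
import numpy as np


def entropy_weights(X):
    """Standard entropy weight method (assumed form of the primary weights z_h).

    X is a non-negative matrix with one row per sample and one column per
    attribute (shape: samples x H, with at least two samples).
    """
    X = np.asarray(X, dtype=float)
    n, H = X.shape
    col_sums = X.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-wise proportions; an all-zero column is treated as uniform (assumption).
    P = np.where(col_sums > 0, X / safe, 1.0 / n)
    # Entropy of each attribute, taking 0 * log 0 as 0.
    logP = np.log(P, where=P > 0, out=np.zeros_like(P))
    e = -(P * logP).sum(axis=0) / np.log(n)
    # Degree of divergence; clipped to guard against rounding below zero.
    d = np.clip(1.0 - e, 0.0, None)
    if np.isclose(d.sum(), 0.0):
        return np.full(H, 1.0 / H)  # no attribute is informative: equal weights
    return d / d.sum()


def match_scores(sim, w):
    """Weighted sum over attributes (step S5).

    sim has shape (N_A, N_B, H): similarity of attribute h between user i in A
    and user j in B. w has shape (H,). Returns an (N_A, N_B) score matrix.
    """
    return np.asarray(sim, dtype=float) @ np.asarray(w, dtype=float)
```

A column whose values are identical across samples carries no discriminating information and receives a weight close to zero, which is the behavior the entropy weight method is designed to produce.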
2. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the similarity of the common attributes is calculated as follows:
S2.1: first judging whether the m-th common attribute is a preset key attribute; if so, proceeding to step S2.2, otherwise proceeding to step S2.3;
S2.2: judging whether the m-th common attributes of the two users are consistent, and setting the similarity of the common attribute according to the result;
S2.3: judging whether the data of the m-th common attribute can be vectorized; if so, proceeding to step S2.4, otherwise proceeding to step S2.5;
S2.4: vectorizing the data of the m-th common attributes of the two users, calculating the cosine similarity between the two vectors, and taking the cosine similarity as the similarity of the m-th common attributes of the two users;
S2.5: treating the data of the m-th common attributes of the two users as character strings, calculating the Dice coefficient between the two strings, and taking the Dice coefficient as the similarity of the m-th common attributes of the two users.
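The branching in steps S2.1 to S2.5 can be sketched in Python as follows. This is a hedged illustration only: the function names, the `is_key` flag, the optional `vectorize` callback, the values 1.0 and 0.0 for matching and non-matching key attributes, and the choice of character bigrams for the Dice coefficient are all assumptions, since the claim does not fix them.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dice_coefficient(s, t):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over character bigrams (assumed tokenization)."""
    a = Counter(s[k:k + 2] for k in range(len(s) - 1))
    b = Counter(t[k:k + 2] for k in range(len(t) - 1))
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Strings too short to form bigrams: fall back to exact comparison.
        return 1.0 if s == t else 0.0
    overlap = sum((a & b).values())
    return 2.0 * overlap / total


def common_attribute_similarity(x, y, is_key=False, vectorize=None):
    """Similarity of one common attribute for two users."""
    if is_key:
        # S2.1 -> S2.2: key attributes are compared for consistency.
        # The values 1.0 / 0.0 are assumed for illustration.
        return 1.0 if x == y else 0.0
    if vectorize is not None:
        # S2.3 -> S2.4: vectorizable data uses cosine similarity.
        return cosine_similarity(vectorize(x), vectorize(y))
    # S2.5: everything else is compared as strings.
    return dice_coefficient(str(x), str(y))
```

For example, `dice_coefficient("night", "nacht")` shares one bigram (`ht`) out of eight in total, giving 0.25.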
3. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the characteristic attributes in step S3 include a text information feature, a punctuation mark feature and a state timestamp feature, whose similarities are calculated as follows:
for the text information feature, first extracting the text information features of each user by frequent pattern mining to obtain a number of frequent items and their corresponding support counts, and then calculating the text information feature similarity of the two users by a formula in which F denotes a frequent item, the support counts are those of the frequent item F for user i in social network A and for user j in social network B respectively, and $C_F$ denotes the number of item sets of the frequent item F;
for the punctuation mark feature, counting, from the text information posted by user i in social network A and by user j in social network B, the proportion of the number of uses of each punctuation mark in the total number of punctuation marks to form punctuation mark vectors, and calculating the similarity between the two vectors as the punctuation mark similarity;
for the state timestamp feature, dividing each day into G time periods, counting the average number of status updates of each user in each time period over a preset range of dates, and calculating the state timestamp similarity of user i in social network A and user j in social network B by a formula in which $V_i^A(g)$ and $V_j^B(g)$ denote the average numbers of status updates of user i in social network A and of user j in social network B, respectively, in the g-th time period.
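The feature construction for the punctuation mark and state timestamp features can be sketched in Python as below. This is only an illustration under assumptions: the mark set `PUNCTUATION`, the function names, and the default of G = 24 equal periods are hypothetical, and the sketch stops at building the vectors because the similarity formulas themselves are not reproduced in the claim text.

```python
from collections import Counter
from datetime import datetime

# Hypothetical set of punctuation marks to track; the claim does not list them.
PUNCTUATION = ",.!?;:"


def punctuation_vector(texts, marks=PUNCTUATION):
    """Share of each punctuation mark among all tracked marks in a user's texts."""
    counts = Counter(ch for text in texts for ch in text if ch in marks)
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in marks]


def activity_profile(timestamps, G=24, days=None):
    """Average number of posts in each of G equal daily periods.

    timestamps is a list of datetime objects; days is the length of the
    preset date range (assumed to be the number of distinct dates if omitted).
    """
    if days is None:
        days = len({ts.date() for ts in timestamps}) or 1
    profile = [0.0] * G
    for ts in timestamps:
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        profile[seconds * G // 86400] += 1  # index of the period within the day
    return [count / days for count in profile]
```

Two users' punctuation vectors have the same length and ordering, so they can be passed directly to any vector similarity measure; the same holds for two activity profiles built with the same G.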
4. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the user matching in step S6 adopts a bidirectional stable marriage matching algorithm, specifically as follows: selecting the users i in social network A in turn, and initializing the set $\Lambda_i$ of users to be matched with user i as the set of all users in social network B; screening out, from the set $\Lambda_i$, the user j with the highest matching score with user i; if user j has not been matched with any other user in social network A, matching user j with user i; if user j has already been matched with another user i' in social network A, then, if the matching score of user i and user j is higher than that of user i' and user j, matching user i with user j and deleting the matching result of user i'; otherwise, deleting user j from the set $\Lambda_i$ and re-screening, from the reduced set $\Lambda_i$, the user with the highest matching score with user i, until the matching user of user i in social network B is determined.
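The matching procedure of claim 4 can be sketched in Python as follows. It is a minimal sketch, not a definitive implementation: the function name `match_users` and the score-matrix layout are assumptions, and so is the handling of a displaced user i', who is placed back in the queue to try the remaining candidates (the claim only states that the matching result of i' is deleted). A user whose candidate set runs out stays unmatched.

```python
from collections import deque


def match_users(score):
    """Greedy stable matching on a score matrix.

    score[i][j] is the matching score of user i in network A and user j in
    network B. Returns a dict mapping matched users i in A to users j in B.
    """
    n_a = len(score)
    n_b = len(score[0]) if n_a else 0
    # candidates[i] plays the role of the set Lambda_i in the claim.
    candidates = [set(range(n_b)) for _ in range(n_a)]
    partner_of_b = {}  # j -> currently matched i
    queue = deque(range(n_a))
    while queue:
        i = queue.popleft()
        while candidates[i]:
            # Best remaining candidate for user i.
            j = max(candidates[i], key=lambda b: score[i][b])
            candidates[i].discard(j)
            rival = partner_of_b.get(j)
            if rival is None:
                partner_of_b[j] = i  # j is free: match directly
                break
            if score[i][j] > score[rival][j]:
                partner_of_b[j] = i  # i displaces the current partner
                queue.append(rival)  # assumption: displaced user retries later
                break
            # Otherwise j stays with its partner and i tries the next candidate.
    return {i: j for j, i in partner_of_b.items()}
```

Each pair (i, j) is examined at most once, so the loop terminates after at most $N_A \cdot N_B$ comparisons.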
CN201910865901.XA 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy Active CN110598129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865901.XA CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910865901.XA CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Publications (2)

Publication Number Publication Date
CN110598129A (en) 2019-12-20
CN110598129B (en) 2022-10-18

Family

ID=68859245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865901.XA Active CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Country Status (1)

Country Link
CN (1) CN110598129B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110178943A1 (en) * 2009-12-17 2011-07-21 New Jersey Institute Of Technology Systems and Methods For Anonymity Protection
CN108846422A (en) * 2018-05-28 2018-11-20 People's Public Security University of China Account relating method and system across social networks
CN108897789A (en) * 2018-06-11 2018-11-27 Southwest University of Science and Technology A kind of cross-platform social network user personal identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
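Kaikai Deng et al.: "A User Identification Algorithm Based on User Behavior Analysis in Social Networks", IEEE Access *
Xing Xiao et al.: "Protecting the squeezing of a two-level system by detuning in non-Markovian environments", Physica Scripta *
Wu Zheng et al.: "Cross-social network user identity recognition method based on information entropy", Journal of Computer Applications *
Wang Dayang et al.: "Evaluation of the strictest water resources management based on comprehensive-weight variable fuzzy sets", Pearl River *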

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242218A (en) * 2020-01-13 2020-06-05 Henan University of Science and Technology Cross-social network user identity recognition method fusing user multi-attribute information
CN111242218B (en) * 2020-01-13 2023-04-07 Henan University of Science and Technology Cross-social network user identity recognition method fusing user multi-attribute information
CN111931023A (en) * 2020-07-01 2020-11-13 Northwestern Polytechnical University Community structure identification method and device based on network embedding
CN114610958A (en) * 2022-05-10 2022-06-10 Shanghai Feiqi Network Technology Co., Ltd. Processing method and device of transmission resources and electronic equipment
CN114610958B (en) * 2022-05-10 2022-08-30 Shanghai Feiqi Network Technology Co., Ltd. Processing method and device of transmission resources and electronic equipment
CN115048563A (en) * 2022-08-15 2022-09-13 30th Research Institute of China Electronics Technology Group Corporation Cross-social-network user identity matching method, medium and device based on entropy weight method

Also Published As

Publication number Publication date
CN110598129B (en) 2022-10-18

Similar Documents

Publication Publication Date Title
CN110598129B (en) Cross-social network user identity recognition method based on two-stage information entropy
CN103793484B (en) The fraud identifying system based on machine learning in classification information website
CN108897789B (en) Cross-platform social network user identity identification method
CN108898479B (en) Credit evaluation model construction method and device
CN107066616A (en) Method, device and electronic equipment for account processing
CN108833139B (en) OSSEC alarm data aggregation method based on category attribute division
CN109034194A (en) Transaction swindling behavior depth detection method based on feature differentiation
CN110706095B (en) Target node key information filling method and system based on associated network
CN111242218B (en) Cross-social network user identity recognition method fusing user multi-attribute information
CN115577152B (en) Online book borrowing management system based on data analysis
CN113298373A (en) Financial risk assessment method, device, storage medium and equipment
CN111428113A (en) Network public opinion guiding effect prediction method based on fuzzy comprehensive evaluation
Gokulkumari et al. Analyze the political preference of a common man by using data mining and machine learning
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
CN110689427A (en) Consumption stage default probability model based on survival analysis
CN110598126B (en) Cross-social network user identity recognition method based on behavior habits
TWI254880B (en) Method for classifying electronic document analysis
CN110910235A (en) Method for detecting abnormal behavior in credit based on user relationship network
CN112819499A (en) Information transmission method, information transmission device, server and storage medium
CN115587828A (en) Interpretable method of telecommunication fraud scene based on Shap value
CN110147497B (en) Individual content recommendation method for teenager group
CN104090950B (en) Data flow clustering method integrating cluster existence strength
CN113706258A (en) Product recommendation method, device, equipment and storage medium based on combined model
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN113011875A (en) Text processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant