CN110598129B - Cross-social network user identity recognition method based on two-stage information entropy - Google Patents

Info

Publication number
CN110598129B
Authority
CN
China
Prior art keywords
user
social network
users
attributes
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910865901.XA
Other languages
Chinese (zh)
Other versions
CN110598129A (en)
Inventor
邢玲
邓凯凯
高建平
吴红海
谢萍
张明川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Application filed by Henan University of Science and Technology
Priority to CN201910865901.XA
Publication of CN110598129A
Application granted
Publication of CN110598129B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a cross-social-network user identity recognition method based on two-level information entropy. Profile information and behavior information of the users are crawled from two social networks. Common attributes are screened out from the profile information attributes of the two social networks, the data corresponding to the common attributes are extracted from the profile information of each user, and the similarity of each common attribute is calculated between the users of the two social networks. Behavioral feature attributes are extracted from the behavior information of each user, and the similarity of each behavior attribute is calculated between the users of the two social networks. Weights are then assigned based on two-level information entropy, the attribute similarities are weighted to obtain the matching score of each pair of users, and the users are matched according to the matching scores to obtain the user identity recognition result. The weight assignment method based on two-level information entropy addresses the imbalance among a user's multiple attributes in weight assignment and improves user identity recognition performance.

Description

Cross-social network user identity recognition method based on two-stage information entropy
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a cross-social network user identity identification method based on two-level information entropy.
Background
Social networks provide people with a rich social service. According to statistics, 42% of users have multiple social network accounts at the same time. Because different social networks have respective unique social modes and bring different social services to users, rich social user information is generated. However, the individual social network accounts are isolated and have no direct connection, so that the social information generated by the user account is distributed over multiple social networks. The identification of the user identity across the social networks refers to identifying virtual accounts belonging to the same real user in different social networks. The technical solution can provide comprehensive user information for network recommendation, user modeling and user behavior analysis, and realize full mining of the multisource social network big data.
The core idea of existing research is to use user profile information, network topology information and user behavior information to determine whether a candidate pair of accounts belongs to the same user. Cross-social-network user identification essentially consists of three parts: user data extraction, data similarity calculation and account matching. First, user data are crawled, cleaned and stored, mainly with relatively efficient crawler technology. Second, the similarity between user data is calculated from the extracted data with a similarity function; the greater the similarity, the greater the probability that the different virtual accounts belong to the same user. Finally, accounts are matched with a suitable matching strategy according to the calculated similarity.
Existing methods fall into three groups. Methods based on user profile information are vulnerable to forged user data, and people pay more and more attention to privacy protection, so the recognition performance of this type of method is limited. Methods based on network topology rely on friend relationships, which are easy to obtain but sparsely connected. Methods based on user behavior data identify users from the content they publish and thereby overcome the limitations of the other two types. Existing research also combines user profile information with network structure, but this combination remains subject to the limitations above and therefore cannot achieve good recognition performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, provides a cross-social network user identity recognition method based on two-level information entropy, and provides a weight distribution method based on two-level information entropy, so that the problem of unbalance of multiple attributes of a user in the aspect of weight distribution is solved, and the user identity recognition performance is improved.
In order to achieve the purpose, the cross-social network user identity identification method based on the two-level information entropy comprises the following steps:
S1: Crawl the profile information and behavior information of the users from social network A and social network B, respectively, and denote the numbers of users in the two social networks as $N_A$ and $N_B$;
S2: Screen out common attributes from the profile information attributes of the two social networks, extract the data corresponding to the common attributes from the profile information of each user, and then calculate the similarity of each common attribute between each user i in social network A and each user j in social network B, where $i = 1, 2, \ldots, N_A$, $j = 1, 2, \ldots, N_B$, $m = 1, 2, \ldots, M$, and M represents the number of common attributes;
S3: Extract the data of N preset feature attributes from the behavior information of each user, and then calculate the similarity of each feature attribute between each user i in social network A and each user j in social network B, where $n = 1, 2, \ldots, N$;
S4: Merge the data of the M common attributes of all users extracted from the profile information and the data of the N feature attributes of all users extracted from the behavior information into the data of H attributes, $H = M + N$, and then determine the weights of the H attributes with the entropy weight method as the primary weights $z_h$, $h = 1, 2, \ldots, H$;
Calculate the contribution probability normalization value $P_h$ of each attribute:
$$P_h = \frac{e^{z_h}}{\sum_{k=1}^{H} e^{z_k}}$$
Construct the variation weight $R_h$ based on information entropy:
Figure GDA0003777788330000024
$E(P_h) = -P_h \log P_h$
Calculate the attribute weight $W_h$ based on two-level information entropy:
Figure GDA0003777788330000025
S5: Using the attribute weights $W_h$ obtained in step S4, calculate the weighted sum of the similarities of the H attributes between each user i in social network A and each user j in social network B as their matching score $\mathrm{Score}_{i,j}$;
S6: Match the users in the two social networks according to the matching scores $\mathrm{Score}_{i,j}$ of each user i in social network A and each user j in social network B to obtain the user identity recognition result.
In the cross-social-network user identity recognition method based on two-level information entropy according to the invention, profile information and behavior information of the users are crawled from two social networks. Common attributes are screened out from the profile information attributes of the two social networks, the corresponding data are extracted from the profile information of each user, and the similarity of each common attribute is calculated between the users of the two social networks. Behavioral feature attributes are extracted from the behavior information of each user, and the similarity of each behavior attribute is calculated between the users of the two social networks. Weights are assigned based on two-level information entropy, the attribute similarities are weighted to obtain the matching score of each pair of users, and the users are matched according to the matching scores to obtain the user identity recognition result.
The invention combines the two types of information most relevant to the user, namely user profile information and user behavior information, so that the calculated similarity is more accurate. Assigning weights based on two-level information entropy addresses the imbalance among a user's multiple attributes in weight assignment, which improves the accuracy of the user matching scores and the user identity recognition performance.
Drawings
FIG. 1 is a flowchart of an embodiment of a cross-social-network user identity recognition method based on two levels of information entropy;
FIG. 2 is a flowchart of a method for calculating the similarity of common attributes in the present embodiment;
FIG. 3 is a flowchart of a text information feature extraction calculation method based on frequent pattern mining in this embodiment;
FIG. 4 compares the precision of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 5 compares the recall of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 6 compares the F1 score of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 7 compares the AUC of the weight assignment method of the present invention with that of the comparison methods in this embodiment;
FIG. 8 compares the four evaluation metrics of the user identity recognition method of the present invention with those of two comparison methods.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they would obscure the subject matter of the present invention.
Examples
FIG. 1 is a flowchart of a specific embodiment of a cross-social-network user identity recognition method based on two levels of information entropy. As shown in FIG. 1, the method for identifying the identity of the user across the social network based on the two-level information entropy comprises the following specific steps:
S101: Acquiring user data:
Crawl the profile information and behavior information of the users from social network A and social network B, respectively, and denote the numbers of users in the two social networks as $N_A$ and $N_B$. In general, one may set $N_A = N_B$.
S102: Calculating the similarity of the user profile information:
Screen out common attributes from the profile information attributes of the two social networks, extract the data corresponding to the common attributes from the profile information of each user, and then calculate the similarity of each common attribute between each user i in social network A and each user j in social network B, where $i = 1, 2, \ldots, N_A$, $j = 1, 2, \ldots, N_B$, $m = 1, 2, \ldots, M$, and M represents the number of common attributes.
Since the user profile information includes a number of common attributes (17 in this embodiment) whose data formats may differ, the similarity of each common attribute must be calculated in a way that suits its data. FIG. 2 is a flowchart of the method for calculating the similarity of common attributes in this embodiment. As shown in FIG. 2, the common attribute similarity is calculated in this embodiment as follows:
S201: First, judge whether the m-th common attribute is a preset key attribute. A key attribute is an attribute whose data must be identical for two users to be considered similar; for example, the gender of two users must both be "male" or both be "female" for the two users to be similar on that attribute. If the attribute is a key attribute, proceed to step S202; otherwise, proceed to step S203.
S202: Determining similarity based on consistency:
Judge whether the m-th common attributes of the two users are consistent. If so, the similarity of this common attribute is 1; otherwise it is 0.
S203: Judge whether the data of the m-th common attribute can be vectorized. If so, proceed to step S204; otherwise, proceed to step S205.
S204: Determining similarity based on cosine similarity:
Vectorize the data of the m-th common attribute of the two users, calculate the cosine similarity between the two vectors, and take it as the similarity of the m-th common attribute of the two users.
The cosine similarity is calculated as follows:
$$\cos(A, B) = \frac{\sum_{q=1}^{Q} A_q B_q}{\sqrt{\sum_{q=1}^{Q} A_q^2}\,\sqrt{\sum_{q=1}^{Q} B_q^2}}$$
where A and B represent the vectors formed from the two data items, $A_q$ and $B_q$ represent the q-th components of A and B, $q = 1, 2, \ldots, Q$, and Q represents the vector dimension.
S205: determining similarity based on the Dice coefficient:
Treat the data of the m-th common attribute of the two users as character strings, calculate the Dice coefficient between the two strings, and take it as the similarity of the m-th common attribute of the two users.
The Dice coefficient is calculated as follows:
$$\mathrm{Dice}(a, b) = \frac{2 \cdot \mathrm{comm}(a, b)}{\mathrm{len}(a) + \mathrm{len}(b)}$$
where a and b represent the two character strings, comm(a, b) represents the number of identical characters in a and b, and len(·) represents the length of a character string.
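The dispatch in steps S201 to S205 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names and the `is_key` / `is_vector` flags are hypothetical, the values 1 and 0 for the consistency check are assumed because the corresponding formulas are not legible in this text, and the character overlap comm(a, b) is assumed to be a multiset intersection.

```python
import math
from collections import Counter

def exact_match_similarity(a, b):
    """Key attribute (e.g. gender): assumed 1.0 if the values agree, else 0.0."""
    return 1.0 if a == b else 0.0

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)

def dice_similarity(a, b):
    """Character-level Dice coefficient: 2 * comm(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 1.0  # assumption: two empty strings count as identical
    # Multiset intersection counts each shared character at most as often
    # as it occurs in both strings.
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / (len(a) + len(b))

def attribute_similarity(a, b, is_key=False, is_vector=False):
    """Choose the similarity function in the order S201 -> S203 -> S205."""
    if is_key:
        return exact_match_similarity(a, b)
    if is_vector:
        return cosine_similarity(a, b)
    return dice_similarity(str(a), str(b))
```

For example, `attribute_similarity("male", "male", is_key=True)` uses the consistency check, while two free-text fields such as user names fall through to the Dice coefficient.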
S103: calculating the similarity of user behaviors:
Extract the data of N preset feature attributes from the behavior information of each user, and then calculate the similarity of each feature attribute between each user i in social network A and each user j in social network B, where $n = 1, 2, \ldots, N$.
The categories of feature attributes may be determined according to actual needs. In the embodiment of the present invention, three feature attributes are used: text information features, punctuation features, and time status stamp features. The similarity calculation methods for the three behavior feature attributes are described below.
Text information feature:
First, extract the text information features of each user based on frequent pattern mining to obtain a number of frequent items and their support counts, and then calculate the text information feature similarity of the two users with the following formula:
Figure GDA0003777788330000056
Figure GDA0003777788330000057
where F represents a frequent item,
Figure GDA0003777788330000061
respectively represent the support counts of frequent item F for user i in social network A and user j in social network B, and $C_F$ represents the number of item sets of frequent item F. The "1" is added in the formula to avoid high-frequency terms.
Fig. 3 is a flowchart of a text information feature extraction calculation method based on frequent pattern mining in this embodiment. As shown in fig. 3, the text information feature extraction method based on frequent pattern mining in this embodiment includes the specific steps of:
S301: Text word segmentation:
Perform word segmentation on each piece of text published by each user, treat each word obtained by segmentation as a transaction item, and build the transaction set T from all the text published by the user.
S302: Acquiring the frequent 1-item set:
Traverse all items in the transaction set T and calculate their support to form the 1-item set $C_1$, then filter out the item sets that do not reach the preset minimum support for 1-item sets to obtain the frequent 1-item set $L_1$. In this embodiment, the minimum support for 1-item sets is set to 2. Set the item-count parameter $k = 1$.
S303: Generating the frequent (k+1)-item set:
Join the frequent k-item set $L_k$ with itself (merging its item sets pairwise) to obtain the (k+1)-item set $C_{k+1}$, then filter out the item sets that do not reach the preset minimum support for (k+1)-item sets to obtain the frequent (k+1)-item set $L_{k+1}$.
S304: Judge whether $L_{k+1}$ is empty. If it is empty, none of the current (k+1)-item sets in $C_{k+1}$ reaches the minimum support, item set generation is finished, and the process proceeds to step S306; otherwise, it proceeds to step S305.
S305: Set $k = k + 1$ and return to step S303.
S306: Determining the text information features:
Obtain the frequent items corresponding to the text published by the current user, together with the support count of each frequent item.
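Steps S301 to S306 follow the usual Apriori-style level-wise search. The sketch below is illustrative only: it assumes the posts have already been segmented into word lists by some external tokenizer, uses a single minimum support for every item-set size (the text allows a separate preset value per size), and the function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=2):
    """Return {itemset (frozenset): support count} for all frequent item sets.

    transactions: one iterable of words per post (already segmented).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if candidate <= t)

    # S302: frequent 1-item sets.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        candidate = frozenset([item])
        count = support(candidate)
        if count >= min_support:
            current[candidate] = count

    result = dict(current)
    k = 1
    # S303 to S305: join L_k with itself, filter by support, repeat until empty.
    while current:
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k + 1}
        current = {}
        for candidate in candidates:
            count = support(candidate)
            if count >= min_support:
                current[candidate] = count
        result.update(current)
        k += 1
    # S306: frequent items with their support counts.
    return result
```

With the posts `[["coffee", "morning", "run"], ["coffee", "morning"], ["run", "evening"], ["coffee", "run"]]` and a minimum support of 2, the word "evening" is dropped and the pair {coffee, morning} is kept with support count 2.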
Punctuation features:
From the text published by user i in social network A and user j in social network B, count the proportion of the uses of each punctuation mark in the total number of punctuation marks to form punctuation vectors, and calculate the similarity between the two vectors, which is the punctuation similarity of the two users.
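A rough sketch of the punctuation feature follows. The text does not say which vector similarity is used, so cosine similarity is assumed here; the mark set `PUNCTUATION` and the function names are likewise hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:"  # hypothetical mark set; the text does not list one

def punctuation_vector(texts, marks=PUNCTUATION):
    """Share of each punctuation mark among all punctuation marks used."""
    counts = Counter(ch for text in texts for ch in text if ch in marks)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(marks)
    return [counts[m] / total for m in marks]

def cosine(u, v):
    """Cosine similarity; assumed here as the vector similarity measure."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def punctuation_similarity(texts_a, texts_b, marks=PUNCTUATION):
    """Similarity of the punctuation usage of two users."""
    return cosine(punctuation_vector(texts_a, marks),
                  punctuation_vector(texts_b, marks))
```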
Time status stamp feature:
Divide each day into G time periods, count the average number of status updates (posts) of each user in each time period over a preset range of dates, and calculate the time status stamp similarity of user i in social network A and user j in social network B with the following formula:
Figure GDA0003777788330000071
where
Figure GDA0003777788330000072
respectively represent the average numbers of status updates of user i in social network A and user j in social network B in the g-th time period, and | · | represents the absolute value.
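The per-period averages that feed this similarity can be computed as in the sketch below. Only the feature extraction is shown, because the similarity formula itself is not legible in this text. The function name and the restriction to period counts that divide 24 evenly are assumptions made for illustration.

```python
from datetime import datetime

def average_posts_per_period(timestamps, num_periods, num_days):
    """Average number of posts per day in each of num_periods daily periods.

    timestamps: datetime objects of the user's posts within the date range.
    num_days:   number of days in the preset date range.
    """
    if 24 % num_periods != 0:
        raise ValueError("this sketch assumes num_periods divides 24")
    hours_per_period = 24 // num_periods
    counts = [0] * num_periods
    for ts in timestamps:
        counts[ts.hour // hours_per_period] += 1
    return [count / num_days for count in counts]
```

For instance, with G = 4 (six-hour periods) and a two-day range, posts at 01:00 and 13:00 on the first day and 02:00 on the second give the vector `[1.0, 0.0, 0.5, 0.0]`.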
S104: Weight assignment based on two-level information entropy:
To fuse all the similarities obtained above, a weight must be assigned to each attribute. To make the weights more reasonable, the invention provides a weight assignment method based on two-level information entropy, as follows:
Merge the data of the M common attributes of all users extracted from the profile information and the data of the N feature attributes of all users extracted from the behavior information into the data of H attributes, $H = M + N$, and then determine the weights of the H attributes with the entropy weight method as the primary weights $z_h$, $h = 1, 2, \ldots, H$.
The basic idea of the entropy weight method is that the greater the degree of variation of an indicator, the greater its weight. The concept of information entropy can therefore be used to solve the weight assignment problem in user identification. The concrete form of the entropy weight method may be chosen as required. In this embodiment, the information entropy $E_h$ of each attribute is calculated first, the posterior probability $p(y_x \mid x)$ of each attribute of the user is then obtained, and the primary weight of the attribute is calculated as $z_h = p(y_x \mid x) \times E_h$. In this way, the influence of each attribute on user identity recognition performance can be captured more accurately.
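As a rough illustration of the primary weight $z_h = p(y_x \mid x) \times E_h$, the sketch below assumes that $E_h$ is the Shannon entropy (natural logarithm) of the observed value distribution of attribute h, and that the posterior probability is supplied by the caller. The text does not spell out how either quantity is estimated, so both choices and the function names are assumptions.

```python
import math
from collections import Counter

def information_entropy(values):
    """Shannon entropy of the empirical distribution of an attribute's values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def primary_weight(posterior, values):
    """Primary weight z_h = p(y_x | x) * E_h for one attribute."""
    return posterior * information_entropy(values)
```

An attribute whose values are all identical has entropy 0 and therefore a primary weight of 0, while an attribute with more varied values receives a larger weight for the same posterior probability.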
The output of Softmax characterizes the relative probability between different classes, so the present invention uses the idea of Softmax to perform a secondary weight assignment on the attributes of the user. After the primary weights of the user attributes are obtained, the weights of all attributes are combined into an array $Z = (z_1, z_2, \ldots, z_H)$ as input, and the contribution probability normalization value $P_h$ of each attribute is obtained with the idea of Softmax:
$$P_h = \frac{e^{z_h}}{\sum_{k=1}^{H} e^{z_k}}$$
where $P_h$ represents the contribution probability normalization value of the h-th attribute, which lies in the range $[0, 1]$ and satisfies $\sum_{h=1}^{H} P_h = 1$, and e represents the natural constant.
The concept of information entropy is then used again to construct the variation weight $R_h$, calculated as follows:
Figure GDA0003777788330000074
$E(P_h) = -P_h \log P_h$
Finally, the user attribute weights based on two-level information entropy are obtained; the attribute weight $W_h$ is calculated as follows:
Figure GDA0003777788330000081
Assigning weights to each attribute of the user and calculating the variance of the weights obtained with different weight assignment methods shows that the method of the invention is clearly more discriminative.
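The two building blocks of step S104 that are fully specified in the text, the Softmax normalization of the primary weights and the entropy term $E(P_h) = -P_h \log P_h$, can be sketched as follows. The formulas that combine them into $R_h$ and $W_h$ are not legible here, so they are deliberately left out. The function names are hypothetical, the natural logarithm is assumed, and the subtraction of the maximum is a standard numerical-stability step rather than something taken from the patent.

```python
import math

def softmax(z):
    """Contribution probability normalization values P_h from primary weights z_h."""
    m = max(z)  # subtracting the maximum does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_term(p):
    """E(P_h) = -P_h * log(P_h), with the usual convention E(0) = 0."""
    return -p * math.log(p) if p > 0 else 0.0
```

For example, `softmax([0.0, math.log(3)])` yields approximately `[0.25, 0.75]`, and the values always sum to 1.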
S105: Similarity fusion:
Using the attribute weights $W_h$ obtained in step S104, calculate the weighted sum of the similarities of the H attributes between each user i in social network A and each user j in social network B as their matching score $\mathrm{Score}_{i,j}$:
$$\mathrm{Score}_{i,j} = \sum_{h=1}^{H} W_h \, s^{h}_{i,j}$$
where $W_h$ represents the weight of the h-th of the H attributes, and $s^{h}_{i,j}$ represents the similarity of the h-th attribute between user i in social network A and user j in social network B.
The matching score $\mathrm{Score}_{i,j}$ is used to determine whether the real users behind the two social accounts are the same person.
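The fusion step is a plain weighted sum, which might look like the sketch below. The nested-list layout `similarities[i][j][h]` and the function name are assumptions chosen for illustration.

```python
def matching_scores(similarities, weights):
    """Score[i][j] = sum over h of W_h * similarity of attribute h for (i, j).

    similarities[i][j] is the list of H attribute similarities between
    user i of network A and user j of network B; weights holds W_1..W_H.
    """
    return [
        [sum(w * s for w, s in zip(weights, sims)) for sims in row]
        for row in similarities
    ]
```

With weights `[0.6, 0.4]`, a pair whose two attribute similarities are 1.0 and 0.5 receives a score of 0.8.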
S106: user matching:
Match the users in the two social networks according to the matching scores $\mathrm{Score}_{i,j}$ of each user i in social network A and each user j in social network B to obtain the user identity recognition result.
In this embodiment, a two-way stable marriage matching algorithm is used for user matching, as follows. Select the users i in social network A in turn, and initialize the candidate set $\lambda_i$ of user i to the set of all users in social network B. From $\lambda_i$, select the user j with the highest matching score with user i. If user j has not been matched with any other user in social network A, match user j with user i. If user j has already been matched with another user i' in social network A, compare the scores: if the matching score of user i and user j is higher than that of user i' and user j, match user i with user j and delete the match of user i'; otherwise, delete user j from $\lambda_i$ and select from the reduced $\lambda_i$ the user with the highest matching score with user i. Repeat until the matching user of user i in social network B is determined.
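The matching procedure can be sketched as below. One point is an assumption rather than something the text states: a user i' whose match is deleted is put back into the queue and matched again against its remaining candidates, as in the classic stable marriage algorithm. The function name and the score-matrix layout are also hypothetical.

```python
from collections import deque

def match_users(score):
    """Match users of network A to users of network B by matching score.

    score[i][j] is the matching score of user i in A and user j in B.
    Returns a dict {i: j}; users of A left without a candidate are omitted.
    """
    n_a = len(score)
    n_b = len(score[0]) if score else 0
    candidates = {i: set(range(n_b)) for i in range(n_a)}  # candidate sets
    partner_of_b = {}                                      # j -> matched i
    queue = deque(range(n_a))
    while queue:
        i = queue.popleft()
        while candidates[i]:
            # Best remaining candidate for user i.
            j = max(candidates[i], key=lambda b: score[i][b])
            current = partner_of_b.get(j)
            if current is None:
                partner_of_b[j] = i
                break
            if score[i][j] > score[current][j]:
                # User i displaces i'; i' is re-queued (assumption).
                partner_of_b[j] = i
                candidates[current].discard(j)
                queue.append(current)
                break
            candidates[i].discard(j)
    return {i: j for j, i in partner_of_b.items()}
```

For the score matrix `[[0.9, 0.2], [0.95, 0.3]]`, user 1 displaces user 0 at candidate 0, and user 0 is then matched with candidate 1.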
To illustrate the technical effect of the invention, it was verified experimentally on a specific example. User data from two social networks, Facebook and Twitter, were selected for cross-social-network user identity recognition, and precision, recall, F-measure (F1) and AUC (Area Under Curve) were adopted as evaluation metrics:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
AUC is the area under the ROC curve, in which the false positive rate (FPR) is the X-axis and the true positive rate (TPR) is the Y-axis. Because the result of the invention falls into two categories, namely the same real user and different real users, AUC can also be used to evaluate the quality of the recognition result.
$$TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
where TP represents the number of matching pairs predicted positive that are actually positive, TN the number predicted negative that are actually negative, FP the number predicted positive that are actually negative, and FN the number predicted negative that are actually positive.
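The count-based metrics above translate directly into code. The sketch below uses hypothetical function names and returns 0.0 when a denominator is zero, which is a convention chosen here and not something the text specifies.

```python
def precision(tp, fp):
    """Share of predicted matches that are true matches."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of true matches that were found; identical to the TPR."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0

def false_positive_rate(fp, tn):
    """Share of true non-matches that were predicted as matches."""
    return fp / (fp + tn) if fp + tn else 0.0
```

With TP = 8, FP = 2, FN = 2 and TN = 88, precision, recall and F1 are all 0.8 and the false positive rate is 2/90.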
To illustrate the effectiveness of the weight assignment method based on two-level information entropy (TIW), it is compared with two other methods: an empirical-probability-based weight assignment method (EW) and a posterior-probability-based weight assignment method (PW). FIGS. 4 to 7 compare the precision, recall, F1 score and AUC, respectively, of the weight assignment method of the present invention and the comparison methods in this embodiment. As can be seen from FIGS. 4 to 7, the present invention outperforms the two comparison methods on every evaluation metric. As the number of users increases, the metrics of all three methods decrease to some extent, because with more user accounts there are more cases in which accounts are highly similar but do not belong to the same real user, and such cases degrade the final matching result. The metrics of the present invention decrease slowly, whereas those of the two comparison methods decrease relatively quickly. The present invention therefore performs better than the two comparison methods in cross-social-network user identification.
Next, the user identity recognition method that combines weight assignment based on two-level information entropy with user matching based on the two-way stable marriage matching algorithm (TIW-UI) is compared with a random forest confirmation algorithm based on stable marriage matching (RFCA-SMM) and a ranking-based cross-matching method (RCM). FIG. 8 compares the four evaluation metrics of the user identity recognition method of the present invention with those of the two comparison methods. As shown in FIG. 8, the present invention outperforms RFCA-SMM and RCM in precision, recall, F1 score and AUC, which further supports its effectiveness.
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art as long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything that makes use of the inventive concept is protected.

Claims (4)

1. A cross-social network user identity recognition method based on two-level information entropy is characterized by comprising the following steps:
s1: crawling profile information and rows of respective users from social network A and social network B respectivelyFor information, the number of users in two social networks is respectively recorded as N A And N B
S2: common attributes are screened out from the profile information attributes of the two social networks, data corresponding to the common attributes are extracted from the profile information of each user, and then the similarity of each common attribute of each user i in the social network A and each common attribute of each user j in the social network B is calculated
Figure FDA0003777788320000011
Figure FDA0003777788320000012
M represents the number of common attributes;
s3: extracting preset data of N characteristic attributes from the behavior information of each user, and then calculating the similarity of each characteristic attribute of each user i in the social network A and each characteristic attribute of each user j in the social network B
Figure FDA0003777788320000013
S4: integrating the data of the M common attributes of all users extracted from the profile information and the data of the N feature attributes of all users extracted from the behavior information into data of H attributes, H = M + N, then determining the weights of the H attributes by the entropy weight method and taking them as the primary weights z_h of the attributes, h = 1, 2, …, H;
calculating the normalized contribution probability P_h of each attribute:
Figure FDA0003777788320000014
where e denotes the natural constant;
constructing the variation weight R_h based on information entropy:
Figure FDA0003777788320000015
E(P_h) = -P_h log P_h;
calculating the attribute weight W_h based on two-stage information entropy:
Figure FDA0003777788320000016
S5: using the attribute weights W_h obtained in step S4, calculating the weighted sum of the H attribute similarities of each user i in social network A and each user j in social network B as the matching score score_{i,j} of user i in social network A and user j in social network B;
S6: matching the users in the two social networks according to the matching scores score_{i,j} of each user i in social network A and each user j in social network B, to obtain the user identity recognition result.
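Steps S4 and S5 can be illustrated with a minimal Python sketch. It assumes the standard entropy weight method for the primary weights and assumes that each input row holds the H attribute similarity values of one user pair; neither detail is fixed by the claim text. The claim's formulas for P_h, R_h and W_h are given only as figures, so they are not reproduced here: the sketch stops at the primary weights and the entropy term E(P_h) = -P_h log P_h, and `matching_score` accepts whatever final weights are supplied. All function names are hypothetical.

```python
import math


def entropy_weights(samples):
    """Primary weights via the standard entropy weight method.

    samples: list of rows, each row holding H non-negative attribute values
    (assumed here to be the similarity values of one user pair).
    """
    n = len(samples)
    h_count = len(samples[0])
    divergences = []
    for h in range(h_count):
        column = [row[h] for row in samples]
        total = sum(column)
        if total > 0 and n > 1:
            entropy = 0.0
            for x in column:
                p = x / total
                if p > 0:
                    entropy -= p * math.log(p)
            entropy /= math.log(n)  # scale the entropy into [0, 1]
        else:
            entropy = 1.0  # an all-zero or single-row column carries no information
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / h_count] * h_count  # fall back to uniform weights
    return [d / total_divergence for d in divergences]


def entropy_term(p):
    """E(P_h) = -P_h log P_h, with the usual convention that E(0) = 0."""
    return 0.0 if p <= 0 else -p * math.log(p)


def matching_score(similarities, weights):
    """Step S5: weighted sum of the H attribute similarities of one user pair."""
    return sum(w * s for w, s in zip(weights, similarities))
```

An attribute whose values are identical across all rows receives a weight near zero, because it does not help distinguish one user pair from another.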
2. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the similarity of the common attributes
Figure FDA0003777788320000021
is calculated as follows:
S2.1: first judging whether the m-th common attribute is a preset key attribute; if so, proceeding to step S2.2, otherwise proceeding to step S2.3;
S2.2: judging whether the m-th common attributes of the two users are consistent; if so, setting the common attribute similarity
Figure FDA0003777788320000022
Otherwise
Figure FDA0003777788320000023
S2.3: judging whether the data of the m-th common attribute can be vectorized; if so, proceeding to step S2.4, otherwise proceeding to step S2.5;
S2.4: vectorizing the data of the m-th common attribute of the two users, calculating the cosine similarity between the two resulting vectors, and taking the cosine similarity as the similarity of the m-th common attribute of the two users
Figure FDA0003777788320000024
S2.5: taking the data of the m-th common attributes of the two users as character strings, then calculating a Dice coefficient between the two character strings, and taking the Dice coefficient as the similarity of the m-th common attributes of the two users
Figure FDA0003777788320000025
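The branching in steps S2.1 to S2.5 can be sketched in Python as below. Several details are assumptions rather than part of the claim: the values 1 and 0 for matching and non-matching key attributes (the claim gives them only as figures), the use of character bigrams for the Dice coefficient, and every function and parameter name.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dice_coefficient(s, t):
    """Dice coefficient over character bigrams (bigrams are an assumption)."""
    a = Counter(s[k:k + 2] for k in range(len(s) - 1))
    b = Counter(t[k:k + 2] for k in range(len(t) - 1))
    total = sum(a.values()) + sum(b.values())
    if total == 0:  # both strings are shorter than two characters
        return 1.0 if s == t else 0.0
    overlap = sum((a & b).values())
    return 2.0 * overlap / total


def common_attribute_similarity(x, y, is_key=False, vectorize=None):
    """Dispatch following S2.1 to S2.5.

    is_key: whether the attribute is a preset key attribute (S2.1).
    vectorize: optional hypothetical function mapping a value to a vector (S2.3).
    """
    if is_key:
        # S2.2: exact match; 1.0 and 0.0 are assumed values.
        return 1.0 if x == y else 0.0
    if vectorize is not None:
        # S2.4: cosine similarity of the vectorized values.
        return cosine_similarity(vectorize(x), vectorize(y))
    # S2.5: treat the values as strings and use the Dice coefficient.
    return dice_coefficient(str(x), str(y))
```

For example, `common_attribute_similarity("night", "nacht")` falls through to the Dice branch, where the two strings share one bigram out of eight.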
3. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the feature attributes in step S3 comprise a text information feature, a punctuation feature and a status timestamp feature, whose similarities are respectively calculated as follows:
for the text information feature, the text information features of each user are first extracted by frequent pattern mining to obtain a number of frequent items and their corresponding support counts, and the following formula is then used to calculate the text information feature similarity of the two users
Figure FDA0003777788320000026
Figure FDA0003777788320000027
Wherein, F represents a frequent item,
Figure FDA0003777788320000028
respectively denote the support counts of the frequent item F for user i in social network A and user j in social network B, and C_F denotes the number of itemsets of the frequent item F;
for the punctuation feature, the proportion of the number of uses of each punctuation mark in the total number of punctuation marks is computed from the text posted by user i in social network A and by user j in social network B to form punctuation vectors, and the similarity between the two vectors is calculated as the punctuation similarity
Figure FDA0003777788320000029
for the status timestamp feature, each day is divided into G time periods, the average number of status updates of each user in each time period within a preset date range is counted, and the status timestamp similarity of user i in social network A and user j in social network B is calculated using the following formula:
Figure FDA0003777788320000031
where V_i^A(g) and V_j^B(g) respectively denote the average number of status updates of user i in social network A and of user j in social network B in the g-th time period.
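The feature construction for the punctuation and timestamp features can be sketched in Python as follows. The similarity formulas of this claim are given only as figures, so the sketch covers just the two vectors the text describes; the cosine similarity used to compare punctuation vectors is an assumption (the claim says only "the similarity between the two vectors"), as are the punctuation set, the function names and the `num_days` parameter standing for the preset date range.

```python
import math
from collections import Counter
from datetime import datetime

PUNCTUATION = ",.!?;:"  # hypothetical set of punctuation marks to track


def punctuation_vector(texts, marks=PUNCTUATION):
    """Share of each punctuation mark in the total number of marks used."""
    counts = Counter(ch for text in texts for ch in text if ch in marks)
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in marks]


def activity_vector(timestamps, periods, num_days):
    """Average number of posts per day in each of `periods` equal time periods.

    Index g of the result corresponds to V(g) in the claim (0-based here).
    """
    counts = [0] * periods
    for ts in timestamps:
        counts[ts.hour * periods // 24] += 1
    return [c / num_days for c in counts]


def cosine_similarity(u, v):
    """Assumed similarity measure for two punctuation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


posts = [datetime(2019, 9, 1, 1), datetime(2019, 9, 1, 13), datetime(2019, 9, 2, 14)]
v_activity = activity_vector(posts, periods=4, num_days=2)
v_punct = punctuation_vector(["Hi, there!", "Ok."])
```

With G = 4, the three example posts fall into periods 0, 2 and 2, so the averaged vector over two days is `[0.5, 0.0, 1.0, 0.0]`.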
4. The cross-social network user identity recognition method based on two-stage information entropy according to claim 1, wherein the user matching uses a bidirectional stable marriage matching algorithm, specifically: sequentially selecting each user i in social network A, and initializing the to-be-matched user set λ_i of user i to the set of all users in social network B; screening out from the to-be-matched user set λ_i the user j with the highest matching score with user i, and matching user j with user i if user j has not been matched with any other user in social network A; if user j has already been matched with another user i' in social network A, then, if the matching score of user i and user j is higher than that of user i' and user j, matching user i with user j and deleting the matching result of user i'; otherwise, deleting user j from the to-be-matched user set λ_i and re-screening, from the set λ_i after the deletion, the user with the highest matching score with user i, until the matching user of user i in social network B is determined.
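The matching procedure of this claim can be sketched in Python as below. One detail goes beyond the claim text and is an assumption: a user i' whose match is deleted is put back into the processing queue so that it can look for another partner, since the claim states only that the matching result of i' is deleted. The function name and the list-of-lists score layout are hypothetical.

```python
from collections import deque


def match_users(score):
    """score[i][j]: matching score of user i (network A) and user j (network B).

    Returns a dict mapping each matched user i to its partner j.
    """
    n_a = len(score)
    n_b = len(score[0]) if n_a else 0
    candidates = [set(range(n_b)) for _ in range(n_a)]  # lambda_i for each user i
    partner_of_b = {}  # j -> currently matched user i
    queue = deque(range(n_a))
    while queue:
        i = queue.popleft()
        while candidates[i]:
            # Best remaining candidate; ties go to the smallest index.
            j = max(sorted(candidates[i]), key=lambda b: score[i][b])
            rival = partner_of_b.get(j)
            if rival is None:
                partner_of_b[j] = i
                break
            if score[i][j] > score[rival][j]:
                # User i displaces the rival i'.
                partner_of_b[j] = i
                candidates[rival].discard(j)
                queue.append(rival)  # assumption: the displaced user retries
                break
            # User j prefers its current match: drop j from lambda_i and retry.
            candidates[i].discard(j)
    return {i: j for j, i in partner_of_b.items()}
```

Each (i, j) pair is tried at most once, so the loop terminates; a user whose candidate set becomes empty simply remains unmatched.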
CN201910865901.XA 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy Active CN110598129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865901.XA CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910865901.XA CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Publications (2)

Publication Number Publication Date
CN110598129A CN110598129A (en) 2019-12-20
CN110598129B CN110598129B (en) 2022-10-18

Family

ID=68859245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865901.XA Active CN110598129B (en) 2019-09-09 2019-09-09 Cross-social network user identity recognition method based on two-stage information entropy

Country Status (1)

Country Link
CN (1) CN110598129B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242218B (en) * 2020-01-13 2023-04-07 Henan University of Science and Technology Cross-social network user identity recognition method fusing user multi-attribute information
CN111931023B (en) * 2020-07-01 2022-03-01 Northwestern Polytechnical University Community structure identification method and device based on network embedding
CN114610958B (en) * 2022-05-10 2022-08-30 Shanghai Feiqi Network Technology Co., Ltd. (上海飞旗网络技术股份有限公司) Processing method and device of transmission resources and electronic equipment
CN115048563A (en) * 2022-08-15 2022-09-13 The 30th Research Institute of China Electronics Technology Group Corporation Cross-social-network user identity matching method, medium and device based on entropy weight method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110178943A1 (en) * 2009-12-17 2011-07-21 New Jersey Institute Of Technology Systems and Methods For Anonymity Protection
CN108846422B (en) * 2018-05-28 2021-08-31 People's Public Security University of China Account number association method and system across social networks
CN108897789B (en) * 2018-06-11 2022-07-26 Southwest University of Science and Technology Cross-platform social network user identity identification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
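Protecting the squeezing of a two-level system by detuning in non-Markovian environments; Xing Xiao et al.; Physica Scripta; 20110920; Vol. 84, No. 04; pp. 1-6 *
Evaluation of the strictest water resources management based on variable fuzzy sets with comprehensive weights; Wang Dayang (王大洋) et al.; Pearl River (《人民珠江》); 20160509; Vol. 37, No. 05; pp. 10-14 *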

Also Published As

Publication number Publication date
CN110598129A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110598129B (en) Cross-social network user identity recognition method based on two-stage information entropy
CN109325691B (en) Abnormal behavior analysis method, electronic device and computer program product
CN108897789B (en) Cross-platform social network user identity identification method
CN108898479B (en) Credit evaluation model construction method and device
Johnson et al. Identifying stance by analyzing political discourse on twitter
CN110706095B (en) Target node key information filling method and system based on associated network
CN108833139B (en) OSSEC alarm data aggregation method based on category attribute division
TW201426578A (en) Generation method and device and risk assessment method and device for anonymous dataset
CN110287292B (en) Judgment criminal measuring deviation degree prediction method and device
CN111242218B (en) Cross-social network user identity recognition method fusing user multi-attribute information
CN108647800A (en) A kind of online social network user missing attribute forecast method based on node insertion
CN106960387A (en) Individual credit risk appraisal procedure and system
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
CN109635010A (en) A kind of user characteristics and characterization factor extract, querying method and system
CN108665270A (en) Data diddling recognition methods, device, computer equipment and storage medium
CN111428113A (en) Network public opinion guiding effect prediction method based on fuzzy comprehensive evaluation
Gokulkumari et al. Analyze the political preference of a common man by using data mining and machine learning
CN110598126B (en) Cross-social network user identity recognition method based on behavior habits
CN113344438A (en) Loan system, loan monitoring method, loan monitoring apparatus, and loan medium for monitoring loan behavior
CN112132589A (en) Method for constructing fraud recognition model based on multiple times of fusion
CN116523293A (en) User risk assessment method based on fusion behavior flow chart characteristics
CN115587828A (en) Interpretable method of telecommunication fraud scene based on Shap value
CN112508462B (en) Data screening method and device and storage medium
CN113706258A (en) Product recommendation method, device, equipment and storage medium based on combined model
CN114547294A (en) Rumor detection method and system based on comprehensive information of propagation process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant