CN110210244B

CN110210244B - Method and system for detecting privacy disclosure of social media users

Info

Publication number: CN110210244B
Application number: CN201910387263.5A
Authority: CN
Inventors: 梁英; 董祥祥; 李锦涛; 谢小杰; 史红周; 高昂
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2020-12-29
Anticipated expiration: 2039-05-10
Also published as: CN110210244A

Abstract

The embodiment of the invention provides a method and a system for detecting privacy disclosure of a social media user, wherein the certainty of each privacy attribute of the user is evaluated based on data published by the user, the visibility of the user data is determined based on the network structure of the social media where the user is located, the degree of privacy disclosure of the user is measured according to the certainty of the privacy attributes of the user and the visibility of the user data, and a privacy disclosure risk prompt is sent to the user. According to the technical scheme of the embodiment of the invention, the privacy disclosure degree of the user is comprehensively and effectively quantified based on factors such as information content published by the user, social network structure, social relationship strength of the user, privacy preference setting of the user and the like, and the social media user can be helped to find out privacy disclosure events in time, so that the harm of privacy disclosure is reduced.

Description

Method and system for detecting privacy disclosure of social media users

Technical Field

The invention relates to social media data mining and privacy protection technologies, in particular to a method and a system for detecting whether privacy of social media users is disclosed.

Background

Social Media (Social Media) refers to a platform for content production and exchange based on user relationships on the internet. Currently, social media has been widely used in people's daily lives, and is a tool and platform for people to share opinions, insights, and opinions among each other. Social media also brings privacy disclosure risks while facilitating online socialization of people. People often actively post information through social media that is likely to be related to user privacy, such as the user's gender, work, address, etc. In the social network, information published by a user can be easily acquired by others, and privacy leakage is likely to be caused; it is difficult for the user to know or control exactly the specific directions of the messages, so that it is difficult to detect in time that the privacy has been revealed. Therefore, a method capable of helping social media users to find privacy disclosure events in time is urgently needed, so that the harm of privacy disclosure is reduced as much as possible, and the method has positive significance for maintaining the safety of social networks.

Disclosure of Invention

The embodiment of the invention aims to provide a method and a system for detecting privacy disclosure of a social media user, which are used for effectively evaluating the privacy disclosure risk of the user and helping the user to find possible privacy disclosure events in time.

The above purpose is realized by the following technical scheme:

according to a first aspect of the embodiments of the present invention, there is provided a method for detecting privacy disclosure of a social media user, including:

evaluating the certainty of each privacy attribute of the user based on the data issued by the user, wherein the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user; determining the visibility of user data based on the network structure of the social media where the user is located, wherein the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media; measuring the degree of privacy leakage of the user according to the certainty of the privacy attributes of the user and the visibility of user data; and responding to the fact that the degree of the privacy leakage of the user is larger than a set threshold value, and sending privacy leakage risk prompt information to the user.

In some embodiments, the method may further include acquiring a preference setting of the user for each privacy attribute, and determining a sensitivity level of the user for each privacy attribute according to the privacy attribute preference set by the user; and measuring the degree of the privacy leakage of the user jointly according to the certainty of the privacy attributes of the user, the visibility of user data and the sensitivity degree of the user to each privacy attribute.

In some embodiments, evaluating the certainty of the user privacy attributes based on the data published by the user may be accomplished using pre-trained attribute recognition models corresponding to the privacy attributes, where the attribute recognition model corresponding to each privacy attribute inputs the data published for the user and outputs a probability of taking each attribute value for the privacy attribute of the user.

In some embodiments, the attribute recognition model corresponding to each privacy attribute may be trained by: collecting information published by each user in social media within a period of time, and calibrating attribute values of the privacy attribute of each piece of information in the collected data set, wherein the attribute values of the user publishing the information are relative to the privacy attribute; and taking the calibrated data set as a sample set to train an attribute recognition model corresponding to the privacy attribute.

In some embodiments, the certainty of the user privacy attributes may be calculated using the following formula:

wherein, cer_jmRepresents the certainty of the mth privacy attribute, pra, for user j in social media_jmkThe probability that the mth privacy attribute representing user j takes the kth attribute value, K_mRepresenting the number of possible attribute values taken for the mth privacy attribute.

In some embodiments, the visibility of user data may be measured using a metric based on one or more of the following: the importance degree of the users in the social network, the social relationship strength among the users and the activity degree of the users; the importance degree of the user in the social media is calculated according to the number of users paying attention to the user and the importance degree of each user paying attention to the user, which are counted by the current network structure of the social media; the social relationship strength between users can be set according to the attention relationship between users and/or the interaction frequency between users; the activity level of a user may be measured using the amount of information that the user publishes over a period of time.

In some embodiments, the importance of the user in the social network may be obtained by:

step A1: the importance level of each user of social media is represented by a user importance vector UR, which is n-dimensional, where n indicates the number of users of social media, the ith element UR of the vector_iRepresenting the importance degree of a user i in the social network, and initializing the value of each element of the vector to be 1/n;

step A2: based on the social relationship among the users in the social network, the user importance vector is updated according to the following update formula:

wherein, UR_tRepresenting the user importance vector after the t round of updating; q is a damping coefficient, which takes a real number between 0 and 1; t is a matrix indicating social relations among users in the social network, and the matrix T has an element T in the ith row and the j column_ijIndicates the degree of attention of user i to user j, t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j.

In some embodiments, the visibility of user data may be calculated using the following formula:

or

Wherein, vis_jRepresenting data visibility, t, of user j_ijIndicates the degree of attention of user i to user j, t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j; i (x) represents an indicator function, the input variable x of which returns a1 if true, otherwise returns a 0; ur_jRepresents the importance of user j in the network of social media; wb_jThe number of the information released by the user j in a period of time is shown, and h is a parameter with the value between 0 and 1。

In some embodiments, the user's sensitivity to each privacy attribute may be calculated using the following formula:

wherein sbj _ sen_jmRepresenting the sensitivity of the user j to the mth privacy attribute of the user j, and d representing the number of the privacy attributes of the user; r is_jmA preference value representing user j's setting for its mth privacy attribute; r is_jqRepresenting the preference value set by user j for his q-th privacy attribute.

In some embodiments, the degree to which user privacy is revealed may be calculated as follows:

or

Wherein ps_jRepresenting the degree of privacy disclosure of user j; sbj _ sen_imRepresenting the sensitivity of user i to its mth privacy attribute; vis_jData visibility representing user j; cer_jmRepresents a certainty of the mth privacy attribute for user j in social media; sbj _ sen_jmIndicating how sensitive user j is to its mth privacy attribute.

According to a second aspect of the embodiment of the invention, a system for detecting privacy disclosure of social media users is further provided, and the system comprises an attribute certainty estimation module, a data visibility estimation module, a privacy disclosure evaluation module and a prompt module. The attribute certainty estimation module is used for evaluating the certainty of each privacy attribute of the user based on data issued by the user, and the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user. The data visibility estimation module is used for determining the visibility of the user data based on the network structure of the social media where the user is located, and the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media. And the privacy disclosure evaluation module is used for measuring the degree of the user privacy disclosure according to the certainty of the user privacy attributes and the visibility of the user data. And the prompting module is used for responding to the condition that the degree of privacy disclosure of the user is greater than a set threshold value and sending privacy disclosure risk prompting information to the user.

The technical scheme of the embodiment of the invention can have the following beneficial effects:

the method not only considers the influence of the information published by the user on privacy disclosure, but also considers the propagation range of the user information in the social network, the personalized demand of the user on the privacy and the like, comprehensively and effectively quantifies the privacy disclosure degree of the user based on the social network structure, the social relationship strength of the user, the privacy preference setting of the user and other factors, and can help the social media user to find the privacy disclosure event in time, thereby reducing the harm of privacy disclosure.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 shows a flowchart of a method for detecting privacy disclosure of a social media user according to one embodiment of the invention.

Fig. 2 is a flowchart illustrating a method for calculating user importance according to an embodiment of the present invention.

FIG. 3 is a schematic structural diagram of a system for detecting privacy disclosure of a social media user according to one embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by embodiments with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

FIG. 1 is a flowchart illustrating a method for detecting privacy disclosure of a social media user according to an embodiment of the present invention. The method mainly comprises the following steps: s1) evaluating the certainty of each privacy attribute of the user based on the data issued by the user; s2) determining the visibility of the user data based on the network structure of the social media where the user is located; s3) measuring the degree of the privacy leakage of the user according to the certainty of the privacy attribute of the user and the visibility of the user data; and S4) in response to the degree of the user' S privacy disclosure being greater than the set threshold, issuing a privacy disclosure risk prompt to the user.

More specifically, at step S1) the certainty of each privacy attribute of the user is evaluated based on the data issued by the user. The privacy attribute generally refers to user attribute information that a user wants to keep secret and does not want other users of the social media to know without permission. In general, the set of privacy attributes set or specified by the user may be obtained through a corresponding interface set to the social media system, or the set of privacy attributes set by the social media system as a default for the user may be employed. Although a user may pay attention to hiding information related to the privacy attributes when publishing information, the content, the idiomatic language and the like of the information published by the user often reveal some privacy attribute information of the user to a certain extent, so that the privacy attributes of the user are likely to be inferred through public data published by the user. For example, if "fairy", "make-up", "lovely", and the like are often found in the information posted by the user, even if the gender attribute is hidden by the user, other users in the social network may presume that the gender of the user is female based on the information posted by the user. Thus, an important factor to consider in evaluating the degree of privacy disclosure of a user is the certainty of each privacy attribute of the user. The certainty of the privacy attributes is used for indicating the possibility or probability that the value of the privacy attributes of the user can be inferred according to data issued by the user, and the greater the certainty of the privacy attributes is, the greater the privacy leakage risk of the corresponding attributes is. The user attribute certainty can be estimated based on the information entropy of the attribute posterior distribution, and can be calculated by using the following formula:

wherein, cer_jmRepresents the certainty of the mth privacy attribute, pra, for user j in social media_jmkThe probability that the mth privacy attribute representing user j takes the kth attribute value, K_mRepresenting the number of possible attribute values taken for the mth privacy attribute. Each attribute of a user may have multiple values, for example, the "gender" attribute may have values of "male" and "female".

In one embodiment, the probability pra that the mth privacy attribute of user j takes the kth attribute value can be estimated based on a keyword statistical method_jmk. For example, some keywords are preset for each attribute value, then information text data published by a user in a period of time is collected, and then pra is estimated according to the number of keywords hit by the attribute value in the information text published by the user_jmk. Taking the attribute of "gender" as an example, the attribute values include "male" and "female", corresponding keywords set for the attribute value "male" are { "brother", "grandmother" }, and corresponding keywords set for the attribute value "female" are { "own fairy", "make-up", "lovely" }. In the collected information text data set issued by the user, three words of ' brother ', ' brother ' and ' grandmother ' appear 8 times, while ' fairy ', ' make-up ' and ' lovely ' appear twice, so that the ' gender ' attribute takes the probability pra of the 1 st attribute value ' male_jm10.8, the attribute of "sex" takes the probability pra of the 2 nd attribute value "female_jm20.2 for female gender.

In yet another embodiment, the probability pra that the mth privacy attribute of user j takes the kth attribute value can be estimated by a text classification learning method_jmk. In this embodiment, it is necessary to train a corresponding attribute recognition model for each privacy attribute, and complete the evaluation of the certainty of the user privacy attributes by using the pre-trained attribute recognition models corresponding to the privacy attributes. The attribute identification model corresponding to each privacy attribute inputs data issued by the user, and outputs the probability of taking each attribute value for the privacy attribute of the user. E.g. by logistic regression modelA sexual identification model, wherein for the attribute identification model of the mth attribute of the user j, the possible value of the mth attribute is assumed to be K_mIf the input received by the corresponding attribute identification model is the information text of the user, outputting the common K_mAnd the probability of each attribute value is respectively taken according to the mth attribute of the user. The attribute recognition model corresponding to each privacy attribute may be trained by the following steps: I) collecting information published by each user in social media within a period of time, and calibrating attribute values of the privacy attribute of each piece of information in the collected data set, wherein the attribute values of the user publishing the information are relative to the privacy attribute; II) taking the calibrated data set as a sample set to train an attribute identification model corresponding to the privacy attribute. In this way, on the basis of the information issued by the user, the probability that the user takes each attribute value for each privacy attribute is determined by using the attribute recognition model of each attribute trained in the process, that is, the probability pra that the mth privacy attribute of the user j required in the formula takes the kth attribute value is estimated_jmk。

With continued reference to FIG. 1, at step S2) the visibility of the user data is determined based on the network structure of the social media in which the user is located. Although the certainty of the user privacy attribute estimated by using the information published by the user through step S1) may reflect or reflect the magnitude of the risk of revealing the user privacy to some extent, it is found by the research of the inventors that the risk of revealing the user privacy is also related to the propagation range of the information published by the user in social media, the social relationship strength between users, and the like. Thus, in the embodiment of the invention, another important factor, namely the visibility of user data, is also considered when evaluating the degree of user privacy disclosure. The visibility of user data is used to indicate how likely it is that data posted by a user can be retrieved by other users in the social media. User data visibility can also be understood as the degree of exposure of the user data, i.e. the probability that the user data is seen by others. The greater the visibility of user data, the higher the risk of privacy exposure for the user. Visibility of user data may be measured using a metric based on one or more of the following: the importance of the users in the social media, the strength of the social relationship between the users, and the activity of the users. The importance degree of the user in the social media is calculated according to the number of users paying attention to the user and the importance degree of each user paying attention to the user, which are counted by the current network structure of the social media; the social relationship strength between users can be set according to the attention relationship between users and/or the interaction frequency between users; the activity level of a user may be measured using the amount of information that the user publishes over a period of time.

In one embodiment, the degree of importance of a user in social media (which may also be referred to as user importance) may be used to indicate or characterize the probability that the user's data is seen by others. The more important the user is, the greater the probability that the user data is seen by others, and the greater the risk that the user's privacy is revealed. For example, the microblog release is very important due to the fact that the large V account number in the microblog is numerous in fan, the microblog release can be seen by many people, and if the microblog contains the privacy, the privacy is easily revealed. User importance may be quantitatively evaluated according to the network structure of social media. In a social network, the more other users a user is concerned with (i.e., the more fans), the more important the user is; the more important the user is to focus on the user at the same time (i.e. the more important the fans themselves are), and correspondingly the more important the user is. That is, the importance of a user is closely related to the number of its fan users and the importance of the fan users. Each fan user is also concerned by other users, so the importance of a user needs to be calculated based on the social network structure, and the importance of the fan users needs to be calculated layer by layer.

FIG. 2 is a flow diagram of a method for calculating user importance using multiple iterative updates, according to one embodiment of the present invention. The method mainly comprises the following steps:

step 201: initializing a user importance vector; wherein the importance of each user of the social network is represented by a user importance vector UR, which is n-dimensional, where n indicates the number of users of the social network, the ith dimensional element UR of the vector_iRepresenting the importance of the user i, the value of each element of the vector is initialized to 1/n.

Step 202: updating the user importance vector; based on the social relationship among the users in the social network, the importance vector of the user is updated according to the following updating formula:

wherein, UR_tRepresenting the importance vector after updating t rounds; q is a damping coefficient, and the value of q is a real number between 0 and 1, and is usually greater than 0.5, in this embodiment, the damping coefficient may be set to 0.85, and by reasonably setting the value of q, it is possible to avoid that all finally obtained importance values flow to a hanging point in the social network as much as possible, that is, users who are concerned by other users but do not pay attention to any user. Wherein T is a matrix indicating the strength of social relationships among users in the social network, and the strength of social relationships among users can be set according to attention relationships among users and/or interaction frequency among users. For example, the element T in the ith row and j column of the matrix T_ijIndicating the degree of interest of user i to user j, e.g. t_ij0 means that user i does not care about user j, t_ij1 or t_ij>0 indicates that user i is interested in user j. As another example, the matrix T has an element T in the ith row and j column_ijRepresenting the frequency of interaction of user i with user j, e.g. t_ij0 means that user i has no interaction with user j, t_ij1 indicates that there is an interaction between user i and user j, or t may be equal to_ijSet to a natural number greater than 0 indicating the number of interactions within a predetermined time period.

Step 203: judging the current user importance vector UR_tUser importance vector UR with previous round_t-1Is less than the acceptable error. If the Euclidean distance difference between the two front wheels and the two rear wheels is smaller than the acceptable error, the updating is stopped, and the step 204 is carried out to output the current user importance vector UR; otherwise go to step 202 to continue updating. Thus, the probability that user j's data is seen by others (i.e., the user data visibility) vis_j＝ur_jTherein ur_jFor the jth element in the user importance vector URAnd (4) element.

In yet another embodiment, assume p_ijThe probability that the user i can obtain the data published by the user j is represented, and the data visibility of the user j can be represented as an average value of the probabilities that the users can obtain the data published by the user j in the social media, that is, the data visibility vis of the user j_jCan be calculated by the following formula:

wherein, the probability p that the user i can obtain the data issued by the user j_ijNot only the importance of user j, but also the social relationship between user i and user j. That is, the data of the user j can be seen by the fans of the user j, but the probability of being seen by the fans is mainly related to the importance of the user j in the social network, so that the probability p that the user i can obtain the data published by the user j_ijCan be calculated by the following formula:

then it is substituted into the data visibility vis of user j above_jCalculating equation (1) yields:

wherein t is_ijIndicating the degree of interest of user i to user j, e.g. t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j; i (x) represents an indicator function, the input variable x of which returns a1 if true, otherwise returns a 0; ur_jRepresenting the degree of importance of user j in the network of social media. In a preferred embodiment, the probability p that the user i can obtain the data issued by the user j is calculated_ijAt least the following 3-aspect factor correlations are considered: 1) by usingImportance ur of household j itself_j，ur_jThe larger the likelihood that user j's data is seen by others; 2) social relationship strength t of user i and user j_ij，t_ijThe larger the user i sees the user j data the more likely; 3) the activity of user j, for example, may be the number wb of information recently released by user j_jTo measure, wb_jThe larger the user j data is, the more likely it is to be seen by others. Therefore, the probability p that the user i can acquire the data issued by the user j_ijCan be calculated by the following formula:

wherein h is [0,1 ]]The parameter is set by the system and is used for estimating the degree of clicking or viewing the information after the information is acquired, and the higher the h value is set, the higher the possibility that the information is read by the user on the whole is, and the relative visibility is not influenced, namely the possibility that the information is acquired is not influenced. p'_ijThe probability that user i sees a piece of information of user j is represented, and the calculation method can be as follows:

then, according to the data visibility vis of the above user j_jCalculating equation (1) yields:

wherein t is_ijIndicating the degree of interest of user i to user j, e.g. t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j; i (x) represents an indicator function, the input variable x of which returns a1 if true, otherwise returns a 0; ur_jRepresents the importance of user j in the network of social media; wb_jRepresenting the amount of information user j has published over a period of time.

With continued reference to FIG. 1, at step S3) measures the degree to which the user ' S privacy is revealed based on the certainty of the user ' S privacy attributes and the visibility of the user ' S data. For example, the degree of privacy disclosure of the user can be obtained by a weighted sum of the two indexes. For example, using the following formula:

wherein ps_jRepresenting the degree of privacy disclosure of user j; vis_jData visibility representing user j; cer_jmRepresenting the certainty of the mth privacy attribute for user j in social media. Also can give vis according to actual demand_jAnd cer_jmDifferent weights are assigned to distinguish the importance of the two indices.

At step S4), in response to the degree of the user' S privacy leakage being greater than the set threshold, issuing privacy leakage risk prompt information to the user. The setting of the threshold value may be set according to an empirical value based on historical data statistics or may be specified by the user according to the degree of emphasis on privacy of the user. In this embodiment, ps obtained by step S3)_jThe degree of privacy disclosure of the user j can be reflected as a whole, and the certainty of each user privacy attribute estimated by step S1) can reflect which privacy attribute of the user is disclosed to a greater degree from a finer granularity. Therefore, more detailed privacy disclosure risk prompt information can be provided for the user.

In still other embodiments, the method may further include obtaining a user's preference setting for each privacy attribute, and determining a degree of sensitivity of the user to each privacy attribute (which may also be referred to as privacy attribute sensitivity) according to the privacy attribute preference set by the user; and combining the sensitivity degree of the user to each privacy attribute with the certainty of the privacy attributes of the user and the visibility of the user data determined above to jointly measure the degree of privacy disclosure of the user. This is to take into account that the sensitivity to different privacy attributes is different for the user himself. A user is generally sensitive to one or some privacy attributes and is not sensitive to the rest of the privacy attributes, and even if the rest of the privacy attributes are all revealed, the user may think that the privacy is not exposed to others; whereas if one or several of the privacy attributes to which the user is more sensitive are revealed, the user will immediately perceive that privacy is violated. Wherein the user sensitivity to privacy attributes may be quantitatively evaluated based on user privacy preference settings. Assume that the set of privacy attributes associated with the user contains d privacy attributes. Each user in the social network can set the preference degree or sensitivity of each attribute in the privacy attribute set to be a natural number in a preset interval according to requirements, and the larger the numerical value is, the higher the privacy preference degree is, and the more sensitive the user is to the privacy attribute is. Thus, the user's privacy preference settings may be expressed in the form of a d-dimensional integer vector. Based on the quantified user privacy preference settings, the user's sensitivity to privacy attributes may be calculated by:

step C1: and obtaining a vector corresponding to the privacy preference of each user in the social media and constructing a sensitivity response matrix R. The sensitivity response matrix R is a matrix formed by privacy preference vectors of all users of the social media, wherein the rows correspond to the users, the columns correspond to the attributes, and the element R of the jth row and the mth column_jmIndicating the degree of privacy preference that user j sets for the mth attribute. The jth row R of the sensitivity response matrix R_jPrivacy preference vector, mth column R, representing user j_mIndicating the degree of privacy preference that all users set for the mth attribute.

Step C2: sensitivity sbj _ sen of user j to his mth privacy attribute_jmCan be expressed as:

where d represents the number of privacy attributes of the user; r is_jmA preference value representing user j's setting for its mth privacy attribute; r is_jqRepresenting the preference value set by user j for his q-th privacy attribute. User obtained hereSensitivity of j to its mth privacy attribute sbj _ sen_jmIn fact, subjective sensitivity may be considered, which represents subjective evaluation of the user's sensitivity to the attribute, and is related to both the user and the attribute, and subjective sensitivities of different users to the same attribute may be different. In yet another embodiment, the overall sensitivity of a user to an attribute in social media may be characterized based on an average of the sensitivity of different users to the same attribute, which may also be understood as an objective sensitivity, and reflects the overall sensitivity of all users in social media to an attribute. For example, the objective sensitivity obj _ sen of the attribute m_mCan be obtained from the average of all users' subjective sensitivities to the attribute, and the calculation formula is as follows:

where n represents the total number of users in the social media, sbj _ sen_imIs subjective sensitivity and represents the sensitivity of the user i to the mth privacy attribute. Objective sensitivity here may represent the inherent sensitivity of the attribute and is no longer limited to the subjective perception of a certain user itself.

After obtaining a quantitative assessment of the user's sensitivity to privacy attributes, it may be combined with the user data visibility indicators and/or user privacy attribute certainty indicators mentioned above to assess the extent to which user privacy is revealed. For example, the three indexes can be comprehensively considered to quantitatively score the privacy disclosure degree of the user. With the above-mentioned objective sensitivity as an indicator of the sensitivity of the user to the privacy attribute, the following formula can be used to evaluate the privacy disclosure degree of the user j:

wherein obj _ ps_jRepresenting the objective privacy disclosure degree of the user j; obj _ sen_mRepresenting the overall sensitivity degree of each user of the social media to the mth privacy attribute; vis_jData representing user jVisibility; cer_jmRepresenting the certainty of the mth privacy attribute for user j in social media. The privacy leakage degree obj _ ps of the user j acquired here_jIt can be understood that the global privacy scores of the users are evaluated by the objective sensitivity of the attributes, and the global privacy scores of different users can be compared with each other because the objective sensitivity is related to the attributes and has little relation with the users. In another embodiment, the above-mentioned subjective sensitivity may also be used as an indicator of the sensitivity of the user to the privacy attribute to evaluate the privacy disclosure degree of the user, for example, the privacy disclosure degree of the user j may also be calculated by the following formula:

wherein sbj _ ps_jRepresenting the subjective privacy disclosure degree of the user j; vis_jData visibility representing user j; cer_jmRepresents a certainty of the mth privacy attribute for user j in social media; sbj _ sen_jmIndicating how sensitive user j is to its mth privacy attribute. The privacy leakage degree sbj _ ps of the user j thus acquired_jThe subjective sensitivity index of the attribute is adopted for evaluation, so that the personalized privacy score of the user can be understood, different preferences of different users on the privacy attribute are considered, and the personalized requirements of the user are met more easily. In yet another embodiment, the global privacy score obj _ ps may also be employed simultaneously_jAnd personalized privacy score sbj _ ps_jTo evaluate the degree of privacy disclosure of the user from different perspectives. It should be understood that the symbols obj _ ps_jAnd sbj _ ps_jIs an example of symbolic representation of the degree of objective privacy disclosure and the degree of subjective privacy for user j, and may also be represented by the symbol ps_jTo indicate any privacy exposure level of the user, and the specific symbolic signs are not limited in the embodiments herein.

In the embodiment of the invention, the influence of the information published by the user on privacy disclosure, the propagation range of the user information in the social network and the individual requirements of the user on privacy are integrated, the privacy disclosure degree of the user is effectively quantized based on the social network structure, the social relationship strength of the user, the privacy preference setting of the user and other factors, and the social media user can be helped to find the privacy disclosure event in time, so that the harm of privacy disclosure is reduced.

FIG. 3 is a schematic structural diagram of a system for detecting privacy disclosure of a social media user according to an embodiment of the present invention. As shown in fig. 3, the system 300 includes an attribute certainty estimation module 301, a data visibility estimation module 302, a privacy disclosure evaluation module 303, and a hint module 304. Although the block diagrams depict components in a functionally separate manner, such depiction is for illustrative purposes only. The components shown in the figures may be arbitrarily combined or separated into separate software, firmware, and/or hardware components. Moreover, regardless of how such components are combined or divided, they may execute on the same computing device or multiple computing devices, which may be connected by one or more networks.

The attribute certainty estimation module 301 estimates the certainty of each privacy attribute of the user based on the data issued by the user in the manner described above, where the certainty of the privacy attribute is used to indicate the possibility that the value of the privacy attribute of the user can be inferred according to the data issued by the user. The data visibility estimation module 302 determines the visibility of the user data based on the network structure of the social media in which the user is located in the manner as introduced above, wherein the visibility of the user data is used to indicate how likely the data posted by the user can be obtained by other users in the social media. The privacy disclosure evaluation module 303 may measure the extent of the user's privacy disclosure based on the certainty of the user's privacy attributes and the visibility of the user's data as introduced above. The prompt module 304 may issue a privacy disclosure risk prompt message to the user in response to the degree of the privacy disclosure of the user being greater than a set threshold.

Reference in the specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," or the like, in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or characteristic illustrated or described in connection with one embodiment may be combined, in whole or in part, with a feature, structure, or characteristic of one or more other embodiments without limitation, as long as the combination is not logical or operational.

The terms "comprises," "comprising," and "having," and similar referents in this specification, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The word "a" or "an" does not exclude a plurality. Additionally, the various elements of the drawings of the present application are merely schematic illustrations and are not drawn to scale.

Although the present invention has been described by the above embodiments, the present invention is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.

Claims

1. A method of detecting privacy disclosure of a social media user, comprising:

evaluating the certainty of each privacy attribute of the user based on the data issued by the user, wherein the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user;

determining the visibility of user data based on the network structure of the social media where the user is located, wherein the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media;

measuring the degree of privacy leakage of the user according to the certainty of the privacy attributes of the user and the visibility of user data;

responding to the fact that the degree of the privacy leakage of the user is larger than a set threshold value, and sending privacy leakage risk prompt information to the user;

the certainty of evaluating the privacy attributes of the user based on the data issued by the user is completed by utilizing pre-trained attribute recognition models corresponding to the privacy attributes, the attribute recognition model corresponding to each privacy attribute is input as the data issued by the user, and the output is the probability that the privacy attribute of the user respectively takes each attribute value.

2. The method of claim 1, further comprising obtaining user preference settings for each privacy attribute, and determining the sensitivity of the user to each privacy attribute according to the privacy attribute preferences set by the user; and

and measuring the degree of the privacy leakage of the user jointly according to the certainty of the privacy attributes of the user, the visibility of user data and the sensitivity degree of the user to each privacy attribute.

3. The method of claim 1, wherein the attribute recognition model for each privacy attribute is trained by:

collecting information published by each user in social media within a period of time, and calibrating attribute values of the privacy attribute of each piece of information in the collected data set, wherein the attribute values of the user publishing the information are relative to the privacy attribute;

and taking the calibrated data set as a sample set to train an attribute recognition model corresponding to the privacy attribute.

4. The method of claim 1, wherein the certainty of the user privacy attributes is calculated as follows:

5. The method of any of claims 1-2 or 3-4, wherein visibility of user data is measured in terms of one or more of the following: the importance degree of the users in the social network, the social relationship strength among the users and the activity degree of the users; the importance degree of the user in the social media is calculated according to the number of users paying attention to the user and the importance degree of each user paying attention to the user, which are counted by the current network structure of the social media; the social relationship strength between the users is set according to the attention relationship between the users and/or the interaction frequency between the users; the activity of a user is measured in terms of the amount of information that the user publishes over a period of time.

6. The method of claim 5, wherein the importance level of the user in the social network is obtained by:

step A2: based on the social relationship among the users in the social network, updating the user importance vector according to the following updating formula:

wherein, UR_tRepresentation updateA user importance vector after t rounds; q is a damping coefficient, which takes a real number between 0 and 1; t is a matrix indicating social relations among users in the social network, and the matrix T has an element T in the ith row and the j column_ijIndicates the degree of attention of user i to user j, t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j.

7. The method of claim 6, wherein the visibility of user data is calculated as follows:

or

Wherein, vis_jRepresenting data visibility, t, of user j_ijIndicates the degree of attention of user i to user j, t_ij0 means that user i does not care about user j, t_ij>0 indicates that user i is interested in user j; i (x) represents an indicator function, the input variable x of which returns a1 if true, otherwise returns a 0; ur_jRepresents the importance of user j in the network of social media; wb_jThe number of the information issued by the user j in a period of time is shown, and h is a parameter with the value of 0-1.

8. The method of claim 2, wherein the user's sensitivity to each privacy attribute is calculated as follows:

wherein sbj _ sen_jmRepresenting the sensitivity of the user j to the mth privacy attribute of the user j, and d representing the number of the privacy attributes of the user; r is_jmIndicating that user j has set for his mth privacy attributeA preference value; r is_jqRepresenting the preference value set by user j for his q-th privacy attribute.

9. The method of claim 8, wherein the degree of user privacy disclosure is calculated as follows:

or

Wherein ps_jRepresenting the degree of privacy disclosure of user j; sbj _ sen_imRepresenting the sensitivity of user i to its mth privacy attribute; vis_jData visibility representing user j; cer_jmRepresents a certainty of the mth privacy attribute for user j in social media; sbj _ sen_jmRepresents the sensitivity of user j to his mth privacy attribute; n represents the total number of users in the social media.

10. A system to detect privacy disclosure of social media users, comprising:

the attribute certainty estimation module is used for evaluating the certainty of each privacy attribute of the user based on data issued by the user, and the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user; the certainty of evaluating the privacy attributes of the users based on the data issued by the users is completed by utilizing pre-trained attribute recognition models corresponding to the privacy attributes, the attribute recognition model corresponding to each privacy attribute is input as the data issued by the users, and the output is the probability that the privacy attributes of the users respectively take each attribute value;

the data visibility estimation module is used for determining the visibility of user data based on the network structure of the social media where the user is located, and the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media;

a privacy disclosure evaluation module for measuring the degree of the user privacy disclosure according to the certainty of the user privacy attributes and the visibility of the user data, an

And the prompt module is used for responding to the condition that the degree of privacy disclosure of the user is greater than a set threshold value and sending privacy disclosure risk prompt information to the user.