CN110210244B - Method and system for detecting privacy disclosure of social media users
- Publication number
- CN110210244B (application number CN201910387263.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- privacy
- attribute
- data
- users
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention provide a method and system for detecting privacy disclosure of social media users. The certainty of each privacy attribute of a user is evaluated based on the data the user publishes; the visibility of the user's data is determined based on the network structure of the social media where the user is located; the degree of the user's privacy disclosure is measured according to the certainty of the privacy attributes and the visibility of the user data; and a privacy disclosure risk prompt is sent to the user. The technical scheme of the embodiments comprehensively and effectively quantifies the degree of a user's privacy disclosure based on factors such as the information content the user publishes, the social network structure, the strength of the user's social relationships, and the user's privacy preference settings, and can help social media users discover privacy disclosure events in time, thereby reducing the harm of privacy disclosure.
Description
Technical Field
The invention relates to social media data mining and privacy protection technologies, and in particular to a method and system for detecting whether the privacy of social media users has been disclosed.
Background
Social media refers to internet platforms for producing and exchanging content on the basis of user relationships. Social media is now pervasive in people's daily lives and serves as a tool and platform for sharing opinions, insights, and experiences with one another. While it facilitates online socializing, social media also brings risks of privacy disclosure. People often actively post information through social media that may relate to their privacy, such as gender, work, or address. In a social network, information published by a user can easily be obtained by others, which may cause privacy leakage; moreover, a user can hardly know or control exactly where their messages spread, so it is difficult to detect in time that privacy has been revealed. A method that can help social media users discover privacy disclosure events in time is therefore urgently needed, so as to reduce the harm of privacy disclosure as much as possible; such a method also has positive significance for maintaining the security of social networks.
Disclosure of Invention
An object of embodiments of the invention is to provide a method and system for detecting privacy disclosure of social media users, so as to effectively evaluate a user's privacy disclosure risk and help the user discover possible privacy disclosure events in time.
This object is achieved by the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a method for detecting privacy disclosure of a social media user, including:
evaluating the certainty of each privacy attribute of the user based on the data issued by the user, wherein the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user; determining the visibility of user data based on the network structure of the social media where the user is located, wherein the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media; measuring the degree of privacy leakage of the user according to the certainty of the privacy attributes of the user and the visibility of user data; and responding to the fact that the degree of the privacy leakage of the user is larger than a set threshold value, and sending privacy leakage risk prompt information to the user.
In some embodiments, the method may further include acquiring a preference setting of the user for each privacy attribute, and determining a sensitivity level of the user for each privacy attribute according to the privacy attribute preference set by the user; and measuring the degree of the privacy leakage of the user jointly according to the certainty of the privacy attributes of the user, the visibility of user data and the sensitivity degree of the user to each privacy attribute.
In some embodiments, evaluating the certainty of the user privacy attributes based on the data published by the user may be accomplished using pre-trained attribute recognition models corresponding to the privacy attributes, where the attribute recognition model corresponding to each privacy attribute takes the data published by the user as input and outputs the probability of each attribute value for that privacy attribute of the user.
In some embodiments, the attribute recognition model corresponding to each privacy attribute may be trained by: collecting the information published by each user in the social media over a period of time, and labeling each piece of information in the collected data set with the attribute value that its publishing user takes for the privacy attribute in question; then training the attribute recognition model corresponding to that privacy attribute using the labeled data set as the sample set.
In some embodiments, the certainty of a user privacy attribute may be calculated, as one minus the normalized entropy of the attribute's estimated value distribution, using the following formula:

cer_jm = 1 + (1 / log K_m) · Σ_{k=1}^{K_m} pra_jmk · log(pra_jmk)

where cer_jm represents the certainty of the mth privacy attribute of user j in the social media, pra_jmk represents the probability that the mth privacy attribute of user j takes the kth attribute value, and K_m represents the number of possible attribute values of the mth privacy attribute.
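A minimal Python sketch of this certainty measure (assuming, per the entropy-based estimate described later in the specification, that certainty equals one minus the normalized entropy of the value distribution; the function name `attribute_certainty` is illustrative, not from the patent):

```python
import math

def attribute_certainty(probs):
    """Certainty cer_jm of one privacy attribute of a user, given the
    estimated distribution probs = [pra_jm1, ..., pra_jmK] over its K
    possible attribute values: 1 minus the normalized entropy."""
    k = len(probs)
    if k < 2:
        return 1.0  # a single possible value is fully determined
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)

# A uniform distribution gives certainty 0 (nothing can be inferred);
# a one-hot distribution gives certainty 1 (the value is fully inferable).
```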
In some embodiments, the visibility of user data may be measured using a metric based on one or more of the following: the importance degree of the users in the social network, the social relationship strength among the users and the activity degree of the users; the importance degree of the user in the social media is calculated according to the number of users paying attention to the user and the importance degree of each user paying attention to the user, which are counted by the current network structure of the social media; the social relationship strength between users can be set according to the attention relationship between users and/or the interaction frequency between users; the activity level of a user may be measured using the amount of information that the user publishes over a period of time.
In some embodiments, the importance of the user in the social network may be obtained by:
Step A1: representing the importance of each user of the social media by a user importance vector UR, which is n-dimensional, where n indicates the number of users of the social media; the ith element ur_i of the vector represents the importance of user i in the social network, and the value of each element is initialized to 1/n;
Step A2: based on the social relationships among the users in the social network, updating the user importance vector according to the following update formula:

ur_j^(t) = (1 − q)/n + q · Σ_i (t_ij / Σ_k t_ik) · ur_i^(t−1)

where ur_j^(t) denotes the jth element of UR_t, the user importance vector after the tth round of updating; q is a damping coefficient, taking a real number between 0 and 1; T is a matrix indicating social relations among the users in the social network, whose element t_ij in row i, column j indicates the degree of attention user i pays to user j: t_ij = 0 means user i does not follow user j, and t_ij > 0 means user i follows user j. The inner sum normalizes each user i's outgoing attention, and the outer sum runs over users i who follow at least one user.
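Steps A1 and A2, iterated until the importance vector stabilizes, can be sketched as follows (a PageRank-style iteration; row-normalizing T and the treatment of users who follow no one are assumptions of this sketch, not details given by the patent):

```python
import math

def user_importance(T, q=0.85, tol=1e-9, max_rounds=1000):
    """Iteratively compute the user importance vector UR.
    T[i][j] > 0 means user i follows (pays attention to) user j."""
    n = len(T)
    ur = [1.0 / n] * n                     # step A1: initialize to 1/n
    for _ in range(max_rounds):
        new = []
        for j in range(n):                 # step A2: one update round
            s = 0.0
            for i in range(n):
                out = sum(T[i])            # total attention user i gives out
                if out > 0:
                    s += ur[i] * T[i][j] / out
            new.append((1.0 - q) / n + q * s)
        if math.dist(new, ur) < tol:       # stop once two rounds agree
            break
        ur = new
    return new
```

For example, in a two-user network where user 0 follows user 1, the followed user ends up with the larger importance value.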
In some embodiments, the visibility of user data may be calculated using the following formula:

vis_j = (1/(n − 1)) · Σ_{i≠j} I(t_ij > 0) · [1 − (1 − h · t_ij · ur_j)^{wb_j}]

where vis_j represents the data visibility of user j; t_ij indicates the degree of attention user i pays to user j (t_ij = 0 means user i does not follow user j, t_ij > 0 means user i follows user j); I(x) represents an indicator function that returns 1 if its input variable x is true and 0 otherwise; ur_j represents the importance of user j in the network of the social media; wb_j represents the number of pieces of information published by user j over a period of time; and h is a parameter with a value between 0 and 1.
In some embodiments, the user's sensitivity to each privacy attribute may be calculated using the following formula:

sbj_sen_jm = r_jm / Σ_{q=1}^{d} r_jq

where sbj_sen_jm represents the sensitivity of user j to their mth privacy attribute, d represents the number of privacy attributes of the user, r_jm represents the preference value set by user j for their mth privacy attribute, and r_jq represents the preference value set by user j for their qth privacy attribute.
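This normalization can be sketched as follows (the function name is illustrative; the preference values are assumed non-negative and not all zero):

```python
def attribute_sensitivities(prefs):
    """sbj_sen_jm for all m: user j's preference values r_j1..r_jd,
    normalized so that the sensitivities sum to 1."""
    total = sum(prefs)
    return [r / total for r in prefs]
```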
In some embodiments, the degree to which user privacy is revealed may be calculated as follows:

ps_j = (1/2) · vis_j + (1/2) · Σ_{m=1}^{d} sbj_sen_jm · cer_jm

where ps_j represents the degree of privacy disclosure of user j; vis_j represents the data visibility of user j; cer_jm represents the certainty of the mth privacy attribute of user j in the social media; sbj_sen_jm represents the sensitivity of user j to their mth privacy attribute; and d is the number of privacy attributes.
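A minimal sketch of one way to combine these three quantities (the equal default weights `alpha` and `beta`, and the exact form of the combination, are assumptions of this sketch, since the text preserves only the symbol definitions):

```python
def privacy_leakage(vis_j, certainties, sensitivities, alpha=0.5, beta=0.5):
    """ps_j: combine user j's data visibility with the sensitivity-weighted
    certainty of the user's d privacy attributes."""
    weighted_cer = sum(s * c for s, c in zip(sensitivities, certainties))
    return alpha * vis_j + beta * weighted_cer
```

With certainties, sensitivities, and visibility all in [0, 1] and the sensitivities summing to 1, the score stays in [0, 1].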
According to a second aspect of the embodiment of the invention, a system for detecting privacy disclosure of social media users is further provided, and the system comprises an attribute certainty estimation module, a data visibility estimation module, a privacy disclosure evaluation module and a prompt module. The attribute certainty estimation module is used for evaluating the certainty of each privacy attribute of the user based on data issued by the user, and the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user. The data visibility estimation module is used for determining the visibility of the user data based on the network structure of the social media where the user is located, and the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media. And the privacy disclosure evaluation module is used for measuring the degree of the user privacy disclosure according to the certainty of the user privacy attributes and the visibility of the user data. And the prompting module is used for responding to the condition that the degree of privacy disclosure of the user is greater than a set threshold value and sending privacy disclosure risk prompting information to the user.
The technical scheme of the embodiment of the invention can have the following beneficial effects:
the method not only considers the influence of the information published by the user on privacy disclosure, but also considers the propagation range of the user information in the social network, the personalized demand of the user on the privacy and the like, comprehensively and effectively quantifies the privacy disclosure degree of the user based on the social network structure, the social relationship strength of the user, the privacy preference setting of the user and other factors, and can help the social media user to find the privacy disclosure event in time, thereby reducing the harm of privacy disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a flowchart of a method for detecting privacy disclosure of a social media user according to one embodiment of the invention.
Fig. 2 is a flowchart illustrating a method for calculating user importance according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a system for detecting privacy disclosure of a social media user according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by embodiments with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
FIG. 1 is a flowchart illustrating a method for detecting privacy disclosure of a social media user according to an embodiment of the present invention. The method mainly comprises the following steps: S1) evaluating the certainty of each privacy attribute of the user based on the data published by the user; S2) determining the visibility of the user data based on the network structure of the social media where the user is located; S3) measuring the degree of the user's privacy leakage according to the certainty of the user's privacy attributes and the visibility of the user data; and S4) in response to the degree of the user's privacy leakage being greater than a set threshold, issuing a privacy leakage risk prompt to the user.
More specifically, at step S1) the certainty of each privacy attribute of the user is evaluated based on the data issued by the user. The privacy attribute generally refers to user attribute information that a user wants to keep secret and does not want other users of the social media to know without permission. In general, the set of privacy attributes set or specified by the user may be obtained through a corresponding interface set to the social media system, or the set of privacy attributes set by the social media system as a default for the user may be employed. Although a user may pay attention to hiding information related to the privacy attributes when publishing information, the content, the idiomatic language and the like of the information published by the user often reveal some privacy attribute information of the user to a certain extent, so that the privacy attributes of the user are likely to be inferred through public data published by the user. For example, if "fairy", "make-up", "lovely", and the like are often found in the information posted by the user, even if the gender attribute is hidden by the user, other users in the social network may presume that the gender of the user is female based on the information posted by the user. Thus, an important factor to consider in evaluating the degree of privacy disclosure of a user is the certainty of each privacy attribute of the user. The certainty of the privacy attributes is used for indicating the possibility or probability that the value of the privacy attributes of the user can be inferred according to data issued by the user, and the greater the certainty of the privacy attributes is, the greater the privacy leakage risk of the corresponding attributes is. The user attribute certainty can be estimated based on the information entropy of the attribute posterior distribution, and can be calculated by using the following formula:
cer_jm = 1 + (1 / log K_m) · Σ_{k=1}^{K_m} pra_jmk · log(pra_jmk)

where cer_jm represents the certainty of the mth privacy attribute of user j in the social media, pra_jmk represents the probability that the mth privacy attribute of user j takes the kth attribute value, and K_m represents the number of possible attribute values of the mth privacy attribute. Each attribute of a user may take multiple values; for example, the "gender" attribute may take the values "male" and "female".
In one embodiment, the probability pra_jmk that the mth privacy attribute of user j takes the kth attribute value can be estimated with a keyword-statistics method. For example, some keywords are preset for each attribute value, the information texts published by the user over a period of time are collected, and pra_jmk is then estimated from the number of keyword hits for each attribute value in those texts. Taking the "gender" attribute as an example, its values are "male" and "female"; the keywords set for the value "male" are {"brother", "grandmother"} and the keywords set for the value "female" are {"fairy", "make-up", "lovely"}. If, in the collected texts published by the user, the keywords set for "male" appear 8 times in total while "fairy", "make-up", and "lovely" appear twice in total, then the probability that the "gender" attribute takes the 1st attribute value "male" is pra_jm1 = 0.8, and the probability that it takes the 2nd attribute value "female" is pra_jm2 = 0.2.
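The keyword-statistics estimate can be sketched as follows (a simplified illustration; counting keyword occurrences by substring and the function name are assumptions of this sketch):

```python
def keyword_attribute_probs(posts, keywords_per_value):
    """Estimate pra_jmk for one privacy attribute: for each attribute value,
    count hits of its preset keywords in the user's posts, then normalize."""
    counts = [sum(post.count(kw) for post in posts for kw in keywords)
              for keywords in keywords_per_value]
    total = sum(counts)
    if total == 0:
        return [1.0 / len(counts)] * len(counts)  # no evidence: uniform
    return [c / total for c in counts]
```

On the gender example above (8 "male" keyword hits, 2 "female" hits), this yields the probabilities 0.8 and 0.2.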
In yet another embodiment, the probability pra_jmk that the mth privacy attribute of user j takes the kth attribute value can be estimated with a text-classification learning method. In this embodiment, a corresponding attribute recognition model is trained for each privacy attribute, and the evaluation of the certainty of the user privacy attributes is completed using these pre-trained attribute recognition models. The attribute recognition model for each privacy attribute takes the data published by the user as input and outputs the probability of each possible value of that privacy attribute. The attribute recognition model may be built, for example, with logistic regression: for the model of the mth attribute of user j, assuming the mth attribute has K_m possible values, the model receives the user's information texts as input and outputs K_m probabilities, one for each attribute value the mth attribute of the user may take. The attribute recognition model corresponding to each privacy attribute may be trained by the following steps: I) collecting the information published by each user in the social media over a period of time, and labeling each piece of information in the collected data set with the attribute value that its publishing user takes for the privacy attribute in question; II) training the attribute recognition model corresponding to that privacy attribute using the labeled data set as the sample set.
In this way, on the basis of the information published by the user, the probability that the user takes each attribute value of each privacy attribute is determined using the attribute recognition models trained as above; that is, the probability pra_jmk required in the formula, that the mth privacy attribute of user j takes the kth attribute value, is estimated.
With continued reference to FIG. 1, at step S2) the visibility of the user data is determined based on the network structure of the social media in which the user is located. Although the certainty of the user privacy attributes estimated in step S1) from the information the user publishes can to some extent reflect the magnitude of the user's privacy disclosure risk, the inventors found through research that this risk is also related to the propagation range of the user's information in the social media, the strength of social relationships between users, and so on. Thus, in embodiments of the invention, another important factor is also considered when evaluating the degree of user privacy disclosure: the visibility of user data. The visibility of user data indicates how likely it is that data published by the user can be obtained by other users in the social media; it can also be understood as the degree of exposure of the user data, i.e., the probability that the user data is seen by others. The greater the visibility of the user data, the higher the user's privacy exposure risk. Visibility of user data may be measured using a metric based on one or more of the following: the importance of the user in the social media, the strength of social relationships between users, and the activity of the user. The importance of a user in the social media is calculated from the number of users who follow the user and the importance of each of those followers, as counted from the current network structure of the social media; the strength of a social relationship between users can be set according to the follow relationship and/or the interaction frequency between them; and a user's activity can be measured by the amount of information the user publishes over a period of time.
In one embodiment, the importance of a user in the social media (which may also be referred to as user importance) may be used to indicate or characterize the probability that the user's data is seen by others. The more important the user, the greater the probability that the user's data is seen by others, and the greater the risk that the user's privacy is revealed. For example, a "big V" (verified influencer) account on a microblog platform has a very large number of followers, so what it posts is seen by many people; if a post contains private information, that privacy is easily revealed. User importance may be quantitatively evaluated according to the network structure of the social media. In a social network, the more other users follow a user (i.e., the more followers the user has), the more important the user is; and the more important those followers themselves are, the more important the user is in turn. That is, a user's importance is closely related to both the number of their follower users and the importance of those followers. Since each follower is in turn followed by other users, a user's importance needs to be calculated based on the social network structure, with the importance of followers calculated layer by layer.
FIG. 2 is a flow diagram of a method for calculating user importance using multiple iterative updates, according to one embodiment of the present invention. The method mainly comprises the following steps:
Step 201: initialize the user importance vector. The importance of each user of the social network is represented by an n-dimensional user importance vector UR, where n indicates the number of users of the social network; the ith element ur_i of the vector represents the importance of user i, and the value of each element is initialized to 1/n.
Step 202: update the user importance vector. Based on the social relationships among the users in the social network, the user importance vector is updated according to the following update formula:

ur_j^(t) = (1 − q)/n + q · Σ_i (t_ij / Σ_k t_ik) · ur_i^(t−1)

where ur_j^(t) denotes the jth element of UR_t, the importance vector after t rounds of updating, and the outer sum runs over users i who follow at least one user. q is a damping coefficient whose value is a real number between 0 and 1, usually greater than 0.5; in this embodiment it may be set to 0.85. Setting q reasonably helps prevent all of the importance from flowing to hanging points in the social network, i.e., users who are followed by other users but follow no one. T is a matrix indicating the strength of social relationships among the users, which can be set according to the follow relationships and/or the interaction frequency between users. For example, the element t_ij in row i, column j of the matrix T indicates the degree of attention user i pays to user j: t_ij = 0 means user i does not follow user j, and t_ij = 1 (or t_ij > 0) means user i follows user j. As another example, t_ij may represent the interaction frequency between user i and user j: t_ij = 0 means no interaction between them, t_ij = 1 means there is interaction, or t_ij may be set to a natural number greater than 0 indicating the number of interactions within a predetermined time period.
Step 203: judge whether the current user importance vector UR_t and the previous round's vector UR_{t−1} differ by less than an acceptable error. If the Euclidean distance between the two successive rounds is smaller than the acceptable error, stop updating and go to step 204 to output the current user importance vector UR; otherwise return to step 202 and continue updating. The probability that user j's data is seen by others (i.e., the user data visibility) is then vis_j = ur_j, where ur_j is the jth element of the user importance vector UR.
In yet another embodiment, let p_ij denote the probability that user i can obtain the data published by user j. The data visibility of user j can then be expressed as the average, over the other users in the social media, of the probabilities that each of them can obtain user j's data; that is, the data visibility vis_j of user j can be calculated by the following formula:

vis_j = (1/(n − 1)) · Σ_{i≠j} p_ij    (1)

The probability p_ij that user i can obtain the data published by user j is related not only to the importance of user j but also to the social relationship between user i and user j. That is, user j's data can be seen by user j's followers, and the probability of being seen by a follower is mainly related to user j's importance in the social network, so p_ij can be calculated by the following formula:

p_ij = I(t_ij > 0) · ur_j

Substituting this into equation (1) for the data visibility of user j yields:

vis_j = (1/(n − 1)) · Σ_{i≠j} I(t_ij > 0) · ur_j
wherein t isijIndicating the degree of interest of user i to user j, e.g. tij0 means that user i does not care about user j, tij>0 indicates that user i is interested in user j; i (x) represents an indicator function, the input variable x of which returns a1 if true, otherwise returns a 0; urjRepresenting the degree of importance of user j in the network of social media. In a preferred embodiment, the probability p that the user i can obtain the data issued by the user j is calculatedijAt least the following 3-aspect factor correlations are considered: 1) by usingImportance ur of household j itselfj,urjThe larger the likelihood that user j's data is seen by others; 2) social relationship strength t of user i and user jij,tijThe larger the user i sees the user j data the more likely; 3) the activity of user j, for example, may be the number wb of information recently released by user jjTo measure, wbjThe larger the user j data is, the more likely it is to be seen by others. Therefore, the probability p that the user i can acquire the data issued by the user jijCan be calculated by the following formula:
wherein h is a system-set parameter in [0, 1] used to estimate the degree to which information is clicked or viewed after being acquired. The higher the value of h, the more likely the information is to be read by users overall; it does not affect relative visibility, that is, the possibility that the information is acquired. p'_ij represents the probability that user i sees a single piece of information of user j, and can be calculated as follows:
Then, substituting into equation (1) for the data visibility vis_j of user j above yields:
wherein t_ij indicates the degree of attention of user i to user j, e.g., t_ij = 0 means that user i does not follow user j, and t_ij > 0 means that user i follows user j; I(x) represents an indicator function that returns 1 if its input x is true and 0 otherwise; ur_j represents the importance of user j in the social media network; wb_j represents the amount of information user j has published over a period of time.
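As an illustration only, the data-visibility computation described above can be sketched in Python. Since the patent's formulas are rendered as images in the original publication, the closed form used for p_ij below (a follower sees a single item of user j with probability p'_ij = h·ur_j, and at least one of wb_j items with probability 1 - (1 - p'_ij)^wb_j) is an assumption consistent with, but not guaranteed identical to, the described factors:

```python
import numpy as np

def data_visibility(T, ur, wb, h=0.5):
    """Sketch of the data visibility index vis_j.

    T  : (n, n) attention matrix, T[i, j] > 0 iff user i follows user j
    ur : (n,) importance degree of each user in the network
    wb : (n,) number of pieces of information each user published recently
    h  : system parameter in [0, 1] estimating read-through likelihood

    Assumed form: p'_ij = h * ur[j] per item; seeing at least one of
    wb[j] items gives p_ij = 1 - (1 - p'_ij) ** wb[j]; vis_j averages
    p_ij over the n - 1 other users, counting zero for non-followers.
    """
    n = T.shape[0]
    p_item = h * ur                        # p'_ij, same for every follower i
    p_any = 1.0 - (1.0 - p_item) ** wb     # probability of seeing >= 1 item
    follows = (T > 0).astype(float)        # indicator I(t_ij > 0)
    np.fill_diagonal(follows, 0.0)         # a user is not his own audience
    return follows.sum(axis=0) * p_any / (n - 1)
```

A user with no followers gets visibility 0; for a widely followed user, visibility grows with ur_j, wb_j, and h, matching the three factors listed above.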
With continued reference to FIG. 1, at step S3) the degree to which the user's privacy is revealed is measured based on the certainty of the user's privacy attributes and the visibility of the user's data. For example, the degree of privacy disclosure of the user can be obtained as a weighted sum of the two indices, for example using the following formula:
wherein ps_j represents the degree of privacy disclosure of user j; vis_j represents the data visibility of user j; cer_jm represents the certainty of the mth privacy attribute of user j in the social media. vis_j and cer_jm may also be assigned different weights according to actual demand, so as to distinguish the importance of the two indices.
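A minimal numeric sketch of this weighted sum follows; the equal default weights and the averaging of the per-attribute certainties are illustrative assumptions, since the patent's formula is rendered as an image:

```python
import numpy as np

def privacy_leak_degree(vis_j, cer_j, w_vis=0.5, w_cer=0.5):
    """Weighted-sum privacy leakage degree ps_j for one user (sketch).

    vis_j : scalar data visibility of user j
    cer_j : length-d array of attribute certainties cer_jm
    w_vis, w_cer : weights distinguishing the importance of the two
        indices (the values here are assumptions, not from the patent)
    """
    cer_j = np.asarray(cer_j, dtype=float)
    return w_vis * vis_j + w_cer * cer_j.mean()
```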
At step S4), in response to the degree of the user's privacy leakage being greater than a set threshold, privacy leakage risk prompt information is issued to the user. The threshold may be set according to an empirical value based on historical data statistics, or specified by the user according to how much the user values privacy. In this embodiment, the ps_j obtained in step S3) reflects the overall degree of privacy disclosure of user j, while the certainty of each privacy attribute estimated in step S1) reflects, at a finer granularity, which privacy attribute of the user is disclosed to a greater degree. Therefore, more detailed privacy disclosure risk prompt information can be provided to the user.
In still other embodiments, the method may further include obtaining the user's preference setting for each privacy attribute and determining the user's degree of sensitivity to each privacy attribute (which may also be referred to as privacy attribute sensitivity) according to the privacy attribute preferences set by the user; and combining the user's sensitivity to each privacy attribute with the certainty of the user's privacy attributes and the visibility of the user's data determined above to jointly measure the degree of privacy disclosure of the user. This takes into account that users differ in their sensitivity to different privacy attributes. A user is generally sensitive to one or several privacy attributes and insensitive to the rest; even if all the remaining privacy attributes are revealed, the user may not consider his privacy exposed, whereas if one or several of the privacy attributes to which the user is more sensitive are revealed, the user will immediately perceive that his privacy has been violated. The user's sensitivity to privacy attributes may be quantitatively evaluated based on the user's privacy preference settings. Assume that the set of privacy attributes associated with the user contains d privacy attributes. Each user in the social network can set the preference degree, or sensitivity, of each attribute in the privacy attribute set to a natural number in a preset interval according to his needs; the larger the value, the higher the privacy preference and the more sensitive the user is to that privacy attribute. Thus, a user's privacy preference settings can be expressed as a d-dimensional integer vector. Based on the quantified user privacy preference settings, the user's sensitivity to privacy attributes can be calculated as follows:
step C1: and obtaining a vector corresponding to the privacy preference of each user in the social media and constructing a sensitivity response matrix R. The sensitivity response matrix R is a matrix formed by privacy preference vectors of all users of the social media, wherein the rows correspond to the users, the columns correspond to the attributes, and the element R of the jth row and the mth columnjmIndicating the degree of privacy preference that user j sets for the mth attribute. The jth row R of the sensitivity response matrix RjPrivacy preference vector, mth column R, representing user jmIndicating the degree of privacy preference that all users set for the mth attribute.
Step C2: the sensitivity sbj_sen_jm of user j to his mth privacy attribute can be expressed as:
where d represents the number of privacy attributes of the user; r_jm represents the preference value user j sets for his mth privacy attribute; r_jq represents the preference value user j sets for his qth privacy attribute. The sensitivity sbj_sen_jm of user j obtained here may in fact be considered a subjective sensitivity: it represents the user's subjective evaluation of his sensitivity to the attribute, is related to both the user and the attribute, and may differ between users for the same attribute. In yet another embodiment, the overall sensitivity of users in the social media to an attribute may be characterized by the average of different users' sensitivities to the same attribute; this may also be understood as an objective sensitivity, reflecting the overall sensitivity of all users in the social media to the attribute. For example, the objective sensitivity obj_sen_m of attribute m can be obtained from the average of all users' subjective sensitivities to the attribute, calculated as follows:
where n represents the total number of users in the social media, and sbj_sen_im is the subjective sensitivity, representing the sensitivity of user i to the mth privacy attribute. The objective sensitivity here represents the inherent sensitivity of the attribute and is no longer limited to the subjective perception of a particular user.
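Steps C1 and C2 and the objective sensitivity above can be sketched as follows. The row-normalized form sbj_sen_jm = r_jm / Σ_q r_jq is an assumption consistent with the description, since the original formulas are rendered as images:

```python
import numpy as np

def sensitivities(R):
    """Subjective and objective privacy attribute sensitivities.

    R : (n, d) sensitivity response matrix; R[j, m] is the preference
        value user j sets for the mth privacy attribute (step C1).

    sbj[j, m] normalizes user j's preference row so his d values sum
    to 1 (assumed form of step C2); obj[m] averages sbj[:, m] over all
    n users, giving the objective sensitivity obj_sen_m.
    """
    R = np.asarray(R, dtype=float)
    sbj = R / R.sum(axis=1, keepdims=True)   # subjective sensitivity
    obj = sbj.mean(axis=0)                   # objective sensitivity
    return sbj, obj
```

The normalization makes subjective sensitivities comparable across users who use different absolute preference scales.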
After obtaining a quantitative assessment of the user's sensitivity to privacy attributes, it may be combined with the user data visibility indicator and/or the user privacy attribute certainty indicator mentioned above to assess the extent to which the user's privacy is revealed. For example, the three indices can be considered together to quantitatively score the user's degree of privacy disclosure. Using the above objective sensitivity as the indicator of the user's sensitivity to privacy attributes, the following formula can be used to evaluate the privacy disclosure degree of user j:
wherein obj_ps_j represents the objective privacy disclosure degree of user j; obj_sen_m represents the overall sensitivity of the social media's users to the mth privacy attribute; vis_j represents the data visibility of user j; cer_jm represents the certainty of the mth privacy attribute of user j in the social media. The privacy leakage degree obj_ps_j obtained here can be understood as a global privacy score evaluated with the objective sensitivity of the attributes; because objective sensitivity is related to the attributes and largely independent of individual users, the global privacy scores of different users can be compared with each other. In another embodiment, the above subjective sensitivity may instead be used as the indicator of the user's sensitivity to privacy attributes, for example calculating the privacy disclosure degree of user j by the following formula:
wherein sbj_ps_j represents the subjective privacy disclosure degree of user j; vis_j represents the data visibility of user j; cer_jm represents the certainty of the mth privacy attribute of user j in the social media; sbj_sen_jm represents the sensitivity of user j to his mth privacy attribute. The privacy leakage degree sbj_ps_j thus obtained is evaluated with the subjective sensitivity of the attributes and can be understood as a personalized privacy score; it takes into account the different preferences of different users for privacy attributes and more easily meets users' personalized needs. In yet another embodiment, the global privacy score obj_ps_j and the personalized privacy score sbj_ps_j may both be employed to evaluate the degree of privacy disclosure of the user from different perspectives. It should be understood that the symbols obj_ps_j and sbj_ps_j are example notations for the objective and subjective privacy disclosure degrees of user j; the symbol ps_j may also denote any privacy disclosure degree of the user, and the specific notation is not limited in the embodiments herein.
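Both scores can be sketched together. The multiplicative combination of visibility with a sensitivity-weighted sum of certainties below is an assumed form for illustration only; the patent's exact formulas for obj_ps_j and sbj_ps_j are rendered as images:

```python
import numpy as np

def privacy_scores(vis, cer, sbj, obj):
    """Global (objective) and personalized (subjective) privacy scores.

    vis : (n,) data visibility vis_j per user
    cer : (n, d) attribute certainty cer_jm per user and attribute
    sbj : (n, d) subjective sensitivities sbj_sen_jm
    obj : (d,) objective sensitivities obj_sen_m

    Assumed form: visibility times the sensitivity-weighted sum of
    attribute certainties.
    """
    vis, cer = np.asarray(vis, float), np.asarray(cer, float)
    sbj, obj = np.asarray(sbj, float), np.asarray(obj, float)
    obj_ps = vis * (cer @ obj)                # global score obj_ps_j
    sbj_ps = vis * (cer * sbj).sum(axis=1)    # personalized score sbj_ps_j
    return obj_ps, sbj_ps
```

Because obj is shared across users, obj_ps is comparable between users, while sbj_ps weights each user's certainties by his own preferences, mirroring the global/personalized distinction above.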
In the embodiments of the invention, the influence of information published by the user on privacy disclosure, the propagation range of user information in the social network, and the user's individual privacy requirements are considered together; the degree of privacy disclosure of the user is effectively quantified based on factors such as the social network structure, the strength of users' social relationships, and the user's privacy preference settings, helping social media users discover privacy disclosure events in time and thereby reducing the harm of privacy disclosure.
FIG. 3 is a schematic structural diagram of a system for detecting privacy disclosure of a social media user according to an embodiment of the present invention. As shown in FIG. 3, the system 300 includes an attribute certainty estimation module 301, a data visibility estimation module 302, a privacy disclosure evaluation module 303, and a prompt module 304. Although the block diagrams depict components in a functionally separate manner, such depiction is for illustrative purposes only. The components shown in the figures may be arbitrarily combined or separated into separate software, firmware, and/or hardware components. Moreover, regardless of how such components are combined or divided, they may execute on the same computing device or on multiple computing devices, which may be connected by one or more networks.
The attribute certainty estimation module 301 estimates the certainty of each privacy attribute of the user based on the data issued by the user in the manner described above, where the certainty of the privacy attribute is used to indicate the possibility that the value of the privacy attribute of the user can be inferred according to the data issued by the user. The data visibility estimation module 302 determines the visibility of the user data based on the network structure of the social media in which the user is located in the manner as introduced above, wherein the visibility of the user data is used to indicate how likely the data posted by the user can be obtained by other users in the social media. The privacy disclosure evaluation module 303 may measure the extent of the user's privacy disclosure based on the certainty of the user's privacy attributes and the visibility of the user's data as introduced above. The prompt module 304 may issue a privacy disclosure risk prompt message to the user in response to the degree of the privacy disclosure of the user being greater than a set threshold.
Reference in the specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," or the like, in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or characteristic illustrated or described in connection with one embodiment may be combined, in whole or in part, with a feature, structure, or characteristic of one or more other embodiments without limitation, as long as the combination is not logical or operational.
The terms "comprises," "comprising," and "having," and similar referents in this specification, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The word "a" or "an" does not exclude a plurality. Additionally, the various elements of the drawings of the present application are merely schematic illustrations and are not drawn to scale.
Although the present invention has been described by the above embodiments, the present invention is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.
Claims (10)
1. A method of detecting privacy disclosure of a social media user, comprising:
evaluating the certainty of each privacy attribute of the user based on the data issued by the user, wherein the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user;
determining the visibility of user data based on the network structure of the social media where the user is located, wherein the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media;
measuring the degree of privacy leakage of the user according to the certainty of the privacy attributes of the user and the visibility of user data;
responding to the fact that the degree of the privacy leakage of the user is larger than a set threshold value, and sending privacy leakage risk prompt information to the user;
the certainty of evaluating the privacy attributes of the user based on the data issued by the user is completed by utilizing pre-trained attribute recognition models corresponding to the privacy attributes, the attribute recognition model corresponding to each privacy attribute is input as the data issued by the user, and the output is the probability that the privacy attribute of the user respectively takes each attribute value.
2. The method of claim 1, further comprising obtaining user preference settings for each privacy attribute, and determining the sensitivity of the user to each privacy attribute according to the privacy attribute preferences set by the user; and
and measuring the degree of the privacy leakage of the user jointly according to the certainty of the privacy attributes of the user, the visibility of user data and the sensitivity degree of the user to each privacy attribute.
3. The method of claim 1, wherein the attribute recognition model for each privacy attribute is trained by:
collecting information published by each user in the social media within a period of time, and labeling, for each piece of information in the collected data set, the attribute value that the user publishing the information takes with respect to the privacy attribute;
and taking the calibrated data set as a sample set to train an attribute recognition model corresponding to the privacy attribute.
4. The method of claim 1, wherein the certainty of the user privacy attributes is calculated as follows:
wherein cer_jm represents the certainty of the mth privacy attribute of user j in the social media, pra_jmk represents the probability that the mth privacy attribute of user j takes the kth attribute value, and K_m represents the number of possible attribute values of the mth privacy attribute.
5. The method of any of claims 1-2 or 3-4, wherein visibility of user data is measured in terms of one or more of the following: the importance degree of the users in the social network, the social relationship strength among the users and the activity degree of the users; the importance degree of the user in the social media is calculated according to the number of users paying attention to the user and the importance degree of each user paying attention to the user, which are counted by the current network structure of the social media; the social relationship strength between the users is set according to the attention relationship between the users and/or the interaction frequency between the users; the activity of a user is measured in terms of the amount of information that the user publishes over a period of time.
6. The method of claim 5, wherein the importance level of the user in the social network is obtained by:
step A1: the importance level of each user of social media is represented by a user importance vector UR, which is n-dimensional, where n indicates the number of users of social media, the ith element UR of the vectoriRepresenting the importance degree of a user i in the social network, and initializing the value of each element of the vector to be 1/n;
step A2: based on the social relationship among the users in the social network, updating the user importance vector according to the following updating formula:
wherein UR_t represents the user importance vector after t rounds of updating; q is a damping coefficient taking a real value between 0 and 1; T is a matrix indicating social relationships among users in the social network, in which the element t_ij in the ith row and jth column indicates the degree of attention of user i to user j, with t_ij = 0 meaning that user i does not follow user j and t_ij > 0 meaning that user i follows user j.
7. The method of claim 6, wherein the visibility of user data is calculated as follows:
wherein vis_j represents the data visibility of user j; t_ij indicates the degree of attention of user i to user j, with t_ij = 0 meaning that user i does not follow user j and t_ij > 0 meaning that user i follows user j; I(x) represents an indicator function that returns 1 if its input x is true and 0 otherwise; ur_j represents the importance of user j in the social media network; wb_j represents the number of pieces of information published by user j over a period of time; and h is a parameter taking a value between 0 and 1.
8. The method of claim 2, wherein the user's sensitivity to each privacy attribute is calculated as follows:
wherein sbj_sen_jm represents the sensitivity of user j to his mth privacy attribute, and d represents the number of privacy attributes of the user; r_jm represents the preference value set by user j for his mth privacy attribute; r_jq represents the preference value set by user j for his qth privacy attribute.
9. The method of claim 8, wherein the degree of user privacy disclosure is calculated as follows:
wherein ps_j represents the degree of privacy disclosure of user j; sbj_sen_im represents the sensitivity of user i to his mth privacy attribute; vis_j represents the data visibility of user j; cer_jm represents the certainty of the mth privacy attribute of user j in the social media; sbj_sen_jm represents the sensitivity of user j to his mth privacy attribute; and n represents the total number of users in the social media.
10. A system to detect privacy disclosure of social media users, comprising:
the attribute certainty estimation module is used for evaluating the certainty of each privacy attribute of the user based on data issued by the user, and the certainty of the privacy attributes is used for indicating the possibility that the value of the privacy attributes of the user can be inferred according to the data issued by the user; the certainty of evaluating the privacy attributes of the users based on the data issued by the users is completed by utilizing pre-trained attribute recognition models corresponding to the privacy attributes, the attribute recognition model corresponding to each privacy attribute is input as the data issued by the users, and the output is the probability that the privacy attributes of the users respectively take each attribute value;
the data visibility estimation module is used for determining the visibility of user data based on the network structure of the social media where the user is located, and the visibility of the user data is used for indicating the possibility that the data published by the user can be acquired by other users in the social media;
a privacy disclosure evaluation module for measuring the degree of the user privacy disclosure according to the certainty of the user privacy attributes and the visibility of the user data, an
And the prompt module is used for responding to the condition that the degree of privacy disclosure of the user is greater than a set threshold value and sending privacy disclosure risk prompt information to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910387263.5A CN110210244B (en) | 2019-05-10 | 2019-05-10 | Method and system for detecting privacy disclosure of social media users |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210244A CN110210244A (en) | 2019-09-06 |
CN110210244B true CN110210244B (en) | 2020-12-29 |
Family
ID=67787049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910387263.5A Active CN110210244B (en) | 2019-05-10 | 2019-05-10 | Method and system for detecting privacy disclosure of social media users |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210244B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781518B (en) * | 2019-10-31 | 2021-07-27 | 北京工业大学 | Simulation method for determining privacy information propagation range in social network |
CN112364373B (en) * | 2020-11-03 | 2024-07-19 | 中国银联股份有限公司 | Data processing method, device, equipment and medium |
CN112632328B (en) * | 2020-12-07 | 2022-12-02 | 西安电子科技大学 | Vlog privacy leakage measurement evaluation method, system, medium and application |
CN115544370A (en) * | 2022-10-21 | 2022-12-30 | 珠海格力电器股份有限公司 | Information security assessment method, device, security assessment platform and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914659A (en) * | 2014-03-12 | 2014-07-09 | 西安电子科技大学 | System and method for track restraining data publishing privacy protection based on frequency |
CN106572111A (en) * | 2016-11-09 | 2017-04-19 | 南京邮电大学 | Big-data-oriented privacy information release exposure chain discovery method |
CN108390865A (en) * | 2018-01-30 | 2018-08-10 | 南京航空航天大学 | A kind of fine-grained access control mechanisms and system based on privacy driving |
CN109271806A (en) * | 2018-08-14 | 2019-01-25 | 同济大学 | Research on Privacy Preservation Mechanism based on user behavior |
Non-Patent Citations (2)
Title |
---|
A Survey of Privacy Protection Research in Data Publishing; Lan Lihui et al.; Application Research of Computers; 2010-08-31; Vol. 27, No. 8; full text *
Research Progress on Privacy Measurement for Cloud Data; Xiong Jinbo et al.; Journal of Software; 2017-10-17; Vol. 29, No. 7; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110210244B (en) | Method and system for detecting privacy disclosure of social media users | |
CN110162703B (en) | Content recommendation method, training device, content recommendation equipment and storage medium | |
CN105701191B (en) | Pushed information click rate estimation method and device | |
Babaei et al. | Analyzing biases in perception of truth in news stories and their implications for fact checking | |
CN107908753B (en) | Client demand mining method and device based on social media comment data | |
CN105574067A (en) | Item recommendation device and item recommendation method | |
US8346710B2 (en) | Evaluating statistical significance of test statistics using placebo actions | |
WO2022188773A1 (en) | Text classification method and apparatus, device, computer-readable storage medium, and computer program product | |
CN106407364B (en) | Information recommendation method and device based on artificial intelligence | |
CN110035302B (en) | Information recommendation method and device, model training method and device, computing equipment and storage medium | |
US11115359B2 (en) | Method and apparatus for importance filtering a plurality of messages | |
Alahmadi et al. | Twitter-based recommender system to address cold-start: A genetic algorithm based trust modelling and probabilistic sentiment analysis | |
CN108053050A (en) | Clicking rate predictor method, device, computing device and storage medium | |
CN110532429B (en) | Online user group classification method and device based on clustering and association rules | |
CN110169021B (en) | Method and apparatus for filtering multiple messages | |
CN113407854A (en) | Application recommendation method, device and equipment and computer readable storage medium | |
CN113705792A (en) | Personalized recommendation method, device, equipment and medium based on deep learning model | |
CN110502639B (en) | Information recommendation method and device based on problem contribution degree and computer equipment | |
CN114996348A (en) | User portrait generation method and device, electronic equipment and storage medium | |
CN114218378A (en) | Content pushing method, device, equipment and medium based on knowledge graph | |
CN113886697A (en) | Clustering algorithm based activity recommendation method, device, equipment and storage medium | |
CN115905648B (en) | Gaussian mixture model-based user group and financial user group analysis method and device | |
CN111340540A (en) | Monitoring method, recommendation method and device of advertisement recommendation model | |
CN111177564A (en) | Product recommendation method and device | |
Kim et al. | Context-aware based item recommendation for personalized service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||