Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a storage medium, and an electronic device for protecting data privacy, which overcome the above problems or at least partially solve the above problems.
In a first aspect, a method for protecting data privacy includes:
the enterprise informatization system executes a first mode on each metadata of each Internet of things object stored by the enterprise informatization system so as to obtain generalized metadata corresponding to each metadata, wherein one Internet of things object corresponds to at least one piece of metadata, and each piece of metadata comprises a plurality of similar identifiers and identifier attribute values corresponding to the similar identifiers;
in a first mode, executing the following steps for any metadata: based on the principle of a k-anonymous privacy protection model, generalizing the metadata according to the identifier attribute values corresponding to the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, and replacing the identifier attribute values corresponding to the same kind of identifiers included in the metadata with the identifier generalized attribute values;
the enterprise informatization system divides the generalized metadata into a nearest equal set according to a proximity principle of the same kind of identifiers, so that a user can conveniently input the same kind of identifiers to query, and accordingly the generalized metadata in the nearest equal set is obtained, wherein each equal set meets the principle of the k-anonymous privacy protection model: at least k pieces of the generalized metadata with the same category identifier are included, and k is an integer not less than 1.
With reference to the first aspect, in certain optional embodiments, the method further comprises:
the enterprise informatization system sends each generalized metadata to an identification node management system so as to register each generalized metadata in the identification node management system, so that a user can conveniently input the same kind of identifiers to the identification node management system for query, and accordingly the corresponding generalized metadata can be obtained.
With reference to the first aspect, in some optional implementations, the generalizing the metadata according to the identifier attribute value corresponding to each of the same-class identifiers included in the metadata based on the principle of the k-anonymous privacy protection model to obtain an identifier generalized attribute value includes:
based on the principle of a k-anonymous privacy protection model, the identifier attribute values corresponding to the same type of identifiers included in the metadata are spliced in sequence, so that one identifier generalized attribute value is obtained.
With reference to the first aspect, in some optional embodiments, the proximity principle of the homogeneous identifier includes:
for any of the generalized metadata: determining the equality set as the nearest equality set if the homogeneous identifier of the generalized metadata is homogeneous with the homogeneous identifiers of each of the generalized metadata in the equality set; determining the nearest equal set according to a storage address if the homogeneous identifier of the generalized metadata is not homogeneous with the homogeneous identifier of each of the generalized metadata in each of the equal sets, wherein the storage address of the generalized metadata in a database is adjacent to the storage address of at least one of the generalized metadata in the nearest equal set in the database.
With reference to the first aspect, in certain optional embodiments, the method further comprises:
the enterprise informatization system judges whether the number of pieces of the generalized metadata included in each equal set is less than k or greater than 2k;
if the number of pieces of the generalized metadata included in the equal set is less than k, dividing the generalized metadata included in the equal set into other nearest equal sets, so as to eliminate equal sets in which the number of pieces of the generalized metadata is less than k;
if the number of pieces of the generalized metadata included in the equal sets is greater than 2k, dividing the generalized metadata included in the equal sets into two equal sets, wherein the number of pieces of the generalized metadata included in each of the two equal sets is not less than k.
With reference to the first aspect, in some optional embodiments, the generic identifier is a quasi identifier, and the corresponding identifier attribute value is a quasi identifier attribute value, where the quasi identifier is a data column that cannot directly characterize a user corresponding to the metadata, but can characterize the user corresponding to the metadata by combining multiple columns of external data or other external information, and the quasi identifier includes: name class identifier, gender class identifier, birthday class identifier or zip code class identifier;
or, the generic identifier is an explicit identifier, and the corresponding identifier attribute value is an explicit identifier attribute value, where the explicit identifier is a data column capable of directly characterizing a user or an object of the internet of things corresponding to the metadata, and the explicit identifier includes: an identification number identifier or an article unique code identifier;
or, the generic identifier is a sensitive identifier, and the corresponding identifier attribute value is a sensitive identifier attribute value, where the sensitive identifier is a data column capable of characterizing sensitive privacy data of a user corresponding to the metadata, and the sensitive identifier includes: salary class identifiers or disease class identifiers.
In a second aspect, an apparatus for protecting data privacy includes: a generalization unit and a division unit;
the generalization unit is configured to execute a first mode on each piece of metadata of each Internet of things object stored in the generalization unit to obtain generalized metadata corresponding to each piece of metadata, wherein one Internet of things object corresponds to at least one piece of metadata, and each piece of metadata includes a plurality of similar identifiers and identifier attribute values corresponding to the similar identifiers;
in a first mode, executing the following steps for any metadata: based on the principle of a k-anonymous privacy protection model, generalizing the metadata according to the identifier attribute values corresponding to the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, and replacing the identifier attribute values corresponding to the same kind of identifiers included in the metadata with the identifier generalized attribute values;
the dividing unit is configured to divide each piece of the generalized metadata into a nearest equal set according to a proximity principle of homogeneous identifiers, so that a user can query by inputting the homogeneous identifiers to obtain each piece of the generalized metadata in the nearest equal set, wherein each equal set satisfies the principle of the k-anonymous privacy protection model: at least k pieces of the generalized metadata with the same homogeneous identifier are included.
In combination with the second aspect, in certain alternative embodiments, the apparatus further comprises: a registration unit;
the registration unit is configured to perform sending of each piece of generalized metadata to an identification node management system to register each piece of generalized metadata in the identification node management system, so that a user can conveniently input the same kind of identifier to the identification node management system for query, thereby obtaining the corresponding generalized metadata.
In a third aspect, a processor-readable storage medium stores thereon a program that, when executed by a processor, implements the method for protecting data privacy of any one of the above.
In a fourth aspect, an electronic device comprises at least one processor, and at least one memory, a bus, connected to the processor; the processor and the memory complete mutual communication through the bus; the processor is used for calling the program instructions in the memory to execute any one of the above-mentioned methods for protecting data privacy.
By means of the technical scheme, the method, the device, the storage medium and the electronic equipment for protecting data privacy provided by the invention can perform the first mode on the metadata of each internet of things object stored by an enterprise informatization system so as to obtain the generalized metadata corresponding to each metadata, wherein one internet of things object corresponds to at least one piece of metadata, and each piece of metadata comprises a plurality of similar identifiers and identifier attribute values corresponding to the similar identifiers; in a first mode, executing the following steps for any metadata: based on the principle of a k-anonymous privacy protection model, generalizing the metadata according to the identifier attribute values corresponding to the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, and replacing the identifier attribute values corresponding to the same kind of identifiers included in the metadata with the identifier generalized attribute values; the enterprise informatization system divides the generalized metadata into a nearest equal set according to a proximity principle of the same kind of identifiers, so that a user can conveniently input the same kind of identifiers to query, and accordingly the generalized metadata in the nearest equal set is obtained, wherein each equal set meets the principle of the k-anonymous privacy protection model: at least k pieces of said generalized metadata having the same family identifier are included. 
Therefore, the present invention can generalize each attribute value of the homogeneous identifiers based on the principle of the k-anonymous privacy protection model to achieve a desensitization effect, and can further set up the equal sets based on the principle of the k-anonymous privacy protection model, so that a user who inputs a homogeneous identifier to query obtains every piece of generalized metadata in the nearest equal set rather than one specific piece of generalized metadata, which improves the security of the private data. Moreover, because the attribute values of the homogeneous identifiers in each piece of generalized metadata are generalized to the same value, a user cannot accurately parse that value to recover the original attribute values of the homogeneous identifiers, that is, the original, ungeneralized data cannot be obtained, which further strengthens the protection of private data provided by the present invention.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, the present invention provides a method for protecting data privacy, including: S100 and S200;
S100, an enterprise informatization system executes a first mode on each piece of metadata of each Internet of things object stored by the enterprise informatization system to obtain generalized metadata corresponding to each piece of metadata;
wherein one Internet of things object corresponds to at least one piece of metadata, and each piece of metadata comprises a plurality of similar identifiers and identifier attribute values corresponding to the similar identifiers;
optionally, the enterprise information system may collect data of each internet of things object, and perform corresponding identification to obtain each metadata of each internet of things object, which is not limited in the present invention.
Optionally, the present invention does not specifically limit the form of a piece of metadata, and any feasible form falls within the protection scope of the present invention. For example, a piece of metadata may be understood as a long character string, which may include a plurality of large fields; each large field may be divided into at least one small field, and each small field may consist of a homogeneous identifier and its corresponding attribute value. For example, metadata dataA: A.01: a01-A.02: a02-A.03: a03-A.04: a04-A.05: a05-A.06: a06-A.07: a07, wherein A.01: a01 can be understood as one small field, A.02: a02 can be understood as another small field, and different small fields are separated by "-". A.01 may be an explicit identifier, A.02 and A.03 quasi identifiers, A.04 and A.05 sensitive identifiers, and A.06 and A.07 non-sensitive identifiers. The attribute values corresponding to A.01, A.02, A.03, A.04, A.05, A.06 and A.07 are a01, a02, a03, a04, a05, a06 and a07, respectively. In this case, A.01: a01 can also be understood as a large field, because no other explicit identifier exists in the metadata dataA; A.02: a02-A.03: a03 may be understood as another large field, and the present invention is not limited thereto.
In connection with the above, the plurality of similar identifiers in the present invention is to be understood as a plurality of identifiers of the same type. That is, the explicit identifiers may be understood as one kind of homogeneous identifier and the quasi identifiers as another kind; for the quasi identifiers in the metadata dataA, the homogeneous identifiers include both A.02 and A.03, which is not limited by the present invention.
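As a non-limiting illustration, the field layout of the dataA example can be sketched in Python. The function name `parse_metadata` is hypothetical, and the sketch assumes that small fields are separated by "-", that each small field has the form "identifier: value", and that attribute values themselves contain no "-"; the invention does not prescribe this encoding.

```python
from typing import List, Tuple


def parse_metadata(record: str) -> List[Tuple[str, str]]:
    """Split a metadata string into (identifier, attribute value) pairs.

    A list of pairs is used instead of a dict because the same
    identifier (e.g. "name") may appear more than once in one record.
    """
    pairs = []
    for small_field in record.split("-"):
        # partition() splits on the first ":" only.
        identifier, _, value = small_field.partition(":")
        pairs.append((identifier.strip(), value.strip()))
    return pairs


data_a = "A.01: a01-A.02: a02-A.03: a03-A.04: a04-A.05: a05-A.06: a06-A.07: a07"
fields = parse_metadata(data_a)
```

Here `fields` holds seven pairs, the first of which is `("A.01", "a01")`.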
For another example, in combination with the embodiment shown in fig. 1, in some alternative embodiments, the generic identifier is a quasi identifier, and the corresponding identifier attribute value is a quasi identifier attribute value, where the quasi identifier is a data column that cannot directly characterize the user corresponding to the metadata, but can characterize the user corresponding to the metadata by combining multiple columns of external data or other external information, and the quasi identifier includes: name class identifier, gender class identifier, birthday class identifier or zip code class identifier;
or, the generic identifier is an explicit identifier, and the corresponding identifier attribute value is an explicit identifier attribute value, where the explicit identifier is a data column capable of directly characterizing a user or an object of the internet of things corresponding to the metadata, and the explicit identifier includes: an identification number class identifier or an item unique code class identifier.
Or, the generic identifier is a sensitive identifier, and the corresponding identifier attribute value is a sensitive identifier attribute value, where the sensitive identifier is a data column capable of characterizing sensitive privacy data of a user corresponding to the metadata, and the sensitive identifier includes: salary class identifiers or disease class identifiers.
Optionally, as mentioned above for the metadata dataA, if the quasi identifier is a name class identifier, the quasi identifiers in the metadata dataA may be replaced by the name class identifier: A.01: a01-name: a02-name: a03-A.04: a04-A.05: a05-A.06: a06-A.07: a07, where the metadata includes two names, namely a02 and a03, which is not limited by the present invention.
Optionally, as mentioned above for the metadata dataA, if the explicit identifier is an item unique code class identifier, the explicit identifier in the metadata dataA may be replaced by the item unique code class identifier: item code: a01-name: a02-name: a03-A.04: a04-A.05: a05-A.06: a06-A.07: a07, where the metadata includes one item code, namely a01, which is not limited by the present invention.
Optionally, as mentioned above for the metadata dataA, if the sensitive identifier is a salary class identifier, the sensitive identifiers in the metadata dataA may be replaced by the salary class identifier: item code: a01-name: a02-name: a03-monthly salary: a04-monthly salary: a05-A.06: a06-A.07: a07, where the metadata includes two monthly salaries, namely a04 and a05, which is not limited by the present invention.
Optionally, any metadata may include all or part of the quasi-identifiers, explicit identifiers, sensitive identifiers, non-sensitive identifiers, and other identifiers, which is not limited in the present invention.
In a first mode, executing the following steps for any metadata: based on the principle of a k-anonymous privacy protection model, generalizing the metadata according to the identifier attribute values corresponding to the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, and replacing the identifier attribute values corresponding to the same kind of identifiers included in the metadata with the identifier generalized attribute values;
Optionally, the first mode is performed based on the principle of the k-anonymous privacy protection model, which is a well-known model in the art; reference may be made to the related descriptions in the art, which are not repeated here, and the present invention is not limited thereto.
Optionally, for any metadata, if the metadata includes a quasi identifier, an explicit identifier, and a sensitive identifier, the metadata may be generalized according to the quasi identifier attribute values corresponding to all quasi identifiers to obtain a quasi identifier generalized attribute value; generalized according to the explicit identifier attribute values corresponding to all explicit identifiers to obtain an explicit identifier generalized attribute value; and generalized according to the sensitive identifier attribute values corresponding to all sensitive identifiers to obtain a sensitive identifier generalized attribute value.
Of course, if the quasi identifiers cover a plurality of different classes, for example name class identifiers, gender class identifiers and birthday class identifiers, a finer-grained generalization may also be performed. That is, generalization may be performed according to the quasi identifier attribute values corresponding to all name class identifiers to obtain a quasi identifier generalized attribute value (corresponding to the name class identifier); according to the quasi identifier attribute values corresponding to all gender class identifiers to obtain a quasi identifier generalized attribute value (corresponding to the gender class identifier); or according to the quasi identifier attribute values corresponding to all birthday class identifiers to obtain a quasi identifier generalized attribute value (corresponding to the birthday class identifier), which is not limited in the present invention.
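The two granularities described above can be illustrated with a short Python sketch. The mapping `COARSE_CLASS`, the function `group_values` and the sample identifier names are hypothetical and serve only to show how attribute values might be grouped before generalization, either per identifier kind (quasi, explicit, sensitive) or per identifier class (name, gender, and so on).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical classification of identifier classes into identifier kinds.
COARSE_CLASS = {
    "id_number": "explicit",
    "name": "quasi",
    "gender": "quasi",
    "birthday": "quasi",
    "salary": "sensitive",
}


def group_values(fields: List[Tuple[str, str]], fine: bool = False) -> Dict[str, List[str]]:
    """Collect the attribute values that would be generalized together.

    fine=False groups by identifier kind (e.g. all quasi identifiers);
    fine=True groups by identifier class (e.g. all name class identifiers).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for identifier, value in fields:
        key = identifier if fine else COARSE_CLASS.get(identifier, "non-sensitive")
        groups[key].append(value)
    return dict(groups)


sample = [("id_number", "X1"), ("name", "Li"), ("name", "Wang"),
          ("gender", "F"), ("salary", "9000")]
coarse = group_values(sample)
fine = group_values(sample, fine=True)
```

At the coarse granularity the two names and the gender fall into one "quasi" group; at the fine granularity the names form their own group.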
S200, dividing each generalized metadata into a nearest equal set by the enterprise informatization system according to a proximity principle of the same-class identifier, so that a user can conveniently input the same-class identifier to inquire, and thus each generalized metadata in the nearest equal set is obtained;
wherein each of the equal sets satisfies the principles of the k-anonymous privacy preserving model: at least k pieces of the generalized metadata with the same category identifier are included, and k is an integer not less than 1.
Optionally, records with the same identifier (e.g., quasi identifier) form an equivalence class, which is referred to as an equal set in the present invention, and before the data is released each record is indistinguishable from at least k-1 other records in the table. "Equal set" is merely the name used here: records with the same identifier form one equal set, so that after the data is processed the records in the same equal set cannot be distinguished from one another.
Optionally, for the k-anonymous privacy protection model, the larger the value of k, the higher the required degree of generalization and the greater the loss of data, that is, the usability of the data decreases while the degree of privacy protection increases. The value of k therefore needs to be chosen according to the specific situation, which is not limited by the present invention.
Optionally, the present invention does not limit the proximity principle, and any feasible manner is within the protection scope of the present invention. For example, in connection with the embodiment shown in fig. 1, in some alternative embodiments, the proximity principle of the homogeneous identifier includes: for any of the generalized metadata: determining the equal set as the nearest equal set if the homogeneous identifier of the generalized metadata is homogeneous with the homogeneous identifiers of each of the generalized metadata in the equal set; determining the nearest equal set according to a storage address if the homogeneous identifier of the generalized metadata is not homogeneous with the homogeneous identifier of each of the generalized metadata in each of the equal sets, wherein the storage address of the generalized metadata in a database is adjacent to the storage address of at least one of the generalized metadata in the nearest equal set in the database.
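The k-anonymity condition on the equal sets can be checked with a few lines of Python. This is a minimal sketch: the function name `is_k_anonymous` is hypothetical, and each record is assumed to be represented by the hashable generalized value that defines its equal set.

```python
from collections import Counter
from typing import Hashable, Iterable


def is_k_anonymous(generalized_values: Iterable[Hashable], k: int) -> bool:
    """Return True if every distinct generalized value occurs at least k times,
    i.e. every equal set contains at least k records."""
    counts = Counter(generalized_values)
    return all(n >= k for n in counts.values())


records = ["[19-25]"] * 3 + ["[30-40]"] * 2
two_anonymous = is_k_anonymous(records, 2)
three_anonymous = is_k_anonymous(records, 3)
```

With these sample records the table is 2-anonymous but not 3-anonymous, because the second equal set holds only two records. Raising k would require coarser generalization, which is the usability trade-off described above.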
Optionally, whether the homogeneous identifier of the generalized metadata is homogeneous with the homogeneous identifier of each of the generalized metadata in the equal set needs to be determined in accordance with the foregoing granularity in the generalization process. For example, if the data generalization is performed with quasi-identifiers as the granularity, whether the homogeneous identifier is homogeneous with the homogeneous identifier of each of the generalized metadata in the equal set may be understood as: whether the homogeneous identifier of the generalized metadata and the homogeneous identifier of each of the generalized metadata in the equal set are quasi-identifiers is not limited in this embodiment.
For another example, if the data generalization is performed with a name class identifier in a quasi-identifier as a granularity, whether the similar type identifier is similar to the similar type identifier of each of the generalized metadata in the equal set may be understood as: whether the homogeneous identifier of the generalized metadata and the homogeneous identifiers of each of the generalized metadata in the equal set are name-like identifiers is not limited in this respect.
Optionally, when the generalized metadata is stored in the database, it may be stored at storage addresses according to the equal set to which it belongs, that is, the generalized metadata in the same equal set is stored at continuous storage addresses. On this basis, if the homogeneous identifier of the generalized metadata is not homogeneous with the homogeneous identifier of each of the generalized metadata in each of the equal sets, the nearest equal set is determined according to the storage address, where the storage address of the generalized metadata in the database is adjacent to the storage address of at least one of the generalized metadata in the nearest equal set in the database, which is not limited by the present invention.
Optionally, as mentioned above, the homogeneous identifiers of the generalized metadata in the same equal set are consistent, and the corresponding identifier attribute values are also the same. Therefore, a user, for example an attacker, who inputs a homogeneous identifier cannot directly obtain the data of the one user being targeted, but instead obtains the data of the whole equal set, so that the attacker cannot tell the data of the individual users apart, and the data privacy of the users is protected.
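The proximity principle described above may be sketched as follows. The classes `Record` and `EqualSet` and the function `nearest_equal_set` are hypothetical; storage addresses are modelled as plain integers, and "adjacent" is approximated by the smallest address distance, which is an assumption rather than something the invention specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    identifier_class: str  # e.g. "quasi" or "name", per the chosen granularity
    address: int           # hypothetical integer storage address


@dataclass
class EqualSet:
    identifier_class: str
    records: List[Record] = field(default_factory=list)


def nearest_equal_set(record: Record, equal_sets: List[EqualSet]) -> Optional[EqualSet]:
    # Rule 1: an equal set whose homogeneous identifier matches is the nearest one.
    for candidate in equal_sets:
        if candidate.identifier_class == record.identifier_class:
            return candidate
    # Rule 2: otherwise choose the set holding the record stored closest by address.
    non_empty = [s for s in equal_sets if s.records]
    if not non_empty:
        return None
    return min(
        non_empty,
        key=lambda s: min(abs(r.address - record.address) for r in s.records),
    )


sets = [
    EqualSet("name", [Record("name", 100), Record("name", 101)]),
    EqualSet("zip", [Record("zip", 200)]),
]
by_identifier = nearest_equal_set(Record("name", 500), sets)
by_address = nearest_equal_set(Record("gender", 199), sets)
```

In this example the "name" record is placed by identifier match despite its distant address, while the "gender" record, which matches no set, falls back to the set stored next to it.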
Optionally, after obtaining the plurality of pieces of generalized metadata, the attacker analyzes the identifiers of the same type of the generalized metadata to obtain corresponding identifier attribute values. The identifier attribute values under normal conditions are of practical significance, i.e. specific privacy data that may characterize a specific user. However, since the present invention has already generalized the metadata to obtain the generalized metadata, the identifier attribute value of the generalized metadata is subjected to the generalization desensitization process. That is, the attacker cannot reversely analyze the identifier attribute value of the metadata without knowing the generalization desensitization rule used by the present invention, so that the attacker obtains the identifier generalization attribute value without practical significance.
In some alternative embodiments, in combination with the embodiment shown in fig. 1, the method further comprises: the enterprise informatization system sends each generalized metadata to an identification node management system so as to register each generalized metadata in the identification node management system, so that a user can conveniently input the same kind of identifiers to the identification node management system for query, and accordingly the corresponding generalized metadata can be obtained.
Optionally, the identification node management system may include an identification registration module and an identification resolution module, where the identification registration module is in communication connection with the enterprise informatization system, and the identification registration module is in communication connection with the identification resolution module. The enterprise informatization system sends each piece of generalized metadata to the identification registration module so as to register each piece of generalized metadata in the identification node management system. The user may input the homogeneous identifier to the identification resolution module to query and obtain the corresponding generalized metadata, which is not limited in the present invention.
With reference to the implementation manner shown in fig. 1, in some optional implementation manners, the generalizing, in S100, of the metadata according to the identifier attribute values corresponding to the homogeneous identifiers included in the metadata based on the principle of the k-anonymous privacy protection model to obtain an identifier generalized attribute value includes: based on the principle of the k-anonymous privacy protection model, sequentially splicing the identifier attribute values corresponding to the homogeneous identifiers included in the metadata, so as to obtain one identifier generalized attribute value.
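The interplay of the two modules can be pictured with a toy in-memory sketch. The class names, the method names `register` and `resolve`, and the use of a string key per equal set are all assumptions made for illustration; a real identification node management system would expose its own interfaces.

```python
from collections import defaultdict
from typing import Dict, List


class IdentificationRegistrationModule:
    """Receives generalized metadata from the enterprise informatization system."""

    def __init__(self) -> None:
        self.equal_sets: Dict[str, List[dict]] = defaultdict(list)

    def register(self, equal_set_key: str, generalized_metadata: dict) -> None:
        self.equal_sets[equal_set_key].append(generalized_metadata)


class IdentificationResolutionModule:
    """Answers user queries with the whole equal set, never a single record."""

    def __init__(self, registration: IdentificationRegistrationModule) -> None:
        self._registration = registration

    def resolve(self, equal_set_key: str) -> List[dict]:
        return list(self._registration.equal_sets.get(equal_set_key, []))


registration = IdentificationRegistrationModule()
registration.register("name", {"name": "a02a03"})
registration.register("name", {"name": "a02a03"})
resolution = IdentificationResolutionModule(registration)
result = resolution.resolve("name")
```

A query for "name" returns both registered records together, so the caller cannot single out one of them.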
Optionally, the present invention does not limit the specific way of generalization processing, and any feasible way falls into the protection scope of the present invention. The generalization processing method of sequentially splicing the identifier attribute values corresponding to the similar identifiers included in the metadata is only an optional embodiment proposed by the present inventors, and the present invention is not limited thereto.
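A minimal Python sketch of the sequential splicing embodiment follows. The function name `splice_generalize` and the empty default separator are assumptions; the source states only that the attribute values of the homogeneous identifiers are spliced in sequence into one generalized value, which then replaces each original value.

```python
from typing import Collection, List, Tuple


def splice_generalize(
    fields: List[Tuple[str, str]],
    target_identifiers: Collection[str],
    sep: str = "",
) -> List[Tuple[str, str]]:
    """Concatenate, in order, the values of the targeted identifiers and
    replace each of those values with the single spliced value."""
    spliced = sep.join(v for ident, v in fields if ident in target_identifiers)
    return [
        (ident, spliced if ident in target_identifiers else value)
        for ident, value in fields
    ]


fields = [("A.01", "a01"), ("A.02", "a02"), ("A.03", "a03"), ("A.04", "a04")]
generalized = splice_generalize(fields, {"A.02", "A.03"})
```

After the call, both A.02 and A.03 carry the same value "a02a03", while A.01 and A.04 are unchanged.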
For example, the generalization process of the present invention may be range generalization: for numerical data, the values of an attribute within an equal set are expressed as an interval from the minimum value to the maximum value; for example, if the ages in an equal set are 20, 21, 19 and 25, they are generalized to [19-25]. For character-type data, a wildcard "+" may be used in place of some of the characters, which is not limited by the present invention.
In some alternative embodiments, in combination with the embodiment shown in fig. 1, the method further comprises: the enterprise informatization system judges whether the number of pieces of the generalized metadata included in each equal set is less than k or greater than 2k;
if the number of pieces of the generalized metadata included in the equal set is less than k, dividing the generalized metadata included in the equal set into other nearest equal sets, so as to eliminate equal sets in which the number of pieces of the generalized metadata is less than k;
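The two alternative generalization methods just mentioned can be sketched as follows. The function names `generalize_range` and `mask_suffix` are hypothetical, and replacing the trailing characters is only one possible way of substituting "a partial character"; the wildcard character is passed in explicitly rather than fixed by the sketch.

```python
from typing import Iterable


def generalize_range(values: Iterable[int]) -> str:
    """Express numerical values of one equal set as a min-max interval."""
    values = list(values)
    return f"[{min(values)}-{max(values)}]"


def mask_suffix(text: str, keep: int, wildcard: str) -> str:
    """Keep the first `keep` characters and replace the rest with the wildcard."""
    return text[:keep] + wildcard * max(len(text) - keep, 0)


age_range = generalize_range([20, 21, 19, 25])
masked_zip = mask_suffix("100084", keep=3, wildcard="+")
```

The ages from the example above yield "[19-25]", and the sample zip code is masked to "100+++".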
if the number of pieces of the generalized metadata included in the equal sets is greater than 2k, dividing the generalized metadata included in the equal sets into two equal sets, wherein the number of pieces of the generalized metadata included in each of the two equal sets is not less than k.
Optionally, in order to improve the effect of the present invention, the number of pieces of generalized metadata included in each equal set should be kept in a relatively balanced state, i.e., it should be avoided that the number of pieces of generalized metadata of some equal sets is too large or too small.
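The balancing rule can be sketched in Python under stated assumptions: the function name `rebalance` is hypothetical, the position of a set in the list stands in for storage proximity (so the neighbouring set is treated as the "nearest" one), and an oversized set is split once into two halves as described, without further recursion.

```python
from typing import List, TypeVar

T = TypeVar("T")


def rebalance(equal_sets: List[List[T]], k: int) -> List[List[T]]:
    sets = [list(s) for s in equal_sets if s]

    # Merge every set with fewer than k records into a neighbouring set.
    i = 0
    while len(sets) > 1 and i < len(sets):
        if len(sets[i]) < k:
            small = sets.pop(i)
            target = i - 1 if i > 0 else 0  # previous set, or the new first set
            sets[target].extend(small)
        else:
            i += 1

    # Split every set with more than 2k records into two halves.
    # For n > 2k the smaller half has n // 2 >= k records.
    balanced: List[List[T]] = []
    for s in sets:
        if len(s) > 2 * k:
            mid = len(s) // 2
            balanced.extend([s[:mid], s[mid:]])
        else:
            balanced.append(s)
    return balanced


result = rebalance([[1], [2, 3], [4, 5, 6, 7, 8], [9]], k=2)
```

With k = 2, the two single-record sets are absorbed by their neighbours, and the resulting six-record set is split into two sets of three. If only one set remains and it is still smaller than k, the sketch leaves it unchanged, since there is no other set to merge it into.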
Optionally, by using the method provided by the present invention, the data privacy of sensitive attributes can be protected during identification resolution: the correspondence between the sensitive data of identification resolution and the identified individual (the object marked by the identifier) is stripped, so that an attacker cannot uniquely derive the specific individual corresponding to the sensitive information. The problem of privacy disclosure caused by linking attacks is thereby effectively addressed, and user privacy is protected.
Anonymizing the data in identification resolution achieves user privacy protection at low cost in scenarios that do not require high data quality, and an attacker cannot uniquely deduce the specific individual to whom the sensitive information corresponds.
From the privacy protection perspective, the principle of the k-anonymous privacy protection model anonymizes identification resolution data (protecting key sensitive data), so that the data can remain open to query while privacy disclosure through association relations is prevented.
k-anonymization is applied to the quasi identifiers in the identification resolution data. For example, when the identifier "20.500.100/36" is resolved by Handle and the resolution result contains sensitive data, k-anonymization is a very low-cost way to protect that data. The attacker cannot determine the specific individual (such as an enterprise or a person associated with the identifier) to whom the private information of "20.500.100/36" belongs, so the privacy of the enterprise or person is protected. Through the parameter k, k-anonymity specifies the maximum information leakage risk that the user can bear. k-anonymity protects the privacy of a person or enterprise to some extent, but at the same time reduces the usability of the data.
For example, (1) in an enterprise informatization system, the data corresponding to the identifier A001 (A.01) is dataA. The data dataA includes a series of metadata attribute fields (identifier fields) and field values (identifier attribute values). The metadata attribute fields of dataA are A.01, A.02, A.03, A.04, A.05, A.06, A.07, etc., where A.01 is an explicit identifier attribute, A.02 and A.03 are quasi identifier attributes, A.04 and A.05 are sensitive attributes, and A.06 and A.07 are non-sensitive attributes. The metadata attributes A.01, A.02, A.03, A.04, A.05, A.06 and A.07 correspond to the attribute values a.01, a.02, a.03, a.04, a.05, a.06 and a.07, respectively.
(2) In the enterprise informatization system, the data corresponding to the identifier B001 is dataB. The metadata attributes of dataB are B.01, B.02, B.03, B.04, B.05, B.06, B.07 and the like, wherein B.01 is an explicit identifier attribute, B.02 and B.03 are quasi-identifier attributes, B.04 and B.05 are sensitive attributes, and B.06 and B.07 are non-sensitive attributes.
(3) In the enterprise informatization system, the data corresponding to the identifier C001 is dataC. The metadata attributes of dataC are C.01, C.02, C.03, C.04, C.05, C.06, C.07 and the like, wherein C.01 is an explicit identifier attribute, C.02 and C.03 are quasi-identifier attributes, C.04 and C.05 are sensitive attributes, and C.06 and C.07 are non-sensitive attributes.
(4) In the enterprise informatization system, the data corresponding to the identifier D001 is dataD. The metadata attributes of dataD are D.01, D.02, D.03, D.04, D.05, D.06, D.07 and the like, wherein D.01 is an explicit identifier attribute, D.02 and D.03 are quasi-identifier attributes, D.04 and D.05 are sensitive attributes, and D.06 and D.07 are non-sensitive attributes.
(5) The data is anonymized. Before the informatization system registers the identifier A001 with the identification node management system, the data is processed so that, for each tuple in the data set of identifier metadata held by the identification node management system, a certain number (at least k) of tuples have the same values on the quasi-identifier attributes. For example, A.02 and A.03 among the metadata attributes of the data dataA resolved from the identifier A001 are quasi-identifier attributes, so the attribute values of A.02 and A.03 are combined. The processed attribute values are a.01, a.02+a.03, a.03+a.02, a.04, a.05, a.06 and a.07, and the correspondence between the sensitive attributes and the identified object is stripped. In this way, even if an attacker links the data with other data, the correspondence between the sensitive attributes of the resolution data dataA of the identifier A001 and the identified object cannot be uniquely determined. The resolution data dataA can be linked to the object to which it belongs only with a probability not exceeding 1/k, which reduces the risk of privacy leakage. When the informatization system registers the identifier A001 with the identification node management system, the registered metadata information includes the attribute fields A.01, A.02, A.03, A.04, A.05, A.06 and A.07 and their corresponding attribute values a.01, a.02+a.03, a.03+a.02, a.04, a.05, a.06 and a.07.
(6) Similarly, the identifiers B001, C001 and D001 are processed, so that the quasi-identifier attribute values of the metadata records corresponding to the identifiers A001, B001, C001 and D001 are the same. A set of metadata information records with the same quasi-identifier attribute values is called an equal set. K-anonymization requires that the number of records in the equal set to which any identifier metadata record belongs is not less than k, that is, at least k-1 other metadata information records have the same quasi-identifier attribute values as that record. Finally, it is checked whether the number of identifier metadata information records in each equal set is between k and 2k. An equal set with fewer than k records is dissolved, and its identifier metadata information records are added to other nearby equal sets (nearby according to the quasi-identifiers). An equal set with more than 2k records is split into several equal sets, so that eventually each equal set contains between k and 2k identifier metadata information records. In this way, privacy protection is realized: the correspondence between the sensitive data corresponding to an identifier and the identified individual (the physical object or virtual resource marked by the identifier) is stripped, an attacker cannot uniquely derive the specific individual corresponding to the sensitive information, the privacy disclosure problem caused by link attacks is effectively addressed, and user privacy is protected.
(7) When an identification resolution request for the Internet of things identifier A001 (a.01) is initiated in the identification node management system, the identification resolution data dataA is returned, in which the data attribute fields A.01, A.02, A.03, A.04, A.05, A.06 and A.07 carry the corresponding attribute values a.01, a.02+a.03, a.03+a.02, a.04, a.05, a.06 and a.07. If the user is a specially authorized user, the data that has not been k-anonymized is returned. Therefore, if the identification node management system is subjected to a network attack or a web crawler, an attacker cannot uniquely derive the specific individual corresponding to the sensitive information even after acquiring the data corresponding to the identifier. The privacy disclosure problem caused by link attacks is effectively addressed, and user privacy is protected.
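Steps (5) and (7) of the example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the function generalize, the class IdentificationNode and the authorized flag are hypothetical names, "+" is treated as string concatenation, and the raw record is kept next to the generalized record only to keep the sketch short (the specification does not state where the non-anonymized data is held).

```python
def generalize(record, quasi_fields, sep="+"):
    """Replace each quasi-identifier value with the combination of all
    quasi-identifier values, starting from the field's own value
    (A.02 -> a.02+a.03, A.03 -> a.03+a.02)."""
    values = [str(record[f]) for f in quasi_fields]
    generalized = dict(record)
    for i, field in enumerate(quasi_fields):
        generalized[field] = sep.join(values[i:] + values[:i])
    return generalized


class IdentificationNode:
    """Hypothetical stand-in for the identification node management system."""

    def __init__(self):
        self._generalized = {}
        self._raw = {}  # assumption: where raw data lives is not specified

    def register(self, identifier, record, quasi_fields):
        self._raw[identifier] = dict(record)
        self._generalized[identifier] = generalize(record, quasi_fields)

    def resolve(self, identifier, authorized=False):
        """Return generalized metadata unless the caller is specially authorized."""
        store = self._raw if authorized else self._generalized
        return dict(store[identifier])


# dataA from the example: fields A.01..A.07 with values a.01..a.07.
data_a = {f"A.0{i}": f"a.0{i}" for i in range(1, 8)}

if __name__ == "__main__":
    node = IdentificationNode()
    node.register("A001", data_a, ["A.02", "A.03"])
    print(node.resolve("A001"))
    print(node.resolve("A001", authorized=True))
```

An ordinary query sees only the combined quasi-identifier values, while the explicit identifier, sensitive and non-sensitive fields are left unchanged, as in the registered metadata of step (5).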
As shown in fig. 2, the present invention provides a device for protecting data privacy, including: a generalization unit 100 and a division unit 200;
the generalization unit 100 is configured to execute a first mode on each piece of metadata of each Internet of things object stored in the generalization unit, so as to obtain generalized metadata corresponding to each piece of metadata, where one Internet of things object corresponds to at least one piece of metadata, and each piece of metadata includes a plurality of similar identifiers and identifier attribute values corresponding to each similar identifier;
in a first mode, executing the following steps for any metadata: based on the principle of a k-anonymous privacy protection model, generalizing the metadata according to the identifier attribute values corresponding to the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, and replacing the identifier attribute values corresponding to the same kind of identifiers included in the metadata with the identifier generalized attribute values;
the dividing unit 200 is configured to divide the generalized metadata into a nearest equal set according to a proximity principle of the same kind of identifiers, so that a user can perform a query by inputting the same kind of identifiers, thereby obtaining the generalized metadata in the nearest equal set, where each equal set satisfies the principle of the k-anonymous privacy protection model: it includes at least k pieces of the generalized metadata with the same kind of identifier.
In some alternative embodiments, in combination with the embodiment shown in fig. 2, the apparatus further comprises: a registration unit;
the registration unit is configured to send each piece of generalized metadata to an identification node management system so as to register each piece of generalized metadata in the identification node management system, so that a user can conveniently input the same kind of identifier to the identification node management system for query, thereby obtaining the corresponding generalized metadata.
With reference to the embodiment shown in fig. 2, in some optional embodiments, the generalization unit 100, which performs, based on the principle of the k-anonymous privacy protection model, generalization processing on the metadata according to the identifier attribute value corresponding to each of the same kind of identifiers included in the metadata to obtain an identifier generalized attribute value, specifically includes: a first mode subunit;
the first mode subunit is configured to, based on the principle of the k-anonymous privacy protection model, sequentially concatenate the identifier attribute values corresponding to the same kind of identifiers included in the metadata, so as to obtain one identifier generalized attribute value.
In some alternative embodiments, in combination with the embodiment shown in fig. 2, the apparatus further comprises: a judging unit, a first dividing subunit and a second dividing subunit;
the judging unit is configured to judge whether the number of pieces of the generalized metadata included in each of the equal sets is less than k or greater than 2k;
the first dividing subunit is configured to divide the generalized metadata included in an equal set into other nearest equal sets if the number of pieces of the generalized metadata included in that equal set is less than k, so as to dissolve the equal set whose number of pieces of the generalized metadata is less than k;
the second dividing subunit is configured to divide the generalized metadata included in an equal set into two equal sets if the number of pieces of the generalized metadata included in that equal set is greater than 2k, where the number of pieces of the generalized metadata included in each of the two equal sets is not less than k.
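The check performed by the judging unit and the two dividing subunits can be sketched in Python as follows. This is a sketch under assumptions that are not in this specification: each record is reduced to a single numeric quasi-identifier value obtained through a hypothetical key function, the "nearest" equal set is taken to be the one whose mean key is closest to the record, and an oversized set is split repeatedly by cutting off groups of k records in key order (the several-way split of step (6), of which a two-way split is the simplest case).

```python
from statistics import mean


def rebalance(equal_sets, k, key=lambda record: record):
    """Dissolve equal sets with fewer than k records and split equal sets with
    more than 2k records, so that each resulting set holds between k and 2k
    records (assuming k >= 1 and at least one set already holds k records)."""
    sets = [list(s) for s in equal_sets if s]
    kept = [s for s in sets if len(s) >= k]
    small = [s for s in sets if len(s) < k]

    if not kept:
        # No set reaches k: merge everything; the result may still be below k.
        kept = [[r for s in small for r in s]] if small else []
    else:
        # Move each record of a dissolved set to the nearest remaining set.
        for s in small:
            for record in s:
                value = key(record)
                target = min(kept, key=lambda t: abs(mean(key(x) for x in t) - value))
                target.append(record)

    result = []
    for s in kept:
        s.sort(key=key)
        # Cut off k records at a time; the remainder always keeps more than k.
        while len(s) > 2 * k:
            result.append(s[:k])
            s = s[k:]
        result.append(s)
    return result


if __name__ == "__main__":
    # Hypothetical numeric quasi-identifier values, k = 2.
    print(rebalance([[1, 2, 3], [10], [20, 21, 22, 23, 24]], k=2))
```

In the sample call the single-record set is dissolved into its nearest neighbour and the five-record set is split, so every resulting equal set contains between 2 and 4 records.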
The present invention provides a processor-readable storage medium on which a program is stored, the program implementing any one of the methods for protecting data privacy described above when executed by a processor.
As shown in fig. 3, the present invention provides an electronic device 70, wherein the electronic device 70 comprises at least one processor 701, at least one memory 702 connected to the processor 701, and a bus 703; the processor 701 and the memory 702 complete communication with each other through the bus 703; the processor 701 is configured to call the program instructions in the memory 702 to execute any one of the above-mentioned methods for protecting data privacy.
In this application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.