CN110750725A - Privacy-protecting user portrait generation method, terminal device and storage medium - Google Patents

Privacy-protecting user portrait generation method, terminal device and storage medium

Info

Publication number
CN110750725A
Authority
CN
China
Prior art keywords
dimension
count
rectangular
label data
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911018936.6A
Other languages
Chinese (zh)
Inventor
霍峥
王腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Business & Economics In Hebei, University of
Original Assignee
Business & Economics In Hebei, University of
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Business & Economics In Hebei, University of
Priority to CN201911018936.6A
Publication of CN110750725A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is applicable to the field of data mining, and particularly relates to a user portrait generation method for protecting privacy, a terminal device and a storage medium. The user portrait generation method for protecting privacy comprises the following steps: acquiring a label data set of a user; creating a multi-dimensional space based on the dimensions of the label data in the label data set, and dividing the multi-dimensional space into a plurality of mutually disjoint rectangular units; acquiring the number of the label data in the rectangular unit, and recording the number as a first count of the rectangular unit; adding noise to the first count of the rectangular unit to obtain a second count of the rectangular unit; and obtaining a user portrait based on the rectangular units with the second counts meeting the preset conditions. When the user label data set is clustered, the count value of each rectangular unit of label data in the label data set is processed, and the count value is protected, so that the information privacy of a user is protected.

Description

Privacy-protecting user portrait generation method, terminal device and storage medium
Technical Field
The invention is applicable to the field of data mining, and particularly relates to a user portrait generation method for protecting privacy, a terminal device and a storage medium.
Background
A user portrait is typically generated by clustering user tag data. According to their implementation technique, clustering algorithms may be classified into distance-based, hierarchy-based, partition-based, grid-based, density-based and model-based clustering, among others. Most of these algorithms are based on distance or density.
Distance-based clustering algorithms can only find spherical clusters, and the number of clusters found depends on user-specified parameters, which are difficult for users to choose. A clustering algorithm based on grids and density, by contrast, can identify dense subspaces in high-dimensional data and can find clusters of arbitrary shape. However, simply clustering user tags with a grid- and density-based algorithm to generate user portraits may leak individuals' private information.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a privacy-protecting user portrait generation method, a terminal device and a storage medium, so as to solve the problem that existing methods for generating a user portrait may leak privacy.
The first aspect of the embodiments of the present invention provides a user portrait generation method for protecting privacy, including:
acquiring a label data set of a user;
creating a multi-dimensional space based on the dimensions of the label data in the label data set, and dividing the multi-dimensional space into a plurality of mutually disjoint rectangular units;
acquiring the number of the label data in the rectangular unit, and recording the number as a first count of the rectangular unit;
adding noise to the first count of the rectangular unit to obtain a second count of the rectangular unit;
and obtaining a user portrait based on the rectangular units with the second counts meeting the preset conditions.
A second aspect of the embodiments of the present invention provides a privacy-protecting user portrait generation system, including:
the acquisition module is used for acquiring a label data set of a user;
the dividing module is used for creating a multi-dimensional space based on the dimensions of the label data in the label data set and dividing the multi-dimensional space into a plurality of mutually-disjoint rectangular units;
the counting module is used for acquiring the number of the label data in the rectangular unit and recording the number as a first count of the rectangular unit;
the noise module is used for adding noise to the first count of the rectangular unit to obtain a second count of the rectangular unit;
and the generating module is used for obtaining the user portrait based on the rectangular unit with the second count meeting the preset condition.
A third aspect of the embodiments of the present invention provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the privacy-protecting user portrait generation method of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the privacy-protecting user portrait generation method of the first aspect.
In the embodiments of the present invention, a label data set of a user is acquired; a multi-dimensional space is created based on the dimensions of the label data in the label data set and divided into a plurality of mutually disjoint rectangular units; and the number of label data in each rectangular unit is acquired and recorded as the first count of that unit. Noise is added to the first count of each rectangular unit to obtain its second count, which protects the privacy of the first count value. Clusters are then generated from the rectangular units whose second counts meet the preset condition to obtain the user portrait. In this way, the real count values of the user label data are protected while the user portrait is generated from the user labels, which prevents privacy leakage.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a privacy-protecting user portrait generation method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the rectangular unit partition of the three-dimensional space provided by the embodiment of the invention;
FIG. 3 is a diagram illustrating a first count value of a rectangular unit in a one-dimensional space according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a first count value of a rectangular unit in a two-dimensional space according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a first count value of a rectangular unit in a three-dimensional space according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a privacy-protecting user portrait generation system according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
The terms "comprises" and "comprising," as well as any other variations, in the description and claims of this invention and the drawings described above, are intended to mean "including but not limited to," and are intended to cover non-exclusive inclusions. For example, a process, method, or system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
FIG. 1 is a schematic flow chart of a privacy-protecting user portrait generation method according to an embodiment of the present invention. Referring to FIG. 1, the method may include:
Step S101, a label data set of a user is acquired.
In an embodiment of the invention, the tag data is used to characterize the user's characteristic information, for example the tag information or remark information of a user on a social networking site. Taking tag information as an example, it may characterize the user with tags such as "rock youth", "post-90s" and "lipstick addict". All the tag information of a user, combined together, forms the tag data of that user. In practical applications, the tags can be classified in advance, with each tag category representing one dimension, and the number of times tags of that category are used or marked can be recorded. The set formed by the tag data of a plurality of users is the tag data set of the users.
Step S102, a multi-dimensional space is created based on the dimensions of the label data in the label data set, and the multi-dimensional space is divided into a plurality of mutually disjoint rectangular units.
In the embodiment of the present invention, for convenience of description and understanding, it is assumed that the tag data set of the user is three-dimensional; correspondingly, the multi-dimensional space created based on the dimensions of the tag data is also three-dimensional. Referring to FIG. 2, the three-dimensional space is divided into 27 mutually disjoint rectangular units with an interval size of 2, where t1, t2 and t3 denote the three tag categories, i.e., the three dimensions. After the division, the tag data in the user's tag data set, shown as black dots in FIG. 2, fall into the rectangular units.
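The partition described above can be sketched as a mapping from a tag vector to the index of the rectangular unit that contains it. This is only an illustrative sketch: the function names `cell_index` and `all_cells` are hypothetical, and it assumes each axis is cut into left-open intervals (0, 2], (2, 4], (4, 6], which matches the ranges quoted later in the description but is not stated explicitly for FIG. 2.

```python
import math
from itertools import product


def cell_index(point, width=2.0):
    """Map a tag vector to the index tuple of the rectangular unit containing it.

    Assumption: every axis is cut into left-open intervals
    (0, w], (w, 2w], ... so a value v falls into bin ceil(v / w) - 1.
    """
    return tuple(math.ceil(v / width) - 1 for v in point)


def all_cells(n_dims=3, bins_per_dim=3):
    """Enumerate every rectangular unit of the grid as an index tuple."""
    return list(product(range(bins_per_dim), repeat=n_dims))


# Three dimensions with three intervals each give 3 ** 3 = 27 disjoint units.
cells = all_cells()
example = cell_index((1, 2, 5))
```

Because every value belongs to exactly one interval per axis, each tag vector lands in exactly one unit, which is what makes the units mutually disjoint.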
Step S103, acquiring the number of the label data in the rectangular unit, and recording the number as the first count of the rectangular unit.
In the embodiment of the present invention, referring to fig. 2, the number of tag data included in each rectangular unit may be counted as the first count of each rectangular unit.
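A minimal sketch of this counting step, assuming the same left-open intervals of width 2 as in the partition example; the function name `first_counts` and the sample points are made up for illustration and do not come from FIG. 2.

```python
import math
from collections import Counter


def first_counts(points, width=2.0):
    """Return {cell index tuple: number of tag vectors inside that cell}."""
    return Counter(
        tuple(math.ceil(v / width) - 1 for v in p) for p in points
    )


# Hypothetical tag vectors (t1, t2, t3); not the data shown in the figures.
points = [(1, 1, 5), (1.5, 2, 4.5), (3, 1, 5), (5, 5, 1)]
counts = first_counts(points)
```

A `Counter` reports 0 for any cell that holds no tag data, so empty rectangular units need no special handling when the counts are read back.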
Step S104, adding noise to the first count of the rectangular unit to obtain a second count of the rectangular unit.
In the embodiment of the present invention, in order to protect the number of tag data included in each rectangular unit to prevent privacy leakage, noise may be added to the first count value of each rectangular unit to obtain the second count value of each rectangular unit.
Step S105, obtaining a user portrait based on the rectangular units whose second counts meet the preset condition.
In the embodiment of the invention, the rectangular units whose second counts meet the preset condition form a set. One rectangular unit is selected from the set as an initial cluster, and the remaining rectangular units in the set are traversed: every other rectangular unit that meets the preset condition and is connected to a unit already in the cluster is added to the initial cluster. The resulting cluster is taken as the user portrait.
In this embodiment, a label data set of a user is acquired; a multi-dimensional space is created based on the dimensions of the label data in the label data set and divided into a plurality of mutually disjoint rectangular units; and the number of label data in each rectangular unit is acquired and recorded as the first count of that unit. Noise is added to the first count of each rectangular unit to obtain its second count, which protects the privacy of the first count value. Clusters are then generated from the rectangular units whose second counts meet the preset condition to obtain the user portrait. In this way, the real count values of the user label data are protected while the user portrait is generated from the user labels, which prevents privacy leakage.
Optionally, acquiring the number of the tag data in the rectangular unit, and recording as the first count of the rectangular unit may include:
under one dimension, selecting one dimension, projecting the label data on the selected dimension to obtain the number of the label data in each subspace under the current dimension, and recording the number of the label data as a first count of each subspace under the current dimension, wherein the subspace under the current dimension is a space formed by projection of the rectangular unit under the current dimension; adding noise to the first count of each subspace in the current dimension to obtain a second count of each subspace in the current dimension; and recording the subspace of which the second count meets the preset condition as a current dimension dense unit.
In the embodiment of the present invention, referring to FIG. 3, dimension t1 is selected as the one dimension. The corresponding subspaces in the current dimension are s1, s2 and s3, i.e., the spaces formed by projecting the rectangular units onto dimension t1. The tag data are projected onto t1 and counted, with count(si) denoting the first count of the i-th subspace. As shown in FIG. 3, the first counts of the subspaces are count(s1)=4, count(s2)=3 and count(s3)=2. Noise is added to each first count; in the embodiment of the present invention the noise may be Laplace noise, whose probability density function is:
Laplace(x)=exp(-|x|×ε’/Δf)
wherein ε, ε' and Δf are preset constants, n represents the dimension of the label data set, and x represents the variable of the probability density function. Using count'(si) to denote the second count of the i-th subspace, the second counts after adding noise are count'(s1)=4.7, count'(s2)=4.1 and count'(s3)=1.9. Since a count value cannot be fractional, each second count is rounded. A subspace whose second count meets the preset condition is recorded as a dense unit of the current dimension, where the preset condition is: if the second count is greater than δ, the subspace corresponding to that second count is a dense unit of the current dimension. In the embodiment of the present invention, δ is 1, and count'(s1), count'(s2) and count'(s3) all satisfy the preset condition, so s1, s2 and s3 are all dense units of dimension t1.
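The noise, rounding and threshold steps of this one-dimensional stage can be sketched as follows. The sketch assumes the noise follows the standard Laplace distribution with scale Δf/ε'; the names `noisy_counts`, `dense_units`, `epsilon_prime` and `delta_f` are hypothetical. Sampling uses the fact that the difference of two independent exponential variables with the same rate is Laplace distributed, which is one of several possible samplers and not necessarily the one the inventors used.

```python
import random


def noisy_counts(first, epsilon_prime, delta_f, rng):
    """Add Laplace noise (assumed scale delta_f / epsilon_prime) and round.

    The difference of two i.i.d. exponential draws with rate 1 / scale
    is a Laplace(0, scale) sample.
    """
    rate = epsilon_prime / delta_f  # equals 1 / scale
    return {
        key: round(c + rng.expovariate(rate) - rng.expovariate(rate))
        for key, c in first.items()
    }


def dense_units(second, delta=1):
    """Keep the subspaces whose second count is greater than delta."""
    return [key for key, c in second.items() if c > delta]


first = {"s1": 4, "s2": 3, "s3": 2}       # first counts from the example
rng = random.Random(0)                     # seeded only for repeatability
second = noisy_counts(first, epsilon_prime=1.0, delta_f=1.0, rng=rng)

# Rounding the noisy values quoted in the description (4.7, 4.1, 1.9):
rounded = {k: round(v) for k, v in {"s1": 4.7, "s2": 4.1, "s3": 1.9}.items()}
dense = dense_units(rounded, delta=1)
```

A smaller ε' gives a larger noise scale and therefore stronger protection of each count, at the cost of more subspaces being misclassified around the threshold δ.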
Optionally, acquiring the number of the tag data in the rectangular unit, and recording as the first count of the rectangular unit may further include:
under the i dimension, adding a dimension on the basis of the dimension selected in the i-1 dimension as the dimension selected in the i dimension, projecting the label data on the selected dimension to obtain the number of the label data in a subspace effective to the current dimension, and recording the number of the label data as a first count of the subspace associated with the upper dimension; the subspace with the effective current dimension is a subspace corresponding to the dense unit of the previous dimension in the current dimension; adding noise to the first count of the subspace with the effective current dimension to obtain a second count of the subspace with the effective current dimension; recording the subspace of which the second count meets the preset condition as a current dimension dense unit;
under n dimensions, adding one dimension as the dimension selected by the n dimensions on the basis of the dimension selected by the n-1 dimensions, projecting the label data on the selected dimension to obtain the number of the label data in a subspace effective to the current dimension, and recording the number of the label data as a first count of the subspace associated with the upper dimension; where n represents the dimension of the set of label data, i ∈ [2, n-1].
In the embodiment of the present invention, when i is 2 (see FIG. 4), in the two-dimensional stage dimension t2 is added to the dimension t1 selected in the one-dimensional stage, and the tag data are projected onto t1 and t2, i.e., onto the two-dimensional plane formed by t1 and t2, to obtain the number of tag data in each subspace that is valid for the current dimension. Here, the valid subspaces of the current dimension are the subspaces corresponding, under the current dimension t2, to the dense units s1, s2 and s3 of the previous dimension t1 (see FIG. 4). In the embodiment of the invention, all the subspaces under dimension t1 are dense units, so the valid subspaces under the current dimension t2 include all the subspaces under t2. Noise, again Laplace noise, is added to the first counts of the valid subspaces to obtain their second counts. According to FIG. 4, the second counts that meet the preset condition are count'(s11)=4, count'(s12)=2 and count'(s13)=3, so s11, s12 and s13 are recorded as dense units of dimensions t1 and t2.
In the embodiment of the present invention, since the tag data set of the user is three-dimensional and the corresponding data space is also three-dimensional, n is 3. Referring to FIG. 5, dimension t3 is added to the dimensions t1 and t2 selected in the two-dimensional stage, and the tag data are projected onto t1, t2 and t3 (i.e., the three-dimensional space itself). In three dimensions, the valid subspaces corresponding to s11, s12 and s13 are s11X, s12X and s13X, where X ranges over the three intervals of t3, giving 9 subspaces in total. These 9 subspaces are 9 of the 27 rectangular units divided in step S102. According to FIG. 5, the first counts of the 9 subspaces are obtained, i.e., the first counts of the rectangular units in the three-dimensional space.
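The dimension-by-dimension procedure can be sketched as a bottom-up loop that only counts inside subspaces whose lower-dimensional parent was dense. This is a simplified illustration under stated assumptions: `dense_subspaces` and `perturb` are hypothetical names, dimensions are added in the fixed order t1, t2, ..., tn, the default `perturb` is the identity so that the example is deterministic (a real run would plug in the Laplace-noise-and-round step), and valid subspaces that contain no tag data are skipped here rather than perturbed.

```python
import math
from collections import Counter


def dense_subspaces(points, n_dims, width=2.0, delta=1, perturb=lambda c: c):
    """Return the index tuples of the dense units in the full n_dims space.

    At stage i the points are projected onto the first i dimensions and
    counted only if their (i-1)-dimensional prefix was a dense unit.
    """
    dense = {()}  # the empty prefix counts as dense so stage 1 sees everything
    for i in range(1, n_dims + 1):
        counts = Counter()
        for p in points:
            idx = tuple(math.ceil(v / width) - 1 for v in p[:i])
            if idx[:-1] in dense:  # valid subspace: its parent unit was dense
                counts[idx] += 1
        # perturb() stands in for adding noise and rounding (second count)
        dense = {idx for idx, c in counts.items() if perturb(c) > delta}
    return dense


# Hypothetical data: one tight group plus two stragglers.
points = [(1, 1, 1), (1, 1, 1.5), (1, 3, 1), (5, 5, 5)]
result = dense_subspaces(points, n_dims=3)
```

In this toy data the point (5, 5, 5) is dropped after the first stage and (1, 3, 1) after the second, so only one three-dimensional unit has to be counted at the last stage; that pruning is the saving the valid-subspace idea is meant to provide.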
Optionally, the adding noise to the first count of the rectangular unit may include:
and adding Laplace noise to the first count of the rectangular unit, and rounding the first count after the addition of the Laplace noise to obtain a second count of the rectangular unit.
Optionally, the probability density function of the laplacian noise may be:
Laplace(x)=exp(-|x|×ε’/Δf)
wherein ε, ε' and Δf are preset constants, n represents the dimension of the label data set, and x represents the variable of the probability density function.
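For reference, the expression above gives only the exponential kernel of the distribution. Assuming the standard Laplace distribution with location 0 and scale b = Δf/ε' is intended, the normalised density carries a leading factor 1/(2b):

```latex
p(x) \;=\; \frac{\varepsilon'}{2\,\Delta f}\,
\exp\!\left(-\frac{|x|\,\varepsilon'}{\Delta f}\right),
\qquad
\int_{-\infty}^{\infty} p(x)\,\mathrm{d}x \;=\; 1 .
```

Under this assumption the noise has mean 0 and variance 2(Δf/ε')², so a smaller ε' spreads the noise more widely around the true count.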
In the embodiment of the present invention, the first counts of the 9 subspaces obtained in the above description, i.e., the first counts of the rectangular unit, are added with the laplacian noise, and are rounded to obtain the second counts of the rectangular unit. Wherein the process of adding noise may be the same as the process of adding noise to the first count of each subspace in each dimension described above.
Optionally, the obtaining a user portrait based on the rectangular unit of which the second count satisfies a preset condition may include:
and forming a set by the rectangular units with the second counts meeting the preset conditions, selecting one rectangular unit from the set as an initial cluster, traversing the rectangular units except the initial cluster in the set, adding other rectangular units with the second counts meeting the preset conditions, which are communicated with the rectangular units, into the initial cluster to obtain a cluster, and taking the cluster as a user portrait.
Optionally, the traversing the rectangular units in the set outside the initial cluster may include:
and traversing the rectangular units outside the initial cluster based on a depth-first principle.
In the embodiment of the present invention, referring to FIG. 5, the rectangular units whose second counts satisfy a preset condition are grouped into a set according to the obtained second counts, where the preset condition is that the second count is greater than δ. δ may be set according to actual requirements and is equal to 1 in the embodiment of the present invention. The rectangular units whose second counts satisfy the preset condition are s113 (count'(s113)=4), s123 (count'(s123)=3) and s133, and they are placed in the set SC3. As an example, the rectangular unit s113 may be selected arbitrarily from SC3 as the initial cluster. Traversal based on the depth-first principle finds the rectangular unit s123 connected to s113, and s123 is added to the cluster formed by s113. The traversal continues and finds a further connected rectangular unit s133, which is added to the cluster formed by s113 and s123. At this point the traversal of the rectangular units in SC3 is finished, giving the cluster C1={12, u1, u2, u3}, with u1=(0,6], u2=(0,2] and u3=(4,6], where the first element l=12 is the number of tag data contained in the cluster and ui is the range of cluster C1 in the i-th dimension. The resulting cluster C1 is the user portrait.
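The depth-first growth of clusters from the dense rectangular units can be sketched as below. Several points are assumptions made for illustration: two units are treated as connected when they share a face (their indices differ by 1 in exactly one dimension), the cluster size l is taken as the sum of the second counts of its units, the names `connected` and `grow_clusters` are hypothetical, and the counts in the example are made up rather than taken from FIG. 5.

```python
def connected(a, b):
    """Assumed connectivity: indices differ by 1 in exactly one dimension."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def grow_clusters(dense_counts, width=2.0):
    """Group dense units into clusters by depth-first traversal.

    dense_counts maps a cell index tuple to its second count and contains
    only the cells that already passed the threshold. Each cluster is
    returned as (l, ranges) where ranges[d] = (low, high) stands for the
    left-open interval (low, high] covered in dimension d.
    """
    unvisited = set(dense_counts)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]          # arbitrary unit as initial cluster
        members = []
        while stack:
            cell = stack.pop()             # last in, first out: depth-first
            members.append(cell)
            for other in [c for c in unvisited if connected(cell, c)]:
                unvisited.remove(other)
                stack.append(other)
        total = sum(dense_counts[c] for c in members)
        ranges = [
            (min(c[d] for c in members) * width,
             (max(c[d] for c in members) + 1) * width)
            for d in range(len(members[0]))
        ]
        clusters.append((total, ranges))
    return clusters


# Made-up second counts: three units in a row plus one isolated unit.
dense_counts = {(0, 0, 2): 4, (1, 0, 2): 3, (2, 0, 2): 2, (0, 2, 0): 2}
clusters = sorted(grow_clusters(dense_counts))
```

Because membership is decided only by adjacency between dense units, a cluster can follow any chain of neighbouring units, which is how the grid-and-density approach can recover clusters that are not spherical.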
Therefore, the embodiment of the invention acquires the label data set of the user, creates a multi-dimensional space based on the dimensions of the label data in the label data set, and divides the multi-dimensional space into a plurality of mutually disjoint rectangular units. The label data set of a user is often high-dimensional. When the number of label data in the rectangular units is acquired as the first count, one dimension is added at the i-dimensional stage to the dimensions selected at the (i-1)-dimensional stage, and the label data are projected onto the selected dimensions to obtain the number of label data in each subspace that is valid for the current dimension. This satisfies the Apriori property used in association rule mining: if a subspace is dense in i dimensions, its projection in i-1 dimensions is also dense; conversely, if the projected subspace in i-1 dimensions is not dense, the associated subspace in i dimensions is not dense either. Introducing the concept of a valid subspace therefore reduces the amount of data to be processed and improves processing efficiency. In addition, noise is introduced in the calculation at each dimension and the second count is used as the judgment criterion, so the privacy of the user's tag data is protected in the space of every dimension and privacy disclosure is avoided. Finally, in the space of the highest dimension, noise is added to the first count of each rectangular unit to protect its first count value, yielding the second count of the rectangular unit.
By identifying the rectangular units whose second counts meet the preset condition, dense subspaces can be identified in high-dimensional data, and by traversing the dense subspaces according to the depth-first principle, clusters of any shape can be found. The method is therefore flexible and does not need to assume any regular spatial data distribution. According to the embodiment of the invention, the real count values of the user label data are protected while the user portrait is generated from the user labels, which prevents privacy disclosure.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
FIG. 6 is a schematic diagram of a privacy-protecting user portrait generation system provided by an embodiment of the present invention. Referring to FIG. 6, the privacy-protecting user portrait generation system 60 may include:
an obtaining module 61, configured to obtain a tag data set of a user;
a dividing module 62, configured to create a multidimensional space based on the dimensions of the tag data in the tag data set, and divide the multidimensional space into a plurality of mutually disjoint rectangular units;
a counting module 63, configured to obtain the number of the tag data in the rectangular unit, which is recorded as a first count of the rectangular unit;
a noise module 64, configured to add noise to the first count of the rectangular unit to obtain a second count of the rectangular unit;
a generating module 65, configured to obtain the user portrait based on the rectangular units whose second counts satisfy a preset condition.
Optionally, the noise module is further configured to:
and adding Laplace noise to the first count of the rectangular unit, and rounding the first count after the addition of the Laplace noise to obtain a second count of the rectangular unit.
Optionally, the probability density function of the laplacian noise is:
Laplace(x)=exp(-|x|×ε’/Δf)
wherein ε, ε' and Δf are preset constants, n represents the dimension of the label data set, and x represents the variable of the probability density function.
Optionally, the counting module is further configured to: under one dimension, selecting one dimension, projecting the label data on the selected dimension to obtain the number of the label data in each subspace under the current dimension, and recording the number of the label data as a first count of each subspace under the current dimension, wherein the subspace under the current dimension is a space formed by projection of the rectangular unit under the current dimension; adding noise to the first count of each subspace in the current dimension to obtain a second count of each subspace in the current dimension; and recording the subspace of which the second count meets the preset condition as a current dimension dense unit.
Optionally, the counting module is further configured to:
under the i dimension, adding a dimension on the basis of the dimension selected in the i-1 dimension as the dimension selected in the i dimension, projecting the label data on the selected dimension to obtain the number of the label data in a subspace effective to the current dimension, and recording the number of the label data as a first count of the subspace associated with the upper dimension; the subspace with the effective current dimension is a subspace corresponding to the dense unit of the previous dimension in the current dimension; adding noise to the first count of the subspace with the effective current dimension to obtain a second count of the subspace with the effective current dimension; recording the subspace of which the second count meets the preset condition as a current dimension dense unit;
under n dimensions, adding one dimension as the dimension selected by the n dimensions on the basis of the dimension selected by the n-1 dimensions, projecting the label data on the selected dimension to obtain the number of the label data in a subspace effective to the current dimension, and recording the number of the label data as a first count of the subspace associated with the upper dimension; where n represents the dimension of the set of label data, i ∈ [2, n-1].
Optionally, the generating module is further configured to:
form a set from the rectangular units whose second count satisfies the preset condition, select one rectangular unit from the set as an initial cluster, traverse the rectangular units in the set other than the initial cluster, add to the initial cluster the other rectangular units in the set that are connected to it so as to obtain a cluster, and take the cluster as a user portrait.
Optionally, the generating module is further configured to: traverse the rectangular units outside the initial cluster on a depth-first basis.
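A minimal sketch of the depth-first cluster growth follows. The text does not define when two rectangular units are connected; the sketch assumes the common grid-clustering convention that two units are connected when they share a face, i.e. their index tuples differ by exactly 1 in exactly one dimension. The names `neighbours` and `clusters` are hypothetical.

```python
def neighbours(cell):
    """Yield the cells that share a face with `cell` (assumed connectivity rule)."""
    for d in range(len(cell)):
        for step in (-1, 1):
            yield cell[:d] + (cell[d] + step,) + cell[d + 1:]


def clusters(dense_cells):
    """Group dense cells into connected clusters with an explicit-stack DFS."""
    unvisited = set(dense_cells)
    result = []
    while unvisited:
        seed = unvisited.pop()  # initial cluster: one arbitrary dense cell
        cluster, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            for nb in neighbours(current):
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        result.append(cluster)
    return result
```

Each returned set of cells would correspond to one user portrait in the sense of the text. An explicit stack is used instead of recursion so that large clusters do not hit Python's recursion limit.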
It is clear to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the privacy-protecting user portrait generation system may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit, and the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in Fig. 7, the terminal device 70 of this embodiment includes a processor 71, a memory 72, and a computer program 73 stored in the memory 72 and executable on the processor 71. When executing the computer program 73, the processor 71 implements the steps of the method embodiments of the first aspect, e.g. steps S101 to S105 shown in Fig. 1. Alternatively, when executing the computer program 73, the processor 71 implements the functions of the modules/units of the privacy-protecting user portrait generation system embodiments described above, e.g. modules 61 to 65 shown in Fig. 6.
Illustratively, the computer program 73 may be partitioned into one or more modules/units, which are stored in the memory 72 and executed by the processor 71 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 73 in the terminal device 70.
The terminal device may be a mobile phone, a tablet computer, or another computing device. The terminal device may include, but is not limited to, the processor 71 and the memory 72. Those skilled in the art will appreciate that Fig. 7 is merely an example of the terminal device 70 and does not limit it; the terminal device 70 may include more or fewer components than shown, combine some components, or use different components. For example, the terminal device 70 may further include input/output devices, a network access device, a bus, and the like.
The processor 71 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 72 may be an internal storage unit of the terminal device 70, such as a hard disk or a memory of the terminal device 70. The memory 72 may also be an external storage device of the terminal device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 70. Further, the memory 72 may also include both an internal storage unit and an external storage device of the terminal device 70. The memory 72 is used for storing the computer program 73 and other programs and data required by the terminal device 70. The memory 72 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed privacy-protecting user portrait generation method, system and terminal device may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules or units is only a logical division, and other divisions are possible in an actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments described above may be realized by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A privacy-protecting user portrait generation method, comprising:
acquiring a label data set of a user;
creating a multi-dimensional space based on the dimensions of the label data in the label data set, and dividing the multi-dimensional space into a plurality of mutually disjoint rectangular units;
acquiring the number of label data in each rectangular unit, and recording the number as a first count of the rectangular unit;
adding noise to the first count of the rectangular unit to obtain a second count of the rectangular unit; and
obtaining a user portrait based on the rectangular units whose second count satisfies a preset condition.
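The division and first-count steps of claim 1 can be sketched as follows. The sketch assumes an equal-width grid with the same number of intervals in every dimension and known value ranges per dimension; the claim does not fix how the rectangular units are chosen, and the names `cell_index` and `first_counts` are hypothetical.

```python
from collections import Counter


def cell_index(record, lows, highs, n_bins):
    """Map one n-dimensional record to the index tuple of its rectangular unit."""
    index = []
    for value, lo, hi in zip(record, lows, highs):
        width = (hi - lo) / n_bins
        # Clamp so that value == hi falls into the last unit; units stay disjoint.
        index.append(min(int((value - lo) / width), n_bins - 1))
    return tuple(index)


def first_counts(records, lows, highs, n_bins):
    """Count the records in every non-empty rectangular unit (the first count)."""
    return Counter(cell_index(r, lows, highs, n_bins) for r in records)
```

Because every record maps to exactly one index tuple, the units are mutually disjoint by construction. Units absent from the returned counter have a first count of zero.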
2. The privacy-protecting user portrait generation method of claim 1, wherein the adding noise to the first count of the rectangular unit comprises:
adding Laplace noise to the first count of the rectangular unit, and rounding the noised first count to obtain the second count of the rectangular unit.
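A minimal sketch of the noise-and-round step, using only the Python standard library. It relies on the fact that the difference of two independent exponential variables with the same rate follows a zero-mean Laplace distribution. The default sensitivity of 1 and the names `laplace_sample` and `second_count` are assumptions of this sketch, not values taken from the claims.

```python
import random


def laplace_sample(scale, rng):
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def second_count(first_count, epsilon, sensitivity=1.0, rng=random):
    """Perturb a count with Laplace noise of scale sensitivity/epsilon, then round."""
    scale = sensitivity / epsilon
    return round(first_count + laplace_sample(scale, rng))
```

Note that the rounded result can be negative when the first count is small. Whether such values are clipped is not stated in the claim, so the sketch leaves them unchanged.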
3. The privacy-protecting user portrait generation method of claim 2, wherein the probability density function of the Laplace noise is:
Laplace(x) = exp(-|x| × ε′ / Δf)
wherein ε, ε′ and Δf are preset constants, n represents the dimension of the label data set, and x represents the variable of the probability density function.
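The expression in claim 3 gives only the exponential kernel. For comparison, the standard zero-mean Laplace density with scale b carries a normalizing factor 1/(2b). Taking b = Δf/ε′ matches the exponent in the claim, but whether the claim intends this normalized form is an assumption here.

```latex
\[
  \mathrm{Lap}(x \mid b) = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right),
  \qquad b = \frac{\Delta f}{\varepsilon'}
  \;\Longrightarrow\;
  \mathrm{Lap}(x) = \frac{\varepsilon'}{2\,\Delta f}
  \exp\!\left(-\frac{|x|\,\varepsilon'}{\Delta f}\right).
\]
```

With this form the density integrates to 1, and a smaller ε′ or a larger Δf widens the distribution, so more noise is added to each count.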
4. The privacy-protecting user portrait generation method of claim 1, wherein the acquiring the number of label data in the rectangular unit and recording the number as a first count of the rectangular unit comprises:
in the one-dimensional case, selecting one dimension and projecting the label data onto the selected dimension to obtain the number of label data in each subspace of the current dimension, and recording that number as the first count of each subspace of the current dimension, wherein a subspace of the current dimension is the space formed by projecting a rectangular unit onto the current dimension; adding noise to the first count of each subspace of the current dimension to obtain a second count of each such subspace; and recording each subspace whose second count satisfies the preset condition as a dense unit of the current dimension.
5. The privacy-protecting user portrait generation method of claim 1, wherein the acquiring the number of label data in the rectangular unit and recording the number as a first count of the rectangular unit further comprises:
in the i-dimensional case, adding one dimension to the dimensions selected in the (i-1)-dimensional case to form the dimensions selected for the i-dimensional case, projecting the label data onto the selected dimensions to obtain the number of label data in each valid subspace of the current dimension, and recording that number as the first count of the subspace associated with the previous dimension; a valid subspace of the current dimension being a subspace that corresponds, in the current dimension, to a dense unit of the previous dimension; adding noise to the first count of each valid subspace of the current dimension to obtain a second count of that subspace; and recording each subspace whose second count satisfies the preset condition as a dense unit of the current dimension;
in the n-dimensional case, adding one dimension to the dimensions selected in the (n-1)-dimensional case to form the dimensions selected for the n-dimensional case, projecting the label data onto the selected dimensions to obtain the number of label data in each valid subspace of the current dimension, and recording that number as the first count of the subspace associated with the previous dimension; where n denotes the number of dimensions of the label data set and i ∈ [2, n-1].
6. The privacy-protecting user portrait generation method of claim 1, wherein the obtaining a user portrait based on the rectangular units whose second count satisfies a preset condition comprises:
forming a set from the rectangular units whose second count satisfies the preset condition, selecting one rectangular unit from the set as an initial cluster, traversing the rectangular units in the set other than the initial cluster, adding to the initial cluster the other rectangular units in the set that are connected to it so as to obtain a cluster, and taking the cluster as a user portrait.
7. The privacy-protecting user portrait generation method of claim 6, wherein the traversing the rectangular units in the set other than the initial cluster comprises:
traversing the rectangular units outside the initial cluster on a depth-first basis.
8. A privacy-protecting user portrait generation system, comprising:
an acquisition module configured to acquire a label data set of a user;
a dividing module configured to create a multi-dimensional space based on the dimensions of the label data in the label data set and to divide the multi-dimensional space into a plurality of mutually disjoint rectangular units;
a counting module configured to acquire the number of label data in each rectangular unit and to record the number as a first count of the rectangular unit;
a noise module configured to add noise to the first count of the rectangular unit to obtain a second count of the rectangular unit; and
a generating module configured to obtain a user portrait based on the rectangular units whose second count satisfies a preset condition.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the privacy-protecting user portrait generation method of any one of claims 1 to 7.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the privacy-protecting user portrait generation method of any one of claims 1 to 7.
CN201911018936.6A 2019-10-24 2019-10-24 Privacy-protecting user portrait generation method, terminal device and storage medium Pending CN110750725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911018936.6A CN110750725A (en) 2019-10-24 2019-10-24 Privacy-protecting user portrait generation method, terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911018936.6A CN110750725A (en) 2019-10-24 2019-10-24 Privacy-protecting user portrait generation method, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN110750725A (en) 2020-02-04

Family

ID=69279770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911018936.6A Pending CN110750725A (en) 2019-10-24 2019-10-24 Privacy-protecting user portrait generation method, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN110750725A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170316346A1 (en) * 2016-04-28 2017-11-02 Qualcomm Incorporated Differentially private iteratively reweighted least squares
CN107766740A (en) * 2017-10-20 2018-03-06 辽宁工业大学 A kind of data publication method based on difference secret protection under Spark frameworks
CN109784092A (en) * 2019-01-23 2019-05-21 北京工业大学 A kind of recommended method based on label and difference secret protection
CN109886334A (en) * 2019-02-20 2019-06-14 安徽师范大学 A kind of shared nearest neighbor density peak clustering method of secret protection
US20190311219A1 (en) * 2016-10-10 2019-10-10 King Abdullah University Of Science And Technology Quasi-clique prototype-based hybrid clustering
CN110334757A (en) * 2019-06-27 2019-10-15 南京邮电大学 Secret protection clustering method and computer storage medium towards big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘珊: "《大数据与新媒体运营》", 30 October 2017 *
项响琴 等: "CLIQUE聚类算法的分析研究", 《合肥学院学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343306A (en) * 2021-06-29 2021-09-03 招商局金融科技有限公司 Data query method, device, equipment and storage medium based on differential privacy
CN113343306B (en) * 2021-06-29 2024-02-20 招商局金融科技有限公司 Differential privacy-based data query method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200204