CN111090877B - Data generation and acquisition methods, corresponding devices and storage medium - Google Patents

Data generation and acquisition methods, corresponding devices and storage medium

Publication number
CN111090877B
Authority
CN
China
Prior art keywords
data
voting
preference data
candidate
voting preference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911148392.5A
Other languages
Chinese (zh)
Other versions
CN111090877A (en)
Inventor
王绍蔚
杜家春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911148392.5A
Publication of CN111090877A
Application granted
Publication of CN111090877B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data generation and acquisition method, a corresponding device and a storage medium, wherein the method comprises the following steps: acquiring voting preference data of a target object; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.

Description

Data generation and acquisition methods, corresponding devices and storage medium
Technical Field
The embodiment of the application relates to the technical field of information security, in particular to a data generation and acquisition method, a corresponding device and a storage medium.
Background
With the continuous development of network technology and the popularization of mobile terminal devices, how to collect and analyze user data while protecting user privacy has become an important concern in the industry.
Cryptography-based schemes incur high computation and communication-interaction costs and cannot be applied to large-scale voting preference data aggregation decisions involving millions of participants or more in a network environment.
In schemes based on data perturbation, the aggregation result of voting preference data obtained by adding Laplace noise has a huge error, which is not conducive to making effective decisions.
Accordingly, the prior art has drawbacks and needs to be improved and developed.
Disclosure of Invention
The embodiment of the application provides a data generation and acquisition method, a corresponding device and a storage medium, which have the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
The embodiment of the application provides a data generation method, which is applicable to a terminal, and comprises the following steps:
acquiring voting preference data of a target object;
randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism;
converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object;
and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data.
In the data generating method according to the embodiment of the present application, the voting preference data includes any one of category data, collection data, and preference data of the target object.
The embodiment of the application also provides a data acquisition method, which is applicable to the server and comprises the following steps:
receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set;
calculating voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data;
and generating a voting decision according to the unbiased estimation quantity and the confidence interval of the voting integral of each candidate item in the voting preference data.
The embodiment of the application also provides a data generating device, which is suitable for a terminal, and the device comprises:
an acquisition unit configured to acquire voting preference data of a target object;
the disturbance unit is used for randomly disturbing the voting preference data to output a disturbance data set meeting a local differential privacy mechanism;
the transformation unit is used for transforming the disturbance data set by using a preset function so as to generate unbiased estimation data of voting preference data of the target object;
and the sending unit is used for sending the unbiased estimation data of the voting preference data of the target object to a server so that the server can acquire the decision data.
In the data generating apparatus according to the embodiment of the present application, the acquiring unit includes:
an acquisition subunit configured to acquire a candidate set of a target object, where the candidate set includes a plurality of candidates;
and the determining subunit is used for determining voting preference data of the target object according to the preference sequence of all the candidate items in the candidate set.
In the data generating apparatus according to the embodiment of the present application, the average value of the unbiased estimation data of the voting preference data corresponding to each candidate item in the plurality of candidate items in the candidate set is equal to the integral unbiased estimation amount of the corresponding candidate item in the voting preference data.
In the data generating apparatus according to the embodiment of the present application, the voting preference data includes any one of category data, collection data, and preference data of the target object.
The embodiment of the application also provides a data acquisition device, which is applicable to a server, and the device comprises:
the receiving unit is used for receiving unbiased estimation data of voting preference data of a target object sent by the terminal, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set;
a calculation unit, configured to calculate a voting integral unbiased estimation amount and a confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data;
and the decision unit is used for generating a voting decision according to the voting integral unbiased estimated quantity and the confidence interval of each candidate item in the voting preference data.
In the data acquisition device according to the embodiment of the present application, the calculation unit includes:
the first calculating subunit is used for accumulating unbiased estimation data of voting preference data corresponding to the same candidate item in the voting preference data to obtain voting integral unbiased estimation quantity of each candidate item;
and the second calculating subunit is used for calculating the confidence interval according to the variance of the unbiased estimation quantity of the voting integral.
In the data acquisition device of the embodiment of the present application, the second calculating subunit is configured to substitute the variance of the unbiased estimation amount of the voting integral into the Chebyshev inequality for calculation, so as to obtain the confidence interval.
In the data acquisition device according to the embodiment of the present application, the decision unit is configured to determine, within the limited range of the confidence interval, a candidate with the largest value of the unbiased estimation amount of the voting score in the voting preference data as a winning candidate.
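The server-side calculation described above (accumulate per-candidate estimates, bound the error with the Chebyshev inequality, pick the largest total) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `aggregate` and `decide` are hypothetical, the variance of each total is estimated as n times the sample variance of the reports, and "within the limited range of the confidence interval" is interpreted here as checking whether the winner's interval is separated from the runner-up's interval. The source does not confirm any of these choices.

```python
import math
from statistics import pvariance

def aggregate(reports, beta=0.05):
    """reports: list of per-user vectors X, one entry per candidate.

    Returns (total, lower, upper) per candidate, where the interval holds
    with probability at least 1 - beta by the Chebyshev inequality.
    """
    n = len(reports)
    d = len(reports[0])
    results = []
    for j in range(d):
        column = [x[j] for x in reports]
        total = sum(column)
        # Assumption: reports are independent, so Var(total) is estimated
        # as n times the (population) sample variance of one report.
        var_total = n * pvariance(column)
        # Chebyshev: P(|S - E[S]| >= k * sigma) <= 1 / k^2, with k = 1 / sqrt(beta)
        half_width = math.sqrt(var_total / beta)
        results.append((total, total - half_width, total + half_width))
    return results

def decide(results):
    """Return the index of the largest total and whether its interval is
    separated from the runner-up's interval (an illustrative criterion)."""
    order = sorted(range(len(results)), key=lambda j: results[j][0], reverse=True)
    winner, runner_up = order[0], order[1]
    separated = results[winner][1] > results[runner_up][2]
    return winner, separated

# Toy reports for three candidates from four users (made-up numbers).
reports = [[2, 1, 0], [2, 0, 1], [2, 1, 0], [2, 0, 1]]
results = aggregate(reports)
winner, separated = decide(results)
print(winner, separated)
```

The Chebyshev bound makes no distributional assumption, which is why its intervals are wide; it only needs the variance of the estimator.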
The embodiment of the application further provides a storage medium, where the storage medium stores a plurality of instructions, where the instructions are adapted to be loaded by a processor, to perform any of the steps in the data generating method provided in the embodiment of the application, or to perform any of the steps in the data acquiring method provided in the embodiment of the application.
According to the embodiment of the application, voting preference data of a target object are obtained; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a system architecture diagram of a data processing system according to an embodiment of the present application.
Fig. 2 is a schematic view of a scenario of a data processing system according to an embodiment of the present application.
Fig. 3 is a flow chart of a data generating method according to an embodiment of the present application.
Fig. 4 is a flowchart of a data acquisition method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of test results of a data acquisition method according to an embodiment of the present application.
Fig. 6 is a schematic diagram of another test result of a data acquisition method according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a data generating device according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a data acquisition device according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms "first" and "second" and the like in this application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In the scheme based on cryptography, voting preference data is aggregated through secure multi-party computation (such as homomorphic encryption) and the like, so that a data collector or other third parties cannot acquire the plaintext of the voting preference data, and the purpose of protecting voting preference data privacy is achieved. However, the scheme has high computation and communication-interaction costs, and cannot be applied to large-scale voting preference data aggregation decisions involving millions of participants or more in a network environment.
In a scheme based on data perturbation, Laplace noise whose scale is set according to the maximum $\ell_1$-norm change $\Delta$ of the voting preference data is added, which prevents a network node, a data collector or other third parties from acquiring the real voting preference data, thereby achieving the purpose of protecting the privacy of the voting preference data. However, the aggregation result of voting preference data based on Laplace noise addition has a huge error, which is not conducive to making an effective decision.
Therefore, the embodiment of the application provides a data generation and acquisition method, a corresponding device and a storage medium, wherein the data generation and acquisition method, the corresponding device and the storage medium are used for carrying out localized differential privacy protection on voting preference data of a target object (an individual or an organization) through probabilistically outputting a disturbance data set, so that a data collector can calculate voting integral unbiased estimation and confidence interval of candidates, and effective data aggregation analysis is carried out while protecting the data privacy of the target object (the individual or the organization). The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
The embodiment of the application provides a data processing system, which can comprise any data generating device suitable for a terminal and any data acquiring device suitable for a server. The data generating means may be integrated in the terminal. The data acquisition means may be integrated in a network device, such as a server or the like.
For example, referring to FIG. 1, a system architecture diagram of a data processing system is provided, the data processing system comprising: a terminal 10, a server 20, and a network 30, the terminal 10 and the server 20 being connected via the network 30. The network 30 comprises network entities such as routers and gateways, which are not shown. The data generating device is integrated in the terminal 10 and the data acquisition device is integrated in the server 20.
Wherein the terminal 10 may be configured to: acquiring voting preference data of a target object; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; unbiased estimate data of voting preference data of the target object is transmitted to the server 20 so that the server 20 acquires decision data. For example, the terminal 10 performs local privacy protection on the collected voting preference data, and then sends the data to the server 20, so as to instruct the server 20 to calculate the unbiased estimation and the confidence interval of the voting points of the candidates, so that the server 20 performs voting decision according to the unbiased estimation and the confidence interval of the voting points of the candidates.
The terminal 10 may be a mobile phone, a tablet computer, a notebook computer, a wearable device, etc., and the terminal 10 shown in fig. 1 is an example of a mobile phone. The terminal 10 may also have installed therein various applications required by a user, such as an entertainment-enabled application (e.g., a live broadcast application, an audio play application, a game application, a reading software), and a service-enabled application (e.g., a map navigation application, a shopping application, etc.).
Wherein the server 20 is configured to: receiving unbiased estimation data of voting preference data of a target object sent by a terminal 10, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal 10 to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set; calculating voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data; and generating a voting decision according to the unbiased estimation quantity and the confidence interval of the voting integral of each candidate item in the voting preference data. For example, the server 20 calculates the unbiased estimate of the voting points and the confidence interval of the candidates from the received data view with privacy protection, and then makes a voting decision.
For example, referring to fig. 2, which shows a schematic view of a scenario of a data processing system, a user independently performs random perturbation on the user's own voting preference data V at the client or terminal side to obtain a disturbance data set Z with privacy protection, and then transmits the disturbance data set Z to a data collector (such as a cloud server) through network communication. After collecting the users' data views, the cloud server can obtain an unbiased estimate of the aggregated data and its confidence interval through data analysis and calculation, so as to make a voting decision.
The system architecture shown in fig. 1 is only one example of a system architecture for implementing the embodiments of the present application, and the embodiments of the present application are not limited to the system architecture shown in fig. 1. Based on the system architecture, various embodiments of the present application are presented.
The following detailed description will be given respectively, and the following description sequence of the embodiments does not limit the specific implementation sequence.
Referring to fig. 3, fig. 3 is a flow chart of a data generating method according to an embodiment of the present application. The method is suitable for the terminal, and comprises the following steps:
Step 101, voting preference data of a target object is acquired.
In some embodiments, the obtaining voting preference data of the target object includes:
collecting a candidate set of target objects, wherein the candidate set comprises a plurality of candidates;
and determining voting preference data of the target object according to the preference sequence of all the candidate items in the candidate set.
For example, in the embodiment of the present application, the voting preference data may be collected by a client installed in the terminal, or directly by the terminal itself. The client is a program that corresponds to the server and provides local services for the user, such as a web browser, an email client, an instant messaging client, or an online shopping application client.
The client or terminal side collects sensitive data of the user, wherein any one or more of category data (such as gender and region), collection data (such as website access records, shopping basket lists and App lists) and preference data (such as votes, clicked contents and purchased props) of the user can be used as voting preference data, and the voting preference data contains sensitive personal information.
The candidate set may include a plurality of candidates, and the candidate set A may be expressed as $A = \{A_1, A_2, \ldots, A_d\}$, where each item in the candidate set A is referred to as a candidate item.
The voting preference data may include any one of category data, collection data, and preference data of the target object. For example, a piece of voting preference data may be male, female, region A, or the like in the category data; it may be a website, a commodity, or an App list in the collection data; or it may be a vote, clicked content, or a prop in the preference data. The target object is the user whose data is collected, such as an individual or an institution.
The voting preference data V of an individual or institution is determined by the preference order of that individual or institution over all candidates in the candidate set. Assume that the candidate ranked i-th in the preference order obtains the score $W_i$. If candidate $A_j$ is at position $R(j)$ in the preference ranking, then the voting preference data is $V = \{W_{R(1)}, W_{R(2)}, \ldots, W_{R(d)}\}$. Within one voting rule, the values of $W = \{W_1, W_2, \ldots, W_d\}$ are fixed and non-increasing. For example, in the Borda voting rule, $W = \{d-1, d-2, \ldots, 0\}$: each candidate is compared with the other candidates one by one and obtains 1 point for each win. In the Nauru voting rule, the points of each vote are calculated from the ranking order using the formula $1/n$, that is, $W = \{1, 1/2, \ldots, 1/d\}$. In the plurality (relative majority) voting rule, $W = \{1, 0, \ldots, 0\}$: each voter casts one vote, and the candidate with the greatest number of votes wins.
Here, d denotes the number of candidates, $W_1$ denotes the score (integral) of the first-ranked candidate, $W_2$ denotes the score (integral) of the second-ranked candidate, and $W_d$ denotes the score (integral) of the candidate ranked d-th.
$R(1)$ denotes the ranking of candidate 1 in a piece of voting preference data, and $R(2)$ denotes the ranking of candidate 2 in that voting preference data. Because $R(1)$ is the ranking of candidate 1, $W_{R(1)}$ within the voting preference data V denotes the score (integral) of candidate 1 in this vote. $R(1)$ is a numerical value such as first or third; for example, if $R(1) = 3$, then $W_{R(1)} = W_3$.
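As an illustration of how V follows from the score vector W and the ranking R, the following Python sketch builds W for the three rules named above and maps a ranking to voting preference data. The function names `scoring_vector` and `voting_preference_data` are hypothetical, and the Nauru vector {1, 1/2, ..., 1/d} is an assumption derived from the 1/n formula in the text.

```python
def scoring_vector(rule, d):
    """Return W = [W_1, ..., W_d] for a few positional voting rules."""
    if rule == "borda":
        return [d - 1 - i for i in range(d)]        # d-1, d-2, ..., 0
    if rule == "nauru":
        return [1.0 / (i + 1) for i in range(d)]    # assumed: 1, 1/2, ..., 1/d
    if rule == "plurality":
        return [1] + [0] * (d - 1)                  # 1, 0, ..., 0
    raise ValueError(f"unknown rule: {rule}")

def voting_preference_data(ranks, W):
    """ranks[j] is the 1-based ranking R of the (j+1)-th candidate.

    Returns V = [W_R(1), ..., W_R(d)].
    """
    return [W[r - 1] for r in ranks]

# Candidate 1 is ranked third, candidate 2 first, candidate 3 second.
ranks = [3, 1, 2]
W = scoring_vector("borda", 3)
V = voting_preference_data(ranks, W)
print(V)  # [0, 2, 1]
```

With R(1) = 3 the first entry of V is W_3, which matches the example in the preceding paragraph.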
Step 102, randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism.
The voting preference data V of an individual or an organization is taken as input, and the output processed by an epsilon-local differential privacy mechanism Q is taken as a disturbance data set Z, and the output probability of the disturbance data set Z needs to meet the following limitation of an inequality (for any possible V and V'), wherein the inequality is:
$$\Pr[Z \mid V] \le \Pr[Z \mid V'] \cdot \exp(\varepsilon)$$
where Z denotes the disturbance data set, V denotes the voting preference data, and $V'$ denotes any other voting preference data. Pr denotes a probability: $\Pr[Z \mid V]$ is the probability of outputting the disturbance data set Z when the voting preference data V is input, and $\Pr[Z \mid V']$ is the probability of outputting the disturbance data set Z when any other voting preference data $V'$ is input. exp denotes the exponential function with the natural constant e as its base, so $\exp(\varepsilon)$ is e raised to the power $\varepsilon$. $\varepsilon$ is a real number greater than zero; in embodiments of the present application $\varepsilon$ may be a real number between 0 and 3, i.e., $0 < \varepsilon < 3$.
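The inequality can be checked numerically for any mechanism whose output probabilities are known. The Python sketch below is a minimal illustration: `satisfies_ldp` is a hypothetical helper, and k-ary randomized response is used only as a familiar example mechanism; it is not the mechanism Q of this application.

```python
import math

def satisfies_ldp(table, eps, tol=1e-12):
    """table[v][z] = Pr[Z = z | V = v].

    Checks Pr[z | v] <= exp(eps) * Pr[z | v'] for every output z and
    every pair of inputs v, v'.
    """
    bound = math.exp(eps)
    for z in range(len(table[0])):
        column = [row[z] for row in table]
        # The worst pair is the largest against the smallest probability.
        if max(column) > bound * min(column) + tol:
            return False
    return True

def randomized_response(k, eps):
    """k-ary randomized response: report the true value with probability
    exp(eps) / (exp(eps) + k - 1), otherwise one of the other values."""
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    p_other = 1.0 / (math.exp(eps) + k - 1)
    return [[p_true if z == v else p_other for z in range(k)] for v in range(k)]

table = randomized_response(k=4, eps=1.0)
ok_at_1 = satisfies_ldp(table, eps=1.0)    # table was built for eps = 1.0
ok_at_half = satisfies_ldp(table, eps=0.5)  # a stricter budget is violated
print(ok_at_1, ok_at_half)
```

A smaller ε forces the output distributions for different inputs closer together, which is why the same table fails the check at ε = 0.5.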
The disturbance data set Z output by the ε-local differential privacy mechanism Q may represent a subset of the candidate set A, and may be expressed as a set. The disturbance data set Z may also be expressed as a bitmap, in which each candidate corresponds to one bit indicating whether that candidate belongs to Z.
The values in the voting preference data V can be substituted into formula one to calculate the output probability of the disturbance data set Z. Formula one is:
where $A_j$ denotes the j-th candidate, $\Omega$ is a normalization factor, d denotes the number of candidates, $W_{R(d)}$ denotes the score (integral) of candidate d in the voting preference data V, $R(1)$ denotes the ranking of candidate 1 in the voting preference data V, $W_1$ denotes the score (integral) of the first-ranked candidate, and $W_d$ denotes the score (integral) of the candidate ranked d-th.
The normalization factor $\Omega$ can be expressed by formula two, which is:
where $W_i$ denotes the score (integral) obtained by the candidate ranked i-th in the preference order, d denotes the number of candidates and is a positive integer greater than or equal to 1, $W_1$ denotes the score (integral) of the first-ranked candidate, and $W_d$ denotes the score (integral) of the candidate ranked d-th.
The output probability $\Pr[Z = A_j \mid V]$ satisfies the definition of the ε-local differential privacy mechanism described above.
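To make the idea of a score-dependent output probability with a normalization factor concrete, the Python sketch below uses a hypothetical mechanism in which candidate A_j is output with probability proportional to 1 + (e^ε - 1)(W_R(j) - W_d)/(W_1 - W_d). This weighting is an assumption chosen for illustration and should not be read as formula one or formula two of the application; the names `output_distribution` and `perturb` are likewise hypothetical.

```python
import math
import random
from itertools import permutations

def output_distribution(V, eps):
    """Pr[Z = A_j | V] for a hypothetical score-weighted mechanism.

    V must contain at least two distinct scores.
    """
    w_max, w_min = max(V), min(V)
    boost = math.exp(eps) - 1.0
    # Each weight lies between 1 (lowest score) and exp(eps) (highest score).
    weights = [1.0 + boost * (v - w_min) / (w_max - w_min) for v in V]
    omega = sum(weights)  # normalization factor; identical for every ordering of W
    return [w / omega for w in weights], omega

def perturb(V, eps, rng=random):
    """Sample the index j of the single candidate reported as Z."""
    probs, _ = output_distribution(V, eps)
    return rng.choices(range(len(V)), weights=probs, k=1)[0]

eps = 1.0
# Every possible V under the Borda rule with d = 3 is a permutation of W.
tables = [output_distribution(list(p), eps)[0] for p in permutations([2, 1, 0])]
# For each output, compare the largest and smallest probability over all inputs.
worst_ratio = max(max(col) / min(col) for col in zip(*tables))
print(worst_ratio <= math.exp(eps) + 1e-12)

z = perturb([0, 2, 1], eps, random.Random(0))
print(z)
```

Because every V is a permutation of the same W, the normalization factor does not depend on V, so the probability ratio for any output is bounded by the ratio of the largest to the smallest weight, which is exp(ε) in this sketch.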
Step 103, converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object.
Specifically, the disturbance data set is input into a preset function for conversion, and unbiased estimation data of voting preference data of the target object is obtained. The average value of the unbiased estimation data of the voting preference data corresponding to each candidate item in the candidate items is equal to the integral unbiased estimation quantity of the corresponding candidate item in the voting preference data.
Depending on whether candidate $A_j$ appears in the subset Z (the disturbance data set), a conversion is applied to obtain an unbiased estimate of the score of the corresponding candidate $A_j$ in the voting preference data V; in this way the disturbance data set Z is converted into the unbiased estimation data X of the voting preference data of the target object. For example, the conversion condition may be expressed by formula three:
$$E[X_j] = E[f([A_j \in Z])] = W_{R(j)}$$
where $E[X_j]$ denotes the mean of $X_j$, which equals the score $W_{R(j)}$ of candidate $A_j$, so that $X_j$ is an unbiased estimate of $W_{R(j)}$. Here $X_j = f([A_j \in Z])$ denotes the unbiased estimation data of the voting preference data corresponding to candidate $A_j$, and $[A_j \in Z]$ equals 1 if $A_j \in Z$ and 0 otherwise. The preset function f may be a linear transformation; for example, it may be expressed by formula four:
$$f([A_j \in Z]) = a \cdot [A_j \in Z] + c$$
The parameter a can be calculated by formula five:
where $W_i$ denotes the score (integral) obtained by the candidate ranked i-th in the preference order, d denotes the number of candidates and is a positive integer greater than or equal to 1, $W_1$ denotes the score (integral) of the first-ranked candidate, $W_d$ denotes the score (integral) of the candidate ranked d-th, and $\exp(\varepsilon)$ denotes e raised to the power $\varepsilon$, with $\varepsilon$ a real number greater than zero.
The parameter c can be calculated by formula six:
where d denotes the number of candidates and is a positive integer greater than or equal to 1, $W_1$ denotes the score (integral) of the first-ranked candidate, $W_d$ denotes the score (integral) of the candidate ranked d-th, and $\exp(\varepsilon)$ denotes e raised to the power $\varepsilon$, with $\varepsilon$ a real number greater than zero.
Wherein the unbiased estimation data of the voting preference data of the target object includes unbiased estimation data of the voting preference data corresponding to a plurality of candidates, and the unbiased estimation data X of the voting preference data may be expressed as $X = \{X_1, X_2, \ldots, X_d\}$, wherein $X_1$ denotes the unbiased estimation data of the voting preference data corresponding to candidate $A_1$, $X_2$ denotes the unbiased estimation data of the voting preference data corresponding to candidate $A_2$, and $X_d$ denotes the unbiased estimation data of the voting preference data corresponding to candidate $A_d$.
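The linear conversion of formula four can be illustrated with a minimal Python sketch. The function name `linear_transform`, the candidate labels, and the numeric values of a and c are hypothetical and chosen only for illustration; in the embodiment, a and c would come from formulas five and six, which are not reproduced in this text.

```python
from typing import Iterable, List, Sequence


def linear_transform(candidates: Sequence[str], z: Iterable[str],
                     a: float, c: float) -> List[float]:
    """Apply f([A_j in Z]) = a * [A_j in Z] + c to every candidate A_j.

    The indicator [A_j in Z] is 1 when the candidate appears in the
    perturbation data set Z and 0 otherwise.
    """
    perturbed = set(z)
    return [a * (1.0 if cand in perturbed else 0.0) + c for cand in candidates]


candidates = ["A1", "A2", "A3", "A4"]
z = {"A2", "A4"}  # hypothetical perturbation data set Z
# a and c are placeholder values, NOT the values given by formulas five and six.
x = linear_transform(candidates, z, a=2.0, c=-0.5)
print(x)  # [-0.5, 1.5, -0.5, 1.5]
```

The sketch touches each of the d candidates once, which agrees with the per-terminal linear cost stated for the embodiment.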
Step 104: transmitting the unbiased estimation data of the voting preference data of the target object to a server so that the server can acquire decision data.
For example, the unbiased estimation data X of the voting preference data of the target object is transmitted to a server acting as the data collector, so that the server can acquire decision data.
The unbiased estimation data X of the voting preference data of the target object is data with privacy protection.
In some embodiments, the step of generating unbiased estimate data X of voting preference data of the target object may also be performed at the server side. Specifically, after the terminal equipment acquires the disturbance data set Z meeting the local differential privacy mechanism, the disturbance data set Z is directly sent to a server side for data analysis and processing.
All the above technical solutions may be combined to form an optional embodiment of the present application, which is not described here in detail.
According to the embodiment of the application, voting preference data of a target object are obtained; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
Referring to fig. 4 to 6, fig. 4 is a flow chart of a data acquisition method according to an embodiment of the present application, fig. 5 is a test result diagram of a data acquisition method according to an embodiment of the present application, and fig. 6 is another test result diagram of a data acquisition method according to an embodiment of the present application. The method is applicable to a server, and comprises the following steps:
Step 201: receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data is obtained by the terminal randomly perturbing the voting preference data to output a perturbation data set satisfying a local differential privacy mechanism and then converting the perturbation data set.
For example, the server collects the unbiased estimation data $X^{(k)}$ of the voting preference data of user k transmitted by the terminal. For example, the collection instruction may be triggered at a preset time interval, so that data transmission requests from the terminal can be received in time. For example, the server may first send a data collection request to the terminal, instructing the terminal to collect privacy-protected unbiased estimation data of voting preference data on demand, and the server then receives the unbiased estimation data of the voting preference data of the target object sent by the terminal.
Step 202, calculating the voting integral unbiased estimation and confidence interval of each candidate item in the voting preference data according to the unbiased estimation data of the voting preference data.
In some embodiments, the calculating the voting integral unbiased estimate and the confidence interval of each candidate item in the voting preference data according to the unbiased estimate data of the voting preference data includes:
accumulating unbiased estimation data of voting preference data corresponding to the same candidate in the voting preference data to obtain voting integral unbiased estimation quantity of each candidate;
and calculating the confidence interval according to the variance of the unbiased estimation quantity of the voting integral.
Wherein, over all individuals or institutions acting as target objects, a voting score (integral) unbiased estimate is calculated for each candidate $A_j$, as an unbiased estimate of its true score $P_j$.
Wherein the unbiased estimation data $X_j^{(k)}$ of the voting preference data corresponding to the same candidate $A_j$ in the voting preference data is accumulated to obtain the voting score (integral) unbiased estimate of each candidate $A_j$, which may be expressed by formula seven:
wherein $X_j^{(k)}$ denotes the score (integral) of candidate $A_j$ in the voting preference data released by the k-th individual or institution, that is, the unbiased estimation data of the voting preference data corresponding to candidate $A_j$ that the terminal of the k-th individual or institution sends to the server; n denotes the number of users and is a positive integer greater than or equal to 1.
The true score (integral) $P_j$ of candidate $A_j$, that is, the mathematical expectation of the voting score of candidate $A_j$, may be expressed by formula eight:
wherein the term for the k-th individual or institution denotes the score (integral) of candidate $A_j$ in the true voting preference data of that individual or institution; n denotes the number of users and is a positive integer greater than or equal to 1.
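The accumulation step can be sketched in Python as follows. The function name `aggregate_reports` and the sample reports are hypothetical. Because formula seven is not reproduced in this text, the sketch assumes a plain per-candidate sum and exposes an optional `normalize` flag for dividing by n; which form the embodiment uses is an assumption left open here.

```python
from typing import List, Sequence


def aggregate_reports(reports: Sequence[Sequence[float]],
                      normalize: bool = False) -> List[float]:
    """Accumulate the per-candidate values X_j^(k) over all n reports.

    reports[k][j] is the value that user k reported for candidate A_j.
    With normalize=True the sums are divided by n (an assumed variant).
    """
    if not reports:
        raise ValueError("at least one report is required")
    d = len(reports[0])
    totals = [0.0] * d
    for report in reports:
        if len(report) != d:
            raise ValueError("all reports must cover the same d candidates")
        for j, value in enumerate(report):
            totals[j] += value
    if normalize:
        n = len(reports)
        totals = [t / n for t in totals]
    return totals


# Three hypothetical reports over d = 3 candidates.
reports = [[1.5, -0.5, -0.5], [-0.5, 1.5, -0.5], [1.5, 1.5, -0.5]]
print(aggregate_reports(reports))  # [2.5, 2.5, -1.5]
```

The double loop visits each of the n reports and each of the d candidates once, matching the Θ(n·d) server-side complexity stated for the embodiment.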
The confidence interval may be calculated from the variance of the voting integral unbiased estimate.
Specifically, the confidence interval may be calculated based on the true score of each candidate and the variance of the unbiased estimate of the voting score.
Wherein each candidate A is calculated according to formula nine j Unbiased estimate of voting integral of (a)Is given by formula nine:
wherein, the liquid crystal display device comprises a liquid crystal display device,as candidate A j Unbiased estimate of voting integral, P j As candidate A j N represents the number of votes or the number of users, n is a positive integer greater than or equal to 1.
In some embodiments, the variance of the voting score (integral) unbiased estimate is input into the Chebyshev inequality to obtain the confidence interval.
Let the random variable Y have mathematical expectation $E(Y) = \mu$ and variance $D(Y) = \sigma^2$. For any positive number $\lambda$, inequality two holds; inequality two is the Chebyshev inequality:
$$\Pr(|Y - \mu| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}$$
Inequality two can also be transformed into inequality three:
$$\Pr(|Y - \mu| < \lambda) \ge 1 - \frac{\sigma^2}{\lambda^2}$$
wherein the random variable Y may represent the result of a vote performed at random at any time; the mathematical expectation $\mu$ of the voting score of candidate $A_j$ represents the true score $P_j$ of candidate $A_j$; the variance $\sigma^2$ represents the variance of the voting score unbiased estimate of candidate $A_j$; and the arbitrary positive number $\lambda$ represents the absolute value of the difference between the voting score unbiased estimate and the true score $P_j$.
Wherein the confidence interval is a reference for measuring the reliability of the result, and the right-hand side of inequality three, $1 - \sigma^2/\lambda^2$, indicates the confidence level.
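As a worked illustration, Chebyshev's inequality in the form $\Pr(|Y - \mu| < \lambda) \ge 1 - \sigma^2/\lambda^2$ can be solved for $\lambda$ at a chosen confidence level: setting $1 - \sigma^2/\lambda^2$ equal to the confidence level gives $\lambda = \sigma / \sqrt{1 - \text{confidence}}$. The Python sketch below applies this; the function name `chebyshev_interval` and the numeric inputs are hypothetical, and the variance is taken as a given input because formula nine is not reproduced in this text.

```python
import math
from typing import Tuple


def chebyshev_interval(estimate: float, variance: float,
                       confidence: float) -> Tuple[float, float]:
    """Return (low, high) such that, by Chebyshev's inequality, the true
    score lies within the interval with probability >= confidence."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve 1 - variance / lam**2 = confidence for lam.
    lam = math.sqrt(variance / (1.0 - confidence))
    return (estimate - lam, estimate + lam)


# Hypothetical estimate 10 with variance 4 at a 75% confidence level.
print(chebyshev_interval(10.0, 4.0, 0.75))  # (6.0, 14.0)
```

Because Chebyshev's inequality makes no distributional assumption, the resulting interval is conservative; a smaller variance of the estimate directly yields a narrower interval.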
Step 203: generating a voting decision according to the voting score (integral) unbiased estimate and the confidence interval of each candidate in the voting preference data.
Specifically, within the range limited by the confidence interval, the candidate with the largest voting score (integral) unbiased estimate in the voting preference data is determined as the winning candidate.
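One possible reading of this decision step is sketched below in Python. The winner is the candidate with the largest estimate; the additional check that its confidence interval does not overlap any other candidate's interval is an assumed interpretation of "within the range limited by the confidence interval", not a procedure stated in this text. The function name `decide_winner` and the use of a single shared half-width are likewise hypothetical.

```python
from typing import Sequence, Tuple


def decide_winner(estimates: Sequence[float],
                  half_width: float) -> Tuple[int, bool]:
    """Return (index of the winning candidate, whether its interval is
    strictly separated from every other candidate's interval)."""
    if not estimates:
        raise ValueError("at least one candidate is required")
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    winner = max(range(len(estimates)), key=lambda j: estimates[j])
    # Assumed check: the winner's lower bound exceeds every rival's upper bound.
    separated = all(
        estimates[winner] - half_width > estimates[j] + half_width
        for j in range(len(estimates)) if j != winner
    )
    return winner, separated


print(decide_winner([10.0, 3.0, 7.0], half_width=1.0))  # (0, True)
print(decide_winner([10.0, 3.0, 7.0], half_width=2.0))  # (0, False)
```

When the intervals overlap, a collector might treat the result as inconclusive at the chosen confidence level; how such cases are handled is outside what the text specifies.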
The computation and communication cost of the embodiment of the application is small: there is only a linear computation cost of Θ(d) on the individual client or terminal side, and only a complexity of Θ(n·d) on the server acting as the data collector, where d is the size of the candidate set and n is the number of individuals or institutions. Experimental results show that when d=32, the computation time required for a single individual is less than 0.1 ms on a desktop computer, and the time required for the collector to aggregate the unbiased estimation data of the voting preference data of 1000 individuals is only about 20 ms.
The decision validity of the embodiment of the application is high: the voting score (integral) of each candidate obtained by the embodiment is an unbiased estimate with a small variance, so a tighter confidence interval can be obtained.
As shown in fig. 5 and fig. 6, fig. 5 is a comparison chart of the total variation error when the voting preference data of 1000 individuals or institutions is aggregated under the Borda voting rule. Parts (a), (b), (c) and (d) show the experimental results for d=4, d=8, d=16 and d=32 candidates, respectively; the ordinate is the logarithm of the total variation error, log(errTVE), and the abscissa is the privacy level ε. Fig. 6 is a comparison chart of the Top-1 accuracy under the same setting. Parts (a), (b), (c) and (d) again show the experimental results for d=4, d=8, d=16 and d=32 candidates, respectively; the ordinate is the accuracy of the winning candidate (AOW), and the abscissa is the privacy level ε. Each chart shows curves for four methods: curve 1 corresponds to the Laplace method, curve 2 to the original sampling (real Sampling) method, curve 3 to the weighted sampling (Weighted Sampling) method, and curve 4 to the additive (Additive) method. The method of the embodiment of the application is labeled as the Additive method. Experimental results show that when d=16, at the same privacy protection level (ε=1), the total variation error is reduced by about 50% compared with the Laplace method, and the accuracy of the first-ranked (Top-1) candidate is improved from 60% to 80%.
Wherein a smaller total variation indicates a smaller error in the aggregated result, and a higher accuracy indicates that the winning candidate is identified correctly more often.
All the above technical solutions may be combined to form an optional embodiment of the present application, which is not described here in detail.
According to the embodiment of the application, unbiased estimation data of voting preference data of a target object sent by a terminal is received, wherein the unbiased estimation data is obtained by the terminal randomly perturbing the voting preference data to output a perturbation data set satisfying a local differential privacy mechanism and then converting the perturbation data set; the voting score (integral) unbiased estimate and the confidence interval of each candidate in the voting preference data are calculated according to the unbiased estimation data of the voting preference data; and a voting decision is generated according to the voting score unbiased estimate and the confidence interval of each candidate in the voting preference data. The embodiment of the application has a wide application range (any voting rule), a small calculation cost (low linear complexity), no interaction (the submission of voting preference data is completed in a single data communication) and high effectiveness (small voting score aggregation error), and is suitable for voting preference data aggregation in large-scale distributed networks and low-resource terminal environments.
The embodiment of the application also provides a data generating device, as shown in fig. 7, and fig. 7 is a schematic structural diagram of the data generating device provided in the embodiment of the application. The data generating apparatus 300 is suitable for a terminal, and the data generating apparatus 300 may include an acquisition unit 301, a perturbation unit 302, a transformation unit 303, and a transmission unit 304.
Wherein, the acquiring unit 301 is configured to acquire voting preference data of a target object;
the perturbation unit 302 is configured to randomly perturb the voting preference data to output a perturbation data set that satisfies a local differential privacy mechanism;
the transforming unit 303 is configured to transform the disturbance data set using a preset function to generate unbiased estimation data of voting preference data of the target object;
the sending unit 304 is configured to send unbiased estimation data of voting preference data of the target object to a server, so that the server obtains decision data.
In some embodiments, the acquiring unit 301 includes:
an acquisition subunit 3011 for acquiring a candidate set of target objects, wherein the candidate set includes a plurality of candidates;
a determining subunit 3012, configured to determine voting preference data of the target object according to the preference order of all the candidates in the candidate set.
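The work of the two subunits can be illustrated with a short Python sketch that turns a preference order over the candidate set into per-candidate ranks and scores. The function names are hypothetical, and the Borda weights used here (a score of d minus the rank, so the first-ranked candidate receives d-1 and the last receives 0) are an assumed example of a positional scoring rule; the embodiment is described as applicable to other voting rules as well.

```python
from typing import List, Sequence


def preference_to_ranks(candidates: Sequence[str],
                        preference_order: Sequence[str]) -> List[int]:
    """Return the 1-based rank R(j) of every candidate A_j in the order."""
    if sorted(candidates) != sorted(preference_order):
        raise ValueError("preference order must rank every candidate exactly once")
    position = {cand: i + 1 for i, cand in enumerate(preference_order)}
    return [position[cand] for cand in candidates]


def borda_scores(ranks: Sequence[int]) -> List[int]:
    """Assumed Borda weights: the candidate ranked r-th receives d - r."""
    d = len(ranks)
    return [d - r for r in ranks]


candidates = ["A1", "A2", "A3", "A4"]
order = ["A3", "A1", "A4", "A2"]  # hypothetical preference order, best first
ranks = preference_to_ranks(candidates, order)
print(ranks)                # [2, 4, 1, 3]
print(borda_scores(ranks))  # [2, 0, 3, 1]
```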
In some embodiments, the mean of the unbiased estimation data of the voting preference data corresponding to each candidate in the candidate set is equal to the score (integral) of that candidate in the voting preference data.
In some embodiments, the voting preference data includes any one of category data, collection data, and preference data of the target object.
All the above technical solutions may be combined to form an optional embodiment of the present application, which is not described here in detail.
The data generating device 300 provided in the embodiment of the present application acquires voting preference data of a target object through the acquiring unit 301; the perturbation unit 302 randomly perturbs the voting preference data to output a perturbation data set satisfying a local differential privacy mechanism; the transformation unit 303 transforms the disturbance data set using a preset function to generate unbiased estimation data of voting preference data of the target object; the transmitting unit 304 transmits unbiased estimation data of voting preference data of the target object to a server so that the server acquires decision data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
The embodiment of the application further provides a data acquisition device, as shown in fig. 8, and fig. 8 is a schematic structural diagram of the data acquisition device provided in the embodiment of the application. The data acquisition device 400 is suitable for a server, and the data acquisition device 400 may comprise a receiving unit 401, a calculating unit 402, and a decision unit 403.
The receiving unit 401 is configured to receive unbiased estimation data of voting preference data of a target object sent by a terminal, where the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal, so as to output a perturbation data set meeting a local differential privacy mechanism, and converting the perturbation data set;
the calculating unit 402 is configured to calculate, according to the unbiased estimation data of the voting preference data, a voting integral unbiased estimation amount and a confidence interval of each candidate item in the voting preference data;
the decision unit 403 is configured to generate a voting decision according to the voting integral unbiased estimated amount and the confidence interval of each candidate item in the voting preference data.
In some embodiments, the computing unit 402 includes:
a first calculating subunit 4021, configured to accumulate unbiased estimation data of voting preference data corresponding to the same candidate in the voting preference data, so as to obtain an unbiased estimation amount of voting integral of each candidate;
A second calculation subunit 4022 is configured to calculate the confidence interval according to the variance of the unbiased estimation of the voting integral.
In some embodiments, the second calculating subunit 4022 is configured to input the variance of the voting integral unbiased estimate into chebyshev inequality for calculation to obtain the confidence interval.
In some embodiments, the decision unit 403 is configured to determine, as the winning candidate, a candidate with a largest value of the unbiased estimation of the voting score in the voting preference data within the confidence interval limit.
In the data acquisition device 400 provided by the embodiment of the application, the receiving unit 401 receives unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data is obtained by the terminal randomly perturbing the voting preference data to output a perturbation data set satisfying a local differential privacy mechanism and then converting the perturbation data set; the calculating unit 402 calculates the voting score (integral) unbiased estimate and the confidence interval of each candidate in the voting preference data according to the unbiased estimation data of the voting preference data; the decision unit 403 generates a voting decision according to the voting score unbiased estimate and the confidence interval of each candidate in the voting preference data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and is suitable for voting preference data aggregation in large-scale distributed networks and low-resource terminal environments.
Accordingly, embodiments of the present application also provide a terminal, as shown in fig. 9, which may include a Radio Frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi, Wireless Fidelity) module 507, a processor 508 including one or more processing cores, and a power supply 509. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 9 does not limit the terminal, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components. Wherein:
the RF circuit 501 may be configured to receive and send information or signals during a call, and in particular, after receiving downlink information of a base station, the downlink information is processed by one or more processors 508; in addition, data relating to uplink is transmitted to the base station. Typically, RF circuitry 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, subscriber Identity Module) card, a transceiver, a coupler, a low noise amplifier (LNA, low Noise Amplifier), a duplexer, and the like. In addition, RF circuitry 501 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, global system for mobile communications (GSM, global System of Mobile communication), general packet radio service (GPRS, general Packet Radio Service), code division multiple access (CDMA, code Division Multiple Access), wideband code division multiple access (WCDMA, wideband Code Division Multiple Access), long term evolution (LTE, long Term Evolution), email, short message service (SMS, short Messaging Service), and the like.
The memory 502 may be used to store software programs and modules that the processor 508 performs various functional applications and data processing by executing the software programs and modules stored in the memory 502. The memory 502 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the terminal, etc. In addition, memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide access to the memory 502 by the processor 508 and the input unit 503.
The input unit 503 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit 503 may include a touch-sensitive surface, as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations thereon or thereabout by a user using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. Alternatively, the touch-sensitive surface may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 508, and can receive commands from the processor 508 and execute them. In addition, touch sensitive surfaces may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. The input unit 503 may comprise other input devices besides a touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 504 may be used to display information input by a user or information provided to the user and various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video and any combination thereof. The display unit 504 may include a display panel, which may be optionally configured in the form of a liquid crystal display (LCD, liquid Crystal Display), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay a display panel, and upon detection of a touch operation thereon or thereabout, the touch-sensitive surface is passed to the processor 508 to determine the type of touch event, and the processor 508 then provides a corresponding visual output on the display panel based on the type of touch event. Although in fig. 9 the touch sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch sensitive surface may be integrated with the display panel to implement the input and output functions.
The terminal may also include at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or backlight when the terminal moves to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured in the terminal are not described in detail herein.
The audio circuit 506, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 506 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; conversely, the microphone converts collected sound signals into electrical signals, which the audio circuit 506 receives and converts into audio data; the audio data is then output to the processor 508 for processing and sent, for example, to another terminal via the RF circuit 501, or output to the memory 502 for further processing. The audio circuit 506 may also include an earphone jack to allow a peripheral earphone to communicate with the terminal.
The WiFi belongs to a short-distance wireless transmission technology, and the terminal can help the user to send and receive e-mails, browse web pages, access streaming media and the like through the WiFi module 507, so that wireless broadband internet access is provided for the user. Although fig. 9 shows a WiFi module 507, it is understood that it does not belong to the essential constitution of the terminal, and may be omitted entirely as required within a range not changing the essence of the invention.
The processor 508 is a control center of the terminal, and connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby performing overall processing of the mobile phone. Optionally, the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 508.
The terminal also includes a power supply 509 (e.g., a battery) for powering the various components, which may be logically connected to the processor 508 via a power management system so as to provide for the management of charge, discharge, and power consumption by the power management system. The power supply 509 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, etc., which will not be described herein. In this embodiment, the processor 508 in the terminal loads executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 508 executes the application programs stored in the memory 502, so as to implement various functions:
acquiring voting preference data of a target object; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data.
In some embodiments, the processor 508 is configured to obtain voting preference data for the target object, including:
collecting a candidate set of target objects, wherein the candidate set comprises a plurality of candidates;
and determining voting preference data of the target object according to the preference sequence of all the candidate items in the candidate set.
In some embodiments, the mean of the unbiased estimation data of the voting preference data corresponding to each candidate in the candidate set is equal to the score (integral) of that candidate in the voting preference data.
In some embodiments, the voting preference data includes any one of category data, collection data, and preference data of the target object.
For details of the above operations, refer to the previous embodiments; they are not repeated here.
From the above, the terminal provided in this embodiment obtains voting preference data of the target object; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data. The embodiment of the application has the characteristics of wide application range, small calculation cost, no interaction and high effectiveness, and can be suitable for voting preference data aggregation in a large-scale distributed network and a low-resource terminal environment.
The embodiment of the application also provides a server, as shown in fig. 10, which shows a schematic structural diagram of the server according to the embodiment of the application, specifically:
The server may include components such as a processor 601 with one or more processing cores, a memory 602 with one or more computer-readable storage media, a power supply 603, and an input unit 604. Those skilled in the art will appreciate that the server structure shown in fig. 10 does not limit the server, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components. Wherein:
the processor 601 is a control center of the server, connects respective portions of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 602, and calling data stored in the memory 602, thereby performing overall processing of the server. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 may execute various functional applications and data processing by executing the software programs and modules stored in the memory 602. The memory 602 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the server, etc. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide access to the memory 602 by the processor 601.
The server also includes a power supply 603 for powering the various components. Preferably, the power supply 603 can be logically coupled to the processor 601 through a power management system, so that functions such as managing charging, discharging, and power consumption are performed by the power management system. The power supply 603 may also include any one or more components such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may further comprise an input unit 604, which input unit 604 may be used for receiving input numerical or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the server may further include a display unit or the like, which is not described herein. In this embodiment, the processor 601 in the server loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 601 executes the application programs stored in the memory 602, so as to implement various functions as follows:
receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set; calculating voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data; and generating a voting decision according to the unbiased estimation quantity and the confidence interval of the voting integral of each candidate item in the voting preference data.
In some embodiments, the processor 601 is configured to calculate a voting integral unbiased estimate and a confidence interval for each candidate item in the voting preference data according to the unbiased estimate data of the voting preference data, including:
accumulating unbiased estimation data of voting preference data corresponding to the same candidate in the voting preference data to obtain voting integral unbiased estimation quantity of each candidate;
and calculating the confidence interval according to the variance of the unbiased estimation quantity of the voting integral.
In some embodiments, the processor 601 is configured to calculate the confidence interval from a variance of the voting integral unbiased estimate, including:
and inputting the variance of the voting integral unbiased estimation quantity into a chebyshev inequality for calculation to obtain the confidence interval.
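The accumulation and confidence-interval steps above can be sketched as follows. This is a minimal sketch under stated assumptions: each terminal report is taken to be a mapping from candidate to unbiased estimate, the variance of the accumulated estimate is assumed to be known and is passed in by the caller, and `aggregate`, `chebyshev_interval`, and `delta` (the allowed failure probability) are hypothetical names. It relies only on the standard Chebyshev inequality, P(|X - mu| >= k*sigma) <= 1/k^2.

```python
import math


def aggregate(reports):
    """Sum the per-candidate unbiased estimates over all terminal reports."""
    totals = {}
    for report in reports:
        for candidate, value in report.items():
            totals[candidate] = totals.get(candidate, 0.0) + value
    return totals


def chebyshev_interval(estimate, variance, delta):
    """Return (low, high) covering the true value with probability >= 1 - delta.

    Choosing k = 1 / sqrt(delta) in Chebyshev's inequality gives a
    half-width of sqrt(variance / delta) around an unbiased estimate.
    """
    if not 0 < delta <= 1:
        raise ValueError("delta must be in (0, 1]")
    if variance < 0:
        raise ValueError("variance must be non-negative")
    half_width = math.sqrt(variance / delta)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    reports = [{"A": 1.5, "B": -0.5}, {"A": 0.5, "B": 0.5}]
    totals = aggregate(reports)
    # The variance value here is made up purely for illustration.
    print(totals, chebyshev_interval(totals["A"], variance=4.0, delta=0.25))
```

Chebyshev's inequality makes no assumption about the distribution of the estimate, so the interval is conservative; that is a general property of the inequality rather than a statement about this embodiment.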
In some embodiments, the processor 601 is configured to generate a voting decision based on the voting integral unbiased estimates and the confidence interval for each candidate in the list, including:
and determining the candidate with the largest value of the unbiased estimation of the voting integral in the voting preference data as the winning candidate within the limited range of the confidence interval.
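One possible reading of this decision step is sketched below. The interpretation is an assumption, not something stated in this application: the candidate with the largest estimate is selected, and the result is flagged as separated only when its interval does not overlap that of the runner-up. The function name `decide_winner` and the shared `half_width` parameter are hypothetical.

```python
def decide_winner(estimates, half_width):
    """Pick the candidate with the largest estimated voting score.

    Returns (winner, separated). `separated` is True when the winner's
    lower bound exceeds the runner-up's upper bound, i.e. the ranking of
    the top candidate is not explained by the estimation error alone.
    """
    if not estimates:
        raise ValueError("no candidates to decide between")
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner, True
    runner_up = ranked[1]
    separated = estimates[winner] - half_width > estimates[runner_up] + half_width
    return winner, separated


if __name__ == "__main__":
    # Illustrative numbers only.
    print(decide_winner({"A": 120.0, "B": 80.0, "C": 10.0}, half_width=15.0))
```

When the intervals overlap, a caller might collect more reports or relax the confidence level; that choice is outside what the text specifies.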
For details of the above operations, refer to the previous embodiments; they are not repeated here.
As can be seen from the foregoing, the server provided in this embodiment receives unbiased estimation data of the voting preference data of the target object sent by the terminal, where the unbiased estimation data is obtained by the terminal randomly perturbing the voting preference data to output a perturbation data set satisfying the local differential privacy mechanism and converting the perturbation data set; calculates the voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to the unbiased estimation data; and generates a voting decision according to the voting integral unbiased estimation quantity and the confidence interval of each candidate item. The embodiment of the application is characterized by a wide application range (arbitrary voting rules), low computational cost (low, linear complexity), no interaction (the submission of voting preference data is completed in a single data communication), and high effectiveness (small voting integral aggregation error), and is suitable for aggregating voting preference data in large-scale distributed networks and on low-resource terminals.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the data generation methods provided by embodiments of the present application, or to perform steps in any of the data acquisition methods provided by embodiments of the present application. For example, the instructions may perform the steps of:
acquiring voting preference data of a target object; randomly perturbing the voting preference data to output a perturbation data set meeting a local differential privacy mechanism; converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object; and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data.
For example, the instructions may perform the steps of:
receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set; calculating voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data; and generating a voting decision according to the unbiased estimation quantity and the confidence interval of the voting integral of each candidate item in the voting preference data.
For the specific implementation of each operation above, refer to the previous embodiments; it is not repeated here.
The storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Because the instructions stored in the storage medium can perform the steps of any data generation method or data acquisition method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method; these are detailed in the previous embodiments and are not described herein again.
The foregoing describes in detail the data generation and acquisition methods, corresponding devices, and storage medium provided in the embodiments of the present application. Specific examples are used herein to illustrate the principles and implementations of the present application, and the foregoing description of the embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, those skilled in the art may vary the specific embodiments and application scope in light of the ideas of the present application. In view of the above, this description should not be construed as limiting the present application.

Claims (10)

1. A data generation method, suitable for a terminal, comprising:
collecting a candidate set of a target object, wherein the candidate set comprises a plurality of candidates;
acquiring voting preference data of the target object based on the candidate set of the target object;
randomly perturbing the voting preference data to output a perturbation data set satisfying a local differential privacy mechanism, the perturbation data set being a subset of the candidate set;
if the candidate item is converted in the disturbance data set, converting the disturbance data set by using a preset function to generate unbiased estimation data of voting preference data of the target object;
and transmitting unbiased estimation data of voting preference data of the target object to a server so that the server can acquire decision data.
2. The data generation method of claim 1, wherein the acquiring voting preference data of the target object includes:
collecting a candidate set of target objects, wherein the candidate set comprises a plurality of candidates;
and determining voting preference data of the target object according to the preference sequence of all the candidate items in the candidate set.
3. The data generation method of claim 2, wherein a mean value of unbiased estimation data of the voting preference data corresponding to each candidate item in the plurality of candidate items of the candidate set is equal to an integrated unbiased estimation amount of the corresponding candidate item in the voting preference data.
4. A data acquisition method, suitable for a server, comprising:
receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data is obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set, and the unbiased estimation data is obtained by adopting the method as claimed in claim 1;
calculating voting integral unbiased estimation quantity and confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data;
and generating a voting decision according to the unbiased estimation quantity and the confidence interval of the voting integral of each candidate item in the voting preference data.
5. The data acquisition method of claim 4 wherein the calculating the voting integral unbiased estimate and confidence interval for each candidate item in the voting preference data from the unbiased estimate data of the voting preference data includes:
accumulating unbiased estimation data of voting preference data corresponding to the same candidate in the voting preference data to obtain voting integral unbiased estimation quantity of each candidate;
and calculating the confidence interval according to the variance of the unbiased estimation quantity of the voting integral.
6. The data acquisition method of claim 5 wherein said calculating the confidence interval based on the variance of the voting integral unbiased estimate comprises:
and inputting the variance of the voting integral unbiased estimation quantity into a chebyshev inequality for calculation to obtain the confidence interval.
7. The data acquisition method of claim 4 wherein generating a voting decision based on the voting integral unbiased estimate and the confidence interval for each of the candidates comprises:
and determining the candidate with the largest value of the unbiased estimation of the voting integral in the voting preference data as the winning candidate within the limited range of the confidence interval.
8. A data generating apparatus adapted for a terminal, the apparatus comprising:
an acquisition unit configured to acquire a candidate set of a target object, the candidate set including a plurality of candidates; acquiring voting preference data of the target object based on the candidate set of the target object;
the disturbance unit is used for randomly disturbing the voting preference data to output a disturbance data set meeting a local differential privacy mechanism, wherein the disturbance data set is a subset of the candidate set;
the transformation unit is used for transforming the disturbance data set by using a preset function if the candidate item is transformed in the disturbance data set so as to generate unbiased estimation data of voting preference data of the target object;
and the sending unit is used for sending the unbiased estimation data of the voting preference data of the target object to a server so that the server can acquire the decision data.
9. A data acquisition device adapted for use with a server, the device comprising:
the receiving unit is used for receiving unbiased estimation data of voting preference data of a target object sent by a terminal, wherein the unbiased estimation data of the voting preference data are obtained by randomly perturbing the voting preference data by the terminal to output a perturbation data set meeting a local differential privacy mechanism and converting the perturbation data set, and the unbiased estimation data are obtained by adopting the method as claimed in claim 1;
a calculation unit, configured to calculate a voting integral unbiased estimation amount and a confidence interval of each candidate item in the voting preference data according to unbiased estimation data of the voting preference data;
and the decision unit is used for generating a voting decision according to the voting integral unbiased estimated quantity and the confidence interval of each candidate item in the voting preference data.
10. A storage medium storing instructions adapted to be loaded by a processor to perform the steps of the data generation method of any one of claims 1-3 or to perform the steps of the data acquisition method of any one of claims 4-7.
CN201911148392.5A 2019-11-21 2019-11-21 Data generation and acquisition methods, corresponding devices and storage medium Active CN111090877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911148392.5A CN111090877B (en) 2019-11-21 2019-11-21 Data generation and acquisition methods, corresponding devices and storage medium

Publications (2)

Publication Number Publication Date
CN111090877A CN111090877A (en) 2020-05-01
CN111090877B true CN111090877B (en) 2023-07-28

Family

ID=70393546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911148392.5A Active CN111090877B (en) 2019-11-21 2019-11-21 Data generation and acquisition methods, corresponding devices and storage medium

Country Status (1)

Country Link
CN (1) CN111090877B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858575B (en) * 2020-08-05 2024-04-19 杭州锘崴信息科技有限公司 Private data analysis method and system
CN112380567A (en) * 2020-11-27 2021-02-19 南京航空航天大学 Investigation method with confidence based on localized differential privacy
CN115129978B (en) * 2022-05-27 2024-03-29 暨南大学 Preference query method, user terminal, server and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063502A (en) * 2018-08-13 2018-12-21 阿里巴巴集团控股有限公司 Data encryption, data analysing method and device
CN109902506A (en) * 2019-01-08 2019-06-18 中国科学院软件研究所 A kind of local difference private data sharing method and system of more privacy budgets
CN110309671A (en) * 2019-06-26 2019-10-08 复旦大学 General data based on random challenge technology issues method for secret protection
CN110443061A (en) * 2018-05-03 2019-11-12 阿里巴巴集团控股有限公司 A kind of data ciphering method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《本地差分隐私保护的数据统计分析研究》;王绍蔚;《中国博士学位论文全文数据库(信息科技辑 )》(第8期);全文 *

Also Published As

Publication number Publication date
CN111090877A (en) 2020-05-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant